Thread: PHP Compressor
05-26-2008, 09:00 PM   #1
Kalle

Hi Talk'ing PHP'ers :P

This is my first script for the script giveaway! It's called PHP Compressor, and it simply compresses PHP source code to a smaller file size by removing whitespace and comments. There's also an option to enable GZIP compression on it.

Source:
PHP Code:
<?php
    /**
     * PHP Compressor
     * ========================================================================
     *
     * @author    Kalle Sommer Nielsen <kalle@php.net>
     * @package    PHP_Compressor
     * @version    1.0
     * @license    http://www.php.net/license/ The PHP License v3.01
     * @copyright    2002+
     *
     * ========================================================================
     */

    /**
     * Standard compression with no options
     *
     * @var        integer
     */
    define('COMPRESS_STANDARD',        0);

    /**
     * Compression without comments
     *
     * @var        integer
     */
    define('COMPRESS_STRIP_COMMENTS',     1);

    /**
     * Compression with GZIP
     *
     * @var        integer
     */
    define('COMPRESS_GZIP',            2);

    /**
     * Compression with all options
     *
     * @var        integer
     */
    define('COMPRESS_ALL',            3);

    /**
     * Compress PHP code into a lower size where possible
     *
     * Example of usage:
     * <code>
     * <?php
     *    // Include the compression function
     *    require_once './phpcompress.php';
     *
     *    echo htmlentities(php_compress(file_get_contents(__FILE__), COMPRESS_STRIP_COMMENTS));
     * ?>
     * </code>
     *
     * @param    string        Code to compress
     * @param    integer        Options bitfield
     * @return    string        Returns compressed string
     *
     * @see        COMPRESS_STANDARD
     * @see        COMPRESS_STRIP_COMMENTS
     * @see        COMPRESS_GZIP
     * @see        COMPRESS_ALL
     */
    function php_compress($code, $flags = COMPRESS_STANDARD)
    {
        static $magic_defines;

        $code         = (string) $code;
        $strip_comments = (boolean) ($flags & COMPRESS_STRIP_COMMENTS);

        if(empty($code))
        {
            return('');
        }

        $tokens = token_get_all($code);

        if(!sizeof($tokens))
        {
            return('');
        }

        
        /** Magic defines for older versions */
        if(!$magic_defines)
        {
            $magic_defines = Array();

            /** PHP 5.0 */
            if(!defined('T_DOC_COMMENT'))
            {
                $magic_defines['abstract']    = 345;
                $magic_defines['clone']        = 298;
                $magic_defines['const']        = 334;
                $magic_defines['final']        = 344;
                $magic_defines['implements']    = 355;
                $magic_defines['instanceof']    = 288;
                $magic_defines['interface']    = 353;
                $magic_defines['private']    = 343;
                $magic_defines['protected']    = 342;
                $magic_defines['public']    = 341;
                $magic_defines['throw']        = 338;
            }
        }

        
        $in_php        = false;
        $compiled_code    = '';
        $last_token    = $tokens[0];

        foreach($tokens as $no => $token)
        {
            $is_char = !is_array($token);

            if($no)
            {
                $last_token = $tokens[($no - 1)];
            }

            if($in_php)
            {
                if(!$is_char)
                {
                    if($token[0] == T_STRING)
                    {
                        /**
                         * This provides compatibility for older versions of PHP, note
                         * that line numbers aren't currently
                         */
                        if(array_key_exists(strtolower($token[1]), $magic_defines))
                        {
                            $token = Array(
                                    $magic_defines[strtolower($token[1])],
                                    $token[1]
                                    );
                        }
                    }

                    if(!defined('T_DOC_COMMENT') && $token[0] == T_ML_COMMENT)
                    {
                        /**
                         * Cross version patch for multi line comments / document block
                         * comments
                         */
                        $token[0] = 366;
                    }


                    
                    /**
                     * Note that numbers are used here where tokens aren't available from
                     * PHP 4.0 in order to prevent defining them and breaking other scripts
                     * that may rely on them being / not being defined
                     */
                    switch($token[0])
                    {
                        case(T_CLOSE_TAG):
                        {
                            $in_php        = false;
                            $compiled_code    .= ' ' . $token[1];

                            continue 2;
                        }
                        break;
                        case(T_WHITESPACE):
                        {
                            /**
                             * We do not need to count whitespace tokens
                             * in the last tokens array
                             */
                            continue 2;
                        }
                        break;
                        case(T_EXTENDS):
                        case(T_FUNCTION):
                        case(355):
                        case(288):
                        case(T_AS):
                        case(T_LOGICAL_OR):
                        {
                            /**
                             * These need a space in front and behind to
                             * prevent a parse error
                             */
                            $token[1] = ' ' . $token[1] . ' ';
                        }
                        break;
                        case(345):
                        case(T_BREAK):
                        case(T_CASE):
                        case(T_CLASS):
                        case(298):
                        case(334):
                        case(T_CONTINUE):
                        case(344):
                        case(T_GLOBAL):
                        case(353):
                        case(T_NEW):
                        case(343):
                        case(342):
                        case(341):
                        case(T_RETURN):
                        case(T_STATIC):
                        case(338):
                        {
                            /**
                             * All these just need a space behind them to
                             * prevent a parse error (T_BREAK and T_CONTINUE are
                             * included so "continue 2" isn't fused into "continue2")
                             */
                            $token[1] .= ' ';
                        }
                        break;
                        case(T_COMMENT):
                        case(366):
                        {
                            /**
                             * For comments
                             */
                            if($strip_comments)
                            {
                                continue 2;
                            }
                            elseif(!$strip_comments && $token[0] == T_COMMENT && (substr($token[1], 0, 2) == '//' || $token[1][0] == '#'))
                            {
                                /**
                                 * C++/Perl style comments need a new line after them
                                 */
                                $token[1] .= "\r\n";
                            }
                        }
                        break;
                    }
                }
            }

            if($in_php)
            {
                /**
                 * Optimization at its best: truncate the space added after return,
                 * break or continue to save one byte if the space isn't needed there
                 */
                if($last_token && $is_char && $token == ';' && in_array($last_token[0], array(T_RETURN, T_BREAK, T_CONTINUE)))
                {
                    $compiled_code = substr($compiled_code, 0, -1);
                }

                $compiled_code .= ($is_char ? $token : $token[1]);
            }


            if(!$in_php && (!$is_char && $token[0] != T_CLOSE_TAG))
            {
                $compiled_code .= trim($token[1]);

                if($token[0] != T_OPEN_TAG)
                {
                    continue;
                }

                $in_php = true;
                $compiled_code .= ' ';
            }
        }

        
        /**
         * Compress if possible
         */
        if(($flags & COMPRESS_GZIP) && function_exists('gzdeflate'))
        {
            $compiled_code = '<?php ob_start(); ?>' . str_replace('<?', '&lt;?', gzdeflate('?>' . $compiled_code . '<?php ', 9)) . '<?php eval(gzinflate(str_replace(\'&lt;?\', \'<?\', ob_get_clean()))); ?>';
        }

        return($compiled_code);
    }

    
    /**
     * Compresses a PHP file into a lower size where possible
     *
     * Example of usage:
     * <code>
     * <?php
     *    // Include the compression function
     *    require_once './phpcompress.php';
     *
     *    php_compress_file(__FILE__, COMPRESS_STRIP_COMMENTS) or die('Compression failed!');
     *
     *    echo htmlentities(file_get_contents(__FILE__));
     * ?>
     * </code>
     *
     * @param    string        PHP file to compress
     * @param    integer        Options bitfield
     * @return    boolean        True if all operations were successful, otherwise false
     *
     * @see        php_compress()
     */
    function php_compress_file($filename, $flags = COMPRESS_STANDARD)
    {
        $code = @file_get_contents($filename);

        if(!$code)
        {
            return(false);
        }

        return((boolean) @file_put_contents($filename, php_compress($code, $flags)));
    }
?>
All documentation is placed in the docblocks and should parse if you run it through a program like phpDocumentor.

An example of the output of a compression where whitespace and comments are removed will look something like this:

PHP Code:
<?php define('COMPRESS_STANDARD',0);define('COMPRESS_STRIP_COMMENTS',1);define('COMPRESS_GZIP',2);define('COMPRESS_ALL',3); function php_compress($code,$flags=COMPRESS_STANDARD){static $magic_defines;$code=(string)$code;$strip_comments=(boolean)($flags&COMPRESS_STRIP_COMMENTS);if(empty($code)){return ('');}$tokens=token_get_all($code);if(!sizeof($tokens)){return ('');}if(!$magic_defines){$magic_defines=Array();if(!defined('T_DOC_COMMENT')){$magic_defines['abstract']=345;$magic_defines['clone']=298;$magic_defines['const']=334;$magic_defines['final']=344;$magic_defines['implements']=355;$magic_defines['instanceof']=288;$magic_defines['interface']=353;$magic_defines['private']=343;$magic_defines['protected']=342;$magic_defines['public']=341;$magic_defines['throw']=338;}}$in_php=false;$compiled_code='';$last_token=$tokens[0];foreach($tokens as $no=>$token){$is_char=!is_array($token);if($no){$last_token=$tokens[($no-1)];}if($in_php){if(!$is_char){if($token[0]==T_STRING){if(array_key_exists(strtolower($token[1]),$magic_defines)){$token=Array($magic_defines[strtolower($token[1])],$token[1]);}}if(!defined('T_DOC_COMMENT')&&$token[0]==T_ML_COMMENT){$token[0]=366;}switch($token[0]){case (T_CLOSE_TAG):{$in_php=false;$compiled_code.=' '.$token[1];continue 2;}break;case (T_WHITESPACE):{continue 2;}break;case (T_EXTENDS):case (T_FUNCTION):case (355):case (288):case (T_AS):case (T_LOGICAL_OR):{$token[1]=' '.$token[1].' ';}break;case (345):case (T_CASE):case (T_CLASS):case (298):case (334):case (344):case (T_GLOBAL):case (353):case (T_NEW):case (343):case (342):case (341):case (T_RETURN):case (T_STATIC):case (338):{$token[1].=' ';}break;case (T_COMMENT):case (366):{if($strip_comments){continue 2;}elseif(!$strip_comments&&$token[0]==T_COMMENT&&(substr($token[1],0,2)=='//'||$token[1][0]=='#')){$token[1].="\r\n";}}break;}}}if($in_php){if($last_token&&$last_token[0]==T_RETURN&&$is_char&&$token==';'){$compiled_code=substr($compiled_code,0,-1);}$compiled_code.=($is_char?$token:$token[1]);}if(!$in_php&&(!$is_char&&$token[0]!=T_CLOSE_TAG)){$compiled_code.=trim($token[1]);if($token[0]!=T_OPEN_TAG){continue;}$in_php=true;$compiled_code.=' ';}}if(($flags&COMPRESS_GZIP)&&function_exists('gzdeflate')){$compiled_code='<?php ob_start(); ?>'.str_replace('<?','&lt;?',gzdeflate('?>'.$compiled_code.'<?php ',9)).'<?php eval(gzinflate(str_replace(\'&lt;?\', \'<?\', ob_get_clean()))); ?>';}return ($compiled_code);} function php_compress_file($filename,$flags=COMPRESS_STANDARD){$code=@file_get_contents($filename);if(!$code){return (false);}return ((boolean)@file_put_contents($filename,php_compress($code,$flags)));} ?>
Usage:
You may simply call php_compress(), where the first parameter is a string with the PHP code to compress. This may contain HTML and jump in and out of the PHP tags; the compressor will only compress what's inside the PHP tags.
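To get a feel for why only the PHP parts are touched, here is a minimal sketch of what token_get_all() hands back for mixed HTML/PHP input. The sample string and the $names array are made up for illustration and are not part of the compressor; the exact token list can differ slightly between PHP versions.

```php
<?php
// Hypothetical sample input: HTML with one embedded PHP block.
$source = "<p>Hello</p>\n<?php\n// greet\necho  'hi';\n?>\n<p>Bye</p>";

$names = array();

foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens hold the token ID at index 0 and the matched text at index 1.
        $name = token_name($token[0]);
        $text = $token[1];
    } else {
        // Single characters such as ';' are returned as plain strings.
        $name = 'char';
        $text = $token;
    }

    $names[] = $name;
    printf("%-28s %s\n", $name, json_encode($text));
}
```

Everything outside the PHP tags arrives as T_INLINE_HTML, so a compressor can tell HTML apart from code, while the whitespace and comments inside the tags show up as their own tokens (T_WHITESPACE, T_COMMENT) that can be skipped.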

You may pass a second parameter to php_compress() that tells the compressor what you want compressed. Currently there are two options; they are defined as a bitfield, and you can use the constants defined at the start of the script.

COMPRESS_STANDARD - The default; doesn't remove comments or use GZIP
COMPRESS_STRIP_COMMENTS - Strip comments
COMPRESS_GZIP - Compress using GZIP
COMPRESS_ALL - (Same as "COMPRESS_STRIP_COMMENTS | COMPRESS_GZIP")
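As a small sketch of how the bitfield works: each option is one bit, options are combined with | and tested with &. The constant values are the ones from the script; describe_flags() is a hypothetical helper that exists only in this example.

```php
<?php
// Same values as in the script; guarded so the sketch runs with or without it loaded.
defined('COMPRESS_STANDARD')       or define('COMPRESS_STANDARD', 0);
defined('COMPRESS_STRIP_COMMENTS') or define('COMPRESS_STRIP_COMMENTS', 1);
defined('COMPRESS_GZIP')           or define('COMPRESS_GZIP', 2);
defined('COMPRESS_ALL')            or define('COMPRESS_ALL', 3);

// Hypothetical helper (not part of the script): reports which options a flags value enables.
function describe_flags($flags)
{
    return array(
        'strip_comments' => (bool) ($flags & COMPRESS_STRIP_COMMENTS),
        'gzip'           => (bool) ($flags & COMPRESS_GZIP),
    );
}

// Combining both single options gives the same value as COMPRESS_ALL.
var_dump((COMPRESS_STRIP_COMMENTS | COMPRESS_GZIP) === COMPRESS_ALL); // bool(true)

print_r(describe_flags(COMPRESS_GZIP));
```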

There's also a second function, php_compress_file(), which lets you compress a file by specifying the file name as the first parameter instead of the code. It writes the compressed code back to the same file, and the second parameter takes the same options as in php_compress().



I did some testing on SimplePie, if anyone knows it: with stripped comments/whitespace I got the file size from 279 KB down to 193 KB, and with gzip I got it down to 42 KB.
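If you want to try a similar size comparison yourself without the script at hand, here is a rough sketch. Note that it swaps in PHP's built-in php_strip_whitespace() instead of php_compress(), so the numbers will not match the ones above; the sample source is made up, and gzdeflate() needs the zlib extension.

```php
<?php
// Hypothetical sample source; any PHP file with comments would do.
$source = "<?php\n/**\n * Adds two numbers.\n */\nfunction add(\$a, \$b)\n{\n    // Sum them up\n    return \$a + \$b;\n}\n";

// php_strip_whitespace() works on files, so write the sample to a temporary file first.
$file = tempnam(sys_get_temp_dir(), 'cmp');
file_put_contents($file, $source);
$stripped = php_strip_whitespace($file);
unlink($file);

// Deflate the stripped source at the highest compression level.
$deflated = gzdeflate($stripped, 9);

printf("original: %d bytes\n", strlen($source));
printf("stripped: %d bytes\n", strlen($stripped));
printf("deflated: %d bytes\n", strlen($deflated));
```

On a tiny input like this, deflating may gain little or nothing; the savings show up on larger files.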

Of course, the smallest size comes with the lowest speed, because gzip has to inflate the binary data; this is around 20 times slower than the normal compression.

My small benchmark on my PC also indicated that the compressed version (stripped comments/whitespace) was about 0.0002 to 0.0003 times faster than the uncompressed one.


Note: I tried to implement a compatibility patch to make even PHP 4.0.0 tokenize PHP 5.0.0+ code properly, but it's not fully tested!

Another note: I know the GZIP'ed generated code isn't the best, but it was a better way than using base64 encoding for the binary data.
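For anyone wondering about the base64 trade-off, here is a minimal sketch of the size difference. base64 turns every 3 bytes into 4 characters, so the text-safe form is always about a third larger than the raw deflated data. The payload is a made-up stand-in, and gzdeflate() needs the zlib extension.

```php
<?php
// Hypothetical payload standing in for already-stripped PHP source.
$code     = str_repeat('function f($a){return $a+1;}', 50);
$deflated = gzdeflate($code, 9);        // raw binary data
$encoded  = base64_encode($deflated);   // text-safe, 4 output bytes per 3 input bytes

printf("deflated: %d bytes\n", strlen($deflated));
printf("base64:   %d bytes\n", strlen($encoded));

// Both forms restore the original code.
var_dump(gzinflate($deflated) === $code);
var_dump(gzinflate(base64_decode($encoded)) === $code);
```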


Anyway, I hope this will be as useful to some of you as it may become to me ;)
Last edited by Kalle : 05-28-2008 at 01:13 AM.