TalkPHP


Kalle 05-26-2008 08:00 PM

PHP Compressor
 
Hi Talk'ing PHP'ers :P

This is my first script for the script giveaway! It's called PHP Compressor, and it simply compresses PHP source code to a smaller file size by removing whitespace and comments. There's also an option to enable GZIP compression on it.

Source:
PHP Code:

<?php
    /**
     * PHP Compressor
     * ========================================================================
     *
     * @author    Kalle Sommer Nielsen <kalle@php.net>
     * @package    PHP_Compressor
     * @version    1.0
     * @license    http://www.php.net/license/ The PHP License v3.01
     * @copyright    2002+
     *
     * ========================================================================
     */


    /**
     * Standard compression with no options
     *
     * @var        integer
     */
    define('COMPRESS_STANDARD',        0);

    /**
     * Compression without comments
     *
     * @var        integer
     */
    define('COMPRESS_STRIP_COMMENTS',     1);

    /**
     * Compression with GZIP
     *
     * @var        integer
     */
    define('COMPRESS_GZIP',            2);

    /**
     * Compression with all options
     *
     * @var        integer
     */
    define('COMPRESS_ALL',            3);


    
    /**
     * Compress PHP code into a lower size where possible
     *
     * Example of usage:
     * <code>
     * <?php
     *    // Include the compression function
     *    require_once './phpcompress.php';
     *
     *    echo htmlentities(php_compress(file_get_contents(__FILE__), COMPRESS_STRIP_COMMENTS));
     * ?>
     * </code>
     *
     * @param    string        Code to compress
     * @param    integer        Options bitfield
     * @return    string        Returns compressed string
     *
     * @see        COMPRESS_STANDARD
     * @see        COMPRESS_STRIP_COMMENTS
     * @see        COMPRESS_GZIP
     * @see        COMPRESS_ALL
     */
    function php_compress($code, $flags = COMPRESS_STANDARD)
    {
        static $magic_defines;

        $code         = (string) $code;
        $strip_comments = (boolean) ($flags & COMPRESS_STRIP_COMMENTS);

        if(empty($code))
        {
            return('');
        }

        $tokens = token_get_all($code);

        if(!sizeof($tokens))
        {
            return('');
        }

        
        /** Magic defines for older versions */
        if(!$magic_defines)
        {
            $magic_defines = Array();

            /** PHP 5.0 */
            if(!defined('T_DOC_COMMENT'))
            {
                $magic_defines['abstract']     = 345;
                $magic_defines['clone']        = 298;
                $magic_defines['const']        = 334;
                $magic_defines['final']        = 344;
                $magic_defines['implements']    = 355;
                $magic_defines['instanceof']    = 288;
                $magic_defines['interface']    = 353;
                $magic_defines['private']    = 343;
                $magic_defines['protected']    = 342;
                $magic_defines['public']    = 341;
                $magic_defines['throw']        = 338;
            }
        }

        $in_php     = false;
        $compiled_code     = '';
        $last_token    = $tokens[0];

        foreach($tokens as $no => $token)
        {
            $is_char = !is_array($token);

            if($no)
            {
                $last_token = $tokens[($no - 1)];
            }

            if($in_php)
            {
                if(!$is_char)
                {
                    if($token[0] == T_STRING)
                    {
                        /**
                         * This provides compatibility for older versions of PHP, note
                         * that line numbers aren't currently 
                         */
                        if(array_key_exists(strtolower($token[1]), $magic_defines))
                        {
                            $token = Array(
                                    $magic_defines[strtolower($token[1])], 
                                    $token[1]
                                    );
                        }
                    }

                    if(!defined('T_DOC_COMMENT') && $token[0] == T_ML_COMMENT)
                    {
                        /**
                         * Cross version patch for multi line comments / document block
                         * comments
                         */
                        $token[0] = 366;
                    }


                    /**
                     * Note that numbers are used here where tokens aren't available from 
                     * PHP 4.0 in order to prevent defining them and breaking other scripts 
                     * that may rely on them being / not being defined
                     */
                    switch($token[0])
                    {
                        case(T_CLOSE_TAG):
                        {
                            $in_php     = false;
                            $compiled_code     .= ' ' . $token[1];

                            continue 2;
                        }
                        break;
                        case(T_WHITESPACE):
                        {
                            /**
                             * We do not need to count whitespace tokens 
                             * in the last tokens array
                             */
                            continue 2;
                        }
                        break;
                        case(T_EXTENDS):
                        case(T_FUNCTION):
                        case(355):
                        case(288):
                        case(T_AS):
                        case(T_LOGICAL_OR):
                        {
                            /** 
                             * These need a space in front and behind to 
                             * prevent a parse error
                             */
                            $token[1] = ' ' . $token[1] . ' ';
                        }
                        break;
                        case(345):
                        case(T_CASE):
                        case(T_CLASS):
                        case(298):
                        case(334):
                        case(344):
                        case(T_GLOBAL):
                        case(353):
                        case(T_NEW):
                        case(343):
                        case(342):
                        case(341):
                        case(T_RETURN):
                        case(T_STATIC):
                        case(338):
                        {
                            /**
                             * All these just need a space behind them to 
                             * prevent a parse error
                             */
                            $token[1] .= ' ';
                        }
                        break;
                        case(T_COMMENT):
                        case(366):
                        {
                            /**
                             * For comments
                             */
                            if($strip_comments)
                            {
                                continue 2;
                            }
                            elseif(!$strip_comments && $token[0] == T_COMMENT && (substr($token[1], 0, 2) == '//' || $token[1]{0} == '#'))
                            {
                                /**
                                 * C++/Perl style comments need a new line after them
                                 */
                                $token[1] .= "\r\n";
                            }
                        }
                        break;
                    }
                }
            }

            if($in_php)
            {
                /**
                 * Optimization at its best, truncate the space added to the return 
                 * to save one byte if the space isn't needed there
                 */
                if($last_token && $last_token[0] == T_RETURN && $is_char && $token == ';')
                {
                    $compiled_code = substr($compiled_code, 0, -1);
                }

                $compiled_code .= ($is_char ? $token : $token[1]);
            }


            if(!$in_php && (!$is_char && $token[0] != T_CLOSE_TAG))
            {
                $compiled_code .= trim($token[1]);

                if($token[0] != T_OPEN_TAG)
                {
                    continue;
                }

                $in_php = true;
                $compiled_code .= ' ';
            }
        }

        /**
         * Compress if possible
         */
        if(($flags & COMPRESS_GZIP) && function_exists('gzdeflate'))
        {
            $compiled_code = '<?php ob_start(); ?>' . str_replace('<?', '&lt;?', gzdeflate('?>' . $compiled_code . '<?php ', 9)) . '<?php eval(gzinflate(str_replace(\'&lt;?\', \'<?\', ob_get_clean()))); ?>';
        }

        return($compiled_code);
    }

    
    /**
     * Compresses a PHP file into a lower size where possible
     *
     * Example of usage:
     * <code>
     * <?php
     *    // Include the compression function
     *    require_once './phpcompress.php';
     *
     *    php_compress_file(__FILE__, COMPRESS_STRIP_COMMENTS) or die('Compression failed!');
     *
     *    echo htmlentities(file_get_contents(__FILE__));
     * ?>
     * </code>
     *
     * @param    string        PHP file to compress
     * @param    integer        Options bitfield
     * @return    boolean        True if all operations were successful otherwise false
     *
     * @see        php_compress()
     */
    function php_compress_file($filename, $flags = COMPRESS_STANDARD)
    {
        $code = @file_get_contents($filename);

        if(!$code)
        {
            return(false);
        }

        return((boolean) @file_put_contents($filename, php_compress($code, $flags)));
    }
?>

All the documentation is in the docblocks, and it should pass if you run it through a program like phpDocumentor.

Example output of a compression with whitespace and comments removed looks something like this:

PHP Code:

<?php define('COMPRESS_STANDARD',0);define('COMPRESS_STRIP_COMMENTS',1);define('COMPRESS_GZIP',2);define('COMPRESS_ALL',3); function php_compress($code,$flags=COMPRESS_STANDARD){static $magic_defines;$code=(string)$code;$strip_comments=(boolean)($flags&COMPRESS_STRIP_COMMENTS);if(empty($code)){return ('');}$tokens=token_get_all($code);if(!sizeof($tokens)){return ('');}if(!$magic_defines){$magic_defines=Array();if(!defined('T_DOC_COMMENT')){$magic_defines['abstract']=345;$magic_defines['clone']=298;$magic_defines['const']=334;$magic_defines['final']=344;$magic_defines['implements']=355;$magic_defines['instanceof']=288;$magic_defines['interface']=353;$magic_defines['private']=343;$magic_defines['protected']=342;$magic_defines['public']=341;$magic_defines['throw']=338;}}$in_php=false;$compiled_code='';$last_token=$tokens[0];foreach($tokens as $no=>$token){$is_char=!is_array($token);if($no){$last_token=$tokens[($no-1)];}if($in_php){if(!$is_char){if($token[0]==T_STRING){if(array_key_exists(strtolower($token[1]),$magic_defines)){$token=Array($magic_defines[strtolower($token[1])],$token[1]);}}if(!defined('T_DOC_COMMENT')&&$token[0]==T_ML_COMMENT){$token[0]=366;}switch($token[0]){case (T_CLOSE_TAG):{$in_php=false;$compiled_code.=' '.$token[1];continue2;}break;case (T_WHITESPACE):{continue2;}break;case (T_EXTENDS):case (T_FUNCTION):case (355):case (288):case (T_AS):case (T_LOGICAL_OR):{$token[1]=' '.$token[1].' 
';}break;case (345):case (T_CASE):case (T_CLASS):case (298):case (334):case (344):case (T_GLOBAL):case (353):case (T_NEW):case (343):case (342):case (341):case (T_RETURN):case (T_STATIC):case (338):{$token[1].=' ';}break;case (T_COMMENT):case (366):{if($strip_comments){continue2;}elseif(!$strip_comments&&$token[0]==T_COMMENT&&(substr($token[1],0,2)=='//'||$token[1]{0}=='#')){$token[1].="\r\n";}}break;}}}if($in_php){if($last_token&&$last_token[0]==T_RETURN&&$is_char&&$token==';'){$compiled_code=substr($compiled_code,0,-1);}$compiled_code.=($is_char?$token:$token[1]);}if(!$in_php&&(!$is_char&&$token[0]!=T_CLOSE_TAG)){$compiled_code.=trim($token[1]);if($token[0]!=T_OPEN_TAG){continue;}$in_php=true;$compiled_code.=' ';}}if(($flags&COMPRESS_GZIP)&&function_exists('gzdeflate')){$compiled_code='<?php ob_start(); ?>'.str_replace('<?','&lt;?',gzdeflate('?>'.$compiled_code.'<?php ',9)).'<?php eval(gzinflate(str_replace(\'&lt;?\', \'<?\', ob_get_clean()))); ?>';}return ($compiled_code);} function php_compress_file($filename,$flags=COMPRESS_STANDARD){$code=@file_get_contents($filename);if(!$code){return (false);}return ((boolean)@file_put_contents($filename,php_compress($code,$flags)));} ?>

Usage:
Simply call php_compress() with the PHP code to compress as a string in the first parameter. The code may contain HTML and jump in and out of the PHP tags; the compressor only compresses what's inside the PHP tags.
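
The split between HTML and PHP comes from the tokenizer itself. As a minimal sketch (the input string is made up for illustration, and this is not part of the script above), token_get_all() reports everything outside the PHP tags as T_INLINE_HTML, which is what lets a compressor leave it alone:

```php
<?php
// Hypothetical mixed HTML/PHP input, for illustration only.
$source = "<h1>Title</h1>\n<?php echo 'hi'; ?>\n<p>Footer</p>\n";

$html_parts = array();
$php_parts  = array();

foreach (token_get_all($source) as $token) {
    // Single-character tokens such as ';' come back as plain strings.
    $text = is_array($token) ? $token[1] : $token;

    if (is_array($token) && $token[0] === T_INLINE_HTML) {
        $html_parts[] = $text;   // outside the PHP tags
    } else {
        $php_parts[] = $text;    // inside the PHP tags, including the tags themselves
    }
}

echo 'HTML: ', json_encode($html_parts), "\n";
echo 'PHP:  ', json_encode($php_parts), "\n";
```

Only the pieces collected in $php_parts would be candidates for whitespace removal.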

You may pass a second parameter to php_compress() that tells the compressor what to do. Currently there are two options, defined as a bitfield; you can use the constants defined at the start of the script.

COMPRESS_STANDARD - The default; does not remove comments or apply GZIP
COMPRESS_STRIP_COMMENTS - Strip comments
COMPRESS_GZIP - Compress using GZIP
COMPRESS_ALL - (Same as "COMPRESS_STRIP_COMMENTS | COMPRESS_GZIP")
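
To see how the bitfield works on its own, here is a small sketch. The constants use the same values as in the script; describe_flags() is a hypothetical helper written only for this illustration, mirroring the `$flags & CONSTANT` checks in php_compress():

```php
<?php
// Same values as in the script; guarded so this sketch can run next to it.
if (!defined('COMPRESS_STANDARD')) {
    define('COMPRESS_STANDARD',       0);
    define('COMPRESS_STRIP_COMMENTS', 1);
    define('COMPRESS_GZIP',           2);
    define('COMPRESS_ALL',            3);
}

// Hypothetical helper, not part of the script: reports which options are set.
function describe_flags($flags)
{
    return array(
        'strip_comments' => (bool) ($flags & COMPRESS_STRIP_COMMENTS),
        'gzip'           => (bool) ($flags & COMPRESS_GZIP),
    );
}

// OR-ing the two options together gives the same result as COMPRESS_ALL.
var_dump(describe_flags(COMPRESS_STRIP_COMMENTS | COMPRESS_GZIP) === describe_flags(COMPRESS_ALL));
```

Because each option is its own bit (1 and 2), they can be combined freely with `|` and tested independently with `&`.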

There's also a second function, php_compress_file(), which compresses a file: pass the file name as the first parameter instead of the code. The second parameter takes the same options as in php_compress(). Note that it writes the result back to the same file.



I did some testing on SimplePie, if anyone knows it: with comments and whitespace stripped I got the file size from 279kb down to 193kb, and with gzip I got it down to 42kb.

Of course, the lowest size comes with the lowest speed, because gzip has to inflate the binary data; this is around 20 times slower than just a normal compression.
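
For a feel of the trade-off, here is a rough sketch of the deflate/inflate round trip. The sample code string is made up, and this is only the core of the idea, not the wrapper the script generates:

```php
<?php
// Stand-in for a whitespace-stripped script body (hypothetical content).
$code = str_repeat('$total+=strlen($value);', 200);

if (function_exists('gzdeflate')) {
    $packed   = gzdeflate($code, 9);   // level 9, as in php_compress()
    $restored = gzinflate($packed);    // the step that has to run on every execution

    printf("plain: %d bytes, deflated: %d bytes\n", strlen($code), strlen($packed));
    var_dump($restored === $code);
}
```

The size win is paid for by the gzinflate() call, plus the eval() in the generated wrapper, each time the file runs.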

My small benchmark on my PC also indicated that the compressed version (comments and whitespace stripped) was about 0.0002 to 0.0003 times faster than the version with them.


Note: I tried to implement a compatibility patch so that even PHP 4.0.0 tokenizes PHP 5.0.0+ code properly, but it's not fully tested!

Another note: I know the GZIP'ed generated code isn't the best, but it was a better way than using base64 encoding for the binary data.
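
The size cost of base64 is easy to check. A quick sketch, using a made-up binary payload in place of real gzdeflate() output:

```php
<?php
// Hypothetical binary payload standing in for gzdeflate() output.
$binary = '';
for ($i = 0; $i < 300; $i++) {
    $binary .= chr($i % 256);
}

// base64 turns every 3 input bytes into 4 output characters.
$encoded = base64_encode($binary);

printf("raw: %d bytes, base64: %d bytes\n", strlen($binary), strlen($encoded));
// prints "raw: 300 bytes, base64: 400 bytes"
```

That is roughly a third more bytes, which eats into whatever the compression saved.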


Anyway, I hope this will be as useful to some of you as it may become to me ;)

ETbyrne 05-27-2008 03:35 PM

Looks cool, but would this also compress HTML? Example:
PHP Code:

<?php
echo "Hello World!";
?>

Would that become this?
PHP Code:

<?php echo"HelloWorld!";?>


Salathe 05-27-2008 03:55 PM

If you don't want to use this particular function, you can also use the built-in php_strip_whitespace(), which leaves a little more whitespace in the resulting code, but not much.

ETbyrne, no the resulting code will be functionally identical to the original and any functional whitespace (cf. presentational whitespace in the code) will be left alone.
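
A quick way to check this with the built-in function is sketched below. The temporary file and its contents are only for illustration; php_strip_whitespace() takes a file name rather than a code string:

```php
<?php
// Write a variant of ETbyrne's example to a temporary file.
$file = tempnam(sys_get_temp_dir(), 'psw');
file_put_contents($file, "<?php\necho   \"Hello World!\";   // greet\n?>\n");

$stripped = php_strip_whitespace($file);
unlink($file);

echo $stripped, "\n";

// The space inside the string literal is still there; the comment is gone.
var_dump(strpos($stripped, '"Hello World!"') !== false);
var_dump(strpos($stripped, 'greet') === false);
```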

ETbyrne 05-27-2008 04:41 PM

Great! Can't wait to compress Kudos now...

delayedinsanity 05-27-2008 05:13 PM

Well that blows my little hack away... All I've done to date as far as compression was to add this to the top of my config file (included first on every page, of course);

PHP Code:

ob_start("compress");

function compress($buffer) {
    return str_replace(array("\r\n", "\r", "\n", "\t", '  ', '    ', '    '), '', $buffer);
}

Looks pretty caveman now... ;-)
-m

Kalle 05-27-2008 07:54 PM

Like Salathe said, it does not alter the functionality, meaning it will not edit anything like variable values (T_CONSTANT_ENCAPSED_STRING).
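
To illustrate, here is a small sketch that dumps the tokens for a one-line script (a made-up example, not taken from the compressor). The quoted string arrives as a single T_CONSTANT_ENCAPSED_STRING token, separate from the T_WHITESPACE tokens around it:

```php
<?php
$tokens  = token_get_all('<?php echo   "Hello World!";');
$literal = null;

foreach ($tokens as $token) {
    if (!is_array($token)) {
        continue; // single characters such as ';'
    }

    // Show the token type next to its exact source text.
    printf("%-28s %s\n", token_name($token[0]), json_encode($token[1]));

    if ($token[0] === T_CONSTANT_ENCAPSED_STRING) {
        $literal = $token[1];
    }
}

var_dump($literal === '"Hello World!"');
```

Dropping only the T_WHITESPACE tokens therefore never touches the text inside the quotes.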

And thanks for all the thanks, they are greatly appreciated :-)

Salathe 05-27-2008 07:56 PM

delayedinsanity, it looks like your compression is something different from Kalle's. It looks like you're removing whitespace from output (HTML) whereas his is removing whitespace from the PHP code itself.

delayedinsanity 05-27-2008 10:09 PM

I just meant similar in nature because they both wind up outputting a one liner of text, and mine does look like caveman PHP in comparison... no elegance.

BTW, line 222, there's a misspelled variable,

PHP Code:

...
elseif(!$stip_comments && ...

I think this is a case of, Those who can, do, those who can't, debug. :-)
-m

Kalle 05-28-2008 12:14 AM

Cheers delayedinsanity, I updated the source with that fix and I forgot to change the sourcecode comment to say that it also handled PERL style comments (#) =)


All times are GMT.
