Hi Talk'ing PHP'ers :P
This is my first script for the script giveaway! It's called PHP Compressor, and it simply compresses PHP source code to a smaller file size by removing whitespace and comments. There's also an option to enable GZIP compression on it.
Source:
PHP Code:
<?php
/**
* PHP Compressor
* ========================================================================
*
* @author Kalle Sommer Nielsen <kalle@php.net>
* @package PHP_Compressor
* @version 1.0
* @license http://www.php.net/license/ The PHP License v3.01
* @copyright 2002+
*
* ========================================================================
*/
/**
* Standard compression with no options
*
* @var integer
*/
define('COMPRESS_STANDARD', 0);
/**
* Compression without comments
*
* @var integer
*/
define('COMPRESS_STRIP_COMMENTS', 1);
/**
* Compression with GZIP
*
* @var integer
*/
define('COMPRESS_GZIP', 2);
/**
* Compression with all options
*
* @var integer
*/
define('COMPRESS_ALL', 3);
/**
* Compress PHP code to a smaller size where possible
*
* Example of usage:
* <code>
* <?php
* // Include the compression function
* require_once './phpcompress.php';
*
* echo htmlentities(php_compress(file_get_contents(__FILE__), COMPRESS_STRIP_COMMENTS));
* ?>
* </code>
*
* @param string Code to compress
* @param integer Options bitfield
* @return string Returns compressed string
*
* @see COMPRESS_STANDARD
* @see COMPRESS_STRIP_COMMENTS
* @see COMPRESS_GZIP
* @see COMPRESS_ALL
*/
function php_compress($code, $flags = COMPRESS_STANDARD)
{
static $magic_defines;
$code = (string) $code;
$strip_comments = (boolean) ($flags & COMPRESS_STRIP_COMMENTS);
if(empty($code))
{
return('');
}
$tokens = token_get_all($code);
if(!sizeof($tokens))
{
return('');
}
/** Magic defines for older versions */
if(!$magic_defines)
{
$magic_defines = Array();
/** PHP 5.0 */
if(!defined('T_DOC_COMMENT'))
{
$magic_defines['abstract'] = 345;
$magic_defines['clone'] = 298;
$magic_defines['const'] = 334;
$magic_defines['final'] = 344;
$magic_defines['implements'] = 355;
$magic_defines['instanceof'] = 288;
$magic_defines['interface'] = 353;
$magic_defines['private'] = 343;
$magic_defines['protected'] = 342;
$magic_defines['public'] = 341;
$magic_defines['throw'] = 338;
}
}
$in_php = false;
$compiled_code = '';
$last_token = $tokens[0];
foreach($tokens as $no => $token)
{
$is_char = !is_array($token);
if($no)
{
$last_token = $tokens[($no -1)];
}
if($in_php)
{
if(!$is_char)
{
if($token[0] == T_STRING)
{
/**
* This provides compatibility for older versions of PHP, note
* that line numbers aren't currently
*/
if(array_key_exists(strtolower($token[1]), $magic_defines))
{
$token = Array(
$magic_defines[strtolower($token[1])],
$token[1]
);
}
}
if(!defined('T_DOC_COMMENT') && $token[0] == T_ML_COMMENT)
{
/**
* Cross version patch for multi line comments / document block
* comments
*/
$token[0] = 366;
}
/**
* Note that numbers are used here where tokens aren't available in
* PHP 4.0, to avoid defining them and breaking other scripts
* that may rely on them being / not being defined
*/
switch($token[0])
{
case(T_CLOSE_TAG):
{
$in_php = false;
$compiled_code .= ' ' . $token[1];
continue 2;
}
break;
case(T_WHITESPACE):
{
/**
* We do not need to count whitespace tokens
* in the last tokens array
*/
continue 2;
}
break;
case(T_EXTENDS):
case(T_FUNCTION):
case(355):
case(288):
case(T_AS):
case(T_LOGICAL_OR):
{
/**
* These need a space in front and behind to
* prevent a parse error
*/
$token[1] = ' ' . $token[1] . ' ';
}
break;
case(345):
case(T_BREAK):
case(T_CASE):
case(T_CLASS):
case(298):
case(334):
case(T_CONTINUE):
case(344):
case(T_GLOBAL):
case(353):
case(T_NEW):
case(343):
case(342):
case(341):
case(T_RETURN):
case(T_STATIC):
case(338):
{
/**
* All these just need a space behind them to
* prevent a parse error, e.g. "continue 2" must not
* collapse into "continue2"
*/
$token[1] .= ' ';
}
break;
case(T_COMMENT):
case(366):
{
/**
* For comments
*/
if($strip_comments)
{
continue 2;
}
elseif(!$strip_comments && $token[0] == T_COMMENT && (substr($token[1], 0, 2) == '//' || $token[1][0] == '#'))
{
/**
* C++/Perl style comments need a new line after them
*/
$token[1] .= "\r\n";
}
}
break;
}
}
}
if($in_php)
{
/**
* Optimization at its best: truncate the space added after return,
* break or continue to save one byte when the space isn't needed there
*/
if($last_token && is_array($last_token) && in_array($last_token[0], Array(T_RETURN, T_BREAK, T_CONTINUE)) && $is_char && $token == ';')
{
$compiled_code = substr($compiled_code, 0, -1);
}
$compiled_code .= ($is_char ? $token : $token[1]);
}
if(!$in_php && (!$is_char && $token[0] != T_CLOSE_TAG))
{
$compiled_code .= trim($token[1]);
if($token[0] != T_OPEN_TAG)
{
continue;
}
$in_php = true;
$compiled_code .= ' ';
}
}
/**
* Compress if possible
*/
if(($flags & COMPRESS_GZIP) && function_exists('gzdeflate'))
{
$compiled_code = '<?php ob_start(); ?>' . str_replace('<?', '<?', gzdeflate('?>' . $compiled_code . '<?php ', 9)) . '<?php eval(gzinflate(str_replace(\'<?\', \'<?\', ob_get_clean()))); ?>';
}
return($compiled_code);
}
/**
* Compresses a PHP file to a smaller size where possible
*
* Example of usage:
* <code>
* <?php
* // Include the compression function
* require_once './phpcompress.php';
*
* php_compress_file(__FILE__, COMPRESS_STRIP_COMMENTS) or die('Compression failed!');
*
* echo htmlentities(file_get_contents(__FILE__));
* ?>
* </code>
*
* @param string PHP file to compress
* @param integer Options bitfield
* @return boolean True if all operations were successful, otherwise false
*
* @see php_compress()
*/
function php_compress_file($filename, $flags = COMPRESS_STANDARD)
{
$code = @file_get_contents($filename);
if(!$code)
{
return(false);
}
return((boolean) @file_put_contents($filename, php_compress($code, $flags)));
}
?>
All documentation is placed in the docblocks and should parse cleanly if you run it through a program like phpDocumentor.
An example output of a compression where whitespace and comments are removed will look something like this:
PHP Code:
<?php define('COMPRESS_STANDARD',0);define('COMPRESS_STRIP_COMMENTS',1);define('COMPRESS_GZIP',2);define('COMPRESS_ALL',3); function php_compress($code,$flags=COMPRESS_STANDARD){static $magic_defines;$code=(string)$code;$strip_comments=(boolean)($flags&COMPRESS_STRIP_COMMENTS);if(empty($code)){return ('');}$tokens=token_get_all($code);if(!sizeof($tokens)){return ('');}if(!$magic_defines){$magic_defines=Array();if(!defined('T_DOC_COMMENT')){$magic_defines['abstract']=345;$magic_defines['clone']=298;$magic_defines['const']=334;$magic_defines['final']=344;$magic_defines['implements']=355;$magic_defines['instanceof']=288;$magic_defines['interface']=353;$magic_defines['private']=343;$magic_defines['protected']=342;$magic_defines['public']=341;$magic_defines['throw']=338;}}$in_php=false;$compiled_code='';$last_token=$tokens[0];foreach($tokens as $no=>$token){$is_char=!is_array($token);if($no){$last_token=$tokens[($no-1)];}if($in_php){if(!$is_char){if($token[0]==T_STRING){if(array_key_exists(strtolower($token[1]),$magic_defines)){$token=Array($magic_defines[strtolower($token[1])],$token[1]);}}if(!defined('T_DOC_COMMENT')&&$token[0]==T_ML_COMMENT){$token[0]=366;}switch($token[0]){case (T_CLOSE_TAG):{$in_php=false;$compiled_code.=' '.$token[1];continue 2;}break;case (T_WHITESPACE):{continue 2;}break;case (T_EXTENDS):case (T_FUNCTION):case (355):case (288):case (T_AS):case (T_LOGICAL_OR):{$token[1]=' '.$token[1].' ';}break;case (345):case (T_CASE):case (T_CLASS):case (298):case (334):case (344):case (T_GLOBAL):case (353):case (T_NEW):case (343):case (342):case (341):case (T_RETURN):case (T_STATIC):case (338):{$token[1].=' ';}break;case (T_COMMENT):case (366):{if($strip_comments){continue 2;}elseif(!$strip_comments&&$token[0]==T_COMMENT&&(substr($token[1],0,2)=='//'||$token[1][0]=='#')){$token[1].="\r\n";}}break;}}}if($in_php){if($last_token&&$last_token[0]==T_RETURN&&$is_char&&$token==';'){$compiled_code=substr($compiled_code,0,-1);}$compiled_code.=($is_char?$token:$token[1]);}if(!$in_php&&(!$is_char&&$token[0]!=T_CLOSE_TAG)){$compiled_code.=trim($token[1]);if($token[0]!=T_OPEN_TAG){continue;}$in_php=true;$compiled_code.=' ';}}if(($flags&COMPRESS_GZIP)&&function_exists('gzdeflate')){$compiled_code='<?php ob_start(); ?>'.str_replace('<?','<?',gzdeflate('?>'.$compiled_code.'<?php ',9)).'<?php eval(gzinflate(str_replace(\'<?\', \'<?\', ob_get_clean()))); ?>';}return ($compiled_code);} function php_compress_file($filename,$flags=COMPRESS_STANDARD){$code=@file_get_contents($filename);if(!$code){return (false);}return ((boolean)@file_put_contents($filename,php_compress($code,$flags)));} ?>
Usage:
You may simply call php_compress() where the first parameter is a string with the PHP code to compress. The string may contain HTML and jump in and out of the PHP tags; the compressor will only compress what's inside the PHP tags.
You may pass a second parameter to php_compress() that tells the compressor what you want compressed. Currently there are two options, combined as a bitfield, and you can use the constants defined at the start of the script:
COMPRESS_STANDARD - Standard mode, doesn't remove comments or apply GZIP
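To show how the HTML outside the PHP tags can be told apart from the code inside them, here is a minimal sketch built on token_get_all(). The sample string and the variable names are made up for illustration; this is not the compressor itself, only the tokenizer behaviour it relies on.

```php
<?php
// Mixed input: HTML outside the tags, PHP inside them (sample data only).
$source = "<p>Hello</p>\n<?php echo  1 +  2; ?><p>Bye</p>";

$inline_html = '';      // everything found outside the PHP tags
$php_parts   = array(); // PHP tokens, with whitespace tokens skipped

foreach (token_get_all($source) as $token) {
    // Single-character tokens such as ';' or '+' come back as plain strings.
    if (!is_array($token)) {
        $php_parts[] = $token;
        continue;
    }
    if ($token[0] === T_INLINE_HTML) {
        // Text outside the PHP tags arrives as T_INLINE_HTML tokens.
        $inline_html .= $token[1];
    } elseif ($token[0] !== T_WHITESPACE) {
        $php_parts[] = $token[1];
    }
}

echo $inline_html, "\n";
echo count($php_parts), " PHP tokens kept\n";
```

Only the tokens collected in $php_parts would be candidates for compression; the T_INLINE_HTML text is a separate kind of token.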
COMPRESS_STRIP_COMMENTS - Strip comments
COMPRESS_GZIP - Compress using GZIP
COMPRESS_ALL - (Same as "COMPRESS_STRIP_COMMENTS | COMPRESS_GZIP")
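As a small illustration of how the bitfield works, the sketch below combines and tests the flags with the bitwise operators. The constant values are the ones from the script; describe_flags() is a hypothetical helper that exists only for this example.

```php
<?php
// Same values as in the script; the guard avoids redefining them if the
// script has already been included.
if (!defined('COMPRESS_STANDARD')) {
    define('COMPRESS_STANDARD', 0);
    define('COMPRESS_STRIP_COMMENTS', 1);
    define('COMPRESS_GZIP', 2);
    define('COMPRESS_ALL', 3);
}

// Hypothetical helper: lists which options a flags value switches on.
function describe_flags($flags)
{
    $enabled = array();
    if ($flags & COMPRESS_STRIP_COMMENTS) {
        $enabled[] = 'strip comments';
    }
    if ($flags & COMPRESS_GZIP) {
        $enabled[] = 'gzip';
    }
    return $enabled ? implode(', ', $enabled) : 'whitespace only';
}

// Options are combined with | and tested with &.
echo describe_flags(COMPRESS_STRIP_COMMENTS | COMPRESS_GZIP), "\n";
echo describe_flags(COMPRESS_STANDARD), "\n";
```

Because each option occupies its own bit, adding a third option later would only need the next power of two (4) as its value.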
There's also a second function which lets you compress a file by specifying the file name as the first parameter instead of the code. The function is called php_compress_file(), and the second parameter may be passed with options just like in php_compress(). Note that it overwrites the file you give it.
I did some testing on SimplePie, if anyone knows it: with stripped comments/whitespace I got the file size from 279 KB down to 193 KB, and with GZIP I got it down to 42 KB.
Of course, the smallest size comes with the lowest speed, because the GZIP'ed file has to inflate the binary data every time it runs; this is around 20 times slower than the normal compression.
My small benchmark also indicated that, on my PC, the compressed version (comments/whitespace stripped) was about 0.0002 to 0.0003 times faster than the version with them.
Note: I tried to implement a compatibility patch to make even PHP 4.0.0 tokenize PHP 5.0.0+ code properly, but it's not fully tested!
Another note: I know the generated GZIP'ed code isn't the best, but it was a better way than using base64 encoding for the binary data.
Anyway, I hope this will be as useful to some of you as it may become to me ;)
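If you would rather keep the original file untouched, a minimal sketch along these lines writes the result to a second file instead. It swaps in PHP's built-in php_strip_whitespace() rather than php_compress(), so it only covers the comment/whitespace part; compress_to_copy() and the temp file names are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helper, not part of the script above: strips comments and
// whitespace from $source and writes the result to $target.
function compress_to_copy($source, $target)
{
    if (!is_readable($source)) {
        return false;
    }
    // Built-in function: returns the file's source with comments and
    // whitespace removed, or an empty string on failure.
    $stripped = php_strip_whitespace($source);
    if ($stripped === '') {
        return false;
    }
    return file_put_contents($target, $stripped) !== false;
}

// Usage: write a small sample file, then compress it into a second file.
$original = "<?php\n// a comment\n\$a   =   1;\necho \$a;\n";
$source   = tempnam(sys_get_temp_dir(), 'src');
$target   = $source . '.min';
file_put_contents($source, $original);

$ok     = compress_to_copy($source, $target);
$result = $ok ? file_get_contents($target) : '';

// Clean up the temporary files.
unlink($source);
if (file_exists($target)) {
    unlink($target);
}

printf("%d bytes -> %d bytes\n", strlen($original), strlen($result));
```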
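For anyone who wants to measure the GZIP trade-off themselves, here is a rough sketch that deflates a string, inflates it again and times only the inflate step. The sample code string is invented, the zlib extension is assumed to be loaded, and the timings will differ from machine to machine.

```php
<?php
// Requires the zlib extension (gzdeflate/gzinflate).
// Repetitive sample "code", made up for this example.
$code = str_repeat("\$total = \$total + compute(\$value);\n", 200);

$deflated = gzdeflate($code, 9); // level 9 = best compression

// Time only the inflate step, which a GZIP'ed script pays on every run.
$start    = microtime(true);
$restored = gzinflate($deflated);
$elapsed  = microtime(true) - $start;

printf("original: %d bytes, deflated: %d bytes\n", strlen($code), strlen($deflated));
printf("inflate took %.6f seconds\n", $elapsed);
```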
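To put a number on the base64 point: base64 turns every 3 bytes into 4 characters, so the encoded data is about a third larger than the raw binary. A minimal sketch, assuming the zlib extension is loaded and using a made-up sample string:

```php
<?php
// Requires the zlib extension for gzdeflate().
$binary  = gzdeflate(str_repeat("echo 'Hello world';\n", 100), 9);
$encoded = base64_encode($binary);

// base64 output length is always 4 * ceil(n / 3) for n input bytes.
$expected = 4 * (int) ceil(strlen($binary) / 3);

printf("raw: %d bytes, base64: %d bytes\n", strlen($binary), strlen($encoded));
```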