Best way to automatically remove comments from PHP code
Question
What's the best way to remove comments from a PHP file?
I want to do something similar to php_strip_whitespace(), but it shouldn't remove the line breaks as well.
EG:
I want this:
<?PHP
// something
if ($whatsit) {
    do_something(); # we do something here
    echo '<html>Some embedded HTML</html>';
}
/* another long
comment
*/
some_more_code();
?>
to become:
<?PHP
if ($whatsit) {
    do_something();
    echo '<html>Some embedded HTML</html>';
}
some_more_code();
?>
(Although if the empty lines remain where comments are removed, that wouldn't be ok).
It may not be possible because of the requirement to preserve embedded HTML. That's what has tripped up the tools that have come up on Google.
Solution
I'd use tokenizer. Here's my solution. It should work on both PHP 4 and 5:
$fileStr = file_get_contents('path/to/file');
$newStr = '';
$commentTokens = array(T_COMMENT);
if (defined('T_DOC_COMMENT'))
    $commentTokens[] = T_DOC_COMMENT; // PHP 5
if (defined('T_ML_COMMENT'))
    $commentTokens[] = T_ML_COMMENT; // PHP 4
$tokens = token_get_all($fileStr);
foreach ($tokens as $token) {
    if (is_array($token)) {
        if (in_array($token[0], $commentTokens))
            continue;
        $token = $token[1];
    }
    $newStr .= $token;
}
echo $newStr;
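The question also asks about the blank lines that remain once comments are gone. The following is a minimal sketch of one way to handle that, assuming PHP 7 or later and "\n" line endings. The function name strip_php_comments() is hypothetical and not part of the answer above. The cleanup step is a plain regex over the output, so it would also remove whitespace-only lines inside heredocs or inline HTML; treat that as an assumption to check against your own files.

```php
<?php
// Hypothetical helper (not from the original answer): removes comments
// from PHP source held in a string and drops lines left empty.
function strip_php_comments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                // PHP 7 includes the trailing newline in // and # comment
                // tokens; put it back so the following line is not joined.
                if (substr($token[1], -1) === "\n") {
                    $out .= "\n";
                }
                continue;
            }
            $token = $token[1];
        }
        $out .= $token;
    }
    // Remove trailing spaces left behind by end-of-line comments.
    $out = preg_replace('/[ \t]+$/m', '', $out);
    // Remove lines that are now empty or whitespace-only.
    return preg_replace('/^[ \t]*\r?\n/m', '', $out);
}
```

It could be called as echo strip_php_comments(file_get_contents('path/to/file')); in place of the loop above.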
OTHER TIPS
How about using php -w to generate a file stripped of comments and whitespace, then using a beautifier like PHP_Beautifier to reformat for readability?
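$fileStr = file_get_contents('file.php');
foreach (token_get_all($fileStr) as $token) {
    // Skip plain-string tokens such as ";" and every non-comment token
    if (!is_array($token) || !in_array($token[0], array(T_COMMENT, T_DOC_COMMENT))) {
        continue;
    }
    // Caveat: str_replace also removes identical text that appears outside a comment
    $fileStr = str_replace($token[1], '', $fileStr);
}
echo $fileStr;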
Edit: I realised Ionut G. Stan has already suggested this, but I will leave the example here.
Here's the function posted above, modified to recursively remove all comments from all PHP files within a directory and all its subdirectories:
function rmcomments($id) {
    if (!file_exists($id)) {
        return;
    }
    if (is_dir($id)) {
        $handle = opendir($id);
        // Compare against false so an entry named "0" does not end the loop
        while (($file = readdir($handle)) !== false) {
            if (($file != ".") && ($file != "..")) {
                rmcomments($id . "/" . $file);
            }
        }
        closedir($handle);
    } else if (is_file($id) && pathinfo($id, PATHINFO_EXTENSION) == "php") {
        // pathinfo() avoids end(explode(...)), which passes a non-variable by reference
        if (!is_writable($id)) { chmod($id, 0777); }
        if (is_writable($id)) {
            $fileStr = file_get_contents($id);
            $newStr = '';
            $commentTokens = array(T_COMMENT);
            if (defined('T_DOC_COMMENT')) { $commentTokens[] = T_DOC_COMMENT; }
            if (defined('T_ML_COMMENT')) { $commentTokens[] = T_ML_COMMENT; }
            $tokens = token_get_all($fileStr);
            foreach ($tokens as $token) {
                if (is_array($token)) {
                    if (in_array($token[0], $commentTokens)) { continue; }
                    $token = $token[1];
                }
                $newStr .= $token;
            }
            if (!file_put_contents($id, $newStr)) {
                $open = fopen($id, "w");
                fwrite($open, $newStr);
                fclose($open);
            }
        }
    }
}
rmcomments("path/to/directory");
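Because a recursive function like this overwrites files in place, it may be worth checking that only comments and whitespace changed before writing. The sketch below is one possible check, assuming PHP 7 or later; code_tokens() and same_code() are hypothetical names, not part of any answer here. It compares the two token streams with comments and whitespace ignored.

```php
<?php
// Hypothetical helper: returns the text of every token except comments
// and whitespace, so two versions of a source file can be compared.
function code_tokens(string $source): array
{
    $skip = [T_COMMENT, T_DOC_COMMENT, T_WHITESPACE];
    $kept = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $kept[] = $token;              // single-character token such as ";"
            continue;
        }
        if (in_array($token[0], $skip, true)) {
            continue;
        }
        // The open tag carries its trailing whitespace; trim it so that
        // "<?php\n" and "<?php " compare as equal.
        $kept[] = $token[0] === T_OPEN_TAG ? rtrim($token[1]) : $token[1];
    }
    return $kept;
}

// True when $after differs from $before only in comments and whitespace.
function same_code(string $before, string $after): bool
{
    return code_tokens($before) === code_tokens($after);
}
```

Inside rmcomments(), a guard such as if (same_code($fileStr, $newStr)) around the write would skip any file whose code tokens changed unexpectedly.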
A more powerful version: remove all comments from every PHP file in the folder.
<?php
$di = new RecursiveDirectoryIterator(__DIR__, RecursiveDirectoryIterator::SKIP_DOTS);
$it = new RecursiveIteratorIterator($di);
$fileArr = [];
foreach ($it as $file) {
    if (pathinfo($file, PATHINFO_EXTENSION) == "php") {
        // Store the path as a string (the list includes this script itself)
        $fileArr[] = $file->getPathname();
    }
}
$arr = [T_COMMENT, T_DOC_COMMENT];
$count = count($fileArr);
// Start at 0 so the first file is not skipped
for ($i = 0; $i < $count; $i++) {
    $fileStr = file_get_contents($fileArr[$i]);
    foreach (token_get_all($fileStr) as $token) {
        // Plain-string tokens such as ";" are never comments
        if (is_array($token) && in_array($token[0], $arr)) {
            // Caveat: str_replace also removes identical text outside comments
            $fileStr = str_replace($token[1], '', $fileStr);
        }
    }
    file_put_contents($fileArr[$i], $fileStr);
}
If you already use an editor like UltraEdit, you can open one or more PHP files and then use a simple Find & Replace (CTRL+R) with the following Perl regexp:
(?s)/\*.*?\*/
Beware that this regexp also removes comment-like text inside a string: in echo "hello/*babe*/"; the /*babe*/ would be removed too. It can therefore be a solution if you only have a few files to process. To be absolutely sure it does not replace something that is not a comment, you would have to run the Find & Replace command and approve each replacement individually.
/*
* T_ML_COMMENT does not exist in PHP 5.
* The following three lines define it in order to
* preserve backwards compatibility.
*
* The next two lines define the PHP 5 only T_DOC_COMMENT,
* which we will mask as T_ML_COMMENT for PHP 4.
*/
if (! defined('T_ML_COMMENT')) {
    define('T_ML_COMMENT', T_COMMENT);
} else {
    define('T_DOC_COMMENT', T_ML_COMMENT);
}
/*
* Remove all comment in $file
*/
function remove_comment($file) {
    $comment_token = array(T_COMMENT, T_ML_COMMENT, T_DOC_COMMENT);
    $input = file_get_contents($file);
    $tokens = token_get_all($input);
    $output = '';
    foreach ($tokens as $token) {
        if (is_string($token)) {
            $output .= $token;
        } else {
            list($id, $text) = $token;
            // Keep every token that is NOT a comment
            if (!in_array($id, $comment_token)) {
                $output .= $text;
            }
        }
    }
    file_put_contents($file, $output);
}
/*
 * Glob recursive
 * @return ['dir/filename', ...]
 */
function glob_recursive($pattern, $flags = 0) {
    $file_list = glob($pattern, $flags);
    $sub_dir = glob(dirname($pattern) . '/*', GLOB_ONLYDIR);
    // If sub directory exist
    if (count($sub_dir) > 0) {
        $file_list = array_merge(
            glob_recursive(dirname($pattern) . '/*/' . basename($pattern), $flags),
            $file_list
        );
    }
    return $file_list;
}
// Remove all comment of '*.php', include sub directory
foreach (glob_recursive('*.php') as $file) {
    remove_comment($file);
}
For Ajax/JSON responses, I use the following PHP code to remove comments from HTML/JavaScript code, which makes the response smaller (a gain of about 15% for my code).
// Replace doubled spaces with single ones (ignored in HTML any way)
$html = preg_replace('@(\s){2,}@', '\1', $html);
// Remove single and multiline comments, tabs and newline chars
$html = preg_replace(
'@(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|((?<!:)//.*)|[\t\r\n]@i',
'',
$html
);
Short and effective, but it can produce unexpected results if your code has sloppy syntax.
Bash solution: to remove comments recursively from all PHP files starting from the current directory, you can run this one-liner in a terminal. (It uses a temp1 file to store the PHP content for processing.)
Note that this strips whitespace along with the comments.
find . -type f -name '*.php' | while IFS= read -r VAR; do php -wq "$VAR" > temp1; cat temp1 > "$VAR"; done
Remove the temp1 file afterwards.
If PHP_Beautifier is installed, you can get nicely formatted code without comments with:
find . -type f -name '*.php' | while IFS= read -r VAR; do php -wq "$VAR" > temp1; php_beautifier temp1 > temp2; cat temp2 > "$VAR"; done
Then remove the two files (temp1, temp2).
Run the command php --strip file.php in a command prompt (e.g. cmd.exe), then browse to http://www.writephponline.com/phpbeautifier. Here, file.php is your own file.
The catch is that a less robust matching algorithm (simple regex, for instance) will start stripping here when it clearly shouldn't:
if (preg_match('#^/*' . $this->index . '#', $this->permalink_structure)) {
It might not affect your code, but eventually someone will get bitten by your script. So you will have to use a utility that understands more of the language than you might otherwise expect.
-Adam
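To make the point concrete, here is a small sketch, assuming PHP 7 or later, that runs a naive block-comment regex and the tokenizer over the same line. The sample line is adapted from the one above; $index, $s and run() are placeholders that are never executed.

```php
<?php
// Sample source: the regex delimiter string contains "/*", and a real
// block comment follows at the end of the line.
$source = <<<'CODE'
<?php
if (preg_match('#^/*' . $index . '#', $s)) { run(); } /* real comment */
CODE;

// Naive approach: delete everything between "/*" and the next "*/".
// It starts stripping inside the string literal and eats real code.
$naive = preg_replace('~/\*.*?\*/~s', '', $source);

// Tokenizer approach: drop only tokens that PHP itself lexes as comments.
$tokenized = '';
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            continue;
        }
        $token = $token[1];
    }
    $tokenized .= $token;
}
```

After this runs, $naive no longer contains the run(); call, while $tokenized keeps the whole if statement and loses only the trailing comment.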
In 2019, it could work like this:
<?php
/* hi there !!!
here are the comments */
//another try
echo removecomments('index.php');
/* hi there !!!
here are the comments */
//another try
function removecomments($f){
    $w=Array(';','{','}');
    $ts = token_get_all(php_strip_whitespace($f));
    $s='';
    foreach($ts as $t){
        if(is_array($t)){
            $s .=$t[1];
        }else{
            $s .=$t;
            if( in_array($t,$w) ) $s.=chr(13).chr(10);
        }
    }
    return $s;
}
?>
To see the result, run it in XAMPP first. You get a blank page, but if you right-click and choose View Source you will see your PHP script: it loads itself and removes all comments and tabs.
I also prefer this solution because I use it to speed up "m.php", the single-file engine of my framework. Source run through php_strip_whitespace alone, without this script, was the slowest in my tests: I ran 10 benchmarks and took the arithmetic mean. (I think PHP 7 either restores the missing CR/LF while parsing or takes longer when they are missing.)
Use php -w or php_strip_whitespace($filename);
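A minimal sketch of the php_strip_whitespace() route, assuming PHP 7 or later. The function takes a file name rather than a string, so this example writes a throwaway file first; the sample source and the 'strip' prefix are made up for illustration. Note that it collapses whitespace as well, so it does not keep the line breaks the question asks for.

```php
<?php
// Write a small sample file, since php_strip_whitespace() reads from disk.
$path = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents($path, "<?php\n// note\nfoo(); /* x */\n\nbar('// keep');\n");

// Returns the source with comments and extra whitespace removed.
// The file is only lexed, never executed, so foo() and bar() need not exist.
$stripped = php_strip_whitespace($path);

unlink($path);   // clean up the temporary file
echo $stripped;
```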