Regex to strip comments, multi-line comments, and empty lines

https://stackoverflow.com/questions/643113

22-07-2019

Question

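I want to parse a file, and I'd like to use PHP and regex to strip:

blank or empty lines
single-line comments
multi-line comments

Basically, I want to remove any line containing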

/* text */

or multi-line comments

/***
some
text
*****/

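If possible, I'd also like another regex that checks whether a line is empty (to remove blank lines).

Is this possible? Could someone post a regex that does just that?

Thanks a lot.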

Solution

$text = preg_replace('!/\*.*?\*/!s', '', $text);
$text = preg_replace('/\n\s*\n/', "\n", $text);
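
As a rough illustration of how the two replacements above behave together, here is a minimal sketch. The function name strip_block_comments_and_blank_lines and the sample input are made up for this example; only the two patterns come from the answer.

```php
<?php
// Hypothetical helper wrapping the two replacements from the answer above.
function strip_block_comments_and_blank_lines(string $text): string
{
    // Remove /* ... */ comments; the "s" flag lets "." match newlines,
    // and the lazy ".*?" stops at the first closing "*/".
    $text = preg_replace('!/\*.*?\*/!s', '', $text);

    // Collapse runs of blank (whitespace-only) lines into a single newline.
    return preg_replace('/\n\s*\n/', "\n", $text);
}

// Made-up sample containing both comment styles from the question.
$sample = "a = 1;\n/***\nsome\ntext\n*****/\n\nb = 2; /* text */\n";
echo strip_block_comments_and_blank_lines($sample);
```

Note that an inline comment is removed without touching the code before it, so a trailing space can be left behind on that line.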

Other tips

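Keep in mind that any regex you use will fail if the file you're parsing has a string containing something that matches these conditions. For example, it would turn this: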

print "/* a comment */";

into this:

print "";

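That is probably not what you want. But maybe it is, I don't know. Either way, regexes technically can't parse data in a way that avoids this problem. I say technically because modern PCRE regexes have had a number of hacks bolted on that make them capable of this and, more importantly, no longer regular expressions, but whatever. If you want to avoid stripping these things inside quotes or in other situations, there is no substitute for a full-blown parser (although it can still be pretty simple).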
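
One way to approximate the "pretty simple" parser idea is to let a single pattern match either a quoted string or a block comment, and then keep the strings while dropping the comments. The sketch below is an assumption of how that could look, not the answerer's code; the function name is hypothetical, and it does not handle heredocs, `//` comments, or other PHP syntax.

```php
<?php
// Hypothetical helper: removes /* ... */ comments but leaves quoted strings alone.
function strip_block_comments_outside_strings(string $text): string
{
    // Alternatives, tried left to right at each position:
    //   1. a double-quoted string (with backslash escapes)
    //   2. a single-quoted string (with backslash escapes)
    //   3. a /* ... */ block comment
    $pattern = '~"(?:\\\\.|[^"\\\\])*"|\'(?:\\\\.|[^\'\\\\])*\'|/\*.*?\*/~s';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // A match starting with "/" is a comment: drop it.
            // Anything else is a string literal: keep it unchanged.
            return ($m[0][0] === '/') ? '' : $m[0];
        },
        $text
    );
}

echo strip_block_comments_outside_strings('print "/* a comment */"; /* real */');
```

Because the string alternatives are consumed first, a comment-like sequence inside quotes is never seen by the comment alternative.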

//  Removes multi-line comments without leaving a blank line behind;
//  also consumes surrounding spaces/tabs. The "m" flag lets ^ match at every line start.
$text = preg_replace('!^[ \t]*/\*.*?\*/[ \t]*[\r\n]!sm', '', $text);

//  Removes single line '//' comments, treats blank characters
$text = preg_replace('![ \t]*//.*[ \t]*[\r\n]!', '', $text);

//  Strip blank lines
$text = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $text);

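It's possible, but I wouldn't do it. You would need to parse the whole PHP file to make sure you don't remove any necessary whitespace (strings, whitespace between keywords/identifiers (publicfunctiondoStuff()), etc.). Better to use PHP's tokenizer extension.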
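
A minimal sketch of the tokenizer approach, assuming the tokenizer extension is enabled: token_get_all() splits the source into tokens, so comment tokens can be skipped while string literals pass through untouched. The function name strip_php_comments and the sample source are hypothetical.

```php
<?php
// Hypothetical helper built on PHP's tokenizer extension.
function strip_php_comments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ";" or "{" arrive as plain strings.
        if (is_string($token)) {
            $out .= $token;
            continue;
        }
        // Other tokens are arrays: [token id, text, line number].
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue; // skip //, # and /* */ comments, and /** */ docblocks
        }
        $out .= $text;
    }
    return $out;
}

// Made-up sample: the comment-like text inside the string must survive.
$source = <<<'PHP'
<?php
/* a block comment */
echo "/* not a comment */"; // trailing comment
PHP;

echo strip_php_comments($source);
```

Whitespace tokens are copied through as-is, so blank lines left where comments used to be would still need a separate cleanup pass.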

This should work for replacing everything from /* to */.

$string = preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $string);

$string = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);
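
To see how the two patterns above differ, here is a small comparison sketch on a made-up input. The first pattern excludes every "/" inside the comment body and requires whitespace on both sides, so a comment containing a slash is left alone; the second only excludes a "/" that directly follows a run of asterisks.

```php
<?php
// Made-up input: a comment whose body contains a slash.
$input = "x = 1; /* see a/b */ y = 2;";

// First pattern: [^\/]* cannot cross the "/" in "a/b", so nothing matches.
$first = preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $input);

// Second pattern: matches "/*", then text and asterisks up to the closing "*/".
$second = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $input);

var_dump($first === $input); // the first pattern leaves the input unchanged
echo $second, "\n";
```

The second pattern needs no "s" flag because it uses negated character classes, which match newlines, rather than ".".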

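This is my solution, for anyone who isn't used to regexp. The following code removes all comments delimited by # and retrieves the variable values written in the style NAME=VALUE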

  $reg = array();
  $handle = @fopen("/etc/chilli/config", "r");
  if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        // Drop everything from the first '#' to the end of the line
        $start = strpos($buffer, "#");
        $res = ($start !== false) ? substr($buffer, 0, $start) : $buffer;

        // Split NAME=VALUE once; a name without '=' gets an empty value
        $a = explode("=", $res, 2);
        $name = trim($a[0]);
        if ($name !== "") {
            // trim() also removes the trailing newline left by fgets()
            $reg[$name] = isset($a[1]) ? trim($a[1]) : "";
        }
    }

    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

This is a nice function, and it works!

<?
if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}
function strip_comments($source) {
    $tokens = token_get_all($source);
    $ret = "";
    foreach ($tokens as $token) {
       if (is_string($token)) {
          $ret.= $token;
       } else {
          list($id, $text) = $token;

          switch ($id) { 
             case T_COMMENT: 
             case T_ML_COMMENT: // we've defined this
             case T_DOC_COMMENT: // and this
                break;

             default:
                $ret.= $text;
                break;
          }
       }
    }    
    return trim(str_replace(array('<?','?>'),array('',''),$ret));
}
?>

Now use the 'strip_comments' function by passing it code held in a variable:

<?
// Single quotes are used so the double quotes inside the sample stay intact
$code = '
<?php 
    /* this is comment */
   // this is also a comment
   # me too, am also comment
   echo "And I am some code...";
?>';

$code = strip_comments($code);

echo htmlspecialchars($code);
?>

This gives the following output:

<?
echo "And I am some code...";
?>

Loading from a PHP file:

<?
$code = file_get_contents("some_code_file.php");
$code = strip_comments($code);

echo htmlspecialchars($code);
?>

Load a PHP file, strip the comments, and save it back:

<?
$file = "some_code_file.php";
$code = file_get_contents($file);
$code = strip_comments($code);

$f = fopen($file,"w");
fwrite($f,$code);
fclose($f);
?>

Source: http://www.php.net/manual/en/tokenizer.examples.php

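I found one that suits me better: (\s+)\/\*([^\/]*)\*/\n* It removes multi-line comments, whether tab-indented or not, along with the surrounding spacing. I'll leave an example of a comment that this regex would match.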

/**
 * The AdditionalCategory
 * Meta informations extracted from the WSDL
 * - minOccurs : 0
 * - nillable : true
 * @var TestStructAdditionalCategorizationExternalIntegrationCUDListDataContract
 */

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow