Question

I'm trying to parse PHPDoc tags with preg_match, but I'm having trouble with a negative lookbehind. I've never used one before, but my understanding is that they act as exclusions.

Here is my pattern:

/\*\*.+?(?<! \*/)@access public.+? \*/\s+?function\s+[a-zA-Z0-9_]+\(

Here is my sample PHP file I'm trying to parse:

<?php

/**
 * This is the shortcut to DIRECTORY_SEPARATOR
 */
defined('DS') or define('DS',DIRECTORY_SEPARATOR);

/**
 * Foo
 * 
 * @return bool
 * @access public
 */
function foo()
{
    return true;
}

I want to match any function with an @access public tag, but in this case the match starts at the DS constant's comment. I expected (?<! \*/) to stop the match from running past the closing comment tag of the DS comment.

What am I missing?


Solution

Following the link by @bishop, I found an example using negative lookahead that works for me.

I changed

.+?(?<! \*/)

to

(?:(?! \*/).)+?

So the full pattern is now:

/\*\*(?:(?! \*/).)+?@access public.+? \*/\s+?function\s+[a-zA-Z0-9_]+\(
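
The difference between the two patterns can be checked directly. The lookbehind in the original pattern is tested only once, at the position just before @access, so .+? is free to run across the end of the DS comment; the lookahead version tests every character it consumes. The sketch below is only an illustration: the ~ delimiters and the s modifier (so that . matches newlines) are assumptions, since the post does not show which delimiters or modifiers were used.

```php
<?php
// Sample source from the question, embedded as a nowdoc string.
$source = <<<'SAMPLE'
<?php

/**
 * This is the shortcut to DIRECTORY_SEPARATOR
 */
defined('DS') or define('DS',DIRECTORY_SEPARATOR);

/**
 * Foo
 *
 * @return bool
 * @access public
 */
function foo()
{
    return true;
}
SAMPLE;

// Assumption: "~" delimiters and the "s" modifier are not shown in the post.
$lookbehind = '~/\*\*.+?(?<! \*/)@access public.+? \*/\s+?function\s+[a-zA-Z0-9_]+\(~s';
$lookahead  = '~/\*\*(?:(?! \*/).)+?@access public.+? \*/\s+?function\s+[a-zA-Z0-9_]+\(~s';

preg_match($lookbehind, $source, $a, PREG_OFFSET_CAPTURE);
preg_match($lookahead, $source, $b, PREG_OFFSET_CAPTURE);

// Byte offsets at which each match begins.
$lookbehindStart = $a[0][1];
$lookaheadStart  = $b[0][1];

// The lookbehind version starts at the first docblock (the DS comment) ...
var_dump($lookbehindStart === strpos($source, '/**'));
// ... while the lookahead version starts at the docblock of foo().
var_dump($lookaheadStart === strrpos($source, '/**'));
```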

EDIT:

Full pattern that also matches function types and parameters:

(?<full>[\t ]*?/\*\*(?:(?! \*/).)+?@access public(?:(?! \*/).)+? \*/\s+?(?:public |protected |private )??(?:static )??function\s+[a-zA-Z0-9_]+\(.*?\))
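
As a rough illustration of how the function pattern might be applied, the sketch below runs it with preg_match_all over a made-up class. The Demo class, the ~ delimiters, and the s modifier are all assumptions for the example and are not part of the original post.

```php
<?php
// Hypothetical sample class, not from the original post.
$source = <<<'SAMPLE'
<?php
class Demo
{
    /**
     * Adds two numbers.
     *
     * @access public
     * @return int
     */
    public static function add($a, $b)
    {
        return $a + $b;
    }

    /**
     * Internal helper.
     *
     * @access private
     */
    private function helper()
    {
    }
}
SAMPLE;

// The pattern from the post, split over two lines for readability.
// Assumption: "~" delimiters and the "s" modifier.
$pattern = '~(?<full>[\t ]*?/\*\*(?:(?! \*/).)+?@access public(?:(?! \*/).)+? \*/\s+?'
         . '(?:public |protected |private )??(?:static )??function\s+[a-zA-Z0-9_]+\(.*?\))~s';

$count = preg_match_all($pattern, $source, $matches);

// Keep only the last line of each match, which is the declaration itself.
$signatures = [];
foreach ($matches['full'] as $block) {
    $lines = explode("\n", $block);
    $signatures[] = trim(end($lines));
}

print_r($signatures);
```

Only the docblock tagged @access public should produce a match here; the one tagged @access private is skipped because the tempered lookahead reaches the closing */ before finding the tag.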

And class matching:

(?<full>(?<indent>[\t ]*?)/\*\*(?:(?! \*/).)+?@access public.+? \*/\s+?(?:abstract )??class\s+[a-zA-Z0-9_]+\s??.*?{)
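
The indent group in the class pattern captures the whitespace in front of the docblock, which could be useful when re-emitting code at the same nesting level. A minimal sketch, assuming ~ delimiters, the s modifier, and a hypothetical Shape class that does not appear in the original post:

```php
<?php
// Hypothetical sample, not from the original post.
$source = <<<'SAMPLE'
<?php
if (!class_exists('Shape')) {
    /**
     * Base shape.
     *
     * @access public
     */
    abstract class Shape
    {
    }
}
SAMPLE;

// The class pattern from the post. Assumption: "~" delimiters, "s" modifier.
$pattern = '~(?<full>(?<indent>[\t ]*?)/\*\*(?:(?! \*/).)+?@access public.+? \*/\s+?'
         . '(?:abstract )??class\s+[a-zA-Z0-9_]+\s??.*?{)~s';

$found = preg_match($pattern, $source, $m);

// "indent" holds the leading whitespace, "full" runs up to the opening brace.
$indentWidth = strlen($m['indent']);
$lastChar    = substr($m['full'], -1);

var_dump($found, $indentWidth, $lastChar);
```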

OTHER TIPS

In PCRE, a negative lookbehind must be of fixed length. You would probably be better served by a dedicated DocBlock parser; several are available.
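
The fixed-length rule can be observed with a small experiment. In the sketch below (illustrative patterns only, not taken from the question apart from the lookbehind itself), a lookbehind containing an unbounded quantifier fails to compile, whereas the three-character lookbehind from the question compiles and runs normally.

```php
<?php
// "@" suppresses the compilation warning so only the return value matters.
// a+ has no fixed length, so PCRE rejects it inside a lookbehind.
$unbounded = @preg_match('~(?<!a+)b~', 'b');

// The lookbehind from the question is exactly three characters long.
$fixed = preg_match('~(?<! \*/)@access~', ' * @access public');

var_dump($unbounded); // false: the pattern does not compile
var_dump($fixed);     // int(1): the pattern compiles and matches
```

Since the question's lookbehind is already of fixed length, it compiles without complaint; it simply constrains only the single position at which it is evaluated.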

With the token_get_all() function:

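// $code is assumed to hold the PHP source to inspect, as a string.
$tokens = token_get_all($code);
$result = array();
$isPublic = false;   // the last docblock seen contains "@access public"
$isFunction = false; // the previous significant token was "function"

foreach ($tokens as $token) {
    // Array tokens: $token[0] is the token ID, $token[1] its text.
    // Single-character tokens are plain strings and end up in the default case.
    switch ($token[0]):
        case T_DOC_COMMENT:
            $isPublic = strpos($token[1], '@access public') !== false;
            break;

        case T_FUNCTION:
            $isFunction = true;
            break;

        case T_WHITESPACE:
            break;

        case T_STRING:
            if ($isFunction && $isPublic) $result[] = $token[1];
            // intentional fall-through to reset $isFunction

        default:
            $isFunction = false;
    endswitch;
}

print_r($result);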

To get an idea of what you can extract with the tokenizer, put the following code inside the foreach loop, just after the endswitch; line:

if ($isPublic && isset($token[1]))
    printf("%s\t%s\t%s\n", $token[0],
                           token_name($token[0]),
                           strtr($token[1], "\n", ' ')
                           ); 
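
One thing to watch in the tokenizer loop above is that $isPublic stays set until the next docblock, so an undocumented function that follows a public one would also be reported. The sketch below is one possible variant, not the answerer's code: publicFunctionNames() is a hypothetical helper that clears the flag as soon as a function name has been read.

```php
<?php
// Hypothetical helper (not from the original answer): returns the names of
// functions declared after a docblock that contains "@access public".
function publicFunctionNames(string $code): array
{
    $names = [];
    $isPublic = false;   // most recent docblock was tagged "@access public"
    $isFunction = false; // a "function" keyword was just seen

    foreach (token_get_all($code) as $token) {
        // Single-character tokens such as "(" or ";" are plain strings.
        $id = is_array($token) ? $token[0] : null;

        if ($id === T_DOC_COMMENT) {
            $isPublic = strpos($token[1], '@access public') !== false;
        } elseif ($id === T_FUNCTION) {
            $isFunction = true;
        } elseif ($id === T_WHITESPACE) {
            continue; // whitespace never changes the state
        } elseif ($id === T_STRING && $isFunction) {
            if ($isPublic) {
                $names[] = $token[1];
            }
            // The docblock is consumed by this declaration.
            $isPublic = false;
            $isFunction = false;
        } else {
            $isFunction = false;
        }
    }

    return $names;
}

// Made-up input: only foo() has a docblock tagged "@access public".
$code = "<?php\n"
      . "/**\n * @access public\n */\n"
      . "function foo() {}\n\n"
      . "function bar() {}\n\n"
      . "/**\n * @access private\n */\n"
      . "function baz() {}\n";

$names = publicFunctionNames($code);
print_r($names);
```

With this input, bar() is not reported even though it follows a public docblock, because the flag is cleared after foo is recorded.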
Licensed under: CC-BY-SA with attribution