Question

I want to split an arithmetic expression into tokens, to convert it into RPN.

Java has the StringTokenizer, which can optionally keep the delimiters. That way, I could use the operators as delimiters. Unfortunately, I need to do this in PHP, which has strtok, but that throws away the delimiters, so I need to brew something myself.

This sounds like a classic textbook example for Compiler Design 101, but I'm afraid I'm lacking some formal education here. Is there a standard algorithm you can point me to?

My other options are to read up on Lexical Analysis or to roll up something quick and dirty with the available string functions.

Was it helpful?

Solution

As often, I would just use a regular expression to do this:

$expr = '(5*(7 + 2 * -9.3) - 8 )/ 11';
$tokens = preg_split('/([*\/^+-]+)\s*|([\d.]+)\s*/', $expr, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$tts = print_r($tokens, true);
echo "<pre>x=$tts</pre>";

It needs a little more work to accept numbers with exponent (like -9.2e-8).

OTHER TIPS

This might help.

Practical Uses of Tokenizer

OK, thanks to PhiLho, my final code is this, should anyone need it. It's not even really dirty. :-)

static function rgTokenize($s)
{
    $rg = array();

    // remove whitespace
    $s = preg_replace("/\s+/", '', $s);

    // split at numbers, identifiers, function names and operators
    $rg = preg_split('/([*\/^+\(\)-])|(#\d+)|([\d.]+)|(\w+)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // find right-associative '-' and put it as a sign onto the following number
    for ($ix = 0, $ixMax = count($rg); $ix < $ixMax; $ix++) {
        if ('-' == $rg[$ix]) {
            if (isset($rg[$ix - 1]) && self::fIsOperand($rg[$ix - 1])) {
                continue;
            } else if (isset($rg[$ix + 1]) && self::fIsOperand($rg[$ix + 1])) {
                $rg[$ix + 1] = $rg[$ix].$rg[$ix + 1];
                unset($rg[$ix]);
            } else {
                throw new Exception("Syntax error: Found right-associative '-' without operand");
            }
        }
    }
    $rg = array_values($rg);

    echo join(" ", $rg)."\n";

    return $rg;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top