Question

Here is a starting point for a grammar:

%%{
  machine xo;

  char = "x" | "o";
  group = "(" char* ")";
  main := group;
}%%

It handles (xxxx(oo)()xx), for example. How do I extend it to allow nested groups; e.g. (xxxx(o(x)o)()xx?

I know recursion is not, generally, supported by one Ragel machine. So this won't work:

group = "(" ( char | group )* ")";

From the Ragel State Machine Compiler User Guide (PDF): (bold text added for emphasis):

"In general Ragel cannot handle recursive structures because the grammar is interpreted as a regular language. However, depending on what needs to be parsed it is sometimes practical to implement the recursive parts using manual coding techniques. This often works in cases where the recursive structures are simple and easy to recognize, such as in the balancing of parentheses."

"One approach to parsing recursive structures is to use actions that increment and decrement counters or otherwise recognize the entry to and exit from recursive structures and then jump to the appropriate machine defnition using fcall and fret. Alternatively, semantic conditions can be used to test counter variables.

"A more traditional approach is to call a separate parsing function (expressed in the host lan- guage) when a recursive structure is entered, then later return when the end is recognized."

From a mailing list discussion on nested brackets, the same three approaches are mentioned:

  1. Specify a growable stack using prepush and postpop and then use fcall and fret.

  2. Counting and then verify in actions or conditions.

  3. Call a new parsing function (in the host language) when the recursive structure is entered.

Can you point me to an example of each -- preferably using my example above -- in Ruby? Thanks!

Was it helpful?

Solution

A general pattern using fcall/fret is like:

balanced = [^(){}\[\]] |
               '(' @{ fcall balancedTokensParen; } |
               '[' @{ fcall balancedTokensBracket; } |
               '{' @{ fcall balancedTokensBrace; };
balancedTokensParen   := balanced* ')' @{ fret; };
balancedTokensBracket := balanced* ']' @{ fret; };
balancedTokensBrace   := balanced* '}' @{ fret; };

So your case could be handled as

  char = [xo];
  group = '(' @{ fcall group_rest; };
  group_rest := (char|group)* ')' @{ fret; };

  main := group;

the lexer function should contain the stack array, and you have to manually to check the top to ensure there isn't unclosed '(':

stack = []
%% write init;
%% write exec;
if top > 0 
    cs = %%{ write error; }%%
end

OTHER TIPS

I also have been searching for days on that Ragel problem!

Ragel is not well documented for such needs as passing recursive [nested parentheses].

The only example code I found after 5 days of Google searching was this:

https://bitbucket.org/mitsuhiko/arana-main/src/289ad1a6f083/arana/lexnparse.rl

Looking at the Ragel stack needs and the Ragel code over-head of the fgoto() [or fcall()], fret() and other code management needed to do such a thing, I am (like so many others) thinking Ragel is not the easy tool for such needs. Otherwise there would be more than one (1) working example available.

Roughly, if you're trying to match parens, the solution is going to involve something like:

open_paren = '(' @{ @paren_count += 1}
close_paren = (')' when @paren_count > 0) @{ @parent_count -= 1}

Check out the section at the end of the User Guide on Semantic Conditions.

As an aside: Ragel is a really powerful tool that you have to understand the underpinnings of to make real use of. The first step to using Ragel is reading the User Guide and understanding it - while there are still parts you're unsure of, Ragel will be very frustrating to use.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top