Roman numerals in treetop grammar

https://stackoverflow.com/questions/17560522

ruby
treetop

02-06-2022

Question

I want to parse an ordered list, which is something like:

I - Something
II - Something else...
IX - Something weird
XIII - etc

So far, my treetop grammar is:

rule text
    roman_numeral separator text newline
end

rule roman_numeral
    &. ('MMM' / 'MM' / 'M')? (('C' [DM]) / 
    ('D'? ('CCC' / 'CC' / 'C')?))? (('X' [LC]) / 
    ('L'? ('XXX' / 'XX' / 'X')?))? (('I' [VX]) / 
    ('V'? ('III' / 'II' / 'I')?))?
end

rule separator
    [\s] "-" [\s]
end

rule text
    (!"\n" .)*
end

rule newline
    ["\n"]
end

However, the corresponding parser is unable to parse the text. What is broken?

Solution

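Your grammar defines the rule text twice: once for a whole line (numeral, separator, text, newline) and once for the free text after the separator. Rename the first one to line, and then add a lines rule that matches any number of them.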

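The quotes inside the newline character class are also unnecessary. Within square brackets they are ordinary members of the class, so ["\n"] accepts a double quote as well as a newline.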
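The effect of the stray quotes can be checked with plain Ruby regular expressions. This is only a sketch: it assumes Treetop character classes follow the same bracket semantics as Ruby's Regexp, and the constant names are made up for illustration.

```ruby
# Regexp stand-ins for the two Treetop character classes (names are illustrative).
# QUOTED_CLASS has the same members as ["\n"]; the repeated quote is left out
# because Ruby may warn about duplicate members in a character class.
QUOTED_CLASS = /\A["\n]\z/
PLAIN_CLASS  = /\A[\n]\z/   # stand-in for [\n]: a newline and nothing else

samples = { "newline" => "\n", "double quote" => '"' }

samples.each do |name, char|
  quoted = QUOTED_CLASS.match?(char)
  plain  = PLAIN_CLASS.match?(char)
  puts "#{name}: quoted class=#{quoted}, plain class=#{plain}"
end
```

Only the plain class rejects the double quote, which is the behaviour a newline rule should have.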

As a side tip, you can reuse the newline rule inside your text rule to keep the grammar DRY.

grammar Roman

  rule lines
    line*
  end

  rule line
    roman_numeral separator text newline
  end

  rule roman_numeral
    &. ('MMM' / 'MM' / 'M')? (('C' [DM]) /
    ('D'? ('CCC' / 'CC' / 'C')?))? (('X' [LC]) /
    ('L'? ('XXX' / 'XX' / 'X')?))? (('I' [VX]) /
    ('V'? ('III' / 'II' / 'I')?))?
  end

  rule separator
    [\s] "-" [\s]
  end

  rule text
    (!newline .)*
  end

  rule newline
    [\n]
  end

end

Update

You can simplify the grammar a little by removing the negative lookahead and the single-character classes.

rule separator
  " - "
end

rule text
  [^\n]*
end

The resulting syntax graph becomes much simpler.
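One thing to keep in mind with the simplified separator is that it is slightly stricter than the original. The Regexp comparison below is a sketch, assuming [\s] in Treetop behaves like \s in a Ruby regular expression. The constant names are illustrative.

```ruby
# Regexp stand-ins for the two separator rules (names are illustrative).
LOOSE_SEPARATOR  = /\A\s-\s\z/ # like [\s] "-" [\s]: any whitespace around the dash
STRICT_SEPARATOR = /\A - \z/   # like " - ": exactly one space on each side

[" - ", "\t-\t"].each do |candidate|
  loose  = LOOSE_SEPARATOR.match?(candidate)
  strict = STRICT_SEPARATOR.match?(candidate)
  puts "#{candidate.inspect}: loose=#{loose}, strict=#{strict}"
end
```

If the list may be indented or separated with tabs, the character-class version of the rule is the safer choice.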

Licensed under: CC-BY-SA with attribution
