Lrparsing author here. As Serge said, it was a bug, and it is fixed in 1.0.8. It only got fixed because Serge reported it on SourceForge's bug tracker; otherwise I would not have known. Thank you, Serge.
The comments suggesting it might be a bug in Repeat() hint at not understanding what lrparsing does. Lrparsing is a rather complex beast. It lets you enter a grammar in a way I hope is natural for a Python programmer. It then compiles that grammar into something an LR(1) parser generator can understand, namely a series of productions. Next it generates an LR(1) parsing table from those productions. Finally it feeds your input and the parsing table to an LR(1) parser to produce the parse tree. For what it's worth, the bug was in the part that generates the parsing table.
Debugging such a series of transformations would be near impossible for me if I could not see what each step produces. Accordingly, lrparsing has a repr_xxxx() function for each step that displays its output. The first transformation is parsing your grammar. The result is displayed by repr_grammar():
<G> = START + __end_of_input__
START = expr
block = '{' + expr * () + '}'
expr = block | /[0-9]/
This looks very similar to the original grammar presented in the question. The next step is to compile those rules into productions, which are what an LR(1) parser generator understands. These are printed by repr_productions():
<G> = START __end_of_input__
START = expr
block = '{' '}'
block = '{' block.Sequence.Repeat '}'
block.Sequence.Repeat = expr
block.Sequence.Repeat = block.Sequence.Repeat expr
expr = block
expr = /[0-9]/
The block.Sequence.Repeat symbol is a new nonterminal lrparsing introduced in order to handle the Repeat(). Those productions look like a faithful representation of the original grammar to me.
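To make the compilation step concrete, here is a minimal sketch of how a zero-or-more repetition inside a sequence can be lowered into plain productions. It is not lrparsing's code: the function lower_repeat_in_sequence and the tuple representation of a production are hypothetical, chosen only so the output matches the block productions listed above.

```python
# Hypothetical sketch: NOT lrparsing's implementation, only an illustration of
# lowering "lhs = before + item repeated zero or more times + after".
from typing import List, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side symbols)


def lower_repeat_in_sequence(
    lhs: str, before: List[str], item: str, after: List[str]
) -> List[Production]:
    # Helper nonterminal name chosen to mimic the repr_productions() output.
    helper = f"{lhs}.Sequence.Repeat"
    return [
        # Zero repetitions: the helper is simply left out of the sequence.
        (lhs, tuple(before + after)),
        # One or more repetitions: the sequence refers to the helper.
        (lhs, tuple(before + [helper] + after)),
        # helper = item | helper item  (left recursive)
        (helper, (item,)),
        (helper, (helper, item)),
    ]


for lhs, rhs in lower_repeat_in_sequence("block", ["'{'"], "expr", ["'}'"]):
    print(lhs, "=", " ".join(rhs))
```

This prints the same four block productions that repr_productions() showed. Left recursion is the usual choice for the helper because an LR parser can then reduce after each item, so the parse stack does not grow with the number of repetitions.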
Lrparsing goes out of its way to hide the nonterminals it introduces, like block.Sequence.Repeat. For example, they won't appear in the output parse tree. That means there is no need for an lrparsing user to care about them, except in 2 cases: error recovery, and trying to understand the log output of the parse engine. The former is a complex technique most won't attempt. But some here looked at the latter in order to try to understand what lrparsing was doing. The log won't make much sense unless you can see the productions the LR(1) parser is trying to recognise. But if you had seen them, you would have known there wasn't a bug in Repeat().
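One way to picture the hiding is as a pass over the parse tree that replaces every node built for a generated nonterminal with its children. This is a hypothetical sketch, not how lrparsing does it internally; the tuple tree shape and the rule that generated names contain a "." are assumptions made for the illustration.

```python
# Hypothetical sketch of hiding generated nonterminals in a parse tree.
# Assumptions: a node is a tuple (name, child, ...), a leaf token is a str,
# and generated nonterminal names contain a "." (e.g. "block.Sequence.Repeat").
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple]


def is_generated(name: str) -> bool:
    return "." in name


def splice(children: Sequence[Tree]) -> List[Tree]:
    out: List[Tree] = []
    for child in children:
        if isinstance(child, tuple) and is_generated(child[0]):
            # Replace the helper node by its (recursively spliced) children.
            out.extend(splice(child[1:]))
        else:
            out.append(hide_generated(child))
    return out


def hide_generated(tree: Tree) -> Tree:
    if isinstance(tree, str):
        return tree
    name, *children = tree
    return (name, *splice(children))


# Raw block subtree for the input "{12}", as the productions above would build it.
raw = (
    "block", "{",
    ("block.Sequence.Repeat",
        ("block.Sequence.Repeat", ("expr", "1")),
        ("expr", "2")),
    "}",
)
print(hide_generated(raw))
```

The nested left-recursive helper nodes disappear and both expr nodes become direct children of block, which is the shape a user would expect from the original grammar.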
You can also dump the generated LR(1) parse table. If you really want to understand how an LR(1) parser works, that is what you should be trying to grok. Unless you happen to find parsing a deeply interesting topic, I don't recommend it.
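For a feel of what such a table is for, here is a minimal sketch that builds an LR(0) automaton (item sets plus goto transitions) for the productions listed earlier and drives a shift-reduce parser with it. It deliberately simplifies: it uses LR(0) rather than the LR(1) construction lrparsing uses, which works here only because every state of this grammar that contains a finished item contains nothing else, so the parser can reduce without looking at the next token. R, digit and $ abbreviate block.Sequence.Repeat, /[0-9]/ and __end_of_input__, the function names are hypothetical, and none of this is lrparsing's code or table format.

```python
# Minimal LR(0) sketch, NOT lrparsing's LR(1) generator. Symbol names are
# abbreviations: R = block.Sequence.Repeat, digit = /[0-9]/, $ = __end_of_input__.
from typing import Dict, FrozenSet, Iterable, List, Tuple

Item = Tuple[int, int]  # (production index, position of the dot)

PRODUCTIONS: List[Tuple[str, Tuple[str, ...]]] = [
    ("<G>", ("START", "$")),
    ("START", ("expr",)),
    ("block", ("{", "}")),
    ("block", ("{", "R", "}")),
    ("R", ("expr",)),
    ("R", ("R", "expr")),
    ("expr", ("block",)),
    ("expr", ("digit",)),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def closure(items: Iterable[Item]) -> FrozenSet[Item]:
    # Add "X = . rhs" for every nonterminal X that appears right after a dot.
    result = set(items)
    todo = list(result)
    while todo:
        p, dot = todo.pop()
        rhs = PRODUCTIONS[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(PRODUCTIONS):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    todo.append((q, 0))
    return frozenset(result)


def goto(state: FrozenSet[Item], symbol: str) -> FrozenSet[Item]:
    # Move the dot over `symbol` wherever it is the next symbol.
    moved = [(p, d + 1) for p, d in state
             if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol]
    return closure(moved)


def build_table() -> Tuple[List[FrozenSet[Item]], Dict[Tuple[int, str], int]]:
    states = [closure([(0, 0)])]
    index = {states[0]: 0}
    transitions: Dict[Tuple[int, str], int] = {}
    i = 0
    while i < len(states):  # states grows while we walk it
        symbols = {PRODUCTIONS[p][1][d] for p, d in states[i]
                   if d < len(PRODUCTIONS[p][1])}
        for sym in sorted(symbols):
            target = goto(states[i], sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, sym)] = index[target]
        i += 1
    return states, transitions


STATES, TRANSITIONS = build_table()


def tokenize(text: str) -> List[str]:
    # Each digit is one "digit" token; braces are their own tokens.
    return ["digit" if ch.isdigit() else ch for ch in text if not ch.isspace()]


def parse(tokens: List[str]) -> bool:
    stack = [0]  # stack of state numbers
    tokens = tokens + ["$"]
    pos = 0
    while True:
        state = STATES[stack[-1]]
        complete = [p for p, d in state if d == len(PRODUCTIONS[p][1])]
        if complete:
            # This grammar is LR(0): a state with a finished item only reduces.
            p = complete[0]
            if p == 0:
                return True  # <G> recognised: accept
            lhs, rhs = PRODUCTIONS[p]
            del stack[len(stack) - len(rhs):]
            stack.append(TRANSITIONS[(stack[-1], lhs)])
        elif (stack[-1], tokens[pos]) in TRANSITIONS:
            stack.append(TRANSITIONS[(stack[-1], tokens[pos])])  # shift
            pos += 1
        else:
            return False  # syntax error


print(parse(tokenize("{1{2}}")), parse(tokenize("{1")))
```

It prints True False: {1{2}} is accepted, while {1 is rejected because the end of input arrives where a } or another expr was expected. A grammar needing lookahead to choose between a reduce and a shift would require LR(1) items, which is what lrparsing actually builds.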