Why are code readability and debugging often raised as arguments against using generated LR parsers?

https://softwareengineering.stackexchange.com/questions/414390

14-03-2021

Question

When it comes to using an LR parser generated by a tool such as Bison, a disadvantage that often comes up as a counterargument is that the resulting parser will be unreadable and complicated to debug, which is true.

However, I don't really understand this argument, since we only call one or two functions from the generated output, and that generated code is not supposed to contain any bugs (a priori).

This sounds to me like saying that we shouldn't use a compiler because the generated assembly is not readable and is difficult to certify as bug-free. But I may not be aware of all the problems this could cause in a project that needs a parser. So why should we care about this argument when planning to use a generated LR parser?

Solution

Enh... it isn’t a solid argument, but it isn’t something you should just ignore. Parsers are already a pain and a half to debug, so making them even more difficult to debug isn’t a great idea.

There’s also the occasional desire to edit the parser with certain rules that can’t be easily defined in grammars - sometimes for academic reasons, sometimes for performance reasons, sometimes because someone thinks it is easier to do the check there rather than as a post-parsing step.
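As a rough illustration of the post-parsing alternative: "every name must be declared before it is used" is the kind of rule that is awkward to express in a context-free grammar. A minimal sketch in Python, assuming the parser has already produced a flat list of `(kind, name)` statements (that shape and the function name are made up for this example, not something Bison gives you):

```python
# Hypothetical statement shape, assumed for illustration only:
# ("let", name) declares a variable, ("use", name) reads one.
def check_declared_before_use(statements):
    """Return one error message per use of an undeclared name."""
    declared = set()
    errors = []
    for index, (kind, name) in enumerate(statements):
        if kind == "let":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(f"statement {index}: '{name}' used before declaration")
    return errors


program = [("let", "x"), ("use", "x"), ("use", "y")]
errors = check_declared_before_use(program)
for message in errors:
    print(message)
```

The check lives entirely outside the parser, so the grammar stays untouched. The trade-off is an extra pass over the parse result.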

Mostly though, when I hear this argument it isn’t so much that the parsing code is ugly, but that the output tree is ugly. This isn’t unique to LR parsers of course, but generated parsers do tend towards outputting unwieldy (but technically correct) parse trees. Some see the work to massage that into a nice tree to be nearly as much work as parsing in the first place.
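To give an idea of what that massaging looks like, here is a small Python sketch. It assumes a generic `Node` type and an `expr -> term -> factor` style grammar, both invented for this example. It drops parenthesis tokens and collapses chains of single-child nodes:

```python
from dataclasses import dataclass, field

# Token kinds that carry no meaning once the tree shape is known (assumed set).
PUNCTUATION = {"(", ")"}


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""


def simplify(node):
    """Drop punctuation leaves and collapse nodes that only wrap one child."""
    kids = [simplify(c) for c in node.children if c.kind not in PUNCTUATION]
    if len(kids) == 1 and not node.text:
        return kids[0]
    return Node(node.kind, kids, node.text)


def num(text):
    # A number as the grammar sees it: expr -> term -> factor -> NUMBER
    return Node("expr", [Node("term", [Node("factor", [Node("NUMBER", text=text)])])])


# Concrete parse tree for "1 + (2)"
concrete = Node("expr", [
    num("1"),
    Node("+", text="+"),
    Node("term", [Node("factor", [Node("(", text="("), num("2"), Node(")", text=")")])]),
])

tree = simplify(concrete)
```

Here `tree` ends up as a single `expr` node with three leaf children (`1`, `+`, `2`) instead of the nested wrapper nodes. A real grammar usually needs per-rule decisions rather than one generic pass, which is where the extra work comes from.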

Again, not strong arguments but situations that can make the generator a worse option than it first appears.

Other tips

Perhaps the most important aspect of debugging the result of a parser generator happens when you're still developing the language. If you have a language grammar fully formed and ready to go, then debugging the output may not matter.

But if you're developing the grammar itself, that's going to mean a lot of iterating on the grammar. Part of that means that you're going to have grammars that are themselves buggy, that don't represent the language you intended for them to represent. But those bugs will manifest as getting the wrong result from the parsing. And that parsing ultimately comes from the generated code.

So at some point, you're going to have to go stepping through the generated parser code to get to the place where it's parsing the section of interest, then reading through the generated code's data structures and trying to figure out what all of that code is doing and how it relates to the grammar. The more confused and difficult to read the generated code is, the more difficult this process becomes.
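To make that concrete, here is a toy table-driven LR loop in Python. The tables are written by hand for a two-rule grammar (`E -> E '+' n` and `E -> n`). They are an assumption for illustration and not what Bison emits, but the shape of the problem is the same: what you step through is state numbers and table lookups, not grammar rules.

```python
# Hand-written SLR tables for the toy grammar (NOT Bison output):
#   rule 1: E -> E '+' n
#   rule 2: E -> n
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
# rule number -> (left-hand side, length of right-hand side)
RULES = {1: ("E", 3), 2: ("E", 1)}


def parse(tokens):
    """Run the table-driven loop and return a trace of the actions taken."""
    stack = [0]                    # stack of state numbers
    stream = list(tokens) + ["$"]  # "$" marks end of input
    pos = 0
    trace = []
    while True:
        state, tok = stack[-1], stream[pos]
        action = ACTION.get((state, tok))
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
            trace.append(f"state {state}: shift {tok!r}, go to state {arg}")
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]    # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append(f"state {state}: reduce by rule {arg}")
        else:
            trace.append(f"state {state}: accept")
            return trace


for line in parse(["n", "+", "n"]):
    print(line)
```

Even for this tiny grammar, the trace only makes sense if you know which items each state number stands for. With a real grammar there are hundreds of states, so you end up reading the trace next to the generator's state listing.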

So I would not take this issue lightly.

Put differently, what are the odds of a bug in a custom-built parser versus in a generated parser and the library code that Bison includes in your system? The Bison library code has had literally millions of uses, for decades. The generated parser will be peculiar to your input grammar, but again, Bison's generator has been used for 35 years.

License: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange