Question

I'm not a native English speaker, so my English is poor. Please bear with me.

I'm writing this post because I want to discuss this topic.

I think the LR parser has no 'practical' advantages over the LL parser. The fact that many tools use LR parsers is not, by itself, an effective advantage.

I think the criteria for a practical advantage are as follows:

  1. Speed (fast -> practical, slow -> impractical)
  2. Error recovery (easy -> practical, difficult -> impractical)
  3. Convenience of compiler error display (easy -> practical, difficult -> impractical)
  4. Parser development cost (low -> practical, high -> impractical)

I think that the speed of LL parsers and LR parsers is not very different. In the past, when hardware performance was low, an LR parser felt noticeably faster than an LL parser. But today an LR parser no longer feels faster than an LL parser, and other factors have a much bigger effect on speed. I am emphasizing 'today': yes, I think an advantage that only existed in the past is meaningless.

So I think there is no significant difference in speed.

On criteria 2, 3, and 4, the LL parser is overwhelmingly better than the LR parser. The typical disadvantage of an LR parser is that error handling is difficult, and this makes it hard for external programs (e.g. an IDE) to display useful errors. In addition, writing an LL parser is easier than writing an LR parser, which means the development cost is lower.

'I guess' that the Clang compiler considered this (error-handling convenience and development cost) and that is why it uses an LL(k) algorithm. Many modern parsers, including Clang's, use LL(k); on the other hand, the LR-based compilers and parser generators were created a long time ago and have merely been kept updated since.

I think this phenomenon indicates that the trend has started to shift from LR to LL.

AI algorithms were useless in the past because hardware performance was low, but as hardware performance improved, AI algorithms showed their true worth.

Likewise, in the past hardware performance was low, so algorithm speed was the most important thing and parsers had to be fast. As a result, the LR parser was very useful. But 'today' I think speed is not the only thing that matters, because hardware performance is so high.

I think the only advantage of the LR parser compared to the LL parser is speed, and even the speed advantage is not big. I think this sole merit of the LR parser shrinks over time as hardware performance increases.

So I wonder: what are the 'practical' advantages of the LR parser over the LL parser 'today'?

Thank you for reading.


Solution

True LL parsers are fast and simple, but they are also completely unable to parse many interesting languages. Many real-world programming languages are in the LALR class, which is closely related to LR but can be parsed far more efficiently. Approaches such as LL, LALR, and LR are not just different algorithms; each also corresponds to a different set of languages that can be parsed.

For example, let's consider arithmetic expressions with this simple grammar:

Expr ::= Expr "+" Term;
Expr ::= Term;
Term ::= Term "*" Factor;
Term ::= Factor;
Factor ::= Var;
Factor ::= Num;

This grammar would turn the input x * 3 + 2 into the parse tree Expr(Expr(Term(Term(Factor(Var(x))), "*", Factor(Num(3)))), "+", Term(Factor(Num(2)))).

While LL(k) is able to recognize this language, it cannot produce the correct parse tree because it cannot handle left-recursive productions directly. Where an LL parser is used, this frequently requires substantial post-processing of the parse tree to get a useful representation of the input. Also, massaging the grammar to fit the limitations of LL is very tedious, though some parser generators can do this automatically. And to be clear: LR also has some restrictions that the grammar needs to be written around.
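
To illustrate that massaging, here is one common rewrite of the grammar above (a sketch only; the helper nonterminals ExprRest and TermRest are mine, and an empty right-hand side stands for the empty production):

Expr ::= Term ExprRest;
ExprRest ::= "+" Term ExprRest;
ExprRest ::= ;
Term ::= Factor TermRest;
TermRest ::= "*" Factor TermRest;
TermRest ::= ;
Factor ::= Var;
Factor ::= Num;

The rewritten grammar accepts the same language, but its raw parse tree nests to the right, so the left-associative structure of "+" and "*" has to be reconstructed in a post-processing step.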

So no one actually uses LL parsers. They seem simple but usually introduce more problems than they solve. They are not practical for most languages.

What is frequently used is Recursive Descent or PEG (= formalized RecDesc). These are closely related to LL in that their parse tree is the leftmost derivation, but have important extensions.

First, the concept of choice. LL(k) needs to choose the next production with constant lookahead, while LL(*) has unbounded lookahead but can only use a regular language for the decision (which can't even detect balanced parentheses…). In contrast, RecDesc and PEG use an ordered choice: first try this production, and backtrack if it doesn't work out to try another alternative. This works very well in practice but leads to exponential worst-case time complexity. PEG can limit this with lookaheads and has a Packrat optimization that allows for a space–time tradeoff.
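
To make the ordered choice concrete, here is a minimal, self-contained Python sketch (the toy grammar S <- "a" "b" / "a" "c" and every name in the code are made up for illustration): the parser commits to the first alternative and, if it fails partway through, resets its position and tries the next one.

def match(tokens, pos, expected):
    """Consume one token if it equals `expected`; return the new position or None."""
    if pos < len(tokens) and tokens[pos] == expected:
        return pos + 1
    return None

def parse_S(tokens, pos=0):
    # First alternative: "a" "b"
    p = match(tokens, pos, "a")
    if p is not None:
        q = match(tokens, p, "b")
        if q is not None:
            return q
    # Backtrack: restart from the original `pos` with the second alternative "a" "c"
    p = match(tokens, pos, "a")
    if p is not None:
        q = match(tokens, p, "c")
        if q is not None:
            return q
    return None  # every alternative failed

print(parse_S(["a", "c"]))  # 2 -> matched via the second alternative after backtracking

Memoising results per (rule, position), which is the Packrat technique mentioned above, avoids repeating such failed attempts at the cost of extra memory.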

Second, these parsers are often not pure top-down parsers all the way. It is common to extend a RecDesc parser with a bottom-up parser for expression parsing (e.g. using Shunting Yard) or to combine PEG with a Pratt parser for expressions. After all, any nonterminal in the grammar can be its own nested sublanguage in a top-down parser.
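
As a rough sketch of that combination (the tokenizer, the precedence table, and the tuple-based tree below are my own simplifications, not taken from any particular compiler), a recursive-descent parser can hand expressions off to a small precedence-climbing routine in the Pratt style:

import re

PRECEDENCE = {"+": 1, "*": 2}  # higher number binds tighter

def tokenize(src):
    return re.findall(r"\d+|\w+|[+*()]", src)

def parse_expr(tokens, pos, min_prec=1):
    """Parse an expression whose operators all have precedence >= min_prec."""
    node, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in PRECEDENCE and PRECEDENCE[tokens[pos]] >= min_prec:
        op = tokens[pos]
        # Left-associative: the right operand may only use tighter-binding operators
        rhs, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        node = (op, node, rhs)
    return node, pos

def parse_atom(tokens, pos):
    if tokens[pos] == "(":
        node, pos = parse_expr(tokens, pos + 1)
        return node, pos + 1  # skip the closing ")"
    return tokens[pos], pos + 1

print(parse_expr(tokenize("x * 3 + 2"), 0)[0])  # ('+', ('*', 'x', '3'), '2')

Note that this loop builds the left-associative tree for x * 3 + 2 directly, without needing the left-recursive productions that trip up a plain LL parser.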

Third, the "computational" approach of RecDesc and PEG allows straightforward interaction with side effects. In some languages like Perl or C++, the correct parse depends on the current state of a symbol table. Whereas Perl extends a Yacc-generated LR parser, most C++ parsers are now RecDesc parsers.
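
As a tiny sketch of that kind of feedback (the statement shape and the symbol table here are hypothetical, modelled on the classic C++ situation where a * b; is a declaration if a currently names a type and a multiplication otherwise):

def classify_statement(tokens, symbol_table):
    """Decide how to parse `name "*" name ";"` based on the current symbol table."""
    name = tokens[0]
    if symbol_table.get(name) == "type":
        return "declaration"  # e.g. T * p; declares a pointer p of type T
    return "expression"       # otherwise it is a multiplication expression

symbols = {"T": "type", "a": "variable", "b": "variable"}
print(classify_statement(["T", "*", "p", ";"], symbols))  # declaration
print(classify_statement(["a", "*", "b", ";"], symbols))  # expression

A recursive-descent parser can call such a check at exactly the point where the ambiguity arises, which is the kind of interaction with side effects described above.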

OTHER TIPS

  • LL is a top-down, depth-first algorithm.
  • LR is a bottom-up, breadth-first algorithm.

They both have their uses, they both have their issues.

To claim one as outright superior to the other is both arrogant and silly.

LL parsers break their desired goal down into sub-goals, and rinse and repeat.

  • This maps well onto the function call stack, which is supported by all mainstream languages, making it easier to implement by hand.
  • This is great when similar text strings have different meanings based on where in the expansion they occur.
  • But it does lead to infinite expansion cycles in poorly constructed grammars.
  • And the least expected input is the last expansion to be generated.

LR parsers build up from the input to an output. Rinse and repeat until the input isn't reducible.

  • This does ensure a ceiling on the maximum runtime of the algorithm.
  • But it can lead to a blow-out in the number of states that must be considered, especially when similar text strings have different meanings based on where in the reduction they occur (also a symptom of a poor grammar).
  • Dynamic programming is also not well catered for in many mainstream languages, which makes implementing these parsers more difficult.

Personally, I find LL can be fast, yet it can also be brutally slow, to the point where memoisation (the packrat optimisation) has to be used to get any kind of speed on modestly complicated languages. It is, however, very easy to teach, and just as easy to implement.

LR, conversely, takes time and space proportional to the input size and the language complexity. This is a good feature to have even today. The downside is that breadth-first searching (as dynamic programming does) is a much harder idea to teach, and it also suffers from a lack of support in mainstream languages (at least relative to the depth-first search that LL relies on).

Licensed under: CC-BY-SA with attribution