Why are there LR(0) parsers but not LL(0) parsers?

Question

The difference has to do with what the k means in LR(k) versus LL(k).

In LL(k), the parser maintains information about a top-down, left-to-right parse that traces out a leftmost derivation. The parser works by repeatedly looking at the current nonterminal symbol, then inspecting the next k tokens of the input stream to determine which production should be used. As a result, if you have an LL(0) parser, the parser would have to predict which production to use based purely on the current nonterminal. This is only possible if each nonterminal has just one production associated with it, which means that the grammar either produces exactly one string or no strings at all (by going into a loop). Therefore, while it is mathematically well-defined, LL(0) parsing is never used in practice.

In LR(k), the parser works bottom-up. It maintains a stack of symbols, along with a current "state," and then continuously decides whether to perform a shift (pushing another symbol on top of the stack) or a reduce (popping some number of symbols from the stack and applying a production in reverse). One key difference between an LL(k) and LR(k) parser is how the LR(k) parser decides which action to perform. In the LR(k) parser, the decision of what to do next s based on both the next k tokens of lookahead and on the current state of the parser. As a result, an LR(0) parser can still make somewhat intelligent decisions about which action to perform even if it can't see any lookahead tokens, because the parser's current state can encode a huge amount of information about where in a production the parser is and what it can realistically expect to see as the next tokens of input (even if the parser can't directly look at those tokens).

In short, LL(0) is extremely weak because the parser has to purely base its decision on the current nonterminal, meaning that it can't take one of many different actions based on which production might be used. An LR(0) parser is decently powerful because the parser bases its choice on its internal state, which is usually sufficiently detailed to let the parser make informed decisions about what to do next.

There is one other reason LL(0) is weak while LR(0) is reasonably powerful. In an LL(0) parser, the parser has to immediately decide which productions ought to be performed, which means that the parser has to guess productions completely blindly. In an LR(0) parser, the parser can shift multiple symbols before deciding that it is time to reduce. Consequently, the parser, without any lookahead, can defer making its decision about which reduction to use until after it has seen enough tokens of input to get a sense for the structure of the string. This is a special case of the more general fact that any LL(k) grammar is automatically LR(k).

Hope this helps!