Question

In many programming language string is a token.

For example:

 token               ::= '"' string
                       | digit nat

 string              ::= char string
                       | '"'

 nat                 ::= digit nat
                       | ϵ

This is a LL(1) grammar for some programming language's toke grammar.

When parsing a string, there is no need to check follow set, because there is a " at the end of each string.

Comparing with nat, string is more easy to parse.

My question is

Is there any official terminology about this kind of grammar?

Thanks.


Editing:

There was some mistake in the original grammar, thanks @rici for pointing out my mistakes.

Was it helpful?

Solution

FOLLLOW sets are not used in the parsing of either string or nat. In both cases, the parser merely needs to determine if the input is in a set of valid symbols. In the case of nat, the valid symbols are digits; in the case of string, they are characters other than ". (In real languages, the parser would also be checking for \). But in both cases, a check is necessary, and there's no good criterion for saying one test is "simpler" than another. (In practice, both checks are likely to be a simple table lookup. So they're O(1).)

FOLLOW sets are only needed when the grammar contains ε productions. Even then, the parser's actions are not complicated. What is more complicated is building the parser, something which only happens once. It's not really that big of a deal, but it's sufficiently notable that "ε-free grammars" are a thing. I don't think there's any common vocabulary to describe the difference between explicitly and implicitly-terminated repetition, and any way the distinction will be very hard to define strictly. Your string would be parsed by a parser which used a different rule to collect the trailing ", and it's quite possible that the two parsers end up with the same implementation.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top