Is there any official terminology about something like double quotes “” grammar?
-
29-09-2020 - |
Question
In many programming language string is a token.
For example:
token ::= '"' string
| digit nat
string ::= char string
| '"'
nat ::= digit nat
| ϵ
This is a LL(1) grammar for some programming language's toke grammar.
When parsing a string
, there is no need to check follow set, because there is a "
at the end of each string
.
Comparing with nat
, string
is more easy to parse.
My question is
Is there any official terminology about this kind of grammar?
Thanks.
Editing:
There was some mistake in the original grammar, thanks @rici for pointing out my mistakes.
Solution
FOLLLOW sets are not used in the parsing of either string
or nat
. In both cases, the parser merely needs to determine if the input is in a set of valid symbols. In the case of nat
, the valid symbols are digits; in the case of string
, they are characters other than "
. (In real languages, the parser would also be checking for \
). But in both cases, a check is necessary, and there's no good criterion for saying one test is "simpler" than another. (In practice, both checks are likely to be a simple table lookup. So they're O(1).)
FOLLOW sets are only needed when the grammar contains ε productions. Even then, the parser's actions are not complicated. What is more complicated is building the parser, something which only happens once. It's not really that big of a deal, but it's sufficiently notable that "ε-free grammars" are a thing. I don't think there's any common vocabulary to describe the difference between explicitly and implicitly-terminated repetition, and any way the distinction will be very hard to define strictly. Your string
would be parsed by a parser which used a different rule to collect the trailing "
, and it's quite possible that the two parsers end up with the same implementation.