Recognizing terminals in a CFG production previously not defined as tokens

https://stackoverflow.com/questions/2937988

05-10-2019

Question

I'm writing a generator of LL(1) parsers; its input is a Coco/R language specification. I already have a scanner generator for that input. Suppose I have the following specification:

COMPILER 1

CHARACTERS

digit="0123456789".

TOKENS
number = digit{digit}. 
decnumber = digit{digit}"."digit{digit}.

PRODUCTIONS

Expression = Term{"+"Term|"-"Term}.      
Term = Factor{"*"Factor|"/"Factor}.       
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).

END 1.

So if the parser generated from this grammar receives the input "1+1", it should accept it, i.e. find a parse tree.
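To make the parser side of that claim concrete, here is a minimal hand-written recursive-descent (LL(1)) recognizer for the PRODUCTIONS above. It is a sketch, not what Coco/R generates. It assumes the scanner already delivers `(kind, lexeme)` pairs and that each operator's kind is simply the literal itself (`"+"`, `"("`, ...); the class `Parser` and the function `accepts` are hypothetical names. Whether a scanner can actually produce a `"+"` token is exactly what the question below asks.

```python
# Minimal recursive-descent recognizer for the grammar in the question.
# Assumption: tokens are (kind, lexeme) pairs where kind is "number",
# "decnumber", or the literal itself ("+", "-", "*", "/", "(", ")").

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + [("EOF", "")]  # sentinel marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.peek()!r}")
        self.pos += 1

    # Expression = Term {"+" Term | "-" Term}.
    def expression(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.expect(self.peek())
            self.term()

    # Term = Factor {"*" Factor | "/" Factor}.
    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.expect(self.peek())
            self.factor()

    # Factor = ["-"] (Number | "(" Expression ")").
    def factor(self):
        if self.peek() == "-":
            self.expect("-")
        if self.peek() == "(":
            self.expect("(")
            self.expression()
            self.expect(")")
        else:
            self.number()

    # Number = (number | decnumber).
    def number(self):
        if self.peek() in ("number", "decnumber"):
            self.pos += 1
        else:
            raise SyntaxError(f"expected a number, got {self.peek()!r}")


def accepts(tokens):
    """Return True if the whole token list is a sentence of Expression."""
    p = Parser(tokens)
    try:
        p.expression()
        p.expect("EOF")
        return True
    except SyntaxError:
        return False


# "1+1" as the token sequence: number "+" number
print(accepts([("number", "1"), ("+", "+"), ("number", "1")]))  # True
print(accepts([("number", "1"), ("+", "+")]))                   # False
```

Each nonterminal becomes one method, and each `{...}` repetition becomes a `while` loop guarded by the single lookahead token, which is what makes the grammar LL(1).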

My question is: the character "+" was never defined as a token, but it appears in the production for the nonterminal "Expression". How should my generated Scanner recognize it? As things stand, it would not recognize "+" as a token at all.
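For illustration, here is what a scanner built only from the TOKENS section does with "1+1". This is a hypothetical Python sketch, not the output of any real generator: `number` and `decnumber` are translated by hand into regular expressions, and the name `scan_tokens_only` is made up. It uses longest match, so "12.5" becomes a single `decnumber` rather than a `number` followed by an error.

```python
import re

# Hypothetical scanner built only from the TOKENS section:
#   number    = digit{digit}.
#   decnumber = digit{digit}"."digit{digit}.
TOKEN_DEFS = [
    ("number", re.compile(r"[0-9]+")),
    ("decnumber", re.compile(r"[0-9]+\.[0-9]+")),
]

def scan_tokens_only(text):
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, kind) of the longest match starting at pos
        for kind, pattern in TOKEN_DEFS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, kind)
        if best is None:
            raise ValueError(f"no token matches {text[pos]!r} at position {pos}")
        length, kind = best
        tokens.append((kind, text[pos:pos + length]))
        pos += length
    return tokens

print(scan_tokens_only("12.5"))  # [('decnumber', '12.5')]
try:
    scan_tokens_only("1+1")
except ValueError as err:
    print(err)  # no token matches '+' at position 1
```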

Is this even a valid input, then? Should I add this terminal to TOKENS, and then write an error routine so that the Scanner can skip it?

How do typical language specifications handle this?

Solution

Anything on the right-hand side of a grammar rule (other than the grammar notation itself) must be either a nonterminal symbol or a terminal symbol (synonymous with "token"). So yes, you should make your operators tokens. Looking at the Coco/R documentation, it seems that Coco/R accepts literal strings as terminal symbols in the PRODUCTIONS section, so you may not need to declare them in TOKENS at all: the parser generator should already treat each literal as a token. Since you are writing the generator yourself, that means it should collect the literal strings it finds in PRODUCTIONS and pass them to your scanner generator alongside the declared tokens.
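A minimal sketch of that idea in Python, assuming your generator can see the raw PRODUCTIONS text: every quoted string is collected and turned into an anonymous token whose kind is the literal itself, then scanned together with the named tokens. The names `collect_literals`, `build_token_defs` and `scan`, the hand-written regexes for `number`/`decnumber`, and the whitespace skipping are all assumptions for illustration, not Coco/R's actual implementation.

```python
import re

# Productions copied from the question; only the quoted literals matter here.
PRODUCTIONS = '''
Expression = Term{"+"Term|"-"Term}.
Term = Factor{"*"Factor|"/"Factor}.
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).
'''

# Named tokens from the TOKENS section, hand-translated to regular expressions.
NAMED_TOKENS = [
    ("number", r"[0-9]+"),
    ("decnumber", r"[0-9]+\.[0-9]+"),
]

def collect_literals(productions):
    """Return each distinct quoted string, in order of first appearance."""
    seen = []
    for lit in re.findall(r'"([^"]*)"', productions):
        if lit not in seen:
            seen.append(lit)
    return seen

def build_token_defs(named, literals):
    # Each literal becomes an anonymous token whose kind is the literal itself.
    defs = [(kind, re.compile(rx)) for kind, rx in named]
    defs += [(lit, re.compile(re.escape(lit))) for lit in literals]
    return defs

def scan(text, token_defs):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is skipped between tokens
            pos += 1
            continue
        best = None  # (length, kind) of the longest match starting at pos
        for kind, pattern in token_defs:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, kind)
        if best is None:
            raise ValueError(f"no token matches {text[pos]!r} at position {pos}")
        length, kind = best
        tokens.append((kind, text[pos:pos + length]))
        pos += length
    return tokens

literals = collect_literals(PRODUCTIONS)
print(literals)  # ['+', '-', '*', '/', '(', ')']
defs = build_token_defs(NAMED_TOKENS, literals)
print(scan("1+1", defs))  # [('number', '1'), ('+', '+'), ('number', '1')]
```

On equal-length matches the earlier definition wins, so named tokens take priority over literals here. A real generator also needs a rule for literals that a named token also matches (keywords such as `while` versus an identifier token); this sketch does not handle that case.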

Licensed under: CC-BY-SA with attribution
