Question

I'm in the midst of writing a lexical scanner, and I'm wondering how to distinguish between an operator (e.g. -) and a signed number (e.g. -14). For example, both of the following lines are valid:

+12
12 +12

Currently, my lexical scanner would parse them like so:

+12
12 +12

However, when checking the second statement's validity later in the program, it is flagged as invalid: one numeric token can't be followed by another without an adjoining operator. I would like them scanned as:

+12
12 + 12

I could implement this by checking whether the preceding character is an operator: if it is, generate a signed-number token; otherwise, generate an operator token followed by a number token. But that would make the scanner depend on context, which goes against the rules of context-free grammars and would drastically increase the complexity of my scanner.

How might I scan signed numbers in an unambiguous way that correctly determines what is and isn't part of a numeric token?


Solution

Most scanners I've dealt with treat signs as operators. So -12 isn't a single integer literal; it's a unary sign operator applied to an integer literal. I think that would solve your problem while keeping your scanner simple: it just has to recognize + and - as tokens, and your parser can work out later whether each one is unary or binary.
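
As a rough illustration of that split, here is a minimal Python sketch. It assumes a toy language with only unsigned integers, + and -; the names scan, parse, TOKEN_RE and the ("NUM", ...) / ("OP", ...) token shapes are made up for this example and are not taken from your scanner. The scanner never looks at context, and the parser's grammar decides whether a sign is unary or binary.

```python
import re

# Hypothetical token pattern: optional whitespace, then an unsigned
# integer or a single sign character. Signs are never glued to digits.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([-+]))")


def scan(text):
    """Return a list of ("NUM", int) and ("OP", str) tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, sign = match.groups()
        if number is not None:
            tokens.append(("NUM", int(number)))
        else:
            tokens.append(("OP", sign))
        pos = match.end()
    return tokens


def parse(tokens):
    """Evaluate the tokens with a tiny recursive-descent parser.

    expr  := unary (("+" | "-") unary)*
    unary := ("+" | "-") unary | NUM
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def unary():
        kind, value = advance()
        if kind == "OP":
            # A sign where an operand is expected is a unary operator.
            operand = unary()
            return operand if value == "+" else -operand
        if kind == "NUM":
            return value
        raise SyntaxError("expected a number or a sign")

    def expr():
        value = unary()
        # A sign that follows a complete operand is a binary operator.
        while peek()[0] == "OP":
            _, op = advance()
            rhs = unary()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek()[0] != "EOF":
        # e.g. "12 12": two numbers with no operator between them.
        raise SyntaxError("unexpected token after expression")
    return result
```

With this sketch, scan("12 +12") yields a number, an operator and a number, so parse(scan("12 +12")) evaluates to 24, while parse(scan("+12")) evaluates to 12 and parse(scan("12 12")) raises SyntaxError. The scanner stays a plain regular-expression matcher; all the sign handling lives in the unary rule of the grammar.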

Licensed under: CC-BY-SA with attribution