Question

I'm trying to draw an FSM that recognizes tokens for the following microsyntax:

microsyntax
// Uses .Net regular expression syntax.

Identifier <|[a-zA-Z][\w_.]*

IntegerValue <|\d+

// real values must include a decimal point.
RealValue <|\d*\.\d+

// Note that strings do not have any escape characters
// and will be prematurely terminated with a newline.
StringValue <|"[^"\n]*"

My diagram for the FSM looks like this: (image not included)

I'm not sure whether the diagram I made is entirely correct. My confusion in drawing the diagram lies in: 1) the looped transition for Identifier a-z, A-Z, _. 2) the transition from IntegerValue to RealValue: will state 3 have a looped transition on 0-9? and 3) the transition to StringValue.

It would be very helpful if anyone could let me know whether the diagram is correct and, if it isn't, what my mistakes are.


Solution

It's fine except that state 3 is not final and shouldn't loop. (Also, I don't see your whitespace and operator rules, but the diagram looks plausible.)

The issue with state 3 is that:

8.

matches neither \d+ nor \d*\.\d+: the former because of the ., and the latter because it insists on at least one digit after the .. Consequently, state 3, which is where you are after reading 8., is not final. Once it gets another digit it goes to state 4 (so no loop on state 3), which is correctly final.
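To make that concrete, here is a minimal Python sketch of just the numeric part of the machine. The state names are my own, not taken from your diagram; I'm assuming "dot" plays the role of your state 3 and "real" the role of state 4. I'm also assuming a leading . is allowed from the start state, since \d*\.\d+ accepts .5. Python's re is used only as a cross-check against the two patterns from the microsyntax.

```python
import re

# Patterns copied from the microsyntax in the question.
INTEGER = re.compile(r"\d+")
REAL = re.compile(r"\d*\.\d+")

# Hypothetical state names (assumption: "dot" ~ state 3, "real" ~ state 4).
START, INT, DOT, REAL_ST = "start", "integer", "dot", "real"

# Only these states are final; DOT is deliberately absent.
FINAL = {INT: "IntegerValue", REAL_ST: "RealValue"}

DIGITS = "0123456789"


def step(state, ch):
    """Return the next state, or None if there is no transition."""
    if ch in DIGITS:
        # Digits loop on INT and on REAL_ST; from DOT they move to REAL_ST.
        return INT if state in (START, INT) else REAL_ST
    if ch == "." and state in (START, INT):
        return DOT
    return None


def classify(text):
    """Run the machine over text and return the token name, or None."""
    state = START
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return FINAL.get(state)


for sample in ["8", "8.", "8.5", ".5"]:
    by_regex = bool(INTEGER.fullmatch(sample) or REAL.fullmatch(sample))
    print(repr(sample), classify(sample), by_regex)
```

The input 8. ends in the "dot" state, which is not in FINAL, so it is rejected by the machine just as it is rejected by both patterns.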

I would have written state 1 with one loop, instead of two, but I don't suppose it makes any difference. Also, the semantic label under state 6 (not final) should go under state 7 (final). (Oh, and it wouldn't hurt to label the start state, although it was obvious in the end.)
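Both points can be sketched in Python as well. This is only an illustration under assumptions: the function and state names are hypothetical, the identifier check is restricted to ASCII (.NET's \w is broader), and I'm assuming "in_string" corresponds to your state 6 and "closed" to state 7.

```python
import string

# One character set for the single loop on the identifier state:
# everything [\w_.] allows after the first letter (ASCII only here).
IDENT_REST = set(string.ascii_letters + string.digits + "_.")


def is_identifier(text):
    """Identifier: a letter, then one loop over IDENT_REST."""
    if not text or text[0] not in string.ascii_letters:
        return False
    return all(ch in IDENT_REST for ch in text[1:])


def is_string_value(text):
    """StringValue: only the state after the closing quote is final."""
    state = "start"
    for ch in text:
        if state == "start" and ch == '"':
            state = "in_string"      # opening quote read, not final
        elif state == "in_string" and ch == '"':
            state = "closed"         # closing quote read, final
        elif state == "in_string" and ch != "\n":
            state = "in_string"      # loop on anything but quote/newline
        else:
            return False             # no transition defined
    return state == "closed"


print(is_identifier("a.b_1"), is_string_value('"abc"'), is_string_value('"abc'))
```

An unterminated string such as "abc stops in the non-final "in_string" state, which is why the token label belongs on the state reached after the closing quote.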

HTH.

Licensed under: CC-BY-SA with attribution