Question

I want to build a lexer in C and I am following the dragon book, I can understand the state transitions but how to implement them?

Is there a better book?

The fact that I have to parse a string through a number of states so that I can tell whether the string is acceptable or not!

Was it helpful?

Solution

G'day,

Assuming you mean The Dragon book on compiler design, I'd recommend having a look around this page on compiler tools.

The page itself is quite small but has links through to various excellent resources on lexical analysers.

HTH

cheers,

OTHER TIPS

You can implement simple state transitions with a single state variable, for example if you want to cycle through the states start->part1->part2->end then you can use an enum to keep track of the current state and use a switch statement for the code you want to run in each state.

enum state { start=1, part1, part2, end} mystate;

// ...
mystate = start;
do {
  switch (mystate) {
    case start:
      // ...
    case part1:
      // ...
    case part2:
      // ...
      if (part2_end_condition) mystate = end; // state++ will also work
      // Note you could also set the state back to part1 on some condition here
      // which creates a loop
      break;
  }
} while (mystate != end);

For more complex state transitions that depend on several variables, you should use tables/arrays like this:

var1    var2    var_end    next_state
0       0       0          state1
0       1       0          state2
1       0       0          state3
1       1       0          state4
-1      -1      1          state_end // -1 represents "doesn't matter" here

There's more than one way to do it. Every regular expression corresponds directly to a simple structured program. For example, an expression for numbers could be this:

// regular expression
digit* [.digit*]

and the corresponding C code would be:

// corresponding code
while(DIGIT(*pc)) pc++;
if (*pc=='.'){
    pc++;
    while(DIGIT(*pc)) pc++;
}

The transition-table way of building lexers is, in my opinion, needlessly complicated, and obviously runs slower.

If you're looking for a more modern treatment than the dragon book(s) : Andrew W. Appel and Maia Ginsburg, Modern Compiler Implementation in C, Cambridge University Press, 2008.

Chapter 2 is focused on Lexical Analysis : Lexical tokens, Regular expressions, Finite automata; Nondeterministic Finite Automata; Lexical analyzer generators

Look at the Table of Contents

The program flex (a clone of lex) will create a lexer for you.

Given an input file with the lexer rules, it will produce a C file with an implementation of a lexer for those rules.

You can thus check the output of flex for how to write a lexer in C. That is, if you don't just want to use flex's lexer...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top