Question

For a class on compilers I am building a lexer. I have completed the assignment, but am left with one point that I am not fully satisfied with.

The language supports string literals with escape sequences, where a string literal is defined as a sequence of characters enclosed by double quotes (") and an escape sequence starts with a backslash (\). The lexer is supposed to produce a token for string literals with the escape sequences already processed (such as replacing \n with a newline character and \t with a tab).

My question is: is it possible to recognize such string literals (and process the escape sequences contained in them) without copying the parts matched so far to a temporary buffer? And if so, how?


Solution

It's certainly possible to recognize the literals without a temporary buffer, but it is not possible to process them in place, for the simple reason that flex owns the input buffer and the string pointed to by yytext (which happens to be a pointer into the input buffer, but that isn't guaranteed either).

But it doesn't really matter. As you will soon discover, in normal use of the lexer with yacc/bison you need to copy any lexical string into a temporary buffer anyway, because yacc/bison needs to read one token ahead in the input, and -- as above -- lex/flex doesn't guarantee to preserve the string pointed to by yytext once it starts recognizing the next token. (That's not theoretical: it really doesn't preserve the string's value, and "Why do my strings keep changing?" is probably the most frequently asked yacc/lex question and, according to the bison manual, the most common invalid bug report.)

There are workarounds -- including writing your own lexer -- but they require an awful lot of bookkeeping and for very little value, because copying a token string is actually not a very expensive operation. So my advice is: copy your strings and stop worrying. :)
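To make "copy your strings" concrete, here is a minimal sketch of doing the copy and the escape processing in a single pass. The helper name unescape_literal is hypothetical (it is not part of flex or bison), and it assumes the matched text still includes the surrounding double quotes and that only \n, \t, \" and \\ need special handling; adjust it to your language's rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not a flex/bison API: copies the body of a matched
 * string literal into a freshly malloc'd buffer and translates escape
 * sequences along the way. `text` and `len` stand in for yytext and yyleng
 * and must include the surrounding double quotes.
 * Returns NULL if allocation fails; the caller owns and frees the result. */
char *unescape_literal(const char *text, size_t len)
{
    assert(len >= 2 && text[0] == '"' && text[len - 1] == '"');

    /* Escapes only ever shrink the text, so the len - 2 body characters
     * plus a terminating NUL always fit. */
    char *out = malloc(len - 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    for (size_t i = 1; i + 1 < len; i++) {   /* skip both quotes */
        char c = text[i];
        if (c == '\\' && i + 2 < len) {      /* backslash with a body char after it */
            c = text[++i];
            switch (c) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default:  break;   /* \" and \\ (and unknown escapes) yield the char itself */
            }
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return out;
}
```

In a flex action this might be used as `yylval.str = unescape_literal(yytext, yyleng); return STRING;`, where the `str` member and the `STRING` token are assumed names from your own grammar. Since the result lives on the heap, it survives the parser's lookahead; whoever consumes the token is responsible for freeing it.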

Licensed under: CC-BY-SA with attribution