Question

I've read a lot about (F)lex so far, but I couldn't find an answer to this. I actually have two questions, and an answer to either one would be enough.

I have strings like:

TOTO 123 CD123 RGF 32/FDS HGGH

For each token I find, I put it in a vector. For example, for this string, I get a vector like this:

vector = TOTO, whitespace, 123, whitespace, CD, 123, whitespace, RGF, whitespace, 32, FDS, whitespace, HGGH

The "/" does not match any rule, but I would still like to put it in my vector when I reach it, to get:

vector = TOTO, whitespace, 123, whitespace, CD, 123, whitespace, RGF, whitespace, 32, /, FDS, whitespace, HGGH

So my questions are:

1) Is it possible to change the default action taken when input does not match any rule (instead of printing it to stdout)?

2) If not, how can I catch such input? "/" is just an example; it could be anything that does not match my rules (%, C, 3, Blabblabla, etc.), and I can't just add

 .*   { handle_unmatched(); }

because flex uses the pattern that matches the longest string, so ".*" would swallow the rest of the line. I would like my rules to be "sorted", with ".*" tried last, as if I could change flex's "preferences".

Any ideas?
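For reference, flex resolves competing patterns with two rules: it takes the pattern that matches the most text, and if several match the same amount, the one listed first wins. The C sketch below simulates that choice with hand-written matcher functions standing in for [A-Z]+, [0-9]+, .* and . (the names rule_fn, match_* and pick_rule are hypothetical; flex itself compiles all patterns into a single DFA rather than trying them one by one).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for flex patterns: each returns how many
   characters it would match at the start of s (0 = no match). */
typedef size_t (*rule_fn)(const char *s);

static size_t match_letters(const char *s) {   /* [A-Z]+ */
    size_t n = 0;
    while (isupper((unsigned char)s[n])) n++;
    return n;
}

static size_t match_digits(const char *s) {    /* [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

static size_t match_dot_star(const char *s) {  /* .*  (everything up to a newline) */
    size_t n = 0;
    while (s[n] != '\0' && s[n] != '\n') n++;
    return n;
}

static size_t match_dot(const char *s) {       /* .   (one non-newline character) */
    return (s[0] != '\0' && s[0] != '\n') ? 1 : 0;
}

/* The longest match wins; on a tie the rule listed first wins, which is
   why the comparison is strictly '>'. Returns the rule index, or -1. */
static int pick_rule(const rule_fn *rules, int nrules, const char *s, size_t *len) {
    assert(rules != NULL && len != NULL);
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < nrules; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With .* in the list, pick_rule chooses .* for "TOTO 123" because it matches 8 characters against 4 for [A-Z]+, even though it is listed last. With . in its place, [A-Z]+ wins, and . is chosen only for a character such as "/" that nothing else matches.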


Solution

The usual way is to add a rule like

.    { do_something_with_extra_char(*yytext); }

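at the end of your rules. It matches any single character (other than newline; you need a rule that matches newline somewhere too) that no other rule matches. Because it only ever matches one character, any other rule that matches at least one character is at least as long, and on a tie flex prefers the rule listed first, so this rule fires only when nothing else applies. If you have several unmatched characters in a row, it triggers once per character, which is generally fine.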
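A rough, self-contained C sketch of the effect, assuming the asker's rules are roughly [A-Z]+, [0-9]+ and a whitespace rule (the question does not show them). It hand-codes that rule list instead of using flex, and the vector type and functions (token_vec, vec_push, vec_free, tokenize) are hypothetical. Because these character classes do not overlap, taking the first branch that applies gives the same result as flex's longest-match choice here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable vector of strings standing in for "my vector". */
typedef struct {
    char **items;
    size_t len, cap;
} token_vec;

/* Append a copy of the first n bytes of s. */
static void vec_push(token_vec *v, const char *s, size_t n) {
    assert(v != NULL && s != NULL);
    if (v->len == v->cap) {
        size_t cap = v->cap ? v->cap * 2 : 8;
        char **p = realloc(v->items, cap * sizeof *p);
        if (p == NULL) abort();
        v->items = p;
        v->cap = cap;
    }
    char *copy = malloc(n + 1);
    if (copy == NULL) abort();
    memcpy(copy, s, n);
    copy[n] = '\0';
    v->items[v->len++] = copy;
}

static void vec_free(token_vec *v) {
    for (size_t i = 0; i < v->len; i++) free(v->items[i]);
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}

/* Mimics a flex rule list, in this order:
     [A-Z]+   -> push the word
     [0-9]+   -> push the number
     [ \t]+   -> push "whitespace"
     .        -> catch-all, LAST: push the single unmatched character */
static void tokenize(const char *s, token_vec *out) {
    while (*s != '\0') {
        size_t n = 0;
        if (isupper((unsigned char)*s)) {
            while (isupper((unsigned char)s[n])) n++;
            vec_push(out, s, n);
        } else if (isdigit((unsigned char)*s)) {
            while (isdigit((unsigned char)s[n])) n++;
            vec_push(out, s, n);
        } else if (*s == ' ' || *s == '\t') {
            while (s[n] == ' ' || s[n] == '\t') n++;
            vec_push(out, "whitespace", strlen("whitespace"));
        } else {
            n = 1;  /* catch-all (unlike this branch, flex's '.' would not match '\n') */
            vec_push(out, s, 1);
        }
        s += n;
    }
}
```

Run on "TOTO 123 CD123 RGF 32/FDS HGGH", tokenize yields TOTO, whitespace, 123, whitespace, CD, 123, whitespace, RGF, whitespace, 32, /, FDS, whitespace, HGGH. In a real scanner the catch-all's action would just be something like ". { vec_push(&tokens, yytext, yyleng); }", with tokens a hypothetical global vector.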

Other answers

EDIT: I think Chris Dodd's answer is better. Here are two alternative solutions.

One solution is to use start conditions (states). When you read a single unrecognized character, switch to a different state and build up the unrecognized token there.

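%{
#include <string.h>
char str[1024];
int strUsed;
%}
%x UNRECOGNIZED
%%
{SOME_RULE} { /* do processing */ }
. { BEGIN(UNRECOGNIZED); str[0] = yytext[0]; strUsed = 1; }
<UNRECOGNIZED>{bad_input} {
    if (strUsed + yyleng < (int)sizeof str) {  /* don't overflow str; excess is dropped */
        strcpy(str + strUsed, yytext);
        strUsed += yyleng;
    }
}
<UNRECOGNIZED>{good_input} {
    str[strUsed] = 0; vector_add(str);
    yyless(0);        /* give the good input back so the INITIAL rules scan it */
    BEGIN(INITIAL);
}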

This solution works well if it's easy to write regular expressions that match the "bad" input (and the "good" input that ends it). Another solution is to build up the bad characters one at a time until the next valid match:

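%{
char str[1024];
int strUsed = 0;
/* Flush any pending unrecognized characters as a single token. */
void goodMatch(void) {
  if (strUsed) {
    str[strUsed] = 0;
    vector_add(str);
    strUsed = 0;
  }
}
%}
%%
{SOME_RULE} { goodMatch(); /* do processing */ }
. { if (strUsed < (int)sizeof str - 1) str[strUsed++] = yytext[0]; /* keep room for the 0 */ }
<<EOF>> { goodMatch(); yyterminate(); }  /* flush unrecognized input at end of file */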

Note that this requires modifying every existing rule so that it calls goodMatch() first.
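To see how the second solution behaves without running flex, here is a C sketch that hand-codes the same idea: each real rule calls good_match() before doing its own work, and the catch-all appends to a pending buffer. It assumes toy rules [A-Z]+, [0-9]+ and a run of spaces, joins tokens with "|" in a fixed buffer instead of calling vector_add, and flushes pending characters at end of input (in flex, an <<EOF>> rule can do this). token_out, emit, good_match and scan are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Tokens joined with '|' (a simple stand-in for vector_add). */
typedef struct { char buf[256]; size_t used; } token_out;

static void emit(token_out *o, const char *s, size_t n) {
    assert(o != NULL && s != NULL);
    if (o->used > 0 && o->used < sizeof o->buf - 1) o->buf[o->used++] = '|';
    size_t room = sizeof o->buf - 1 - o->used;
    if (n > room) n = room;              /* truncate rather than overflow */
    memcpy(o->buf + o->used, s, n);
    o->used += n;
    o->buf[o->used] = '\0';
}

/* Pending run of unrecognized characters (the answer's str/strUsed). */
static char pending[64];
static size_t pending_used = 0;

/* The answer's goodMatch(): flush pending characters as one token. */
static void good_match(token_out *o) {
    if (pending_used > 0) {
        emit(o, pending, pending_used);
        pending_used = 0;
    }
}

static void scan(const char *s, token_out *o) {
    while (*s != '\0') {
        size_t n = 0;
        if (isupper((unsigned char)*s)) {          /* [A-Z]+ */
            while (isupper((unsigned char)s[n])) n++;
            good_match(o);                         /* every real rule flushes first */
            emit(o, s, n);
        } else if (isdigit((unsigned char)*s)) {   /* [0-9]+ */
            while (isdigit((unsigned char)s[n])) n++;
            good_match(o);
            emit(o, s, n);
        } else if (*s == ' ') {                    /* " "+ */
            while (s[n] == ' ') n++;
            good_match(o);
            emit(o, "whitespace", strlen("whitespace"));
        } else {                                   /* '.' rule: accumulate */
            n = 1;
            if (pending_used < sizeof pending)     /* bounds check */
                pending[pending_used++] = *s;
        }
        s += n;
    }
    good_match(o);                                 /* end of input: flush leftovers */
}
```

For "32/%FDS HGGH" it produces 32|/%|FDS|whitespace|HGGH: the two unrecognized characters come out as a single token, which is the difference from the one-character "." rule in the first answer.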

Note for both solutions: with a statically sized buffer you have to make sure you never write past its end (the strcpy in the first solution, the character append in the second). If you use a dynamically sized string instead, you have to make sure its memory is freed correctly.
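If you go the dynamic route, a minimal growable buffer in C could look like the following sketch; strbuf and its functions are hypothetical helpers, not part of flex. In a scanner you would append yytext/yyleng in the catch-all action and hand the result to vector_add (or whatever stores your tokens) once a good match arrives.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable replacement for the fixed char str[1024]. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} strbuf;

/* Append n bytes, doubling capacity as needed; keeps data NUL-terminated.
   Returns 0 on success, -1 if realloc fails (buffer left unchanged). */
static int strbuf_append(strbuf *b, const char *s, size_t n) {
    assert(b != NULL && s != NULL);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1) cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Hand the string to the caller and reset the buffer. The caller now
   owns the memory and must free() it (NULL if nothing was appended). */
static char *strbuf_take(strbuf *b) {
    char *s = b->data;
    b->data = NULL;
    b->len = b->cap = 0;
    return s;
}

/* Release the buffer without keeping its contents. */
static void strbuf_free(strbuf *b) {
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

strbuf_take transfers ownership: whoever receives the pointer must free() it, which is exactly the clean-up the note above warns about. Call strbuf_free when discarding a buffer that was never taken, for example at scanner shutdown.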

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow