Question

I've read a lot about (F)lex so far, but I couldn't find an answer. I actually have two questions, and an answer to either one would be enough.

I have strings like:

TOTO 123 CD123 RGF 32/FDS HGGH

For each token I find, I put it in a vector. For example, for this string, I get a vector like this:

vector = TOTO, whitespace, CD, 123, whitespace, RGF, whitespace, 32, FDS, whitespace, HGGH

The "/" does not match any rule, but I would still like to put it in my vector when I reach it, and get:

vector = TOTO, whitespace, CD, 123, whitespace, RGF, whitespace, 32, /, FDS, whitespace, HGGH

So my questions are:

1) Is it possible to change the default action taken when input does not match any rule (instead of printing it to stdout)?

2) If that is not possible, how can I catch this case? Here "/" is just an example; it could be anything (%, C, 3, Blabblabla, etc.) that does not match my rules, and I can't use

 .*   { else();  }

because Flex picks the rule that matches the longest string. I would like my rules to be "sorted", with ".*" tried last, as if I were changing Flex's "preferences".

Any ideas?

Solution

The usual way is to have a rule something like

.    { do_something_with_extra_char(*yytext); }

at the END of your rules. It matches any single character (other than newline -- you need a rule that matches newline somewhere too) that no other rule matches: a one-character match never beats a longer one, and when two rules match the same length, Flex picks the one listed first. If several unmatched characters occur in a row, the rule fires once per character, but that is generally fine.
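As a minimal sketch of how this could look for the input in the question, the scanner below assumes three token rules (letters, digits, blanks) and a hypothetical emit() helper standing in for "put it in my vector". Neither the patterns nor the helper come from the original post.

```lex
%option noyywrap
%{
#include <stdio.h>

/* Hypothetical sink: stands in for pushing a token onto the vector. */
static void emit(const char *kind, const char *text) {
    printf("%s: '%s'\n", kind, text);
}
%}

%%
[A-Za-z]+   { emit("word", yytext); }
[0-9]+      { emit("number", yytext); }
[ \t]+      { emit("whitespace", yytext); }
\n          { /* handled explicitly, because . does not match newline */ }
.           { emit("other", yytext); /* last rule: catches everything else */ }
%%

int main(void) {
    yylex();
    return 0;
}
```

Built with something like `flex scanner.l && cc lex.yy.c -o scanner` (the file name is arbitrary), the sample line from the question should send the "/" through the last rule, between 32 and FDS.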

Other tips

EDIT: I think Chris Dodd's answer is better. Here are two alternative solutions.

One solution would be to use states. When you read a single unrecognized character, enter into a different state, and build up the unrecognized token.

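%{
#include <string.h>
char str[1024];   /* accumulates one run of unrecognized input */
int strUsed;
%}
%x UNRECOGNIZED
%%
{SOME_RULE} { /* do processing */ }
. { BEGIN(UNRECOGNIZED); str[0] = yytext[0]; strUsed = 1; }
<UNRECOGNIZED>{bad_input} {
    /* append only if it fits, leaving room for the terminating 0 */
    if (strUsed + (int)yyleng < (int)sizeof str) { memcpy(str + strUsed, yytext, yyleng); strUsed += yyleng; }
}
<UNRECOGNIZED>{good_input} {
    str[strUsed] = 0; vector_add(str);
    yyless(0);        /* push the good input back so the INITIAL rules rescan it */
    BEGIN(INITIAL);
}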

This solution works well if it's easy to write a regular expression to match "bad" input. Another solution is to slowly build up bad characters until the next valid match:

%{
char str[1024];
int strUsed = 0;
void goodMatch() {
  if(strUsed) {
    str[strUsed] = 0;
    vector_add(str);
    strUsed = 0;
  }
}
%}
%%
{SOME_RULE} { goodMatch(); /* do processing */ }
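. { if (strUsed < (int)sizeof str - 1) str[strUsed++] = yytext[0]; /* drop characters that do not fit */ }
<<EOF>> { goodMatch(); yyterminate(); /* flush a trailing run of bad characters */ }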

Note that this requires you to modify every existing rule to add a call to goodMatch().

Note for both solutions: with a statically sized buffer, you must make sure that appending never overflows it. With a dynamically sized string, you must make sure the memory is cleaned up correctly.
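For the static-buffer case, the bounds check can be factored into a small helper. The sketch below is one possible shape, not part of the original answer: append_bounded() is a hypothetical name, and it silently truncates input that does not fit, which may or may not be what you want.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append up to len bytes of text to buf, which already holds `used` bytes
 * and has a total capacity of cap bytes. One byte is always kept free for
 * the terminating '\0'; anything that does not fit is dropped.
 * Returns the new number of bytes used (excluding the terminator).
 */
static size_t append_bounded(char *buf, size_t cap, size_t used,
                             const char *text, size_t len) {
    assert(cap > 0 && used < cap);
    size_t room = cap - 1 - used;      /* space left before the terminator */
    if (len > room) {
        len = room;                    /* truncate instead of overflowing */
    }
    memcpy(buf + used, text, len);
    used += len;
    buf[used] = '\0';
    return used;
}
```

Inside a Flex action this might be called as `strUsed = (int)append_bounded(str, sizeof str, (size_t)strUsed, yytext, (size_t)yyleng);`, which covers both the single-character rule and the multi-character one.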

License: CC-BY-SA with attribution
Not affiliated with StackOverflow