(F)Lex: get text not matched by rules / get default output

Question 1

The usual way is to have a rule something like

.    { do_something_with_extra_char(*yytext); }

at the END of your rules. Flex picks the longest match and, when two rules match the same length, the one listed first, so a trailing . rule only fires when no other rule matches at that position. It matches any single character other than newline (you need a rule that matches newline somewhere too). If you have several unmatched characters in a row, the rule triggers once per character, but generally that is fine.
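The answer only names do_something_with_extra_char without defining it. As a rough sketch of what such a handler might look like, the version below counts the stray characters and reports each one on stderr. The function body and the extra_chars counter are assumptions made for illustration, not part of the original answer.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical counter: how many unmatched characters have been seen. */
static unsigned long extra_chars = 0;

/* Hypothetical handler for the catch-all rule; called once per stray character. */
static void do_something_with_extra_char(char c)
{
    unsigned char uc = (unsigned char)c;  /* isprint() needs an unsigned char value */

    extra_chars++;
    if (isprint(uc))
        fprintf(stderr, "unmatched character '%c'\n", uc);
    else
        fprintf(stderr, "unmatched character 0x%02X\n", (unsigned)uc);
}
```

Because the rule fires once per character, a run of three unmatched characters results in three separate calls.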

Question 2

EDIT: I think Chris Dodd's answer is better. Here are two alternative solutions.

One solution would be to use start conditions (states). When you read a single unrecognized character, switch to a separate exclusive state and build up the unrecognized token there until good input shows up again.

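%{
#include <string.h>
char str[1024];
int strUsed;
%}
%x UNRECOGNIZED
%%
{SOME_RULE} { /* do processing */ }
. { BEGIN(UNRECOGNIZED); str[0] = yytext[0]; strUsed = 1; }
<UNRECOGNIZED>{bad_input} { /* append only if it fits, leaving room for the terminator */
    if (strUsed + (int)yyleng < (int)sizeof str) { strcpy(str+strUsed, yytext); strUsed += yyleng; } }
<UNRECOGNIZED>{good_input} { /* yyless(0) pushes the good text back so INITIAL rescans it */
    str[strUsed] = 0; vector_add(str); yyless(0); BEGIN(INITIAL); }
<UNRECOGNIZED><<EOF>> { /* input ended in the middle of bad text: flush it */
    str[strUsed] = 0; vector_add(str); yyterminate(); }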

This solution works well if it's easy to write a regular expression to match "bad" input. Another solution is to slowly build up bad characters until the next valid match:

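%{
char str[1024];
int strUsed = 0;
/* Flush any accumulated unmatched text; call this from every "good" rule. */
void goodMatch(void) {
  if(strUsed) {
    str[strUsed] = 0;
    vector_add(str);
    strUsed = 0;
  }
}
%}
%%
{SOME_RULE} { goodMatch(); /* do processing */ }
. { if (strUsed == (int)sizeof str - 1) goodMatch(); /* buffer full: flush early */
    str[strUsed++] = yytext[0]; }
<<EOF>> { goodMatch(); yyterminate(); /* flush trailing unmatched text */ }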

Note that this requires you to modify every existing rule to add a call to goodMatch().

Note for both solutions: if you use a statically sized buffer, you have to make sure that no append writes past its end, and that there is always room for the terminating 0. If you end up using a dynamically sized string instead, you have to be sure to free it correctly.
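One way to keep the static buffer safe is to route every append through a single helper that truncates instead of overflowing. The following is only a sketch under assumptions: append_bounded and STR_CAP are hypothetical names, the block declares its own str and strUsed so that it compiles on its own, and silently truncating may not be the behavior you want (flushing early or reporting an error are alternatives).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR_CAP 1024              /* assumed capacity, same size as str in the answer */

static char   str[STR_CAP];       /* stand-in for the answer's buffer */
static size_t strUsed = 0;        /* number of bytes currently stored, excluding the 0 */

/* Hypothetical helper: append len bytes of text to str, truncating if needed.
   One byte is always reserved for the terminating 0.
   Returns the number of bytes actually copied. */
static size_t append_bounded(const char *text, size_t len)
{
    size_t room;

    assert(strUsed < STR_CAP);            /* invariant: terminator always fits */
    room = (STR_CAP - 1) - strUsed;
    if (len > room)
        len = room;                       /* truncate instead of overflowing */
    memcpy(str + strUsed, text, len);
    strUsed += len;
    str[strUsed] = '\0';
    return len;
}
```

In a flex action this would be called as append_bounded(yytext, yyleng); a return value smaller than yyleng tells you that input was dropped.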