How does flex match the beginning of line anchor?
28-06-2021
Question
I've always wondered how the beginning-of-line anchor (^) is converted to an FSA in flex. I know that the end-of-line anchor ($) is matched by the expression r/\n, where r is the expression to match. How is the beginning-of-line anchor matched? The only solution I see is to use start conditions. How can it be implemented in a program?
Solution
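The end-of-line marker $ is different from \n in that it matches at EOF as well, even if no end-of-line marker (\n or \r\n) is found at the end of the file. (Note that the flex manual defines r$ as equivalent to r/\n, which matches neither at EOF without a final newline nor before \r\n.)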
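A minimal sketch of the end-of-line test as described in this answer (match before "\n", before "\r\n", or at EOF), assuming a NUL-terminated input buffer. The function name `eol_at` is hypothetical and this is not flex's code; flex's own r/\n definition covers only the "\n" case. The check only inspects the lookahead and consumes nothing:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): returns true if a `$` anchor
 * would match at offset i of the NUL-terminated string s, using the
 * semantics described above: before "\n", before "\r\n", or at EOF.
 * Nothing is consumed; the caller only inspects the lookahead. */
bool eol_at(const char *s, size_t i)
{
    if (s[i] == '\0')
        return true;                          /* end of input */
    if (s[i] == '\n')
        return true;                          /* Unix line ending */
    return s[i] == '\r' && s[i + 1] == '\n';  /* DOS line ending */
}
```

For example, `eol_at("ab", 2)` and `eol_at("ab\r\n", 2)` are both true, while `eol_at("ab", 1)` is false.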
I did not look at flex's implementation, but I would implement both ^ and $ using boolean flags. The ^ flag would be set initially, reset to false after the first character of a line, set back to true after the next end-of-line marker, and so on.
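The flag-based approach for ^ can be sketched in C as follows. `bol_positions` is a hypothetical helper, not a flex function: it walks a NUL-terminated buffer and records every offset at which a ^-anchored pattern would be allowed to start.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores in out[] (at most max_out entries) the
 * offsets where a ^-anchored pattern may start, using one boolean flag.
 * Returns the number of offsets stored. A line start located exactly
 * at the end of input is not reported. */
size_t bol_positions(const char *input, size_t *out, size_t max_out)
{
    bool at_bol = true;   /* set initially: offset 0 starts a line */
    size_t n = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (at_bol && n < max_out)
            out[n++] = i;              /* ^ may match here */
        /* reset after any character, set again right after '\n' */
        at_bol = (input[i] == '\n');
    }
    return n;
}
```

For the input "a\n\nb" it records offsets 0, 2 and 3, so the empty second line still counts as a line start. Because the flag is recomputed from the character just consumed, "\r\n" endings need no special case.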
OTHER TIPS
If your scanner uses the ^ anchor, then every start condition needs two initial-state entries:
- beginning of line, and
- otherwise.
Flex does this, and peeks behind the input pointer (at the last character of the previous match) to determine which entry to consult.
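As far as I can tell from flex-generated scanners, this is plain arithmetic on the start-state number: BEGIN(sc) sets yy_start to 1 + 2*sc, the DFA starts in state yy_start + YY_AT_BOL(), and after each rule the at-beginning-of-line flag is recomputed from the last character of yytext. The sketch below imitates that scheme; `start_state` and `next_at_bol` are hypothetical names, not flex API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper imitating flex's numbering: each start condition
 * sc owns two DFA start states, 1 + 2*sc (not at beginning of line)
 * and 2 + 2*sc (at beginning of line). */
int start_state(int sc, bool at_bol)
{
    return 1 + 2 * sc + (at_bol ? 1 : 0);
}

/* After a rule matches len characters of text, look behind at the last
 * matched character to decide whether the next token starts a line.
 * An empty match leaves the previous flag unchanged. */
bool next_at_bol(const char *text, size_t len, bool prev)
{
    return len > 0 ? text[len - 1] == '\n' : prev;
}
```

With two start conditions (INITIAL, numbered 0, and one more numbered 1) this scheme uses start states 1 through 4, and rules written with ^ are reachable only from the even-numbered, beginning-of-line states.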
In a flex specification, you can match at the beginning of a line with a definition such as
beginningOfLine ^.
(a caret followed by a dot), which matches the first character of any non-empty line.
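Example (numbering the lines of a text):
%option noyywrap
%{
#include <stdio.h>
int ln = 1; /* number of the next line to print */
%}
beginningOfLine ^.
newline \n
%%
{beginningOfLine} { /* first character of a non-empty line */
                    printf("%d \t", ln++);
                    ECHO; }
^{newline}        { /* an empty line still gets a number */
                    printf("%d \t\n", ln++); }
{newline}         { ECHO; }
%%
int main(void)
{
    yylex();
    return 0;
}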