Question

I've always wondered how the beginning-of-line anchor (^) is converted to an FSA in flex. I know that the end-of-line anchor ($) is matched by the expression r/\n, where r is the expression to match. How is the beginning-of-line anchor matched? The only solution I see is to use start conditions. How can it be implemented in a program?

Solution

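In many regex dialects, the end-of-line anchor $ differs from \n in that it also matches at EOF, even if the file does not end with an end-of-line marker (\n or \r\n). Note that the flex manual documents r$ as equivalent to r/\n, so in flex itself $ only matches just before a newline.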

I did not look at flex's implementation, but I would implement both ^ and $ using boolean flags. The ^ flag would be initially set, then reset to false after the first character in a line, then set back to true after the next end-of-line marker, and so on.
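The flag idea can be sketched in plain C. This is only an illustration of the approach described above, not flex's generated code; the function name count_bol_matches is made up for the example, and it handles just the trivial pattern ^c for a single character c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts the lines of `text` that start with `c`,
   i.e. the matches of the pattern ^c, using a beginning-of-line flag. */
size_t count_bol_matches(const char *text, char c)
{
    assert(text != NULL);

    bool at_bol = true;   /* the flag is set at the start of the input */
    size_t matches = 0;

    for (const char *p = text; *p != '\0'; ++p) {
        if (at_bol && *p == c)
            ++matches;            /* the anchor holds only while the flag is set */
        at_bol = (*p == '\n');    /* cleared by any character, set again by '\n' */
    }
    return matches;
}
```

For example, count_bol_matches("ab\nac\nba\n", 'a') counts the first two lines but not the third, because the 'a' on the third line is not the first character of its line.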

OTHER TIPS

If your scanner uses the ^ anchor, then every start condition needs two initial-state entries:

  • Beginning-of-line, and
  • otherwise.

Flex does this, and peeks behind the input pointer to determine which entry to consult.
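A toy version of that scheme, assuming a scanner with the three single-character rules ^a, a and .|\n, might look like the following. The names entry_state, accept_table and scan_tokens are invented for this sketch and the table is hand-written; real flex tables are generated and far more compact, but generated scanners do define BEGIN as (yy_start) = 1 + 2 * ..., which leaves room for a pair of entries per start condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOK_OTHER = 0, TOK_A = 1, TOK_BOL_A = 2 };

/* Each start condition owns two consecutive entry states:
   2*sc for "otherwise" and 2*sc + 1 for "beginning of line". */
static int entry_state(int start_condition, bool at_bol)
{
    return 2 * start_condition + (at_bol ? 1 : 0);
}

/* accept_table[entry][is_a]: token produced from each entry state
   of start condition 0 (the only one in this sketch). */
static const int accept_table[2][2] = {
    /* otherwise         */ { TOK_OTHER, TOK_A     },
    /* beginning of line */ { TOK_OTHER, TOK_BOL_A },
};

/* Scans single-character tokens from `text` into `tokens` (at most
   `capacity` of them) and returns how many were written. */
size_t scan_tokens(const char *text, int *tokens, size_t capacity)
{
    assert(text != NULL && tokens != NULL);

    size_t n = 0;
    for (size_t pos = 0; text[pos] != '\0' && n < capacity; ++pos) {
        /* Peek behind the input pointer to pick the entry state. */
        bool at_bol = (pos == 0) || (text[pos - 1] == '\n');
        int state = entry_state(0, at_bol);
        tokens[n++] = accept_table[state][text[pos] == 'a'];
    }
    return n;
}
```

The point of the design is that the anchor costs nothing inside the automaton: the choice between ^a and a is made once, when the entry state is selected, and the rest of the match proceeds as an ordinary DFA walk.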

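In a flex specification, the first character of a line can be matched with a named definition such as:

beginningOfLine ^.

(a caret followed by a dot: any single character other than a newline, but only at the beginning of a line)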

Example (numbering lines of a text):

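%{
#include <stdio.h>

int ln = 1;   /* number of the next line to be labelled */
%}

%option noyywrap

beginningOfLine ^.
newline \n

%%
{beginningOfLine} { /* first character of a line */
                    if (ln == 1) {
                        /* only line 1 is labelled here; later lines are
                           labelled by the newline rule below */
                        printf("%d \t", ln);
                        ln++;
                    }
                    /* never pass yytext as the format string */
                    fputs(yytext, stdout);
                  }

{newline}         { /* end the current line, then label the next one */
                    printf("\n");
                    printf("%d \t", ln);
                    ln++; }

%%

int main(void)
{
    yylex();   /* all other characters are echoed by the default rule */
    return 0;
}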
Licensed under: CC-BY-SA with attribution