Flex/Bison EOF propagation from stdin vs a file

https://stackoverflow.com/questions/20458469

30-08-2022

Question

I have a scanner, parser and a main from which I create an executable via

bison -d parser.y; flex scanner.l; gcc main.c parser.tab.c lex.yy.c

When I run ./a.out with no arguments it does what I want. With yyin set to stdin, hitting Return ends the parsing of that line and the main loop waits for the next input line. Pressing Ctrl+D is detected as an EOF, so the main loop breaks and the program exits.

If the input comes from a file, e.g. testFile, that file can contain one expression, which is parsed until an EOF. In the file scenario newlines should be eaten up like spaces and tabs. In short, the program should behave like an interpreter when input is from stdin and like a script evaluator when input is from a file.

An example content of such a test file would be: test\n. Here the EOF is not detected, and I have trouble understanding why. In other words, I'd like an extension of the question here to additionally work with input files.

parser.y:

%{
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* stuff from flex that bison needs to know about: */
int yylex();
int yyparse();
extern FILE *yyin;  /* defined by flex in lex.yy.c */

static int parseValue;

void yyerror(const char *s);
%}

%token TWORD
%token TEOF
%token TJUNK

%start input 

%%
input: word                         {   printf("W"); parseValue =  1;   }   
    | eof                           {   printf("eof"); parseValue = -11;}
    | /* empty */                   {   printf("_"); parseValue = -1;   }   
    | error                         {   printf("E"); parseValue = -2;   }   
    ;

eof: TEOF
    ;

word: TWORD
    ;
%%

void yyerror(const char *s) {
    printf("nope...");
}

int getWord( FILE *file) {
    int err;

    if (file) {
        yyin = file;
    } else /* error */ {
        printf("file not valid");
        return -3; 
    }   

    err = yyparse();
    if (!err) {
        return parseValue;
    } else /* error */ {
        printf("parse error");
        return -4;
    }
}

scanner.l:

%{
#include <stdio.h>
#include "parser.tab.h"
#define YYSTYPE int

int yylex();
%}

/* avoid: implicit declaration of function ‘fileno’ */
/*%option always-interactive*/

%option noyywrap
/* to avoid warning: ‘yyunput’ defined but not used */
%option nounput
/* to avoid warning: ‘input’ defined but not used */
%option noinput

%%
<<EOF>>                     {   return TEOF;    }
[ \t]                       {   }
[\n]                        {   if (yyin == stdin) return 0;   }
[a-zA-Z][a-zA-Z0-9]*        {   return TWORD; }
.                           {   return TJUNK;   }
%%

main.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stdbool.h>

/* defined in the code section of parser.y */
int getWord(FILE *file);

int main(int argc, char *argv[]) {

    int result = 0;
    FILE *fOut = stdout, *fIn = stdin;

    /* skip over program name */
    ++argv, --argc;
    if ( argc > 0 ) { 
        fIn = fopen( argv[0], "r" );
    }   

    while (true) {
        fprintf(fOut, "\nTEST : ");

        result = getWord(fIn);

        if (result == -11) {
            printf(" %i ", result); printf("--> EOF");
            break;
        }   
        if (result < 0) {
            printf(" %i ", result); printf("--> <0");
            /*continue;*/
            break;
        }   

        fprintf(fOut, " => %i", result);
    }   

    fprintf(fOut, "\n\n done \n ");
    exit(EXIT_SUCCESS);
}

I have tried to rewrite the parser according to suggestions made here or here, without much success. What is the correct way for main to become aware of an EOF when input is read from a file?

Update: One suggestion was that the issue may be due to the return 0; on the \n. As a quick test, I now only return 0 if yyin == stdin, but calling ./a.out testFile still does not catch the EOF.

Update 2: I got this to work by using yywrap. I got rid of all the TEOF stuff. The scanner has a part:

extern int eof;

and at the end:

int yywrap() {
    eof = 1;
    return 1;
}

In the parser there is a:

int eof = 0;

and further down in the file:

err = yyparse();
if (err != 0) return -4;
else if (eof) return -11;
else return parseValue;

If someone can show me a more elegant solution, I'd still appreciate that. This is probably a good way to make a clean version.

Answer

As noted in your links, flex has syntax for recognizing the end of an input file or stream (e.g., an input from a string).

In fact, flex effectively has such a rule operating at all times. By default, the rule calls yywrap. You turned this off (with %option noyywrap). That's fine, except...

The default action on encountering an "EOF token" is to return 0.

The parsers generated by bison (and byacc) need to see this zero token. See this answer to END OF FILE token with flex and bison (only works without it).

Your lexer returns a 0 token on encountering a newline. That will cause all kinds of trouble, and is no doubt leading to what you observe when reading from a file.

Edit: OK, with that out of the way and the update applied, let's consider your grammar.

Remember that bison adds a special production that looks for the zero-token. Let's represent that with $ (as people generally do, or sometimes it's $end). So your entire grammar (with no actions and with "error" removed since it's also special) is:

$all : input $;

input: word | eof | /* empty */;

word: TWORD;

eof: TEOF;

which means the only sentences your grammar accepts are:

TWORD $

or:

TEOF $

or just:

$

So when you call yyparse(), the loop inside yyparse() will read-ahead one token from the lexer and accept (and return) the result if the token is the zero-valued end-of-file $. If not, the token needs to be one of TWORD or TEOF (anything else results in a call to yyerror() and an attempt to resync). If the token is one of the two valid tokens, yyparse() will call the lexer once more to verify that the next token is the zero-valued end-of-file $ token.

If all of that succeeds, yyparse() will return success.

Adding the actions back in, you should see printf output, and get a value stored in parseValue, based on whichever reduction rule is used to recognize the (at most one) token.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow