Parsing a sequence of expressions using yacc

Question

1. Why is my grammar seemingly ambiguous?

Your grammar is ambiguous. It's not an illusion.

Suppose f is a function.

f  x + 7

Is that f(x) + 7 or f(x+7)?. Your grammar produces both.

IIRC, function application binds very tightly and associates to the left. So the above expression should parse as f(x) + 7.

2. Can I fix it using precedence and/or associativity rules, or if not,

You can disambiguate function application with precedence and associativity rules; you just need to declare a precedence for it with %prec. However, it ends up looking a bit ugly and ...

3. Do I need to rewrite the grammar, and if so, roughly, how do I do it?

... I don't believe it is correct to represent function application as Name ExprList. It's a lot cleaner if you curry the arguments one at a time, at least while building the AST, and it looks prettier if you do it in the grammar rather than with precedence rules, which really weren't designed for invisible operators. See below.

4. Is yacc a suitable tool for a construct like this?

Sure, why not?

Here are two working (as far as I know) yacc grammars. The first uses precedence declarations for everything; the second separates out function application, which I think is cleaner:

// grammar1.y:

%left '+'
%left '*'
%left ATOM ';' '(' ')'

%%

program: /* empty */           { $$ = ""; }
       | program statement ';' { std::cout << $2 << std::endl; }
       | program ';'
       ;

statement: expr
         ;

expr: ATOM
    | '(' expr ')'             { $$ = $2; }
    | expr expr %prec ATOM     { $$ = '(' + $1 + ' ' + $2 + ')'; }
    | expr '+' expr            { $$ = "(+ " + $1 + ' ' + $3 + ')'; }
    | expr '*' expr            { $$ = "(* " + $1 + ' ' + $3 + ')'; }
    ;

// grammar2.y

%token ATOM

%left '+'
%left '*'

%%

program: /* empty */           { $$ = ""; }
       | program statement ';' { std::cout << $2 << std::endl; }
       | program ';'
       ;

statement: expr
         ;

term : ATOM
     | '(' expr ')'             { $$ = $2; }
     ;

apply: term
     | apply term              { $$ = '(' + $1 + ' ' + $2 + ')'; }
     ;

expr : apply
     | expr '+' expr            { $$ = "(+ " + $1 + ' ' + $3 + ')'; }
     | expr '*' expr            { $$ = "(* " + $1 + ' ' + $3 + ')'; }
     ;