Grammar: difference between a top down and bottom up?

https://stackoverflow.com/questions/3181960

02-10-2019
|

Question

What is the difference between a top down and bottom up grammar? An example would be awesome.

Solution

First of all, the grammar itself isn't top-down or bottom-up, the parser is (though there are grammars that can be parsed by one but not the other).

From a practical viewpoint, the main difference is that most hand-written parsers are top-down, while a much larger percentage of machine-generated parsers are bottom-up (though, of course, the reverse is certainly possible).

A top-down parser typically uses recursive descent, which typically means a structure something like this (using typical mathematical expressions as an example):

expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }

A bottom-up parser work in the reverse direction -- where a recursive descent parser starts from the full expression, and breaks it down into smaller and smaller pieces until it reaches the level of individual tokens, a bottom-up parser starts from the individual tokens, and uses tables of rules about how those tokens fit together into higher and higher levels of the expression hierarchy until it reaches the top level (what's represented as "expression" above).

Edit: To clarify, perhaps it would make sense to add a really trivial parser. In this case, I'll just do the old classic of converting a simplified version of a typical mathematical expression from infix to postfix:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void expression(void);

void show(int ch) { 
    putchar(ch);
    putchar(' ');
}

int token() { 
    int ch;
    while (isspace(ch=getchar()))
        ;
    return ch;
}

void factor() { 
    int ch = token();
    if (ch == '(') {
        expression();
        ch = token();
        if (ch != ')') {
            fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
            exit(EXIT_FAILURE);
        }
    }
    else
        show(ch);
}

void term() { 
    int ch;
    factor();
    ch = token();
    if (ch == '*' || ch == '/') {
        term();
        show(ch);
    }
    else
        ungetc(ch, stdin);
}

void expression() {
    int ch;
    term();
    ch = token();
    if (ch == '-' || ch=='+') {
        expression();
        show(ch);
    }
    else 
        ungetc(ch, stdin);
}

int main(int argc, char **argv) {
    expression();
    return 0;
}

Note that the lexing here is pretty stupid (it basically just accepts a single character as a token) and the expressions allowed are quite limited (only +-*/). OTOH, it's good enough to handle an input like:

1+2*(3+4*(5/6))

from which it does produce what I believe is correct output:

1 2 3 4 5 6 / * + * +

OTHER TIPS

Afaik it doesn't make any difference for the grammar itself, but it does for the parser.

Wikipedia has a quite lengthy explanation of both bottom-up and top-down parsing.

Generally the (imho) more intuitive way is top-down. You start with the start symbol and apply the transformation rules that fit, while with bottom-up you need to apply transformation rules backwards (which usually created quite a headache for me).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow