Question

Has anyone got a simple example of how to define a grammar that parses python-like indentation for blocks using Jison?

Solution

I created a language using Jison which uses python-style indentation. It's an automated white-box algorithm testing language called Bianca.

Bianca has only two dependencies: Jison and Lexer. Jison supports custom scanners, and Lexer is one such scanner.

In C-style programming languages, blocks of code are delimited by curly braces. With Python-style indentation, the scanner instead emits INDENT and DEDENT tokens, which play the role of the opening and closing braces.
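To make the analogy concrete, a grammar can use INDENT and DEDENT in exactly the places where a C-style grammar would use braces. The following is only a minimal sketch in Jison's JSON grammar format, not Bianca's actual grammar: the token names LINE and EOF, the nonterminal names, and the shape of the tree built by the actions are all assumptions made for illustration.

```javascript
// Hypothetical grammar sketch (not taken from Bianca).
// Assumes the scanner emits LINE, INDENT, DEDENT and EOF tokens.
var grammar = {
    bnf: {
        // A program is a list of statements followed by end of input.
        program: [["statements EOF", "return $1;"]],

        statements: [
            ["statement", "$$ = [$1];"],
            ["statements statement", "$$ = $1.concat([$2]);"]
        ],

        // A statement is a line, optionally followed by an indented block.
        statement: [
            ["LINE", "$$ = { line: $1, body: [] };"],
            ["LINE block", "$$ = { line: $1, body: $2 };"]
        ],

        // INDENT and DEDENT bracket a block the way { and } do in C.
        block: [["INDENT statements DEDENT", "$$ = $2;"]]
    }
};

console.log(Object.keys(grammar.bnf).join(", "));
```

With Jison installed, an object like this can be handed to `new (require("jison").Parser)(grammar)`, and a custom scanner attached through the parser's `lexer` property. That step is left out here so the sketch runs without any dependencies, and it has not been tested against Bianca's lexer.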

Writing a rule to generate INDENT and DEDENT tokens in Lexer is brain-dead simple. In fact the Lexer documentation shows precisely how to do it.

This snippet of code is taken directly from the source code of Bianca (lexer.js):

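// Stack of open indentation levels; indent[0] is the top of the stack.
var indent = [0];

// Matches the run of leading spaces at the start of every line.
lexer.addRule(/^ */gm, function (lexeme) {
    var indentation = lexeme.length;

    // `col` is a column counter declared elsewhere in lexer.js.
    col += indentation;

    // Deeper than the current block: open a new block.
    if (indentation > indent[0]) {
        indent.unshift(indentation);
        return "INDENT";
    }

    var tokens = [];

    // Shallower: close one block for every level that is popped.
    while (indentation < indent[0]) {
        tokens.push("DEDENT");
        indent.shift();
    }

    // Returning an array emits several tokens for a single match.
    if (tokens.length) return tokens;
});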

A brief explanation of how this code works can be found in the Python documentation:

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
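The quoted description can also be sketched without any library, including the end-of-file step, which the snippet above does not show. The helper below is hypothetical and not part of Bianca or Lexer. It assumes indentation uses spaces only, that blank lines are ignored, and it wraps each line's text in a made-up LINE(...) token so the result is easy to read.

```javascript
// Hypothetical helper: turns source text into a flat list of
// INDENT / DEDENT / LINE(...) tokens using the indentation stack.
function indentTokens(source) {
    var stack = [0];   // the bottom 0 is never popped
    var tokens = [];

    source.split("\n").forEach(function (line) {
        if (line.trim() === "") return;   // assumption: skip blank lines

        var indentation = line.match(/^ */)[0].length;

        if (indentation > stack[stack.length - 1]) {
            // Larger than the top of the stack: push it, emit one INDENT.
            stack.push(indentation);
            tokens.push("INDENT");
        } else {
            // Smaller: pop every larger level, one DEDENT per pop.
            while (indentation < stack[stack.length - 1]) {
                stack.pop();
                tokens.push("DEDENT");
            }
            // The level reached must be one that is already on the stack.
            if (indentation !== stack[stack.length - 1]) {
                throw new Error("inconsistent dedent: " + JSON.stringify(line));
            }
        }

        tokens.push("LINE(" + line.trim() + ")");
    });

    // End of input: one DEDENT for each level still above zero.
    while (stack.length > 1) {
        stack.pop();
        tokens.push("DEDENT");
    }

    return tokens;
}

var sample = [
    "if x:",
    "    if y:",
    "        a",
    "    b",
    "c"
].join("\n");

console.log(indentTokens(sample).join(" "));
// prints: LINE(if x:) INDENT LINE(if y:) INDENT LINE(a) DEDENT LINE(b) DEDENT LINE(c)
```

Unlike the Lexer rule, this sketch keeps the top of the stack at the end of the array, so it uses push and pop rather than unshift and shift. The behaviour is the same either way.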

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow