Question

Is there an option or some other way to record a token's offset within its source, in addition to beginLine, beginColumn, endLine and endColumn?

I probably have to use the TOKEN_EXTENDS and COMMON_TOKEN_ACTION (or TOKEN_FACTORY) options to store extra token information, but I don't know how to retrieve the token's offset. Any ideas?

I'm looking for a pure parser solution, that means, no interaction with the actual document (which I could use to calculate the offset afterwards).


Solution 2

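So, eventually I found out that SimpleCharStream also keeps track of the position in its buffer where the current token starts, in a field called tokenBegin. You can read it whenever a new token is created, because the TokenManager's getNextToken() calls CommonTokenAction for each token when COMMON_TOKEN_ACTION is enabled. Be aware that tokenBegin is an index into the stream's internal buffer, so it is likely to match the absolute source offset only as long as the buffer has not wrapped around or been refilled.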

So a simple setup might look like this:

options
{
    TOKEN_EXTENDS="MyToken";
    COMMON_TOKEN_ACTION=true;
}

The token base class:

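class MyToken
{
    // Buffer position at which the token starts (set in CommonTokenAction).
    public int offset;
    // Back-reference to the generated Token that extends this class.
    public Token token;
}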

Then the CommonTokenAction definition:

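TOKEN_MGR_DECLS:
{
    // Called by getNextToken() for every token it returns.
    // If the parser is generated with STATIC=true (the default),
    // declare this method static.
    void CommonTokenAction(Token t)
    {
        // Token extends MyToken, so the inherited fields can be set directly.
        t.offset = input_stream.tokenBegin;
        t.token = t;
    }
}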

OTHER TIPS

A cheat that I've used in the past is to co-opt the default line/column information for an offset. If you don't need the line/column information, you can do something like this:

options {
    COMMON_TOKEN_ACTION = true;
}
...
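TOKEN_MGR_DECLS : {
    static long offset = 0;
    static void CommonTokenAction(Token t) {
        // Poor-man's re-initialization: the first token of a fresh input
        // starts at line 1, column 1 (JavaCC columns are 1-based).
        if ((t.beginLine == 1) && (t.beginColumn == 1)) { offset = 0; }
        // Store the start offset: high 32 bits in beginLine, low 32 bits in endLine.
        t.beginLine = (int)(offset >> 32);
        t.endLine = (int)(offset);
        // Advance past this token for the next one.
        offset += t.image.length();
    }
}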

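Neither the token manager nor the parser relies on line/column information to do its work, so this is safe as long as you don't need those values yourself. Keep in mind that the positions reported in ParseException messages are taken from these fields and will no longer be meaningful. The offset of a token t can be recovered by recombining t.beginLine (the high 32 bits) and t.endLine (the low 32 bits).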
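To make the recovery step concrete, here is a minimal sketch of the packing and unpacking arithmetic on its own. The class name OffsetPacking and its methods are made up for illustration and are not part of JavaCC; in the grammar you would pass t.beginLine and t.endLine to unpack.

```java
// Hypothetical helper, not part of JavaCC: splits a 64-bit offset into two
// int fields and puts it back together again.
final class OffsetPacking {
    private OffsetPacking() {}

    /** High 32 bits of the offset, the value stored in beginLine. */
    static int high(long offset) {
        return (int) (offset >> 32);
    }

    /** Low 32 bits of the offset, the value stored in endLine. */
    static int low(long offset) {
        return (int) offset;
    }

    /** Rebuilds the 64-bit offset from the two int fields. */
    static long unpack(int beginLine, int endLine) {
        // Mask the low word so that a set sign bit is not
        // sign-extended into the high word.
        return ((long) beginLine << 32) | (endLine & 0xFFFFFFFFL);
    }
}
```

The mask matters once the offset exceeds Integer.MAX_VALUE: without it, a negative endLine would overwrite the high word when widened to long.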

If you do need to preserve the line/column information, you can specify a base class for the token type, along with a token factory:

options {
    TOKEN_EXTENDS = "my.AbstractToken";
    TOKEN_FACTORY = "my.TokenFactory";
}
...

Define the base token class:

package my;
public abstract class AbstractToken {
    private long offset;
    protected AbstractToken() {
        // The offset hasn't been initialized.
        offset = -1;
    }
    public long getOffset() { return this.offset; }
    void setOffset(long offset) { this.offset = offset; }
}

And define the token factory:

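package my;
// Assumes the generated Token class is also in package my,
// because setOffset is package-private.
public class TokenFactory {
    private static long offset = 0;
    public static Token newToken(int kind, String image) {
        Token token = new Token(kind, image);
        token.setOffset(offset);
        offset += image.length();
        return token;
    }
    // Hypothetical helper: call before starting a new parse.
    public static void reset() { offset = 0; }
}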

You'll have to reset the offset manually for the next parse. I've glossed over some of the other details, but it's worth noting that any SKIP definitions should be converted to SPECIAL_TOKEN definitions, in order to advance the offset for otherwise ignored whitespace.
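To see why skipped input throws the counter off, here is a small stand-alone simulation. It does not use JavaCC at all: Piece and OffsetDemo are hypothetical stand-ins, and the countWhitespace flag only mimics the difference between whitespace that never reaches the counter (as with SKIP) and whitespace that is still seen by it (as with SPECIAL_TOKEN).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not JavaCC code.
final class OffsetDemo {
    /** Stand-in for one lexed piece of input. */
    static final class Piece {
        final String image;
        final boolean whitespace;
        Piece(String image, boolean whitespace) {
            this.image = image;
            this.whitespace = whitespace;
        }
    }

    /** The input "a  bc d" split into pieces. */
    static List<Piece> sample() {
        List<Piece> pieces = new ArrayList<>();
        pieces.add(new Piece("a", false));
        pieces.add(new Piece("  ", true));
        pieces.add(new Piece("bc", false));
        pieces.add(new Piece(" ", true));
        pieces.add(new Piece("d", false));
        return pieces;
    }

    /** Start offsets assigned to the non-whitespace pieces by a running counter. */
    static List<Long> offsets(List<Piece> pieces, boolean countWhitespace) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (Piece p : pieces) {
            if (p.whitespace) {
                // Whitespace advances the counter only if the counter gets to see it.
                if (countWhitespace) {
                    offset += p.image.length();
                }
                continue;
            }
            result.add(offset);
            offset += p.image.length();
        }
        return result;
    }
}
```

With countWhitespace set to true the offsets for a, bc and d come out as 0, 3 and 6, which match their positions in "a  bc d". With it set to false they come out as 0, 1 and 3, which is the drift you would get from summing image lengths over non-skipped tokens only.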

Licensed under: CC-BY-SA with attribution