How to keep track of token's offset position besides just line and column?

Answer 1

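So, eventually I found out that SimpleCharStream also keeps track of the buffer position where the current token begins, in a public field called tokenBegin. You can read it whenever the token manager's getNextToken() creates a new token, because getNextToken() calls CommonTokenAction() for each token it returns. Be aware that tokenBegin is an index into SimpleCharStream's internal buffer, which gets compacted or wrapped around once the input outgrows it (4096 characters by default), so it only equals the absolute offset in the input until that happens.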

So a simple setup might look like this:

options
{
    STATIC = false;                // CommonTokenAction below is an instance method
    TOKEN_EXTENDS = "MyToken";
    COMMON_TOKEN_ACTION = true;
}

The token base class:

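public class MyToken
{
    // Start position of the token, filled in by CommonTokenAction.
    public int offset;
}

Then the CommonTokenAction definition:

TOKEN_MGR_DECLS:
{
    // Called by getNextToken() for each token it returns.
    void CommonTokenAction(Token t)
    {
        // The generated Token extends MyToken (TOKEN_EXTENDS), so the offset
        // field is available directly; no cast or self-reference is needed.
        t.offset = input_stream.tokenBegin;
    }
}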

Answer 2

A cheat that I've used in the past is to co-opt the default line/column fields to hold an offset instead. If you don't need the line/column information, you can do something like this:

options {
    COMMON_TOKEN_ACTION = true;
}
...
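TOKEN_MGR_DECLS : {
    static long offset = 0;
    static void CommonTokenAction(Token t) {
        // Poor man's re-initialization: a token at line 1, column 1 starts a
        // new input (JavaCC columns are 1-based).
        if ((t.beginLine == 1) && (t.beginColumn == 1)) { offset = 0; }
        // Store the token's start offset: high 32 bits in beginLine,
        // low 32 bits in endLine.
        t.beginLine = (int)(offset >>> 32);
        t.endLine = (int)(offset);
        // Advance past this token (skipped input is not counted).
        offset += t.image.length();
    }
}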

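Neither the token manager nor the parser relies on line/column information, so this is safe to do, apart from the generated ParseException messages, which will report the encoded values instead of real positions. The offset of a token t can be recovered by reversing the encoding.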
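The encoding splits a 64-bit offset across two int fields, so decoding has to stop the low word from sign-extending. Here is a minimal standalone sketch, assuming the high word lives in beginLine and the low word in endLine as in the snippet above; OffsetCodec and its methods are hypothetical names, not part of JavaCC:

```java
// Hypothetical helper, not part of JavaCC: packs a long offset into the two
// int token fields reused by the CommonTokenAction trick, and unpacks it.
public final class OffsetCodec {
    private OffsetCodec() {}

    // High 32 bits, the value stored in t.beginLine.
    public static int high(long offset) {
        return (int) (offset >>> 32);
    }

    // Low 32 bits, the value stored in t.endLine (may be negative as an int).
    public static int low(long offset) {
        return (int) offset;
    }

    // Rebuilds the offset; the mask keeps the low word from sign-extending.
    public static long decode(int beginLine, int endLine) {
        return ((long) beginLine << 32) | (endLine & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long offset = 3_000_000_000L;
        int beginLine = high(offset);
        int endLine = low(offset);
        System.out.println(endLine);                    // prints -1294967296
        System.out.println(decode(beginLine, endLine)); // prints 3000000000
    }
}
```

On a real token this would be decode(t.beginLine, t.endLine). Without the 0xFFFFFFFFL mask, any offset whose low 32 bits have the top bit set (3,000,000,000 here) would come back negative.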

If you do need to preserve the line/column information, you can specify a base class for the token type, along with a token factory:

options {
    TOKEN_EXTENDS = "my.AbstractToken";
    TOKEN_FACTORY = "my.TokenFactory";
}
...

Define the base token class:

package my;
public abstract class AbstractToken {
    private long offset;
    protected AbstractToken() {
        // The offset hasn't been initialized.
        offset = -1;
    }
    public long getOffset() { return this.offset; }
    void setOffset(long offset) { this.offset = offset; }
}

And define the token factory:

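package my;

// Token is the class JavaCC generates for the grammar (it extends
// my.AbstractToken); import it if the parser lives in another package.
public class TokenFactory {
    // Running offset across all tokens created so far.
    private static long offset = 0;

    // Call before each new parse.
    public static void reset() { offset = 0; }

    public static Token newToken(int kind, String image) {
        Token token = new Token(kind, image);
        // setOffset is package-private, so call it through AbstractToken.
        ((AbstractToken) token).setOffset(offset);
        offset += image.length();
        return token;
    }
}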

You'll have to reset the offset manually before the next parse. I've glossed over some of the other details, but it's worth noting that any SKIP definitions should be converted to SPECIAL_TOKEN definitions: skipped input never passes through the token factory, so otherwise ignored whitespace would not advance the offset.
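The effect of the SKIP-to-SPECIAL_TOKEN conversion can be checked without JavaCC. The standalone sketch below mimics the factory's running offset on a deliberately simplified lexer that splits the input into words and whitespace runs; OffsetSimulation, wordOffsets and reset are hypothetical names, and the skipWhitespace flag stands in for choosing between a SKIP and a SPECIAL_TOKEN rule:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch, no JavaCC involved: mimics the running offset that the
// TokenFactory keeps, using a toy lexer of words and whitespace runs.
public final class OffsetSimulation {
    // Running offset, like TokenFactory.offset.
    private static long offset = 0;

    // Manual reset, to be called before each new input.
    public static void reset() {
        offset = 0;
    }

    // Stand-in for TokenFactory.newToken: records the start offset, then advances.
    static long newToken(String image) {
        long start = offset;
        offset += image.length();
        return start;
    }

    // Returns the start offset of every word. Whitespace runs either go through
    // newToken (like a SPECIAL_TOKEN) or are dropped (like a SKIP rule).
    public static List<Long> wordOffsets(String input, boolean skipWhitespace) {
        List<Long> offsets = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            boolean ws = Character.isWhitespace(input.charAt(i));
            int j = i;
            // Extend the run while the characters stay the same kind.
            while (j < input.length() && Character.isWhitespace(input.charAt(j)) == ws) {
                j++;
            }
            String image = input.substring(i, j);
            if (!ws) {
                offsets.add(newToken(image));
            } else if (!skipWhitespace) {
                newToken(image); // special token: only advances the offset
            }
            i = j;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "int  x = 42;";
        reset();
        System.out.println(wordOffsets(input, false)); // prints [0, 5, 7, 9]
        reset();
        System.out.println(wordOffsets(input, true));  // prints [0, 3, 4, 5]
    }
}
```

When whitespace passes through the factory, every recorded offset matches the word's position in the string. Once whitespace is skipped, the offsets drift by the length of everything that was skipped. The reset() call plays the role of the manual reset before each new input.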