Question

I'm trying to write a parser for a syntax highlighter (JSON/HTML/Markdown) as an exercise. I'm not really tokenizing anything; I'm just reading the characters of a given string in a while loop.

Right now, I have a string str that contains the markup, and I can do one of two things:

  1. keep track of the current character's position, increment it as needed, and append the relevant parts to another string final, which will contain the syntax-highlighted output
  2. splice or substring str on each iteration and append the relevant parts to final

Personally, I'd prefer the second option because it allows me to modify str during iteration if necessary, and I only ever need to look at the first character since the string is re-substringed on each iteration. But I know there are downsides, like not being able to look backwards.

What are some measurable downsides that I may be overlooking for each method? And is one generally preferred when writing a parser? (I've seen people do both.)
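For concreteness, a minimal JavaScript sketch of the two approaches (the highlightChar helper is hypothetical, standing in for the real highlighting logic):

```javascript
// Hypothetical helper; the real highlighter logic would go here.
function highlightChar(ch) {
  return ch === '{' || ch === '}' ? `<b>${ch}</b>` : ch;
}

// Option 1: track an index into the original string.
function highlightByIndex(str) {
  let final = '';
  let pos = 0;
  while (pos < str.length) {
    final += highlightChar(str[pos]);
    pos++; // look-behind is str[pos - 1], look-ahead is str[pos + 1]
  }
  return final;
}

// Option 2: re-substring the input on every iteration.
function highlightBySubstring(str) {
  let final = '';
  while (str.length > 0) {
    final += highlightChar(str[0]);
    str = str.substring(1); // O(n) copy per step -> O(n^2) overall
  }
  return final;
}
```

Both produce the same output; the per-iteration copy in option 2 is one of the measurable costs the question is asking about.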

Solution

I've written several JSON parsers (unfortunately, none of them is open source, so I can't point at one). In my opinion, one of the best techniques when writing a JSON parser is to first write a "SAX"-like parser that takes an input string or stream, parses the input, and emits events (for example, to a listener you set):

  • BeginDictionary() (dictionary ~ JSON object, if you prefer that term)
  • BeginDictionaryItem(key, quoted)
  • EndDictionary()
  • BeginList()
  • EndList()
  • Literal(string_value, quoted)

Literal: a quoted or non-quoted string. This SAX parser doesn't have to recognize JSON data types like string, float, or bool; it just recognizes everything as either a quoted or non-quoted string and passes it to your event listener. Sometimes, when you parse a config file, it can come in handy to allow dictionary keys without quotes; the config looks much nicer that way. Allowing comments and a comma after the last dictionary/list item can also be practical, but that is already nonstandard JSON...
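As a sketch of the listener side (the method names and signatures below are assumptions based on the event list above), a listener that simply records every event it receives is handy for testing a parser:

```javascript
// Records every event it receives into an array; one method per event.
function makeRecordingListener() {
  const events = [];
  return {
    events,
    beginDictionary()                { events.push(['beginDictionary']); },
    beginDictionaryItem(key, quoted) { events.push(['beginDictionaryItem', key, quoted]); },
    endDictionary()                  { events.push(['endDictionary']); },
    beginList()                      { events.push(['beginList']); },
    endList()                        { events.push(['endList']); },
    literal(value, quoted)           { events.push(['literal', value, quoted]); },
  };
}
```

Any object with these six methods can be plugged into the parser; the recording variant just makes the event stream inspectable.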

If you want a parser that accepts only standard JSON data types, not all kinds of quoted/non-quoted strings/words, then you can implement a listener that converts "literals" to JSON types and passes them to another kind of listener that accepts only JSON-specific types. One possible optimization here is to have the parser pass only indexes into the input string/stream through its events, if you want to avoid duplication/copying. Optionally, you can write a parser listener implementation that unquotes a string in place, storing it destructively right into the input buffer, if the given platform/language allows that and the parser doesn't look back.
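A sketch of such a converting listener, assuming a downstream listener with a hypothetical typedValue event for the converted values:

```javascript
// Wraps a downstream listener; converts raw (value, quoted) literals
// into JSON values. Quoted literals stay strings; unquoted ones must
// be true/false/null or a number, anything else is a syntax error.
function makeTypeConvertingListener(downstream) {
  return {
    beginDictionary:     (...a) => downstream.beginDictionary(...a),
    beginDictionaryItem: (...a) => downstream.beginDictionaryItem(...a),
    endDictionary:       (...a) => downstream.endDictionary(...a),
    beginList:           (...a) => downstream.beginList(...a),
    endList:             (...a) => downstream.endList(...a),
    literal(value, quoted) {
      if (quoted)            return downstream.typedValue(value);
      if (value === 'true')  return downstream.typedValue(true);
      if (value === 'false') return downstream.typedValue(false);
      if (value === 'null')  return downstream.typedValue(null);
      const n = Number(value);
      if (!Number.isNaN(n))  return downstream.typedValue(n);
      throw new SyntaxError(`not a valid JSON literal: ${value}`);
    },
  };
}
```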

You can also write a parser listener that just stores syntax highlight info, or one that builds an in-memory DOM-like tree from your JSON... That way, you use just one simple-stupid parser that can be plugged together with many different parser-listener implementations for different kinds of tasks. You can also chain listeners together, if you write them to allow this, like: parser -> JSON type converter listener -> JSON schema validator listener -> JSON syntax highlight listener -> JSON DOM builder listener...

And there is no need to separate a tokenizer and a parser in the case of a simple language like JSON; write it as a single recursive descent parser that does everything.
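A minimal sketch of such a single-piece recursive descent parser, assuming the event names above. This is deliberately simplified, not a conforming JSON parser: escape handling keeps only the character after a backslash, numbers/true/false/null are all reported as unquoted literals, and trailing commas are tolerated.

```javascript
// Recursive descent JSON parser that emits events to a listener.
function parseJson(input, listener) {
  let pos = 0;
  const error  = (msg) => { throw new SyntaxError(`${msg} at ${pos}`); };
  const skipWs = () => { while (' \t\r\n'.includes(input[pos])) pos++; };
  const expect = (ch) => { if (input[pos] !== ch) error(`expected '${ch}'`); pos++; };

  function parseString() {            // returns the unquoted content
    expect('"');
    let s = '';
    while (pos < input.length && input[pos] !== '"') {
      if (input[pos] === '\\') pos++; // simplified escapes: keep next char
      s += input[pos++];
    }
    expect('"');
    return s;
  }

  function parseWord() {              // bare literal: number/true/false/null
    const start = pos;
    while (pos < input.length && !' \t\r\n,:{}[]"'.includes(input[pos])) pos++;
    if (pos === start) error('unexpected character');
    return input.slice(start, pos);
  }

  function parseValue() {
    skipWs();
    const ch = input[pos];
    if (ch === '{') {
      pos++;
      listener.beginDictionary();
      skipWs();
      while (input[pos] !== '}') {
        skipWs();
        listener.beginDictionaryItem(parseString(), true);
        skipWs();
        expect(':');
        parseValue();
        skipWs();
        if (input[pos] === ',') { pos++; skipWs(); } // tolerates trailing comma
      }
      pos++;
      listener.endDictionary();
    } else if (ch === '[') {
      pos++;
      listener.beginList();
      skipWs();
      while (input[pos] !== ']') {
        parseValue();
        skipWs();
        if (input[pos] === ',') { pos++; skipWs(); } // tolerates trailing comma
      }
      pos++;
      listener.endList();
    } else if (ch === '"') {
      listener.literal(parseString(), true);
    } else {
      listener.literal(parseWord(), false);
    }
  }

  parseValue();
  skipWs();
  if (pos !== input.length) error('trailing characters');
}
```

Note that the parser itself never builds a tree or a highlighted string; everything downstream of the events is the listener's job.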

If you optimize for memory usage instead, the variable memory requirement of the parser can be as low as one bit per nested JSON object/list (but that version is no longer a recursive descent parser).
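A sketch of the one-bit-per-nesting idea: the only thing an iterative parser must remember per open container is whether it is a dictionary or a list, so it knows which closing bracket to expect. Those bits can be packed into a single integer; using a BigInt as the bit stack here is an assumption for illustration.

```javascript
// Stack of one-bit entries packed into a BigInt:
// 0 = open dictionary, 1 = open list.
function makeNestingStack() {
  let bits = 1n;                 // sentinel high bit marks the bottom
  return {
    push(isList) { bits = (bits << 1n) | (isList ? 1n : 0n); },
    pop()        { const top = bits & 1n; bits >>= 1n; return top === 1n; },
    get depth()  { return bits.toString(2).length - 1; },
  };
}
```

An iterative parser would push on every `{` or `[` and pop on every `}` or `]`, checking that the popped bit matches the closer it just read.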

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow