Question

Hi, I am currently implementing a lexer that breaks XML files up into tokens, and I'm considering ways of passing those tokens on to a parser to build a more useful data structure from them. My current plan is to store them in an ArrayList and pass that to the parser. Would a linked list, where each token points to the next, be better suited? Or is being able to access tokens by index easier to build a parser around? Or is this whole strategy a bad idea?

Also, if anyone has used ANTLR: I know it uses a token stream to pass tokenized input to the parser. How can the parser decide whether the input is valid, or build a data structure from it, when it does not yet have all the tokens from the input?

Any feedback or opinions are welcome, thanks!

Solution

The most common architecture for this kind of parser is to run the lexer from inside the parser: every time the parser needs a token, it makes a call to a function on the lexer that retrieves the next one.
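To make that concrete, here is a minimal sketch in Java of the interface such a pull-based lexer might expose. The `TokenType`, `Token`, and `Lexer` names are hypothetical, invented for illustration rather than taken from any particular library:

```java
// Hypothetical token categories for an XML-ish input.
enum TokenType { OPEN_TAG, CLOSE_TAG, TEXT, EOF }

// A single token: its category plus the matched text (e.g. a tag name).
record Token(TokenType type, String text) {}

// The only thing the parser needs from the lexer: a way to pull the
// next token on demand. The lexer scans just far enough to produce it,
// and returns an EOF token once the input is exhausted.
interface Lexer {
    Token nextToken();
}
```

With this arrangement, the parser calls `nextToken()` whenever it needs another token, so the complete token list never has to exist in memory; that largely sidesteps the ArrayList-versus-linked-list question.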

I don't know ANTLR, but I believe most parser generators work the same way. What I'm describing is how yacc and lex work together.
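As for how a parser can judge validity without having all the tokens yet: grammars for tools like yacc are deliberately written so that a single token of lookahead is enough to decide which rule applies next (ANTLR, as I understand it, can look further ahead when needed), and the result structure is built incrementally as each rule completes. The following sketch continues the one above; the rule `element := OPEN_TAG element* CLOSE_TAG` is a made-up toy grammar, not real XML:

```java
import java.util.ArrayList;
import java.util.List;

// A node in the tree the parser builds as it goes.
class ElementNode {
    final String name;
    final List<ElementNode> children = new ArrayList<>();
    ElementNode(String name) { this.name = name; }
}

class Parser {
    private final Lexer lexer;
    private Token lookahead;                 // the single buffered token

    Parser(Lexer lexer) {
        this.lexer = lexer;
        this.lookahead = lexer.nextToken();  // prime the buffer
    }

    // element := OPEN_TAG element* CLOSE_TAG
    ElementNode parseElement() {
        Token open = expect(TokenType.OPEN_TAG);
        ElementNode node = new ElementNode(open.text());
        // The lookahead token alone tells us whether a nested element
        // follows; no later tokens are needed to make the decision.
        while (lookahead.type() == TokenType.OPEN_TAG) {
            node.children.add(parseElement());
        }
        expect(TokenType.CLOSE_TAG);         // malformed input is caught here, mid-stream
        return node;
    }

    // Consume the lookahead if it matches, or reject the input on the spot.
    private Token expect(TokenType type) {
        if (lookahead.type() != type) {
            throw new IllegalStateException(
                "syntax error: expected " + type + " but found " + lookahead.type());
        }
        Token matched = lookahead;
        lookahead = lexer.nextToken();       // pull the next token from the lexer
        return matched;
    }
}
```

The key point is that a syntax error surfaces as soon as the offending token is pulled, and each tree node is finished as soon as its rule's last token has been consumed, so neither validation nor tree-building requires the full token sequence up front.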

Licensed under: CC-BY-SA with attribution