A pretty traditional lexer can work something like this:
- Get a character from somewhere, be it a file or a buffer
- Check what the current character is:
  - Is it whitespace? Skip all whitespace
  - Is it a comment introduction character? Get and skip the comment
  - Is it a digit? Then try to get a number
  - Is it a `"`? Then try to get a string
  - Is it a letter? Then try to get an identifier
    - Is the identifier a keyword/reserved word?
  - Otherwise, is it a valid operator sequence?
- Return the token type
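
The steps above could be sketched roughly like this in Python. This is only a minimal illustration, not a complete lexer: the keyword set, the operator list, the `#` comment character and the token names are all assumptions made up for the example, and strings have no escape handling.

```python
# Hypothetical language details, chosen only for this sketch.
KEYWORDS = {"if", "else", "while", "return"}
# Longer operators come first so ">=" is tried before ">".
OPERATORS = ["==", "!=", "<=", ">=", "+", "-", "*", "/", "=", "<", ">", "(", ")"]

def tokenize(text):
    """Return a list of (token_type, text) pairs for the input string."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Skip a whole run of whitespace.
            while i < n and text[i].isspace():
                i += 1
        elif ch == "#":
            # Assumed comment syntax: '#' up to the end of the line.
            while i < n and text[i] != "\n":
                i += 1
        elif ch.isdigit():
            start = i
            while i < n and text[i].isdigit():
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif ch == '"':
            # Simplification: no escape sequences inside strings.
            end = text.find('"', i + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string at offset {i}")
            tokens.append(("STRING", text[i + 1:end]))
            i = end + 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            word = text[start:i]
            # An identifier that is a reserved word becomes a keyword token.
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENT", word))
        else:
            # Otherwise it has to be a valid operator sequence.
            for op in OPERATORS:
                if text.startswith(op, i):
                    tokens.append(("OP", op))
                    i += len(op)
                    break
            else:
                raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

A real lexer would usually hand out one token per call and track line/column numbers for error messages; collecting everything into a list just keeps the sketch short.
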
Instead of checking one character at a time, you can of course use regular expressions.
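
For example, a regex-based variant could look something like the following, using Python's `re` module with one named group per token type. Again just a sketch under made-up assumptions: the token names, the keyword set, the `#` comment syntax and the operator set are hypothetical, not part of any particular language.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set

# One (name, pattern) pair per token type; earlier entries win on ties.
TOKEN_SPEC = [
    ("SKIP", r"\s+|#[^\n]*"),             # whitespace and '#' comments
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"]*"'),               # no escape sequences
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|!=|<=|>=|[-+*/=<>()]"),   # two-character operators first
    ("MISMATCH", r"."),                   # anything else is an error
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def regex_tokenize(text):
    """Return a list of (token_type, text) pairs for the input string."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {value!r} at offset {match.start()}"
            )
        if kind == "IDENT" and value in KEYWORDS:
            kind = "KEYWORD"
        if kind == "STRING":
            value = value[1:-1]  # drop the surrounding quotes
        tokens.append((kind, value))
    return tokens
```

The order of the patterns matters here: alternation picks the first alternative that matches, so the catch-all `MISMATCH` has to come last.
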
The best way to learn how a hand-written lexer works is (IMO) to find simple existing lexers and try to understand them.