As with many things, it's just a historical accident. It certainly would have been possible for the token declarations to have been produced by `lex` (but see below). Or it would have been possible to force the user to write their own declarations.
It is more convenient for `yacc`/`bison` to produce the token numberings, though, because:
1. The terminals need to be parsed by `yacc` because they are explicit elements in the grammar productions. In `lex`, on the other hand, they are part of the unparsed actions, and `lex` can generate code without any explicit knowledge about token values; and

2. `yacc` (and `bison`) produce parse tables which are indexed by terminal and non-terminal numbers; the logic of the tables requires that terminals and non-terminals have distinct codes. `lex` has no way of knowing what the non-terminals are, so it can't generate appropriate codes.
The second argument is a bit weak, because in practice `bison`-generated parsers renumber token ids to fit them into the id-numbering scheme. Even so, this is only possible if `bison` is in charge of the actual numbers. (The reason for the renumbering is to make the id values contiguous; by another historical accident, it's normal to reserve codes 0 through 255 for single-character tokens, and 0 for EOF; however, not all the 8-bit codes are actually used by most scanners.)