lex/yacc: why do lexers have to include a parser's header file?

Question 1

As with many things, it's just a historical accident. It certainly would have been possible for the token declarations to have been produced by lex (but see below). Or it would have been possible to force the user to write their own declarations.

It is more convenient for yacc/bison to produce the token numberings, though, because:

The terminals need to be parsed by yacc because they are explicit elements in the grammar productions. In lex, on the other hand, they are part of the unparsed actions and lex can generate code without any explicit knowledge about token values; and
yacc (and bison) produce parse tables which are indexed by terminal and non-terminal numbers; the logic of the tables requires that terminals and non-terminals have distinct codes. lex has no way of knowing what the non-terminals are, so it can't generate appropriate codes.

The second argument is a bit weak, because in practice bison-generated parsers renumber token ids internally to fit them into the parser's own id-numbering scheme. Even so, this is only possible if bison is in charge of the actual numbers. (The reason for the renumbering is to make the id values contiguous; by another historical accident, it's normal to reserve codes 1 through 255 for single-character tokens, and 0 for EOF; however, not all the 8-bit codes are actually used by most scanners.)
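The renumbering can be sketched in plain C. Everything here is an assumption made for illustration: the names (`EXT_NUMBER`, `SYM_PLUS`, `toy_translate`), the starting value 258, and the two-operator token set. A bison-generated parser does the equivalent with a lookup table; the switch below is only a readable stand-in.

```c
#include <assert.h>

/* External codes: what the scanner returns. They are sparse, because
   1..255 is reserved for single-character tokens and only a few of
   those are actually used. (Hypothetical values.) */
enum { EXT_EOF = 0, EXT_NUMBER = 258, EXT_IDENT = 259 };

/* Internal codes: what the parse tables are indexed by. They are
   contiguous, so the tables have no unused rows or columns. */
enum {
    SYM_EOF = 0,
    SYM_UNDEF,   /* any code the grammar does not use */
    SYM_PLUS,
    SYM_STAR,
    SYM_NUMBER,
    SYM_IDENT,
    SYM_COUNT    /* number of internal terminal codes */
};

/* Map a sparse external token code to a dense internal symbol number. */
int toy_translate(int external)
{
    int sym;
    switch (external) {
    case EXT_EOF:    sym = SYM_EOF;    break;
    case '+':        sym = SYM_PLUS;   break;
    case '*':        sym = SYM_STAR;   break;
    case EXT_NUMBER: sym = SYM_NUMBER; break;
    case EXT_IDENT:  sym = SYM_IDENT;  break;
    default:         sym = SYM_UNDEF;  break;
    }
    assert(sym >= 0 && sym < SYM_COUNT);  /* internal numbers stay dense */
    return sym;
}
```

Non-terminals would then be numbered from `SYM_COUNT` upwards, which is only possible for the tool that knows the whole grammar.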

Question 2

In the lexer, the tokens appear only in the values returned by the actions: they are part of the target language (i.e. C++), and lex itself knows nothing about them.

In the parser, on the other hand, tokens are part of the definition language: you write them in the actual parser definition, and not just in the target language. So yacc has to know about these tokens.
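A rough sketch in plain C of what this means for the scanner side. The enum stands in for the header the parser generator would emit, and `toy_lex` stands in for the scanner's action code; all names and the value 258 are hypothetical, and real yacc/bison output differs in detail.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the generated header: named tokens get codes above 255
   so they cannot collide with single-character tokens. */
enum toy_token {
    TOY_EOF    = 0,    /* end of input */
    TOY_NUMBER = 258,  /* assumed value, for illustration only */
    TOY_IDENT  = 259
};

/* Stand-in for scanner actions. Token names occur only in return
   statements, i.e. as ordinary C identifiers; the scanner needs their
   declarations but never has to choose the numbers itself. */
int toy_lex(const char **cursor)
{
    const char *p;
    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0') {
        *cursor = p;
        return TOY_EOF;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        *cursor = p;
        return TOY_NUMBER;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        *cursor = p;
        return TOY_IDENT;
    }
    *cursor = p + 1;
    return (unsigned char)*p;  /* single-character token, e.g. '+' */
}
```

Without the enum, the `return TOY_NUMBER;` lines simply would not compile, which is the whole reason the scanner source includes the parser's header.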

Question 3

The ordering of the phases is not necessarily reflected in the architecture of the compiler. The scanner is the first phase and the parser the second, so in a sense data flows from the scanner to the parser. But in a typical Bison/Flex-generated compiler it is the parser that controls everything: it calls the lexer as a helper subroutine whenever it needs a new token as input to the parsing process.
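That control flow can be illustrated with a small hand-written sketch in C. It is not generated code: the grammar (a sum of numbers), the names `toy_next_token` and `toy_parse_sum`, and the token value 258 are all assumptions made for the example. The point is only that the parser owns the loop and pulls tokens on demand.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOK_EOF = 0, TOK_NUMBER = 258 };  /* hypothetical token codes */

struct toy_scanner {
    const char *pos;  /* next unread character */
    int value;        /* semantic value of the last TOK_NUMBER */
};

/* Scanner: returns exactly one token per call, then hands control back. */
int toy_next_token(struct toy_scanner *s)
{
    while (*s->pos == ' ')
        s->pos++;
    if (*s->pos == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*s->pos)) {
        s->value = 0;
        while (isdigit((unsigned char)*s->pos))
            s->value = s->value * 10 + (*s->pos++ - '0');
        return TOK_NUMBER;
    }
    return (unsigned char)*s->pos++;  /* single-character token */
}

/* Parser for: sum := NUMBER ('+' NUMBER)*
   It drives the process and calls the scanner whenever it needs input.
   Returns 0 on success and stores the total, or -1 on a syntax error. */
int toy_parse_sum(const char *text, int *result)
{
    struct toy_scanner s;
    int total;
    assert(text != NULL && result != NULL);
    s.pos = text;
    s.value = 0;
    if (toy_next_token(&s) != TOK_NUMBER)
        return -1;
    total = s.value;
    for (;;) {
        int tok = toy_next_token(&s);
        if (tok == TOK_EOF)
            break;
        if (tok != '+')
            return -1;
        if (toy_next_token(&s) != TOK_NUMBER)
            return -1;
        total += s.value;
    }
    *result = total;
    return 0;
}
```

A caller invokes only `toy_parse_sum`; the scanner is never called from outside the parser, even though it is logically the earlier phase.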