Lasse's answer is quite good; I just want to emphasize a few of his points.
First, C# is translated into an equivalent IL program, and then the jitter (the just-in-time compiler) translates the IL program into an equivalent machine code program. The IL language has the concept of an evaluation stack, but this is purely a concept in the IL language. All that is required is that the jitter translate the IL into machine code with the same final result. There is no requirement whatsoever that the jitter actually use the "real" stack just because the IL program was written to use an "evaluation stack".
Basically, IL is a language that assumes an extremely simplified kind of processor: there are no "registers" in IL, only the stack. A real processor, of course, has registers, so the machine-code translation of the IL will not slavishly follow the IL's use of an evaluation stack; that would be silly.
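To make this concrete, here is a minimal sketch of a toy "jitter". The instruction set (tuples like `("ldarg", 0)`, `("add", None)`) and the function `translate_to_registers` are hypothetical, only loosely modeled on IL opcodes, and nothing like a real JIT compiler. The sketch tracks the evaluation stack only while it translates: the slot at depth d becomes virtual register `rd`, so the emitted code contains no pushes or pops at all.

```python
from typing import List, Optional, Tuple

# (opcode, operand) in a made-up toy IL; NOT real IL encodings.
Instr = Tuple[str, Optional[int]]


def translate_to_registers(program: List[Instr]) -> List[str]:
    """Translate stack-based toy IL into register-style pseudo-assembly.

    The evaluation stack exists only at translation time: the slot at
    depth d is assigned to virtual register r{d}, so the emitted code
    never pushes or pops anything when it runs.
    """
    out: List[str] = []
    depth = 0  # depth of the abstract evaluation stack at this point
    for op, arg in program:
        if op == "ldarg":
            # Push: the new top-of-stack slot is simply the next register.
            out.append(f"r{depth} = arg{arg}")
            depth += 1
        elif op in ("add", "mul"):
            if depth < 2:
                raise ValueError(f"stack underflow at {op}")
            depth -= 2
            sym = "+" if op == "add" else "*"
            # Pop two slots, push the result into the lower one.
            out.append(f"r{depth} = r{depth} {sym} r{depth + 1}")
            depth += 1
        elif op == "ret":
            if depth < 1:
                raise ValueError("stack underflow at ret")
            depth -= 1
            out.append(f"return r{depth}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out


# Toy IL with roughly the shape of the C# expression  a + b * c
# (a, b, c are arguments 0, 1, 2).
program = [("ldarg", 0), ("ldarg", 1), ("ldarg", 2),
           ("mul", None), ("add", None), ("ret", None)]
for line in translate_to_registers(program):
    print(line)
# Prints:
# r0 = arg0
# r1 = arg1
# r2 = arg2
# r1 = r1 * r2
# r0 = r0 + r1
# return r0
```

A production jitter does genuine register allocation and optimization, and may still spill values to memory when it runs out of registers, but the principle is the same: the evaluation stack is bookkeeping for the translator, not a data structure the machine code must maintain.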
The "max stack" value that each IL method declares refers to the maximum depth of the abstract evaluation stack of the IL language; it has no particular connection to the actual thread's stack in the machine code.
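Likewise, a max stack depth is a static property of the instruction sequence. Here is a minimal sketch, again using a hypothetical toy instruction set and a made-up `max_stack` helper that handles straight-line code only (real IL also has to account for branches). It walks the instructions and records how deep the abstract stack gets, without running anything.

```python
from typing import Dict, List, Optional, Tuple

# (opcode, operand) in a made-up toy IL; NOT real IL encodings.
Instr = Tuple[str, Optional[int]]

# (pops, pushes) for each toy opcode.
STACK_EFFECT: Dict[str, Tuple[int, int]] = {
    "ldarg": (0, 1),
    "add": (2, 1),
    "mul": (2, 1),
    "ret": (1, 0),
}


def max_stack(program: List[Instr]) -> int:
    """Return the deepest point the abstract evaluation stack reaches.

    Computed purely from the instruction sequence; it says nothing about
    how much of the real thread stack the jitted machine code will use.
    """
    depth = peak = 0
    for op, _ in program:
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            raise ValueError(f"stack underflow at {op}")
        depth += pushes - pops
        peak = max(peak, depth)
    return peak


# a + b * c : all three arguments are on the stack before mul runs.
a_plus_b_times_c = [("ldarg", 0), ("ldarg", 1), ("ldarg", 2),
                    ("mul", None), ("add", None), ("ret", None)]
# (a + b) * c : the sum is computed before c is pushed.
a_plus_b_then_times_c = [("ldarg", 0), ("ldarg", 1), ("add", None),
                         ("ldarg", 2), ("mul", None), ("ret", None)]

print(max_stack(a_plus_b_times_c))       # 3
print(max_stack(a_plus_b_then_times_c))  # 2
```

The two toy programs use the same operands, yet one needs three abstract stack slots and the other only two; neither number tells you anything about the machine stack of the thread that eventually runs the jitted code.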
I've written a number of articles about the meaning of "the stack" and why we use IL; you might want to check them out if this subject interests you: