Question

Which addressing do x86/x86_64 processors use for the L1, L2 and L3 (LLC) caches - physical or virtual (i.e. via PT/PTE and the TLB) - and does PAT (Page Attribute Table) somehow affect this?

And is there a difference between drivers (kernel-space) and applications (user-space) in this case?


Short answer - Intel uses virtually indexed, physically tagged (VIPT) L1 caches:

  • L1 - Virtual indexing (in this 8-way cache the set is selected by address bits 6-11, which lie within the low 12 bits and are therefore identical in virtual and physical addresses)
  • L2 - Physical addressing (requires a TLB lookup for the virtual-to-physical translation)
  • L3 - Physical addressing (requires a TLB lookup for the virtual-to-physical translation)
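Why the L1 index bits are the same in virtual and physical addresses can be shown with a little arithmetic. This is a sketch assuming the parameters above (32 KiB, 8-way, 64-byte lines, 4 KiB pages); the names are illustrative, not any real API:

```python
LINE_SIZE = 64                              # bytes per cache line -> 6 offset bits
WAYS = 8
CACHE_SIZE = 32 * 1024                      # 32 KiB L1
SETS = CACHE_SIZE // (WAYS * LINE_SIZE)     # 64 sets
INDEX_BITS = SETS.bit_length() - 1          # 6 index bits, i.e. address bits 6..11
PAGE_OFFSET_BITS = 12                       # 4 KiB pages

def set_index(addr):
    # The set is chosen by bits 6..11 of the address.
    return (addr >> 6) & (SETS - 1)

# 6 offset bits + 6 index bits = 12 bits, which fits inside the page offset,
# so a virtual address and its physical translation always pick the same set.
virt = 0x7FFF12345678
phys = (0xDEADB << 12) | (virt & 0xFFF)     # arbitrary frame, same page offset
assert set_index(virt) == set_index(phys)
```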

Solution

The answer to your question is - it depends. That's strictly a CPU design decision, which balances the tradeoff between performance and complexity.

Take for example recent Intel Core processors - they're virtually indexed and physically tagged (at least according to http://www.realworldtech.com/sandy-bridge/7/). This means that the cache can only complete a lookup in physical address space, in order to determine whether a line is present or not. However, since the L1 is 32 KiB and 8-way associative with 64-byte lines, it has only 64 sets, so you need just address bits 6 to 11 to find the correct set. As it happens, virtual and physical addresses are identical in this range, so you can look up the DTLB in parallel with reading a cache set - a well-known trick (see http://en.wikipedia.org/wiki/CPU_cache for a good explanation).
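The parallel-lookup trick can be sketched as a toy model. In hardware the set read and the TLB translation happen concurrently; here they are just two sequential lines, and `tlb` and `cache` are hypothetical dictionaries, not a real API. Note that with 6 offset bits and 6 index bits, the physical tag is exactly the physical frame number:

```python
def vipt_lookup(vaddr, tlb, cache):
    """Toy VIPT lookup: 64 sets, 64-byte lines, 4 KiB pages (assumed)."""
    set_idx = (vaddr >> 6) & 0x3F   # index from virtual bits 6..11 - no TLB needed
    ppn = tlb[vaddr >> 12]          # translation; runs in parallel in real HW
    for tag, data in cache[set_idx]:
        if tag == ppn:              # tag compare uses the *physical* frame number
            return data
    return None                     # miss

# Demo: virtual page 0x7 maps to physical frame 0x1A2.
tlb = {0x7: 0x1A2}
cache = {i: [] for i in range(64)}
cache[2].append((0x1A2, "hello"))           # line parked in set 2
vaddr = (0x7 << 12) | 0x080                 # offset 0x80 -> set index 2
assert vipt_lookup(vaddr, tlb, cache) == "hello"
```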

In theory one can build a virtually indexed + virtually tagged cache, which would remove the requirement to go through address translation (the TLB lookup, and also page walks on TLB misses). However, that would cause numerous problems, especially with memory aliasing - the case where two virtual addresses map to the same physical one.

Say core1 has virtual addr A cached in such a fully-virtual cache (A maps to phys addr C, but that translation was never performed). core2 writes to virtual addr B, which maps to the same phys addr C - this means we need some mechanism (usually a "snoop", a term coined by Jim Goodman) that goes and invalidates that line in core1, handling the data merge and coherency management if needed. However, core1 can't answer that snoop, since it doesn't know about virtual addr B and doesn't store phys addr C in the virtual cache. So you can see we have an issue, although this is mostly relevant for strict x86 systems; other architectures may be more lax and allow simpler management of such caches.
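The aliasing problem above can be made concrete with a toy virtually indexed + virtually tagged cache (purely illustrative; the class and the page-table/memory dictionaries are invented for this sketch). Lines are keyed only by virtual line address, so two aliases of the same physical line are invisible to each other:

```python
class VirtuallyTaggedCache:
    """Toy fully-virtual cache: no physical tag is stored anywhere."""
    def __init__(self):
        self.lines = {}                      # virtual line number -> data

    def read(self, vaddr, memory, page_table):
        vline = vaddr >> 6
        if vline not in self.lines:          # miss: translate and fill
            paddr = (page_table[vaddr >> 12] << 12) | (vaddr & 0xFFF)
            self.lines[vline] = memory[paddr >> 6]
        return self.lines[vline]

    def write(self, vaddr, value):
        self.lines[vaddr >> 6] = value       # aliases of the line are untouched

page_table = {0x10: 0x5, 0x20: 0x5}          # two virtual pages -> one phys frame
memory = {(0x5 << 12) >> 6: "old"}
cache = VirtuallyTaggedCache()
A, B = 0x10 << 12, 0x20 << 12                # virtual aliases of frame 0x5

cache.read(A, memory, page_table)            # line cached under A
cache.write(B, "new")                        # write via alias B
# A still sees stale data - the cache has no physical tag to relate A and B:
assert cache.read(A, memory, page_table) == "old"
```

This is exactly the inconsistency that the snoop mechanism (keyed on physical addresses) is supposed to resolve, and why a physical tag makes it tractable.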

Regarding the other questions - there's no real connection with PAT that I can think of: the cache design is fixed and can't change for different memory types. Same answer for the user/kernel question - the HW sits mostly beneath the user/kernel distinction (except for the mechanisms it provides for security checking, mostly the various privilege rings).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow