Question

I am studying the last few Intel microarchitectures (Nehalem/SB/IB and Haswell). I am trying to work out, at a fairly simplified level, what happens when a data request is made. So far I have this rough idea:

  1. Execution engine makes data request
  2. "Memory control" queries the L1 DTLB
  3. If the above misses, the L2 TLB is now queried

At this point two things can happen, a miss or a hit:

  1. If it's a hit, the CPU tries the L1D/L2/L3 caches, the page table, and then main memory/hard disk, in that order?

  2. If it's a miss, the CPU asks the (integrated memory controller?) to check the page table held in RAM (did I get the role of the IMC correct there?).

If somebody could edit/provide a set of bullet points giving a basic "overview" of what the CPU does from the execution engine's data request onward, including the

  • L1 DTLB (data TLB)
  • L2 TLB (data + instruction TLB)
  • L1D Cache (data cache)
  • L2 cache (data + instruction cache)
  • L3 cache (data + instruction cache)
  • The part of the CPU which controls access to main memory
  • Page table

it would be most appreciated. I did find some useful images, but they didn't really separate the interaction between the TLBs and the caches.

UPDATE: I have changed the above as I think I now understand. The TLB just gets the physical address from the virtual one. If there's a miss, we're in trouble and need to check the page table. If there's a hit, we just proceed down through the memory hierarchy, starting with the L1D cache.
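To make that concrete, here is a toy model of the translation step as I understand it (the page size, mapping, and addresses are made up purely for illustration):

```python
# Toy model: the TLB maps virtual page numbers to physical frame numbers;
# the page offset passes through unchanged. All values are illustrative.
PAGE_SIZE = 4096  # 4 KiB pages, so the low 12 bits are the offset

tlb = {0x00400: 0x1A2B3}  # virtual page 0x00400 -> physical frame 0x1A2B3

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                      # TLB hit: proceed to the L1D lookup
        return tlb[vpn] * PAGE_SIZE + offset
    return None                         # TLB miss: a page walk is needed

print(hex(translate(0x00400123)))  # 0x1a2b3123
```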


Solution

The pagemap is only used for virtual-to-physical address translation. However, since it resides in memory and is only partially cached in the TLBs, you may have to access memory during the translation process itself.

The basic flow is as follows:

  1. The execution engine calculates the address (some of the arithmetic, such as scale and offset, may actually be done in the memory unit).
  2. Look up in the DTLB.
    2.a. If it misses, look up in the 2nd-level TLB.
    2.a.a. If that misses too, start a page walk.
    2.a.b. If it hits in the 2nd-level TLB, fill the translation into the DTLB and proceed with the new physical address.
    2.b. If it hits in the DTLB, proceed with the physical address.
  3. Look up the L1; if it misses, look up the L2; if that misses, look up the L3; if that misses too, send the request to the memory controller and wait for the DRAM access.
  4. When the data returns (from whichever level), fill it into the caches along the way (depending on the fill policy, cache inclusiveness, temporal hints from the instruction, memory region type, and probably other factors as well).
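The flow above can be sketched as a toy model. Every structure here (TLB levels as plain dicts, caches as sets of line numbers, an always-complete page table) is an illustrative assumption, not a real Intel design:

```python
# Toy walk through steps 1-4: TLB levels first, then the cache hierarchy.
def load(vaddr, dtlb, stlb, caches, page_table, page_size=4096):
    vpn, offset = divmod(vaddr, page_size)

    # Step 2: translation
    if vpn in dtlb:
        pfn = dtlb[vpn]
    elif vpn in stlb:                # step 2.a.b: STLB hit
        pfn = stlb[vpn]
        dtlb[vpn] = pfn              # fill the DTLB
    else:                            # step 2.a.a: page walk (may fault)
        pfn = page_table[vpn]
        stlb[vpn] = dtlb[vpn] = pfn

    paddr = pfn * page_size + offset
    line = paddr // 64               # 64-byte cache lines

    # Step 3: cache hierarchy, innermost first
    for name, cache in caches:       # e.g. [("L1D", ...), ("L2", ...), ("L3", ...)]
        if line in cache:
            source = name
            break
    else:
        source = "DRAM"              # send to the memory controller

    # Step 4: fill the line into every level on the way back
    for _, cache in caches:
        cache.add(line)
    return paddr, source
```

A first access to a page comes back from "DRAM" and warms every level; repeating the same access then hits in "L1D".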

If a pagewalk was required, stall the main request and issue physical loads to the pagemap (according to the architectural definition). On x86 this may involve CR3, PDPTR, PDP, PDE, PTE, etc., depending on the paging mode, page sizes, and so on. Note that under virtualization, each pagewalk level on the VM may require a full pagewalk on the host (so you effectively square the number of steps needed).
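For reference, this is how a page walk slices the virtual address in the common x86-64 4-level, 4 KiB-page mode: four 9-bit table indices plus a 12-bit page offset (other paging modes and page sizes split the bits differently):

```python
# Index extraction for 4-level x86-64 paging with 4 KiB pages.
# Each 9-bit field selects one of 512 entries in its table; the walk
# starts at the table CR3 points to and descends one level per field.
def walk_indices(vaddr):
    return {
        "pml4":   (vaddr >> 39) & 0x1FF,
        "pdpt":   (vaddr >> 30) & 0x1FF,
        "pd":     (vaddr >> 21) & 0x1FF,
        "pt":     (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }

print(walk_indices(0x00007F1234567ABC))
# pml4=0xFE, pdpt=0x48, pd=0x1A2, pt=0x167, offset=0xABC
```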

Note that the pagemap is basically a tree structure, where each access depends on the value of the previous one (and on part of the virtual address being translated). These accesses are therefore serially dependent, and only once the last one completes do you get the physical address and can return to step 3. All along, the line you want may be sitting in your L1 without your being able to know it (although, to be honest, if you needed a pagewalk you're not likely to still have the line in your upper-level caches).

Other important notes: the pagemap lives in physical address space and is accessed that way. You don't want to have to translate the accesses you need for translation; that could be a deadlock :)
More importantly, the pagemap data can itself be cached, so while a simple memory access may expand into multiple accesses due to a TLB miss, the pagewalk may still be fairly cheap.

OTHER TIPS

Yes, as explained in this long write-up, which illustrates the path from the CPU through the L1, L2, and L3 caches:

http://lwn.net/Articles/252125/


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow