Question

My understanding is that shadow page tables eliminate the need to emulate physical memory inside of the VM.

i.e.

Instead of:

guest OS -> VMM + virtual physical memory -> host OS -> host hardware

It's just:

guest OS -> VMM -> host OS -> host hardware

The shadow page tables just allow the process to access the host hardware's memory properly. I also do not understand how page faults would work (or, since all physical memory is handled by the host, does the host take care of page faults, swap, etc.?).


Solution

Shadow page tables are used by the hypervisor to keep track of the state in which the guest "thinks" its page tables should be. The guest can't be allowed access to the hardware page tables because then it would essentially have control of the machine. So, the hypervisor keeps the "real" mappings (guest virtual -> host physical) in the hardware when the relevant guest is executing, and keeps a representation of the page tables that the guest thinks it's using "in the shadows," or at least that's how I like to think about it.

Notice that this avoids a separate GVA -> GPA translation step at run time: the hardware walks a single set of page tables that map guest virtual addresses directly to host physical addresses.
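
To make the two levels of mapping concrete, here is a minimal Python sketch (the page numbers, dictionary names, and function names are invented for illustration, not taken from any real hypervisor): the guest maintains a GVA -> GPA table, the hypervisor maintains a GPA -> HPA table, and the shadow entry the hardware actually walks is simply the composition of the two.

```python
# Toy model: page tables as dicts keyed by page number (illustrative only).

guest_page_table = {0xdeadb: 0x00042}   # GVA page -> GPA page (maintained by the guest)
gpa_to_hpa       = {0x00042: 0x1abcd}   # GPA page -> HPA page (maintained by the hypervisor)

def shadow_entry(gva_page):
    """Compose the two mappings into the GVA -> HPA entry the hardware walks."""
    gpa_page = guest_page_table[gva_page]   # what the guest thinks it mapped
    return gpa_to_hpa[gpa_page]             # where that memory really lives

shadow_page_table = {gva: shadow_entry(gva) for gva in guest_page_table}
print(hex(shadow_page_table[0xdeadb]))      # 0x1abcd: one-step GVA -> HPA translation
```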

As far as page faults go, nothing changes from the hardware's point of view (remember, the hypervisor makes it so the page tables used by the hardware contain GVA -> HPA mappings): a page fault simply generates an exception and redirects to the appropriate exception handler. However, when a page fault occurs while a VM is running, this exception can be "forwarded" to the hypervisor, which can then handle it appropriately.

The hypervisor must build up these shadow page tables as it sees page faults generated by the guest. When the guest writes a mapping into one of its page tables, the hypervisor won't know right away, so the shadow page tables won't instantly "be in sync" with what the guest intends. So the hypervisor will build up the shadow page tables in, e.g., the following way:

  • Guest writes a mapping for VA 0xdeadbeef into its page tables (which are just a location in memory), but remember, this mapping isn't being used by the hardware.
  • Guest accesses 0xdeadbeef, which causes a page fault because the real page tables haven't been updated to add the mapping
  • Page fault is forwarded to hypervisor
  • Hypervisor looks at guest page tables and notices they're different from shadow page tables, says "hey, I haven't created a real mapping for 0xdeadbeef yet"
  • So it updates its shadow page tables and creates a corresponding 0xdeadbeef->HPA mapping for the hardware to use.

The previous case is called a shadow page fault because it is caused solely by the introduction of memory virtualization. So the handling of the page fault stops at the hypervisor, and the guest OS will have no idea that it even occurred. Note that the guest can also generate genuine page faults for mappings it hasn't tried to create yet, and the hypervisor will forward these back up into the guest. Also realize that this entire process implies that every page fault that occurs while the guest is executing must cause an exit to the VMM so the shadow page tables can be kept fresh. This is expensive, and it is one of the reasons hardware support was introduced for memory virtualization. (Here is one quick intro to nested, or extended, page tables.)
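
Continuing the toy Python model from above (all names and structures are invented for illustration), a hypervisor-side fault handler that distinguishes the two cases might look roughly like this: if the guest's own page table already contains the faulting mapping, it is a shadow page fault and the hypervisor silently fills in the shadow entry; otherwise it is a genuine fault and gets injected back into the guest.

```python
def handle_page_fault(gva_page, guest_page_table, gpa_to_hpa, shadow_page_table):
    """Toy VMM fault handler for the scheme described above (illustrative only)."""
    if gva_page in guest_page_table:
        # Shadow page fault: the guest already created this mapping in its own
        # page table, but the shadow (the one the hardware walks) is stale.
        gpa_page = guest_page_table[gva_page]
        shadow_page_table[gva_page] = gpa_to_hpa[gpa_page]
        return "resume guest silently"           # the guest never sees this fault
    # Genuine fault: the guest never mapped this address, so its own OS must
    # handle it (demand paging, copy-on-write, or a crash).
    return "inject page fault into guest"

# The guest adds a mapping for the page containing VA 0xdeadbeef (offset bits
# dropped in this toy model), then touches it before the shadow table knows:
guest_page_table = {0xdeadb: 0x00042}
gpa_to_hpa = {0x00042: 0x1abcd}
shadow_page_table = {}
print(handle_page_fault(0xdeadb, guest_page_table, gpa_to_hpa, shadow_page_table))
print(handle_page_fault(0xbad00, guest_page_table, gpa_to_hpa, shadow_page_table))
```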

A good reference for this is this book.

OTHER TIPS

When the guest writes a mapping into one of its page tables, the hypervisor won't know right away, so the shadow page tables won't instantly "be in sync" with what the guest intends.

Not precisely. Guest page tables are made read-only (write-protected) by the hypervisor. Whenever there is an update (e.g., a new mapping is added) to a guest page table, the write traps to the hypervisor, and the hypervisor updates the shadow page table accordingly so that it stays "in sync" with the guest.
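
A rough Python sketch of that trap-on-write approach, staying with the same invented toy model as above: because the guest's page-table pages are write-protected, the guest's attempt to install a PTE traps, and the hypervisor performs the write on the guest's behalf while updating the shadow entry in the same step, so the two tables never drift apart.

```python
def trap_on_guest_pte_write(gva_page, gpa_page,
                            guest_page_table, gpa_to_hpa, shadow_page_table):
    """Toy trap handler: runs whenever the guest tries to install a new PTE,
    because its page-table pages are write-protected (illustrative only)."""
    guest_page_table[gva_page] = gpa_page                # emulate the guest's write
    shadow_page_table[gva_page] = gpa_to_hpa[gpa_page]   # keep the shadow in sync now
    # The guest resumes after the write; no later shadow page fault is needed for
    # this address, at the cost of trapping on every page-table update.

guest_pt, shadow_pt = {}, {}
gpa_to_hpa = {0x00042: 0x1abcd}
trap_on_guest_pte_write(0xdeadb, 0x00042, guest_pt, gpa_to_hpa, shadow_pt)
print(hex(shadow_pt[0xdeadb]))   # 0x1abcd: shadow updated at write time, not fault time
```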

