Writing the translation lookaside buffer

Question 1

I want to summarize the comments in this community wiki post, combined with my current understanding. You might want to read this if you are just starting to learn about the page table and the translation lookaside buffer (TLB) from an abstract perspective. However, this post cannot guarantee 100% correctness down to the bits and bytes.

Booting the PC

i386 PCs boot in real mode. This mode uses physical memory only and does not care about virtual memory yet. All instructions are executed with high privileges. We can think of this as being in kernel mode.

The operating system (it doesn't really matter whether it's Linux or Windows) starts executing in this mode. It will switch the CPU to protected mode and, in order to use virtual memory, set up the page table and enable paging.

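; assumes a GDT has already been loaded with lgdt
; set PE (protection enable) bit, bit 0 of CR0
mov eax, cr0
or eax, 1
mov cr0, eax
; far jump to reload CS (cs = selector of a code segment in the GDT)
jmp cs:@pm
@pm:
; Now we are in protected mode.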

Opcodes for writing the page table

The page table resides in physical memory and will never be swapped to disk. Because the page table is ordinary data in RAM (the translation lookaside buffer is only a cache of it), we can use simple memory write instructions such as mov to fill it. No special assembler instructions are needed to set up the page table.
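To illustrate that plain stores are enough, here is a minimal sketch in C. It models one 32-bit (non-PAE) page table as an ordinary array. The names make_pte, fill_identity_table and the flag macros are hypothetical helpers made up for this example, and a real kernel would put the table in a 4 KiB-aligned physical frame instead of an arbitrary array.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u  /* bit 0: entry is valid */
#define PTE_WRITABLE 0x002u  /* bit 1: writes are allowed */
#define ENTRIES      1024u   /* entries in one 32-bit page table */
#define PAGE_SIZE    4096u   /* 4 KiB pages */

/* Build one page-table entry: frame address in the upper 20 bits,
 * flags in the lower 12 bits. */
uint32_t make_pte(uint32_t phys_addr, uint32_t flags)
{
    assert((phys_addr & (PAGE_SIZE - 1)) == 0);  /* frame must be page aligned */
    return phys_addr | (flags & 0xFFFu);
}

/* Identity-map the first 4 MiB: entry i points at physical frame i.
 * Note that this is nothing but ordinary memory writes. */
void fill_identity_table(uint32_t table[ENTRIES])
{
    for (uint32_t i = 0; i < ENTRIES; i++)
        table[i] = make_pte(i * PAGE_SIZE, PTE_PRESENT | PTE_WRITABLE);
}
```

After fill_identity_table runs, entry 1 holds 0x00001003: frame address 0x1000 with the present and writable bits set.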

The translation lookaside buffer

The translation lookaside buffer is just a cache for the page table. It is a separate structure inside the CPU, distinct from the "normal" data and instruction caches.

When the operating system writes to the page table (in RAM, not in the TLB), the TLB may still hold the old translation. So there needs to be at least one specific assembler instruction on every CPU for clearing the TLB, so that the CPU will re-read the page table from memory.

Clearing the complete TLB may be a waste of performance, because often only a single page has changed (for example, when it was swapped to disk). The invlpg instruction, available since the i486, therefore invalidates the TLB entry for a single page only.
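The difference between the two kinds of invalidation can be sketched with a toy model in C. This is not how the hardware is built. It is a small direct-mapped cache I made up for illustration, and all names (struct tlb, tlb_insert, tlb_lookup, tlb_invalidate_page, tlb_flush_all) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS 8u

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };
struct tlb { struct tlb_entry slot[TLB_SLOTS]; };

/* Drop every cached translation (roughly what a full flush does). */
void tlb_flush_all(struct tlb *t)
{
    for (unsigned i = 0; i < TLB_SLOTS; i++)
        t->slot[i].valid = false;
}

/* Drop only the translation for the page containing vaddr
 * (the idea behind invlpg). */
void tlb_invalidate_page(struct tlb *t, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> 12;
    struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        e->valid = false;
}

/* Cache a translation from virtual page number to physical frame number. */
void tlb_insert(struct tlb *t, uint32_t vpn, uint32_t pfn)
{
    struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    e->valid = true;
    e->vpn = vpn;
    e->pfn = pfn;
}

/* Return true on a hit and store the physical address in *paddr. */
bool tlb_lookup(const struct tlb *t, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> 12;
    const struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    if (!e->valid || e->vpn != vpn)
        return false;  /* miss: real hardware would now walk the page table */
    *paddr = (e->pfn << 12) | (vaddr & 0xFFFu);
    return true;
}
```

In this model, invalidating one page leaves the translations for all other pages cached, while tlb_flush_all forces every later access to miss.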

Privileged TLB opcodes or not?

Clearing the TLB does not seem very critical to applications. Even if it were possible to execute the instruction in user mode, the CPU would just read the page table again. However, because only the kernel can write to the page table, it makes sense for the TLB-clearing instruction to be a high-privilege instruction. On x86, invlpg is indeed privileged.

Initial content of the TLB

The initial content doesn't really matter. The OS will set up the page table and then

clear the TLB
switch the CPU to protected mode with paging enabled

Question 2

On first bootup, paging is disabled so linear address = physical address.

x86-64 UEFI firmware would have to set up a simple page table of some sort, probably identity mapping physical memory so virtual = physical (but IDK, check the spec if you care), because x86-64 long mode requires that paging is enabled.

For booting in legacy BIOS mode, the firmware switches the CPU back into real mode and installs handlers for the legacy BIOS interrupts (int 10h and so on) before your MBR bootloader runs.

On x86, the TLB is managed by hardware (page-walk in response to a TLB miss, invisible to software). The page tables must be in the radix-tree data structure (using physical addresses for pointers between levels) that the hardware knows how to read directly (https://wiki.osdev.org/Paging / https://wiki.osdev.org/Page_Tables).
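As a rough illustration of what the hardware page-walk does, here is a software model in C of the classic 32-bit two-level layout (10-bit directory index, 10-bit table index, 12-bit offset). It is only a sketch: phys_mem, phys_read and page_walk are hypothetical names, "physical memory" is a small array, and 4 MiB pages, permission bits and accessed/dirty updates are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRESENT    0x1u
#define PHYS_WORDS (4u * 1024u)  /* 16 KiB of simulated physical memory */

uint32_t phys_mem[PHYS_WORDS];

/* Read a 32-bit word at a physical address in the simulated memory. */
uint32_t phys_read(uint32_t paddr)
{
    assert(paddr / 4 < PHYS_WORDS);
    return phys_mem[paddr / 4];
}

/* Translate a linear address by walking directory -> table -> frame.
 * cr3 holds the physical address of the page directory.
 * Returns false where real hardware would raise a page fault. */
bool page_walk(uint32_t cr3, uint32_t linear, uint32_t *paddr)
{
    uint32_t dir_idx = linear >> 22;              /* top 10 bits */
    uint32_t tbl_idx = (linear >> 12) & 0x3FFu;   /* middle 10 bits */
    uint32_t offset  = linear & 0xFFFu;           /* low 12 bits */

    /* Pointers between levels are physical addresses. */
    uint32_t pde = phys_read((cr3 & ~0xFFFu) + dir_idx * 4);
    if (!(pde & PRESENT))
        return false;

    uint32_t pte = phys_read((pde & ~0xFFFu) + tbl_idx * 4);
    if (!(pte & PRESENT))
        return false;

    *paddr = (pte & ~0xFFFu) | offset;
    return true;
}
```

With the directory at physical 0x0000 and one table at 0x1000, setting phys_mem[1] = 0x1000 | PRESENT and phys_mem[0x1000 / 4 + 2] = 0x00ABC000 | PRESENT makes linear address 0x00402345 translate to 0x00ABC345.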

See What happens after a L2 TLB miss? for more about the fact that HW page-walk fetches data itself and creates the TLB entry so the load or store (or code-fetch) can complete.

The only control software has is invlpg to invalidate cached information for one 4k page (e.g. after changing the page-table entry for it), so the HW will reload it with a page-walk on next access. (Or reloading CR3 invalidates everything except "Global" entries. There's also PCID (Process Context ID) HW support to tag TLB entries with an ID so frequently swapping between a few different page tables on the same physical core doesn't have to be a performance disaster.)
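The PCID idea can be sketched as a toy model in C: each cached translation carries an ID, and a lookup only hits when both the ID and the virtual page match. The struct and function names are hypothetical, the round-robin replacement is an arbitrary choice for the example, and real hardware is organized differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 16u

struct tagged_entry { bool valid; uint16_t pcid; uint32_t vpn; uint32_t pfn; };
struct tagged_tlb { struct tagged_entry slot[SLOTS]; unsigned next; };

/* Cache a translation for one address space, identified by pcid. */
void ttlb_insert(struct tagged_tlb *t, uint16_t pcid, uint32_t vpn, uint32_t pfn)
{
    struct tagged_entry *e = &t->slot[t->next++ % SLOTS];  /* round-robin */
    e->valid = true;
    e->pcid = pcid;
    e->vpn = vpn;
    e->pfn = pfn;
}

/* Hit only if the entry belongs to the currently active pcid. */
bool ttlb_lookup(const struct tagged_tlb *t, uint16_t pcid, uint32_t vpn,
                 uint32_t *pfn)
{
    for (unsigned i = 0; i < SLOTS; i++) {
        const struct tagged_entry *e = &t->slot[i];
        if (e->valid && e->pcid == pcid && e->vpn == vpn) {
            *pfn = e->pfn;
            return true;
        }
    }
    return false;
}
```

Two address spaces can map the same virtual page to different frames and both translations stay cached, so switching between them does not require throwing everything away.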

You tell the CPU where to find the page tables with mov cr3, reg to set the physical address of the top level page-directory. (There's also a control-register bit that controls whether paging is even enabled; it's optional in protected mode.)
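For reference, the two CR0 bits involved are PE (bit 0, protected mode) and PG (bit 31, paging). A small C sketch that decodes a CR0 value, with hypothetical helper names, might look like this. It only inspects a number and does not touch the real register, which would need privileged code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protection enable */
#define CR0_PG (1u << 31)  /* paging enable */

/* True if the given CR0 value means the CPU is in protected mode. */
bool protected_mode(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}

/* Paging is only valid on top of protected mode, so require both bits. */
bool paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0 && (cr0 & CR0_PG) != 0;
}
```

A value with only CR0_PE set describes protected mode without paging, where linear addresses are still used as physical addresses.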