Question

Applications work with virtual memory. That's a nice concept, we can treat it like a contiguous block and needn't care about whether it's contiguous in physical RAM or not or if it's even on hard disk.

As far as I understand the mapping between virtual addresses and physical addresses is done directly by the CPU (or at least in hardware) using the translation lookaside buffer.

I assume that the kernel somehow maintains the contents of the TLB, but here's where I'm a little bit stuck, so I have 3 questions:

What are the specific assembler instructions for querying, reading and writing the contents of the TLB? I was not very lucky googling for opcodes related to this topic. I just found INVLPG and TLBWI. Because I'm new to that level of depth, maybe I just use the wrong search terms. If you find it immediately, which search terms do you use?

Do all those assembler instructions need privileged CPU mode?

What is the initial content of the TLB? If the TLB is empty at the beginning, does that mean the very first assembler instructions of an operating system will be mapped directly to physical memory (so that virtual address == physical address)?

Solution

I want to summarize the comments in this community wiki post, combined with my current understanding. You might want to read this if you are just starting out and know the page table and the translation lookaside buffer only from an abstract perspective. However, this post cannot guarantee 100% correctness in bits and bytes.

Booting the PC

i386 PCs boot in real mode. This mode uses physical memory only and does not care about virtual memory yet. All instructions are executed with high privileges. We can think of this as being in kernel mode.

The operating system (it doesn't really matter whether it's Linux or Windows) starts executing in this mode. It sets up the page table and then switches the CPU to protected mode. Paging itself is enabled separately, by setting the PG bit in CR0.

; a GDT must already be loaded (lgdt) before this point
; set PE bit (bit 0 of CR0)
mov eax, cr0
or eax, 1
mov cr0, eax
; far jump to reload cs (cs = selector of code segment)
jmp cs:@pm
@pm:
; Now we are in protected mode.

Opcodes for writing the page table

The page table resides in physical memory, and the kernel keeps it there rather than swapping it to disk. Because the page table is an ordinary data structure in RAM (it is not the translation lookaside buffer itself), simple memory write instructions such as mov are enough to fill it. No special assembler instructions are needed to set up the page table.
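
As an illustration of "plain memory writes are enough", here is a minimal C sketch that fills a 32-bit (non-PAE) page directory and one page table with an identity mapping of the first 4 MiB. The names build_identity_map, PTE_PRESENT, PTE_WRITABLE, ENTRIES and PAGE_BYTES are made up for this example, and it runs on ordinary arrays rather than on real page tables.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u  /* bit 0: entry is valid */
#define PTE_WRITABLE 0x002u  /* bit 1: writes are allowed */
#define ENTRIES      1024u   /* entries per table in 32-bit, non-PAE paging */
#define PAGE_BYTES   4096u   /* size of one page */

/* Fill `table` so that virtual pages 0..1023 map to the physical frames with
   the same numbers (identity mapping of the first 4 MiB), then hook the table
   into slot 0 of `directory`. `table_phys` is the physical address at which
   the table is assumed to live; here it is just a number chosen by the caller. */
void build_identity_map(uint32_t *directory, uint32_t *table, uint32_t table_phys)
{
    /* Page tables must be 4 KiB aligned: the low 12 bits hold flags. */
    assert((table_phys & 0xFFFu) == 0);

    for (uint32_t i = 0; i < ENTRIES; i++) {
        directory[i] = 0;  /* not present */
        table[i] = (i * PAGE_BYTES) | PTE_PRESENT | PTE_WRITABLE;
    }
    directory[0] = table_phys | PTE_PRESENT | PTE_WRITABLE;
}
```

Every entry is written with a normal store. A real kernel would do the same thing on memory it has reserved for page tables and then hand the physical address of the directory to the CPU.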

The translation lookaside buffer

The translation lookaside buffer is just a cache for the page table. It is separate from the "normal" data and instruction caches and resides in a different part of the CPU.

When the operating system writes to the page table (in RAM, not in the TLB), the TLB may still hold the old translation. Every CPU with a hardware-managed TLB therefore needs at least one specific assembler instruction for clearing the TLB, so that the CPU will re-read the page table from memory.

Clearing the complete TLB may be a waste of performance, because often only the mapping of a single page has changed (for example, one page was swapped to disk). The invlpg instruction, available since the i486, therefore invalidates the entry for a single page only.
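
The difference between flushing everything and invalidating one page can be shown with a toy model. The following C sketch simulates a tiny direct-mapped TLB. All names (struct tlb, tlb_insert, tlb_lookup, tlb_invalidate_page, tlb_flush_all) are hypothetical, and a real TLB is more complex (associativity, global entries, PCIDs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8u

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };
struct tlb { struct tlb_entry slot[TLB_SLOTS]; };

/* Cache the translation vpn -> pfn (virtual page number -> frame number). */
void tlb_insert(struct tlb *t, uint32_t vpn, uint32_t pfn)
{
    assert(t != NULL);
    struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];  /* direct-mapped */
    e->valid = true;
    e->vpn = vpn;
    e->pfn = pfn;
}

/* Return true and store the frame number on a hit, false on a miss. */
bool tlb_lookup(const struct tlb *t, uint32_t vpn, uint32_t *pfn)
{
    assert(t != NULL && pfn != NULL);
    const struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    if (!e->valid || e->vpn != vpn)
        return false;
    *pfn = e->pfn;
    return true;
}

/* Rough analogue of invlpg: drop the cached entry for one page only. */
void tlb_invalidate_page(struct tlb *t, uint32_t vpn)
{
    assert(t != NULL);
    struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        e->valid = false;
}

/* Rough analogue of a full flush: every cached entry is dropped. */
void tlb_flush_all(struct tlb *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++)
        t->slot[i].valid = false;
}
```

After tlb_invalidate_page only the affected page misses on its next access, whereas after tlb_flush_all every page has to be looked up in the page table again.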

Privileged TLB opcodes or not?

Clearing the TLB does not seem very critical for applications: even if the instruction could be executed in user mode, the CPU would just read the original page table again, so the only cost would be performance. However, because only the kernel can write to the page table, it makes sense for the flush instruction to be privileged as well. On x86 this is the case: invlpg and writes to CR3 can only be executed at privilege level 0.

Initial content of the TLB

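The initial content doesn't really matter: while paging is disabled, addresses are not translated, so the TLB is not consulted. The OS will set up the page table and then

  • clear the TLB
  • switch the CPU to protected mode and enable paging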

OTHER TIPS

On first bootup, paging is disabled so linear address = physical address.

x86-64 UEFI firmware would have to set up a simple page table of some sort, probably identity mapping physical memory so virtual = physical (but IDK, check the spec if you care), because x86-64 long mode requires that paging is enabled.

For booting in legacy BIOS mode, the firmware switches the CPU back into real mode and installs legacy BIOS int 10h and so on handlers before your MBR bootloader runs.


On x86, the TLB is managed by hardware (page-walk in response to a TLB miss, invisible to software). The page tables must be in the radix-tree data structure (using physical addresses for pointers between levels) that the hardware knows how to read directly. (https://wiki.osdev.org/Paging / https://wiki.osdev.org/Page_Tables).
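
To make the radix-tree idea concrete, here is a hedged C sketch of what a two-level walk does for 32-bit, non-PAE paging with 4 KiB pages only. It runs against a small simulated physical memory. phys_mem, phys_read32 and page_walk are names invented for this example, and the real hardware also handles large pages, permission bits and accessed/dirty updates, which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x001u
#define PHYS_WORDS  (4u * 1024u)       /* 16 KiB of simulated physical RAM */

static uint32_t phys_mem[PHYS_WORDS];  /* word at physical address a is phys_mem[a / 4] */

static uint32_t phys_read32(uint32_t paddr)
{
    assert(paddr / 4u < PHYS_WORDS);
    return phys_mem[paddr / 4u];
}

/* Translate `linear` using the page directory whose physical address is in
   `cr3`. Returns false where real hardware would raise a page fault. */
bool page_walk(uint32_t cr3, uint32_t linear, uint32_t *phys)
{
    uint32_t dir_index   = linear >> 22;             /* bits 31..22 */
    uint32_t table_index = (linear >> 12) & 0x3FFu;  /* bits 21..12 */
    uint32_t offset      = linear & 0xFFFu;          /* bits 11..0  */

    /* Level 1: the directory entry holds the physical address of a page table. */
    uint32_t pde = phys_read32((cr3 & 0xFFFFF000u) + dir_index * 4u);
    if (!(pde & PTE_PRESENT))
        return false;

    /* Level 2: the table entry holds the physical address of the page frame. */
    uint32_t pte = phys_read32((pde & 0xFFFFF000u) + table_index * 4u);
    if (!(pte & PTE_PRESENT))
        return false;

    *phys = (pte & 0xFFFFF000u) | offset;
    return true;
}
```

Both levels are reached through physical addresses, which is why the walk never needs a translation itself. The result of a successful walk is what the hardware would cache as a TLB entry.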

See What happens after a L2 TLB miss? for more about the fact that HW page-walk fetches data itself and creates the TLB entry so the load or store (or code-fetch) can complete.

The only control software has is invlpg to invalidate cached information for one 4k page (e.g. after changing the page-table entry for it), so the HW will reload it with a page-walk on next access. (Or reloading CR3 invalidates everything except "Global" entries. There's also PCID (Process Context ID) HW support to tag TLB entries with an ID so frequently swapping between a few different page tables on the same physical core doesn't have to be a performance disaster.)

You tell the CPU where to find the page tables with mov cr3, reg to set the physical address of the top level page-directory. (There's also a control-register bit that controls whether paging is even enabled; it's optional in protected mode.)
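
The two control registers involved can be described with a little bit arithmetic. This C sketch only computes the values that would be written. The actual writes (mov cr3, reg and mov cr0, reg) are privileged and cannot run in a normal program. The function names make_cr3, cr0_with_paging and paging_enabled are made up for the example; the bit positions of PE (bit 0) and PG (bit 31) in CR0 are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protection enable: protected mode */
#define CR0_PG (1u << 31)  /* paging enable */

/* Value for CR3 in 32-bit paging: the 4 KiB aligned physical address of the
   page directory. The low flag bits (PWT, PCD) are left clear here. */
uint32_t make_cr3(uint32_t page_directory_phys)
{
    assert((page_directory_phys & 0xFFFu) == 0);
    return page_directory_phys;
}

/* Paging requires protected mode, so both bits are set together. */
uint32_t cr0_with_paging(uint32_t cr0)
{
    return cr0 | CR0_PE | CR0_PG;
}

/* Paging is active only if PE and PG are both set. */
bool paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0 && (cr0 & CR0_PG) != 0;
}
```

A CR0 value with only PE set describes protected mode without paging, which is the state reached by the mode switch earlier in this post.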

See also How does x86 paging work?


By contrast, some ISAs like MIPS do use software TLB management, where a TLB miss traps to an OS-supplied handler, a bit like a page-fault handler, that uses its own data structures (in a special area of memory that can't TLB-miss). TLBWI is a MIPS instruction, not x86.

This is not an option for x86.
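
For comparison, a software-managed TLB can be modelled in a few lines of C. This is a toy simulation, not MIPS code: tlb, os_page_map, tlb_refill_handler, translate and refill_count are all hypothetical names, the "trap" is an ordinary function call, and the OS data structure is a flat array chosen only because the OS is free to pick any format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS 4u
#define NUM_PAGES 16u

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };

static struct tlb_entry tlb[TLB_SLOTS];  /* the simulated hardware TLB */
static uint32_t os_page_map[NUM_PAGES];  /* OS-private format: vpn -> pfn */
static unsigned refill_count;            /* how often the handler ran */

/* Stands in for the OS-supplied miss handler: it consults the OS's own data
   structure and writes the TLB entry itself (the job TLBWI does on MIPS). */
static void tlb_refill_handler(uint32_t vpn)
{
    struct tlb_entry *e = &tlb[vpn % TLB_SLOTS];
    e->valid = true;
    e->vpn = vpn;
    e->pfn = os_page_map[vpn];
    refill_count++;
}

/* Translate a virtual address. On a miss the "hardware" does not walk any
   table; it only transfers control to the software handler. */
uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> 12;
    assert(vpn < NUM_PAGES);

    struct tlb_entry *e = &tlb[vpn % TLB_SLOTS];
    if (!e->valid || e->vpn != vpn)
        tlb_refill_handler(vpn);  /* the simulated trap */

    return (e->pfn << 12) | (vaddr & 0xFFFu);
}
```

The first access to a page runs the handler once, later accesses to the same page hit in the TLB, and a page that maps to the same slot evicts the old entry and triggers another refill.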

Licensed under: CC-BY-SA with attribution