Question

I'm studying through Tanenbaum's "Modern Operating Systems" book and just read the following paragraph in the book:

When a process is started up, all of its page table entries are marked as not in memory. As soon as any page is referenced, a page fault will occur. The operating system then sets the R bit (in its internal tables), changes the page table entry to point to the correct page, with mode READ ONLY, and restarts the instruction. If the page is subsequently modified, another page fault will occur, allowing the operating system to set the M bit and change the page's mode to READ/WRITE.

This seems extremely inefficient to me. He suggests that when a process starts up, many page faults must occur, and real memory is filled only as the instructions are being executed.

It appears more logical to me to load at least the text of the process into memory at start-up, instead of bringing it in piecemeal as instructions execute (with a page fault per instruction execution).

Could someone explain to me what the advantage of the method described in the book is?


Solution

Tanenbaum describes two techniques in this paragraph:

When a process is started up, all of its page table entries are marked as not in memory. As soon as any page is referenced, a page fault will occur. The operating system then sets the R bit (in its internal tables), changes the page table entry to point to the correct page, with mode READ ONLY, and restarts the instruction.

This technique is also called demand paging (pages are loaded from disk into memory on demand, when a page fault occurs). I can think of at least two reasons why you would want to do this; a minimal sketch of the idea follows the list:

  1. Memory consumption: Only the pages that are actually needed are loaded from disk into main memory. There might be parts of your program you never execute, or parts of your data section you never write to during execution. Those parts are never loaded in the first place, which leaves more RAM available for other processes. Nowadays, with huge amounts of memory, you can of course debate whether this is still a valid argument.

  2. Speed: Loading from disk is slow and was much slower a decade ago. Setting up the page table lazily, on demand, lets you defer fetching blocks from disk. Loading everything at once might delay the start of your program. Again, disks are now a lot faster, and SSDs make this argument even weaker. On the other hand, thanks to dynamic libraries, binaries are not that big and usually need only a few page faults before they are resident in RAM.
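
Here is the minimal sketch mentioned above: a toy userspace simulation (not from the book; names and sizes are purely illustrative) showing the essence of demand paging, where a page is copied from the backing store into a frame only on its first access:

```c
#include <stdio.h>
#include <string.h>

#define NPAGES   8      /* virtual pages of the "process"      */
#define NFRAMES  4      /* physical frames available in "RAM"  */
#define PAGESIZE 64     /* bytes per page                      */

struct pte {            /* simplified page-table entry         */
    int present;        /* page currently mapped to a frame?   */
    int frame;          /* which frame, if present             */
};

static struct pte page_table[NPAGES];
static char ram[NFRAMES][PAGESIZE];
static char disk[NPAGES][PAGESIZE];     /* the backing store   */
static int  next_free_frame = 0;
static int  fault_count = 0;

/* Called on first access to a page: pick a frame, copy the page in,
 * and mark the entry present.  A real OS would evict a victim page
 * when RAM is full; this sketch simply stops. */
static void page_fault(int vpage)
{
    fault_count++;
    if (next_free_frame == NFRAMES) {
        fprintf(stderr, "out of frames (eviction not shown)\n");
        return;
    }
    int f = next_free_frame++;
    memcpy(ram[f], disk[vpage], PAGESIZE);      /* "read from disk" */
    page_table[vpage].present = 1;
    page_table[vpage].frame = f;
    printf("page fault: loaded page %d into frame %d\n", vpage, f);
}

/* Translate and read one byte, faulting the page in on demand. */
static char read_byte(int vpage, int offset)
{
    if (!page_table[vpage].present)
        page_fault(vpage);
    return ram[page_table[vpage].frame][offset];
}

int main(void)
{
    for (int p = 0; p < NPAGES; p++)            /* fill the backing store */
        memset(disk[p], 'A' + p, PAGESIZE);

    /* Touch only pages 0, 1 and 5: only those three are ever loaded. */
    printf("%c %c %c\n", read_byte(0, 0), read_byte(1, 3), read_byte(5, 7));
    printf("total page faults: %d\n", fault_count);
    return 0;
}
```

Pages 2, 3, 4, 6 and 7 never leave the backing store, which is exactly the memory-consumption argument from point 1.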

If the page is subsequently modified, another page fault will occur, allowing the operating system to set the M bit and change the page's mode to READ/WRITE.

Again, the reason for this is memory consumption. In the old days, when memory was scarce, swapping (moving pages back to disk when memory filled up) was how the OS gave you the illusion of a much larger working set of pages. If a page had already been written out to disk before and was never modified in between, you could discard it simply by clearing the present bit in its page table entry, freeing the frame it previously occupied for another page. The modified bit tells you whether you need to write a new version of the page back out to disk, or whether you can leave the old copy as is and swap it back in once it is needed again.
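
To make the role of the modified (dirty) bit concrete, here is a small illustrative sketch (the structure and function names are invented, not an actual kernel API) of the eviction decision it enables: a clean page can be dropped without any I/O, while a dirty one must be written back first.

```c
#include <stdbool.h>
#include <stdio.h>

struct pte {
    bool present;
    bool referenced;   /* R bit: set on first access                    */
    bool modified;     /* M bit: set on the first write after the       */
                       /* read-only fault described in the quoted text  */
    int  frame;
};

/* Stand-in for writing the frame's contents back to the swap area. */
static void write_back_to_disk(int frame)
{
    printf("frame %d: writing dirty page back to disk\n", frame);
}

/* Free a frame.  Only a dirty page costs a disk write; a clean page is
 * identical to its on-disk copy and can be discarded immediately. */
static void evict(struct pte *e)
{
    if (e->modified)
        write_back_to_disk(e->frame);
    else
        printf("frame %d: clean page, dropped without I/O\n", e->frame);
    e->present = false;
    e->referenced = false;
    e->modified = false;
}

int main(void)
{
    struct pte clean = { .present = true, .referenced = true,
                         .modified = false, .frame = 0 };
    struct pte dirty = { .present = true, .referenced = true,
                         .modified = true,  .frame = 1 };
    evict(&clean);     /* no disk write needed */
    evict(&dirty);     /* one disk write       */
    return 0;
}
```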

The method you mention, where a process is set up with all page table entries prepopulated (also known as pre-paging), is perfectly valid. You are trading memory consumption for speed. The page-table walk and the setting of the modified bit are implemented in hardware (on x86), so they do not perform that badly. Pre-paging, however, saves you from executing the page-fault handler, which, although usually heavily optimized, is implemented in software.
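
As a concrete illustration of that trade-off, Linux lets you choose between the two approaches per mapping: a plain mmap() is demand-paged, while the MAP_POPULATE flag asks the kernel to prefault the pages up front. The short sketch below uses /etc/hostname merely as an example of a readable, non-empty file; error handling is kept minimal.

```c
#define _GNU_SOURCE           /* for MAP_POPULATE on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);   /* any readable, non-empty file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Demand paging: pages are faulted in lazily on first access. */
    char *lazy = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* Pre-paging: MAP_POPULATE prefaults the mapping, paying the
     * start-up cost now in exchange for fewer faults later. */
    char *eager = mmap(NULL, st.st_size, PROT_READ,
                       MAP_PRIVATE | MAP_POPULATE, fd, 0);

    if (lazy == MAP_FAILED || eager == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touching the lazy mapping triggers a page fault on first access;
     * the eager mapping is already resident. */
    printf("first bytes: %c %c\n", lazy[0], eager[0]);

    munmap(lazy, st.st_size);
    munmap(eager, st.st_size);
    close(fd);
    return 0;
}
```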

Licensed under: CC-BY-SA with attribution