Segmented Memory vs Flat Memory

https://stackoverflow.com/questions/11074099

15-06-2021

Question

I just don't get it. Any manual is too technical. What are flat and segmented memory? Ways of addressing a memory, ways of organizing bytes in memory? Which of them is best for 32-bit computers? Can anybody explain it? What does real-mode and protected-mode have to do with flat or segmented memory? Thanks!

Solution

If you're only interested in applications running on existing 32/64-bit operating systems, you can simply forget segmented memory. On a 32-bit OS, you can assume that each process sees a 4 GB “flat” address space (part of which the OS reserves for the kernel). Flat means that you can manipulate addresses with 32-bit values and registers, as you would expect.

On 16-bit processors like the 8086, a physical address was 20 bits wide, and you couldn't store that in a 16-bit register. So you had to store a base (the segment) in one register, and to specify an actual address, you added an offset to that base: the segment was multiplied by 16 (shifted left 4 bits), then the offset was added to get the actual address. This means that through any one segment you could only address 64 KB at once; memory had to be “segmented” into 64 KB blocks.
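The 8086 translation just described fits in one line of C. This is a minimal sketch assuming an original 8086, where the 20-bit result wraps around at 1 MB (later CPUs with the A20 line enabled do not wrap); the function name `real_mode_phys` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: physical address of segment:offset on an 8086.
   The segment is shifted left 4 bits (multiplied by 16), the offset is
   added, and the result is truncated to 20 bits, mimicking the 1 MB wrap. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, `real_mode_phys(0xB800, 0x0000)` gives 0xB8000 (the start of the color text-mode video buffer), and `real_mode_phys(0xFFFF, 0x0010)` wraps to 0x00000 on an 8086. A fixed segment value can only reach offsets 0x0000 to 0xFFFF, which is the 64 KB window.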

To be honest, I think the only reason beginners still hear about that is that a lot of old 16-bit tutorials and books are still around. It's really not needed to understand how a program works at the assembly level. Now if you want to learn OS development, that's another story: since a PC starts up in 16-bit real mode, you will need to learn at least enough to switch into flat 32-bit protected mode.

Just noticed you also asked about real mode vs protected mode. Real mode is the mode that MS-DOS used. Any program had access to any hardware feature; for example, it was common to talk directly to the graphics card to print something. It didn't cause any problems because DOS wasn't a multitasking OS.
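In real mode, “printing” often meant writing straight into the video card's text buffer, which in 80x25 color text mode starts at physical address 0xB8000 (segment 0xB800). Each screen cell is two bytes: the character code followed by a color attribute. The sketch below assumes that layout but writes into an ordinary array standing in for the buffer, since a modern protected-mode OS won't let a normal program touch that memory; `put_char_at` and `text_buffer` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define COLS 80
#define ROWS 25

/* Stand-in for the real-mode text buffer at B800:0000 (physical 0xB8000).
   Under DOS a program could point a far pointer at the real thing. */
static uint8_t text_buffer[ROWS * COLS * 2];

/* Write one character with a color attribute at (row, col).
   Each cell occupies 2 bytes: the character, then the attribute. */
void put_char_at(int row, int col, char ch, uint8_t attr)
{
    size_t offset = ((size_t)row * COLS + (size_t)col) * 2;
    text_buffer[offset] = (uint8_t)ch;
    text_buffer[offset + 1] = attr;
}
```

Calling `put_char_at(0, 0, 'A', 0x07)` stores 'A' at offset 0 and the attribute 0x07 (light gray on black) at offset 1, which is exactly what a DOS program would have written to 0xB8000 and 0xB8001.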

But on any modern OS, normal programs don't access hardware directly; they don't even access physical memory directly. The OS manages the hardware and decides which process gets to run on the processor(s). It also manages a separate virtual address space for every process. These features rely on protected mode, which first appeared on the 286; the 32-bit protected mode with paging that modern OSes use came with the 386, the first 32-bit x86 processor for the PC.

OTHER TIPS

Instructions that access something using an address (memory, I/O, memory-mapped I/O, etc.) sometimes provide the complete address (from the perspective of that layer of processor execution), and sometimes they provide an offset. With near or relative jumps, for example, the program counter is the base address and the instruction provides an offset to that base; add the two together and you get the address (at that level).
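As a concrete case, here is how an x86 short jump (`JMP rel8`, opcode 0xEB followed by a signed 8-bit displacement) finds its target: the displacement is added to the address of the *next* instruction, not of the jump itself. This sketches the arithmetic only; `short_jump_target` is a hypothetical helper, and 16-bit IP wraparound within the segment is modeled with `uint16_t`.

```c
#include <stdint.h>

/* Target of a 2-byte short jump (opcode 0xEB, rel8) located at ip.
   The CPU adds the sign-extended displacement to the address of the
   following instruction (ip + 2); the 16-bit IP wraps within the segment. */
uint16_t short_jump_target(uint16_t ip, int8_t disp)
{
    uint16_t next_ip = (uint16_t)(ip + 2);
    return (uint16_t)(next_ip + disp);
}
```

So `short_jump_target(0x1000, 0x10)` is 0x1012, and `short_jump_target(0x1000, -2)` is 0x1000: the bytes EB FE are a jump to itself.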

Take a 16-bit system where you have 16-bit registers and a 64 KB maximum address space. A very simple way to expand that memory is to segment it. Instead of the register containing the entire address, the register in your instruction contains an offset to a base, much like a PC-relative instruction, except that in this case yet another register is used as the base address. You see this in a number of architectures that wanted to expand their address range easily without much, if any, modification to the core (it can be done in the memory controller without touching the core).

In the case of the x86 there were a few segment registers. One, the code segment (CS), extended the reach of execution: a non-PC-relative branch target was computed as the code segment shifted left 4 bits plus the offset specified in the instruction. Another, the data segment (DS), extended the reach of data accesses: for loads and stores that are not PC-relative, the address was the data segment shifted left 4 bits plus the register specified in the instruction. (The stack and extra segments, SS and ES, work the same way.) So if you want to address 0x12345, you could have the segment register contain 0x1234 and the register used for addressing contain 0x0005, or the segment 0x1000 and the general-purpose register contain 0x2345. PC-relative addressing is of course (segment << 4) + PC + offset.
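Because the segment and the offset overlap by 12 bits, many segment:offset pairs name the same physical byte. Here is a sketch under the 8086 rules above (the `seg_off` type and the `phys` and `normalize` helpers are hypothetical) that checks the two pairs from the example and produces the “normalized” form, in which the offset is reduced to 0x0 to 0xF:

```c
#include <stdint.h>

typedef struct {
    uint16_t segment;
    uint16_t offset;
} seg_off;

/* 8086 translation: segment * 16 + offset, truncated to 20 bits. */
uint32_t phys(seg_off a)
{
    return (((uint32_t)a.segment << 4) + a.offset) & 0xFFFFFu;
}

/* Normalized form of a 20-bit physical address: the segment takes the
   upper 16 bits and the offset keeps only the low 4 bits. */
seg_off normalize(uint32_t physical)
{
    seg_off r = { (uint16_t)((physical >> 4) & 0xFFFFu),
                  (uint16_t)(physical & 0xFu) };
    return r;
}
```

Both 0x1234:0x0005 and 0x1000:0x2345 land on 0x12345, and `normalize(0x12345)` yields 0x1234:0x0005. Normalizing gives each physical address one canonical pair, which makes comparing pointers straightforward.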

This led to the adoption of various memory models: tiny, small, medium, compact, large, huge. You can imagine that the smallest model, in the case of x86, assumes that everything is within a single 64 KB memory space, so the compiler and your code never have to worry about segment registers; they are assumed to stay fixed. For larger models, or when using a far pointer, reaching farther is no big deal: you set the data segment, then the data offset, and perform the load or store. For code it is a little harder, since as soon as you change the code segment register, it affects the overall address from which you are fetching instructions. You want a hardware solution that lets a branch modify both segment and offset at once (x86 has far jumps and calls for this), or you could do it in code if the hardware allowed. I won't confuse you with that one for now.
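One practical difference between the large and huge models is pointer arithmetic. In the large model, a far pointer is typically incremented by changing only its offset, so it silently wraps within its 64 KB segment; in the huge model the compiler renormalizes segment:offset so a single object can span more than 64 KB. The sketch below simulates both in portable C; the `farptr` type and the function names are hypothetical, not any compiler's actual runtime.

```c
#include <stdint.h>

typedef struct {
    uint16_t seg;
    uint16_t off;
} farptr;

/* Physical address on an 8086: seg * 16 + off, truncated to 20 bits. */
uint32_t linear(farptr p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Large-model style: only the 16-bit offset changes, so it wraps at 64 KB. */
farptr far_add(farptr p, uint16_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Huge-model style: compute the full address, then renormalize so the
   segment absorbs the carry and the offset stays in 0x0..0xF. */
farptr huge_add(farptr p, uint32_t n)
{
    uint32_t lin = (linear(p) + n) & 0xFFFFFu;
    farptr r = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xFu) };
    return r;
}
```

Starting at 0x2000:0xFFF0 and adding 0x20, `far_add` gives 0x2000:0x0010 (physical 0x20010, back near the start of the segment), while `huge_add` gives 0x3001:0x0000 (physical 0x30010, the byte you actually wanted). The extra normalization work on every pointer operation is why huge pointers were slower.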

Whenever you have an array in code:

unsigned char abc[123];

That is basically the same idea. The base address, where the array starts in memory, is like your segment, and the index is your offset. If abc above were at address 0x1004, then abc[5] would be at address 0x1004 + 5 = 0x1009. It's not shifted like x86 segment:offset addressing, but it's the same concept of adding a base and an offset. On some segmented architectures there is no addition at all: some bits in some register somewhere simply become the upper address bits. Take address 0x12345 on one of these systems: 0x1 has to be in the segment and 0x2345 in the 16-bit general-purpose register. You can think of it as a shift and add if you want, but unlike x86 segment:offset, you can also think of it as a shift and OR.
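Both ideas in that paragraph, base plus index for arrays and “upper bits from a register” for bank-style segmentation, can be checked in plain C. The `element_addr` and `banked_addr` helpers below are hypothetical; the latter assumes a 16-bit offset with the bank supplying everything above bit 15. Because the two fields never overlap, OR and ADD give the same result.

```c
#include <stddef.h>
#include <stdint.h>

unsigned char abc[123];

/* Array indexing is base + offset: element i lives i bytes past the base. */
unsigned char *element_addr(unsigned char *base, size_t index)
{
    return base + index;
}

/* Bank-style segmentation: the bank register supplies the upper bits and
   the 16-bit offset supplies the lower bits; no overlap, so OR == ADD. */
uint32_t banked_addr(uint16_t bank, uint16_t offset)
{
    return ((uint32_t)bank << 16) | offset;
}
```

`element_addr(abc, 5)` is the same pointer as `&abc[5]`, and `banked_addr(0x1, 0x2345)` is 0x12345, matching the example above.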

Flat memory space is a bit of an illusion, especially on x86 systems. x86 computers, 32-bit and even many 64-bit ones, reserve a region of the address space below 4 GB for plug-in cards (commonly up to around 1 GB; the exact size depends on the chipset and installed devices). That makes a lot of sense for a 32-bit system where you have a 4 GB address space total, and it is why some of these systems give you a 3 GB limit, or give you the illusion of 4 GB but have chopped out some of it for the plug-in cards (many of your on-motherboard devices live in this space as well as the actual plug-in cards).

Depending on the video card, resolution, etc., you sometimes cannot fit the entire frame buffer into your share of that peripheral space, so you have to segment your access. The BIOS may have given you address 0x80000000 as the base in the x86 address space, and then in some other register in the video card you specify the address within the video card's address space. For demonstration purposes, let's say you were given a 16 MB window at x86 address 0x80000000; 16 MB is 0x01000000. If you wanted to access address 0x04321888 in video memory, you can imagine having to set a segment register in the video card to 0x04, then in the x86 address space (which is also the PCI(e) address space) use address 0x80321888.
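The video-window arithmetic from that example can be written out as a sketch. The window base (0x80000000) and size (16 MB) are the example's illustrative values, and the `window_access` type and `map_video_addr` function are hypothetical; a real card exposes its bank-select register however its documentation says, and here the bank is just a number the function returns.

```c
#include <stdint.h>

#define WINDOW_BASE 0x80000000u /* CPU/PCI address the BIOS assigned (example value) */
#define WINDOW_SIZE 0x01000000u /* 16 MB window (example value) */

typedef struct {
    uint32_t bank;     /* value for the card's segment/bank register */
    uint32_t cpu_addr; /* address the CPU actually uses */
} window_access;

/* Split a video-memory address into a bank number (the bits above the
   window size) and an address inside the CPU-visible window. */
window_access map_video_addr(uint32_t video_addr)
{
    window_access w;
    w.bank = video_addr / WINDOW_SIZE;
    w.cpu_addr = WINDOW_BASE + (video_addr % WINDOW_SIZE);
    return w;
}
```

For video address 0x04321888 this yields bank 0x04 and CPU address 0x80321888: the same “some bits from here, some bits from there” split as real-mode segmentation, just at a larger scale.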

The bottom line here is: take some bits from here and some bits from there, put them together, and that is the address at the target. When dealing with peripherals, be it a video card, the on-board I/O controllers, or a PCI or PCIe controller, you have to learn to think in terms of the target's address space. The processor has an address space from your program's perspective. The MMU can and does scramble that into a physical address space; then you have your PCIe address space, and peripherals accessed through PCIe have their own address spaces. What Intel and the Intel-based PC world did is make the processor's physical address space and the PCIe address space the same. The virtual-to-physical scrambling in the MMU is still there, and the window into the peripheral's address space is still there; you still need to take a little bit of address from here and a little from there to get the final address at whatever target.

Real versus protected mode has to do with access. In C, for example, you can create pointers, change them, and create whatever address you want; wouldn't that imply that you can poke around in another application's memory, or the kernel's memory? Ideally you don't want to let that happen, so while the processor is executing instructions for an application, that application is in a bit of a virtual machine: every memory access, be it code or data, goes through a filter, if you will. That filter checks whether the access is within the program's allowed space. If it goes outside that space, an exception happens (think interrupt). That exception lets the kernel (which doesn't have these restrictions, or has different ones) decide to allow the access, perhaps virtualize an access to something, or report an error to the user (general protection fault).

Take, for example, an actual virtual machine program like VMware. It allows the virtualized program to actually run instructions on the processor; when that virtualized program accesses what it thinks is the address of a video card, a protection fault happens, and the VMware driver/application (think kernel level) takes that address, fakes the video card's response, and returns control to the program. Letting instructions execute on the metal allows much faster virtualization than the alternative, which is to simulate every processor instruction.

That is the extreme case. Even the web browser you are reading this in has been virtualized, in the sense that it thinks it has a memory space based at some address like 0x0000 or 0x8000. You compile each program for a specific OS against the same flat virtual memory layout, and the operating system takes care of translating addresses from virtual to physical. Your web browser's access to its address 0x8000 might go to physical 0x12345678, and your mp3 player's access to 0x8000 might go to physical 0x2345678, but in both applications the instructions are computing 0x8000.
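In its simplest form, the “filter” can be modeled as a base and a limit per process, roughly what a protected-mode segment descriptor holds (modern OSes rely mostly on paging instead, which applies the same idea page by page). In this sketch the bases are chosen so that the browser and mp3 player numbers from the paragraph come out; the `process_space` type, the 1 MB limit, and the `translate` function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;  /* where this process's space starts in physical memory */
    uint32_t limit; /* size of the allowed space in bytes */
} process_space;

/* Translate a virtual address; returns false (think: protection fault)
   if the access falls outside the process's allowed space. */
bool translate(const process_space *p, uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr >= p->limit)
        return false; /* the kernel would get an exception here */
    *paddr = p->base + vaddr;
    return true;
}
```

With a browser base of 0x1233D678 and an mp3 player base of 0x0233D678, both programs compute virtual address 0x8000 but land at physical 0x12345678 and 0x02345678 respectively, and an access at 0x200000 is rejected instead of reaching someone else's memory.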

Asking what is best is always relative; one person's best is another person's worst. You have to define best and worst for yourself. Momentum and public opinion drove x86 into a flat memory space, or at least the illusion of one from the programmer's perspective, so you will have less trouble going with the flow.

I recommend getting a copy of the 8086/8088 programmer's and hardware reference manual; one can be had for a few bucks.

http://www.amazon.com/Manual-Programmers-Hardware-Reference-240487-001/dp/1555120814/ref=sr_1_1?ie=UTF8&qid=1340000636&sr=8-1&keywords=8088+programmers+reference

Take a simulator like pcemu (I have a clone for this purpose: http://github.com/dwelch67/pcemu_samples) and play with the instruction set, old school, well before virtualization, protection, etc. Back then you had segments and offsets computed as mentioned above: shift the segment left four bits and add the offset (which is all described in this manual). Every generation since has tried to improve on that while staying backward compatible, which has certainly helped profits but turned the processors into nasty beasts.

You are better off forgetting about x86 details and learning some cleaner systems. Because of how x86 has evolved, you are going to see minimal gains from, for example, writing something in asm instead of compiling it. Processors from different families execute the same code at different speeds, and code hand-tuned for one generation can run relatively poorly on the next. You cannot hand-tune something to be fast on all platforms, not any faster than the compiler can, so just leave x86 asm to the compilers. Work on sane platforms where you don't have these issues, where you can tune if you like, or make a better compiler, etc.

Licensed under: CC-BY-SA with attribution
