So I've written a small RTOS before for school, but we just used the memory map that the bootloader (RedBoot) left us with. The board (a TS-7200) had 4x 8MB RAM segments, but the bootloader had re-mapped them to be contiguous starting at address 0. I'm basically wondering what mechanisms made that possible, as I'm now looking at writing a lighter-weight firmware for the PX4FMU ARM-based autopilot and would like to know what's going on with my memory so I can figure out my linker script as well as memory protection.

Did the bootloader turn on the MMU and set up the page table so that addresses 0-32MB are translated to the correct physical addresses? If so, I thought that translation didn't happen automatically in supervisor mode, which is what the bootloader leaves you in.

I suspect this is not the case, and that the bootloader did some GPMC incantation to change how the memory chips are addressed. I read something about peripherals not being re-mappable, which would fit with this theory. If this is the case, could somebody give me a quick overview of how it works and which addresses end up meaning what? I've seen things in SoC user guides about "bus addresses" before; how do these relate or translate to addresses in the various types of memory?


Solution

So there is the processor, then the MMU, then the L1 cache, and then the edge of the "processor core", even though what is buried under those early memory-system stages is itself a core too, just deeper in.

When the "processor" accesses some address, that address is the one the programmer is directly manipulating: the value you have in the register used to hold the address for a load or store, plus any offset you encode in the instruction.
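
As a tiny, hedged illustration in C (the instruction in the comment is just an example of what a compiler might emit, not output from any particular toolchain):

```c
#include <stdint.h>

/* The address this load issues is just the pointer value plus the offset the
 * compiler encodes in the instruction (something like ldr r0,[r0,#4]).  If
 * the MMU is on, that sum is a virtual address; if not, it already is the
 * physical address heading out toward the memory system. */
uint32_t read_second_word(const uint32_t *base)
{
    return base[1];
}
```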

If the MMU is enabled, it takes some bits from that address, does some math based on a table-base register and how the MMU is configured, and generates its own memory cycles on the memory side of the MMU (there is a processor side and a memory side). Those cycles look up the MMU table information the user programmed, which is how things like changing a virtual address into a physical one happen. Once the MMU has completed however many memory cycles it needs to gather the data (note the MMU may have a small cache of prior lookups, a TLB, to save on actual memory cycles), then, so long as there is no fault (the address being accessed is described in the tables, the permissions allow you to access that memory, and the table lookups themselves didn't fault), the access the processor wanted to do is performed using the physical address.
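
To make the "some bits, some math, a table lookup" part concrete, here is a rough C model of the ARMv7-A short-descriptor case with 1MB section entries. The names are mine, and permissions, domains, supersections, second-level tables, and the TLB are all ignored; it only shows the index-and-splice arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTION_DESC 0x2u                      /* descriptor bits [1:0] == 0b10 */

typedef struct {
    const uint32_t *ttbr;                      /* 4096-entry first-level table  */
} mmu_model_t;

/* Returns true and fills *pa when the virtual address hits a valid section. */
bool translate(const mmu_model_t *mmu, uint32_t va, uint32_t *pa)
{
    uint32_t index = va >> 20;                 /* top 12 bits index the table   */
    uint32_t desc  = mmu->ttbr[index];         /* this read is itself a memory
                                                  cycle on the memory side      */

    if ((desc & 0x3u) != SECTION_DESC)
        return false;                          /* would raise a translation fault */

    /* physical = section base from the descriptor, low 20 bits pass through */
    *pa = (desc & 0xFFF00000u) | (va & 0x000FFFFFu);
    return true;
}
```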

If the MMU is disabled, the processor's access goes straight into the L1 cache and then off into the memory system via the AXI or AMBA bus. The L2 cache, if there is one, lives on that AMBA/AXI bus.

Once you get to the AMBA/AXI bus you are into the vendor logic, whoever made the chip (ARM doesn't make chips, it makes processor cores; some vendor wraps that core with their own logic and then makes and sells chips). That vendor memory system can be anywhere from very simple to very complicated. You could, for example, have configurable settings such that some address space, say around zero, points at a ROM at one point (on power-up, for example), and then, after you change a setting, accesses at or near zero go to some RAM instead. You can have logic that manipulates the whole address space: say the top two bits of the address go into some logic that has four sets of control registers, and for each quarter of the address space those control registers may do things to the address, not unlike an MMU.
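
As a toy illustration only, with invented regions and no resemblance to any particular chip, that kind of vendor-side decode amounts to something like:

```c
#include <stdint.h>

/* Top two address bits pick one of four 1GB quadrants; a "remap" control bit
 * decides whether accesses near zero hit the boot ROM or the on-chip SRAM. */
enum target { BOOT_ROM, SRAM, DRAM, PERIPHERALS };

enum target decode(uint32_t addr, int remap_bit)
{
    switch (addr >> 30) {
    case 0:  return remap_bit ? SRAM : BOOT_ROM;  /* ROM at reset, RAM after remap */
    case 1:  return DRAM;
    case 2:  return DRAM;
    default: return PERIPHERALS;
    }
}
```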

Eventually the vendor logic decodes more of the address bits and determines whether you are trying to get to actual RAM, ROM, or peripherals, and as the access gets closer to its final target, say a CSR within a peripheral, the address bits are decoded further still. The address space of any processor can be (isn't always, but can be) like a tree: the trunk is where the address leaves the processor, and as the address is parsed it branches off in different directions until it reaches the individual leaf you were trying to address, be it a memory location in RAM, a CSR in some peripheral, or some RAM or other item inside a peripheral (which could itself kick off some other chain of events on another bus, say USB or PCIe).
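
From firmware, the leaf end of that tree is just a fixed physical address you touch through a volatile pointer. The base address, register offsets, and flag bit below are made up for illustration; the real values come from your chip's reference manual.

```c
#include <stdint.h>

/* Hypothetical UART: base, offsets, and busy bit are placeholders. */
#define UART0_BASE 0x4000C000u
#define UART0_DR   (*(volatile uint32_t *)(UART0_BASE + 0x00))   /* data reg  */
#define UART0_FR   (*(volatile uint32_t *)(UART0_BASE + 0x18))   /* flag reg  */

void uart_putc(char c)
{
    while (UART0_FR & (1u << 5))   /* spin while the transmit FIFO is full    */
        ;
    UART0_DR = (uint32_t)c;        /* this store lands on one leaf CSR        */
}
```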

So after saying all of that, the short answer is: first run without the MMU and caches, and understand the "physical" address space on the vendor side of the chip, which you have to understand with or without the MMU anyway. That address space is very specific to the vendor and probably to that chip or chip family, so you will need the chip vendor's documentation. Then later learn how to use the MMU. I would first recommend trying it with the virtual addresses equal to the physical ones; learn to mark RAM as cacheable and the peripheral address space as non-cacheable (then turn on the data cache and see if it worked). Then learn to add blocks of virtual address space that point at different physical addresses, and from there you are ready to start using the MMU for an operating system.
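
As a sketch of that first MMU step, assuming an ARMv7-A core using the short-descriptor format and placeholder region parameters, building an identity-mapped level-1 table of 1MB sections might look roughly like this; the actual enable sequence (TTBR0, DACR, SCTLR, barriers, cache maintenance) is deliberately left out.

```c
#include <stdint.h>

/* RAM gets C=1,B=1 (normal, cacheable); everything else is left
 * strongly-ordered.  The RAM window is a parameter because it is
 * chip-specific. */
#define SECTION   0x2u             /* descriptor type bits [1:0] = 0b10       */
#define AP_RW     (0x3u << 10)     /* AP[1:0] = 0b11: full read/write         */
#define CACHE_BUF 0xCu             /* C=1, B=1: cacheable, bufferable         */

static uint32_t l1_table[4096] __attribute__((aligned(16384)));

void build_identity_map(uint32_t ram_start, uint32_t ram_size)
{
    for (uint32_t i = 0; i < 4096; i++) {
        uint32_t base  = i << 20;              /* VA == PA for every entry     */
        uint32_t attrs = SECTION | AP_RW;

        if (base >= ram_start && base - ram_start < ram_size)
            attrs |= CACHE_BUF;                /* only RAM is marked cacheable */

        l1_table[i] = base | attrs;
    }
}
```

Whether domain access control, TEX remap, or LPAE descriptors apply depends on the core and how you configure it, so treat the attribute bits above as a starting point to check against the ARM ARM for your core.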

The ARM processor itself doesn't know a peripheral from a RAM from a ROM from a hole in the wall. The address bits are just bits, patterns it generally doesn't care about, with some exceptions on some ARM architectures where there are peripherals inside the ARM itself which it does decode and doesn't let you pass further down the line. But the vendor's peripherals, RAM, ROM, and so on the ARM does not know or care about, so those address spaces can have virtual addresses different from their physical ones. What you don't want to do is make the peripherals' address space (data) cacheable. Some newer ARM cores also have rules about branch prediction and the limits within which speculative fetching might occur, so if you have that fine-grained flexibility in your setup, and you have peripherals whose registers have side effects on access (clear-on-read, auto-incrementing an internal address, etc., which are bad design choices in modern systems anyway), you want to avoid creating a situation where a branch-prediction instruction fetch might cause a read of one of those locations.

Another problem you may face, which RedBoot may or may not have already solved for you, is DRAM, or even SRAM for that matter: if the RAM is outside the processor and requires training or tuning, and that training or tuning was done for you by RedBoot or a pre-RedBoot bootloader, then you really want to just let RedBoot do that and then have RedBoot load and start your RTOS. Initializing the physical memory can be (not always, but can be) a learning experience on par with writing your own RTOS from scratch, so on a number of these Linux-capable systems you need to decide how much you want to re-invent and why. You will have plenty of work just getting something going with no MMU and a flat, physical-address-based memory space. Build one bridge at a time, don't try to build them all at once; wait until you get to the next obstacle before you try to tackle it.
