Since your main concern seems to be learning about microcontroller design, a good approach could be taking a look into some of the earlier microprocessor models. Take for instance the Z80:
- Source: http://landley.net/history/mirror/cpm/z80.html
- Another good Z80 HW description: http://www.msxarchive.nl/pub/msx/mirrors/msx2.com/zaks/z80prg02.htm
To answer your first question (single vs. multiple buses), this chip uses a single bus for everything, and it has a very simple design. You could probably use something similar. To make the terminology clear, a single system bus may be composed of sub-buses (and they are also called buses). The figure shows a system bus composed of a bidirection data bus (8-bit wide) and an address bus (16-bit wide).
To answer your second question (how do components know when they are active), in the image above you see two distinct signals, memory request and I/O request. Only one will be active at a time, and when I/O request is active, that's when a peripheral could potentially be accessed.
If you don't have many peripherals, you don't need to use all 16 address lines (some Z80's have an 8-bit I/O space). Each peripheral would be accessed through some addresses in this space. For instance, in a very simple system:
- a timer peripheral could use addresses from 00h to 03h
- a uart could addresses from 08h to 0Fh
In this simple example, you need to provide two circuits: one would detect when the address is within the range 00-03h, and another would do the same for 08-0Fh. If you do a logic "and" between the output of each detector and the I/O request signal, then you would have two signals indicating when each of the peripherals is being accessed. Your peripheral hardware should primarily listen to this signal.
Finally, regarding your question about instructions, the dataflow inside your microprocessor would have several stages. This is usually called a processor's datapath. It is common to divide the stages into:
- FETCH: read an instruction from program memory
- DECODE: check specific bits within the instructions, and decide what type of instruction it is
- EXECUTE: take the actions required by the instruction (e.g., ALU operations)
- MEMORY: for some instructions, you need to do a data read or write
- WRITE BACK: update your CPU registers with new values affected by the instruction
Source: https://www.cs.umd.edu/class/fall2001/cmsc411/projects/DLX/proj.html
Most of your job of dealing with individual instructions would be done in the DECODE and EXECUTE stages. As for the datapath control, you will need a state machine that controls the sequence of operations through the 5 stages. This functional block is usually called a Control Unit. Here you have a few choices:
- Your state machine could go throgh all stages sequentially, one at a time. An instruction would take several clock cycles to execute.
- Similar as the choice above, but combining two or more stages in a single cycle if you want to make things simpler and faster.
- Pipeline the execution of instructions. This can give a great speed boost, but maybe it's better left for later because things can get quite complex.
As for the implementation, I recommend keeping the functional blocks as separate entities, and make sure you write a testbench for each block. Your job will go faster if you write those testbenches.
As for the blocks, the Register File is pretty easy to code. The Instruction Decoder is also easy if you have a clear idea of your instruction layout and opcodes. And the ALU is also easy if you know the operations it needs to perform.
I would start by writing testbenches for the Instruction Decoder and the Register File. Then I would write a script that runs all the testbenches and checks their results automatically. Only then I would focus on the implementation of the functional blocks themselves.