Program and Data share RAM and have different word lengths


One solution is quite simple: buffering. If the memory interface is 64 bits wide (with no caches), the first part of an instruction that crosses a 64-bit boundary can be placed in a small buffer while fetching continues until the full instruction is in the buffer. Because instruction fetch tends to be sequential (i.e., apart from taken branches, jumps, function calls, and returns, the next instruction starts immediately after the preceding instruction ends), any excess bits fetched into the buffer are likely to be used by the next instruction to be executed.
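A minimal Python sketch of such a buffer follows. It rests on a few assumptions. Memory is a flat byte array read in aligned 8-byte (64-bit) blocks. The hypothetical `insn_length` rule, where the low two bits of an instruction's first byte select a length of 2, 4, 6, or 8 bytes, stands in for a real ISA's length decoding. `FetchBuffer` and its methods are illustrative names, not taken from any real design.

```python
FETCH_BYTES = 8  # 64-bit memory interface, no cache


def insn_length(first_byte: int) -> int:
    # Hypothetical encoding: low two bits of the first byte give 2, 4, 6 or 8 bytes.
    return 2 * ((first_byte & 0x3) + 1)


class FetchBuffer:
    """Assembles variable-length instructions from aligned 64-bit fetches."""

    def __init__(self, memory: bytes, start: int = 0):
        self.memory = memory
        self.buf = bytearray()
        self.next_fetch = start - start % FETCH_BYTES  # aligned block holding start
        self.fetches = 0
        self._fetch()
        del self.buf[: start % FETCH_BYTES]  # drop bytes before the start address

    def _fetch(self) -> None:
        # One aligned 64-bit memory access, appended to the buffer.
        block = self.memory[self.next_fetch:self.next_fetch + FETCH_BYTES]
        if not block:
            raise EOFError("fetch past end of memory")
        self.buf += block
        self.next_fetch += FETCH_BYTES
        self.fetches += 1

    def next_insn(self) -> bytes:
        if not self.buf:
            self._fetch()
        need = insn_length(self.buf[0])
        while len(self.buf) < need:
            # The instruction crosses a 64-bit boundary: keep fetching.
            self._fetch()
        insn = bytes(self.buf[:need])
        del self.buf[:need]  # excess bytes stay buffered for the next instruction
        return insn
```

After an instruction is completed, the leftover bytes from its last block remain in `buf`, so a sequential stream often needs no new fetch for the next instruction. On a taken branch, this model would simply discard the buffer and construct a new `FetchBuffer` at the target address.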

(A similar problem occurs even with fixed-length 32-bit instructions, as in a classic RISC, when an instruction cache is expected to deliver several instructions per cycle. If one sought to fetch four 32-bit instructions per cycle, a 128-bit wide interface might be used; but if a branch jumped to the second instruction in a 128-bit block, only three instructions could be fetched that cycle. One solution to this problem is a self-aligned cache, which fetches two consecutive 128-bit blocks so that the needed bits are guaranteed to be available. This is typically implemented with banking: sequential blocks always map to an even-odd or odd-even pair of banks, so the two reads never cause a bank conflict. Obviously, a self-aligned cache could also be used to reduce the impact of the variable starting position associated with variable-length instructions.)
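The odd/even banking argument can be checked with a small Python model. The model assumes 16-byte (128-bit) blocks, with block *k* stored in bank `k % 2`, and fixed 4-byte instructions. It also assumes that the cache image always contains the block following the one being fetched. `split_banks`, `aligned_fetch`, and `self_aligned_fetch` are hypothetical helpers for illustration, not a real cache interface.

```python
BLOCK = 16             # 128-bit fetch block, in bytes
INSN = 4               # fixed-length 32-bit instructions
WIDTH = BLOCK // INSN  # four instructions wanted per cycle


def aligned_fetch(icache: bytes, addr: int) -> list[bytes]:
    # Plain aligned fetch: only the instructions from addr to the end of its block.
    base = addr - addr % BLOCK
    block = icache[base:base + BLOCK]
    return [block[i:i + INSN] for i in range(addr % BLOCK, BLOCK, INSN)]


def split_banks(icache: bytes) -> tuple[list[bytes], list[bytes]]:
    # Even-numbered blocks go to one bank, odd-numbered blocks to the other.
    blocks = [icache[i:i + BLOCK] for i in range(0, len(icache), BLOCK)]
    return blocks[0::2], blocks[1::2]


def self_aligned_fetch(even: list[bytes], odd: list[bytes], addr: int) -> list[bytes]:
    b = addr // BLOCK
    # Blocks b and b+1 have opposite parity, so one is read from each bank
    # in the same cycle and the two reads never conflict.
    if b % 2 == 0:
        first, second = even[b // 2], odd[b // 2]
    else:
        first, second = odd[b // 2], even[(b + 1) // 2]
    window = first + second  # 256 bits, always covering addr .. addr+15
    off = addr % BLOCK
    return [window[off + i * INSN: off + (i + 1) * INSN] for i in range(WIDTH)]
```

Consider a branch to byte 20, the second instruction of block 1. `aligned_fetch` yields three instructions there, while `self_aligned_fetch` yields four, with the fourth coming from block 2 in the other bank.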

With an aligned memory interface (almost guaranteed), the same type of problem can occur for unaligned loads, where the first portion of the datum resides in one aligned access and the remainder resides in the next.
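The two-access case can be sketched in Python. The sketch assumes little-endian byte order, memory modeled as a list of 64-bit integers, and a 32-bit load. `load_u32` is a hypothetical helper that also reports how many aligned accesses it needed.

```python
WORD = 8  # bytes per aligned 64-bit memory access
MASK32 = 0xFFFF_FFFF


def load_u32(words: list[int], addr: int) -> tuple[int, int]:
    """Little-endian 32-bit load; returns (value, aligned accesses used)."""
    w, off = divmod(addr, WORD)
    first = words[w]
    if off + 4 <= WORD:
        # Entirely inside one aligned word: a single access suffices.
        return (first >> (8 * off)) & MASK32, 1
    # Straddles a boundary: low bytes come from the end of the first word,
    # high bytes from the start of the next word.
    second = words[w + 1]
    value = (first >> (8 * off)) | (second << (8 * (WORD - off)))
    return value & MASK32, 2
```

A design that simply does not support unaligned loads never needs the second access or the shift-and-merge logic.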

Such considerations were part of the motivation behind the common RISC design choices of not supporting unaligned loads and using fixed-length instructions. Avoiding accesses that cross memory access boundaries can substantially simplify the design of a simple pipelined processor. (Variable-length instructions can also be used without the boundary-crossing issue if no instruction crosses a fetch boundary. The CDC 6600 was an example of this; it used a 60-bit fetch block and 15-bit and 30-bit instructions but did not allow an instruction to cross a fetch block boundary. The Mitsubishi/Renesas M32R is another example: it has 16-bit and 32-bit instructions but does not allow a 32-bit instruction to be unaligned, so simple aligned 32-bit fetch can be used.)
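The CDC 6600 rule can be illustrated with a small Python packer. The packer assumes that a 60-bit word holds four 15-bit parcels. Each instruction is given as a `(name, parcels)` pair, with 1 parcel for a 15-bit instruction and 2 for a 30-bit one. Leftover space is filled with a 15-bit no-op, written here under the hypothetical name `"nop"`. This illustrates the constraint only; it is not a model of the actual CDC assembler.

```python
PARCELS_PER_WORD = 4  # one 60-bit word = four 15-bit parcels
NOP = "nop"           # hypothetical 15-bit no-op used as padding


def pack(instructions: list[tuple[str, int]]) -> list[list[str]]:
    """Pack (name, parcels) instructions so none crosses a 60-bit word boundary."""
    words: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in instructions:
        if used + size > PARCELS_PER_WORD:
            # A 30-bit instruction would straddle two words: pad and start a new word.
            current += [NOP] * (PARCELS_PER_WORD - used)
            words.append(current)
            current, used = [], 0
        current.append(name)
        used += size
        if used == PARCELS_PER_WORD:
            words.append(current)
            current, used = [], 0
    if current:
        # Fill the tail of the final word.
        current += [NOP] * (PARCELS_PER_WORD - used)
        words.append(current)
    return words
```

Because every instruction then lies within one word, an aligned 60-bit fetch always delivers whole instructions. The cost is an occasional padding parcel. The M32R restriction applies the same idea at 32-bit granularity: a 32-bit instruction may only start on a 32-bit boundary.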