Read odd addresses, half words?

Question 1

Cell's SPEs only have 16-byte quad-word load/store, and those must be aligned on 16-byte boundaries.

If you need to address at finer granularity, you have to read-modify-write, and use bit-masks to only update the relevant parts of the data.

Obviously in C/C++ the compiler will assist with that, and there is support in the instruction set for generating and using masks.

For your example of reading 16 bits from address '2', you would have to read 128 bits from address '0' and extract the 16 bits you need. If you wanted to write 16 bits to address '2', you would need to read all 128 bits first, then update the appropriate 16 bits, and write the whole lot back.
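To make the read-modify-write sequence concrete, here is a minimal sketch in portable C that simulates it on a plain byte array. It is not SPE intrinsics code: `load_u16`, `store_u16`, and `QWORD` are hypothetical names, each `memcpy` of 16 bytes stands in for one aligned quad-word load or store, and the value is assumed not to straddle a 16-byte boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QWORD 16u /* width of the only available load/store, in bytes */

/* Read a 16-bit value at byte address addr using only aligned
   16-byte accesses (hypothetical helper, for illustration). */
uint16_t load_u16(const uint8_t *mem, size_t addr)
{
    uint8_t q[QWORD];
    size_t base = addr & ~(size_t)(QWORD - 1); /* round down to 16 bytes */
    size_t off = addr - base;

    assert(off + 2 <= QWORD);      /* value must not cross a quad-word */
    memcpy(q, mem + base, QWORD);  /* the one and only load */
    /* Extract the two bytes we want; big-endian order, as on Cell. */
    return (uint16_t)((q[off] << 8) | q[off + 1]);
}

/* Write a 16-bit value at byte address addr: read, modify, write. */
void store_u16(uint8_t *mem, size_t addr, uint16_t v)
{
    uint8_t q[QWORD];
    size_t base = addr & ~(size_t)(QWORD - 1);
    size_t off = addr - base;

    assert(off + 2 <= QWORD);
    memcpy(q, mem + base, QWORD);      /* read the whole quad-word */
    q[off] = (uint8_t)(v >> 8);        /* modify only the 16 bits */
    q[off + 1] = (uint8_t)(v & 0xFF);
    memcpy(mem + base, q, QWORD);      /* write the whole lot back */
}
```

With this sketch, `load_u16(mem, 2)` touches bytes 0 through 15 of `mem` and returns the value held in bytes 2 and 3.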

Question 2

Some of the early IBM Power processors could only read/write on item-size boundaries, but would handle unaligned data with a trap (exception) handler. (And I think there was even one early version that would say "OK" and silently give you the contents of the aligned word, ignoring the low-order bits of your address.)

Pretty sure the old IBM 7000 series boxes could only read/write a full word on a 36-bit boundary (the word size), simply because there was no concept of an address more granular than that. But I believe they had read/write low/high halfword operations.

The HP 2100 series processors, IIRC, only had word (16 bit) addresses, but could do byte indexing. However, the index was only interpreted as a byte index for byte ops -- otherwise it was a word index.

In terms of aligning mallocs, though, you generally should align on a cache line boundary. Otherwise it's hard to prevent cache thrashing in an MP (multiprocessor) environment.
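As a rough sketch of cache-line-aligned allocation, assuming a C11 library that provides `aligned_alloc` and a 64-byte cache line (common, but not universal; check your target): `alloc_cache_aligned` and `CACHE_LINE` are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64u /* assumption: cache line size in bytes */

/* aligned_alloc needs a valid alignment; a power of two is a safe choice. */
static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "CACHE_LINE must be a power of two");

/* Allocate at least size bytes starting on a cache line boundary.
   Returns NULL on failure; release the block with free(). */
void *alloc_cache_aligned(size_t size)
{
    if (size > SIZE_MAX - (CACHE_LINE - 1))
        return NULL; /* rounding up would overflow */

    /* C11 requires the size to be a multiple of the alignment. */
    size_t rounded = (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded == 0)
        rounded = CACHE_LINE;

    return aligned_alloc(CACHE_LINE, rounded);
}
```

Rounding the size up to whole cache lines also means two separately allocated blocks never share a line.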

Question 3

If you can address a 16-bit quantity, then you can definitely read 16-bit aligned quantities. I think you are probably assuming that you will have a byte addressable address space. You may not, so caution is advised. It is definitely conceivable that some architectures (particularly embedded ones) may not be byte or even 16-bit addressable -- although I don't know specific (and current) examples.

Does that actually matter? If you happen to have a machine that is word addressable, with a 32-bit addressable word size, then you could never actually address only 16 bits anyway. Be careful with sizeof, though: it counts in units of char, and a char is not necessarily 8 bits wide (check CHAR_BIT).

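You asked about the amd64 (x86-64). Ordinary loads and stores there have no alignment restriction (some SIMD instructions, such as MOVAPS, do require aligned operands), but you may lose cycles on a misaligned access. Keep in mind that code relying on misaligned accesses is never going to be portable.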
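If you do need to read a 16-bit value from an arbitrary address in C, a portable approach is to copy the bytes with `memcpy` rather than dereference a cast pointer. A minimal sketch; `read_u16_unaligned` is a hypothetical name, and the result is in the host's byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a uint16_t from any address, aligned or not.
   memcpy places no alignment requirement on its arguments, so this
   avoids the misaligned access a cast-and-dereference would perform. */
uint16_t read_u16_unaligned(const void *p)
{
    uint16_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v); /* byte-wise copy, host byte order */
    return v;
}
```

On targets that allow misaligned loads, compilers typically turn this into a single load instruction, so the portable form usually costs nothing there.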

UPDATE: What is an aligned address?

An aligned address of type T is any address that is a multiple of sizeof(T), where sizeof(T) is the number of addressable units the value occupies. For example, if you have a 32-bit word size in a byte addressable space, the aligned addresses are at least every multiple of 4. However, if the machine is addressable in 16-bit units, then every address that is a multiple of 2 will be an aligned address for 32-bit quantities.
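The definition can be written as a one-line check. This is only a sketch: `is_aligned` is a made-up helper, and both arguments are measured in addressable units of the machine being described, which is not necessarily bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is an aligned address for a value that occupies
   size_in_units addressable units (the sizeof(T) of the text). */
bool is_aligned(uintptr_t addr, size_t size_in_units)
{
    assert(size_in_units != 0);
    return addr % size_in_units == 0;
}
```

For a 32-bit value in a byte addressable space the size is 4 units, so `is_aligned(2, 4)` is false and `is_aligned(4, 4)` is true. With 16-bit addressable units the same value occupies 2 units, so `is_aligned(2, 2)` is true.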

If you are reading 16-bit quantities, there are three cases:

Byte addressing: odd addresses are potentially misaligned. An architecture is free to treat these as aligned, but does not have to.
Addressable units are 16-bit: all addresses are aligned for 16-bit quantities.
Addressable units are larger: you don't actually have 16-bit quantities. They are silently larger.

UPDATE 2: Is there a CPU where reading 16 bits from address 0x2 (assuming that address range is valid) would give a bus error?

There cannot ever be such a CPU, unless the addressable unit is below 8 bits. The reason is that the alignment of address 0x2 is 2 addressable units. If the addressable unit is 8-bits, then it is 16-bit aligned.

In addition, strange values for the addressable unit size are ruled out by the intention of 16 bits. If 16-bit values are real quantities of the architecture, then the addressable unit size must be a divisor of 16. So it could only be 1, 2, 4, 8, or 16 bits. If it happens to be higher, the alignment is trivially satisfied.

Since an architecture that addresses less than 8 bits is not worth the trouble, you are all but guaranteed that the address 0x2 will be an aligned address for 16 bit quantities.