Question

Please see below a question regarding word sizes in various instruction set architectures, and how that ties to assembly languages. Thank you for any and all help.

First a few facts (please correct me if any of these are wrong). The word size of a processor architecture indicates (EDIT: Some of these were wrong, please see Seva's post below):

  1. The largest size of each register
  2. The largest size of each memory address (amount of memory that can be addressed)
  3. The largest integer the CPU can process in a single instruction
  4. Largest piece of data that can be transferred to and from the working memory in a single operation

Now here comes the really strange thing: in assembly language for IA-32, a word is specified to be 16 bits long. IA-32 refers to all x86 versions that support 32-bit computing (i.e. a word is supposed to be 32 bits long).

That just makes my whole understanding of words and what it indicates (list above) fall apart.

Thank you for helping me get to the bottom of this,

Magnus

EDIT 2: Please see below two helpful links on the x86 architecture.

  1. As posted by altie below: http://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture
  2. A simple x86 guide I stumbled across: http://www.swansontec.com/sintel.html

Solution

All of those assumptions have counterexamples.

The size of each instruction

Intel x86 has had variable-length instructions from the very beginning. So does ARM when in Thumb-2 mode.

The largest size of each register

You mean - integer registers, right? The x87 floating-point registers on Intel, for example, are 80 bits (10 bytes) wide. This one is the closest. But Intel x86 is a notable exception - its definition of "word" was fixed at the time of 16-bit CPUs; as the ISA moved on, the definition stuck.

Similarly, in the context of ARM's AArch64 instruction set, "word" means "32 bits" while the general-purpose registers are 64-bit. The definition of "word" was fixed in ARM's 32-bit heyday and has stayed ever since.

The largest size of each memory address

Clearly wrong. The 16-bit Intel 286 had a 24-bit physical address space. The translation is done by a memory management unit (MMU) - the user-level address in a register is not the same as the physical address that goes into the memory subsystem. The same goes for PAE on more recent 32-bit Intel CPUs, where physical addresses are wider than the 32-bit registers. In the olden days, the original 16-bit x86 formed its 20-bit linear address from a 16-bit segment and a 16-bit offset (segment times 16, plus offset).
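The segment-and-offset arithmetic can be sketched in C. This is only an illustration of the address calculation; the function name `real_mode_linear` is made up for the example, and the wrap at 1 MiB assumes a machine with exactly 20 address lines.

```c
#include <stdint.h>

/* Hypothetical helper: forms a real-mode linear address from two
   16-bit values, as linear = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    /* Assumption: only 20 address lines exist, so the sum wraps at 1 MiB. */
    return linear & 0xFFFFFu;
}
```

For instance, segment 0xB800 with offset 0 gives 0xB8000: a 20-bit address built from registers that are only 16 bits wide.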

The largest integer the CPU can process in a single instruction

This one is close - but again, with exceptions. There are two-register instructions here and there. MIPS has hi:lo - a dedicated pair of 32-bit registers that may act as a single 64-bit one. Intel has instructions that operate on the xDX:xAX register pair. And don't get me started on SIMD.
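What a register pair buys you can be sketched in C: a 32-bit by 32-bit multiply whose 64-bit result is split into a high and a low half, roughly what lands in hi:lo on MIPS or in EDX:EAX on 32-bit x86. The type `hi_lo_t` and the function `mul32_wide` are names invented for this sketch, not anything from an ISA manual.

```c
#include <stdint.h>

/* Hypothetical type: the two 32-bit halves of a 64-bit result. */
typedef struct {
    uint32_t hi;  /* upper 32 bits of the product */
    uint32_t lo;  /* lower 32 bits of the product */
} hi_lo_t;

/* Multiply two 32-bit values without losing the upper half. */
hi_lo_t mul32_wide(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * b;  /* widen before multiplying */
    hi_lo_t result;

    result.hi = (uint32_t)(product >> 32);
    result.lo = (uint32_t)product;
    return result;
}
```

So a "32-bit" CPU can produce a 64-bit integer result in one instruction; it just needs two registers to hold it.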

Largest piece of data that can be transferred to and from the working memory in a single operation

ARM has "load multiple" and "store multiple" instructions that can transfer up to 16 registers in one go. Intel has PUSHA/POPA. On the physical level, memory buses vary, too.

The dirty little truth is that there's no single true definition of word outside of the context of the book it appears in and the assembler that uses it. On Intel, "word" has denoted a 16-bit chunk since time immemorial; as the CPUs became 32-bit and 64-bit, they retained the definition, so now we talk about DWORDs and QWORDs. The registers on modern 64-bit Intel CPUs are QWORD-sized. Windows API, which is not strictly Intel any longer, was born on 16-bit Intel and still retains the datatypes. WORD is defined in windows.h to be unsigned short (2 bytes), and they can't change it ever - that'd break struct layouts, therefore binary formats, for everyone everywhere.
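Why the sizes can never change is easier to see with a struct. The sketch below uses portable stand-ins (`word_t`, `dword_t`, `qword_t`) and a made-up `record_header` layout; these are assumptions for illustration and not the actual windows.h typedefs.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Intel-style size names. */
typedef uint16_t word_t;   /* 16 bits, the Intel "word" */
typedef uint32_t dword_t;  /* 32 bits, "double word"    */
typedef uint64_t qword_t;  /* 64 bits, "quad word"      */

/* Hypothetical on-disk header: its byte layout depends on the sizes above. */
struct record_header {
    word_t  kind;
    word_t  flags;
    dword_t length;
};

/* If "word" ever grew to match the register size, these would fail
   and every file written with the old layout would become unreadable. */
_Static_assert(sizeof(word_t) == 2, "word must stay 16 bits");
_Static_assert(sizeof(dword_t) == 4, "dword must stay 32 bits");
_Static_assert(sizeof(qword_t) == 8, "qword must stay 64 bits");
_Static_assert(sizeof(struct record_header) == 8, "header layout is frozen");
```

The compile-time checks pass regardless of whether the target CPU is 32-bit or 64-bit, which is exactly the point: the name is tied to a size, not to the machine.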

On ARM, on the other hand, "word" denotes 32 bits, even in the context of the AArch64 instruction set. So there are assembly instructions like "load half-word" that work with 16-bit operands. So when coding in C for Windows on ARM (i.e. Windows Phone, Windows RT, Windows CE/Mobile) and in assembly for the same, you have to keep in mind two different definitions. Fortunately, given the ambiguity, no one thinks in terms of words - at least not without keeping the real size in the back of one's mind. Also, ARM's assembly language strongly encourages working with 32-bit values as much as you can, promoting 16-bit variables when necessary. So even 16-bit parameters to functions are internally passed in 32-bit registers.

OTHER TIPS

To elaborate on Seva's statement "don't even get me started on SIMD", many x86 instructions support lots of operand types. See here for a discussion: http://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture

Various fields in the outputs of the CPUID instruction will tell you which of these modes are supported. For example, the SSE flags will tell you if XMM registers are available, and the AVX flag will tell you if there are YMM registers available.
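As a rough sketch of what checking those flags looks like, the helpers below decode two feature bits from the registers returned by CPUID leaf 1 (SSE is EDX bit 25, AVX is ECX bit 28). The function names are invented for this example, and the code only decodes values; it does not execute CPUID itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: SSE support is reported in EDX bit 25 of CPUID leaf 1. */
bool leaf1_has_sse(uint32_t edx)
{
    return ((edx >> 25) & 1u) != 0;
}

/* Hypothetical helper: AVX support is reported in ECX bit 28 of CPUID leaf 1.
   Note: this is only the CPU flag; the OS must also enable YMM state. */
bool leaf1_has_avx(uint32_t ecx)
{
    return ((ecx >> 28) & 1u) != 0;
}
```

With GCC or Clang on an x86 target, the register values could come from `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` in `<cpuid.h>`; other compilers have their own intrinsics.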

Licensed under: CC-BY-SA with attribution