What is the “EU” in x86 architecture? (calculates effective address?)

https://stackoverflow.com/questions/791798

16-09-2019

Question

I read somewhere that effective addresses (as in the LEA instruction) in x86 instructions are calculated by the "EU." What is the EU? What is involved exactly in calculating an effective address?

I've only learned about the MC68k instruction set (CU Boulder teaches this first) and I can't find a good x86 webpage by searching the web.

Solution

"EU" is the generic term for Execution Unit. The ALU is one example of an execution unit. FADD and FMUL, i.e. the floating-point adder and multiplier, are other examples, as is the memory unit that handles loads and stores.

The EUs relevant to LEA instructions are the ALU (add, subtract, AND/OR, etc.) and the AGU (Address Generation Unit). The AGU is coupled to the memory pipelines, TLB, data cache, etc.

A typical Intel x86 CPU back when I wrote the first codegen guide had 2 ALUs, 1 load pipeline tied to an AGU, a store address pipeline tied to a second AGU, and a store data pipeline. As of 2016 most have 3 or 4 ALUs and more than one load pipe.

LEA is a 3-input instruction: BaseReg + IndexReg*Scale + Offset. This is the same as the x86 memory addressing mode, which actually has a 4th input, the segment base, that is not part of the LEA calculation. Handling 3 inputs necessarily costs more than handling the 2 inputs needed for ADD.
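To make the inputs concrete, here is a rough Python model of the arithmetic. It is only a sketch: it assumes 32-bit address size, ignores segment limit and permission checks, and the function names are made up for illustration. LEA corresponds to `effective_address`; an actual memory access would additionally add the segment base, as in `linear_address`.

```python
MASK32 = 0xFFFF_FFFF  # assumption: 32-bit address size


def effective_address(base=0, index=0, scale=1, disp=0):
    """base + index*scale + disp, wrapped to 32 bits: the value LEA produces."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32


def linear_address(segment_base, base=0, index=0, scale=1, disp=0):
    """A real memory access also adds the segment base (the 4th input); LEA does not."""
    return (segment_base + effective_address(base, index, scale, disp)) & MASK32


# lea ebx, [eax + ebx*4 + 42] with eax = 0x1000, ebx = 3
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=42)))  # 0x1036
# the same operand used for a load through a segment whose base is 0x10000
print(hex(linear_address(0x10000, base=0x1000, index=3, scale=4, disp=42)))  # 0x11036
```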

On some machines, the ALU can only do 2-input operations. LEA therefore can only execute on an AGU, specifically the AGU used for loads (because the store AGU doesn't write a register). This may mean that you cannot do an LEA at the same time as a load, or two LEAs at the same time, whereas you can do two ADDs and a load in the same cycle.
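A toy model of that situation, in case it helps. The port names and the "one op per port per cycle" rule are hypothetical, not taken from any particular CPU, and the ops are assumed to be independent so that only port conflicts matter.

```python
# Hypothetical machine: two 2-input ALUs plus one load AGU.
# LEA (3 inputs) is assumed to be executable only on the load AGU.
PORTS = {
    "ALU0": {"add"},
    "ALU1": {"add"},
    "AGU_load": {"load", "lea"},
}


def cycles_needed(ops):
    """Greedily issue independent ops; each port accepts one op per cycle."""
    pending = list(ops)
    cycles = 0
    while pending:
        free = set(PORTS)
        leftover = []
        for op in pending:
            port = next((p for p in sorted(free) if op in PORTS[p]), None)
            if port is None:
                leftover.append(op)  # no free capable port: retry next cycle
            else:
                free.remove(port)
        if len(leftover) == len(pending):
            raise ValueError(f"no port can execute {leftover[0]!r}")
        pending = leftover
        cycles += 1
    return cycles


print(cycles_needed(["add", "add", "load"]))  # 1: both ALUs plus the load AGU
print(cycles_needed(["lea", "load"]))         # 2: LEA and load share the AGU
print(cycles_needed(["lea", "lea"]))          # 2: only one port can do LEA
```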

On other machines, LEA can be done by one, two, or three of the ALUs, possibly instead of the AGU, possibly as well as on the AGU. This provides more flexibility.

Or, the simple LEAs, e.g. reg*scale+offset, can be done on the ALUs, whereas the biggest LEAs, e.g. breg+ireg*scale+offset, may be restricted, or possibly even broken into two uops.
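The two-uop split can be sketched the same way. The rule "at most two address components fit in one ALU uop" is an assumed policy for illustration only; real machines differ. Splitting is safe because addition modulo 2**32 is associative, so the two partial sums give the same result as a single 3-input add.

```python
MASK32 = 0xFFFF_FFFF


def lea_uop_count(has_base, has_index, has_disp):
    """Hypothetical policy: an LEA with at most two address components
    (e.g. reg*scale+offset) is one ALU uop; base+index*scale+offset is split."""
    components = int(has_base) + int(has_index) + int(has_disp)
    return 1 if components <= 2 else 2


def split_lea(base, index, scale, disp):
    """Run the assumed two-uop split of [base + index*scale + disp]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    tmp = (base + index * scale) & MASK32  # uop 1: two register inputs
    return (tmp + disp) & MASK32           # uop 2: register + immediate


print(lea_uop_count(False, True, True))  # reg*scale+offset -> 1
print(lea_uop_count(True, True, True))   # breg+ireg*scale+offset -> 2
print(hex(split_lea(0x1000, 3, 4, 42)))  # same value a single LEA gives: 0x1036
```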

So, the question comes down to: which EU (Execution Unit) handles which LEAs? The ALU or the AGU? The answer depends on the machine.

Generic text in an optimization guide may simply say "EU" rather than "AGU or ALU, depending on the model" or "whichever EU is capable of handling that particular LEA".

OTHER TIPS

Intel's own Software Developer's Manuals are a good source of information on x86, though they may be a bit of overkill (they are more reference-like than tutorial-like).

The EU (Execution Unit) reference was most likely in contrast to ALU (Arithmetic Logic Unit) which is usually the part of the processor responsible for arithmetic and logic instructions. However, the EU has (or had) some arithmetic capabilities as well, for calculating memory addresses. The x86 LEA instruction exposes these capabilities to the assembly programmer.

Normally you can supply some pretty complex memory addresses to an x86 instruction:

sub eax, [eax + ebx*4 + 0042]

and while the ALU handles the arithmetic subtraction, the EU is responsible for generating the address.
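Here is a rough Python model of the three steps behind that `sub`: generate the address, load the operand, then subtract. It assumes 32-bit registers, reads 0042 as decimal 42, and models memory as a hypothetical dict of dwords. Note that the address is computed from the old value of EAX, before the subtraction overwrites it.

```python
MASK32 = 0xFFFF_FFFF


def sub_reg_mem(regs, memory, dst, base, index, scale, disp):
    """Model 'sub dst, [base + index*scale + disp]' as AGU -> load -> ALU."""
    # 1. Address generation (AGU), using the current register values.
    addr = (regs[base] + regs[index] * scale + disp) & MASK32
    # 2. Load the 32-bit operand from the modeled memory.
    operand = memory[addr]
    # 3. ALU subtraction, wrapped to 32 bits.
    regs[dst] = (regs[dst] - operand) & MASK32
    return addr


regs = {"eax": 0x1000, "ebx": 2}
memory = {0x1000 + 2 * 4 + 42: 7}  # hypothetical dword at the target address
addr = sub_reg_mem(regs, memory, "eax", "eax", "ebx", 4, 42)
print(hex(addr), hex(regs["eax"]))  # 0x1032 0xff9
```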

With LEA, you can use the limited address-generating capabilities for other purposes:

lea ebx, [eax + ebx*4 + 0042]

Compare with:

imul ebx, ebx, 4
add ebx, eax
add ebx, 0042
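The two versions compute the same value modulo 2**32. The sketch below checks this in Python, assuming the multiply step is written as `imul ebx, ebx, 4` (x86 `mul` only takes a single explicit operand) and reading 0042 as decimal 42. What it does not model: LEA is a single instruction and leaves EFLAGS untouched, whereas IMUL and ADD both update the flags.

```python
MASK32 = 0xFFFF_FFFF


def lea_form(eax, ebx):
    # lea ebx, [eax + ebx*4 + 42]: one instruction, no flags written
    return (eax + ebx * 4 + 42) & MASK32


def imul_add_form(eax, ebx):
    # imul ebx, ebx, 4 / add ebx, eax / add ebx, 42: three instructions,
    # each of which also writes EFLAGS on real hardware (not modeled here)
    ebx = (ebx * 4) & MASK32
    ebx = (ebx + eax) & MASK32
    ebx = (ebx + 42) & MASK32
    return ebx


# Includes cases that wrap around 2**32.
for eax, ebx in [(0, 0), (0x1000, 3), (0xFFFF_FFF0, 5), (7, 0x4000_0000)]:
    assert lea_form(eax, ebx) == imul_add_form(eax, ebx)

print(lea_form(0x1000, 3), imul_add_form(0x1000, 3))  # 4150 4150
```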

"Volume 1" on the page I've linked has a section "3.7.5" discussing addressing modes: what kinds of memory addresses you can supply to an instruction expecting a memory operand (of which LEA is one). This reflects what kind of arithmetic the EU (or whatever the memory interface part is called) is capable of.
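As a rough checklist of what the 32-bit base + index*scale + displacement form allows, the sketch below validates an operand against a few encoding rules: scale of 1, 2, 4 or 8, ESP not usable as an index, a scale only together with an index, and a displacement expressed as a signed 32-bit value. The function name is hypothetical and this is a simplified model, not a full encoder. Assemblers such as NASM also accept some forms like `[ebx*5]` by rewriting them as `[ebx + ebx*4]`, which this check does not attempt.

```python
VALID_SCALES = {1, 2, 4, 8}
GPR32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def is_valid_32bit_address(base=None, index=None, scale=1, disp=0):
    """Check a [base + index*scale + disp] operand against 32-bit x86 rules."""
    if base is not None and base not in GPR32:
        return False
    if index is not None and (index not in GPR32 or index == "esp"):
        return False  # ESP cannot be used as an index register
    if scale not in VALID_SCALES:
        return False
    if scale != 1 and index is None:
        return False  # a scale only makes sense with an index
    # the displacement is encoded as a signed 8- or 32-bit value
    return -2**31 <= disp < 2**31


print(is_valid_32bit_address("eax", "ebx", 4, 42))  # True
print(is_valid_32bit_address("eax", "ebx", 3, 0))   # False: scale 3
print(is_valid_32bit_address("eax", "esp", 1, 0))   # False: ESP as index
```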

"Volume 2" is the instruction set reference and has definitive information on all instructions, including LEA.

EU = Execution Unit?

Effective Address is the address that would have been accessed if the LEA instruction had been an instruction that actually accessed memory (e.g. a MOV or an ADD with a memory operand). Its 'intended' use is to calculate the resulting pointer from a pointer arithmetic or array indexing operation. However, because it can perform a combination of a multiply (by 1, 2, 4 or 8) and adds, it's also used to optimize some regular calculations.
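Both uses can be illustrated with a small Python model of what LEA computes. This assumes 32-bit wraparound; the `lea` helper is hypothetical and the register names in the comments are only examples.

```python
MASK32 = 0xFFFF_FFFF


def lea(base=0, index=0, scale=1, disp=0):
    """What LEA computes: base + index*scale + disp, with no memory access."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32


# Intended use: address of arr[i] for 4-byte elements, e.g. lea eax, [esi + ecx*4]
arr_base, i = 0x2000, 5
print(hex(lea(arr_base, i, 4)))  # 0x2014

# "Regular calculation" use: multiply by 3, 5 or 9 with a single LEA,
# e.g. lea eax, [eax + eax*4] computes eax*5.
x = 7
print(lea(x, x, 2), lea(x, x, 4), lea(x, x, 8))  # 21 35 63
```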

The internals of processors within a single family have changed a lot over the years, so that "EU" reference would need to be clarified with the exact CPU model. As an analogy to your m68k experience, the instruction sets of the 68000, 010, 020, 030, 040 and 060 are mostly the same, but their internals are really different, so any reference to an internal unit name needs to come with the part number.

Licensed under: CC-BY-SA with attribution
