Question

In 8086 assembly programming, we can only load a data into a segment register by, first loading it into a general purpose register and then we have to move it from this general register to the segment register.

Why can't we load it directly? Is there any special reason for not being allowed?

What is the difference between mov ax,5000H and mov ax,[5000H]? Does [5000h] mean content in memory location 5000h?

Was it helpful?

Solution

Remember that the syntax of assembly language (any assembly) is just a human-readable way to write machine code. The rules of what you can do in machine code depend on how the processor's electronics were designed, not on what the assembler syntax could easily support.

So, just because it looks like you could write mov DS, 5000h and that conceptually it doesn't seem like there is a reason why you shouldn't be able to do it, it's really about "is there a mechanism by which the processor can load a segment register directly from an immediate value?"

In the case of 8086 assembly, I figure that the reason is simply that the engineers just didn't create an electric path that could feed a signal from the memory I/O data lines to the lines that write to the segment registers.


Why? I have several theories, but no authoritative knowledge.

The most likely reason is simply one of simplifying the design: it takes extra wiring and gates to do that, and it's an uncommon enough operation (this is the 70's) that it's not worth the real estate in the chip. This is not surprising; the 8086 already went overboard allowing any of the normal registers to be connected to the ALU (arithmetic logic unit) which allows any register to be used as an accumulator. I'm sure that wasn't cheap to do. Most processors at the time only allowed one register (the accumulator) to be used for that purpose.


As far as the brackets, you are correct. Let's say memory position 5000h contains the number 4321h. mov ax, 5000h puts the value 5000h into ax, while mov ax, [5000h] loads 4321h from memory into ax. Essentially, the brackets act like the * pointer dereference operator in C.

Just to highlight the fact that assembly is an idealized abstraction of what machine code can do, you should note that the two variations are not the same instruction with different parameters, but completely different opcodes. They could have used – say – MOV for the first and MVD (MoVe Direct addressed memory) for the second opcode, but they must have decided that the bracket syntax was easier for programmers to remember.

OTHER TIPS

x86 machine code only has one opcode for move-to-Sreg. That opcode is
8E /r mov Sreg, r/m16, and allows a register or memory source (but not immediate).

Contrary to some claims in other answers, mov ds, [5000h] runs just fine, assuming the 2 bytes at address 5000h hold a useful segment value for the mode you're in. (Real mode where they're used directly as numbers vs. protected where Sreg values are selectors that index the LDT / GDT).

x86 always uses a different opcode for the immediate form of an instruction (with a constant encoded as part of the machine code) vs. the register/memory source version. e.g. add eax, 123 assembles to a different opcode from add eax, ecx. But add eax, [esi] is the same add r, r/m32 opcode as add eax, ecx, just a different ModR/M byte.


NASM listing, from nasm sreg.asm -l/dev/stdout, assembling a flat binary in 16-bit mode and producing a listing.

I edited by hand to separate the bytes into opcode modrm extra. These are all one-byte opcodes (with no extra opcode bits borrowing space in the /r field of the ModRM byte), so just look at the first byte to see what opcode it is, and notice when two instructions share the same opcode.

   address    machine code         source           ;  comments
 1 00000000 BE 0050           mov si, 5000h     ; mov si, imm16
 2 00000003 A1 0050           mov ax, [5000h]   ; special encoding for AX, no modrm
 3 00000006 8B 36 0050        mov si, [5000h]   ; mov r16, r/m16 disp16
 4 0000000A 89 C6             mov si, ax        ; mov r/m16, r16
 5                                  
 6 0000000C 8E 1E 0050        mov ds, [5000h]   ; mov Sreg, r/m16
 7 00000010 8E D8             mov ds, ax        ; mov Sreg, r/m16
 8                                  
 9                            mov ds, 5000h
 9          ******************       error: invalid combination of opcode and operands

Supporting a mov Sreg, imm16 encoding would need a separate opcode. This would take extra transistors for 8086 to decode, and it would use up more opcode coding space leaving less room for future extensions. I'm not sure which of these was considered more important by the architect(s) of the 8086 ISA.

Notice that 8086 has special mov AL/AX, moffs opcodes which save 1 byte when loading the accumulator from an absolute address. But it couldn't spare an opcode for mov-immediate to Sreg? This design decision makes good sense. How often do you need to reload a segment register? Very infrequently, and in real large programs it often wouldn't be with a constant (I think). But in code using static data, you might be loading / storing the accumulator to a fixed address inside a loop. (8086 had very weak code-fetch, so code-size = speed most of the time).

Also keep in mind that you can use mov Sreg, r/m16 for assemble-time constants with just one extra instruction (like mov ax, 4321h). But if we'd only had mov Sreg, imm16, runtime variable segment values would have required self-modifying code. (So obviously you wouldn't leave out the r/m16 source version.) My point is if you're only going to have one, it's definitely going to be the register/memory source version.

About segment registers

The segment registers are not the same (on hardware level) as the general purpose registers. Of course, as Mike W said in the comments, the exact reason why you can't move directly immediate value into the segment register is known only by the Intel developers. But I suppose, it is because the design is simple this way. Note that this choice does not affects the processor performance, because the segment register operations are very rare. So, one instruction more, one less is not important at all.

About syntax

In all reasonable implementations of x86 assembler syntax, mov reg, something moves the immediate number something to the register reg. For example:

NamedConst = 1234h
SomeLabel:
    mov  edx, 1234h      ; moves the number 1234h to the register edx
    mov  eax, SomeLabel  ; moves the value (address) of SomeLabel to eax
    mov  ecx, NamedConst ; moves the value (1234h in this case) to ecx

Closing the number in square brackets means that the content of memory with this address is moved to the register:

SomeLabel dd 1234h, 5678h, 9abch

    mov  eax, [SomeLabel+4]  ; moves 5678h to eax
    mov  ebx, dword [100h]   ; moves double word memory content from the 
                             ; address 100h in the data segment (DS) to ebx.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top