Trying to assemble the output of a disassembler (such as objdump) [duplicate]

https://stackoverflow.com/questions/8510129

16-03-2021

Question

Possible Duplicate:
Disassembling, modifying and then reassembling a Linux executable

I've been told that assembly and disassembly are not inverses. Apparently, you can't disassemble a program, feed that output directly into an assembler, and expect the result to run correctly, because information is lost.

My question is, why is information lost? Also, what information is lost?

Solution

One important thing that disassemblers (or their users) routinely do not preserve is the instruction encoding. Some instructions can be encoded in multiple different ways, e.g.:

mov rdx, -1 is either 48,BA,FF,FF,FF,FF,FF,FF,FF,FF (10 bytes) or 48,C7,C2,FF,FF,FF,FF (7 bytes).
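To see where the two byte sequences come from, here is a minimal Python sketch that builds both encodings by hand. The function names are made up for illustration, and the sketch only covers the two forms quoted above (REX.W + B8+r with a 64-bit immediate, and REX.W + C7 /0 with a sign-extended 32-bit immediate) for registers rax through rdi.

```python
import struct

RDX = 2  # register number of rdx in the x86-64 encoding

def mov_r64_imm64(reg, value):
    # REX.W prefix (0x48), opcode B8+reg, then a full 64-bit immediate.
    return bytes([0x48, 0xB8 + reg]) + struct.pack("<q", value)

def mov_r64_imm32(reg, value):
    # REX.W prefix (0x48), opcode C7, ModRM 0xC0|reg (register-direct, /0),
    # then a 32-bit immediate that the CPU sign-extends to 64 bits.
    return bytes([0x48, 0xC7, 0xC0 | reg]) + struct.pack("<i", value)

long_form = mov_r64_imm64(RDX, -1)
short_form = mov_r64_imm32(RDX, -1)
print(long_form.hex(","), len(long_form))
print(short_form.hex(","), len(short_form))
```

Both sequences load the same value into rdx; they differ only in their length and in the bytes themselves.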

If the rest of the program somehow functionally depends on that instruction being exactly 10 (or 7) bytes long, or on those specific byte values, and the assembler encodes mov rdx, -1 differently from the original program, then disassembly followed by assembly gives you a different program that will work differently. For instructions with ambiguous encodings, the assembler must therefore be given not the instruction mnemonic (mov rdx, -1) but its exact encoding from the disassembly of the original program (e.g. 48,BA,FF,FF,FF,FF,FF,FF,FF,FF).
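One way to hand the assembler the exact encoding is to emit the original bytes as data rather than as a mnemonic. The sketch below assumes a GNU as-style assembler that accepts a .byte directive; the helper name as_byte_directive is hypothetical.

```python
def as_byte_directive(raw):
    # Render raw machine code as a data directive so that the assembler
    # copies the bytes through instead of choosing an encoding itself.
    return ".byte " + ", ".join(f"0x{b:02x}" for b in raw)

# The 10-byte form of mov rdx, -1 from the example above.
original = bytes.fromhex("48baffffffffffffffff")
print(as_byte_directive(original))
```

The price is readability: the reassembled source no longer says mov rdx, -1 unless you keep the mnemonic alongside as a comment.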

There may be other things that the assembler or linker does differently, e.g. additional alignment of code or data, or different naming and ordering of sections/segments in the output file. These usually aren't a problem, but, again, if the original program has unusual dependencies on them, the reassembled program will work differently.

Other suggestions

It's not a loss; it is actually a gain. It sounds like you have not tried this yet. Why not try it?

.global reset
reset:

  mov #0x0280,r1
  call #notmain
  jmp hang

.global hang
hang:
  jmp hang

After you assemble this, objdump shows:

0000f800 <reset>:
    f800:   31 40 80 02     mov #640,   r1  ;#0x0280
    f804:   b0 12 b2 f8     call    #0xf8b2 
    f808:   00 3c           jmp $+2         ;abs 0xf80a

0000f80a <hang>:
    f80a:   ff 3f           jmp $+0         ;abs 0xf80a

You can see the core code is still there. If you have a text editor with column (rectangular) cut and paste, you can cut that code out of the middle and re-assemble it, either directly or with a little massaging.
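The same column cut can be scripted. This is a rough Python sketch, assuming the listing layout shown above (address, colon, hex byte pairs, instruction text, optional ; comment); the regular expressions and the function name listing_to_source are illustrative guesses, not a parser for every objdump format.

```python
import re

# "0000f800 <reset>:" marks a label.
LABEL = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
# "f800:   31 40 80 02     mov ..." is address, byte pairs, then the text.
INSN = re.compile(r"^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s+)+(.*)$")

def listing_to_source(listing):
    out = []
    for line in listing.splitlines():
        m = LABEL.match(line)
        if m:
            out.append(m.group(1) + ":")
            continue
        m = INSN.match(line)
        if m:
            text = m.group(1).split(";", 1)[0]          # drop the comment
            out.append("  " + " ".join(text.split()))   # collapse spacing
    return "\n".join(out)

LISTING = """\
0000f800 <reset>:
    f800:   31 40 80 02     mov #640,   r1  ;#0x0280
    f804:   b0 12 b2 f8     call    #0xf8b2 
    f808:   00 3c           jmp $+2         ;abs 0xf80a

0000f80a <hang>:
    f80a:   ff 3f           jmp $+0         ;abs 0xf80a
"""
print(listing_to_source(LISTING))
```

The "little massaging" is whatever the script cannot recover: here the call target comes back as the raw address #0xf8b2 rather than the symbol notmain, and the .global directives are gone.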

There is no reason why you could not have a disassembler that generates output that can be re-assembled; I have done it many times and seen it many times. The thing is, the usual use case for a disassembler is to see that extra information. A use case for a disassembler whose output can be re-assembled is something like hacking someone's code.

I highly recommend that people write disassemblers anyway, and this would be a good reason to: it educates you both in the instruction set and in how it is encoded. With a variable-length instruction set (x86) there is a lot more to learn. I recommend NOT learning one of those first; start with ARM or Thumb, or at least something less painful than x86, like the MSP430. A good way to test your disassembler is to have it output code that can be re-assembled: assemble, disassemble, assemble, and if the two assembled outputs match, then your disassembler did a good job.
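The assemble, disassemble, assemble check can be written as a small harness. The Python sketch below is only an illustration: the toy instruction set (one opcode byte plus one operand byte) and all names in it are invented here, standing in for a real assembler and the disassembler under test.

```python
def round_trip_ok(assemble, disassemble, source):
    """Assemble, disassemble, assemble again; compare the two binaries."""
    first = assemble(source)
    second = assemble(disassemble(first))
    return first == second

# Hypothetical toy ISA: one opcode byte followed by one operand byte.
OPCODES = {"load": 0x01, "add": 0x02, "jmp": 0x03}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def toy_assemble(source):
    out = bytearray()
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        mnemonic, operand = line.split()
        out += bytes([OPCODES[mnemonic], int(operand, 0)])
    return bytes(out)

def toy_disassemble(binary):
    lines = []
    for i in range(0, len(binary), 2):
        lines.append(f"{MNEMONICS[binary[i]]} {binary[i + 1]:#04x}")
    return "\n".join(lines)

print(round_trip_ok(toy_assemble, toy_disassemble, "load 5\nadd 0x10\njmp 0"))
```

For a real target, assemble and disassemble would wrap the actual toolchain and your own disassembler; the comparison of the two binaries stays the same.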

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow