C++ Translate bytes to opcodes?

Question 1

Generally, a disassembler uses a combination of tables and a "decode type" (usually a function pointer, or a value that goes into a switch statement). The decode type says which class the instruction belongs to: for example, xor, or, and, add and sub share the same decoding, while call and jmp use a different one. jnz, jz, jnc, jc, ja, jb, jbe, etc. have yet another decode type.
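As a rough illustration of the decode-type idea, here is a C++ sketch that classifies a handful of one-byte opcodes (32-bit mode, no prefixes) and then dispatches on the class with a switch. DecodeType, classify and describe are hypothetical names, and only three classes are covered. It relies on the layout of the eight classic ALU instructions (add, or, adc, sbb, and, sub, xor, cmp), which sit in 0x00-0x3F with the operation selected by bits 5-3 of the opcode.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical decode categories: every opcode in a category shares the
// same operand-decoding routine.
enum class DecodeType {
    AluModRM,      // add/or/adc/sbb/and/sub/xor/cmp with a ModR/M byte
    CondJumpRel8,  // jo ... jg, opcodes 0x70-0x7F, 8-bit displacement
    CallJmpRel32,  // call (0xE8) and jmp (0xE9) with a 32-bit displacement
    Unknown        // everything this sketch does not cover
};

// Classify a one-byte opcode (32-bit mode, no prefixes).
DecodeType classify(std::uint8_t op) {
    // In 0x00-0x3F each block of 8 opcodes holds one ALU operation; the
    // first four opcodes of each block are the ModR/M forms.
    if (op < 0x40 && (op & 0x07) < 4) return DecodeType::AluModRM;
    if (op >= 0x70 && op <= 0x7F) return DecodeType::CondJumpRel8;
    if (op == 0xE8 || op == 0xE9) return DecodeType::CallJmpRel32;
    return DecodeType::Unknown;
}

// Dispatch on the decode type; a real decoder would go on to decode the
// operands in each case, this sketch only picks the mnemonic.
std::string describe(std::uint8_t op) {
    static const char* const kAlu[8] = {"add", "or",  "adc", "sbb",
                                        "and", "sub", "xor", "cmp"};
    static const char* const kJcc[16] = {"jo", "jno", "jc", "jnc", "jz", "jnz",
                                         "jbe", "ja", "js", "jns", "jp", "jnp",
                                         "jl", "jge", "jle", "jg"};
    switch (classify(op)) {
        case DecodeType::AluModRM:     return kAlu[(op >> 3) & 7];  // bits 5-3
        case DecodeType::CondJumpRel8: return kJcc[op & 0x0F];      // condition code
        case DecodeType::CallJmpRel32: return op == 0xE8 ? "call" : "jmp";
        case DecodeType::Unknown:      break;
    }
    return "(unknown)";
}
```

For example, describe(0x31) gives "xor" and describe(0x75) gives "jnz"; adding a new category means adding one enum value and one case.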

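So the first-level table has 256 entries. Certain entries are escapes, such as 0x0f, where the next byte tells you what the instruction "really is"; that gives you a second 256-entry table for the 0x0f-prefixed opcodes. Others, such as 0xff, are "group" opcodes, where bits 5-3 of the following byte select the actual instruction (inc, dec, call, jmp or push).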
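A minimal sketch of that table layout, assuming 32-bit code and tables cut down to a few sample entries (make_primary, make_escape0f, kGroupFF and lookup are hypothetical names; a real decoder fills every slot). It treats 0x0f as the escape into a second 256-entry table and 0xff as a group opcode whose instruction is picked by bits 5-3 of the next byte. Unassigned slots stay nullptr and are reported as invalid.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Abbreviated 256-entry tables; nullptr marks an unassigned (or, in this
// sketch, simply unknown) opcode.
using Table = std::array<const char*, 256>;

Table make_primary() {
    Table t{};  // value-initialised: every entry starts as nullptr
    t[0x90] = "nop";
    t[0xC3] = "ret";
    t[0xE8] = "call";
    return t;
}

Table make_escape0f() {
    Table t{};
    t[0x05] = "syscall";
    t[0x84] = "jz";     // jz rel32 is encoded as 0x0F 0x84
    t[0xA2] = "cpuid";
    return t;
}

// Group opcode 0xFF: bits 5-3 of the following byte choose the instruction,
// so an 8-entry sub-table is enough. Slot 7 is unassigned.
const char* const kGroupFF[8] = {"inc", "dec", "call", "call far",
                                 "jmp", "jmp far", "push", nullptr};

// Return the mnemonic of the instruction starting at code[0].
std::string lookup(const std::vector<std::uint8_t>& code) {
    static const Table primary = make_primary();
    static const Table escape0f = make_escape0f();
    if (code.empty()) return "(invalid)";
    const char* name = nullptr;
    if (code[0] == 0x0F && code.size() > 1) {
        name = escape0f[code[1]];              // second-level table
    } else if (code[0] == 0xFF && code.size() > 1) {
        name = kGroupFF[(code[1] >> 3) & 7];   // reg field of the next byte
    } else {
        name = primary[code[0]];
    }
    return name ? name : "(invalid)";
}
```

With these tables, lookup({0x0F, 0xA2}) yields "cpuid", and lookup({0xFF, 0xD0}) yields "call" because bits 5-3 of 0xD0 are 010.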

Some entries are not valid, since not every combination has been assigned yet (although nearly all have).

A tricky part is the "modifier prefix" entries. For example, 0x66 switches an instruction from 32-bit to 16-bit operand size (or vice versa if the processor is in 16-bit mode).
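Here is a sketch of how a decoder might consume the 0x66 prefix before it looks up the opcode. It assumes 0x66 is the only prefix that matters; PrefixState and scan_prefixes are made-up names, and the other prefixes (0x67, segment overrides, lock/rep, REX) are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of scanning the prefix bytes (only 0x66 is handled here).
struct PrefixState {
    std::size_t length = 0;  // number of prefix bytes consumed
    int operand_bits = 0;    // effective operand size in bits
};

PrefixState scan_prefixes(const std::vector<std::uint8_t>& code, int default_bits) {
    assert(default_bits == 16 || default_bits == 32);
    PrefixState s;
    bool has66 = false;
    while (s.length < code.size() && code[s.length] == 0x66) {
        has66 = true;  // a repeated 0x66 does not toggle the size back
        ++s.length;
    }
    // 0x66 swaps between the two sizes: 32 -> 16 in 32-bit mode,
    // 16 -> 32 in 16-bit mode.
    s.operand_bits = has66 ? (default_bits == 32 ? 16 : 32) : default_bits;
    return s;
}
```

The opcode then starts at code[s.length], and the operand-decoding routines read s.operand_bits instead of assuming a fixed size.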

Much of the actual decoding inside each category involves twiddling bits: translating bits 5-3 of the ModR/M byte to a register number, or bits 7-6 (together with bits 2-0) to an addressing mode (is it eax, [eax] or [eax+esi], for example).
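A sketch of that bit twiddling for the ModR/M byte, assuming 32-bit addressing without REX prefixes. It splits the byte into mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0) and describes the r/m operand; forms that need a SIB byte or a displacement are only flagged, not decoded. ModRM, split_modrm and rm_operand are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The three fields of a ModR/M byte.
struct ModRM {
    unsigned mod;  // bits 7-6: addressing mode
    unsigned reg;  // bits 5-3: register number (or opcode extension)
    unsigned rm;   // bits 2-0: register or memory base
};

ModRM split_modrm(std::uint8_t byte) {
    const unsigned b = byte;
    return {(b >> 6) & 3u, (b >> 3) & 7u, b & 7u};
}

// 32-bit register numbering shared by the reg and r/m fields.
const char* const kReg32[8] = {"eax", "ecx", "edx", "ebx",
                               "esp", "ebp", "esi", "edi"};

// Describe the r/m operand in 32-bit addressing.
std::string rm_operand(const ModRM& m) {
    if (m.mod == 3) return kReg32[m.rm];             // register direct: eax
    if (m.rm == 4) return "[SIB]";                   // e.g. [eax+esi] needs a SIB byte
    if (m.mod == 0 && m.rm == 5) return "[disp32]";  // absolute address
    const std::string base = std::string("[") + kReg32[m.rm];
    if (m.mod == 1) return base + "+disp8]";
    if (m.mod == 2) return base + "+disp32]";
    return base + "]";                               // mod == 0: [eax] etc.
}
```

For the bytes 33 C0 (xor eax, eax), the ModR/M byte 0xC0 has mod = 3, so rm_operand returns "eax", while 0x00 gives "[eax]".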

It's quite a lot of work. I wrote a disassembler for the 80186, and it took me about two days of nearly full-time work, and that was when I already knew what I was doing. Converting it to the 386 took another 2-3 days, and I wouldn't want to think about doing it for a modern x86 processor with all the SSE, MMX, 3DNow!, etc. instructions.

[And I've taken far too long explaining this to get the "correct answer" mark, even though this IS how you do it. Of course, using an existing library is clearly the simpler route.]

Question 2

This is a very daunting task. The x86 instruction set is very complicated. Your best bet would be to use one of the existing x86 disassembly libraries to do what you want.

These links should get you started.

Question 3

You can use bitwise operations. For example, if your instruction is XOR, the opcode is 4 bits long and its code is 3, you apply a mask and then a shift to obtain that 3:

Your example in binary: 0011 0011 1100 0000
AND with mask:          1111 0000 0000 0000
Result:                 0011 0000 0000 0000
Shift right 12 places:  0000 0000 0000 0011  <-- this is 3, so the opcode is 3

Do the same with the other groups of bits to obtain the operands of each instruction.
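The mask-and-shift steps translate directly into C++. The instruction layout here is hypothetical (4-bit opcode in bits 15-12, a 4-bit field in bits 11-8 and an 8-bit field in bits 7-0), chosen to match the worked example rather than real x86, where opcodes are whole bytes and this technique applies to fields such as ModR/M. extract_field, Fields and split_word are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Return `width` bits of `word` starting at bit `shift` (bit 0 = least
// significant): AND with a mask, then shift the field down to bit 0.
std::uint32_t extract_field(std::uint32_t word, unsigned shift, unsigned width) {
    assert(width > 0 && width < 32 && shift + width <= 32);
    const std::uint32_t mask = ((1u << width) - 1u) << shift;  // e.g. 0xF000
    return (word & mask) >> shift;
}

// Hypothetical 16-bit instruction format, for illustration only.
struct Fields {
    std::uint32_t opcode;  // bits 15-12
    std::uint32_t field1;  // bits 11-8
    std::uint32_t field2;  // bits 7-0
};

Fields split_word(std::uint16_t word) {
    return {extract_field(word, 12, 4), extract_field(word, 8, 4),
            extract_field(word, 0, 8)};
}
```

For the example word 0x33C0, the opcode mask is 0xF000, so split_word(0x33C0) gives opcode 3, field1 3 and field2 0xC0.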