What does actual machine code look like at various points? [closed]

https://stackoverflow.com/questions/10324835

03-06-2021

Question

There seem to be many opinions on what machine code actually is. I've heard some say it's assembly, or binary, or hex.

Is it correct to say that machine code is essentially a set of instructions for a particular processor? If so, I imagine these can be represented in binary or hexadecimal notation, or assembly. But what does the non-translated "actual" machine code look like? Is it based on the word size of the architecture? Or is hexadecimal for all intents and purposes the default representation?

What does it look like when sitting on a hard drive? What does it look like when sitting in a register? How about when it's being processed, is it simply a set of voltage changes at that point?

Solution

Machine code is simply binary data that corresponds to CPU instructions for a specific processor architecture.

I won't go into much detail about how it is stored, because that depends on where it is stored. On a magnetic hard disk, for example, it is generally stored as a sequence of magnetized regions. As far as storage is concerned, machine code is no different from any other binary data. If your question is more about how data is stored on a computer, you should research the various data-storage devices in a computer, such as HDDs, RAM, and registers, to name a few.

The easiest way to visualize how machine code is stored is to look at some in a hex editor. This shows you the binary data represented by hex numbers. For example, take the instruction:

0xEB 0xFE

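This could just as easily be written 1110101111111110 in binary, or 60414 in decimal (reading the two bytes as a single 16-bit number with 0xEB as the high byte). It depends on how you want to convert binary into human-readable form.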
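To make the point about notation concrete, here is a small Python sketch (Python is my choice for illustration, not something the original answer used). It prints the same two bytes in the three notations mentioned above. The helper names are made up, and the decimal value assumes the bytes are read as one big-endian unsigned number, which is only one possible convention.

```python
# The two bytes of the example instruction.
code = bytes([0xEB, 0xFE])

def as_hex(data: bytes) -> str:
    """Each byte as a 0x-prefixed, two-digit hex number."""
    return " ".join(f"0x{b:02X}" for b in data)

def as_binary(data: bytes) -> str:
    """All bytes as one run of bits, 8 bits per byte."""
    return "".join(f"{b:08b}" for b in data)

def as_decimal(data: bytes) -> int:
    """Assumption: read the bytes as one big-endian unsigned integer."""
    return int.from_bytes(data, "big")

print(as_hex(code))      # 0xEB 0xFE
print(as_binary(code))   # 1110101111111110
print(as_decimal(code))  # 60414
```

The underlying data never changes; only the way it is printed does.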

This instruction represents an infinite loop. (This assumes it is being run on an x86 CPU. Other CPUs could interpret the same bytes in a completely different way.) It can be coded in assembly like this:

j:
jmp j

When you run the assembler, it takes the above code and turns it into the binary machine code above.
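As a rough illustration of what the assembler does for this one instruction, here is a toy encoder in Python. It is only a sketch under stated assumptions: the function name assemble_short_jmp and the idea of passing addresses in directly are hypothetical, and a real assembler handles labels, many instruction forms, and file formats that are ignored here. It covers only the x86 short jump (opcode 0xEB followed by a signed 1-byte displacement).

```python
def assemble_short_jmp(instr_addr: int, target_addr: int) -> bytes:
    """Encode an x86 short jump located at instr_addr that jumps to target_addr."""
    instr_len = 2  # one opcode byte plus one displacement byte
    # The displacement is measured from the address of the NEXT instruction.
    disp = target_addr - (instr_addr + instr_len)
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB]) + disp.to_bytes(1, "little", signed=True)

# In the snippet above, the label j and the jmp sit at the same address,
# so instr_addr and target_addr are equal (0 is an arbitrary choice).
machine_code = assemble_short_jmp(instr_addr=0, target_addr=0)
print(machine_code.hex(" "))  # eb fe
```

Because the displacement is relative, the same two bytes come out no matter which address the loop is placed at.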

The instruction really has two parts. The first is what is known as the opcode, here 0xEB. When the CPU decodes this byte, it means: read the next byte from the program, and jump by that many bytes. The CPU then reads the byte 0xFE. Since it expects a signed 8-bit integer (two's complement), it interprets the binary data as the number -2. At that point the whole instruction has been read, and the instruction pointer has moved forward 2 bytes, to just past the instruction. The instruction is then executed, moving the instruction pointer by -2 (0xFE) bytes, which sets it back to the value it had when the instruction started.
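The steps just described can be mimicked in a few lines of Python. This is a simplified model, not how a CPU is actually built: the function name step_short_jmp is made up, memory is just a bytes object, and only opcode 0xEB is understood.

```python
def step_short_jmp(memory: bytes, ip: int) -> int:
    """Return the instruction pointer after executing the short jmp at ip."""
    opcode = memory[ip]
    if opcode != 0xEB:
        raise ValueError("this sketch only understands opcode 0xEB")
    # Read the operand byte and interpret it as a signed 8-bit integer.
    offset = int.from_bytes(memory[ip + 1:ip + 2], "little", signed=True)
    # After fetching both bytes, the pointer sits just past the instruction.
    next_ip = ip + 2
    # Executing the jump adds the signed offset to that position.
    return next_ip + offset

program = bytes([0xEB, 0xFE])
offset = int.from_bytes(program[1:2], "little", signed=True)
print(offset)                      # -2
print(step_short_jmp(program, 0))  # 0
```

The pointer ends up where it started, so the same instruction runs again, forever.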

I hope this answers your question. If you are wondering about the internal workings of CPUs, read up on microcode and electronic logic gates. Basically, it's a bunch of voltage differences, such as a 1 bit being represented by 5 volts and a 0 bit by 0 volts (the exact voltage levels depend on the hardware).

OTHER TIPS

Like me, you seem to be curious about how computers work under the hood. I don't know enough to answer your questions well (and it's a large topic anyway), but I highly recommend Steve Gibson's "Let's Design a Computer" podcast series. Here's an excerpt from the "Machine language" transcript, to give you a flavor of it. . .

And all skipping means is, instead of adding one to the program counter, we add two, or we add one twice, which is actually how these machines worked back then. And that just causes us to skip over a jump. So essentially that means we can branch to anywhere we want to in memory or continue on our way, which gives us, even though that's very simple, that gives us enough power to allow machines to make decisions. And we've got input/output; we've got math; we've got the ability to transfer data from one location in memory to another. Those are all the essentials of the way a machine functions. That is machine language.

Now, the one layer of humanity that's put on top of that is what's called "assembly language," which is nothing but naming things. For example, you create sort of a so-called mnemonic for the different instructions. So, for example, load the accumulator would be LDA. Store the accumulator, STA. You want them to be short because you're going to be typing them a lot. Remember that you end up using lots of little instructions in order to get something done. And then the only other thing really that assembly language does, it allows you to name locations in memory.

So, for example, you might say LDA, for load accumulator, current score. And current score would simply refer to a, like a variable essentially, a location in memory that you had labeled "current score." And then if you did STA, store accumulator, new score, well, it would first load the current score into the accumulator, and then store that into a different location called new score. So really that's all we're talking about is some simple abbreviations for helping sort of remember and use these individual instructions and convenient labels for locations in memory so that you're not having to remember, oh, that's in location 329627. I mean, who can do that? So instead you just, you label that location with an English, an alphanumeric phrase of some sort, and then you refer to that location by the phrase rather than by its actual number.

And in fact you don't care what the number is. That's one of the things that the assembler will do for you is you just say I need memory called these things. And it worries about where they go because it doesn't really matter to you as long as they're consistently referred to. And that's the whole process. That's machine language and assembly language. And that's the way it was 50 years ago, and more or less that's the way it is now.

. . but he backs up even further than this and starts with transistors and logic gates. From what I can tell, here's the complete series (and the listening audience has contributed helpful diagrams in the wiki):

Let's Design a Computer: Transistors, Logic Gates (Security Now 233, 42:00) http://www.grc.com/sn/sn-233.htm and http://wiki.twit.tv/wiki/Security_Now_233
Machine language (Security Now 235, 46:00) http://www.grc.com/sn/sn-235.htm and http://wiki.twit.tv/wiki/Security_Now_235
Listener feedback (Security Now 236, 51:43) http://www.grc.com/sn/sn-236.htm and http://wiki.twit.tv/wiki/Security_Now_236
Indirection: The Power of Pointers (Security Now 237, 27:45) http://www.grc.com/sn/sn-237.htm and http://wiki.twit.tv/wiki/Security_Now_237
Listener feedback (Security Now 238, 40:28) http://www.grc.com/sn/sn-238.htm and http://wiki.twit.tv/wiki/Security_Now_238
Stacks, Registers & Recursion (Security Now 239, 1:01:00) http://www.grc.com/sn/sn-239.htm and http://wiki.twit.tv/wiki/Security_Now_239
Listener feedback (Security Now 240 48:16, 1:01:21) http://www.grc.com/sn/sn-240.htm and http://wiki.twit.tv/wiki/Security_Now_240
Hardware Interrupts (Security Now 241, 42:25) http://www.grc.com/sn/sn-241.htm and http://wiki.twit.tv/wiki/Security_Now_241
Listener feedback (Security Now 242, 59:27) http://www.grc.com/sn/sn-242.htm and http://wiki.twit.tv/wiki/Security_Now_242
The Multiverse: multi-threading, multi-tasking, multi-processing, multi-core (Security Now 247, 47:15) http://www.grc.com/sn/sn-247.htm and http://wiki.twit.tv/wiki/Security_Now_247
Listener feedback (Security Now 249, 1:30:21) http://www.grc.com/sn/sn-249.htm and http://wiki.twit.tv/wiki/Security_Now_249
Operating Systems (Security Now 250, 43:22) http://www.grc.com/sn/sn-250.htm and http://wiki.twit.tv/wiki/Security_Now_250
Listener feedback (Security Now 251, 1:00:22) http://www.grc.com/sn/sn-251.htm and http://wiki.twit.tv/wiki/Security_Now_251
RISC vs CISC (Security Now 252, 53:31) http://www.grc.com/sn/sn-252.htm and http://wiki.twit.tv/wiki/Security_Now_252
Listener feedback (Security Now 253, 50:38) http://www.grc.com/sn/sn-253.htm and http://wiki.twit.tv/wiki/Security_Now_253
What we'll do for speed: pipelining, branch prediction (Security Now 254, 29:30) http://www.grc.com/sn/sn-254.htm and http://wiki.twit.tv/wiki/Security_Now_254

If anyone takes issue with anything Steve says in these episodes, the best places to provide feedback are http://www.grc.com/feedback.htm or http://www.grc.com/discussions.htm or https://twitter.com/SGgrc

Explained for beginners

From the ground up, a computer has a lot of 'switches'. For example, an LED light can be turned off or on; there are only 2 options (1 = on, 0 = off). If you have 2 LEDs, you can turn LED 1 off and LED 2 on, or vice versa, or you can turn them both on or both off. There are now more possibilities.

You can calculate how many different possibilities there are.
1 lamp  = 2^1 = 2 possibilities
2 lamps = 2^2 = 4 possibilities
8 lamps = 2^8 = 256 possibilities
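The same counting can be checked with a short Python sketch (my own illustration; the function name count_states is made up). It computes 2^n for the lamp counts above and also lists every on/off combination for 2 lamps.

```python
from itertools import product

def count_states(lamps: int) -> int:
    """Each lamp is either on or off, so n lamps give 2**n combinations."""
    return 2 ** lamps

for n in (1, 2, 8):
    print(n, "lamp(s):", count_states(n), "possibilities")

# Every combination for 2 lamps, written with 1 = on and 0 = off.
states = ["".join(bits) for bits in product("01", repeat=2)]
print(states)  # ['00', '01', '10', '11']
```

Eight lamps correspond to the 8 bits of one byte, which is why a byte can hold 256 different values.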

So the computer reads only zeros and ones. A computer has a great many switches; how many depends on the CPU. To tell the computer to activate a lamp, you would have to enter 0s and 1s into the system directly, and that would be a very hard task. To make this easier, the bit patterns are usually written as hexadecimal numbers. Assembly is just a computer language made of short names for instructions; a program called an assembler converts the text you typed into 0s and 1s (binary code), which the computer can then follow.

Licensed under: CC-BY-SA with attribution
