How instructions are differentiated from data?

https://stackoverflow.com/questions/2022489

19-09-2019
|

Question

While reading ARM core document, I got this doubt. How does the CPU differentiate the read data from data bus, whether to execute it as an instruction or as a data that it can operate upon?

Refer to the excerpt from the document -

"Data enters the processor core through the Data bus. The data may be an instruction to execute or a data item."

Thanks in advance for enlightening me! /MS

Solution

Each opcode will consist of an instruction of N bytes, which then expects the subsequent M bytes to be data (memory pointers etc.). So the CPU uses each opcode to determine how manyof the following bytes are data.

Certainly for old processors (e.g. old 8-bit types such as 6502 and the like) there was no differentiation. You would normally point the program counter to the beginning of the program in memory and that would reference data from somewhere else in memory, but program/data were stored as simple 8-bit values. The processor itself couldn't differentiate between the two.

It was perfectly possible to point the program counter at what had deemed as data, and in fact I remember an old college tutorial where my professor did exactly that, and we had to point the mistake out to him. His response was "but that's data! It can't execute that! Can it?", at which point I populated our data with valid opcodes to prove that, indeed, it could.

OTHER TIPS

Simple answer - it doesn't. Machine code instructions are just binary numbers, as are data. More complicated answer - your processor may (or may not) provide segmentation of memory, meaning that attempting to execute what has been specified as data causes a trap of some sort. This is one of the the meaning of a "segmentation fault" - the processor tried to execute something that was not labelled as being executable code.

The original ARM design had a three-stage pipeline for executing instructions:

FETCH the instruction into the CPU
DECODE the instruction to configure the CPU for execution
EXECUTE the instruction.

The CPU's internal logic ensures that it knows whether it is fetching data in stage 1 (i.e. an instruction fetch), or in stage 3 (i.e. a data fetch due to a "load" instruction).

Modern ARM processors have a separate bus for fetching instructions (so the pipeline doesn't stall while fetching data), and a longer pipeline (to allow faster clock speeds), but the general idea is still the same.

Each read by the processor is known to be a data fetch or an instruction fetch. All processors old and new know their instruction fetches from data fetches. From the outside you may or may not be able to tell, usually not except for harvard architecture processors of course, which the ARM is not. I have been working with the mpcore (ARM11) lately and there are bits on the external interface that tell you a little about what kind of read it is, mostly to hook up an external cache, combine that with knowledge of if you have the mmu and L1 cache on and you can tell data from instruction, but that is the exception to the rule. From a memory bus perspective it is just data bits you dont know data from instruction, but the logic that initiated that memory cycle and is waiting for the result knew before it started the cycle what kind of fetch it was and what it is going to do with that data when it gets it.

I think its down to where the data is stored in the program and OS support for informing the CPU whether it is code or data.

All code is placed in different segment of the image (along with static data like constant character strings) compared to storage for variables. The OS (and memory management unit) need to know this because they can swap code out of memory by simply discarding it and reloading it from the original disk file (at least that's how Windows does it).

So, I think the CPU 'knows' whether memory is data or code. No doubt the modern pipeling CPUs we have now also have instructions to read this memory differently to assist the CPU is processing it as fast as possible (eg code may not be cached, data will always be accessed randomly rather than in a stream)

Its still possible to point your program counter at data, but the OS can tell the CPU to prevent this - see NX bit and Windows' "Data Execution Protection" settings (system control panel)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow