What is the minimum instruction set required for any Assembly language to be considered useful?

https://stackoverflow.com/questions/9439001

12-11-2019
|

Question

I am studying Assembly programming in general, so I've decided to try and implement a "virtual microprocessor" in software, which has registers, flags and RAM to work with, implemented with variables and arrays. But since I want to simulate only the most basic behavior of any microprocessor, I want to create an assembly language that has only the essential instructions, only those instructions without which it couldn't be useful. I mean, there are assembly languages that can do multiplication and swapping register values, etc, but these operations are not basic because you can implement them using simpler instructions. I don't want to implement instructions like those.

I can imagine a couple of instructions which (I believe) must always be present in any assembly language, such as MOV to move bytes around and JP to send the instruction pointer to another address.

Could you suggest a set of the most basic and essential assembly instructions? Thanks!

Solution

Well, this is a very broad subject. I suppose you need to get familiar with Random Access Machine. I'm not an expert, but it's difficult to tell which instructions should be supported by this very basic microprocessor. For example: Subtraction and multiplication may be simulated by Addition operation. Multiplication is possible if microprocessor supports jumps and conditional instructions and subtraction is possible by adding negative number.

OTHER TIPS

Control structures comprise the basic feature without which there is no language. This means that your language must provide arithmetic operations on two variables; and then allow a program to change the program counter -- that is, to branch -- based on the result of an operation. Very often, the crucial operation is SUB, for subtract one operand from another. And the conditions on which you would allow a branch are:

result is zero;
result is greater than zero;
result is less than zero.
no condition, i.e., branch unconditional

You also need instructions to move data around: LOAD and STORE, say.

These three conditions and their corresponding branches (or skips, which is another way to do it) are necessary for any program. Not only that, but just these three simple operations plus data-moving instructions, are sufficient to do anything in a program except I/O. If you wanted to, and given a cooperating memory organization, you could rewrite Linux using just LOAD, STORE, ADD, SUB, and the three conditional branches.

The PDP-8 was a much more powerful machine than this: it had a rich set of eight instructions, including I/O.

HTH

Surprisingly, there is such a thing as a one instruction set computer.

The least instruction set requires no instruction or maybe zero instruction. I don't know if they had come into real devices or not, but the one instruction set computer (OISC) has been implemented and run successfully in carbon nanotubes computers and MAXQ.

In fact x86 can also be used as an OISC architecture because it's possible to do anything with just a single mov because it has been proved to be Turing-complete. There's even a compiler named movfuscator to compile valid C code into a program with only MOVs (or only either of XOR, SUB, ADD, XADD, ADC, SBB, AND/OR, PUSH/POP, 1-bit shifts, or CMPXCHG/XCHG)

However IMO an architecture should be "fast" enough (or do not require too many instructions like OISC for a task compared to other architectures) to be considered useful.

The most basic instruction types for a computer are data movements, logic/arithmetic operations and branching. For arithmetic operations, just an add/subtract is enough. For logic, we can calculate any functions with only a NOR or NAND, so only one is needed. For jumping, we'll need one jump on "<=" or jump on "<" instruction. Data movements can be emulated by add/sub. Like that, we can use 2 bits to encode 3 opcodes (add, nand, jump on "<=") and leave one for future expansion. But since this has no separate load/store instructions, it must operate directly on large register file instead of memory, or the instructions must have the ability to use memory as operands.

If more speed is needed then some more logic, branching instructions and possibly load/store can be added, increasing the opcode space to 3 bits. The instruction set may be:

load
store
add
and
nor
jump on less than
jump on equal

Left shift can be done with add but right shift is trickier, so you may also want to add a right shift to ease up some common operations

You can survive perfectly well with a minimal instruction set consisting only of SOB: subtract one and branch. Entire programs can and have been written with this.

Look at commercial implementations

The best answer is likely to look at existing commercial implementations.

Anything that is not commercially sold, is likely to not be useful.

What is the definition of an instruction?

For example, I could make one instruction that implements the unzip algorithm, based on a hardware implementation of unzip, and this would of course be the most efficient machine possible for unzipping.

Would it be commercially appealing however? Unlikely, since that hardware would probably be too specialized to justify the development cost.

But there are much more nuanced cases than this extreme one, and the answer will likely vary with existing competitor technologies and market demand in time to make things even worse.

In the end, "efficient hardware" means:

take a set of benchmarks, assign one importance weight for each
write optimal software that solves those benchmarks

Possible reasons why very small Turing complete ISAs may be inefficient

the few instructions that they do have are very complex, and incur large costs every time they are called, e.g. you cannot do certain pipelining optimizaitons
the code density is very small which implies:
- performance may be bound by instruction fetching
- not good for embedded devices with small ROM memory

Notable OISC implementations

It would be interesting to analyze those to have a more concrete answer.

movfuscator

https://github.com/xoreaxeaxeax/movfuscator

Toy C compiler for x86 that uses just the mov x86 instructions, showing in a very concrete way that a single instruction suffices.

The Turing completeness seems to have been proven in a paper: https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf

subleq

Educational OSIC, previously mentioned at https://stackoverflow.com/a/9439153/895245 but without the name:

https://github.com/hasithvm/subleq-verilog Verilog, Xilinx ISE.
https://github.com/purisc-group/purisc Verilog and VHDL, Altera. Maybe that project has a clang backend, but I can't use it: https://github.com/purisc-group/purisc/issues/5
http://mazonka.com/subleq/sqasm.cpp | http://mazonka.com/subleq/sqrun.cpp C++-based assembler and emulator.

Theoretically, a single instruction computer is possible. However on real hardware, you would need a minimum of 4. Assuming a memory only architecture (no user accessible registers).

MOV mem1 mem2 - copy the contents of memory location mem1 to memory location mem2 leaving mem1 unchanged

NAND mem1 mem2 to mem3- perform a bitwise logical NAND between the data at mem1 and mem2 and write the result to mem3

BITSHIFTR mem1 mem2 mem3- bitshift right the data at mem1 mem2 places and write the output to mem3

JMPcond mem1 mem2 - test the least significant bit at mem1 and if it is true(1) jump to mem2

Now it won't be super fast and it will eat memory bandwidth like crazy, but it can be used to implement a virtual machine with any arbitrary instruction set. Additionally there are certain programming constraints, like the need to program in all starting data, or the very least a memory location with only the LSB set to 1. hardware peripherals will have to use DMA for I/O access, and if implemented on a harvard architecture the program will not have direct access to things like pointers.

You may also want to look up Turing Completeness.

http://en.wikipedia.org/wiki/Turing_completeness

http://c2.com/cgi/wiki?TuringComplete

What is Turing Complete?

It means that a language is sufficient to compute anything that can be computed.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow