How to Simulate Control-Flow without using Control-Flow Primitives

https://softwareengineering.stackexchange.com/questions/373704

06-02-2021
|

Question

Basically, I want to know how to simulate while and if if I'm handling the control flow myself through an array of instructions.

The while loop can be simulated by if, as seen with assembly branching with je and such. But the question is, can a if statement be simulated somehow, perhaps by simulating a program counter or instruction pointer in a loop?

By simulated I mean replicated without using any control-flow primitives directly, other than the while (true) { ... } loop to loop through instructions.

For example:

var pointer = 0

var instructions = [
  doa,
  dob,
  dob2,
  dob3,
  doc,
  dod
]

var a = 1
var b = 2
var c = 3

while (true) {
  var instruction = instructions[pointer]
  instruction()
}

function doa() {
  a = 10
  pointer = 1
}

function dob() {
  doif(a == 10, dob2)
  doif(a != 10, dob3)
}

function dob2() {
  b = 20
  pointer = 4
}

function dob3() {
  b = 2
  pointer = 4
}

function doc() {
  c = 30
  pointer = 5
}

function dod() {
  console.log(a, b, c)
}

function doif(a, b) {
  // How to remove this:
  if (a) b()
}

At the bottom is a doif. Wondering if there is any way to change its implementation so it doesn't use the built-in if, as in if (a) b(). Somehow maybe it dynamically adds an instruction to the stack, and perhaps sets the instructions pointer, then pops it off somehow. But I don't see how to do that without resorting to if somewhere.

As mentioned already, I want to do it without using any control-flow primitives direct. So I know there is a way to simulate an if statement, like a && b(), but I am wanting to avoid that because it's just syntactic shortcut. Also using a while loop as in while (a) { b(); break } is same thing, and so is using the ternary operator a? b() : null;

Any thoughts would be appreciated. Thank you.

Solution

You're mixing two ways of control flow. On one hand, you're using the instructions array to pass control to the next instruction.

But on the other when you write:

function dob() {
  doif(a == 10, dob2)
  doif(a != 10, dob3)
}

Then you're using JavaScript's method calling to invoke doif.

So our first order of business is to fix this. Perhaps by doing something like

function dob() {
  // note that you need additional structure to support not just method names in instructions, but also arguments
  instructions.push({
    method: doif,
    args: [a == 10, dob2]
  })

  instructions.push({
    method: doif,
    args: [a != 10, dob3]
  })
}

But problem now is that you've hardcoded the pointer jumps in your instructions. This makes it tougher to manipulate the instructions array.

This is what you'd ideally want to do:

while (true) {
  var instruction = instructions[pointer] // get current instruction
  instruction.method.apply(instruction.args) // execute it
  pointer++ // increase program counter
}

You could also switch 2nd and 3rd instructions, then whatever you do with pointer in the execution of current instruction is what will be final. In the way I've written it, you'll just set pointer to one less than what you want to actually set it to, since it'll be incremented. This shouldn't make much of a difference

With this correct machinery in place, your doif becomes:

function doif(condition, methodAndArgs) {
  instructions.splice(pointer + 1, 0, methodAndArgs) // insert methodAndArgs after current instruction
  pointer += !condition // if false, add one which makes you jump over the next instruction
}

And dob actually becomes:

function dob() {
  instructions.splice(pointer + 1, 0, {
    method: doif,
    args: [a == 10, {method: dob2, args: []}]
  })

  instructions.splice(pointer + 1, 0, {
    method: doif,
    args: [a != 10, {method: dob3, args: []}]
  })
}

Here's how your full code would look like:

var pointer = 0

var instructions = [
  {method: doa, args: []},
  {method: dob, args: []},
  {method: dob2, args: []},
  {method: dob3, args: []},
  {method: doc, args: []},
  {method: dod, args: []}
]

var a = 1
var b = 2
var c = 3

while (true) {
  var instruction = instructions[pointer]
  instruction.method.apply(instruction.args)
  pointer++
}

function doa() {
  a = 10
}

function dob() {
  instructions.splice(pointer + 1, 0, {
    method: doif,
    args: [a == 10, {method: dob2, args: []}]
  })

  instructions.splice(pointer + 1, 0, {
    method: doif,
    args: [a != 10, {method: dob3, args: []}]
  })
}

function dob2() {
  b = 20
}

function dob3() {
  b = 2
}

function doc() {
  c = 30
}

function dod() {
  console.log(a, b, c)
}

function doif(condition, methodAndArgs) {
  instructions.splice(pointer + 1, 0, methodAndArgs)
  pointer += !condition // if false, add one which makes you jump over the next instruction
}

OTHER TIPS

To compile an if/else into some kind of opcodes, we need a conditional and unconditional jump instruction.

Let's assume this code:

if (cond()) {
  when_true();
} else {
  when_false();
}
afterwards();

This would be compiled into opcodes like this:

  ...       ; cond
  jz ELSE   ; if condition is false, jump to the label
  ...       ; when_true
  jmp END
ELSE:
  ...       ; when_false
END:
  ...       ; afterwards

Depending on the instruction set the jumps might have an absolute location or may be relative to the current instruction pointer position. In either case, the location is not yet known when the jump instruction is written. This can be solved with a two-pass approach:

In the first pass we emit the instructions. Jump targets are left empty. When we encounter a label, we store the location of the label in a dictionary.
In the second pass we visit all jump instructions. Now the locations of the labels are known, so we can rewrite the instruction to use the correct target location.

A good high-level example of this is in the bytecode generation in the CPython sourcecode.

When writing an interpreter, it is not necessary to keep instructions fixed and simple. We can in principle use an arbitrary object graph as opcodes. This means we could maintain a stack of opcodes. At each evaluation step we pop off one instruction and interpret it. Certain instructions can push new opcodes onto the stack. A native if/then/else could be such an instruction. It would then contain an array of instructions for both cases when the condition is true or false. Depending on the condition result it will push one or the other array on the instruction stack.

These kind of opcodes are however not very common:

The SECD machine is an abstract virtual machine intended for proofs over lambda calculus programs. It uses 4 stacks: data, environment (variables, necessary for closures), control (opcodes), and continuations (needed for certain control flows). In this encoding there are no jumps, instead a conditional decides which function to call. A function call replaces the control stack with the contents of the function.
The Perl virtual machine does use an opcode graph (so conditional opcodes point to other opcodes that will be executed in case of a true/false result). However, this opcode graph is not modified at runtime.
Concatenative languages can be interpreted as pushing operations onto a stack.
I like to use this pattern in interpreters because it's extremely easy to implement in high-level languages.

This is similar to splicing opcodes into the instructions array as suggested by Peeyush Kushwaha, but using a stack or walking a graph means that we don't have to shift existing elements in the instructions array. Note that relative or absolute jumps are meaningless if instructions are on a stack or the instruction array can be modified, so that you will have to keep opcode snippets within the opcode objects.

This is way out of the realm of real-world applications, but it's Friday ...

You can branch without using JMP if your processor architecture treats the program counter as any other general-purpose register. For example, on the PDP-11, you could create a table containing destination addresses, use a value in a register to pick one of those values, and store it into the program counter:

mov TABLE(R0), R7

That's a fairly standard way to implement a switch statement, but it could also be used for simple test-and-jump operations (although nobody would do that because the PDP-11 had a full complement of test-and-jumps).

Getting a little bit stranger is the "47" instruction, so named because its numeric value is 014747:

mov -(R7), -(R7)

This uses pre-decrement register indirect mode: the value of the register is decremented (by 1 or 2, depending on whether it's a byte or word operation) and then the address pointed to by that register is used as the operand.

As you might have guessed, R7 is just another name for the program counter. After reading an instruction, it would normally be incremented by 2. So the first predecrement undoes this, meaning that the original instruction is the source of the move. The second pre-decrement sets the program counter to the address before the original instruction, and is used as the destination of the move.

Which means the instruction copies itself to the next lower memory address, and then executes the instruction at that address. Effectively running the processor backwards, filling memory with 014747 until it falls off the world.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange