Question

Along the lines of How to Simulate Control-Flow without using Control-Flow Primitives, I am wondering how to simulate return from a function.

Given an example setup like this:

console.log(a(10))

function a(a) {
  var x = b(a)
  var y = c(a)
  return x + y
}

function b(a) {
  return a * 2
}

function c(a) {
  return a + 1
}

How to reimplement those functions without using function or return. Something it seems like using a stack somehow like below. I am trying to learn how to think about assembly. I am not sure how the return from the function should work, the nested aspects of the functions, or calling the functions, when it is implemented with an iterative stack in a while loop like below.

var stack = []
var pointer = 0
var a = 10

while (true && pointer < 3) {
  switch (pointer) {
    case 0:
      // var x = b(a)
      // var y = c(a)
      // stack.push(x + y)
      pointer++
      break
    case 1:
      stack.push(a * 2)
      pointer++
      break
    case 2:
      stack.push(a + 1)
      pointer++
      break
  }
}

console.log(stack.pop())
Was it helpful?

Solution

If you are writing an interpreter for an machine code instruction set architecture, you provide for and maintain (1) the CPU's registers, which includes the Program Counter (aka Instruction Pointer), and (2) some main memory.

Then the interpreter loop fetches the instruction referenced by the Program Counter.  Virtually all instructions will modify some CPU register, in particular: the Program Counter.

Arithmetic instructions will modify memory, registers, and/or the flags register; while also explicitly incrementing the Program Counter to refer to the sequentially next instruction — thus, the subsequent iteration of the interpreter loop will naturally execute the interpreted program's instructions sequentially.

Unconditional branch instructions will modify the Program Counter in a more direct way.  Conditional branch instructions will test some condition and then either increment the PC as with arithmetic instructions (on false condition: fall thru) or modify the PC directly as with unconditional branches (on true).

The main interpreter loop simply proceeds to execute the next instruction as referenced by the Program Counter.  Thus, if the (interpreted) program has a loop, the interpreter loop will revisit instructions.

Call & return are simulated in an interpreter by simply carrying out the side effects of the machine code call & return instructions.  For example, a call instruction might (1) push the return address onto the stack (a write to memory at the stack pointer along with a modification of the stack pointer), (2) transfer control as with an unconditional branch (e.g. change the Program Counter).  The return instruction does the reverse: pop the return address off the stack (read memory at stack top while also adjusting the stack pointer), followed by placing that value into the Program Counter.

Despite calls and/or returns in the interpreted program, the interpreter loop continues Ad-nauseam, being relatively unaware of these flow of control changes, simply faithfully executing the next instruction as referenced by its Program Counter.  By way of pushing & popping the caller & callee execute with a different area of the stack, and thus coordinate in some sense, not to trash each other, while requiring little of the hardware (or interpreter).

As with real hardware, an interpreter need only maintain a single copy of the CPU registers.  The caller and callee are written (by a compiler, perhaps) to follow the Calling Convention of the machine architecture.  This convention allows the caller and callee to share CPU registers, and keep out of each others' way.


ok, let's postulate a very simple machine code where all instrutions are 16-bits and the first byte the opcode, second is an operand:

// CPU State
int pc;
int regs[8];

// Main memory
unsigned char *mem;

// Interpreter inf. loop
while (true) {
    int opcode = mem[pc++];
    int operand = mem[pc++];
    // note: at this point, pc points to sequentially next instruction

    // decode the various opcodes
    switch (opcode) {
    case 0:
    ...
    case 100: // let's say this is an unconditional branch instruction
        pc = operand;
        break;
    case 101: // let's say this is a conditional branch (here testing reg#0)
        int condition = regs[0] == 0;
        if (condition)
           pc = operand;
        // recall that the pc has already been incremented for fall thru
        break;
    ...
    case 200: // let's say this is a machine code call instruction
        regs[7]--;     // let's say reg#7 is the stack pointer
        mem[regs[7]] = pc; // note this pc is 1 instruction further along than the call itself, hence this is the return address
        pc = operand;
        break;
    case 201: // let's say this is a machine code return instruction
        pc = mem[++regs[7]];
        break;            
    case 255:
    ...
    }
}

You might want to have a look at assembly for a function definition, and also for a call to a function.

The function will have some prologue (and some epilogue) as it sets up (tears down) the machine state for for the function to execute given that both the caller and callee share the CPU with its fixed register resources (this is done according to the calling convention).

The epilogue code will contain the machine language version of "return" as the last instruction.  The machine code "return" operation transfers flow of control from callee to caller by merely altering the Program Counter.  It does not accomplish returning a value so return x+y; in a higher level language requires more code: code for computing x+y and then returning that to the caller, in addition to the epilogue (for the calling convention) and the machine code "return" that performs the final change in the flow of control.

The call will also consist of several instructions, the first of which set up the arguments for the callee, and the last instruction of the call sequence (modulo branch delay slots on some architectures) is the machine code "call" instruction, which captures the return address for later use (e.g. pushing it onto the runtime stack) and otherwise accomplishes an unconditional branch. So, a call, as in f(a+1,b-1) will first compute a+1 and b-1, then pass those as parameters according to the calling convention, and finally transfer flow of control to f, by way of a machine code "call" instruction.

OTHER TIPS

Seems like, algebraically, you can substitute.

Starting with...

function a(a) {
 var x = b(a)
 var y = c(a)
 return x + y
}

If b(a) = a * 2 then we can substitute a * 2 everywhere we see b(a):

function a(a) {
 var x = a * 2
 var y = c(a)
 return x + y
}

And if c(a) = a + 1 we can substitute again:

function a(a) {
 var x = a * 2
 var y = a + 1
 return x + y
}

Which is the same as

function a(a) {
 return (a * 2) + (a + 1)
}

Simplifying:

function a(a) {
 return (a * 3) + 1
}

Or just:

whatever = (a * 3) + 1
Licensed under: CC-BY-SA with attribution
scroll top