Befunge is impossible to truly AOT compile due to p
(the put instruction, which lets programs rewrite their own source at runtime). In terms of JITs, though, it's a cakewalk compared to all those dynamic languages out there. I've worked on some fast implementations.
marsh
gains its speed by being a threaded interpreter. To speed up instruction dispatch it has to create 4 copies of the interpreter, one for each direction. I optimize bounds checking & lookup by storing the program in an 80x32 space instead of an 80x25 space (padding the height to a power of two makes the index & wrapping math cheaper).
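A minimal sketch of that layout trick (names & details are mine, not marsh's actual code): with the height padded from 25 to 32, a cell address becomes a shift & an or instead of a multiply.

```python
# Playfield padded from 80x25 to 80x32 so a cell index is (x << 5) | y,
# a shift+or rather than y*80+x's multiply. Sketch only; marsh does this
# in its own representation.

WIDTH, HEIGHT, PADDED_HEIGHT = 80, 25, 32

def make_playfield(source: str) -> bytearray:
    field = bytearray(b" " * (WIDTH * PADDED_HEIGHT))
    for y, line in enumerate(source.splitlines()[:HEIGHT]):
        for x, ch in enumerate(line[:WIDTH]):
            field[(x << 5) | y] = ord(ch)
    return field

def get(field: bytearray, x: int, y: int) -> int:
    return field[(x << 5) | y]

def put(field: bytearray, x: int, y: int, value: int) -> None:
    field[(x << 5) | y] = value & 0xFF
```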
bejit
came from my observation that the majority of program time is spent moving around. bejit
records a trace as it interprets, & if the same location is ever hit in the same direction we jump to an internal bytecode format that the trace recorded. When p
performs a write to program source that we've traced, we drop all traces & return to the interpreter. In practice this executes stuff like mandel.bf
3x faster. It also opens up peephole optimization, where the tracer can apply constant propagation. This is especially useful in Befunge, since constants are built up out of multiple instructions.
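A toy version of that trace-and-invalidate scheme (class & method names are hypothetical, not bejit's): traces are keyed by position plus direction, and any p write to traced source just throws the whole cache away & falls back to the interpreter.

```python
# Toy trace cache keyed by (x, y, dx, dy). A real tracer records decoded
# internal bytecode; here a "trace" is whatever the caller hands us, which
# is enough to show the lookup/record/invalidate lifecycle.

class TraceCache:
    def __init__(self):
        self.traces = {}  # (x, y, dx, dy) -> recorded trace

    def lookup(self, key):
        # Hit means we've been at this cell in this direction before:
        # jump to the recorded trace instead of interpreting.
        return self.traces.get(key)

    def record(self, key, trace):
        self.traces[key] = trace

    def invalidate_all(self):
        # Called when `p` writes to program source we've traced:
        # drop everything & return to the plain interpreter.
        self.traces.clear()
```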
My Python implementations compile the whole program before executing, since a Python function's bytecode is immutable. This opens up the possibility of whole-program analysis
funge.py
traces Befunge instructions into CPython bytecode. It has to keep an int at the top of the stack to track stack height, since CPython doesn't handle stack underflow. I was originally hoping to create a generic Python bytecode optimizer, but I realized it'd be more efficient to optimize an intermediate format that lacked jump offsets. Besides that, the common advice that arrays are faster than linked lists doesn't apply as much in CPython, since its arrays are arrays of pointers & a linked list just spreads those pointers out. So I created funge2.py
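The underflow problem exists because Befunge-93 defines popping an empty stack as yielding 0, which CPython's operand stack can't express, hence the extra height counter. A plain-Python sketch of the semantics being emulated (not funge.py's actual codegen):

```python
# Befunge-93 stack semantics: popping an empty stack yields 0 instead of
# crashing. CPython's own operand stack has no such fallback, so compiled
# code must guard on a tracked height; this sketch just models the
# semantics directly.

class FungeStack:
    def __init__(self):
        self.values = []

    def push(self, v: int) -> None:
        self.values.append(v)

    def pop(self) -> int:
        # Underflow is well-defined: an empty stack pops as 0.
        return self.values.pop() if self.values else 0
```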
(wfunge.py
is a port of funge.py in preparation for http://bugs.python.org/issue26647)
funge2.py
traces instructions into a control flow graph. Unfortunately we don't get the static stack adjustments that the JVM & CIL demand, so optimizations are a bit harder. funge2.py does constant folding, loop unrolling, & some stack depth tracking to reduce stack depth checks, & I'm in the process of adding more (jump-to-jump optimizations, smarter stack depth juggling, not-jump/jump-pop/dup-jump combining)
By the time funge2 gets to optimizing Befunge, it's a pretty simple IR:
- load const
- binop (+, -, *, /, %, >)
- not
- pop
- dup
- swap
- printint/printchar/printstr (the last for when constant folding makes these deterministic)
- getint/getchar
- readmem
- writemem
- jumprand
- jumpif
- exit
Which doesn't seem so hard to compile.
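That IR is small enough to write down directly; here's one way it could look as Python data (the representation is mine, the opcode set mirrors the list above):

```python
# The funge2-style IR as plain Python data: one opcode enum & instructions
# as (opcode, argument) tuples. Representation is illustrative only.

from enum import Enum, auto

class Op(Enum):
    CONST = auto()       # load const
    BINOP = auto()       # +, -, *, /, %, >
    NOT = auto()
    POP = auto()
    DUP = auto()
    SWAP = auto()
    PRINT_INT = auto()
    PRINT_CHAR = auto()
    PRINT_STR = auto()   # when constant folding makes output deterministic
    GET_INT = auto()
    GET_CHAR = auto()
    READ_MEM = auto()
    WRITE_MEM = auto()
    JUMP_RAND = auto()
    JUMP_IF = auto()
    EXIT = auto()

def instr(op: Op, arg=None):
    # An instruction is just the opcode plus an optional immediate.
    return (op, arg)
```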