Befunge is impossible to truly AOT compile due to p
(the put instruction, which lets programs rewrite their own source at runtime). In terms of JITs, though, it's a cakewalk compared to all those dynamic languages out there. I've worked on some fast implementations.
marsh
gains its speed by being a threaded interpreter. To speed up instruction dispatch it has to create 4 copies of the interpreter, one for each direction. I optimize bounds checking & lookup by storing the program in an 80x32 space instead of an 80x25 space (padding the height to a power of two makes the index & wrapping math cheaper).
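A minimal sketch of that layout trick (names & details are mine, not marsh's actual code): with the height padded from 25 to 32, a cell address becomes a shift & an or instead of a multiply.

```python
# Playfield padded from 80x25 to 80x32 so a cell index is (x << 5) | y,
# a shift+or rather than y*80+x's multiply. Sketch only; marsh does this
# in its own representation.

WIDTH, HEIGHT, PADDED_HEIGHT = 80, 25, 32

def make_playfield(source: str) -> bytearray:
    field = bytearray(b" " * (WIDTH * PADDED_HEIGHT))
    for y, line in enumerate(source.splitlines()[:HEIGHT]):
        for x, ch in enumerate(line[:WIDTH]):
            field[(x << 5) | y] = ord(ch)
    return field

def get(field: bytearray, x: int, y: int) -> int:
    return field[(x << 5) | y]

def put(field: bytearray, x: int, y: int, value: int) -> None:
    field[(x << 5) | y] = value & 0xFF
```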
bejit
came from my observation that the majority of program time is spent moving around. bejit
records a trace as it interprets, & if the same location is ever hit in the same direction we jump to an internal bytecode format that the trace recorded. When p
performs a write to program source that we've traced, we drop all traces & return to the interpreter. In practice this executes stuff like mandel.bf
3x faster. It also opens up peephole optimization, where the tracer can apply constant propagation. This is especially useful in Befunge, since constants are built up out of multiple instructions.
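A toy version of that trace-and-invalidate scheme (class & method names are hypothetical, not bejit's): traces are keyed by position plus direction, and any p write to traced source just throws the whole cache away & falls back to the interpreter.

```python
# Toy trace cache keyed by (x, y, dx, dy). A real tracer records decoded
# internal bytecode; here a "trace" is whatever the caller hands us, which
# is enough to show the lookup/record/invalidate lifecycle.

class TraceCache:
    def __init__(self):
        self.traces = {}  # (x, y, dx, dy) -> recorded trace

    def lookup(self, key):
        # Hit means we've been at this cell in this direction before:
        # jump to the recorded trace instead of interpreting.
        return self.traces.get(key)

    def record(self, key, trace):
        self.traces[key] = trace

    def invalidate_all(self):
        # Called when `p` writes to program source we've traced:
        # drop everything & return to the plain interpreter.
        self.traces.clear()
```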
My Python implementations compile the whole program before executing, since a Python function's bytecode is immutable. This opens up the possibility of whole-program analysis
funge.py
traces Befunge instructions into CPython bytecode. It has to keep an int at the top of the stack to track stack height, since CPython doesn't handle stack underflow. I was originally hoping to create a generic Python bytecode optimizer, but I realized it'd be more efficient to optimize an intermediate format that lacked jump offsets. Besides that, the common advice that arrays are faster than linked lists doesn't apply as much in CPython, since its arrays are arrays of pointers & a linked list just spreads those pointers out. So I created funge2.py
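The underflow problem exists because Befunge-93 defines popping an empty stack as yielding 0, which CPython's operand stack can't express, hence the extra height counter. A plain-Python sketch of the semantics being emulated (not funge.py's actual codegen):

```python
# Befunge-93 stack semantics: popping an empty stack yields 0 instead of
# crashing. CPython's own operand stack has no such fallback, so compiled
# code must guard on a tracked height; this sketch just models the
# semantics directly.

class FungeStack:
    def __init__(self):
        self.values = []

    def push(self, v: int) -> None:
        self.values.append(v)

    def pop(self) -> int:
        # Underflow is well-defined: an empty stack pops as 0.
        return self.values.pop() if self.values else 0
```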
(wfunge.py
is a port of funge.py in preparation for http://bugs.python.org/issue26647)
funge2.py
traces instructions into a control flow graph. Unfortunately we don't get the static stack adjustments that the JVM & CIL demand, so optimizations are a bit harder. funge2.py does constant folding, loop unrolling, & some stack depth tracking to reduce stack depth checks, & I'm in the process of adding more (jump-to-jump optimizations, smarter stack depth juggling, not-jump/jump-pop/dup-jump combining)
By the time funge2 gets to optimizing Befunge, it's a pretty simple IR:
- load const
- binop (+, -, *, /, %, >)
- not
- pop
- dup
- swap
- printint/printchar/printstr (the last for when constant folding makes these deterministic)
- getint/getchar
- readmem
- writemem
- jumprand
- jumpif
- exit
Which doesn't seem so hard to compile.
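That IR is small enough to write down directly; here's one way it could look as Python data (the representation is mine, the opcode set mirrors the list above):

```python
# The funge2-style IR as plain Python data: one opcode enum & instructions
# as (opcode, argument) tuples. Representation is illustrative only.

from enum import Enum, auto

class Op(Enum):
    CONST = auto()       # load const
    BINOP = auto()       # +, -, *, /, %, >
    NOT = auto()
    POP = auto()
    DUP = auto()
    SWAP = auto()
    PRINT_INT = auto()
    PRINT_CHAR = auto()
    PRINT_STR = auto()   # when constant folding makes output deterministic
    GET_INT = auto()
    GET_CHAR = auto()
    READ_MEM = auto()
    WRITE_MEM = auto()
    JUMP_RAND = auto()
    JUMP_IF = auto()
    EXIT = auto()

def instr(op: Op, arg=None):
    # An instruction is just the opcode plus an optional immediate.
    return (op, arg)
```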