Question

It is my understanding that:

  1. It's particularly difficult to compile ahead of time, to efficient native machine code, a dynamically typed language like Python.
  2. Largely as a result of the above, these languages are usually implemented as interpreters (often with a JIT compiler). In other words, largely as a result of point 1, implementations of these languages often do not AoT-compile to machine code, and instead AoT-compile to bytecode (and then do the best they can at runtime, e.g. JIT compilation, caching, etc.).

Is the above correct? If so, why? (e.g. for Python). I suspect complexity theory, and more specifically the halting problem itself (?), explains this challenge when it comes to name binding / dispatching. But I'd like to know if this has been formally studied, for either any of these languages in particular, or for a larger class of languages.

Perhaps also to help with the above, what are some notable examples of dynamically-typed languages with "practical" (efficient) AoT-compilers to native machine code?


Solution

Actually, it is easy to compile Python to native machine code; in fact, the Cython compiler can do it.

However, the Cython compiler converts code like:

A = B + C

Into something like the following C code:

PyObject * A = PyNumber_Add(B, C);

PyNumber_Add is a generic function which will eventually invoke a function pointer to call the actual implementation of add for the types of B and C. It is, as you might expect, much slower than code that simply adds the numbers directly.

So the question is not whether we can generate machine code for a dynamically typed language, that's easy. The question is whether we can generate efficient machine code. In order to generate efficient code, we need to figure out the types of the variables so that we can generate fast code specific to that type instead of generic code that works for any type.
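To see why the generic path is slow, here is a toy Python model of what a generic add like PyNumber_Add has to do at runtime. The dispatch table and names are purely illustrative, not CPython's real data structures:

```python
# Illustrative dispatch table: the "function pointers" a generic add
# would consult, keyed by the runtime types of the operands.
ADD_IMPLS = {
    (int, int): lambda b, c: b + c,
    (float, float): lambda b, c: b + c,
    (str, str): lambda b, c: b + c,
}

def generic_add(b, c):
    # Look up the implementation for these operand types at runtime,
    # then call it indirectly -- extra work a typed add would skip.
    impl = ADD_IMPLS.get((type(b), type(c)))
    if impl is None:
        raise TypeError(f"unsupported operand types: "
                        f"{type(b).__name__}, {type(c).__name__}")
    return impl(b, c)

generic_add(2, 3)        # dispatches to the (int, int) implementation
generic_add("a", "b")    # dispatches to the (str, str) implementation
```

Every call pays for a type lookup and an indirect call, which is exactly the overhead that type-specialized machine code avoids.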

The halting problem does show that we cannot always figure out the types. Consider the following example:

def crazy():
   if really_hard_math_problem():
      return 42
   else:
      return "hello!"

In order to correctly determine the type of this function, the compiler would have to figure out the solution to the really hard math problem. This means that in order to always get the correct answer, the compiler would have to be able to solve all possible math problems.

But there is something goofy about this. You don't, or at least shouldn't, write code like this. Realistic programs have saner types and don't depend on unsolved or unsolvable problems. So perhaps it would be possible to resolve all the types in a realistic program. The halting problem is not actually a practical issue here.

The Julia language is a dynamically typed language that uses extensive type inference to generate efficient code. It is able to resolve the types of most variables by tracing through your code.
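As a toy illustration of this kind of inference, here is a sketch that walks a Python AST and records the type of each name bound to a literal. Real inference engines do far more, tracking values through calls, branches, and loops:

```python
import ast

def infer_literal_types(source):
    """Toy type inference: map each assigned name to the type of a
    literal right-hand side. A real inferencer must also follow
    values through calls, branches, and loops."""
    types = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    types[target.id] = type(node.value.value).__name__
    return types

infer_literal_types("x = 1\ny = 'hi'")   # {'x': 'int', 'y': 'str'}
```

With this kind of information in hand, a compiler can emit an integer add for `x` instead of a generic dispatch.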

Other tips

We are going to be taking chunks of source code, translating them to machine code (directly or indirectly), and executing them. If we translate everything first and then execute, it is ahead of time. If we translate in chunks interleaved with execution, it is just in time.

What do you call interpreted? My guess is that you mean compiling just in time, except you forget the chunks you already translated instead of reusing them and composing them into bigger chunks.


It's particularly difficult to compile ahead of time, to native machine code, a dynamically typed language like Python.

Yes. Not impossible.

Largely as a result of the above, the implementation of these languages is often interpreted.

Yes. Well, it depends on what you call interpreted. But just in time, yes.


Why?

Let us break it down...

As you already know, you need to know the target machine. Which means that you either compile your code for multiple target machines before deployment, or you do it on deployment (the installer does the ahead of time compilation).


Now, let me talk to you about type erasure. The true type erasure, not the Java brand; however, I will not go there.

If the compiler knows all the types ahead of time, that is during static analysis, which is easy with static types...

Then it knows exactly how each routine/method/function will be called, and thus it knows the exact memory layout and how to perform operations on it. Thus, it can emit optimized native code for that, and there would be no need to do type checks in runtime.

Thanks to that, the compiler can write native code that does not need type information. Which means that type information can be erased from the runtime. Hence type erasure.


You can see where I am going, however there are a couple of stations I want to visit on the way. First stop: generics.

When we have a generic type, we have type parameters that might or might not be known ahead of time. For an application, the compiler can look at every place the type is used and emit optimized code for each variant. That also means that compiling libraries with open generic types ahead of time will only be possible alongside the final application.

By the way, to use libraries in any shape or form, they must have some level of type information anyway. Thus, not full type erasure.

Second stop: reflection and emission.

Let us say we want to create a new type at runtime. Well, the information about that type would only be available at runtime, in particular if it depends on external input.

Furthermore, if you have something like monkey patching (where a type can change at runtime) or "eval", it is all worse.


Destination: dynamic types.

If we do not know the types of variables and parameters, we cannot emit optimized code for them. It is possible to do static analysis on a dynamically typed language and recover type information, but it is a huge task, because the source does not mark types: the static analysis would have to follow the values through the code. And you know what? It is easier to execute the code and see where the values go (which is what just-in-time compilation does). In essence, the compiler would have to do the same thing, except following every possible branch.
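A minimal sketch of "execute the code and see where the values go": a decorator that records the argument-type combinations a function is actually called with, which is essentially the profile a JIT collects before specializing (names are illustrative):

```python
import functools

def trace_types(fn):
    """Record each combination of argument types fn is called with --
    the kind of runtime observation a JIT uses instead of static
    analysis."""
    @functools.wraps(fn)
    def wrapper(*args):
        wrapper.seen.add(tuple(type(a).__name__ for a in args))
        return fn(*args)
    wrapper.seen = set()
    return wrapper

@trace_types
def add(a, b):
    return a + b

add(1, 2)
add("x", "y")
add.seen   # {('int', 'int'), ('str', 'str')}
```

No branch analysis was needed: the running program simply revealed which type combinations actually occur.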

Just in time compilation still emits native machine code, except it does it on demand.

When a routine/method/function is called, the JIT checks whether it has already been compiled for the particular combination of types it takes; if so, it can execute that machine code. Otherwise, now that the types are known, it can be compiled and executed. There are also tiered JIT systems that emit native code quickly the first time (reducing start-up time) and then, in the background, work on more optimized code to replace it.


Alright, there is a twist: what if we settle for sub-optimal code? Above I was talking about optimized code; however, we could handle dynamic types in machine code with sub-optimal code. In particular, we can use a variant type for primitive types, and something like an Entity-Component-System for compound types.

It would be full native code from a dynamically typed language. However, it might have worse performance than JIT.

No, this does not solve generics, it solves dynamic types.
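The variant-type idea can be sketched in a few lines. This is a Python stand-in for the tag-plus-payload box the emitted native code would manipulate:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Variant:
    """Tag-plus-payload box, standing in for the tagged union an AOT
    compiler could use to represent every dynamically typed value."""
    tag: str      # "int", "float", "str", ...
    value: Any

def var_add(a: Variant, b: Variant) -> Variant:
    # Every operation branches on the tags at runtime; this is the
    # "sub-optimal" cost compared with code specialized to one type.
    if a.tag == b.tag and a.tag in ("int", "float", "str"):
        return Variant(a.tag, a.value + b.value)
    raise TypeError(f"cannot add {a.tag} and {b.tag}")

var_add(Variant("int", 1), Variant("int", 2))   # Variant(tag='int', value=3)
```

The code is fully ahead-of-time compilable because the set of tags is fixed, but every operation pays for the tag checks.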


So, yes dynamically typed languages are hard to compile to native code ahead of time. However, they are often compiled just in time.

I also want to mention that there are solutions that will compile ahead of time what they can, and use JIT for everything else.


I suspect complexity theory (and more specifically the halting problem itself, when it comes to the problem of name binding / dispatching) is really to blame here

I suppose there is an analogy in trying to figure out type information, and that it is sometimes undecidable.

However, it is not really the halting problem. Most compilers are happy to produce machine code that will never halt (just try an infinite loop in your favorite language). They do not care. They do not try to prove the program will halt.

There are languages that do proof automation; they will try to check whether the program will halt and give you one of three answers: a) we know it will halt; b) we are not sure, proceed at your own risk; c) we know it will not halt. Those languages are not dynamically typed.


But I'd like to know if this has been formally studied, for either any of these languages in particular, or a larger class of languages.

Skip.


Perhaps also to help with the above, what are some notable examples of dynamically-typed languages with AoT-compilers to native machine code?

Since you ask about Python, I had a look. There seem to be a few options: Nuitka, mypyc, Numba. I do not know much about them.

Any language that can run on .NET or the JVM, for which ahead-of-time compilation solutions already exist, could be compiled ahead of time. Those are a lot of languages. And there is already ahead-of-time compilation for a few. The first to come to mind is Ruby. Yes, it has "eval" and monkey patching, I know. It will not be possible to compile every program ahead of time, because sometimes the program will depend on user input.

That is the thing. Not every valid program in a dynamically typed language can be compiled ahead of time. If a program creates or modifies types at runtime, there is no way to have that type information ahead of time. The same goes for any library with open generic types. Not because of language complexity or the halting problem, but because of the space-time continuum: we haven't figured out time travel to get the type information of types that are yet to exist.

Yet, as I was saying above, you could have sub-optimal code. Have you seen the .NET Dynamic Language Runtime? .NET is statically typed, with reflection. Using the .NET Dynamic Language Runtime (which is pure .NET code), it can have objects statically typed as dynamic. This was developed to port Python and Ruby to .NET, which was done successfully, and other languages followed (note that these ports lag behind the main versions of those languages).

By the way, you can do this: make it so there are only native types, functions, and objects. No new types can be created, no inheritance or anything. Instead, you can attach properties to objects at runtime, make a method a function attached to an object, and use cloning to create multiple similar objects. Implement objects at runtime as dictionaries, maps, or similar. Wait a minute, that sounds similar to JavaScript…
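That model can be sketched directly in Python, with objects as plain dictionaries and cloning instead of classes (all names here are illustrative):

```python
# Objects are dictionaries; a "method" is just a function stored in
# one of the entries; new objects are made by cloning and overriding.
def clone(obj):
    return dict(obj)

point = {"x": 1, "y": 2}
point["magnitude"] = lambda self: (self["x"] ** 2 + self["y"] ** 2) ** 0.5

p2 = clone(point)          # prototype-style: copy, then customize
p2["x"], p2["y"] = 3, 4

point["magnitude"](point)  # ~2.236
p2["magnitude"](p2)        # 5.0
```

With only one universal object representation, the compiler never needs per-type layouts, which is what makes this shape of language comparatively easy to compile.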

Ahead-of-time compilation of JavaScript programs. Yup. Ahead of time, on deploy, of course.

Yeah, the real problem for ahead-of-time compilation is not the dynamic-types part; the problems are code emission, open generics, and similar forms of metaprogramming.

It's particularly difficult to compile ahead of time, to efficient native machine code, a dynamically typed language like Python.

This is mostly true, but only because of the "efficient" part, and only because of the way current mainstream CPUs and OSs are designed.

It is not hard to ahead-of-time compile Python. In fact, the CPython implementation does that without you even noticing. It just doesn't compile it to "native" machine code. (Although technically, it is native machine code for the CPython virtual machine. After all, when talking about "native machine code", you always have to specify what machine you are talking about!)
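You can watch this compilation happen with the standard dis module. The generic BINARY_OP instruction (BINARY_ADD on Python versions before 3.11) is an opcode of that virtual machine:

```python
import dis

# CPython has already AoT-compiled this source to bytecode by the
# time it runs; `compile` exposes that step explicitly.
code = compile("a = b + c", "<example>", "exec")
dis.dis(code)   # prints the bytecode, including a generic binary-add opcode
```

The add instruction carries no type information at all; resolving the types is deferred to the virtual machine at runtime.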

It is also not hard to ahead-of-time compile Python to native machine code. After all, what does an interpreter do? It walks the abstract syntax tree, and for every expression in there, it executes a little bit of code that implements that expression. It could, in theory, instead write that little bit of code to a file. It would be horribly inefficient code, since for the non-leaf nodes, you would essentially have to generate the code for every possible type of the constituent expressions because (unlike the interpreter) you don't know what they are yet.

As mentioned in the other answers, you can always fall back to generating native machine code that simply calls the same runtime functions that the interpreter would also call. You might say that is cheating, but even C typically does not work without a runtime library. (If you've ever tried to write free-standing C, you know how hard that is.)

It's also not hard to ahead-of-time compile Python to efficient native machine code, BUT you have to use a machine that is designed for efficiently executing Python! Modern mainstream OSs and CPUs are designed for efficiently executing C-like languages. They contain features like virtual memory management that are explicitly there for C and make Python slower. (In particular, virtual memory hurts garbage collection performance.)

Now, all of this does not mean that it is impossible! There are high-performance AOT compiled implementations of Common Lisp (SBCL, CMUCL, ClozureCL) and Scheme (Ikarus, Gambit, Chicken, Chez), for example.

Also, Objective-C is a dynamic language, and somehow nobody is surprised that it can be compiled to efficient native machine code.

  2. Largely as a result of the above, the implementation of these languages is often interpreted.

That, however, is not true.

High-performance implementations of dynamic languages typically are compiled, just not ahead-of-time. They are just-in-time compiled.

This allows the compiler access to all runtime information, and even statistical information that an ahead-of-time compiler would never have. Problems like the Halting Problem or Rice's Theorem only apply to statically analyzing programs without running them, they do not apply to simply watching what the running program does.

No.

All languages are interpreted.

The appropriate questions are:

  • Who is interpreting
  • How they are interpreting

The Who can be:

  • Doped Silicon
  • Water Buckets
  • Another Program
    • A Compiler
    • An Interpreter
    • An Analyser

The How is one of:

  1. Substitutional
  2. Denotational
  3. Big Step
  4. Small Step

Matt has a good write-up on the various Hows.


To address the example directly:

Python can be compiled to a machine language relatively easily.

It is already compiled to bytecode, which is the machine code for a Python machine.

A Python machine has a well-defined set of instructions.

Those instructions could be built in silicon.


To address the proposition directly:

It has nothing to do with the complexity of the source language. It is simply a trade-off decision: is it worth compiling to machine code?

Why do you need to compile to machine code and deal with the vagaries of the poorly specified and diverse set of Machine languages, when an interpreter can be trivially ported to new machines, and offers sufficient speed?

Really, go and take a look at x86, 68K, PowerPC, or even ARM. Which extensions will you support? What about the OS that sits on top of these? Some OSs have several ABIs for the same hardware. And different OSs will manage resources differently.

An interpreter, if well written, can rely on the host language to hide all of these little decisions. A really well written interpreter will have a platform-independence layer that allows it to work across a myriad of platforms. And most modern machines have lots of memory and lots of processing power, and users have low performance requirements (1 sec ~= 1,000,000,000 instructions these days; twenty years ago, 1 sec ~= 100,000).

Given the choices most languages opt for an interpreter first. Then they spend the effort in making that interpreter as fast as possible.

But a language that will stand the test of time will need to target machine code eventually. Otherwise its fortunes rise and fall with those of the language hosting the interpreter.


Addressing complexity:

Python is a difficult language to compile because parts of the language are in fact meta-programs. Lisp is a much better example of this.

Meta-programs by essence treat data as code and code as data, allowing new code to be written and old code to be edited as the program runs.

The problem with writing machine code meta-programs is that the hardware, and the OS are not developed to easily support them (or even treat them as equal citizens).

  • OSs often enforce read-only constraints on executable memory as a security feature.
  • Anti-virus programs will treat meta-programs as viruses, or at the very least actively slow them down to analyse their consequences.
  • Hardware was developed to optimise execution speed, which poses limitations on dynamically writing code.

Even with these challenges in place, it is possible to write a machine-code program to emit, or enact, the instructions required to execute any particular meta-program in Python (or any other language). The problem is, this is called a specialised JIT/interpreter (or a partially evaluated JIT/interpreter).

Given that:

  • Any Python program might contain several hundred such meta-programs.
  • Each of these meta-programs may overlap significantly.
  • These machine-coded meta-programs are not trivially portable to different hardware.

There will be cases where compiling to machine code does not offer many benefits and perhaps even offers more disadvantages.

With this in mind, the next optimisation to perform essentially reassembles the interpreter from these partially evaluated meta-programs, turning them back into a form of bytecode. And seeing as you already have much (if not most) of the interpreter, why not just include the whole interpreter and offer runtime evaluation of strings?

So a program fully compiled to machine code comes right back to where we started: an optimised JIT interpreter.

So where do you get off this merry-go-round? Where it makes most sense for your particular program/programming language.


Some examples:

  • Java -> GNU Compiler for Java.
  • .NET -> Mono.
  • Python -> many partial compilers, e.g. PyPy; and the Python machine is itself technically a JIT interpreter, "compiling" as needed.
Licensed under: CC-BY-SA with attribution