Question

If I make a JVM in Java, for example, is it possible for the implementation I made to actually be faster than the original implementation I used to build it, even though my implementation is built on top of that original implementation and may even be dependent on it?

( Confusing... )

Look at PyPy. It's a JIT compiler for Python made in Python. That's fine, but how can it claim to be faster than the original implementation of Python that it uses and depends on?


Solution

You are conflating a language with the execution apparatus for that language.

One of the reasons PyPy can be faster than CPython is that PyPy is compiled to a completely separate native executable, and does not depend on, nor execute in, CPython.

Nevertheless, it would be possible for an inefficient implementation of a language to be surpassed by an interpreter written in that same language and hosted in the inefficient interpreter, provided the higher-level interpreter used more efficient execution strategies.
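As an illustration (a minimal, hypothetical sketch, not PyPy's actual machinery), here is the same tiny expression language evaluated two ways in Python. The tree-walking evaluator re-traverses the expression on every call, while the compiling evaluator translates it into a closure once and then just calls it. Even if the compiling evaluator were itself hosted on the slower strategy, the programs it prepares would run via the faster one.

```python
# AST nodes: ("num", 3), ("var", "x"), ("add", left, right), ("mul", left, right)

def eval_tree(node, env):
    """Naive strategy: walk the tree on every evaluation."""
    tag = node[0]
    if tag == "num":
        return node[1]
    if tag == "var":
        return env[node[1]]
    if tag == "add":
        return eval_tree(node[1], env) + eval_tree(node[2], env)
    if tag == "mul":
        return eval_tree(node[1], env) * eval_tree(node[2], env)
    raise ValueError(f"unknown node: {tag}")

def compile_tree(node):
    """Better strategy: translate the tree into a closure once, up front."""
    tag = node[0]
    if tag == "num":
        value = node[1]
        return lambda env: value
    if tag == "var":
        name = node[1]
        return lambda env: env[name]
    if tag == "add":
        left, right = compile_tree(node[1]), compile_tree(node[2])
        return lambda env: left(env) + right(env)
    if tag == "mul":
        left, right = compile_tree(node[1]), compile_tree(node[2])
        return lambda env: left(env) * right(env)
    raise ValueError(f"unknown node: {tag}")

expr = ("add", ("mul", ("var", "x"), ("num", 3)), ("num", 1))  # x * 3 + 1

compiled = compile_tree(expr)  # pay the translation cost once
for x in range(5):
    assert eval_tree(expr, {"x": x}) == compiled({"x": x})
```

The point of the sketch: repeated evaluation through the pre-built closures does strictly less work per call than re-walking the tree, regardless of what the compiling step itself was hosted on.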

OTHER TIPS

Absolutely, it is possible. Your JVM implementation could compile Java bytecode to optimized machine code. If your optimizer were more sophisticated than the one in the JVM implementation on which you run your Java compiler, then the end result could be faster.

In that case, you could run your Java compiler on its own source code, and benefit from faster compilation speeds from then on.
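To make the "more sophisticated optimizer" point concrete, here is a small, hypothetical sketch in Python (the instruction names are invented for illustration): a peephole pass over a toy stack bytecode that folds constant arithmetic at compile time, so the emitted code does less work when it runs.

```python
def fold_constants(code):
    """Collapse PUSH a, PUSH b, ADD/MUL into a single PUSH of the result."""
    ops = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    out = []
    for instr in code:
        if (instr[0] in ops and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", ops[instr[0]](a, b)))  # do the arithmetic now
        else:
            out.append(instr)
    return out

program = [("PUSH", 2), ("PUSH", 3), ("MUL",),
           ("PUSH", 10), ("ADD",),
           ("LOAD", "x"), ("ADD",)]

print(fold_constants(program))
# [('PUSH', 16), ('LOAD', 'x'), ('ADD',)]  -- three instructions instead of seven
```

A bytecode compiler whose output has fewer instructions to execute will beat one that emits the naive sequence, no matter which of the two compiled the other.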

You said that PyPy is a JIT compiler for Python (I'm not familiar with it myself). If that's the case, then it converts a Python program to machine code and then runs the machine code. Another poster said that the PyPy compiler runs as a standalone executable, separate from CPython. But even if it were to run on CPython, once your program has been JIT-compiled to machine code and that machine code is running, the performance of the compiler no longer matters. The speed of the compiler only affects startup time.

PyPy isn't a Python interpreter implemented in Python; it's a Python interpreter and compiler implemented in RPython, which is a restricted, statically typed subset of Python:

RPython is a restricted subset of Python that is amenable to static analysis. Although there are additions to the language and some things might surprisingly work, this is a rough list of restrictions that should be considered. Note that there are tons of special cased restrictions that you’ll encounter as you go.

The real speed difference comes from the fact that, unlike CPython, which interprets the whole program as bytecode, PyPy (whose interpreter is itself translated from RPython into machine code) uses just-in-time (JIT) compilation into machine code for the hot parts of the running program.
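An illustrative (not benchmarked here) example of where that difference typically shows up: a type-stable hot loop. CPython dispatches every bytecode of every iteration through its interpreter loop, whereas a tracing JIT can observe that the variables stay integers and compile the loop body to machine code once it becomes hot.

```python
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i   # same types every iteration: ideal for a tracing JIT
    return total

print(sum_of_squares(10_000))
```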

I don't think it's possible to implement an interpreter for a language in that language (call this A), then run it on top of another existing interpreter (call this B) for that language and execute a program (call this P), and have P running on (A running on B) be faster than P running on B.

Every operation of A is going to have to be implemented with at least one operation of B. So even if B is atrociously bad and A is optimally good, the fact that A is being run on B means that B's badness will slow down A.
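A rough, hypothetical way to picture that counting argument (a cost model, not a real interpreter): every instruction of the hosted interpreter costs several instructions of the layer below it, so adding a layer can only multiply the work done per guest instruction, never reduce it.

```python
def run(program, ticks_per_instruction=1):
    """Return how many base-machine 'ticks' executing `program` costs."""
    return len(program) * ticks_per_instruction

guest_program = ["LOAD", "ADD", "STORE", "JUMP"]           # program P
interpreter_a = ["FETCH", "DECODE", "DISPATCH", "EXEC"]    # inner loop of A

direct_cost = run(guest_program)                                             # P on B
hosted_cost = run(guest_program, ticks_per_instruction=run(interpreter_a))   # P on (A on B)
print(direct_cost, hosted_cost)   # 4 vs 16: the hosted run is strictly slower
```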

It could be possible to implement an interpreter + JIT compiler for a language in the language itself, where the JIT compiler produces some other faster code at runtime, and have P running on (A running on B) be faster than P running on B. The part of P's runtime that isn't JIT compiled will be slower (much slower, normally) but if the JIT compiler successfully identifies the "hot" parts of P and executes them more quickly than B would then the whole system might run faster overall.
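A minimal, hypothetical sketch of that idea in Python (the names and the threshold are invented for illustration): a tree-walking evaluator that counts how often each expression is run and, once an expression is "hot", compiles it to a Python function with compile/exec so that later calls skip the tree walk. Anything that never gets hot keeps paying the slow interpreted path.

```python
HOT_THRESHOLD = 100  # made-up tuning knob

class MiniJIT:
    def __init__(self):
        self.counts = {}
        self.compiled = {}

    def to_source(self, node):
        """Translate an AST node into Python source text."""
        tag = node[0]
        if tag == "num":
            return repr(node[1])
        if tag == "var":
            return f"env[{node[1]!r}]"
        if tag == "add":
            return f"({self.to_source(node[1])} + {self.to_source(node[2])})"
        raise ValueError(tag)

    def interpret(self, node, env):
        """Slow path: walk the tree."""
        tag = node[0]
        if tag == "num":
            return node[1]
        if tag == "var":
            return env[node[1]]
        if tag == "add":
            return self.interpret(node[1], env) + self.interpret(node[2], env)
        raise ValueError(tag)

    def run(self, node, env):
        key = id(node)
        if key in self.compiled:                 # fast path: use compiled code
            return self.compiled[key](env)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= HOT_THRESHOLD:    # expression is hot: compile it
            source = f"lambda env: {self.to_source(node)}"
            self.compiled[key] = eval(compile(source, "<jit>", "eval"))
        return self.interpret(node, env)

jit = MiniJIT()
expr = ("add", ("var", "x"), ("num", 1))
for x in range(200):
    assert jit.run(expr, {"x": x}) == x + 1      # later iterations use the compiled path
```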

But that's not really interesting. It's also possible to implement a compiler for a language in that language (C), compile it with an existing compiler (D), and have the new compiler produce code that is faster than what the original compiler would have produced. I hope that doesn't startle you; it should be clear that the speed of the code emitted by D only affects the execution time of C itself, not the execution time of other programs compiled with C.

Writing compilers in the languages they compile has been done for decades (GCC is written in C, for example), and isn't really relevant to the real question I think you're asking; neither is JIT-compiling a language using itself. In both cases the underlying execution is something other than the language you're considering; usually machine code.

However, the source of your question is a misconception. PyPy's Python interpreter isn't actually implemented in Python. The PyPy project has an interpreter for Python written in RPython. RPython is a subset of Python, chosen so that it can be efficiently compiled down to machine code; as a language RPython is much more like Java with type inference and indented blocks instead of braces. The PyPy project also has a compiler for RPython which is written in Python, and is capable of (mostly) automatically adding a JIT compiler to any interpreter it compiles.
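For a feel of what "RPython-like" code looks like, here is an illustrative, hand-written example (not run through the actual RPython toolchain): the code is syntactically ordinary Python, but every variable keeps a single, inferable type, which is what lets the translator emit efficient low-level code.

```python
def count_spaces(text):
    # `text` is always a str, `count` and `i` are always ints:
    # the kind of type discipline RPython's static analysis requires.
    count = 0
    i = 0
    while i < len(text):
        if text[i] == " ":
            count += 1
        i += 1
    return count

# Plain Python would also allow `count = "many"` on some branch, or returning
# different types from different paths; RPython's analysis would reject that.
print(count_spaces("a b c"))
```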

When you're actually using the PyPy interpreter in production, you're using a machine-code interpreter compiled from the RPython sources, just as when you're using the CPython interpreter you use a machine-code interpreter compiled from C source code. If you execute the PyPy interpreter on top of another Python interpreter (which you can do because valid RPython code is also valid Python code; but not the other way around), then it runs hugely slower than the CPython interpreter.

The PyPy translation process runs on CPython, but its output is a set of .c files (19 files last time I checked) which are then compiled into a binary: pypy-c. At runtime, pypy-c has no relation to CPython, which is why it can be faster.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow