Question

How feasible would it be to compile Python (possibly via an intermediate C representation) into machine code?

Presumably it would need to link to a Python runtime library, and any parts of the Python standard library that are themselves written in Python would need to be compiled (and linked in) too.

Also, you would need to bundle the Python interpreter if you wanted to do dynamic evaluation of expressions, but perhaps a subset of Python that didn't allow this would still be useful.

Would it provide any speed and/or memory usage advantages? Presumably the startup time of the Python interpreter would be eliminated (although shared libraries would still need loading at startup).


Solution

Try the ShedSkin Python-to-C++ compiler, but be aware that it is far from perfect. There is also Psyco, a Python JIT, if a speedup is all you need. IMHO, though, this is not worth the effort: for speed-critical parts of the code, the best solution is to write them as C/C++ extensions.

OTHER TIPS

As @Greg Hewgill says, there are good reasons why this is not always possible. However, certain kinds of code (such as very algorithmic code) can be turned into "real" machine code.

There are several options:

  • Use Psyco, which emits machine code dynamically. You should choose carefully which methods/functions to convert, though.
  • Use Cython, a Python-like language that is compiled into a Python C extension.
  • Use PyPy, which has a translator from RPython (a restricted subset of Python that does not support some of the most "dynamic" features of Python) to C or LLVM.
    • PyPy is still highly experimental.
    • Not all extensions will be present.

After that, you can use one of the existing packages (freeze, py2exe, PyInstaller) to put everything into one binary.

All in all: there is no general answer to your question. If you have Python code that is performance-critical, try to use as much builtin functionality as possible (or ask a "How do I make my Python code faster" question). If that doesn't help, try to identify the slow code, port it to C (or Cython), and use the resulting extension.
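To make the "use builtin functionality" advice concrete, here is a minimal sketch that times a hand-written loop against the builtin sum(). The function names and the input size are made up for this illustration, and the timings depend entirely on your machine and interpreter. On CPython the builtin usually wins because its loop runs in C rather than in bytecode.

```python
import timeit

def manual_sum(values):
    """Sum the values with an explicit Python-level loop."""
    total = 0
    for value in values:
        total += value
    return total

def builtin_sum(values):
    """Delegate the whole loop to the builtin sum()."""
    return sum(values)

data = list(range(100000))

# Both versions must agree before timing them is meaningful.
assert manual_sum(data) == builtin_sum(data)

manual_time = timeit.timeit(lambda: manual_sum(data), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(data), number=20)
print("manual loop: %.4f s, builtin sum: %.4f s" % (manual_time, builtin_time))
```

Measuring first like this also tells you whether porting anything to C is worth it at all.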

py2c (http://code.google.com/p/py2c) can convert Python code to C/C++. I am the solo developer of py2c.

Nuitka is a Python to C++ compiler that links against libpython. It appears to be a relatively new project. The author claims a speed improvement over CPython on the pystone benchmark.

PyPy is a project to reimplement Python in Python, using compilation to native code as one of the implementation strategies (others being a VM with JIT, using JVM, etc.). Their compiled C versions run slower than CPython on average but much faster for some programs.

Shedskin is an experimental Python-to-C++ compiler.

Pyrex is a language specially designed for writing Python extension modules. It's designed to bridge the gap between the nice, high-level, easy-to-use world of Python and the messy, low-level world of C.

Pyrex is a Python-like language that compiles to C, written by the person who first built list comprehensions for Python. It was mainly developed for building wrappers but can be used in a more general context. Cython is a more actively maintained fork of Pyrex.

This might seem reasonable at first glance; however, there are a lot of ordinary things in Python that aren't directly mappable to a C representation without carrying over a lot of the Python runtime support. For example, duck typing comes to mind. Many functions in Python that read input can take a file or file-like object, as long as it supports certain operations, e.g. read() or readline(). If you think about what it would take to map this type of support to C, you begin to imagine exactly the sorts of things that the Python runtime system already does.
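A small sketch of the duck typing described above. The names count_lines and TwoLineSource are invented for this example. The point is that count_lines never declares what type it receives, so a compiler cannot know ahead of time which readline() will be called.

```python
import io

def count_lines(source):
    """Count lines in anything that provides a readline() method."""
    count = 0
    while True:
        line = source.readline()
        if not line:  # readline() returns an empty string at end of input
            break
        count += 1
    return count

class TwoLineSource(object):
    """Not a file at all, but it has readline(), which is all count_lines needs."""
    def __init__(self):
        self._lines = ["first\n", "second\n"]

    def readline(self):
        return self._lines.pop(0) if self._lines else ""

in_memory = io.StringIO(u"a\nb\nc\n")
print(count_lines(in_memory))        # 3
print(count_lines(TwoLineSource()))  # 2
```

A C translation would have to look up readline on the object at run time for every call, which is the kind of work the Python runtime already does.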

There are utilities such as py2exe that will bundle a Python program and runtime into a single executable (as far as possible).

Some extra references:

Jython has a compiler targeting JVM bytecode. The bytecode is fully dynamic, just like the Python language itself! Very cool. (Yes, as Greg Hewgill's answer alludes, the bytecode does use the Jython runtime, and so the Jython jar file must be distributed with your app.)

Psyco is a kind of just-in-time (JIT) compiler: a dynamic compiler for Python that runs code 2-100 times faster, but it needs a lot of memory.

In short: it runs your existing Python software much faster, with no change to your source, but it doesn't compile to object code the same way a C compiler would.

The answer is "Yes, it is possible". You could take Python code and attempt to compile it into the equivalent C code using the CPython API. In fact, there used to be a Python2C project that did just that, but I haven't heard about it in many years (back in the Python 1.5 days is when I last saw it.)

You could attempt to translate the Python code into native C as much as possible, and fall back to the CPython API when you need actual Python features. I've been toying with that idea myself the last month or two. It is, however, an awful lot of work, and an enormous number of Python features are very hard to translate into C: nested functions, generators, anything but simple classes with simple methods, anything involving modifying module globals from outside the module, etc.
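To show why two of the features listed above are awkward to express in plain C, here is a minimal sketch with a nested function and a generator. The names make_counter and countdown are made up for the illustration.

```python
def make_counter(start):
    """Nested function: 'count' lives on after make_counter has returned."""
    count = [start]  # a one-element list so the inner function can update it

    def increment():
        count[0] += 1
        return count[0]

    return increment

def countdown(n):
    """Generator: execution pauses at each yield and resumes on the next request."""
    while n > 0:
        yield n
        n -= 1

counter = make_counter(10)
print(counter())           # 11
print(counter())           # 12
print(list(countdown(3)))  # [3, 2, 1]
```

In both cases the function's local state has to outlive a single C stack frame, so a translator would need heap-allocated state objects, which is roughly what the CPython runtime provides.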

This doesn't compile Python to machine code, but it does let you create a shared library that calls Python code.

If what you are looking for is an easy way to run Python code from C without relying on execp stuff, you could generate a shared library from Python code wrapped with a few calls to the Python embedding API. The application is then a shared library, an .so that you can use in many other libraries/applications.

Here is a simple example which creates a shared library that you can link with a C program. The shared library executes Python code.

The python file that will be executed is pythoncalledfromc.py:

# -*- encoding:utf-8 -*-
# this file must be named "pythoncalledfromc.py"

def main(string):  # the argument must be a string
    print "python is called from c"
    print "string sent by «c» code is:"
    print string
    print "end of «c» code input"
    return 0xc0c4  # return something

You can try it with python2 -c "import pythoncalledfromc; pythoncalledfromc.main('HELLO')". It will output:

python is called from c
string sent by «c» code is:
HELLO
end of «c» code input

The shared library's interface is declared in callpython.h:

#ifndef CALL_PYTHON
#define CALL_PYTHON

void callpython_init(void);
int callpython(char ** arguments);
void callpython_finalize(void);

#endif

The associated callpython.c is:

// gcc `python2.7-config --ldflags` `python2.7-config --cflags` callpython.c -lpython2.7 -shared -fPIC -o callpython.so

/* Python.h goes first: it may define macros that affect the standard headers. */
#include <python2.7/Python.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include "callpython.h"

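/* Template for the statement to run; "%s" is replaced by the caller's string. */
#define PYTHON_EXEC_STRING "import pythoncalledfromc; pythoncalledfromc.main(\"%s\")"


void callpython_init(void) {
  Py_Initialize();
}

int callpython(char ** arguments) {
  /* sizeof counts the template's "%s" and its terminating NUL, so the buffer
     is always large enough for the formatted string. */
  size_t script_size = strlen(*arguments) + sizeof(PYTHON_EXEC_STRING);
  char * python_script_to_execute = malloc(script_size);
  PyObject *main_module, *locals;
  PyObject * result = NULL;

  if (python_script_to_execute == NULL)
    return -1;

  main_module = PyImport_AddModule("__main__");  /* borrowed reference */
  if (main_module == NULL) {
    free(python_script_to_execute);
    return -1;
  }

  locals = PyModule_GetDict(main_module);  /* borrowed reference */

  /* The argument is pasted into Python source unescaped, so it must not
     contain double quotes, backslashes or newlines. */
  snprintf(python_script_to_execute, script_size, PYTHON_EXEC_STRING, *arguments);
  result = PyRun_String(python_script_to_execute, Py_file_input, locals, locals);
  free(python_script_to_execute);
  if (result == NULL) {
    PyErr_Print();  /* report the Python exception on stderr */
    return -1;
  }
  Py_DECREF(result);
  return 0;
}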

void callpython_finalize(void) {
  Py_Finalize();
}

You can compile it with the following command:

gcc `python2.7-config --ldflags` `python2.7-config --cflags` callpython.c -lpython2.7 -shared -fPIC -o callpython.so

Create a file named callpythonfromc.c that contains the following:

#include "callpython.h"

int main(void) {
  char * example = "HELLO";
  callpython_init();
  callpython(&example);
  callpython_finalize();
  return 0;
}

Compile it and run:

gcc callpythonfromc.c callpython.so -o callpythonfromc
PYTHONPATH=`pwd` LD_LIBRARY_PATH=`pwd` ./callpythonfromc

This is a very basic example. It works, but depending on the library it can still be difficult to serialize C data structures to Python and from Python to C. Things can be automated somewhat...
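As one possible way to handle the serialization problem, here is a minimal sketch using the standard struct module to move a fixed-layout record between Python and raw bytes. The struct reading layout and the function names are hypothetical, chosen only for this illustration.

```python
import struct

# Hypothetical C layout being mirrored:
#   struct reading { int32_t sensor_id; double value; };
# "<" selects little-endian with no padding, "i" is a 4-byte int, "d" an 8-byte double.
READING_FORMAT = "<id"

def pack_reading(sensor_id, value):
    """Turn Python values into the byte layout described above."""
    return struct.pack(READING_FORMAT, sensor_id, value)

def unpack_reading(raw):
    """Turn those bytes back into a (sensor_id, value) tuple."""
    return struct.unpack(READING_FORMAT, raw)

raw = pack_reading(7, 21.5)
print(len(raw))             # 12
print(unpack_reading(raw))  # (7, 21.5)
```

Note that a C compiler would normally insert padding into such a struct, so the C side would need a packed struct, or the format string would have to be adjusted to match the compiler's layout.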

Nuitka might be helpful.

There is also Numba, but neither of them aims to do exactly what you want. Generating a C header from Python code is possible, but only if you specify how to convert the Python types to C types, or if that information can be inferred. See astroid for a Python AST analyzer.
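Here is a rough sketch of the header-generation idea, under some loud assumptions: it uses the standard library ast module rather than astroid, it relies on Python 3 type annotations as the source of type information, and the C_TYPES mapping is entirely made up for this example. Functions whose types cannot be determined are skipped.

```python
import ast

# Hypothetical mapping from Python annotation names to C types.
C_TYPES = {"int": "long", "float": "double", "str": "const char *", "bool": "int"}

def c_prototypes(source):
    """Return C prototypes for top-level functions fully annotated with known types."""
    prototypes = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        annotations = [arg.annotation for arg in node.args.args] + [node.returns]
        names = [a.id if isinstance(a, ast.Name) else None for a in annotations]
        if any(name not in C_TYPES for name in names):
            continue  # the types cannot be determined, so no prototype is emitted
        params = ", ".join(
            "%s %s" % (C_TYPES[name], arg.arg)
            for name, arg in zip(names, node.args.args)
        )
        prototypes.append("%s %s(%s);" % (C_TYPES[names[-1]], node.name, params or "void"))
    return prototypes

EXAMPLE = '''
def area(width: float, height: float) -> float:
    return width * height

def untyped(x):
    return x
'''

print("\n".join(c_prototypes(EXAMPLE)))  # double area(double width, double height);
```

This only produces declarations. Something still has to implement them, for example a wrapper built on the embedding API like the one shown earlier.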

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow