Question

I'm thinking of doing some bytecode manipulation (think genetic programming) in Python.

I came across a test case in the crashers section of the Python source tree that states:

Broken bytecode objects can easily crash the interpreter. This is not going to be fixed.

Hence the question: how can I validate tweaked bytecode to make sure it will not crash the interpreter? Is that even possible?

Test source, after http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

cc = (lambda fc=(
    lambda n: [
        c for c in
            ().__class__.__bases__[0].__subclasses__()
            if c.__name__ == n
        ][0]
    ):
    fc("function")(
        fc("code")(
            0, 0, 0, 0, "KABOOM", (), (), (), "", "", 0, ""
        ), {}
    )()
)

Here, the module defines cc; calling mymod.cc() crashes the interpreter. Granted, this is a very tricky example: it creates a new code object with the custom bytecode "KABOOM" and then runs it.

I'd accept something that verifies predefined bytecode, e.g. from a .pyc file.


Solution 2

Both are outdated, and the first one comes without code (at least I can't find any), but they may be useful to give an idea of what can be done, how, and what the limitations are.

perfectly valid bytecode can still do horrible things

OTHER TIPS

A bytecode assembler does stack tracking across jumps, globally verifies that its stack-level predictions are consistent, and automatically rejects attempts to generate dead code. With such a tool it is virtually impossible to accidentally generate bytecode that can crash the interpreter.
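
To illustrate what "stack tracking across jumps" means, here is a minimal sketch of such a check. It works on a toy instruction set invented for this example (the names PUSH, POP, ADD, JUMP, JUMP_IF and RETURN and the EFFECT table are assumptions, not CPython opcodes), so it shows the technique rather than a verifier for real Python bytecode.

```python
# Toy instruction set (hypothetical, NOT CPython opcodes).
# Each instruction is a (name, arg) tuple; EFFECT maps name -> (pops, pushes).
EFFECT = {
    "PUSH": (0, 1),
    "POP": (1, 0),
    "ADD": (2, 1),
    "JUMP": (0, 0),      # unconditional jump to instruction index arg
    "JUMP_IF": (1, 0),   # pops a condition, may jump to instruction index arg
    "RETURN": (1, 0),    # pops the result and stops
}


def verify(program):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    depth_at = {}         # instruction index -> stack depth on entry
    worklist = [(0, 0)]   # (instruction index, stack depth on entry)
    while worklist:
        index, depth = worklist.pop()
        if not 0 <= index < len(program):
            problems.append(f"control flow leaves the program at index {index}")
            continue
        if index in depth_at:
            # Already visited: every path must arrive with the same depth.
            if depth_at[index] != depth:
                problems.append(f"inconsistent stack depth at index {index}")
            continue
        depth_at[index] = depth
        name, arg = program[index]
        if name not in EFFECT:
            problems.append(f"unknown instruction {name!r} at index {index}")
            continue
        pops, pushes = EFFECT[name]
        if depth < pops:
            problems.append(f"stack underflow at index {index}")
            continue
        depth = depth - pops + pushes
        if name == "RETURN":
            continue
        if name in ("JUMP", "JUMP_IF"):
            worklist.append((arg, depth))        # the jump target
        if name != "JUMP":
            worklist.append((index + 1, depth))  # the fall-through path
    # Anything never reached from index 0 is dead code.
    for index in range(len(program)):
        if index not in depth_at:
            problems.append(f"dead code at index {index}")
    return problems


good = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("RETURN", None)]
underflow = [("ADD", None), ("RETURN", None)]
mismatch = [("PUSH", 1), ("JUMP_IF", 3), ("PUSH", 1), ("PUSH", 2), ("RETURN", None)]
dead = [("PUSH", 1), ("RETURN", None), ("POP", None)]
```

Here verify(good) returns an empty list, while the other three programs are rejected for stack underflow, for reaching index 3 with two different stack depths, and for unreachable code respectively. A check like this only shows that the stack discipline is consistent; it says nothing about whether the program does something harmful.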

This link might help you.

Python might not be an ideal language for such tasks, for the reasons stated in the question.

One approach: Don't create or accept raw bytecode, accept only Python source code and compile it yourself.
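
A minimal sketch of that approach, using only the built-in compile() and exec(). The helper name compile_untrusted_source and the "<candidate>" filename are made up for illustration. Because the bytecode comes from the compiler, it is well formed; that does not make the code safe to run, since valid code can still do anything Python allows.

```python
def compile_untrusted_source(source, filename="<candidate>"):
    """Compile source text to a code object, or return None if it is not valid Python."""
    try:
        # The compiler produces the bytecode, so it is never hand-crafted.
        return compile(source, filename, "exec")
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed source; ValueError covers inputs such as
        # source containing null bytes on some Python versions.
        return None


code = compile_untrusted_source("result = sum(range(5))")
namespace = {}
if code is not None:
    exec(code, namespace)   # runs well-formed bytecode; this is NOT a sandbox

rejected = compile_untrusted_source("def (:")
```

After this runs, namespace["result"] holds 10 and rejected is None.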

Furthermore, there are libraries (such as RestrictedPython) that manipulate Python at the AST level to provide some security guarantees, e.g. to prevent sandbox escapes.
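
To give a flavour of what an AST-level check looks like, here is a small sketch built on the standard ast module. It is not RestrictedPython's API or implementation, and the function name find_dunder_access is invented for this example. It only flags double-underscore names and attributes, such as the ones the exploit in the question relies on, and is nowhere near a complete sandbox.

```python
import ast


def find_dunder_access(source):
    """Return (line, name) pairs for each double-underscore attribute or name in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            hits.append((node.lineno, node.attr))
        elif isinstance(node, ast.Name) and node.id.startswith("__"):
            hits.append((node.lineno, node.id))
    return hits


harmless = find_dunder_access("x = 1 + 2")
suspicious = find_dunder_access("().__class__.__bases__[0].__subclasses__()")
```

The first call finds nothing, while the second reports __class__, __bases__ and __subclasses__, so a caller could refuse to compile that source.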

Licensed under: CC-BY-SA with attribution