Question

Suppose you have to write a program that will test all programs in search of one that completes a specific task. For example, consider this JavaScript function:

function find_truth(){
    for(n=0;;++n){
        try {
            var fn = Function(string(n));
            if (fn() == 42)
                return fn;
        } catch() {
            continue;
        }
    }
}

As long as string(n) returns the nth string possible ("a", "b", "c", ... "aa", "ab" ...), this program would eventually output a function that evaluated to 42. The problem with this method is that it is enumerating strings that could or couldn't be a valid program. My question is: is it possible to enumerate programs themselves? How?

Was it helpful?

Solution

Yes, there are ways that this is possible. A few months ago I made a little experimental project to do something like it, which works reasonably well considering what it's doing. You give it a type and some tests to pass, and it searches for a suitable function that passes the tests. I never put in the work to make it mature, but you can see that it does in fact eventually generate functions that pass the tests in the case of the examples. Requiring the type was an essential component for the practicality of this search -- it drastically reduced the number of programs that needed to be tried.

It is important to have a firm grasp of the theory here before making assertions about what is and is not possible -- there are many who will jump to the halting problem and tell you that it's impossible, when the truth is quite a bit more nuanced than that.

  • You can easily generate all syntactically valid programs in a given language. Naively, you can generate all strings and filter out the ones that parse/typecheck; but there are better ways.
  • If the language is not turing complete -- e.g. the simply-typed lambda calculus, or even something very powerful like Agda, you can enumerate all programs that generate 42, and given any program you can decide "this generates 42" or "this does not generate 42". (The language I used in my experimental project is in this class). The tension here is that in any such language, there will be some computable functions which you cannot write.
  • Even if the language is turing complete, you can still enumerate all programs which eventually generate 42 (do this by running them all in parallel, and reporting success as they finish).

What you cannot do to a turing complete language is decide that a program will definitely not ever generate 42 -- it could run forever trying, and you won't be able to tell whether it will eventually succeed until it does. There are some programs which you can tell will infinite loop, though, just not all of them. So if you have a table of programs, you might expect your enumerator program in the case of a non-turing complete language to be like this:

Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6  |  P7  | ...
42?     | No   | No   | No   | Yes  | No   | No   | No   | ...

Whereas in a turing complete language, your output (at a given time) would be like this:

Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6    |  P7  | ...
42?     | No   | No   | Loop | Yes  | No   | Dunno  | No   | ...

And later that Dunno might turn into a Yes or a No, or it might stay dunno forever -- you just dunno.

This is all very theoretical, and actually generating programs in a turing complete language to search for ones that do a certain thing is pretty hard and takes a long time. Certainly you don't want to do it one by one, because the space is so big; you probably want to use a genetic search algorithm or something.

Another subtle point in the theory: talking about programs which "generate 42" is missing some important detail. Usually when we talk about generating programs, we want it to generate a certain function, not just output something specific. And this is when things you might want to do get more impossible. If you have an infinite space of possible inputs -- say, inputting a single number, then

  • You can't in general tell whether a program computes a given function. You can't check every input manually because infinity is too many to check. You can't search for proofs that your function does the right thing, because for any computable function f, for any axiom system A, there are programs that compute f such that A has no proof that they compute f.
  • You can't tell whether a program is going to give any kind of answer for every possible input. It might work for the first 400,000,000 of them but infinite loop on the 400,000,001st.
  • Similarly, you can't tell whether two programs compute the same function (i.e. same inputs -> same outputs).

There you have it, a daily dose of decidability theory phenomenology.

If you are interested in doing it for real, look into genetic algorithms, and put timeouts on the functions you try and/or use a decidable (non-turing-complete) language.

OTHER TIPS

It is certainly possible to enumerate all programs in a given language that are syntactically valid (and in a statically typed language even only those that type check): You could simply enumerate all strings like you proposed, try to feed each one into a parser for the language and then reject those that can't be parsed. So if that's your definition of a valid program, then yes, it's possible.

It is however not true that your program would necessarily output a function that returns 42 eventually - even if you replace string with a function that only returns syntactically valid programs. If a returned function contains an infinite loop, it will run forever and thus your program will never get to try another function. Thus you might never reach a function that returns 42.

To avoid this issue, you might say that the string(n) function should only produce code that is syntactically valid and does not contain an infinite loop (while still being able to return all such functions). That, however, is not possible because that would entail deciding the Halting Problem (which is, of course, undecidable).

As has been noted, you can trivially turn a "generate all strings" program into a "generate all valid programs in language X" by plugging in a parser/compiler for language X. Generally if you can write a program which takes text as input and returns true/false indicating whether the text represents a valid program, then you can use it as a filter on the "generate all strings" program.

You could also design a programming language in which every string of characters is a valid program (perl springs to mind).

Probably more interesting is that given a formal grammar for a language, you could use that to generate programs in the language instead of parsing them. You just need to do a breadth-first traversal of the grammar to be sure that every finite-length program will be reached in some finite time (if you do a depth-first traversal you'll get struck exploring all programs consisting solely of a variable whose name is an ever-longer sequence of 'A's, or something).

Most grammars actually used in parsing programming languages wouldn't work directly for this though; they're usually slightly over-permissive. For example, a grammar may define identifiers as anything matching a regex [_A-Za-z][0-9_A-Za-z]*, which allows variable names of unbounded length, but many language implementations will actually choke on programs with variable names that are gigabytes long. But you could in principle find out about all of these kind of gotchas and write an enumerable grammar that exactly covers all of the valid programs in some language of interest.


So that lets you enumerate all programs. That's not actually sufficient to let you run your find_truth program and find a program that returns 42 though, because it'll get stuck infinitely evaluating the first program that happens to contain an infinite loop.

But it's still actually possible to do this! You just need to pick an order in which to examine all the possibilities so that everything is eventually reached in some finite time. You've got two infinite "dimensions" to search in; the enumeration of all the programs, and the depth of evaluation of each program. You can make sure you cover all bases by doing something like the following strategy:

  1. Run all programs of length up to 1 for up to 1 step
  2. Run all programs of length up to 2 for up to 2 steps
  3. Run all programs of length up to 3 for up to 3 steps
  4. ...

And so on. This guarantees that whatever the length of the program and number of "steps" needed, you'll eventually hit them without needing to have done an infinite amount of work "first" (so long as a program with your desired result actually exists).

If you've got unbounded storage available you can avoid repeating work between these phases (you store all programs of length N that haven't finished in N steps, along with their state, and then in the next round you run the new programs up to N+1 steps, and run all the "pending" programs for one more step each). The definition of "step" doesn't matter much, so long as it doesn't admit infinite loops. Some finite number of bytecodes, or CPU instructions, or even seconds; you'd need a suspendable implementation of the language, of course.

Assuming I am correctly interpreting your phrase "is it possible to enumerate programs themselves?" Then Yes you can enumerate programs. This is essentially a Genetic Programming problem. See :

http://en.wikipedia.org/wiki/Genetic_programming

Here Programs themselves are created, run, and evaluated (with a resulting fitness value, where the optimum value = 42), just as with Genetic Algorithms, so yes this would provide your enumeration. Furthermore over several generations you could have it evolve your program to produce 42.

It is impossible. The problem is that some programs may take a long time to finish computing. And some programs may be stuck in an infinite loop. Ideally you would like to abort running those programs that are stuck in infinite loops, and keep only the long running programs. You could implement a timer, but what if you had a program that ran longer than the timer, but still would return the correct answer?

In general, the only way to see if a computer program will terminate is to run it and see, with the risk of entering an infinite loop. Of course, you could implement some heuristics to recognize common forms of infinite loops, but in general it is impossible. This is known as the halting problem.

EDIT:

I realize that I only partially answer your question. You ask if it is possible to enumerate programs themselves. This is certainly possible. You already have a way of generating all possible strings in order. Then you could just see which strings parse correctly as a javascript program, and just keep those.

I would suggest starting from the grammar of javascript, for example of ANTLR.

https://github.com/antlr/grammars-v4/blob/master/javascript/javascript/JavaScriptParser.g4

The grammar defines a directed graph similar to this one:

grammar Exp;

/* This is the entry point of our parser. */
eval
    :    additionExp
    ;

/* Addition and subtraction have the lowest precedence. */
additionExp
    :    multiplyExp 
         ( '+' multiplyExp 
         | '-' multiplyExp
         )* 
    ;

/* Multiplication and division have a higher precedence. */
multiplyExp
    :    atomExp
         ( '*' atomExp 
         | '/' atomExp
         )* 
    ;

Using this structure you can create a program that creates all gramatically valid programs of different depths 1, 2, 3, 4, ... and runs these in an emulator. These would be syntactically valid programs although many would return errors (think division by zero or accessing an undefined variable). Also some you would not be able to prove if they finish or not. But generating as many gramatically correct programs is possible with a properly defined grammar like the one provided by ANTLR.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top