Question

(The following question concerns the OCaml language and has examples in OCaml, but the question is very general and probably a correct answer for any other computer language will solve my problem too. So, just assume this question in your favourite language.)

I want to write a function which takes an arbitrary program in OCaml as a string and decides if the program is correct or incorrect and, in the latter case, whether or not I can make it into a correct one by concatenating appropriate characters at the end.

I'm assuming there is a compiler of the language somewhere and that I can apply it and get a reply saying either "Compiles" or "Doesn't compile -- error at line X, character Y" (as is the case with most languages anyway). In summary, I'd like to have a function which takes a program and returns:

  • Correct -- if the string contains a correct program;
  • Erroneous -- if the string contains an incorrect program which, no matter how you concatenate characters to it, will never become Correct;
  • Incomplete -- if the string contains an incorrect program which is not Erroneous.

For example, the OCaml program let x = f is incorrect because f has not been defined by the time it is used. And it can't be continued, because whatever you write after f will always be some identifier which hasn't been defined before. The program let x = is also incorrect; but if we extend to let x = 5 then we have a perfectly valid program. So, my function should return Erroneous in the first case and Incomplete in the second.

Things may become tricky if we have the program

let ans = 5
let x = a

because my function would have to see that if I continue the program with ns then the program becomes correct.

My question is: do you think it is possible to write such function/algorithm? If so, what's the general idea? If not, try to convince me that it's not.

(I'll be happy with any insights or partial answers, e.g. something which implies Incomplete. For example, I believe that if the language compiler says there is an error at line 3 and the program has 100 lines, then there is no possible continuation of the program.)

Was it helpful?

Solution

In your first example, let x = f, what if I add un y -> y ?

I think what you want is possible, but no with the current tools. If you were only interested in syntactic correctness, the basic idea would be to run the parser/lexer, return "erroneous" if it raises an error, and "incomplete" if it hasn't returned a complete AST, but no error (so it is still waiting for more input).

Note : there is still a small mismatch as the lexer will return a token just before EOF, which could have been continued. You would need not to consider that token as a complete token and do some finer reasoning at that point. More generally, the extremum of your input will require specialized reasoning that I don't cover here.

The properties that makes it easy at the lexing/parsing stage is that the lexer is demand-driven by the parser -- it only reads as far as necessary for the parser to reason about the token stream -- and the parser is "strict", or fails early, instead of asking for more information at a failure site.

The latter major phase of a program correctness are the identifier resolution (what does this variable name refer to ?) and the type system -- there are other criterions, such as checking the arity of constructors and type names, but they're not very interesting wrt. you problem. Those are generally not written in a demand-driven style, or do not more generally attempt to reason about partial information. It should be possible to redesign them in this way, but this would probably require a lot of effort.

A good direction to go would be "incremental parsers". A lot of people have recognized the need for incrementality in program tools related to an editor (where programs are incrementally written). They tackle the more general problem of updating the abstract information after a concrete change in the source code; not only a change "added at the end", but more general kind of changes. Their tools could probably solve your problem too.

Edit: Ah, I finally found what I was looking for. You should have a look at Oleg's differentiating parsers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top