How to prove a function returns a value

https://softwareengineering.stackexchange.com/questions/371333

06-02-2021
|

Question

Say I have the following 2 JavaScript functions for demonstration purposes:

function colorize(integer) {
  if (integer === 1) return 'red'
  if (integer === 2) return 'blue'
  if (integer === 3) return 'yellow'
  // ...
  return 'green'
}

function max(a, b) {
  return Math.max(a, b)
}

The specifications are as follows:

colorize: The function returns only named CSS colors. It doesn't return hex values or anything.
max: Returns the max value given two integers. The integers have to be within the valid size of integers that JavaScript allows.

Now I would like to prove these properties. To prove that the specification matches the implementation.

First thing I consider is, will these functions ever throw an error, assuming they have proper inputs. Technically it could. The hardware could cause the browser to have a glitch perhaps, which causes the app to crash right at the moment it is in one of the functions. Or maybe it runs out of memory the moment it enters these functions. So theoretically it's possible to have an error in these functions, even though practically there would never be an error (given proper inputs).

The next thing is, how to go about proving the implementation matches the specification. Let's start with colorize. It needs to be shown that it returns only valid css named colors. Then I start thinking, well should it also prove that it doesn't return anything else. That is, test for some random out-of-bounds values like 'foo' or 10. This makes me start to get confused as to where the line is between exhaustive testing vs. proving. I can only imagine how to test that the implementation matches the specification. I don't see how to prove it matches it.

For max it is the same thing.

This leads to the main question. How to "prove" colorize and max match their specifications (as opposed to "test"). At a high level, what is necessary to write something like a unit test but instead a proof. Maybe there needs to be some changes to implementation somewhere. It's easy to write unit tests for this, I would just have a list of CSS colors and check each one against some integers. But I would like to see how to do proofs for this sort of example. For max I might check the "boundary" values (0, max integer size positive/negative, and a few medium values), and some edge cases like when they were equal. But these approaches aren't formal/proof-driven, they are just based on finding cases we can imagine errors occurring in and writing unit tests for them.

Solution

What you're trying to do is called Formal Verification, and is generally impossible in JS. If, however, you restrict yourself to a subset of the language, it becomes tenable. The usefulness of such an analysis is debatable.

It is certainly used for other languages in areas where safety and correctness is of utmost concern, and development speed can take a back seat (read: medical and space industries foremost). This is a developing field and the methods used are constantly being refined. Some of them are mentioned in the Wiki link.

Since you asked for an example even if this means restricting yourself to a subset, I'll give you a very simple one. You could expand it to use more lax definitions and add more language features to get to a more usable one, but that takes more time than I am ready to commit here. Do note, that I'm not actively working in this field and writing from memory of several courses in university that I took some 6-8 years ago.

Say you restrict your language to only have 1 datatype - 32 bit 2-complement int and only have these operations on it: +, -, *, /. You have local variables (initialized to 0, unless otherwise specified), that you can assign to via =. You also have an if statement that can have <, >, = in it, that is followed by an assignment or return, and return to return an int. You do not have any objects, threads, globals or whatevers. For the sake of simplicity, you also do not have method calls (otherwise you'll have to describe how that's done as well). Syntax is vaguely C-like.

Before you run the analyzer, I'll assume that the program actually compiles - no sense doing this otherwise.

Now, you have the method:

0    int max(int a, int b) {
1        if (a > b) return a;
2        if (b > a) return b;
3        return a;
4    }

And you want to check that it returns the maximum of the two ints. So, the post conditions you want to check are (these are usually written in a different language, since this is the input to the checker, not compiler):

res >= a
res >= b
function always returns

Now, the last condition is kinda silly. Since it compiles, the compiler already checked that every method termination ends with a return. And, since there are no loop, jump or call semantics in the language (except return), every operation advances control. But in a slightly more complex language this would be an important and difficult thing to check.

So, if you go the route of Model Checking, you just build a state machine that represents every single state that the program can be in. That's 2^32 * 2^32 * 4 (2 int vars and 4 lines to be on). Obviously that's too many states to hold, so we abstract them by the values the vars can take at the time.

You then start from the end and gather the restrictions on your inputs that got you there.

For ep3 - End Point 3, you return a. So you set res = a. You then step back once, finding all call sites to this return (in the case of this language it's always just 1 step back).

You find that you can only get to this return if the previous if was false. Hence, now you have a restriction - NOT b > a. You then step back another time and get the restriction NOT a > b. You've now reached the beginning of your method and you have a result: a and restrictions on input: NOT b > a AND NOT a > b.

Then you use formal logic rules defined on int that describe all various restrictions that you can have on variables to show that the restriction you have implies the conditions you want.

Rinse and repeat for the other 2 exit points.

Now you've proved that if the method returns, it returns the max value. The 3rd condition that I've skipped proved that it always returns. Hence the program is correct in the semantics given.

A less silly example would be:

0    int abs(int a) {
1        if (a < 0) return -1 * a;
2        return a;
3    }

With the post-condition of:

res >= 0
res == a OR res == -a

ep2 is very simple - you get res == a AND NOT a < 0, which clearly satisfies our conditions.

ep1 is tricky, since there are several operations there. The final operation is a * -1. This is typically done via a dedicated circuit, but we are essentially interested in everything that it can do to a number expressed in logical rules. For summation this would be something like:

a, b -> (res = a + b AND a + b <= MAX_INT AND a + b >= MIN_INT) 
    OR (res = a + b - INT_RANGE AND a + b >= MAX_INT) 
    OR (res = a + b + INT_RANGE AND a + b <= MIN_INT)

I'll note again, that since the logic specification is written in a different language, it uses a datatype that is guaranteed to not overflow on these ops.

You can imagine the same thing done for multiplication, explaining the exact pre- and post-conditions of multiplication for overflow.

So now we return to the method at hand and get the following result of the post-conditions:

a < 0 AND (
    (res = -a AND -a <= MAX_INT) //No overflow
    OR (res = -a mod INT_RANGE AND -a > MAX_INT) //positive overflow
    OR (res = -a mod INT_RANGE AND -a < MIN_INT) //negative overflow
    )

Since this is written in a language with guaranteed no overflows, you can use rules like a < 0 => -a >= 0. Do exactly that and you get:

(res = -a AND -a >= 0 AND -a <= MAX_INT)
OR (res = -a mod INT_RANGE AND -a >= 0 AND -a > MAX_INT)
OR (res = -a mod INT_RANGE AND -a >= 0 AND -a < MIN_INT)

You then eliminate the last expression - it's impossible. You also eliminate the first expression - the post-conditions obviously follow from it.

You're left with the possibility of res = -a mod INT_RANGE AND -a > MAX_INT. The last part of this morphs into a < -MAX_INT.

The analyzer will then, most likely, say - it can't prove correctness when input is below -MAX_INT. This is the general case of analyzers - most of the time they will not tell you there's a mistake, but rather say they can't prove there isn't one.

You can see how this is very tedious, and adding even simple OPs is difficult - you have to completely describe its operation and every possible outcome in your logic of choice. Adding language features will usually give an exponential explosion in complexity and allowing a fully Turing-complete language will make the general case impossible.

OTHER TIPS

You don't.

You're testing on a real world environment, and you're trying to prove underlying assumptions you have no control on, namely hardware and javascript implementation.

Let's say for absurd that some sort of proof exists that your code never fails on every single modern piece of hardware and js engine. I could just write another engine where every call to Math.max or every === comparison horribly crashes everything, and your "proof" would no longer be valid.

That's why developers only write tests for their own code, and assume the underlying framework/engine/hardware is flawless. If they find any bug with major engine/hardware implementations during later testing stages, they'll probably add a test for that specific case and call it a day. Testing someone else's code at a way lower level is simply not feasible in a real world scenario.

""Proof is for mathematical theorems and alcoholic beverages. It's not for science."" - Michael Mann (I think).

Proof is also not for computer programs either. The nearest you can get to such a "proof" is to follow the lead of science here and statistically demonstrate a strong correlation between your code and the specified output.

One such way of doing this is via tools such as QuickCheck for testing Haskell programes. They work on the basis of generating large amounts of random test data based on a formal specification of the function. If the test passes for all test cases, it gives degrees of confidence that the properties of the code meet the spec.

There appears to be a QuickCheck-inspired JavaScript tool, JSVerify, so this might be worth you checking out.

For the general case, in a Turing Complete language (as JavaScript), it has been proved impossible to write an algorithm that can prove, for every possible function, that it returns (and that's considering only the code, without the hardware malfunction you've mentioned).

Pragmatically, you can attain do some properties of your function and make an algorithm that proves that those functions return under certain criteria, for exampe:

Restrict the conditions to which your proof applies (e.g.: The computer works and the input is positive);
Separate the valid input range into smaller valid input subranges;
Prove that each valid input subrange follows a code path that leads to a result.

With this method, both functions that you provide can be proved to return.

It is quite normal to make hard assumptions. Most programs suppose, for example, that memory is infinite during production, and then, after profile, publish something like "this program requires at least 2Gb RAM"

Also, when you say

(...) To prove that the specification matches the implementation (...)

And then

(...)test for some random out-of-bounds values (...)

It seens to me that you're contradicting yourself, because some random out-of-bounds value is not in the specification.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange