How to find the fixpoint of a loop and why do we need this? [closed]

Question

Conceptually, the fixpoint corresponds to the most information you obtain about the loop by repeatedly iterating it on some set of abstract values. I'm going to guess that by "static analysis" you're referring here to "data flow analysis" or the version of "abstract interpretation" that most closely follows data flow analysis: a simulation of program execution using abstractions of the possible program states at each point. (Model checking follows a dual intuition in that you're simulating program states using an abstraction of possible execution paths. Both are approximations of concrete program behavior. )

Given some knowledge about a program point, this "simulation" corresponds to the effect that we know a particular program construct must have on what we know. For example, at some point in a program, we may know that x could (a) be uninitialized, or else have its value from statements (b) x = 0 or (c) x = f(5), but after (d) x = 42, its value can only have come from (d). On the other hand, if we have

if ( foo() ) {
    x = 42;  // (d)
    bar();
} else {
    baz();
    x = x - 1; // (e)
}

then the value of x afterwards might have come from either of (d) or (e).

Now think about what can happen with a loop:

while ( x != 0 ) {
    if ( foo() ) {
        x = 42;  // (d)
        bar();
    } else {
        baz();
        x = x - 1; // (e)
    }
}

On entry, we have possible definitions of x from {a,b,c}. One pass through the loop means that the possible definitions are instead drawn from {d,e}. But what happens if foo() fails initially so that the loop does not run at all? What are the possibilities for x then? Well, in this case, the loop body has no effect, so the definitions of x would come from {a,b,c}. But if it ran, even once, then the answer is {d,e}. So what we know about x at the end of the loop is that the loop either ran or it didn't, which means that the assignment to x could be any one or {a,b,c,d,e}: the only safe answer here is the union of the property known at loop entry ({a,b,c}) and the property know at the end of one iteration ({d,e}).

But this also means that we must associate x with {a,b,c,d,e} at the beginning of the loop body, too, since we have no way of determining whether this is the first or the four thousandth time through the loop. So we have to consider again what we can have on loop exit: the union of the loop body's effect with the property assumed to hold on entry to the last iteration. Happily, this is just {a,b,c,d,e} ∪ {d,e} = {a,b,c,d,e}. In other words, we've not obtained any additional information through this second simulation of the loop body, and thus we can stop, since no further simulated iterations will change the result.

That's the fixpoint: the abstraction of the program state that will cause simulation to produce exactly the same result.

Now as for ways to find it, there are many, though the most straightforward ("chaotic iteration") simply runs the simulation of every program point (according to some fair strategy) until the answer doesn't change. A good starting point for learning better algorithms can be found in most any compilers textbook, though it isn't usually taught in a first course. Steven Muchnick's Advanced Compiler Design and Implementation is a more thorough and very readable treatment of the subject. If you can find a copy, Matthew Hecht's Flow Analysis of Computer Programs is another classic treatment. Both books focus on the "data flow analysis" technique for static analysis. You might also try out Principles of Program Analysis, by Nielson/Nielson/Hankin, though the technical details in the book can be pretty hairy. On the other hand, it offers a more general treatment of static analysis overall.