Question

Wikipedia claims that the failure function table can be computed in O(n) time.

Let's look at its "canonical" implementation (in C++):

#include <string>
#include <vector>
using namespace std;

vector<int> prefix_function (string s) {
    int n = (int) s.length();
    vector<int> pi (n);
    for (int i=1; i<n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j])
            j = pi[j-1];            // fall back to the next-shorter border
        if (s[i] == s[j])  ++j;     // extend the border by one character
        pi[i] = j;
    }
    return pi;
}

Why does it work in O(n) time, even though there is an inner while-loop? I'm not really strong at the analysis of algorithms, so could somebody explain it?


Solution

This line: if (s[i] == s[j]) ++j; is executed at most O(n) times. It is the only place where the value of j (and hence of pi[i]) ever increases. Note that j starts out at pi[i-1], i.e. pi[i] starts at the same value as pi[i-1].

Now this line: j = pi[j-1]; decreases j by at least one. And since j was increased at most O(n) times in total (the value of j is carried over between iterations through pi[i-1], so we count increases and decreases across the whole run), it cannot be decreased more than O(n) times. So this line is also executed at most O(n) times.

Thus the whole time complexity is O(n).
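
If it helps to see that argument in action, here is a small instrumented sketch of the same function (my own addition, not part of the canonical code; the function and counter names are made up). On any input the inner-loop counter never exceeds the increment counter, which in turn never exceeds n:

#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Same algorithm as in the question, with two counters bolted on:
// "ups" counts executions of ++j, "downs" counts iterations of the inner while loop.
vector<int> prefix_function_counted (string s, long long &ups, long long &downs) {
    int n = (int) s.length();
    vector<int> pi (n);
    for (int i=1; i<n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j]) {
            j = pi[j-1];
            ++downs;
        }
        if (s[i] == s[j]) { ++j; ++ups; }
        pi[i] = j;
    }
    return pi;
}

int main() {
    long long ups = 0, downs = 0;
    string s = "aabaaabaaaabaaabaab";   // any test string works here
    prefix_function_counted(s, ups, downs);
    cout << "n = " << s.size() << ", ++j ran " << ups
         << " times, inner loop ran " << downs << " times\n";
}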

Other tips

There are already two answers here that are correct, but I often think a fully laid-out proof can make things clearer. You said you wanted an answer for a 9-year-old, but I don't think that's feasible (I think it's easy to be fooled into thinking it's true without actually having any intuition for why it's true). Maybe working through this answer will help.

First off, the outer loop clearly runs n times, because i is not modified within the loop. The only code within the loop that could run more than once is the block

while (j > 0 && s[i] != s[j])
{
    j = pi[j-1];
}

So how many times can that run? Well, notice that every time the condition is satisfied we decrease the value of j, which, at this point, is at most pi[i-1]. If j hits 0 then the while loop is done. To see why this is important, we first prove a lemma (you're a very smart 9-year-old):

pi[i] <= i

This is done by induction. pi[0] <= 0 since it's set once in the initialization of pi and never touched again. Then inductively we let 0 < k < n and assume the claim holds for all 0 <= a < k. Consider the value of pi[k]. It's set precisely once, in the line pi[i] = j. Well, how big can j be? It's initialized to pi[k-1] <= k-1 by induction. In the while block it then may be updated to pi[j-1] <= j-1 < pi[k-1]. By another mini-induction you can see that j will never increase past pi[k-1]. Hence after the while loop we still have j <= k-1. Finally it might be incremented once, so we have j <= k, and so pi[k] = j <= k (which is what we needed to prove to finish our induction).
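
If you'd rather check the lemma than follow the induction, a throwaway assertion loop does the job. This is just a sketch, and it assumes the prefix_function from the question is compiled in the same file:

#include <cassert>
#include <string>
#include <vector>
using namespace std;

vector<int> prefix_function (string s);   // the implementation quoted in the question

int main() {
    string s = "abacabadabacaba";         // an arbitrary test string
    vector<int> pi = prefix_function(s);
    for (int i = 0; i < (int) s.size(); ++i)
        assert(pi[i] <= i);               // the lemma
}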

Now, returning to the original point, we ask "how many times can we decrease the value of j?" With our lemma we can now see that every iteration of the while loop strictly decreases the value of j. In particular we have:

pi[j-1] <= j-1 < j 

So how many times can this run? At most pi[i-1] times. The astute reader might think "you've proven nothing! We have pi[i-1] <= i-1 but it's inside the while loop so it's still O(n^2)!". The slightly more astute reader notices this extra fact:

However many times we run j = pi[j-1], we also decrease the value that pi[i] ends up with, which shortens the next run of the while loop!

For example, let's say j = pi[i-1] = 10. But after ~6 iterations of the while loop we have j = 3 and let's say it gets incremented by 1 in the s[i] == s[j] line so j = 4 = pi[i]. Well then at the next iteration of the outer loop we start with j = 4... so we can only execute the while at most 4 times.

The final piece of the puzzle is that ++j runs at most once per iteration of the outer loop. So it's not like we can have something like this in our pi vector:

0 1 2 3 4 5 1 6 1 7 1 8 1 9 1
            ^   ^   ^   ^   ^
Those spots might mean multiple iterations of the while loop if this could happen

To make this actually formal, you might establish the invariants described above and then use induction to show that the total number of times the while loop is run, summed with pi[i], is at most i. From that, it follows that the total number of times the while loop is run is O(n), which means that the entire outer loop has complexity:

O(n)     // from the rest of the outer loop excluding the while loop
+ O(n)   // from the while loop
=> O(n) 
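
For the skeptical reader, that invariant (while-loop iterations so far, plus pi[i], never exceeds i) can also be checked mechanically. Here is a sketch with one extra counter added to the question's code; the function name and test strings are my own:

#include <cassert>
#include <string>
#include <vector>
using namespace std;

// Recomputes pi and checks: (while-loop iterations performed so far) + pi[i] <= i.
void check_invariant (const string &s) {
    int n = (int) s.length();
    vector<int> pi (n);
    long long total_while = 0;                 // while-loop iterations so far
    for (int i=1; i<n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j]) {
            j = pi[j-1];
            ++total_while;
        }
        if (s[i] == s[j])  ++j;
        pi[i] = j;
        assert(total_while + pi[i] <= i);      // the invariant from the proof sketch
    }
    // In particular total_while <= n, so the while loop contributes O(n) work overall.
}

int main() {
    check_invariant("aabaaabaaaabaaab");
    check_invariant("abababababababab");
    check_invariant("aaaaaaaaaaaaaaaa");
}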

Let's start with the fact that the outer loop executes n times, where n is the length of the pattern we seek. Each iteration of the inner loop decreases the value of j by at least 1, since pi[j-1] < j. The loop terminates at the latest when j hits its lower bound (0 here, or -1 in the Wikipedia formulation), therefore it can decrease the value of j at most as often as j has previously been increased by ++j in the outer loop. Since ++j is executed at most once per iteration of the outer loop, i.e. at most n times in total, the overall number of iterations of the inner while loop is also limited to n. The preprocessing algorithm therefore requires O(n) steps.

If you care, consider this simpler implementation of the preprocessing stage:

#include <stddef.h>

/* ff stands for 'failure function': */
void kmp_table(const char *needle, int *ff, size_t nff)
{
    size_t pos = 2;   /* next table entry to fill */
    int cnd = 0;      /* length of the current candidate border */

    if (nff > 0)
        ff[0] = -1;
    if (nff > 1)
        ff[1] = 0;

    while (pos < nff) {
        if (needle[pos - 1] == needle[cnd]) {
            ff[pos++] = ++cnd;
        } else if (cnd > 0) {
            cnd = ff[cnd]; /* Runs O(n) times in total, for the reasons above. */
        } else {
            ff[pos++] = 0;
        }
    }
}

from which it is painfully obvious that computing the failure function takes O(n) time, where n is the length of the pattern sought.
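
As a quick sanity check (a hypothetical driver of my own, assuming kmp_table is compiled in the same translation unit), you can print the table it produces for a short pattern:

#include <cstddef>
#include <iostream>

// Declaration only: the definition is the kmp_table shown above.
void kmp_table(const char *needle, int *ff, size_t nff);

int main() {
    const char *needle = "ABCDABD";
    int ff[7];                               // one entry per pattern character
    kmp_table(needle, ff, 7);

    // For "ABCDABD" the table comes out as: -1 0 0 0 0 1 2
    for (int v : ff) std::cout << v << ' ';
    std::cout << '\n';
}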

Licensed under: CC-BY-SA with attribution