Root finding with companion matrix

https://stackoverflow.com//questions/20033494

21-12-2019
|

Question

I would like to find all the real roots of a univariate polynomial. I could use Jenkins-Traub algorithm for example, but I want to learn how to solve it using the companion matrix.

I know how to turn a polynomial to companion matrix and I found a script which does QR decomposition: http://quantstart.com/articles/QR-Decomposition-with-Python-and-NumPy

And this is where I'm lost: what to do next? I think I have to calculate multiple decompositions but when I do, I always get the same result (obviously). I also read that it might be useful to first convert the companion matrix to Hessenberg form - but how? Then there are "shifts" - what are they?

I also found http://www.nr.com/webnotes/nr3web17.pdf but as I don't understand any of it I would like to know if there is any simpler method (even if slower or less stable).

In other words: reading http://en.wikipedia.org/wiki/QR_algorithm

"let A be a real matrix of which we want to compute the eigenvalues"
ok, that's my companion matrix, right?
"we compute the QR decomposition Ak=QkRk"
that would be Q, R = householder(A) from the very first link, right?
"We then form Ak+1 = RkQk"
easy, just multiply R and Q
"Under certain conditions,[2] the matrices Ak converge to a triangular matrix, the Schur form of A. The eigenvalues of a triangular matrix are listed on the diagonal, and the eigenvalue problem is solved."
...wait, what? I tried:
```
for i in range(100):
    Q, R = householder(A)
    A = mult_matrix(R, Q)
```

but there doesn't seem to be any progress made and I cannot see any numbers even close to the correct roots.

Please, can anyone explain this to me?

PS: I don't want to blindly use LAPACK or similar as I want to understand how it works, at least in very simplified terms.

PPS: There is also http://adorio-research.org/wordpress/?p=184 (not sure how is it different from the first method, though...)

Solution

If your companion matrix is the coefficient transformation matrix of the linear polynomial operation that maps q(x) to x*q(x) mod p(x)

where p(x)=x^(n+1)+p_n*x^n+...+p_1*x+p_0.

Explicitely, A has the shape of

0 0 0 ... 0 -p_0
1 0 0 ... 0 -p_1
0 1 0 ... 0 -p_2
. . . ... . . . 
0 0 0 ... 1 -p_n

which already is in Hessenberg form. Since this form is preserved during the QR algorithm, you may use the QR decomposition with Givens rotations where only rotations close to the diagonal occur.

In the unshifted QR algorithm you should observe a noticeable development at least in the lower right 3x3 block.

If you take one of the eigenvalues of the lower right 2x2 block and subtract it from every diagonal element, then you get the shifted QR algorithms as per Francis. The shift s, the number subtracted, is the current best estimate of the smallest eigenvalue. Note that quite probably, you now left the real domain and have to compute with complex matrix entries. You have to keep the shift s in memory and add any new shift in subsequent steps to it, and add the combined shift back to any eigenvalue found.

A matrix split occurs whenever any of the subdiagonal entries is practically zero. If the split occurs in the last row, then the last diagonal entry is an eigenvalue of the shifted matrix (A-s*I). If the split separates the last 2x2 block, then one can easily determine its eigenvalues which again are eigenvalues of the shifted matrix.

If the split happens anywhere else, the QR algorithm is recursively applied to the diagonal blocks separately.

Addendum I

Convergence of all variants of the algorithm is measured by the entries below the diagonal converging to zero.

The basic algorithm has geometric convergence in those sub-diagonal entries, the geometric factor of the entry at (i,i-1) is the fraction of the sizes of the eigenvalues at positions i-1 and i. A zero will be reached fast wherever there is a large jump.

Conjugate complex eigenvalues have the same size, so the algorithm will produce a 2x2 block on the diagonal, solve the quadratic characteristic equation of that block to get the corresponding eigenvalues.

Larger diagonal blocks will happen for multiple or clustered eigenvalues.

Addendum II

It is the line

   alpha = -cmp(x[0],0) * norm(x)

in the householder procedure. In most situations, x[0] will not be exactly 0, so this produces a sign. However, in the case of the companion matrix, x[0]==0 by construction, so no sign is produced, alpha gets the wrong value of 0. Change it to the more pedestrian

    alpha = norm(x)
    if x[0] < 0: alpha = -alpha

and it works well.

def companion_matrix(p):
    n=len(p)-1
    C=[[float(i+1 == j) for i in xrange(n-1)] for j in xrange(n)]
    for j in range(n): C[j].append(-p[n-j]/p[0])
    return C


def QR_roots(p):
    n=len(p)-1
    A=companion_matrix(p)
    for k in range(10+n):
        print "step: ",k+1," after ",(k+1)*(5+n), "iterations"
        for j in range(5+n):
            Q,R=householder(A)
            A=mult_matrix(R,Q)
        print("below diagonal")
        pprint([ A[i+1][i] for i in range(n-1) ])
        print("diagonal")
        pprint([ A[i][i] for i in range(n) ])
        print("above diagonal")
        pprint([ A[i][i+1] for i in range(n-1) ])


p=[ 1, 2, 5, 3, 6, 8, 6, 4, 3, 2, 7]
QR_roots(p)

#for a case with multiple roots at 1 and 4
#p= [1,-11,43,-73,56,-16]
#QR_roots(p)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow