Question

I have n real-valued variables (I don't know the exact count, and it doesn't really matter), let's call them X[n]. I also have m >> n relationships between them, let's call them R[m], of the form:

X[i] = alpha*X[j], where alpha is a positive real number and i and j are distinct; the (i, j) pair is not necessarily unique (i.e. there can be two relationships between the same pair of variables with different alpha factors)

What I'm trying to do is find a set of alpha parameters that solves the overdetermined system in some least-squares sense. The ideal solution would be to minimize the squared sum of differences between each equation's parameter and its fitted value, but I'm satisfied with the following approximation:

If I turn the m equations into an overdetermined system in n unknowns, any pseudo-inverse based numeric solver will give me the obvious solution (all zeroes). So what I currently do is add another equation into the mix, X[0] = 1 (actually any constant will do), and solve the resulting system in the least-squares sense using the Moore-Penrose pseudo-inverse. While this minimizes the sum of (X[0] - 1)^2 and the squared sum of the X[i] - alpha*X[j] terms, I find it a good and numerically stable approximation to my problem. Here is an example:

a = 1
a = 2*b
b = 3*c
a = 5*c

in Octave:

A = [
  1  0  0;
  1 -2  0;
  0  1 -3;
  1  0 -5;
]

B = [1; 0; 0; 0]

C = pinv(A) * B

or better yet, since B is zero everywhere except its first entry, just take the first column:

C = pinv(A)(:,1)

This yields the values for a, b, c: [0.99383; 0.51235; 0.19136], which gives me the following (reasonable) relationships:

a = 1.9398*b
b = 2.6774*c
a = 5.1935*c
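
(Those factors are simply ratios of the solved values; as a quick check in Octave, using C from above:)

C(1) / C(2)    % fitted alpha in a = alpha*b  ->  1.9398
C(2) / C(3)    % fitted alpha in b = alpha*c  ->  2.6774
C(1) / C(3)    % fitted alpha in a = alpha*c  ->  5.1935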

So right now I need to implement this in C / C++ / Java, and I have the following questions:

Is there a faster method to solve my problem, or am I on the right track with generating the overdetermined system and computing the pseudo-inverse?

My current solution requires a singular value decomposition and three matrix multiplications, which is a bit much considering m can be 5000 or even 10000. Are there faster ways to compute the pseudo-inverse, given the sparsity of the matrix? Each row other than the added X[0] = 1 row contains exactly two non-zero values: a 1 and a negative entry. Also, I actually only need the first column of the pseudo-inverse, not the entire matrix, since B is zero except for its first row.

What math libraries would you suggest to use for this? Is LAPACK ok?

I'm also open to any other suggestions, provided that they are numerically stable and asymptotically fast (let's say k*n^2, where k can be large).


Solution

The SVD approach is numerically very stable but not very fast. If you use the SVD, then LAPACK is a good library to use; its DGELSS/DGELSD driver routines solve least-squares problems via the SVD directly, without forming the pseudo-inverse explicitly. If it's just a one-off computation, it's probably fast enough.

If you need a substantially faster algorithm, you might have to sacrifice some stability. One possibility is the QR factorization. You'll have to read up on this to see the details, but part of the reasoning goes as follows: if AP = QR is the economy QR decomposition of A (where P is a permutation matrix, Q has orthonormal columns, and R is upper triangular), then the equation AX = B becomes Q R P^{-1} X = B, and the least-squares solution is X = P R^{-1} Q^T B. The following Octave code illustrates this using the same A and B as in your code.

[Q,R,P] = qr(A,0)
C(P) = R \ (Q' * B)

The nice thing about this is that you can exploit the sparsity of A by doing a sparse QR decomposition. There is some explanation in the Octave help for the qr function, but it did not work for me immediately.
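
For what it's worth, here is a minimal sketch of the sparse route (my own illustration, not from the Octave docs, so treat it as an assumption to verify): the matrix is assembled in sparse triplet form and the backslash operator is left to choose a sparse least-squares solver for the rectangular system. The vectors lhs, rhs and alpha are hypothetical names encoding X(lhs(k)) = alpha(k)*X(rhs(k)); the values reproduce the small example above, so the result can be checked against pinv.

lhs   = [1; 2; 1];                 % a = 2*b, b = 3*c, a = 5*c
rhs   = [2; 3; 3];
alpha = [2; 3; 5];
m = numel(lhs);  n = 3;
r = repmat((1:m)', 2, 1) + 1;      % relationship rows 2..m+1; row 1 anchors a = 1
A = sparse([1; r], [1; lhs; rhs], [1; ones(m, 1); -alpha], m + 1, n);
B = [1; zeros(m, 1)];
C = A \ B                          % least-squares solve of the sparse rectangular system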

Even faster (but also even less stable) is to use the normal equations: if AX = B, then A^T A X = A^T B. The matrix A^T A is a square matrix of (hopefully) full rank, so you can use any solver for linear systems of equations. Octave code:

C = (A' * A) \ (A' * B)

Again, sparsity can be exploited in this approach. There are many methods and libraries for solving sparse linear systems; a popular one seems to be UMFPACK.
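
To make that concrete, here is a hedged sketch of how the normal-equations route is usually implemented, with an explicit Cholesky factorization standing in for whatever your library's solver does internally (A and B as in the question):

G = A' * A;              % n-by-n Gram matrix; positive definite if A has full column rank
d = A' * B;
R = chol(G);             % factor G = R'*R, with R upper triangular
C = R \ (R' \ d)         % two triangular solves; no explicit inverse needed

Note that forming A'*A roughly squares the condition number of A, which is where the loss of stability mentioned above comes from.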

Added later: I don't know enough about this field to quantify these trade-offs; whole books have been written on the subject. Perhaps QR is about a factor of 3 to 5 faster than the SVD, and the normal equations twice as fast again. The effect on numerical stability depends on your matrix A. Sparse algorithms can be much faster still (say, by a factor of m), but their computational cost and numerical stability depend very much on the problem, in ways that are sometimes not well understood.

In your use case, my recommendation would be to try computing the solution with the SVD, see how long it takes, and if that is acceptable then just use that (I guess it would be about a minute for n=1000 and m=10000). If you want to study it further, try also QR and normal equations and see how much faster they are and how accurate; if they give approximately the same solution as SVD then you can be pretty confident they are accurate enough for your purposes. Only if these are all too slow and you are willing to sink some time into it, look at sparse algorithms.

OTHER TIPS

Your problem is ill-posed. If you treat it as minimizing a function of n variables (the sum of squared differences), the function has exactly ONE global minimum hyperplane.

That global minimum will always contain zero unless you fix one of the variables to be nonzero, or reduce the function domain in some other way.

If what you want is a parameterization of the solution hyperplane, you can get that from the Moore-Penrose pseudo-inverse (http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse); check the section on obtaining all solutions.

(Please note I've used the word "hyperplane" in a technically incorrect manner. I mean a "flat", unbounded subset of your parameter space: a line, a plane, something that can be parameterized by one or more vectors. For some reason I can't find the general noun for such objects.)
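
As a sketch of what that parameterization looks like (standard pseudo-inverse identities, not code from the original tip; A and B as in the question):

Aplus = pinv(A);
X0 = Aplus * B;                      % minimum-norm least-squares solution
N  = eye(columns(A)) - Aplus * A;    % projector onto the null space of A
w  = randn(columns(A), 1);           % arbitrary vector
X  = X0 + N * w;                     % every least-squares solution has this form

With the extra X[0] = 1 equation included, A normally has full column rank, in which case N is numerically zero and the solution is unique; without it, the range of N is exactly the set of minimizers described above.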

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow