Your X^T X matrix should have a Cholesky decomposition. I'd look into this decomposition before LU; it is faster: http://en.wikipedia.org/wiki/Cholesky_decomposition
Linear Regression - What algorithm to use to solve least squares method - inverse or LU or ...?
01-07-2022
Question
I am working on an algorithm to perform linear regression for one or more independent variables. That is, if I have m real-world values and, in the case of two independent variables, a and b:
C + D*a1 + E* b1 = y1
C + D*a2 + E* b2 = y2
...
C + D*am + E* bm = ym
I would like to use the least squares solution to find the best-fitting straight line. I will be using matrix notation, so the problem becomes solving the normal equations X^T X * Beta = X^T y, where Beta is the vector [C, D, E] of best-fit coefficients.
Question: What is the best way to solve this system? Should I compute the inverse of X^T X, or should I use the LU factorization/decomposition of the matrix? What is the performance of each on a large amount of data (i.e. a big value of m, which could be on the order of 10^8)?
EDIT
If the answer is to use a Cholesky decomposition or QR decomposition, are there any implementation hints or simple libraries to use? I am coding in C/C++.
Solution
Two straightforward approaches spring to mind for solving a dense overdetermined system Ax=b:
Form A^T A x = A^T b, then Cholesky-factorise A^T A = L L^T, then do two triangular solves (one forward, one back). This usually gets you an answer precise to about sqrt(machine epsilon).
Compute the QR factorisation A = Q*R, where Q's columns are orthogonal and R is square and upper-triangular, using something like Householder reflections. Then solve Rx = Q^T b for x by back-substitution. This usually gets you an answer precise to about machine epsilon --- twice the precision of the Cholesky method, but it takes about twice as long.
For sparse systems, I'd usually prefer the Cholesky method because it takes better advantage of sparsity.