Fitting multiple distributions
03-07-2021
Question
I have a situation where two or more n-dimensional arrays, each multiplied by a coefficient, should add up (roughly) to a third array.
array1*c1 + array2*c2 ... = array3
I'm looking for the c1 and c2 that make the first two arrays best approximate array3. I'm sure some way of doing this exists in scipy, but I'm not sure where to start. Is there a specific module I should begin with?
Solution
numpy.linalg.lstsq solves this for you. Object-oriented wrappers for that function, as well as more advanced regression models, are available in both scikit-learn and StatsModels.
(Disclaimer: I'm a scikit-learn developer, so this is not the most unbiased advice ever.)
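As a rough sketch of how numpy.linalg.lstsq could be applied here: the array shapes, the random test data and the "true" coefficients 2.0 and -0.5 below are made up for illustration, and the approach assumes every array has the same shape, so each one can be flattened into one column of the design matrix.

```python
import numpy as np

# Hypothetical example data: two 4x5 arrays and a target that is
# (roughly) 2.0*array1 - 0.5*array2 plus a little noise.
rng = np.random.default_rng(0)
array1 = rng.normal(size=(4, 5))
array2 = rng.normal(size=(4, 5))
array3 = 2.0 * array1 - 0.5 * array2 + 0.01 * rng.normal(size=(4, 5))

# Flatten each n-dimensional array into one column of the design matrix.
A = np.column_stack([array1.ravel(), array2.ravel()])
b = array3.ravel()

# Least-squares fit: minimizes the sum of squared differences between
# array1*c1 + array2*c2 and array3.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)
c1, c2 = coeffs
print(c1, c2)
```

The same pattern should extend to more arrays by adding one flattened column per array.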
OTHER TIPS
This is just linear regression (http://en.wikipedia.org/wiki/Ordinary_least_squares).
Let the matrix A have array1, array2, ... as its columns (each array flattened to a vector).
Let the vector a be array3 (flattened the same way) and let x be the column vector [c1, c2, ...]'.
You want to solve the problem min_{x} ||Ax - a||^2.
Taking the derivative with respect to x and setting it to zero gives 0 = A'Ax - A'a, which gives the solution x = (A'A)^{-1} A'a, provided A'A is invertible.
In numpy this is numpy.linalg.solve(numpy.dot(A.T, A), numpy.dot(A.T, a)).
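The normal-equations route above can be sketched as follows. The data and the coefficients 1.5 and 0.25 are invented for the example, and the target is built without noise so that the recovered values can be checked exactly. The result is also compared against numpy.linalg.lstsq, which is generally the safer choice when the columns of A are nearly linearly dependent, because forming A'A squares the condition number.

```python
import numpy as np

# Hypothetical example data: the target is an exact combination
# 1.5*array1 + 0.25*array2 (no noise).
rng = np.random.default_rng(1)
array1 = rng.normal(size=(3, 4))
array2 = rng.normal(size=(3, 4))
array3 = 1.5 * array1 + 0.25 * array2

# Columns of A are the flattened input arrays; a is the flattened target.
A = np.column_stack([array1.ravel(), array2.ravel()])
a = array3.ravel()

# Solve the normal equations A'A x = A'a.
x = np.linalg.solve(np.dot(A.T, A), np.dot(A.T, a))

# Reference solution from lstsq for comparison.
x_ref = np.linalg.lstsq(A, a, rcond=None)[0]
print(x, x_ref)
```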