Multiple Linear Regression

https://stackoverflow.com/questions/1346107

20-09-2019

Question

I am trying to use GLSMultipleLinearRegression (from the Apache commons-math package) for multiple linear regression. It expects a covariance matrix as input, but I am not sure how to compute one. I have one array for the dependent variable and 3 arrays of independent variables.
Any idea how to compute the covariance matrix?

Note: I have 200 items for each of the 3 independent variables

Thanks
Bharani

Solution

If you do not know the covariance between the errors, you can take an iterative approach. First fit Ordinary Least Squares (OLS), compute the residuals, and estimate the covariances between the errors from them. Then apply GLS using the estimated covariance matrix and re-estimate the covariance matrix from the new residuals. Keep iterating, running GLS with each updated covariance matrix, until the estimates converge. Here is a link (.pdf warning) to an example of this method, as well as a related discussion of Weighted and Iteratively Weighted Least Squares, which apply when the errors are uncorrelated rather than correlated as GLS assumes.
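A minimal sketch of the iteration described above, written in plain Java so that it is self-contained rather than tied to a particular Commons Math version. A single set of residuals cannot determine a full n x n covariance matrix, so some structure has to be assumed. This sketch assumes AR(1) errors, where entry (i, j) of the covariance matrix is proportional to rho^|i-j|. That assumption, the class name FeasibleGlsSketch, and all method names are illustrative and do not come from the original answer. The design matrix x is expected to contain an explicit column of ones if an intercept is wanted.

```java
/** Sketch of iterated feasible GLS, assuming AR(1)-correlated errors (an assumption). */
class FeasibleGlsSketch {

    /** Solves A * Z = B by Gauss-Jordan elimination with partial pivoting; no singularity check. */
    static double[][] solve(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length;
        double[][] aug = new double[n][n + m];           // augmented matrix [A | B]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, aug[i], 0, n);
            System.arraycopy(b[i], 0, aug[i], n, m);
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
            }
            double[] tmp = aug[col]; aug[col] = aug[pivot]; aug[pivot] = tmp;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col] / aug[col][col];
                for (int c = col; c < n + m; c++) aug[r][c] -= f * aug[col][c];
            }
        }
        double[][] z = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) z[i][j] = aug[i][n + j] / aug[i][i];
        return z;
    }

    /** GLS estimate: beta = (X' Omega^-1 X)^-1 X' Omega^-1 y. */
    static double[] gls(double[][] x, double[] y, double[][] omega) {
        int n = x.length, p = x[0].length;
        double[][] xy = new double[n][p + 1];            // [X | y], so one solve handles both
        for (int i = 0; i < n; i++) {
            System.arraycopy(x[i], 0, xy[i], 0, p);
            xy[i][p] = y[i];
        }
        double[][] w = solve(omega, xy);                 // Omega^-1 [X | y]
        double[][] lhs = new double[p][p];               // X' Omega^-1 X
        double[][] rhs = new double[p][1];               // X' Omega^-1 y
        for (int j = 0; j < p; j++) {
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < p; k++) lhs[j][k] += x[i][j] * w[i][k];
                rhs[j][0] += x[i][j] * w[i][p];
            }
        }
        double[][] beta = solve(lhs, rhs);
        double[] out = new double[p];
        for (int j = 0; j < p; j++) out[j] = beta[j][0];
        return out;
    }

    /** Estimates rho from lag-1 residuals and returns Omega with entries rho^|i-j|. */
    static double[][] ar1Covariance(double[] e) {
        double num = 0, den = 0;
        for (int t = 1; t < e.length; t++) {
            num += e[t] * e[t - 1];
            den += e[t - 1] * e[t - 1];
        }
        // Treat near-zero residuals as uncorrelated and keep rho strictly inside (-1, 1).
        double rho = den < 1e-20 ? 0.0 : Math.max(-0.99, Math.min(0.99, num / den));
        double[][] omega = new double[e.length][e.length];
        for (int i = 0; i < e.length; i++)
            for (int j = 0; j < e.length; j++) omega[i][j] = Math.pow(rho, Math.abs(i - j));
        return omega;
    }

    /** Starts from OLS, then alternates covariance estimation and GLS until beta settles. */
    static double[] fit(double[][] x, double[] y, int maxIter, double tol) {
        int n = x.length, p = x[0].length;
        double[][] identity = new double[n][n];
        for (int i = 0; i < n; i++) identity[i][i] = 1.0;
        double[] beta = gls(x, y, identity);             // identity covariance gives OLS
        for (int iter = 0; iter < maxIter; iter++) {
            double[] e = new double[n];                  // residuals e = y - X beta
            for (int i = 0; i < n; i++) {
                e[i] = y[i];
                for (int j = 0; j < p; j++) e[i] -= x[i][j] * beta[j];
            }
            double[] next = gls(x, y, ar1Covariance(e));
            double change = 0;
            for (int j = 0; j < p; j++) change = Math.max(change, Math.abs(next[j] - beta[j]));
            beta = next;
            if (change < tol) break;
        }
        return beta;
    }
}
```

With the data from the question, x would be a 200 x 4 array (a column of ones plus the three independent variables), and a call such as FeasibleGlsSketch.fit(x, y, 20, 1e-8) would return four coefficients. A different error structure would only require replacing ar1Covariance.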

OTHER TIPS

I just came across the Flanagan library, which does this out of the box. I also got a reply on the commons user list saying that commons-math does not currently support FGLS (feasible GLS), that is, automatic estimation of the covariance matrix.

-Bharani

If you have no idea of the covariance between the errors, I would use Ordinary Least Squares (OLS) instead of Generalized Least Squares (GLS). This amounts to taking the identity matrix as the covariance matrix. The library appears to implement OLS in OLSMultipleLinearRegression.
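To see why OLS corresponds to an identity covariance matrix, it may help to write out the two estimators. The symbols are assumptions rather than notation from this thread: X is the design matrix, y the vector of responses, and Ω the covariance matrix of the errors.

```latex
\hat{\beta}_{\mathrm{GLS}} = (X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y
\quad\Longrightarrow\quad
\hat{\beta}_{\mathrm{GLS}}\big|_{\Omega = I} = (X^{\top}X)^{-1}X^{\top}y = \hat{\beta}_{\mathrm{OLS}}
```

The same holds for Ω = σ²I with any σ² > 0, since the scalar cancels.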

Have you tried creating a Covariance matrix directly from your data?

new Covariance(data).getCovarianceMatrix()

Using the information in the comment, we know that there are 3 independent variables, 1 dependent variable, and 200 samples. That implies that you will have a data array with 4 columns and 200 rows. The end result will look something like this (typing everything out explicitly to try to explain what I mean):

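// import org.apache.commons.math3.stat.correlation.Covariance; (org.apache.commons.math... in 2.x)
// One row per observation: {y, x1, x2, x3}; here x is indexed as x[variable][observation]
double[][] data = new double[200][];
data[0] = new double[]{y[0], x[0][0], x[1][0], x[2][0]};
data[1] = new double[]{y[1], x[0][1], x[1][1], x[2][1]};
data[2] = new double[]{y[2], x[0][2], x[1][2], x[2][2]};
// ... etc.
data[199] = new double[]{y[199], x[0][199], x[1][199], x[2][199]};
// The constructor computes the covariance between the 4 columns
Covariance covariance = new Covariance(data);
double[][] omega = covariance.getCovarianceMatrix().getData(); // 4 x 4 array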

Then, when you're doing your actual regression, you have your covariance matrix:

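// import org.apache.commons.math3.stat.regression.GLSMultipleLinearRegression;
GLSMultipleLinearRegression regression = new GLSMultipleLinearRegression();
// Assumes you put your independent variables in x (one row per observation) and dependent in y
// Also assumes omega is your covariance matrix; GLS expects it to be n x n
// for n observations (200 x 200 here)
regression.newSampleData(y, x, omega); // we do need covariance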

@Mark Lavin

You would first use Ordinary Least Squares, calculating the errors, and the covariances between the errors

I'm a bit confused. Since we have only one response variable, the residual errors should be a one-dimensional variable. Where, then, does a covariance matrix of errors fit in?

You need to organize the 3 independent variables as column vectors in a matrix: x1, x2, x3 (N columns), where each row is an observation (M rows). This will be an MxN matrix.

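You then plug this data matrix into a covariance routine provided by Apache, such as Covariance.computeCovarianceMatrix(RealMatrix matrix). In released Commons Math versions that method is protected, so the public route is new Covariance(matrix).getCovarianceMatrix().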
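For reference, here is a rough sketch of what such a covariance routine computes for a matrix laid out this way. It is hand-rolled plain Java rather than the Commons Math implementation. The class name ColumnCovarianceSketch is hypothetical, and the n - 1 divisor (the usual bias-corrected sample covariance) is an assumption about the convention wanted.

```java
/** Sketch: sample covariance between the columns of a data matrix. */
class ColumnCovarianceSketch {

    /** data has one row per observation (M rows) and one column per variable (N columns). */
    static double[][] covariance(double[][] data) {
        int m = data.length, n = data[0].length;
        double[] mean = new double[n];                   // column means
        for (double[] row : data)
            for (int j = 0; j < n; j++) mean[j] += row[j] / m;
        double[][] cov = new double[n][n];               // N x N result
        for (double[] row : data)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (m - 1);
        return cov;
    }
}
```

For M = 200 observations of N = 3 variables, the result is a 3 x 3 matrix whose diagonal holds the variance of each variable.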

Licensed under: CC-BY-SA with attribution
