Question

Let's say I want to find the alpha (a) values for an equation of the form

y = a0 + a1*x1 + a2*x2 + ... + ai*xi

Using OLS, let's say we start with 10 observations for the basic case of i=2:

# y = a0 + a1*x1 + a2*x2
import numpy as np

y = np.arange(1, 10)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])

Using statsmodels, I would generally use the following code to obtain the coefficients for n×1 x and y arrays:

import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)

# least squares fit
model = sm.OLS(y, X)
fit = model.fit()
alpha=fit.params

But this does not work when x is not the same length as y. The equation is here on the first page if you do not know what OLS is.


Solution

The traceback tells you what's wrong:

    raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes

Your x has 10 rows, but your y has only 9 values. A regression only works if both have the same number of observations.

endog is y and exog is x; those are the names statsmodels uses for the dependent (endogenous) and explanatory (exogenous) variables.

If you replace your y by

y = np.arange(1, 11)

then everything works as expected.

OTHER TIPS

Here's the basic problem with the above: you say you're using 10 items, but you're only using 9 for your vector of y's.

>>> import numpy
>>> len(numpy.arange(1, 10))
9

This is because slices and ranges in Python go up to but not including the stop integer. If you had done:

numpy.arange(10)

you would have had a list of 10 items, starting at 0, and ending with 9.
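The two calls side by side make the exclusive-stop behaviour easy to see:

```python
import numpy as np

a = np.arange(1, 10)   # stop is exclusive: values 1..9, nine items
b = np.arange(1, 11)   # values 1..10, ten items
c = np.arange(10)      # starts at 0 by default: values 0..9, ten items

print(len(a), len(b), len(c))  # 9 10 10
```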

For a regression, you require an observed response value for every set of predictors; otherwise, those predictors are useless. You may as well discard any set of predictors that does not have a response value to go with it.
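If statsmodels is not available, the same least-squares fit can be sketched with NumPy alone, using `np.linalg.lstsq` on the corrected 10-element y (the constant column is added by hand here instead of with `sm.add_constant`):

```python
import numpy as np

y = np.arange(1, 11)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])

# add an intercept column of ones, then solve min ||X @ alpha - y||
X = np.column_stack([np.ones(len(x)), x])
alpha, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(alpha)  # [intercept, coef for x1, coef for x2]
```

The solution vector matches what `sm.OLS(y, X).fit().params` returns for the same design matrix.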

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow