Question

Let's say I want to find the alpha (a) values for an equation of the form

y = a0 + a1*x1 + a2*x2 + ... + ai*xi

Using OLS, let's say we start with 10 values for the basic case of i = 2:

# y = a0 + a1*x1 + a2*x2

y = np.arange(1, 10)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
       [15, 20], [20, 25], [25, 30],[30, 35],
       [35,  5], [ 5, 10], [10, 15]])

Using statsmodels, I would generally use the following code to obtain the coefficients for an n×1 y array and the x array:

import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)

# least squares fit
model = sm.OLS(y, X)
fit = model.fit()
alpha=fit.params

But this does not work when x is not the same length as y. The equation is here on the first page if you do not know what OLS is.


Solution

The traceback tells you what's wrong:

    raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes

Your x has 10 values, but your y has only 9. A regression only works if both have the same number of observations.

endog is y and exog is x; those are the names statsmodels uses for the dependent (endogenous) and explanatory (exogenous) variables.

If you replace your y by

y = np.arange(1, 11)

then everything works as expected.
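If you want to verify the corrected fit without statsmodels installed, the same coefficients can be recovered with plain NumPy. This is a minimal sketch using `np.linalg.lstsq` in place of `sm.OLS` (an assumption on my part, not the original answer's code; the resulting params are the same):

```python
import numpy as np

# Corrected data: y now has 10 observations, matching the 10 rows of x
y = np.arange(1, 11)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])

# Prepend a column of ones, as sm.add_constant(x) does
X = np.column_stack([np.ones(len(x)), x])

# Least-squares solution: equivalent to sm.OLS(y, X).fit().params
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
print(alpha)  # [a0, a1, a2]
```

The shapes now agree (10 rows each), so the fit runs without the endog/exog size error.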

Other tips

Here's the basic problem with the above: you say you're using 10 items, but your vector of y's only contains 9.

>>> import numpy
>>> len(numpy.arange(1, 10))
9

This is because slices and ranges in Python go up to but not including the stop integer. If you had done:

numpy.arange(10)

you would have had a list of 10 items, starting at 0, and ending with 9.
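To make the endpoint behavior concrete, here is a quick throwaway check of the three variants discussed above:

```python
import numpy as np

assert len(np.arange(1, 10)) == 9    # stop value 10 is excluded
assert len(np.arange(10)) == 10      # starts at 0, ends at 9
assert np.arange(10)[-1] == 9
assert len(np.arange(1, 11)) == 10   # the fix: extend the stop by one
```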

For a regression, you need a response value for every set of predictors; otherwise, those predictors are useless. You may as well discard any set of predictors that has no response value to go with it.
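If the missing y value genuinely cannot be recovered, discarding the unmatched predictor rows is the fallback. A sketch of that trimming step (my own illustration, not code from the original answer):

```python
import numpy as np

y = np.arange(1, 10)                      # only 9 responses
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])  # 10 predictor rows

# Keep only as many rows as both arrays can supply
n = min(len(y), len(x))
x_trimmed, y_trimmed = x[:n], y[:n]
print(len(x_trimmed), len(y_trimmed))  # now equal, so OLS will accept them
```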

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow