Question

I have three variables: Market_Price, Hours, and Age.

Using scipy.optimize, I found the relationship between Market_Price and each of the other two variables.

Data:

hours =  [1000,  10000,  11000,  11000,  15000,  18000,  37000,  24000,  28000,  28000,  42000,  46000,  50000,  34000,  34000,  46000,  50000,  56000,  64000,  64000,  65000,  80000,  81000,  81000,  44000,  49000,  76000,  76000,  89000,  38000,  80000,  69000,  46000,  47000,  57000,  72000,  77000,  68000]

market_Price =  [30945,  28974,  27989,  27989,  36008,  24780,  22980,  23997,  25957,  27847,  36000,  25588,  23980,  25990,  25990,  28995,  26770,  26488,  24988,  24988,  17574,  12995,  19788,  20488,  19980,  24978,  16000,  16400,  18988,  19980,  18488,  16988,  15000,  15000,  16998,  17499,  15780,  8400]

age =  [2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  5,  5,  5,  5,  5,  6,  6,  7,  8,  8,  8,  8,  8,  13,]

The relationship I derived was:

Hours to market_price: market_price = log(h)*h1 + h2

Age to market_price: market_price = log(a)*a1 + a2

where h1, h2, a1, and a2 are found using SciPy's curve_fit (scipy.optimize.curve_fit).

Now I would like to combine all three into one calculation so that, given the age and hours, I can determine the market_price.

So far I have done this by blending the two fits with a single weighting ratio, chosen as the value that gives the smallest standard deviation of the prediction errors.

std_divs = []
for ratio in ratios:
    price_difference_final = []
    for n in range(len(prices)):
        # Weighted blend of the hours-based fit and the age-based fit
        predicted_price = ((np.log(hours[n]) * h1 + h2) * ratio
                           + (np.log(age[n]) * a1 + a2) * (1 - ratio))
        price_difference_final.append(prices[n] - predicted_price)
    std_divs.append(np.std(price_difference_final))
std_div = min(std_divs)
optimum_ratio = ratios[std_divs.index(std_div)]

As you can see, I accomplish this by brute force, which is not an elegant solution.

Furthermore, I now find that the relationship between the three cannot be expressed using a single ratio; the ratio needs to slide. As age increases, the hours/age ratio decreases, giving age an increasing weight in determining the market price.

Unfortunately, I haven't been able to implement this with SciPy's curve_fit, as it appears to accept only one pair of arrays.

Any thoughts on how this could best be achieved?


Solution

curve_fit accepts an array with more than one dimension as its independent variable, so you can pass both your hours and age data in a single array. For example:

import numpy as np
from scipy.optimize import curve_fit

hours =  [1000,  10000,  11000,  11000,  15000,  18000,  37000,  24000,
          28000,  28000,  42000,  46000,  50000,  34000,  34000,  46000,
          50000,  56000,  64000,  64000,  65000,  80000,  81000,  81000,
          44000,  49000,  76000,  76000,  89000,  38000,  80000,  69000,
          46000,  47000,  57000,  72000,  77000,  68000]

market_Price =  [30945,  28974,  27989,  27989,  36008,  24780,  22980,
                 23997,  25957,  27847,  36000,  25588,  23980,  25990,  
                 25990,  28995,  26770,  26488,  24988,  24988,  17574,
                 12995,  19788,  20488,  19980,  24978,  16000,  16400,
                 18988,  19980,  18488,  16988,  15000,  15000,  16998,
                 17499,  15780,  8400]

age =  [2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  4,  4,  4,
        4,  4,  4,  4,  4,  4,  4,  4,  5,  5,  5,  5,  5,  6,  6,  7,  
        8,  8,  8,  8,  8,  13]

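# Stack both predictors in one array: combined[0] = hours, combined[1] = age
combined = np.array([hours, age])

def f(combined, c, h1, a1):
    # One possible model, following the log relationships in the question:
    # price = c + h1*log(hours) + a1*log(age)
    return c + h1 * np.log(combined[0]) + a1 * np.log(combined[1])

popt, pcov = curve_fit(f, combined, market_Price)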
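
The question also asks for a weighting between hours and age that shifts as age grows. One way this might be sketched, using the same two-row array idea, is to add an interaction term so that the coefficient on log(hours) depends on log(age). The model form and the names c, h1, a1, k, model, hours_weight, and predict are assumptions made for illustration; they do not come from the original posts, and whether the extra term is justified by the data would need to be checked.

```python
import numpy as np
from scipy.optimize import curve_fit

hours = [1000, 10000, 11000, 11000, 15000, 18000, 37000, 24000,
         28000, 28000, 42000, 46000, 50000, 34000, 34000, 46000,
         50000, 56000, 64000, 64000, 65000, 80000, 81000, 81000,
         44000, 49000, 76000, 76000, 89000, 38000, 80000, 69000,
         46000, 47000, 57000, 72000, 77000, 68000]
market_price = [30945, 28974, 27989, 27989, 36008, 24780, 22980,
                23997, 25957, 27847, 36000, 25588, 23980, 25990,
                25990, 28995, 26770, 26488, 24988, 24988, 17574,
                12995, 19788, 20488, 19980, 24978, 16000, 16400,
                18988, 19980, 18488, 16988, 15000, 15000, 16998,
                17499, 15780, 8400]
age = [2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4,
       4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7,
       8, 8, 8, 8, 8, 13]

# Row 0 holds hours, row 1 holds age.
combined = np.array([hours, age], dtype=float)

def model(x, c, h1, a1, k):
    """Assumed form: price = c + (h1 + k*log(age))*log(hours) + a1*log(age)."""
    log_h, log_a = np.log(x[0]), np.log(x[1])
    return c + (h1 + k * log_a) * log_h + a1 * log_a

popt, pcov = curve_fit(model, combined, market_price)
c, h1, a1, k = popt

def hours_weight(a):
    """Effective coefficient on log(hours) at a given age, under this model."""
    return h1 + k * np.log(a)

def predict(h, a):
    """Predicted price for a single (hours, age) pair."""
    return model(np.array([h, a], dtype=float), *popt)

print("parameters:", popt)
print("hours weight at age 2 and age 8:", hours_weight(2), hours_weight(8))
print("predicted price at 40000 hours, age 5:", predict(40000, 5))
```

Comparing hours_weight at different ages shows how much the fitted influence of hours changes with age. With only 38 observations, the standard errors from pcov are worth inspecting before trusting the interaction term.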

Other suggestions

This is a multiple regression problem. You don't need to write your own code, as an implementation already exists (the ols class used below comes from this page):

http://wiki.scipy.org/Cookbook/OLS

Note: in the end you don't have five parameters (h1, h2, a1, a2, ratio). You only have three: h2*ratio + a2*(1-ratio), h1*ratio, and a1*(1-ratio).
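
To see why the five parameters collapse to three, the blended formula from the question can be expanded and regrouped. Here r stands for ratio, and the labels c, b_h, and b_a are introduced only for this illustration:

```latex
\begin{aligned}
\text{price} &= r\,(h_1 \log h + h_2) + (1 - r)\,(a_1 \log a + a_2) \\
             &= \underbrace{r\,h_2 + (1 - r)\,a_2}_{c}
              + \underbrace{r\,h_1}_{b_h}\,\log h
              + \underbrace{(1 - r)\,a_1}_{b_a}\,\log a
\end{aligned}
```

Only c, b_h, and b_a affect the prediction, so the data can determine those three values but not h1, h2, a1, a2, and r individually.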

In [26]: y = np.array(market_Price)
         x = np.log(np.array([hours, age])).T

In [27]: mymodel = ols(y, x, 'Market_Price', ['Hours', 'Age'])

In [28]: mymodel.p  # return coefficient p-values
Out[28]: array([  1.32065700e-05,   3.06318351e-01,   1.34081122e-05])

In [29]: mymodel.summary()

==============================================================================
Dependent Variable: Market_Price
Method: Least Squares
Date:  Mon, 24 Mar 2014
Time:  15:40:00
# obs:                  38
# variables:         3
==============================================================================
variable     coefficient     std. Error      t-statistic     prob.
==============================================================================
const           45838.261850      9051.125823      5.064371      0.000013
Hours          -1023.097422      985.498239     -1.038152      0.306318
Age            -8862.186475      1751.640834     -5.059363      0.000013
==============================================================================
Models stats                         Residual stats
==============================================================================
R-squared             0.624227         Durbin-Watson stat   1.301026
Adjusted R-squared    0.602754         Omnibus stat         2.999547
F-statistic           29.070664         Prob(Omnibus stat)   0.223181
Prob (F-statistic)    0.000000          JB stat              1.807013
Log likelihood       -366.421766            Prob(JB)             0.405146
AIC criterion         19.443251         Skew                 0.376021
BIC criterion         19.572534         Kurtosis             3.758751
==============================================================================
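
If copying the Cookbook class is inconvenient, the same kind of fit can be sketched with plain NumPy. This is a minimal sketch that assumes the model price = const + b_hours*log(hours) + b_age*log(age); the names X, coef, r_squared, and predict_price are made up for the example. With the same data, the coefficients should come out close to those in the summary above, although the sketch does not compute standard errors or p-values.

```python
import numpy as np

hours = [1000, 10000, 11000, 11000, 15000, 18000, 37000, 24000,
         28000, 28000, 42000, 46000, 50000, 34000, 34000, 46000,
         50000, 56000, 64000, 64000, 65000, 80000, 81000, 81000,
         44000, 49000, 76000, 76000, 89000, 38000, 80000, 69000,
         46000, 47000, 57000, 72000, 77000, 68000]
market_price = [30945, 28974, 27989, 27989, 36008, 24780, 22980,
                23997, 25957, 27847, 36000, 25588, 23980, 25990,
                25990, 28995, 26770, 26488, 24988, 24988, 17574,
                12995, 19788, 20488, 19980, 24978, 16000, 16400,
                18988, 19980, 18488, 16988, 15000, 15000, 16998,
                17499, 15780, 8400]
age = [2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4,
       4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7,
       8, 8, 8, 8, 8, 13]

y = np.array(market_price, dtype=float)

# Design matrix: a column of ones for the constant, then log(hours), log(age).
X = np.column_stack([np.ones(len(y)), np.log(hours), np.log(age)])

# Ordinary least squares: minimise ||X @ coef - y||^2.
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
const, b_hours, b_age = coef

# R-squared from the residual and total sums of squares.
residuals = y - X @ coef
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def predict_price(h, a):
    """Predicted market price for given hours and age under the fitted model."""
    return const + b_hours * np.log(h) + b_age * np.log(a)

print("const, b_hours, b_age:", coef)
print("R-squared:", r_squared)
print("predicted price at 40000 hours, age 5:", predict_price(40000, 5))
```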
Licensed under: CC-BY-SA with attribution