Question

I have three variables: Market_Price, Hours, and Age.

Using scipy.optimize, I found the relationship between each of the other two variables and Market_Price.

Data:

hours =  [1000,  10000,  11000,  11000,  15000,  18000,  37000,  24000,  28000,  28000,  42000,  46000,  50000,  34000,  34000,  46000,  50000,  56000,  64000,  64000,  65000,  80000,  81000,  81000,  44000,  49000,  76000,  76000,  89000,  38000,  80000,  69000,  46000,  47000,  57000,  72000,  77000,  68000]

market_Price =  [30945,  28974,  27989,  27989,  36008,  24780,  22980,  23997,  25957,  27847,  36000,  25588,  23980,  25990,  25990,  28995,  26770,  26488,  24988,  24988,  17574,  12995,  19788,  20488,  19980,  24978,  16000,  16400,  18988,  19980,  18488,  16988,  15000,  15000,  16998,  17499,  15780,  8400]

age =  [2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  5,  5,  5,  5,  5,  6,  6,  7,  8,  8,  8,  8,  8,  13,]

The relationship I derived was:

Hours to market_price = log(h)*h1+h2,

Age to market_price = log(a)*a1+a2

where h1, h2, a1, a2 are found using SciPy's curve_fit (scipy.optimize.curve_fit).
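
For reference, the two single-variable fits described above can be sketched roughly as follows. This is a minimal sketch using the data from this post; the function name log_model and the lower-case variable names are illustrative and not taken from the original code.

```python
import numpy as np
from scipy.optimize import curve_fit

hours = np.array([1000, 10000, 11000, 11000, 15000, 18000, 37000, 24000,
                  28000, 28000, 42000, 46000, 50000, 34000, 34000, 46000,
                  50000, 56000, 64000, 64000, 65000, 80000, 81000, 81000,
                  44000, 49000, 76000, 76000, 89000, 38000, 80000, 69000,
                  46000, 47000, 57000, 72000, 77000, 68000], dtype=float)
market_price = np.array([30945, 28974, 27989, 27989, 36008, 24780, 22980,
                         23997, 25957, 27847, 36000, 25588, 23980, 25990,
                         25990, 28995, 26770, 26488, 24988, 24988, 17574,
                         12995, 19788, 20488, 19980, 24978, 16000, 16400,
                         18988, 19980, 18488, 16988, 15000, 15000, 16998,
                         17499, 15780, 8400], dtype=float)
age = np.array([2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4,
                4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7,
                8, 8, 8, 8, 8, 13], dtype=float)

def log_model(x, slope, intercept):
    # price = log(x) * slope + intercept (illustrative helper)
    return np.log(x) * slope + intercept

# Fit price against hours alone, then against age alone.
(h1, h2), _ = curve_fit(log_model, hours, market_price)
(a1, a2), _ = curve_fit(log_model, age, market_price)

print("hours fit:", h1, h2)
print("age fit:  ", a1, a2)
```

Because each model is linear in its two parameters, an ordinary least-squares line through (log(x), price) should give essentially the same values.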

Now I would like to combine all 3 into one calculation, whereby having the age and hours I could determine the market_price.

So far I have combined the two by trying a range of ratios and keeping the one whose prediction errors have the smallest standard deviation.

std_divs = []
for ratio in ratios:  # candidate weights between 0 and 1
    price_difference_final = []
    for n in range(len(prices)):
        # blend the two single-variable fits; h and a hold hours and age
        predicted_price = ((log(h[n])*h1 + h2)*ratio
                           + (log(a[n])*a1 + a2)*(1 - ratio))
        price_difference_final.append(prices[n] - predicted_price)
    data = np.array(price_difference_final)
    std_divs.append(np.std(data))
std_div = min(std_divs)
optimum_ratio = ratios[std_divs.index(std_div)]

As you can see, I accomplish this by brute force which is not an elegant solution.

Furthermore, I now find that the relationship between the three cannot be expressed using a single ratio; the ratio needs to slide. As age increases, the hours/age ratio decreases, giving age an increasing weight in the market price.

Unfortunately, I haven't been able to implement this using SciPy's curve_fit, as it appears to accept only one pair of arrays.

Any thoughts on how this could best be achieved?

Solution

It is possible to create an array with more than one dimension, which lets you pass both your hours and age data into curve_fit. For example:

import numpy as np
from scipy.optimize import curve_fit

hours =  [1000,  10000,  11000,  11000,  15000,  18000,  37000,  24000,
          28000,  28000,  42000,  46000,  50000,  34000,  34000,  46000,
          50000,  56000,  64000,  64000,  65000,  80000,  81000,  81000,
          44000,  49000,  76000,  76000,  89000,  38000,  80000,  69000,
          46000,  47000,  57000,  72000,  77000,  68000]

market_Price =  [30945,  28974,  27989,  27989,  36008,  24780,  22980,
                 23997,  25957,  27847,  36000,  25588,  23980,  25990,  
                 25990,  28995,  26770,  26488,  24988,  24988,  17574,
                 12995,  19788,  20488,  19980,  24978,  16000,  16400,
                 18988,  19980,  18488,  16988,  15000,  15000,  16998,
                 17499,  15780,  8400]

age =  [2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  4,  4,  4,
        4,  4,  4,  4,  4,  4,  4,  4,  5,  5,  5,  5,  5,  6,  6,  7,  
        8,  8,  8,  8,  8,  13]

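combined = np.array([hours, age], dtype=float)

def f(combined, h1, a1, c):
    # combined[0] = hours and combined[1] = age.
    # Example model only (an assumption): the two log terms from the
    # question, added together with one shared intercept.
    return h1 * np.log(combined[0]) + a1 * np.log(combined[1]) + c

popt, pcov = curve_fit(f, combined, market_Price)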
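
The question also asks for a weighting that slides with age. One possible way to express that with curve_fit is to add an interaction term, so that the effect of hours depends on age. The sketch below assumes the model price = c + h1*log(hours) + a1*log(age) + k*log(hours)*log(age); that model form and the names price_model and k are assumptions for illustration, not something the question or this answer specifies.

```python
import numpy as np
from scipy.optimize import curve_fit

hours = [1000, 10000, 11000, 11000, 15000, 18000, 37000, 24000,
         28000, 28000, 42000, 46000, 50000, 34000, 34000, 46000,
         50000, 56000, 64000, 64000, 65000, 80000, 81000, 81000,
         44000, 49000, 76000, 76000, 89000, 38000, 80000, 69000,
         46000, 47000, 57000, 72000, 77000, 68000]
market_price = np.array([30945, 28974, 27989, 27989, 36008, 24780, 22980,
                         23997, 25957, 27847, 36000, 25588, 23980, 25990,
                         25990, 28995, 26770, 26488, 24988, 24988, 17574,
                         12995, 19788, 20488, 19980, 24978, 16000, 16400,
                         18988, 19980, 18488, 16988, 15000, 15000, 16998,
                         17499, 15780, 8400], dtype=float)
age = [2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4,
       4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7,
       8, 8, 8, 8, 8, 13]

# Stack both predictors into one (2, n) array so curve_fit sees a single xdata.
X = np.array([hours, age], dtype=float)

def price_model(X, c, h1, a1, k):
    # Hypothetical model: the k term lets the weight of hours change with age.
    log_h, log_a = np.log(X[0]), np.log(X[1])
    return c + h1 * log_h + a1 * log_a + k * log_h * log_a

popt, pcov = curve_fit(price_model, X, market_price)
residuals = market_price - price_model(X, *popt)

print("parameters (c, h1, a1, k):", popt)
print("residual std:", residuals.std())
```

Whether the extra term is worth keeping depends on the data; comparing the residual spread with and without k is a reasonable first check.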

OTHER TIPS

This is a multiple regression problem. You don't need to write your own code, as it already exists (the ols class used below comes from this page):

http://wiki.scipy.org/Cookbook/OLS

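Note: in the end you don't have five free parameters (h1, h2, a1, a2, ratio). Expanding (log(h)*h1 + h2)*ratio + (log(a)*a1 + a2)*(1-ratio) leaves only three: the intercept h2*ratio + a2*(1-ratio), the log(hours) coefficient h1*ratio, and the log(age) coefficient a1*(1-ratio).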
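
As a rough cross-check that does not depend on the Cookbook class, the same three-parameter model can be fitted with plain NumPy least squares. This is a minimal sketch; the variable names are illustrative, and the design matrix simply holds a constant column plus log(hours) and log(age).

```python
import numpy as np

hours = np.array([1000, 10000, 11000, 11000, 15000, 18000, 37000, 24000,
                  28000, 28000, 42000, 46000, 50000, 34000, 34000, 46000,
                  50000, 56000, 64000, 64000, 65000, 80000, 81000, 81000,
                  44000, 49000, 76000, 76000, 89000, 38000, 80000, 69000,
                  46000, 47000, 57000, 72000, 77000, 68000], dtype=float)
market_price = np.array([30945, 28974, 27989, 27989, 36008, 24780, 22980,
                         23997, 25957, 27847, 36000, 25588, 23980, 25990,
                         25990, 28995, 26770, 26488, 24988, 24988, 17574,
                         12995, 19788, 20488, 19980, 24978, 16000, 16400,
                         18988, 19980, 18488, 16988, 15000, 15000, 16998,
                         17499, 15780, 8400], dtype=float)
age = np.array([2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4,
                4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7,
                8, 8, 8, 8, 8, 13], dtype=float)

# Design matrix: intercept, log(hours), log(age).
A = np.column_stack([np.ones_like(hours), np.log(hours), np.log(age)])

# Solve the linear least-squares problem A @ coef ~= market_price.
coef, *_ = np.linalg.lstsq(A, market_price, rcond=None)
const, b_hours, b_age = coef

# Coefficient of determination for the fitted model.
pred = A @ coef
ss_res = np.sum((market_price - pred) ** 2)
ss_tot = np.sum((market_price - market_price.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("const, log(hours), log(age):", const, b_hours, b_age)
print("R-squared:", r2)
```

If the arrays match the ones used for the summary output in this answer, the printed coefficients should be close to the const, Hours and Age rows of that table.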

In [26]:

y=np.array(market_Price)
x=np.log(np.array([hours, age])).T
In [27]:

mymodel=ols(y, x, 'Market_Price', ['Hours', 'Age'])
In [28]:

mymodel.p # return coefficient p-values
Out[28]:
array([  1.32065700e-05,   3.06318351e-01,   1.34081122e-05])
In [29]:

mymodel.summary()

==============================================================================
Dependent Variable: Market_Price
Method: Least Squares
Date:  Mon, 24 Mar 2014
Time:  15:40:00
# obs:                  38
# variables:         3
==============================================================================
variable     coefficient     std. Error      t-statistic     prob.
==============================================================================
const           45838.261850      9051.125823      5.064371      0.000013
Hours          -1023.097422      985.498239     -1.038152      0.306318
Age            -8862.186475      1751.640834     -5.059363      0.000013
==============================================================================
Models stats                         Residual stats
==============================================================================
R-squared             0.624227         Durbin-Watson stat   1.301026
Adjusted R-squared    0.602754         Omnibus stat         2.999547
F-statistic           29.070664         Prob(Omnibus stat)   0.223181
Prob (F-statistic)    0.000000          JB stat              1.807013
Log likelihood       -366.421766        Prob(JB)             0.405146
AIC criterion         19.443251         Skew                 0.376021
BIC criterion         19.572534         Kurtosis             3.758751
==============================================================================
Licensed under: CC-BY-SA with attribution