I'm currently iterating through a very large data set, ~85 GB (~600M lines), and using Newton-Raphson to compute a new parameter for each line. Right now my code is extremely slow. Any tips on how to speed it up? The methods from BSCallClass and BSPutClass are closed-form, so there's nothing really to speed up there. Thanks.

class NewtonRaphson:

    def __init__(self, theObject):
        self.theObject = theObject

    def solve(self, Target, Start, Tolerance, maxiter=500):
        y = self.theObject.Price(Start)
        x = Start
        i = 0
        while (abs(y - Target) > Tolerance):
            i += 1
            d = self.theObject.Vega(x)
            x += (Target - y) / d
            y = self.theObject.Price(x)
            if i > maxiter:
                x = float('nan')  # bare 'nan' is never imported in this snippet
                break
        return x

def main():
    # 'a' is the pandas DataFrame holding the option data (loaded elsewhere).
    for idx, row in a.iterrows():
        print(row["X.1"])
        T = (row["X.7"] - row["X.8"]).days
        Spot = row["X.2"]
        Strike = row["X.9"]
        MktPrice = abs(row["X.10"] - row["X.11"]) / 2
        CPflag = row["X.6"]

        if CPflag == 'call':
            option = BSCallClass(0, 0, T, Spot, Strike)
        elif CPflag == 'put':
            option = BSPutClass(0, 0, T, Spot, Strike)
        else:
            continue  # skip rows with an unrecognized flag

        # .loc avoids pandas chained assignment, which may write to a copy
        a.loc[idx, "X.15"] = NewtonRaphson(option).solve(MktPrice, .05, .0001)

EDIT:

For those curious, I ended up speeding up this entire process significantly by following the scipy suggestion and by using the multiprocessing module.
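
The final code was not posted, so the following is only a rough sketch of how a scipy root finder and multiprocessing might be combined. The names solve_one and solve_all are hypothetical, and the per-row problem is a stand-in (solving x**2 == target) rather than the real option-pricing objective.

```python
import math
from multiprocessing import Pool

from scipy.optimize import brentq


def solve_one(target):
    """Stand-in per-row solver: find x in [0, 1e6] with x**2 == target."""
    try:
        return brentq(lambda x: x * x - target, 0.0, 1e6)
    except ValueError:
        # brentq raises ValueError when the bracket has no sign change.
        return float("nan")


def solve_all(targets, processes=4, chunksize=10000):
    """Spread the independent per-row solves over worker processes.

    A large chunksize keeps inter-process overhead small relative to
    the microsecond-scale cost of each individual solve.
    """
    with Pool(processes=processes) as pool:
        return pool.map(solve_one, targets, chunksize=chunksize)
```

When running this as a script, call solve_all from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For a file of this size the targets would presumably be read and processed in batches rather than held in memory all at once.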

Solution

Don't code your own Newton-Raphson method in Python. You'll get better performance using one of the root finders in scipy.optimize, such as brentq or newton. (Presumably, if you have pandas installed, you either have scipy already or can install it just as easily.)
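
As a rough illustration of what that could look like for the problem in the question, here is a minimal sketch that backs out an implied volatility with brentq. BSCallClass is not shown in the question, so bs_call_price below is a hypothetical stand-in using the textbook Black-Scholes call formula, with time measured in years and a zero rate by default. The bracket [1e-6, 5.0] is also an assumption.

```python
import math

from scipy.optimize import brentq


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot, strike, t_years, rate, sigma):
    """Textbook Black-Scholes price of a European call (no dividends)."""
    vol_sqrt_t = sigma * math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * t_years) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t_years) * norm_cdf(d2)


def implied_vol_call(market_price, spot, strike, t_years, rate=0.0, lo=1e-6, hi=5.0):
    """Find sigma in [lo, hi] where the model price matches the market price."""
    def objective(sigma):
        return bs_call_price(spot, strike, t_years, rate, sigma) - market_price

    try:
        return brentq(objective, lo, hi, xtol=1e-8)
    except ValueError:
        # No sign change on [lo, hi]: the price is not attainable in this bracket.
        return float("nan")


# Round trip: price an option at sigma = 0.25, then recover sigma from the price.
price = bs_call_price(100.0, 105.0, 0.5, 0.0, 0.25)
print(implied_vol_call(price, 100.0, 105.0, 0.5))
```

Unlike Newton-Raphson, brentq needs a bracketing interval instead of a starting guess and a derivative, so it cannot diverge when vega is close to zero.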


Back of the envelope calculation:

Making 600M calls to brentq should be manageable on standard hardware:

import scipy.optimize as optimize
def f(x):
    return x**2 - 2

In [28]: %timeit optimize.brentq(f, 0, 10)
100000 loops, best of 3: 4.86 us per loop

So if each call to optimize.brentq takes 4.86 microseconds, 600M calls will take about 4.86e-6 * 600e6 ~ 2900 seconds, or a little under 1 hour.
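
The same estimate spelled out in code, in case you want to plug in your own timings. The helper name estimated_runtime_seconds is made up for this sketch, and it assumes every call costs the same and that the calls run one after another. A real pricing function is likely more expensive than x**2 - 2, so treat the result as a lower bound.

```python
def estimated_runtime_seconds(n_calls, seconds_per_call):
    """Total wall time if every call costs the same and runs serially."""
    return n_calls * seconds_per_call


n_rows = 600e6          # ~600M lines, from the question
brentq_cost = 4.86e-6   # seconds per call, from the %timeit run above

total = estimated_runtime_seconds(n_rows, brentq_cost)
print("%.0f s, about %.0f min" % (total, total / 60))  # 2916 s, about 49 min
```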


newton may be slower, but still manageable:

def f(x):
    return x**2 - 2
def fprime(x):
    return 2*x

In [40]: %timeit optimize.newton(f, 10, fprime)
100000 loops, best of 3: 8.22 us per loop
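
The solver in the question returns nan when it runs out of iterations. With default settings, optimize.newton instead raises RuntimeError when it fails to converge, so a thin wrapper can reproduce the nan behaviour. This is a minimal sketch: safe_newton is a hypothetical name, and no_root is a made-up function used only to trigger the failure path.

```python
import math

from scipy.optimize import newton


def safe_newton(f, x0, fprime, maxiter=50):
    """Run scipy's Newton-Raphson; return nan instead of raising on failure."""
    try:
        return newton(f, x0, fprime=fprime, maxiter=maxiter)
    except RuntimeError:
        # Raised when newton does not converge within maxiter iterations.
        return float("nan")


def f(x):
    return x ** 2 - 2


def fprime(x):
    return 2 * x


def no_root(x):
    # x**2 + 1 has no real root, so Newton-Raphson can never converge.
    return x ** 2 + 1


print(safe_newton(f, 10, fprime))        # close to sqrt(2)
print(safe_newton(no_root, 10, fprime))  # nan
```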
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow