Question

I am using scipy's curve_fit function to fit a function and wanted to know if there is a way to tell it that the only possible values for the parameters are integers, not real numbers. Any ideas for another way of doing this?

Solution

In its general form, integer programming is NP-hard. There are efficient heuristic and approximate algorithms for it, but none of them guarantees an exact optimal solution.

In scipy you may implement a grid search over the integer coefficients and use, say, curve_fit over the real parameters for each choice of the integer coefficients. For the grid search itself, scipy provides the scipy.optimize.brute function.
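As a minimal sketch of how brute behaves on an integer grid (the toy objective and the target value 4.3 are made up for illustration), the call below evaluates the objective only at 1, 2, ..., 9 and, because finish=None, returns the best grid point without any real-valued polishing:

```python
from scipy.optimize import brute

def objective(params):
    # brute passes the candidate point as a 1-D array, even for one variable
    k = params[0]
    return (k - 4.3) ** 2

# slice(1, 10, 1) enumerates the integers 1, 2, ..., 9
best = brute(objective, [slice(1, 10, 1)], finish=None)
print(best)
```

Here the nearest integer to 4.3 on the grid is 4, so that is the point returned.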

For example, if y = a * x + b * x^2 + noise, where a has to be an integer, this may work:

  1. Generate some test data with a = 5 and b = -1.5:

    import numpy as np

    coef, n = [5, -1.5], 50
    xs = np.linspace(0, 10, n)[:, np.newaxis]
    xs = np.hstack([xs, xs**2])  # columns: x and x^2
    noise = 2 * np.random.randn(n)
    ys = np.dot(xs, coef) + noise
    
  2. A function which given the integer coefficients fits the real coefficient using curve_fit method:

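    def optfloat(intcoef, xs, ys):
        from scipy.optimize import curve_fit
        # brute passes a 1-element array; unwrap it to a plain scalar
        a = float(np.ravel(intcoef)[0])
        def poly(xs, floatcoef):
            return np.dot(xs, [a, floatcoef])
        popt, pcov = curve_fit(poly, xs, ys)
        # residual 2-norm (not squared, despite the key name)
        errsqr = np.linalg.norm(poly(xs, *popt) - ys)
        return dict(errsqr=errsqr, floatcoef=popt)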
    
  3. A function which given the integer coefficients, uses the above function to optimize the float coefficient and returns the error:

    def errfun(intcoef, *args):
        xs, ys = args
        return optfloat(intcoef, xs, ys)['errsqr']
    
  4. Minimize errfun using scipy.optimize.brute to find the optimal integer coefficient, then call optfloat with that integer coefficient to find the optimal real coefficient:

    from scipy.optimize import brute
    grid = [slice(1, 10, 1)]  # grid search over 1, 2, ..., 9
    # finish=None is essential: the default finish (fmin) would polish
    # the grid result with a real-valued optimizer and return a non-integer
    intcoef = brute(errfun, grid, args=(xs, ys,), finish=None)
    floatcoef = optfloat(intcoef, xs, ys)['floatcoef'][0]
    

Using this method I obtain [5.0, -1.50577] for the optimal coefficients, which is exact for the integer coefficient, and close enough for the real coefficient.
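To see how clearly the error singles out one integer, brute can also return the objective value at every grid point via full_output=True. The following is a self-contained sketch of the same idea, not the exact code above: the fixed seed, the 1-D x array, and the names err, a_opt and errs are assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import brute, curve_fit

rng = np.random.default_rng(0)  # fixed seed (assumption) for repeatability
x = np.linspace(0, 10, 50)
y = 5 * x - 1.5 * x**2 + 2 * rng.standard_normal(x.size)

def err(params):
    # brute hands over a 1-element array holding the candidate integer a
    a = float(np.ravel(params)[0])
    # fit only the real coefficient b, with a held fixed
    (b,), _ = curve_fit(lambda x_, b_: a * x_ + b_ * x_**2, x, y)
    return float(np.linalg.norm(a * x + b * x**2 - y))

# full_output=True also returns the grid and the error at each grid point
a_opt, err_min, grid, errs = brute(
    err, [slice(1, 10, 1)], full_output=True, finish=None
)

for a, e in zip(grid, errs):
    print(int(a), round(float(e), 3))
print("best integer a:", a_opt)
```

The printed table makes it easy to check whether the minimum is well separated from its neighbours or whether two adjacent integers fit almost equally well.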

OTHER TIPS

In general, the answer is No: scipy.optimize.curve_fit() and leastsq() that it is based on, and (AFAIK) all the other solvers in scipy.optimize work strictly on floating point numbers.

You could try increasing the value of epsfcn (which defaults to machine precision, numpy.finfo('double').eps ~ 2.e-16), which leastsq uses to choose the step length for its finite-difference estimate of the Jacobian. The basic issue is that the fitting algorithm will adjust a floating point number, and if you do

    int_var = int(float_var)

and the algorithm changes float_var from 1.0 to 1.00000001, it will see no difference in the result and decide that that value does not actually alter the fit metric.
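A small sketch of that flat spot, assuming a made-up one-parameter model y = int_var * x and noiseless data (the function name sum_sq_residual is hypothetical): a tiny perturbation of the float leaves the truncated integer, and therefore the residual, unchanged, so a finite-difference gradient comes out as exactly zero.

```python
import numpy as np

def sum_sq_residual(float_var, xs, ys):
    int_var = int(float_var)  # truncates toward zero
    return float(np.sum((int_var * xs - ys) ** 2))

xs = np.linspace(0.0, 10.0, 50)
ys = 5 * xs  # noiseless data with true integer slope 5

step = 1e-8
r0 = sum_sq_residual(1.0, xs, ys)
r1 = sum_sq_residual(1.0 + step, xs, ys)

# forward-difference estimate of the gradient with respect to float_var
grad = (r1 - r0) / step
print(r0 == r1, grad)
```

Although the fit at slope 1 is far from the data, the solver sees a zero gradient and has no information about which direction to move.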

Another approach would be to have a floating point parameter 'tmp_float_var' that is freely adjusted by the fitting algorithm but then in your objective function use

    int_var = int(tmp_float_var / numpy.finfo('double').eps)

as the value for your integer variable. That might need a little tweaking, and might be a little unstable, but ought to work.

Licensed under: CC-BY-SA with attribution