Question

The problem is as simple as the title.

I want to train a model based on my own score function, rather than on the least-squares distances that LinearRegression uses.

The only place I have found where I can declare my own score function is in the model evaluation routines.

The score function I want to use returns a score based on how closely the monotonicity of two datasets agrees (one of true values, one of predictions):

import numpy as np

def monotony_score_signed(y_true, y_pred):
    # Fraction of adjacent pairs (taken in the order of the true values)
    # whose direction of change agrees between truth and prediction.
    assert y_true.ndim == 1
    assert y_true.shape == y_pred.shape
    true_order = y_true.argsort()

    pred_sign = np.sign(np.diff(y_pred[true_order]))
    true_sign = np.sign(np.diff(y_true[true_order]))
    # np.diff already drops one element, so the number of adjacent pairs
    # is pred_sign.shape[0], not pred_sign.shape[0] - 1.
    accuracy = float(np.count_nonzero(pred_sign == true_sign)) / pred_sign.shape[0]
    return accuracy
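
For example, a quick sanity check with made-up numbers:

y_true = np.array([1.0, 3.0, 2.0, 5.0])
y_pred = np.array([1.1, 2.9, 2.2, 4.8])       # same rank order as y_true
print(monotony_score_signed(y_true, y_pred))  # 1.0: all 3 adjacent pairs agree

y_bad = np.array([1.1, 2.9, 3.0, 4.8])        # middle two ranks swapped
print(monotony_score_signed(y_true, y_bad))   # 0.667: 2 of 3 pairs agree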

Solution

What is your loss/score function? Are you looking for some kind of automatic differentiation routine that provides the gradient (for batch or stochastic gradient ascent) and possibly also the Hessian (for quasi-Newton methods)?

If your score function is not a traditional one, it's unlikely that the library will accept it, since scikits.learn is not an optimization library but rather a wrapper for common inference and machine learning algorithms.

If your cost function has well-known functional forms for the gradient and the Hessian, then there's probably already a built-in routine for fitting a model of that format. But we'll need to know the cost function before we can determine that.

Added after Update

The problem you describe is not appropriate for what you are calling "Linear Regression" (which is a way to learn weights applied to the feature inputs in order to predict the associated outputs). The goal implicit in regression is to learn a functional form, and in the presence of noise it would take extremely severe and unnatural assumptions to simultaneously learn a functional form and ensure that it preserves the rank ordering of a data set.

Based on the cost function you describe, you seem to be interested in just the problem of preserving the rank ordering of the data.

This is known as the "Learning to Rank" problem. If you Google for "scikits.learn ranking model" you'll find some results, including a worked example, that may be helpful for you.
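
For reference, the usual trick in those results is a pairwise transform: reduce ranking to binary classification on pairs of samples, then fit a linear classifier. A rough sketch of that idea (the synthetic data and the pairwise_transform helper below are illustrative, not taken from any linked material):

import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    # One training example per ordered pair (i, j) with y[i] != y[j]:
    # the feature is X[i] - X[j], the label is the sign of y[i] - y[j].
    Xp, yp = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] != y[j]:
                Xp.append(X[i] - X[j])
                yp.append(np.sign(y[i] - y[j]))
    return np.asarray(Xp), np.asarray(yp)

X = np.random.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5])   # synthetic relevance scores
Xp, yp = pairwise_transform(X, y)
clf = LinearSVC().fit(Xp, yp)
# clf.coef_ now points in a direction that tends to preserve the rank order of y.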

I don't think there will be an easy way to just plug your cost function into an existing method, though. I am also unsure whether your cost function would really be the best way to model noisy rank-order preservation.

OTHER TIPS

If you want to build regression models under the constraint that the output is monotonic in a 1-D input, you can use the IsotonicRegression class.
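
A minimal sketch of that (synthetic data; in current scikit-learn the class lives in sklearn.isotonic):

import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.arange(20, dtype=float)
y = x + 5.0 * np.random.randn(20)    # noisy, but increasing on average

ir = IsotonicRegression()            # the fit is constrained to be non-decreasing in x
y_monotone = ir.fit_transform(x, y)  # piecewise-constant monotonic predictions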

If you really want to optimize a linear (or nonlinear) model for a custom objective function, you can use a generic optimizer such as those available in scipy.optimize, climin, or cvxopt, for instance.
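
For instance, with the monotony_score_signed function from the question in scope, you could fit the weights of a linear model with a derivative-free scipy.optimize routine. Note that the score is piecewise constant in the weights, so gradient-based methods are of no use, and even Nelder-Mead can stall on the flat regions; treat this only as a starting point:

import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(100)

def objective(w):
    # minimize() minimizes, so negate the score we want to maximize
    return -monotony_score_signed(y, X @ w)

result = minimize(objective, x0=np.ones(3), method='Nelder-Mead')
w_best = result.x   # weights that locally maximize the monotonicity score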

Licensed under: CC-BY-SA with attribution