Question

I'm currently using sklearn's Ridge classifier, and am looking to ensemble this classifier with classifiers from sklearn and other libraries. In order to do this, it would be ideal to extract the probability that a given input belongs to each class in a list of classes. Currently, I'm zipping the classes with the output of model.decision_function(x), but this returns the distance from the hyperplane as opposed to a straightforward probability. These distance values vary from around -1 to around 1.

distances = dict(zip(clf.classes_, clf.decision_function(x)[0]))  

How can I convert these distances to a more concrete set of probabilities (a series of positive values that sum to 1)? I'm looking for something like clf.predict_proba() that is implemented for the SVC in sklearn.

Solution

Further exploration led to using the softmax function.

import numpy as np

d = clf.decision_function(x)[0]        # per-class scores for one sample
probs = np.exp(d) / np.sum(np.exp(d))  # softmax over the class scores

This guarantees a 0-1 bounded distribution that sums to 1.
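For a quick end-to-end check, here is a minimal sketch; the iris data and the clf and x names are illustrative assumptions, not part of the original answer:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import RidgeClassifier

X, y = load_iris(return_X_y=True)
clf = RidgeClassifier().fit(X, y)

d = clf.decision_function(X[:1])[0]    # per-class scores for one sample
probs = np.exp(d) / np.sum(np.exp(d))  # softmax over the class scores
print(dict(zip(clf.classes_, probs)))  # positive values that sum to 1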

OTHER TIPS

A little look at the source code of predict shows that decision_function is in fact the logit transform of the actual class probabilities: if the decision function is f, then the class probability of class 1 is exp(f) / (1 + exp(f)). This translates to the following check in the sklearn source:

    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]

This check tells you that if the decision function is greater than zero, class 1 is predicted, otherwise class 0 - a classical logit approach.

So, you will have to turn the decision function into something like:

import numpy as np

d = clf.decision_function(x)[0]      # scalar margin for one sample
probs = np.exp(d) / (1 + np.exp(d))  # sigmoid of the margin

And then zip the result with clf.classes_ as before, as in the sketch below.
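For a binary problem this might look like the following sketch (the clf and x names are illustrative; in the binary case decision_function returns a single score per sample, and classes_[1] is the positive class):

import numpy as np

d = clf.decision_function(x)[0]             # scalar margin for one sample
p = np.exp(d) / (1 + np.exp(d))             # sigmoid: probability of classes_[1]
probs = dict(zip(clf.classes_, [1 - p, p])) # map both classes to probabilities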

The solutions provided here didn't work for me. I think the softmax function is the correct solution, so I extended the RidgeClassifierCV class with a predict_proba method, similar to LogisticRegressionCV:

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.utils.extmath import softmax

class RidgeClassifierCVwithProba(RidgeClassifierCV):
    def predict_proba(self, X):
        # For binary problems decision_function returns one score per sample;
        # stacking [-d, d] lets softmax yield [P(class 0), P(class 1)]
        d = self.decision_function(X)
        d_2d = np.c_[-d, d]
        return softmax(d_2d)
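
A quick illustrative usage, assuming a binary problem (the breast cancer dataset here is just an example, not from the original answer):

from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifierCVwithProba().fit(X, y)
print(clf.predict_proba(X[:3]))  # rows are [P(class 0), P(class 1)] and sum to 1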
Licensed under: CC-BY-SA with attribution