sklearn (scikit-learn) logistic regression package — set trained coefficients for classification.
18-03-2021
Question
So I read the scikit-learn package webpage.
I can use logistic regression to fit the data, and after I obtain an instance of LogisticRegression, I can use it to classify new data points. So far so good.
Is there a way to set the coefficients of a LogisticRegression() instance, though? Because after I obtain the trained coefficients, I want to use the same API to classify new data points.
Or perhaps someone can recommend another Python machine learning package that has better APIs?
Thanks
Solution
Indeed, the estimator.coef_ and estimator.intercept_ attributes are read-only Python properties rather than ordinary Python attributes. Their values come from the estimator.raw_coef_ array, whose memory layout directly maps to the memory layout expected by the underlying liblinear C++ implementation of logistic regression, so as to avoid any memory copy of the parameters when calling estimator.predict or estimator.predict_proba.
I agree that having read-only properties is a limitation, and we should find a way to get rid of them. However, if we refactor this implementation we should also take care not to introduce any unnecessary memory copies, which is not trivial to do after a quick look at the source code.
I have opened an issue on the tracker not to forget about this limitation.
In the meantime, you can read the @property-annotated estimator.coef_ method to understand how estimator.coef_ and estimator.raw_coef_ are related, and change the values in estimator.raw_coef_ directly.
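For what it's worth, on more recent scikit-learn releases this limitation no longer applies: coef_, intercept_ and classes_ are plain NumPy array attributes on a fitted estimator, so trained coefficients can be transplanted into a fresh instance. A minimal sketch, assuming a recent scikit-learn where setting these trailing-underscore attributes is enough for predict to work:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
trained = LogisticRegression(max_iter=1000).fit(X, y)

# Build a fresh, unfitted estimator and transplant the learned parameters.
clf = LogisticRegression()
clf.coef_ = trained.coef_.copy()
clf.intercept_ = trained.intercept_.copy()
clf.classes_ = trained.classes_.copy()  # predict() also needs the class labels

# The fresh instance now classifies exactly like the trained one.
print(np.array_equal(clf.predict(X), trained.predict(X)))
```

The copies are defensive so that later modifications to one estimator do not silently affect the other.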
OTHER TIPS
The coefficients are attributes of the estimator object--the one you created when you instantiated the LogisticRegression class--so you can access them in the normal Python way:
>>> import numpy as NP
>>> from sklearn import datasets as DS
>>> from sklearn.linear_model import LogisticRegression as LR
>>> digits = DS.load_digits()
>>> D = digits.data
>>> T = digits.target
>>> # instantiate an estimator instance (classifier) of the LogisticRegression class
>>> clf = LR()
>>> # train the classifier
>>> clf.fit( D[:-1], T[:-1] )
LogisticRegression(C=1.0, dual=False, fit_intercept=True,
intercept_scaling=1, penalty='l2', tol=0.0001)
>>> # attributes are accessed in the normal python way
>>> dx = clf.__dict__
>>> dx.keys()
['loss', 'C', 'dual', 'fit_intercept', 'class_weight_label', 'label_',
'penalty', 'multi_class', 'raw_coef_', 'tol', 'class_weight',
'intercept_scaling']
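Rather than going through __dict__, current scikit-learn versions expose the learned parameters through the public coef_ and intercept_ attributes of a fitted estimator. A short sketch (max_iter is raised here only to help the solver converge on the unscaled digits data):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
clf = LogisticRegression(max_iter=5000).fit(digits.data, digits.target)

# coef_ holds one row of weights per class; intercept_ one bias per class.
print(clf.coef_.shape)       # (10, 64): 10 digit classes x 64 pixel features
print(clf.intercept_.shape)  # (10,)
```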
So that's how to get the coefficients, but if you are going to just use those for prediction, a more direct way is to use the estimator's predict method:
>>> # instantiate the L/R classifier, passing in the norm used for the penalty term
>>> # and the regularization strength, then train it
>>> clf = LR(C=.2, penalty='l1')
>>> clf.fit( D[:-1], T[:-1] )
LogisticRegression(C=0.2, dual=False, fit_intercept=True,
intercept_scaling=1, penalty='l1', tol=0.0001)
>>> # select some test instances from the original data
>>> # [of course the model should not have been trained on these instances]
>>> test = NP.random.randint(0, 151, 5)
>>> d = D[test,:]  # randomly selected data points w/o class labels
>>> t = T[test]    # the class labels that correspond to the points in d
>>> # generate model predictions for these 5 data points
>>> v = clf.predict(d)
>>> v
array([0, 0, 2, 0, 2], dtype=int32)
>>> # how well did the model do?
>>> percent_correct = 100*NP.sum(t==v)/t.shape[0]
>>> percent_correct
100
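The manual accuracy calculation above can also be done with the estimator's score method, which bundles predict and mean-accuracy scoring in one call. A sketch, assuming a current scikit-learn where penalty='l1' requires solver='liblinear' (or 'saga'); the last-50-samples hold-out split is just an illustrative choice:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
D, T = digits.data, digits.target

# train on all but the last 50 samples, keep those 50 held out for evaluation
clf = LogisticRegression(C=0.2, penalty='l1', solver='liblinear')
clf.fit(D[:-50], T[:-50])

# score() runs predict() on the held-out points and returns the mean accuracy
acc = clf.score(D[-50:], T[-50:])
print(round(100 * acc, 1))  # accuracy as a percentage
```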