sklearn (scikit-learn) logistic regression package — set trained coefficients for classification.
18-03-2021
Question
So I read the scikit-learn package webpage.
I can use logistic regression to fit the data, and after I obtain an instance of LogisticRegression, I can use it to classify new data points. So far so good.
Is there a way to set the coefficients of a LogisticRegression() instance, though? Because after I obtain the trained coefficients, I want to use the same API to classify new data points.
Or perhaps someone can recommend another Python machine learning package that has better APIs?
Thanks
Solution
Indeed, the estimator.coef_ and estimator.intercept_ attributes are read-only Python properties rather than ordinary Python attributes. Their values come from the estimator.raw_coef_ array, whose memory layout directly maps to the memory layout expected by the underlying liblinear C++ implementation of logistic regression, so as to avoid any memory copy of the parameters when calling estimator.predict or estimator.predict_proba.
I agree that having read-only properties is a limitation, and we should find a way to get rid of them. However, if we refactor this implementation we should also take care not to introduce any unnecessary memory copies, which is not trivial to do after a quick look at the source code.
I have opened an issue on the tracker not to forget about this limitation.
In the meantime, you can read the @property-annotated estimator.coef_ method to understand how estimator.coef_ and estimator.raw_coef_ are related, and change the values in estimator.raw_coef_ directly.
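For what it's worth, on more recent scikit-learn releases this limitation no longer applies: coef_, intercept_ and classes_ are plain NumPy array attributes on a fitted estimator, so trained coefficients can be transplanted into a fresh instance. A minimal sketch, assuming a recent scikit-learn where setting these trailing-underscore attributes is enough for predict to work:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
trained = LogisticRegression(max_iter=1000).fit(X, y)

# Build a fresh, unfitted estimator and transplant the learned parameters.
clf = LogisticRegression()
clf.coef_ = trained.coef_.copy()
clf.intercept_ = trained.intercept_.copy()
clf.classes_ = trained.classes_.copy()  # predict() also needs the class labels

# The fresh instance now classifies exactly like the trained one.
print(np.array_equal(clf.predict(X), trained.predict(X)))
```

The copies are defensive so that later modifications to one estimator do not silently affect the other.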
OTHER TIPS
The coefficients are attributes of the estimator object--the one you created when you instantiated the LogisticRegression class--so you can access them in the normal Python way:
>>> import numpy as NP
>>> from sklearn import datasets as DS
>>> from sklearn.linear_model import LogisticRegression as LR
>>> digits = DS.load_digits()
>>> D = digits.data
>>> T = digits.target
>>> # instantiate an estimator instance (classifier) of the LogisticRegression class
>>> clf = LR()
>>> # train the classifier
>>> clf.fit( D[:-1], T[:-1] )
LogisticRegression(C=1.0, dual=False, fit_intercept=True,
intercept_scaling=1, penalty='l2', tol=0.0001)
>>> # attributes are accessed in the normal python way
>>> dx = clf.__dict__
>>> dx.keys()
['loss', 'C', 'dual', 'fit_intercept', 'class_weight_label', 'label_',
'penalty', 'multi_class', 'raw_coef_', 'tol', 'class_weight',
'intercept_scaling']
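Rather than going through __dict__, current scikit-learn versions expose the learned parameters through the public coef_ and intercept_ attributes of a fitted estimator. A short sketch (max_iter is raised here only to help the solver converge on the unscaled digits data):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
clf = LogisticRegression(max_iter=5000).fit(digits.data, digits.target)

# coef_ holds one row of weights per class; intercept_ one bias per class.
print(clf.coef_.shape)       # (10, 64): 10 digit classes x 64 pixel features
print(clf.intercept_.shape)  # (10,)
```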
So that's how to get the coefficients, but if you are going to just use those for prediction, a more direct way is to use the estimator's predict method:
>>> # instantiate the L/R classifier, passing in the norm used for the penalty term
>>> # and the regularization strength, then train it
>>> clf = LR(C=.2, penalty='l1')
>>> clf.fit( D[:-1], T[:-1] )
LogisticRegression(C=0.2, dual=False, fit_intercept=True,
intercept_scaling=1, penalty='l1', tol=0.0001)
>>> # select some test instances from the original data
>>> # [of course the model should not have been trained on these instances]
>>> test = NP.random.randint(0, 151, 5)
>>> d = D[test,:]  # randomly selected data points w/o class labels
>>> t = T[test]    # the class labels that correspond to the points in d
>>> # generate model predictions for these 5 data points
>>> v = clf.predict(d)
>>> v
array([0, 0, 2, 0, 2], dtype=int32)
>>> # how well did the model do?
>>> percent_correct = 100*NP.sum(t==v)/t.shape[0]
>>> percent_correct
100
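The manual accuracy calculation above can also be done with the estimator's score method, which bundles predict and mean-accuracy scoring in one call. A sketch, assuming a current scikit-learn where penalty='l1' requires solver='liblinear' (or 'saga'); the last-50-samples hold-out split is just an illustrative choice:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
D, T = digits.data, digits.target

# train on all but the last 50 samples, keep those 50 held out for evaluation
clf = LogisticRegression(C=0.2, penalty='l1', solver='liblinear')
clf.fit(D[:-50], T[:-50])

# score() runs predict() on the held-out points and returns the mean accuracy
acc = clf.score(D[-50:], T[-50:])
print(round(100 * acc, 1))  # accuracy as a percentage
```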