Multiclassification Error: NotFittedError: This MultiLabelBinarizer instance is not fitted yet
Question
After pickling the model, when I try to use it, I get this error:
"NotFittedError: This MultiLabelBinarizer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."
X = <training_data>
y = <training_labels>
# Binarize the class labels for multi-label classification.
mlb = MultiLabelBinarizer()
multilabel_y = mlb.fit_transform(y)
p = Pipeline([
('vect', CountVectorizer(min_df=min_df, ngram_range=ngram_range)),
('tfidf', TfidfTransformer()),
('clf', OneVsRestClassifier(clf))
])
# Use multilabel classes to fit the pipeline.
p.fit(X, multilabel_y)
Solution
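The code below works. Fit the pipeline directly on the string labels and let OneVsRestClassifier wrapping sklearn.linear_model.LogisticRegression handle the multiclass problem for you. With one label per sample, no separate MultiLabelBinarizer has to be fitted or saved.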
```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

X = ["How to join amazon company ", "How to join google ", "Stay home"]
y = ["Career Advice", "Fresher", "Other"]

# Base classifier; OneVsRestClassifier fits one copy of it per class.
clf = LogisticRegression()
p = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(clf))
])

# Fit the pipeline directly on the string labels, then predict.
p.fit(X, y)
p.predict(X)
```
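If each sample really does need several labels, the MultiLabelBinarizer from the question is still required. One likely cause of the error, assuming a new MultiLabelBinarizer is created after the pipeline is loaded, is that only the pipeline was pickled and the fitted binarizer was not. The sketch below pickles both together; the toy multi-label data and the dictionary keys `"pipeline"` and `"mlb"` are made up for illustration.

```python
import pickle

from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each sample may carry more than one label.
X = ["How to join amazon company", "How to join google", "Stay home"]
y = [["Career Advice", "Fresher"], ["Fresher"], ["Other"]]

# Fit the binarizer and keep the fitted instance around.
mlb = MultiLabelBinarizer()
multilabel_y = mlb.fit_transform(y)

p = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
p.fit(X, multilabel_y)

# Pickle the fitted binarizer together with the fitted pipeline.
blob = pickle.dumps({"pipeline": p, "mlb": mlb})

# Later (for example, in another process): load both and decode predictions.
loaded = pickle.loads(blob)
pred = loaded["pipeline"].predict(X)
labels = loaded["mlb"].inverse_transform(pred)

# A fresh, unfitted binarizer raises the error from the question.
try:
    MultiLabelBinarizer().inverse_transform(pred)
    raised = False
except NotFittedError:
    raised = True
```

Here `labels` is a list with one tuple of label names per input sample, and `raised` ends up `True` because the second binarizer was never fitted. Writing the same dictionary to a file with `pickle.dump` works the same way.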
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange