Question

After pickling the model, I get the following error when I try to use it:

"NotFittedError: This MultiLabelBinarizer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."

```python
X = <training_data>
y = <training_labels>

# Perform multi-label classification on class labels.
mlb = MultiLabelBinarizer()
multilabel_y = mlb.fit_transform(y)

p = Pipeline([
    ('vect', CountVectorizer(min_df=min_df, ngram_range=ngram_range)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(clf))
])

# Use multilabel classes to fit the pipeline.
p.fit(X, multilabel_y)
```

Solution

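The code below works. When each sample has exactly one label, you can skip MultiLabelBinarizer and fit on the raw string labels: OneVsRestClassifier wrapping sklearn.linear_model.LogisticRegression handles the multiclass classification for you.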

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

X = ["How to join amazon company ", "How to join google ", "Stay home"]
y = ["Career Advice", "Fresher", "Other"]

# Base estimator; OneVsRestClassifier fits one binary classifier per class.
clf = LogisticRegression()

p = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(clf))
])

# Fit on the raw string labels; no MultiLabelBinarizer is needed here.
p.fit(X, y)
p.predict(X)
```
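If you do need true multi-label output (several labels per sample), one likely cause of the NotFittedError is that only the pipeline was saved, so a fresh, unfitted MultiLabelBinarizer is created at prediction time. A minimal sketch of one way around this, assuming the model is persisted with pickle: store the fitted binarizer together with the pipeline. The toy labels and the dictionary keys `pipeline` and `mlb` are made up for illustration.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each sample may carry more than one label.
X = ["How to join amazon company", "How to join google", "Stay home"]
y = [["Career Advice", "Fresher"], ["Fresher"], ["Other"]]

# Fit the binarizer once, at training time.
mlb = MultiLabelBinarizer()
multilabel_y = mlb.fit_transform(y)

p = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
p.fit(X, multilabel_y)

# Persist the fitted binarizer alongside the pipeline (assumed layout).
blob = pickle.dumps({"pipeline": p, "mlb": mlb})

# Later: restore both objects and reuse the same fitted binarizer.
restored = pickle.loads(blob)
pred = restored["pipeline"].predict(X)
labels = restored["mlb"].inverse_transform(pred)
```

Here `labels` is a list of tuples of label names, one tuple per input sample. The same idea applies if you use joblib instead of pickle: whatever object decodes the predictions has to be the fitted instance from training, not a new one.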
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange