Question

I have a question about metrics in Python (scikit-learn). I get the following error: "ValueError: Can't handle mix of multiclass and continuous".

My code looks like this (with the shapes and dtypes of all the arrays below):

X_train,X_test,y_train,y_test = cross_validation.train_test_split(data, target, test_size=0.3, random_state=42)
clf = RFC()
clf = clf.fit(X_train,y_train)
y_predict = clf.predict_proba(X_test)[:,1]
print f1_score(y_test,y_predict)

>>>X_train.shape
(7000, 576)
>>>X_test.shape
(3000, 576)
>>>y_train.shape
(7000,)
>>>y_test.shape
(3000,)
>>>X_train.dtype
dtype('float64')
>>>X_test.dtype
dtype('float64')
>>>y_train.dtype
dtype('float64')
>>>y_test.dtype
dtype('float64')
>>>y_predict.shape
(3000,)
>>>y_predict.dtype
dtype('float64')

I think some parameter is wrong, but at first glance everything looks fine, and I can't figure out where the problem is.


Solution

This is the problem:

y_predict = clf.predict_proba(X_test)[:,1]
print f1_score(y_test,y_predict)

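F1 is defined on class labels, not on probabilities. clf.predict_proba(X_test)[:,1] returns continuous values while y_test holds class labels, which is the "mix of multiclass and continuous" that the error reports. Use predict instead of predict_proba, i.e. y_predict = clf.predict(X_test).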
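
To illustrate why F1 needs labels, here is a minimal pure-Python sketch (written for Python 3, not scikit-learn's actual implementation). It turns per-class probabilities into labels by picking the most probable class, which is similar in spirit to what predict does, and then computes F1 for one class from counts of matching labels. The helper names labels_from_proba and f1_for_class and the toy data are hypothetical, made up for this example.

```python
def labels_from_proba(proba_rows, classes):
    """Pick the most probable class for each row of class probabilities."""
    return [classes[max(range(len(row)), key=row.__getitem__)]
            for row in proba_rows]


def f1_for_class(y_true, y_pred, positive):
    """F1 for one class, computed from label counts (not from probabilities)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 for this sketch
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three classes stored as floats, as in the question.
classes = [0.0, 1.0, 2.0]
y_true = [0.0, 1.0, 2.0, 1.0, 0.0]
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.4, 0.5, 0.1],
]

y_pred = labels_from_proba(proba, classes)  # labels, comparable to y_true
f1_class_1 = f1_for_class(y_true, y_pred, positive=1.0)
print(y_pred)
print(f1_class_1)
```

Counting true positives, false positives and false negatives only makes sense when both arrays hold labels, so a column of probabilities such as predict_proba(X_test)[:,1] has nothing to be compared against. Note also that recent scikit-learn versions require an explicit average argument (for example average='macro') when f1_score is given multiclass targets.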

Licensed under: CC-BY-SA with attribution