Formatting Data for use in SKmultilearn

https://datascience.stackexchange.com/questions/30573

31-10-2019

Question

I have one dataframe of X values and another with the y values, in what I think is the proper format. I converted the single target column y to dummy variables, so it is binary with m rows and n label columns.

import pandas as pd
from sklearn.model_selection import train_test_split

X = df.drop('target',axis=1)
y = pd.get_dummies(df['target'], sparse=True, drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=99, train_size=0.60)

I've been trying to figure out how exactly the data should look from the documentation but am not having much luck.

I first tried the following:

from skmultilearn.problem_transform import BinaryRelevance
from sklearn.naive_bayes import GaussianNB

classifier = BinaryRelevance(GaussianNB())
classifier.fit(X_train, y_train)

where the X and y training sets are pandas DataFrames. This results in the following error:

TypeError: no supported conversion for types: (dtype('O'),)
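
The `dtype('O')` in this message is NumPy's object dtype, which typically shows up when a frame holds strings or mixed values rather than plain numbers. A minimal sketch for spotting such columns follows; `df_demo` and its column names are hypothetical stand-ins, not the original data:

```python
import pandas as pd

# Hypothetical stand-in for the real feature frame (names are illustrative)
df_demo = pd.DataFrame({
    "height": [1.2, 3.4, 5.6],
    "colour": ["red", "blue", "red"],  # non-numeric column
})

# List the columns that are not numeric
non_numeric = df_demo.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Converting a frame with mixed column types yields an object-dtype array
print(df_demo.to_numpy().dtype)
```

If a check like this lists any columns for the real X or y, those columns are a likely source of the object dtype, though that is an assumption about data not shown here.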

Then I converted to matrices:

X_train = X_train.as_matrix()
y_train = y_train.as_matrix()

I received the same error. Then I tried:

X_train = np.matrix(X_train)
y_train = np.matrix(y_train)

Same error again. Is there an easy way to go from dataframe to proper format for skmultilearn models?
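
One possible direction, sketched under assumptions: wrapping an object-dtype frame in `np.matrix` keeps the object dtype, so the conversion has to force a numeric dtype explicitly. The example below uses a hypothetical `df_demo` in place of the original `df`, one-hot encodes any non-numeric feature columns, and builds dense label dummies (leaving out `sparse=True` and `drop_first=True` from the snippet above, which is a deliberate simplification):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original `df` (column names are illustrative)
df_demo = pd.DataFrame({
    "height": [1.2, 3.4, 5.6, 7.8],
    "colour": ["red", "blue", "red", "green"],
    "target": ["a", "b", "c", "a"],
})

# One-hot encode non-numeric feature columns, then force a float array
X_num = pd.get_dummies(df_demo.drop("target", axis=1)).to_numpy(dtype=float)

# Dense indicator matrix for the labels, forced to an integer dtype
y_num = pd.get_dummies(df_demo["target"]).to_numpy(dtype=int)

print(X_num.dtype, X_num.shape)
print(y_num.dtype, y_num.shape)
```

Arrays like `X_num` and `y_num` are what would then be passed to `classifier.fit`. Whether this resolves the error for the original data has not been verified here.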

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange