Formatting Data for use in SKmultilearn
31-10-2019
Question
I have one dataframe of X values and another of y values, in what I think is the proper format. I took the single-column target y and converted it to dummy variables, so it is binary with m rows and n columns.
import pandas as pd
from sklearn.model_selection import train_test_split
X = df.drop('target', axis=1)
y = pd.get_dummies(df['target'], sparse=True, drop_first=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=99, train_size=0.60)
I've been trying to figure out how exactly the data should look from the documentation but am not having much luck.
I first tried the following:
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.naive_bayes import GaussianNB
classifier = BinaryRelevance(GaussianNB())
classifier.fit(X_train, y_train)
where the X and y training sets are pandas DataFrames. This results in the following error:
TypeError: no supported conversion for types: (dtype('O'),)
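The `dtype('O')` in this message is NumPy's object dtype, which usually means at least one column holds non-numeric values such as strings or mixed types. A minimal sketch for locating such columns follows; the frame `toy` and its column names are hypothetical stand-ins, not the data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the feature frame in the question.
toy = pd.DataFrame({
    "age": [25, 32, 47],
    # Strings stored explicitly as object dtype, i.e. dtype('O').
    "city": pd.Series(["Oslo", "Lima", "Oslo"], dtype=object),
    "income": [40.0, 52.5, 61.0],
})

# Collect the names of all columns whose dtype is object.
object_cols = [col for col in toy.columns if toy[col].dtype == object]
print(object_cols)
```

Any column reported this way would need to be encoded or dropped before the data can be treated as a numeric matrix.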
Then I converted to matrices:
X_train = X_train.as_matrix()
y_train = y_train.as_matrix()
I received the same error. Then I tried:
X_train = np.matrix(X_train)
y_train = np.matrix(y_train)
Same error again. Is there an easy way to go from a dataframe to the proper format for skmultilearn models?
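One possible route, sketched under the assumption that the error comes from non-numeric feature columns: one-hot encode those columns, then convert both frames to plain numeric NumPy arrays. The frame `toy` and its columns are hypothetical, the labels are encoded as dense dummies without `sparse=True` or `drop_first=True`, and `DataFrame.to_numpy` requires pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric feature, one string feature, one target.
toy = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": pd.Series(["Oslo", "Lima", "Oslo", "Kyiv"], dtype=object),
    "target": pd.Series(["a", "b", "c", "a"], dtype=object),
})

# Encode the non-numeric feature column so that X is fully numeric.
X = pd.get_dummies(toy.drop("target", axis=1), columns=["city"])
# Dense dummies for the labels: one binary column per class.
y = pd.get_dummies(toy["target"])

# Convert to homogeneous numeric arrays (no object dtype left).
X_arr = X.to_numpy(dtype=float)
y_arr = y.to_numpy(dtype=int)

print(X_arr.dtype, X_arr.shape)
print(y_arr.dtype, y_arr.shape)
```

After splitting, arrays like these could be passed to `classifier.fit`. Whether this resolves the error depends on the actual column types in `df`.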
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange