Question

I am trying to apply a simple ML re-sampling approach after the train-test split. However, when I do this, it throws the error below. Can you please help me understand what this error is about?

KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'

The code is given below:

# split into training and testing datasets
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from imblearn.over_sampling import SMOTE
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 2, shuffle = True, stratify = y)
print("Number transactions X_train dataset: ", X_train.shape)
print("Number transactions y_train dataset: ", y_train.shape)
print("Number transactions X_test dataset: ", X_test.shape)
print("Number transactions y_test dataset: ", y_test.shape)
print("Before OverSampling, counts of label '1': {}".format(sum(y_train==1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train==0)))

sm = SMOTE(random_state=2)
X_train_res, y_train_res = sm.fit_sample(X_train, y_train.ravel())   # error is thrown here

print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))

print("After OverSampling, counts of label '1': {}".format(sum(y_train_res==1)))
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res==0)))

Here is the full error message:

KeyError                                  Traceback (most recent call last)
<ipython-input-216-af83b63865ac> in <module>
      3 
      4 sm = SMOTE(random_state=2)
----> 5 X_train_res, y_train_res = sm.fit_sample(X_train, y_train.ravel())
      6 
      7 print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
     86         if self._X_columns is not None:
     87             X_ = pd.DataFrame(output[0], columns=self._X_columns)
---> 88             X_ = X_.astype(self._X_dtypes)
     89         else:
     90             X_ = output[0]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5863                     results.append(
   5864                         col.astype(
-> 5865                             dtype=dtype[col_name], copy=copy, errors=errors, **kwargs
   5866                         )
   5867                     )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5846                 if len(dtype) > 1 or self.name not in dtype:
   5847                     raise KeyError(
-> 5848                         "Only the Series name can be used for "
   5849                         "the key in Series dtype mappings."
   5850                     )

KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'

Solution

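Do it without ravel (or reshaping of any kind): pass y_train to fit_sample as it is.

Or, if you are going to ravel y_train, then also convert the DataFrame X_train into a matrix (a NumPy array), so that fit_sample receives plain arrays for both inputs. As the traceback shows, the failing astype call runs only when X was passed as a DataFrame (self._X_columns is not None), so array input skips that step.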

Other tips

Convert your DataFrame into a matrix (a NumPy array):

sm.fit_sample(X_train.values, y_train.ravel())   # .values replaces as_matrix(), which was removed in pandas 1.0
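As for what the message itself means: the traceback shows it is raised by Series.astype when the dtype argument is a dict-like mapping with more than one key, or with a key other than the Series name. One way that can happen, and this is an assumption rather than something the question confirms, is a DataFrame with duplicated column names, because looking up a duplicated label in df.dtypes returns a Series instead of a single dtype. A small pandas-only sketch with a made-up frame:

```python
import pandas as pd

# Hypothetical frame with a duplicated column label "a".
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["a", "a"])

dtypes = df.dtypes         # Series of dtypes indexed by column label
lookup = dtypes["a"]       # duplicated label -> a Series with two entries

first_col = df.iloc[:, 0]  # a single column, i.e. a Series named "a"

message = None
try:
    # The dtype argument is dict-like with more than one key, so astype refuses it.
    first_col.astype(lookup)
except KeyError as exc:
    message = str(exc)

print(message)
```

If duplicated labels are the cause in your data, X_train.columns[X_train.columns.duplicated()] would list the repeated names.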
Licensed under: CC-BY-SA with attribution