Question

I am using the sklearn 0.14 module in Python to create a decision tree. I was hoping to use the OneHotEncoder to convert some features into categorical features. According to the documentation, I should be able to provide an array of indices to indicate which features should be converted. However, trying the following code:

import numpy
from sklearn import preprocessing

xs = [[64, 15230], [3, 67673], [16, 43678]]
encoder = preprocessing.OneHotEncoder(n_values='auto', categorical_features=[1], dtype=numpy.integer)
encoder.fit(xs)

I receive the following error:

Traceback (most recent call last):
  File "C:\Users\sara\Documents\Shipping Project\PythonSandbox\CarrierDecisionTree.py", line 35, in <module>
    encoder.fit(xs)
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 892, in fit
    self.fit_transform(X)
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 944, in fit_transform
    self.categorical_features, copy=True)
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 795, in _transform_selected
    return sparse.hstack((X_sel, X_not_sel))
  File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 417, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 532, in bmat
    dtype = upcast( *tuple([A.dtype for A in blocks[block_mask]]) )
  File "C:\Python27\lib\site-packages\scipy\sparse\sputils.py", line 53, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('int32'), dtype('S6'))

If I instead pass the array [0, 1] as categorical_features, it works correctly and converts both features properly. The same correct behavior occurs when I pass 'all' as categorical_features. However, I only want the second feature converted, not the first. I understand I could do this manually by converting one feature at a time, but I was hoping to rely on OneHotEncoder for this, as I will be using many more features later on.


Solution

Posting as an answer, for the record:

TypeError: no supported conversion for types: (dtype('int32'), dtype('S6'))

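means that some value in the real xs (not the one shown in the code snippet) is a string: dtype('S6') is NumPy's fixed-width string type of length six. The traceback fails inside sparse.hstack((X_sel, X_not_sel)), where the integer-encoded block is stacked with the columns that were left unencoded, so the string data most likely sits in the array that was passed through untouched. That would also explain why encoding every column does not hit this error.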
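A minimal sketch of how one might confirm and repair this before calling the encoder. The data below is hypothetical (the question's xs with the second column read as text, as could happen when loading from a file), and find_strings and to_int_array are illustrative helper names, not part of sklearn or NumPy:

```python
import numpy as np

# Hypothetical data: same values as the question's xs, but the second
# column arrived as text instead of integers.
xs = [[64, "15230"], [3, "67673"], [16, "43678"]]


def find_strings(rows):
    """Return (row, column, value) for every entry that is a string."""
    return [(i, j, v)
            for i, row in enumerate(rows)
            for j, v in enumerate(row)
            if isinstance(v, (str, bytes))]


def to_int_array(rows):
    """Convert rows to an integer array.

    Raises ValueError if an entry is text that does not parse as an integer.
    """
    return np.asarray(rows).astype(np.int64)


arr = np.asarray(xs)
# A dtype kind of 'S' (bytes) or 'U' (unicode) means NumPy built a string array.
print(arr.dtype.kind)
print(find_strings(xs))

clean = to_int_array(xs)
print(clean.dtype)  # now an integer dtype, safe to pass to the encoder
```

If to_int_array raises ValueError, at least one entry is not a number at all, and the output of find_strings shows where to look.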

Licensed under: CC-BY-SA with attribution