OneHotEncoder's input needs to be 2-d, not 1-d (it expects a set of samples):
>>> import numpy as np
>>> from sklearn.preprocessing import OneHotEncoder
>>> X = [[1, 2, 2, 3, 3, 1, 2, 4, 4, 1]]
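If your data starts out as a flat 1-d array, you can reshape it before handing it to the encoder. A minimal sketch, where the names x, X_row and X_col are only illustrative and which layout you want depends on whether the array holds one sample or one feature:

```python
import numpy as np

# A flat, 1-d array of category codes: shape (10,)
x = np.array([1, 2, 2, 3, 3, 1, 2, 4, 4, 1])

# One sample with ten features -> shape (1, 10), same layout as X above
X_row = x.reshape(1, -1)

# Ten samples of a single feature -> shape (10, 1)
X_col = x.reshape(-1, 1)

print(X_row.shape, X_col.shape)
```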
Let's suppose that your categorical features can all take on four values:
>>> n_values = np.repeat(4, len(X[0]))
>>> n_values
array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
Then OneHotEncoder works fine:
>>> oh = OneHotEncoder(n_values=n_values)
>>> Xt = oh.fit_transform(X)
>>> Xt.toarray()
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,
         1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  1.,  0.,
         0.]])
>>> Xt.shape
(1, 40)
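Two caveats about the snippet above. First, the n_values parameter was deprecated in scikit-learn 0.20 and later removed, so it will not run on a current install. Second, as far as I can tell from the output, n_values=4 describes the codes 0 to 3, so the value 4 ends up marked in the first column of the neighbouring feature's block. A rough equivalent for newer releases uses the categories parameter instead. This sketch assumes every feature takes values from 1 to 4, which is my guess at what "four values" means here:

```python
from sklearn.preprocessing import OneHotEncoder

X = [[1, 2, 2, 3, 3, 1, 2, 4, 4, 1]]  # one sample, ten features

# Assumption: each of the ten features can be 1, 2, 3 or 4.
categories = [[1, 2, 3, 4]] * len(X[0])

oh = OneHotEncoder(categories=categories)
Xt = oh.fit_transform(X)   # returns a sparse matrix by default
dense = Xt.toarray()

print(dense.shape)
```

Listing the categories explicitly keeps the output width fixed at four columns per feature even when a value does not occur in the data you fit on.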
It produces one dummy variable more than necessary for each input variable (four columns where three would carry the same information), which is a bit wasteful. I have no idea what you mean by the NULL stuff, since I don't know what your data looks like. You might want to open a separate question for that.
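On the extra dummy variable: newer scikit-learn releases (0.21 and later, if I remember correctly) accept a drop argument that omits one column per feature. A minimal sketch, again assuming the four values are 1 to 4:

```python
from sklearn.preprocessing import OneHotEncoder

X = [[1, 2, 2, 3, 3, 1, 2, 4, 4, 1]]

# Assumption: each feature can be 1, 2, 3 or 4.
categories = [[1, 2, 3, 4]] * len(X[0])

# drop="first" leaves out the column for the first category (here, 1)
# of every feature, so each feature gets three columns instead of four.
oh_drop = OneHotEncoder(categories=categories, drop="first")
Xt_drop = oh_drop.fit_transform(X)

print(Xt_drop.shape)
```

A feature whose value is the dropped category is then encoded as all zeros in its block.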