Please check the dtype
of your matrix X
, e.g. by typing X.dtype
. If it is object
or dtype('O')
, then write the lengths of the lines of X
into an array:
lengths = [len(line) for line in X]
Then take a look to see whether all lines have the same length, by invoking
np.unique(lengths)
If there is more than one number in the output, then your line lengths are different, e.g. from line 5853 on, but possibly not all the time.
Numpy data arrays are only useful if all lines have the same length (they continue to work if not, but don't do what you expect.). You should check to see what is causing this, correct it, and then return to knn
.
Here is an example of what happens if line lengths are not the same:
import numpy as np
rng = np.random.RandomState(42)
X = rng.randn(100, 20)
# now remove one element from the 56th line
X = list(X)
X[55] = X[55][:-1]
# turn it back into an ndarray
X = np.array(X)
# check the dtype
print X.dtype # returns dtype('O')
from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors()
nbrs.fit(X) # raises your first error
from sklearn.cluster import KMeans
kmeans = KMeans()
kmeans.fit(X) # raises your second error