Question

I have a dataset in which one column has unknown (NaN) values, and I thought I could fill them in using k-nearest neighbor. According to the MATLAB documentation, knnimpute replaces each NaN with the corresponding value from the nearest-neighbor column, so I transposed that column so that all of the data is now in a single row. However, I get an error saying that all rows contain NaN values, so I am confused about how to go about this.

Here's the code I ran:

knnimp = knnimpute(transpose(ds.stage),k);

I couldn't post a screenshot of the data, but here is what it looks like (all of the data is in one row):

1 2 4 3 2 1 1 NaN 3 3 3 1 NaN 2 NaN

Here's the output I get after running the code on the transposed data:

All rows of the input data contains missing values. Unable to impute missing values.

Solution

By the looks of it, you are running k-nearest neighbour imputation on a single vector of data, that is, a set of samples with only one feature each.

Judging by Example 1 in the function's documentation, knnimpute expects a matrix in which each column is a sample and each row is a feature. Because a missing value is filled in from the nearest column, and closeness between columns is measured across the features, the technique appears to work only if each sample has multiple features (i.e. if you pass in a matrix).
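To make the mechanism concrete, here is a minimal sketch in plain MATLAB (no toolbox) of nearest-neighbour column imputation for k = 1. It is an illustration of the behaviour described in the documentation under that assumption, not the actual knnimpute implementation, and the matrix data is made up. Distances between columns are measured only over rows that contain no NaN, which is why at least one such row has to exist; with a single column there is no other column to borrow from, so a NaN would simply remain.

```matlab
% Hypothetical 4-by-3 example: rows are features, columns are samples.
data = [1    2  5;
        4    5  7;
        NaN -1  8;
        7    6  0];

% Only NaN-free rows can be used to measure distances between columns.
complete = all(~isnan(data), 2);
if ~any(complete)
    error('Every row contains a NaN, so column distances cannot be computed.');
end

imputed = data;
for c = 1:size(data, 2)
    missing = find(isnan(data(:, c)));
    if isempty(missing)
        continue;
    end
    % Euclidean distance from column c to every column, over complete rows only.
    diffs = bsxfun(@minus, data(complete, :), data(complete, c));
    d = sqrt(sum(diffs .^ 2, 1));
    d(c) = Inf;                 % a column is not its own neighbour
    [~, order] = sort(d);       % nearest columns first
    for r = missing.'
        % Borrow the value from the nearest other column that is not NaN in row r.
        % With only one column, no finite distance exists and the NaN stays.
        for nb = order
            if isfinite(d(nb)) && ~isnan(data(r, nb))
                imputed(r, c) = data(r, nb);
                break;
            end
        end
    end
end

disp(imputed)
```

Here the NaN in column 1 is taken from column 2, the closest column over rows 1, 2 and 4, so imputed(3,1) becomes -1.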

Because you are passing a vector (multiple samples with a single feature), the algorithm cannot fill in the NaNs, so you would have to remove them before applying the k-nearest neighbour function.

Something like:

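% keep only the non-missing entries of ds.stage
temp_stage = ds.stage(~isnan(ds.stage));
% k is the number of neighbours, as in the question
knnimp = knnimpute(transpose(temp_stage),k);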
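To see why neither orientation of the question's data works, the following sketch checks both shapes using the values posted in the question. The variable stage is a hypothetical stand-in for ds.stage, and no toolbox is needed.

```matlab
% The values from the question, as a 1-by-15 row vector.
stage = [1 2 4 3 2 1 1 NaN 3 3 3 1 NaN 2 NaN];

% Row form (1-by-15): the single row contains NaN, so no row is NaN-free.
rowForm = stage;
rowHasComplete = any(all(~isnan(rowForm), 2));

% Column form (15-by-1): NaN-free rows exist, but there is only one column,
% so there is no neighbouring column to borrow a value from.
colForm = stage.';
colHasComplete = any(all(~isnan(colForm), 2));
colCount = size(colForm, 2);

fprintf('1-by-15: any NaN-free row? %d\n', rowHasComplete);
fprintf('15-by-1: any NaN-free row? %d, number of columns: %d\n', colHasComplete, colCount);
```

The row form has no NaN-free row (the situation behind the error in the question), while the column form has NaN-free rows but only a single column.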

OTHER TIPS

Even with a matrix as input, knnimpute throws this error if every row of the matrix has at least one missing element. I am not sure how this constraint (at least one row with no missing elements) can be met in general: it amounts to requiring a data set in which one or more features have no missing values at all.
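One way to check this constraint before calling knnimpute is to count the missing values in each row. This is a sketch with a made-up matrix; the knnimpute call is left commented out because it requires the Bioinformatics Toolbox.

```matlab
% Hypothetical 3-by-3 matrix in which every row has at least one NaN.
data = [1   NaN 3;
        NaN 5   6;
        7   8   NaN];

nMissingPerRow = sum(isnan(data), 2);      % missing entries in each row (feature)
completeRows = find(nMissingPerRow == 0);  % rows usable for column distances

if isempty(completeRows)
    disp('Every row contains a NaN, so knnimpute would raise the error from the question.');
else
    fprintf('NaN-free rows: %s\n', mat2str(completeRows.'));
    % imputed = knnimpute(data);  % requires the Bioinformatics Toolbox
end
```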

Example:

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow