Question

I am trying to run knnreg from the package caret. For some reason, this training set works:

> summary(train1)
       V1                V2             V3             
 13     : 10474   1      :  6435   7      :  8929     
 10     : 10315   2      :  6435   6      :  8895     
 4      : 10272   3      :  6435   9      :  8892     
 1      : 10244   4      :  6435   10     :  8892     
 2      : 10238   7      :  6435   15     :  8874     
 24     : 10228   8      :  6435   40     :  8870                        
 (Other):359799   (Other):382960   (Other):368218   

While this one won't work:

> summary(train2)
        V1              V2               V3                   V4      
 13     : 10474   1      :  6436   7      :  8929   Christmas   :  5946  
 10     : 10315   2      :  6436   6      :  8895   Labor Day   :  8861  
 4      : 10272   3      :  6438   9      :  8892   None        :391909  
 1      : 10244   4      :  6435   10     :  8892   Super Bowl  :  8895  
 2      : 10238   7      :  6435   15     :  8874   Thanksgiving:  5959  
 24     : 10228   8      :  6435   40     :  8870                        
 (Other):359799   (Other):382960   (Other):368218   

Here is the target vector:

> summary(Target)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -499    200    712   1980   20210  693100 

The error I get is during the prediction phase:

> fit <- knnreg(train2, Target, k = 2)
> Prediction <- predict(fit,  newdata=test)
Error in knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning messages:
1: In knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  NAs introduced by coercion
2: In knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  NAs introduced by coercion

This is my test set:

> summary(test)
     V1            V2           V3                    V4      
 13     : 2836   1      :  1755   51     : 3002   Christmas   :  2988  
 4      : 2803   2      :  1755   49     : 2989   Labor Day   :     0  
 19     : 2799   3      :  1755   52     : 2988   None        :106136  
 2      : 2797   4      :  1755   50     : 2986   Super Bowl  :  2964  
 27     : 2791   7      :  1755   6      : 2984   Thanksgiving:  2976  
 24     : 2790   8      :  1755   47     : 2976                        
 (Other):98248   (Other):104534   (Other):97139     

What am I missing?

EDIT: Switching the V4 labels to '1', '2', ... actually fixes the problem. Does the algorithm treat my features as numerical even though they're factors?


Solution

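I realized that knnreg accepts only numerical values. When I trained the model with train1, every column was treated as numerical, even though the values are in fact categorical; that worked only because the factor labels look like numbers. train2 fails because the V4 labels ("Christmas", "None", ...) are text, and knnreg cannot convert them to numbers. That is where the "NAs introduced by coercion" warnings and the NA/NaN/Inf error come from.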
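One possible workaround, not part of the original answer, is to one-hot encode the factors so that knnreg only ever sees numeric columns. The sketch below is a minimal illustration under stated assumptions: the toy data frame and the helper name encode are hypothetical, every column is assumed to be a factor, and the caret call is skipped if the package is not installed.

```r
# Toy data standing in for train2 (hypothetical values, not the asker's data).
train <- data.frame(
  V1 = factor(c("1", "2", "13", "4")),
  V4 = factor(c("None", "Christmas", "None", "Super Bowl"),
              levels = c("Christmas", "Labor Day", "None",
                         "Super Bowl", "Thanksgiving"))
)
target <- c(200, 712, 1980, 150)  # hypothetical target values

# Digit-like labels convert cleanly; text labels turn into NA.
v1_num <- as.numeric(as.character(train$V1))
v4_num <- suppressWarnings(as.numeric(as.character(train$V4)))

# One-hot encode: one 0/1 column per factor level, no intercept column.
# Assumes every column of df is a factor. Unused levels (e.g. "Labor Day")
# still get a column, so train and test end up with the same columns as
# long as their factors share the same levels.
encode <- function(df) {
  model.matrix(~ . - 1, data = df,
               contrasts.arg = lapply(df, contrasts, contrasts = FALSE))
}
x_train <- encode(train)

# Fit and predict on the encoded matrix, only if caret is available.
if (requireNamespace("caret", quietly = TRUE)) {
  fit <- caret::knnreg(x_train, target, k = 2)
  pred <- predict(fit, newdata = x_train)
}
```

Whether Euclidean distance on 0/1 indicator columns is a sensible similarity measure for this data is a separate modelling question; the encoding only guarantees that no NA values reach the underlying routine.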

Licensed under: CC-BY-SA with attribution