Question

Sorry if this feels like a repeated question, but honestly, I've spent more than 12 hours on this and haven't yet found a method that is easy to understand and easy to apply.

The situation is simple: I've created two models and need to apply them to the test data.

#Model 1 -

reg5 <- glm(train$survived ~ train$pclass_str + train$sex + 
             train$age_2 + train$sibsp + train$pclass_str*train$sex, 
             family = "binomial")

#Model 2 - 
reg6 <- randomForest(train$survived_str ~ train$pclass_str + train$sex + 
                      train$age_2 + train$sibsp, ntree=5000)

Applying them -

test$pred_reg5 <- predict(reg5, newdata = test, type="response")
test$pred_reg6 <- predict(reg6, newdata = test, type="response")

I can confirm that both the train and test data contain the variables used in the models, under the same names, though both also contain other, unused variables.

The error I'm getting:

Error in `[<-.factor`(`*tmp*`, keep, value = c("0", "1", "1", "1", "0",  : 
  NAs are not allowed in subscripted assignments
In addition: Warning message:
'newdata' had 418 rows but variables found have 891 rows

Thanks for your help!


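Solution

Refer to the variables by name in the formula and pass the data frame through the data argument. With train$... inside the formula, predict() cannot match the terms to columns of newdata and falls back to the training columns, which is why the warning reports 891 rows instead of 418. Change your models to, e.g.: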

reg5 <- glm(survived ~ pclass_str + sex + age_2 + sibsp + pclass_str*sex, 
            data=train, family = "binomial")
reg6 <- randomForest(survived_str ~ pclass_str + sex + age_2 + sibsp, 
                     data=train, ntree=5000)
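To see the difference in isolation, here is a minimal sketch using base R only. The data frames below are small synthetic stand-ins (hypothetical data, not your Titanic set), and only glm is shown so that nothing beyond base R is needed; the same reasoning about formulas should apply to randomForest.

```r
# Synthetic stand-ins for train/test (hypothetical data for illustration).
set.seed(1)
train <- data.frame(
  survived = rbinom(100, 1, 0.5),
  sex      = factor(sample(c("female", "male"), 100, replace = TRUE)),
  age_2    = runif(100, 1, 80)
)
test <- data.frame(
  sex   = factor(sample(c("female", "male"), 40, replace = TRUE)),
  age_2 = runif(40, 1, 80)
)

# Formula written with train$...: predict() cannot match these terms to
# columns of newdata, so it reuses the training columns (with a warning,
# suppressed here so the script runs quietly).
bad <- glm(train$survived ~ train$sex + train$age_2, family = "binomial")
bad_pred <- suppressWarnings(predict(bad, newdata = test, type = "response"))

# Formula written with bare names plus data=: newdata is used as intended.
good <- glm(survived ~ sex + age_2, data = train, family = "binomial")
good_pred <- predict(good, newdata = test, type = "response")

length(bad_pred)   # compare with nrow(train)
length(good_pred)  # compare with nrow(test)
```

The first model returns one prediction per training row regardless of newdata, while the second returns one per test row, which is what the assignment into test$... needs.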

There may be another problem with your model specification: reg5 uses survived ~ ... while reg6 uses survived_str ~ ..., but I can't tell from your question whether this is an issue.

Finally, as @Roland points out, you can simplify your formulas. If you're going to do this a lot, read the documentation on formulas in R (?formula). In R formulas, an interaction is specified as a:b. The notation a*b is equivalent to a + b + a:b (i.e., the first-order terms plus their interaction). So pclass_str*sex is equivalent to pclass_str + sex + pclass_str:sex, which makes the separate pclass_str + sex terms in reg5 redundant.
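If you want to check the equivalence yourself, a quick sketch is to compare the term labels that R derives from each formula. No data is needed for this, since terms() works on the formula alone; the variable names are the ones from your models.

```r
# a*b expands to the main effects plus their interaction.
f_star     <- survived ~ pclass_str * sex
f_expanded <- survived ~ pclass_str + sex + pclass_str:sex

labels_star     <- attr(terms(f_star), "term.labels")
labels_expanded <- attr(terms(f_expanded), "term.labels")
identical(labels_star, labels_expanded)

# The original reg5 formula repeats the main effects; a shorter version
# yields the same set of terms.
f_original <- survived ~ pclass_str + sex + age_2 + sibsp + pclass_str * sex
f_short    <- survived ~ pclass_str * sex + age_2 + sibsp
setequal(attr(terms(f_original), "term.labels"),
         attr(terms(f_short), "term.labels"))
```

Both comparisons should come back TRUE, so the shorter formula can be used in glm without changing the model.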

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow