Question

I've got a dataset that is 161 x 151, and I applied the following to it:

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePredictions = TRUE)
model <- train(RT..seconds. ~ ., data = cadets, method = "lm", trControl = ctrl)

For which I get in return:

Coefficients: (82 not defined because of singularities)

I know this means that a lot of my variables are collinear and are therefore not independent. I wanted to look at the correlation matrix of my data, so I ran:

 cor(cadets, use="complete.obs", method ="kendall")

but the result, as you can imagine, was too big to fit on my R screen. Is there a way of viewing the matrix so I can see which variables are collinear with one another? And what can I do from here to improve the model if my variables are collinear? How do I overcome that?

Thanks

Solution

It's described in the preprocess section of the caret manual (about halfway down the page): http://caret.r-forge.r-project.org/preprocess.html

So for your cadets data it's something like (not tested):

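library(caret)
# Correlate the predictors only, so the response is never a removal candidate
predictors <- cadets[, names(cadets) != "RT..seconds."]
cadetsCor <- cor(predictors, use = "complete.obs")
highlyCorCadets <- findCorrelation(cadetsCor, cutoff = 0.75)
cadets <- cadets[, !names(cadets) %in% colnames(cadetsCor)[highlyCorCadets]]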
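
The "not defined because of singularities" message points to exact linear dependence among the predictors, which a pairwise correlation cutoff will not necessarily catch (a column can be the sum of several others without being highly correlated with any one of them). Below is a minimal base R sketch of finding such columns, assuming all predictors are numeric. The toy data frame `toy` and the helper name `dependent_columns` are made up for illustration and are not part of caret; caret's `findLinearCombos()` is meant for the same kind of check.

```r
# Hypothetical helper: names of columns that are (numerically) linear
# combinations of the other columns, found via a pivoted QR decomposition.
# Note: no intercept column is added here, so a constant column would not
# be flagged the way lm() would flag it.
dependent_columns <- function(x) {
  x <- as.matrix(x)
  qrx <- qr(x)
  if (qrx$rank == ncol(x)) return(character(0))
  # qr() moves the dependent columns to the end of the pivot order
  colnames(x)[qrx$pivot[(qrx$rank + 1):ncol(x)]]
}

# Toy data standing in for the real predictors
set.seed(1)
toy <- data.frame(a = rnorm(20), b = rnorm(20))
toy$c <- toy$a + toy$b      # exact linear combination of a and b
toy$d <- rnorm(20)

dropped <- dependent_columns(toy)
reduced <- toy[, setdiff(names(toy), dropped), drop = FALSE]
```

Here `dropped` holds the names of the redundant columns and `reduced` is the data with them removed, which could then be passed to `train()` in place of the full predictor set.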

The other alternative is dimension reduction, e.g. PCA, but then your model may gain predictive power at the cost of interpretability.
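
A minimal sketch of the PCA route in base R using `prcomp()`, again assuming numeric predictors. The toy data, the 95% variance threshold, and the variable names are illustrative choices and do not come from the question. Within caret, passing `preProcess = "pca"` to `train()` should achieve something similar inside the resampling loop.

```r
# Toy data: x2 is nearly collinear with x1
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)
x3 <- rnorm(n)
y  <- 2 * x1 - x3 + rnorm(n, sd = 0.1)
predictors <- data.frame(x1, x2, x3)

# Centre and scale the predictors, then rotate onto principal components
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Keep the fewest components that explain at least 95% of the variance
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(explained >= 0.95)[1]

# Regress the response on the retained component scores
scores   <- as.data.frame(pca$x[, 1:k, drop = FALSE])
pcaModel <- lm(y ~ ., data = cbind(scores, y = y))
```

The components are uncorrelated by construction, so the singularity warning goes away, but each coefficient now refers to a mixture of the original variables (see `pca$rotation`) rather than to a single one.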

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow