Question

I've used lm() to fit multiple regression models for many (~1 million) response variables in R, e.g.:

allModels <- lm(t(responseVariablesMatrix) ~ modelMatrix)

This returns an object of class "mlm", which is like a huge object containing all the models. I want to get the Residual Sum of Squares for each model, which I can do using:

summaries <- summary(allModels)
rss1s <- sapply(summaries, function(a) return(a$sigma))

My problem is that I think the "summary" function calculates a whole bunch of other stuff too, and is hence quite slow. Is there a faster way of extracting just the Residual Sum of Squares for each model?

Thanks!


Solution

The object returned by lm has a residuals component, so you can get the residual sum of squares with sum(output$residuals^2).

Edit: What you are actually taking out of the summaries is sigma, the residual standard error, which is sqrt(sum(output$residuals^2)/output$df.residual)

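For all models, note that the residuals of an "mlm" object form a matrix with one column per response, so loop over the columns:

sapply(seq_len(ncol(allModels$residuals)), function(j) sqrt(sum(allModels$residuals[, j]^2) / allModels$df.residual))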
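
As a rough check of the idea above, here is a minimal sketch on simulated data. The names X, Y, fit and the sizes are made up for illustration and are not from the question. It assumes the residuals of an "mlm" fit are stored as a matrix with one column per response, and compares the hand-computed values with the slower summary() route.

```r
# Simulated data; all names and sizes here are illustrative assumptions.
set.seed(1)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Y <- matrix(rnorm(n * 4), nrow = n, ncol = 4)   # 4 response variables

fit <- lm(Y ~ X)                                # object of class "mlm"

# Residuals are an n x 4 matrix, one column per response.
res <- residuals(fit)

# Residual sum of squares and residual standard error, column by column.
rss <- sapply(seq_len(ncol(res)), function(j) sum(res[, j]^2))
sigma_direct <- sqrt(rss / fit$df.residual)

# Reference values from summary(), which also computes much more.
sigma_summary <- sapply(summary(fit), function(s) s$sigma)

# Should be TRUE if the two routes agree.
isTRUE(all.equal(unname(sigma_direct), unname(sigma_summary)))
```

If only the residual sum of squares is wanted, rss above is already the answer and the division by df.residual can be skipped.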

Other tips

It is not widely known that the generic function deviance returns the residual sum of squares for "lm" and "mlm" models. If fit is your fitted model, the residual standard error (the sigma reported by summary) is

sqrt(deviance(fit) / fit$df.residual)

There are two advantages here:

  1. the generic function is fully "vectorized" (using colSums) rather than loop-based (like the solution via sapply);
  2. the generic function handles the weighted regression case correctly.
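
To illustrate both points, here is a small sketch on simulated data with made-up weights. The names x, Y, w and fit_w are hypothetical. It assumes deviance() returns one weighted residual sum of squares per response for an "mlm" fit, and cross-checks one column against an ordinary single-response lm.

```r
# Simulated data; names and weights are illustrative assumptions.
set.seed(2)
n <- 40
x <- rnorm(n)
Y <- cbind(a = 1 + 2 * x + rnorm(n), b = -x + rnorm(n))
w <- runif(n, 0.5, 2)                     # hypothetical observation weights

fit_w <- lm(Y ~ x, weights = w)           # weighted "mlm" fit

rss_w   <- deviance(fit_w)                # one weighted RSS per response
sigma_w <- sqrt(rss_w / fit_w$df.residual)

# Hand-computed weighted and unweighted sums of squares for comparison.
manual_w <- colSums(w * residuals(fit_w)^2)
naive    <- colSums(residuals(fit_w)^2)   # ignores the weights

# Cross-check against a single-response weighted fit of column "a".
sigma_a <- summary(lm(Y[, "a"] ~ x, weights = w))$sigma

isTRUE(all.equal(unname(rss_w), unname(manual_w)))      # weighted RSS matches
isTRUE(all.equal(unname(sigma_w["a"]), sigma_a))        # sigma matches
```

Summing the raw squared residuals (naive above) would give a different number in the weighted case, which is why going through deviance() is the safer habit.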

Licensed under: CC-BY-SA with attribution