I've used lm() in R to fit multiple regression models for a large number (~1 million) of response variables, e.g.:

allModels <- lm(t(responseVariablesMatrix) ~ modelMatrix)

This returns an object of class "mlm", essentially one huge object containing all the models. I want to get the residual sum of squares for each model, which I can do using:

summaries <- summary(allModels)
rss1s <- sapply(summaries, function(a) return(a$sigma))

My problem is that I think summary() also calculates a lot of other statistics (coefficient tables, standard errors, R², etc.) and is therefore quite slow. Is there a faster way to extract just the residual sum of squares for each model?

Thanks!


Solution

There is a residuals component in the output of lm, so you get the residual sum of squares with sum(output$residuals^2).

Edit: you are actually extracting sigma (the residual standard error) from the summaries, not the residual sum of squares. Sigma is sqrt(sum(output$residuals^2)/output$df.residual).
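A quick single-model check of that relationship, as a sketch. It assumes the built-in mtcars data and an arbitrary formula purely for illustration (not the asker's data); the point is that summary()$sigma equals sqrt(RSS / df.residual). Note that the lm component is named df.residual; fit$df.residuals returns NULL because `$` only partially matches prefixes of existing names.

```r
# Illustrative data: the built-in mtcars set (not the asker's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual sum of squares straight from the lm residuals
rss <- sum(residuals(fit)^2)

# sigma (residual standard error) = sqrt(RSS / residual degrees of freedom);
# the component is df.residual (fit$df.residuals would be NULL)
sigma_manual <- sqrt(rss / fit$df.residual)

# Same quantity via the slower summary() route
sigma_summary <- summary(fit)$sigma

print(c(manual = sigma_manual, summary = sigma_summary))
```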

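For all models in the "mlm" fit, note that its residuals form a matrix with one column per response, so loop over the columns:

res <- residuals(allModels)
sapply(seq_len(ncol(res)), function(j) sqrt(sum(res[, j]^2) / allModels$df.residual))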
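A small simulated sketch of the per-response loop, checked against the question's summary() route. The sizes n, p, m and the matrices below are hypothetical stand-ins for the asker's data, with responses already stored one per column (so no t() is needed). Calling sapply on the "mlm" object itself would iterate over its list components (coefficients, residuals, ...), not over the fitted models, which is why the loop runs over the columns of the residual matrix.

```r
# Hypothetical stand-ins for the asker's data: n observations, p predictors
# and m response variables, with the responses stored one per column
set.seed(1)
n <- 50; p <- 3; m <- 200
modelMatrix <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- matrix(rnorm(n * m), nrow = n, ncol = m)

allModels <- lm(Y ~ modelMatrix)  # one "mlm" object holding all m fits
res <- residuals(allModels)       # n x m matrix, one column per response

# Loop over the response columns; sapply(allModels, ...) would instead walk
# over the list components of the mlm object
sigmas_loop <- sapply(seq_len(ncol(res)), function(j)
  sqrt(sum(res[, j]^2) / allModels$df.residual))

# Reference values from the slower summary() route used in the question
sigmas_summary <- sapply(summary(allModels), function(a) a$sigma)
all.equal(sigmas_loop, unname(sigmas_summary))
```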

Other tips

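Not many people know that the generic function deviance() computes the residual sum of squares for both "lm" and "mlm" models (for an "mlm" fit it returns one value per response). If fit is your fitted model, you get sigma, as extracted in the question, with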

sqrt(deviance(fit) / fit$df.residual)
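A minimal sketch of deviance() on an "mlm" fit, assuming simulated data (the sizes and the single predictor x are hypothetical). deviance(fit) alone already gives the residual sums of squares, one per response; dividing by the residual degrees of freedom and taking the square root gives the sigmas.

```r
# Hypothetical simulated data: m responses (one per column), one predictor
set.seed(2)
n <- 40; m <- 100
x <- rnorm(n)
Y <- matrix(rnorm(n * m), nrow = n, ncol = m)
fit <- lm(Y ~ x)                       # an "mlm" fit

rss <- deviance(fit)                   # residual sum of squares, one per response
sigmas <- sqrt(rss / fit$df.residual)  # residual standard errors, one per response

# deviance() agrees with the column-wise sum of squared residuals
all.equal(unname(rss), unname(colSums(residuals(fit)^2)))
```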

There are two advantages here:

  1. the generic function is fully "vectorized" (using colSums) rather than loop-based (like the solution via sapply);
  2. the generic function handles the weighted regression case (it weights the squared residuals), which a plain sum(residuals^2) does not.
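To illustrate the weighted case, here is a sketch on a single-response fit. Using the mtcars wt column as regression weights is an arbitrary choice for demonstration only. With weights, the plain sum of squared residuals no longer matches summary()'s sigma, while deviance() does.

```r
# Weighted single-response fit on mtcars; using the wt column as
# regression weights is arbitrary and only for illustration
w <- mtcars$wt
fit_w <- lm(mpg ~ hp, data = mtcars, weights = wt)

rss_naive <- sum(residuals(fit_w)^2)         # ignores the weights
rss_weighted <- sum(w * residuals(fit_w)^2)  # weighted residual sum of squares
rss_dev <- deviance(fit_w)                   # deviance() applies the weights

# sigma from deviance() matches summary(); the naive version does not
c(deviance = sqrt(rss_dev / fit_w$df.residual),
  summary  = summary(fit_w)$sigma,
  naive    = sqrt(rss_naive / fit_w$df.residual))
```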
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow