Question

I have a large data set of genetic and environmental variables that I am running linear regressions on. I need to obtain the r.squared, adj.r.squared, and p-value for each model. I have no problem running the regressions themselves and can get a summary of each one. However, I have approximately 20,000 models to compare, and extracting each value individually seems tedious. I imagine there has to be a relatively straightforward way to accomplish this.

Here is my code for extracting values into a data.frame (b1 is the stored summary of my first model):

df <- data.frame(r.squared=numeric(), adj.r.squared=numeric(), fstatvalue=numeric(), fstatnumdf=numeric(), fstatdendf=numeric())
for (i in 1:10)
{
  df[i,] <- c(b1$r.squared, b1$adj.r.squared, b1$fstatistic)
}

This code creates my data.frame and extracts the data from the same model (b1) 10 times. I have tried several ways to try to get the model identifier to change with each iteration with no luck. Does anyone have a suggestion?


Solution

As @Roland says, get your objects into a list first and the rest is easy. Assuming you have ~20,000 summary objects in your workspace (!!!) named e.g. b1, b2, ..., b20000, you can put them in a list, extract the summary statistics, and return a data.frame like this:

# Stick objects in a list
x <- mget( ls( pattern = "^b[0-9]+$" ) )

# Extract summary statistics
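out <- lapply( x , function(s) c( r.squared = s$r.squared , adj.r.squared = s$adj.r.squared , s$fstatistic ) )

# Turn into a data.frame with one row per model
# (as.data.frame( out ) alone would give one column per model instead)
as.data.frame( do.call( rbind , out ) )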
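
The question also asks for the p-value, which a summary.lm object does not store directly; the printed value is computed from the F statistic. The following is a minimal sketch of one way to get it, assuming the models can be fitted straight into a list rather than into separate workspace objects. The built-in mtcars data, the predictor names, and the helper extract_stats are placeholders for illustration, not part of the original question.

```r
# Hypothetical example: mtcars and these predictors stand in for the
# real genetic and environmental variables.
predictors <- c("wt", "hp", "disp", "qsec")

# Fit one model per predictor and keep the summaries in a named list
summaries <- lapply(predictors, function(p) {
  summary(lm(reformulate(p, response = "mpg"), data = mtcars))
})
names(summaries) <- predictors

# Pull the statistics of interest out of a single summary.lm object
extract_stats <- function(s) {
  f <- s$fstatistic
  c(
    r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    fstat         = unname(f["value"]),
    numdf         = unname(f["numdf"]),
    dendf         = unname(f["dendf"]),
    # Overall model p-value: upper tail of the F distribution
    p.value       = unname(pf(f["value"], f["numdf"], f["dendf"],
                              lower.tail = FALSE))
  )
}

# One row per model, with row names taken from the list names
results <- as.data.frame(do.call(rbind, lapply(summaries, extract_stats)))
results
```

If the summaries already exist as b1, b2, ... in the workspace, the same extract_stats helper could be applied to the list returned by mget instead of refitting anything.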
Licensed under: CC-BY-SA with attribution