Question

When running a regression analysis in R (using glm) cases are removed due to 'missingness' of the data. Is there any way to flag which cases have been removed? I would ideally like to remove these from my original dataframe.

Many thanks

Solution

The model object returned by glm() records which rows of the data it excluded because of missing values. The information is a bit buried, but na.action() retrieves it:

## Example data.frame with some missing data
df <- mtcars[1:6, 1:5]
df[cbind(1:5,1:5)] <- NA
df
#                    mpg cyl disp  hp drat
# Mazda RX4           NA   6  160 110 3.90
# Mazda RX4 Wag     21.0  NA  160 110 3.90
# Datsun 710        22.8   4   NA  93 3.85
# Hornet 4 Drive    21.4   6  258  NA 3.08
# Hornet Sportabout 18.7   8  360 175   NA
# Valiant           18.1   6  225 105 2.76

## Fit an example model, and learn which rows it excluded
f <- glm(mpg ~ drat, weights = disp, data = df)
as.numeric(na.action(f))
# [1] 1 3 5

Alternatively, to get the row indices without having to fit the model, use the same strategy with the output of model.frame():

as.numeric(na.action(model.frame(mpg ~ drat, weights = disp, data = df)))
# [1] 1 3 5
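
To go one step further and actually drop those rows from the original data frame, as the question asks, something like the following should work. It is only a sketch that rebuilds the example above so it runs on its own; the names excluded, kept and df_used are illustrative. The setdiff() step is there because na.action() returns NULL when the fit dropped nothing, and negative indexing with an empty vector would then return zero rows instead of all of them.

```r
## Rebuild the example data and model so this block runs on its own
df <- mtcars[1:6, 1:5]
df[cbind(1:5, 1:5)] <- NA
f <- glm(mpg ~ drat, weights = disp, data = df)

## Row positions the fit dropped; as.numeric(NULL) is numeric(0),
## so this is an empty vector when nothing was dropped
excluded <- as.numeric(na.action(f))

## Keep every row that was not dropped. setdiff() stays correct when
## 'excluded' is empty, whereas df[-excluded, ] would return zero rows
kept <- setdiff(seq_len(nrow(df)), excluded)
df_used <- df[kept, ]

## Row names of the excluded cases
rownames(df)[excluded]
# [1] "Mazda RX4"         "Datsun 710"        "Hornet Sportabout"
```

Note that df_used can still contain NAs in columns the model does not use (cyl and hp here), since glm() only looks at the variables in the formula and the weights.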

OTHER TIPS

Without a reproducible example I can't provide code tailored to your problem, but here's a generic method that should work. Assume your data frame is called df and your variables are called y, x1, x2, etc. And assume you want y, x1, x3, and x6 in your model.

# Make a vector of the variables that you want to include in your glm model
# (Be sure to include any weighting or subsetting variables as well, per Josh's comment)
glm.vars = c("y","x1","x3","x6") 

# Create a new data frame that includes only those rows with no missing values
# for the variables that are in your model
df.glm = df[complete.cases(df[ , glm.vars]), ] 

Also, if you want to see just the rows that have at least one missing value in the model variables, negate the test with ! (the "not" operator):

df[!complete.cases(df[ , glm.vars]), ] 
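
If the goal is to flag the cases rather than split the data frame, the same complete.cases() test can be stored as a column. This is a minimal sketch on a small example built from mtcars; the column name in.model and the choice of variables in glm.vars are assumptions made for illustration.

```r
## Small example data frame with some missing values
df <- mtcars[1:6, 1:5]
df[cbind(1:5, 1:5)] <- NA

## Variables the model would use, including the weighting variable
## (hypothetical choice for this example)
glm.vars <- c("mpg", "drat", "disp")

## TRUE where the row is complete for the model variables,
## FALSE where glm() would drop it under the default na.omit handling
df$in.model <- complete.cases(df[, glm.vars])
df$in.model
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE
```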
Licensed under: CC-BY-SA with attribution