Removing character level outlier in R

https://stackoverflow.com/questions/22621401

20-06-2023
Question

I have a linear model, model1 <- lm(divrate ~ marrate + medage + pop), for which the leverage plot shows an outlier at observation 28 (the row whose state variable is "Nevada"). I'd like to refit the model without Nevada in the dataset. I tried the following but got stuck.

data<-read.dta("census.dta")
attach(data)
data1<-data.frame(pop,divorce,marriage,popurban,medage,divrate,marrate)
attach(data1)
model1<-lm(divrate~marrate+medage+pop,data=data1)
summary(model1)
layout(matrix(1:4,2,2))
plot(model1)
dfbetaPlots(lm(divrate~marrate+medage+pop),id.n=50)
vif(model1)

dataNV<-data[!data$state == "Nevada",]
attach(dataNV)
model3<-lm(divrate~marrate+medage+pop,data=dataNV)

The last line of the code above gives me:

Error in model.frame.default(formula = divrate ~ marrate + medage + pop,  : 
  variable lengths differ (found for 'medage')

Solution

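I suspect the problem is that copies of variables from your earlier attach() calls are still lying around on the search path. When a variable named in the formula is not found in the data frame passed via data=, lm() falls back to one of those full-length copies, and its length no longer matches the subsetted columns. That is why it's really best practice not to use attach(). The following code works for me: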

library(foreign)
## best not to call data 'data'
mydata <- read.dta("http://www.stata-press.com/data/r8/census.dta")
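The suspected attach() mechanism can be reproduced with a small made-up example. This is only a sketch: the data frame and the names full, part, x and y are hypothetical and have nothing to do with census.dta. It assumes the failing formula refers to a column that is missing from the subset but still available from an earlier attach().

```r
# Toy data; all names here are illustrative, not from census.dta
full <- data.frame(id = c("a", "b", "c", "d"),
                   x  = c(1, 2, 3, 4),
                   y  = c(2.1, 3.9, 6.2, 7.8))
attach(full)   # puts full-length copies of id, x, y on the search path

# A subset that drops one row and, by mistake, the response column
part <- full[full$id != "d", c("id", "x")]

# 'x' is taken from part (3 values); 'y' is missing there, so R falls
# back to the attached copy (4 values) and model.frame() stops
msg <- tryCatch(
  { lm(y ~ x, data = part); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

detach(full)   # clean up the search path
```

Without the attach() call, the same lm() call would fail with an "object not found" error instead, which points to the missing column much more directly.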

I didn't find divrate or marrate in the data set: I'm going to speculate that you want the per capita rates:

## best practice to use a new name rather than transforming 'in place'
mydata2 <- transform(mydata,marrate=marriage/pop,divrate=divorce/pop)
model1 <- lm(divrate~marrate+medage+pop,data=mydata2)
library(car)
plot(model1)
dfbetaPlots(model1)
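The diagnostic plots can be complemented with a numeric check of which rows are influential. The sketch below uses base R's hatvalues() and cooks.distance() on the built-in mtcars data rather than the census data, so the model and the names fit, lev, cook and flagged are illustrative assumptions. The 2p/n leverage cutoff is a common rule of thumb, not something taken from the question.

```r
# Illustrative model on a built-in data set (not the census data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # overall influence of each observation

# Rule-of-thumb cutoff: twice the average leverage, 2 * p / n
p <- length(coef(fit))
n <- nobs(fit)
flagged <- names(lev)[lev > 2 * p / n]

print(flagged)                                    # rows with high leverage
print(head(sort(cook, decreasing = TRUE), 3))     # three most influential rows
```

Applied to model1, the same calls would list the state rows by row name or number, which can then be matched against the state column before deciding what to drop.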

This works fine for me in a clean session:

dataNV <- subset(mydata2,state != "Nevada")
## update() may be nice to avoid repeating details of the
##   model specification (not really necessary in this case)
model3 <- update(model1,data=dataNV)

Or you can use the subset argument:

model4 <- update(model1,subset=(state != "Nevada"))
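As a quick check that the two approaches agree, here is a minimal sketch on the built-in mtcars data. The column car, the dropped row and the object names are assumptions made for illustration, not part of the census example.

```r
# Copy the row names into a column so a row can be dropped by label
cars <- transform(mtcars, car = rownames(mtcars))
fit_all <- lm(mpg ~ wt + hp, data = cars)

# Approach 1: refit on a subsetted data frame
cars_drop <- subset(cars, car != "Maserati Bora")
fit_data <- update(fit_all, data = cars_drop)

# Approach 2: keep the data and pass a subset condition to lm()
fit_subset <- update(fit_all, subset = (car != "Maserati Bora"))

# Both fits use one observation fewer and give the same coefficients
print(c(nobs(fit_all), nobs(fit_data), nobs(fit_subset)))
print(all.equal(coef(fit_data), coef(fit_subset)))
```

The subset= form keeps the original data frame intact, which can be convenient when comparing several fits with different rows excluded.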

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow