> dat <- data.frame(
var1=sample(letters[1:2],10,replace=TRUE),
var2=c(1,2,3,1,2,3,102,3,1,2)
)
> dat
var1 var2
1 b 1
2 a 2
3 a 3
4 a 1
5 b 2
6 b 3
7 a 102 #outlier
8 b 3
9 b 1
10 a 2
Now only return those rows which are not (!
) greater than 2 abs
olute sd
's from the mean
of the variable in question. Obviously change 2 to however many sd
's you want to be the cutoff.
> dat[!(abs(dat$var2 - mean(dat$var2))/sd(dat$var2)) > 2,]
var1 var2
1 b 1
2 a 2
3 a 3
4 a 1
5 b 2
6 b 3 # no outlier
8 b 3 # between here
9 b 1
10 a 2
Or more short-hand using the scale
function:
dat[!abs(scale(dat$var2)) > 2,]
var1 var2
1 b 1
2 a 2
3 a 3
4 a 1
5 b 2
6 b 3
8 b 3
9 b 1
10 a 2
edit
This can be extended to looking within groups using by
do.call(rbind,by(dat,dat$var1,function(x) x[!abs(scale(x$var2)) > 2,] ))
This assumes dat$var1
is your variable defining the group each row belongs to.