Question

I have a data frame like this:

names<-c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","c","c","c")
var1<-c(0.942999593,0.935507266,0.973589623,0.969415912,0.95230801,0.935507266,0.888740961,0.91750551,0.944482672,0.945468585,1.457579147,0.922206277,0.941511433,0.954724791,0.941014244,0.941511433,0.941511433,1.50511433)
var2<-c(-0.012678088,0.014313763,0.001138275,-0.020568206,0.012987126,0.001217192,0.03360358,0.009758172,0.015066932,-0.037879492,0.020471157,0.010738162,0.010952531,0.019377213,0.027140572,0.031116892,-0.018530676,-8.90E-05)
df <- data.frame(names, var1, var2)  # data.frame() keeps var1 and var2 numeric; cbind() would coerce them to character

I would like to convert the outliers in the columns var1 and var2 to NA. However, I would like to identify the outliers independently for each category in the column "names". So the outliers for "a" in var1 are those found using only the first 5 rows of var1.

I define an outlier as any value below the 0.25 quantile or above the 0.75 quantile of its group.

Is there any easy way to do this in R?

Thank you very much in advance.

Tina.

Solution

Here's how you can do it for var1:

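# Quantiles of var1, computed separately for each value of names
quantiles <- tapply(var1, names, quantile)
# For every row, look up the 25% and 75% quantiles of its own group
minq <- sapply(names, function(x) quantiles[[x]]["25%"])
maxq <- sapply(names, function(x) quantiles[[x]]["75%"])
# Values outside the group's 25%-75% range become NA
var1[var1 < minq | var1 > maxq] <- NA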

Repeat the same for var2 (or df$var2).
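
To avoid repeating the steps by hand, the same idea can be wrapped in a small function and applied to both columns of the data frame at once. This is only a sketch: mask_outside_iqr is a made-up helper name (not a base R function), the data frame is rebuilt with data.frame() so that var1 and var2 stay numeric, and base R's ave() is used here in place of tapply()/sapply() to get each row's group quantile.

```r
# Sample data from the question
names <- c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","c","c","c")
var1 <- c(0.942999593,0.935507266,0.973589623,0.969415912,0.95230801,0.935507266,
          0.888740961,0.91750551,0.944482672,0.945468585,1.457579147,0.922206277,
          0.941511433,0.954724791,0.941014244,0.941511433,0.941511433,1.50511433)
var2 <- c(-0.012678088,0.014313763,0.001138275,-0.020568206,0.012987126,0.001217192,
          0.03360358,0.009758172,0.015066932,-0.037879492,0.020471157,0.010738162,
          0.010952531,0.019377213,0.027140572,0.031116892,-0.018530676,-8.90E-05)
df <- data.frame(names, var1, var2)

# Hypothetical helper: set values of x that fall outside the
# 25%-75% quantile range of their own group g to NA.
mask_outside_iqr <- function(x, g) {
  # ave() returns a vector as long as x, holding each row's group quantile
  lo <- ave(x, g, FUN = function(v) quantile(v, 0.25, na.rm = TRUE))
  hi <- ave(x, g, FUN = function(v) quantile(v, 0.75, na.rm = TRUE))
  x[x < lo | x > hi] <- NA
  x
}

# Apply the helper to every column of interest, grouping by df$names
cols <- c("var1", "var2")
df[cols] <- lapply(df[cols], mask_outside_iqr, g = df$names)
```

Note that with this definition of an outlier, roughly half of the values in each group end up as NA, since only values between the group's 25% and 75% quantiles (inclusive) are kept.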

Licensed under: CC-BY-SA with attribution