I have a data frame; suppose it looks like this:

names<-c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","c","c","c")
var1<-c(0.942999593,0.935507266,0.973589623,0.969415912,0.95230801,0.935507266,0.888740961,0.91750551,0.944482672,0.945468585,1.457579147,0.922206277,0.941511433,0.954724791,0.941014244,0.941511433,0.941511433,1.50511433)
var2<-c(-0.012678088,0.014313763,0.001138275,-0.020568206,0.012987126,0.001217192,0.03360358,0.009758172,0.015066932,-0.037879492,0.020471157,0.010738162,0.010952531,0.019377213,0.027140572,0.031116892,-0.018530676,-8.90E-05)
# data.frame() keeps var1 and var2 numeric; cbind() would coerce every column to character
df <- data.frame(names, var1, var2)

I would like to convert the outliers in the columns var1 and var2 to NA. However, the outliers should be calculated independently for each category in the column "names". So the outliers for "a" in var1 will be the ones found using just the first 5 rows of var1.

I treat as an outlier any value below the 0.25 quantile or above the 0.75 quantile of its category.

Is there any easy way to do this in R?

Thank you very much in advance.

Tina.

Solution

Here's how you can do it for var1:

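# Quantiles of var1 within each group (a list indexed by group name)
quantiles <- tapply(var1, names, quantile)
# Look up each row's own group bounds so they line up with var1
minq <- sapply(names, function(x) quantiles[[x]]["25%"])
maxq <- sapply(names, function(x) quantiles[[x]]["75%"])
# Set to NA everything below the group's 25% or above its 75% quantile
var1[var1 < minq | var1 > maxq] <- NA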

Repeat the same for var2 (or df$var2).
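
If you would rather handle both columns of the data frame in one pass, here is a minimal sketch using base R's ave(), which applies a function within each group and returns the results in the original row order. It assumes df is built with data.frame() so that var1 and var2 are numeric; outside_iqr_to_na is a helper name made up for this example, not something from the question.

```r
# Rebuild the example data; data.frame() keeps the numeric columns numeric
names <- c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","c","c","c")
var1 <- c(0.942999593,0.935507266,0.973589623,0.969415912,0.95230801,0.935507266,0.888740961,0.91750551,0.944482672,0.945468585,1.457579147,0.922206277,0.941511433,0.954724791,0.941014244,0.941511433,0.941511433,1.50511433)
var2 <- c(-0.012678088,0.014313763,0.001138275,-0.020568206,0.012987126,0.001217192,0.03360358,0.009758172,0.015066932,-0.037879492,0.020471157,0.010738162,0.010952531,0.019377213,0.027140572,0.031116892,-0.018530676,-8.90E-05)
df <- data.frame(names, var1, var2)

# Hypothetical helper: replace values strictly below the 25% quantile
# or strictly above the 75% quantile of x with NA
outside_iqr_to_na <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  x[x < q[1] | x > q[2]] <- NA
  x
}

# Apply the helper to each column, separately within each level of df$names
for (col in c("var1", "var2")) {
  df[[col]] <- ave(df[[col]], df$names, FUN = outside_iqr_to_na)
}
```

Values exactly equal to a quantile are kept, because the comparison is strict, which matches the vector-based code above.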

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow