Question

I have a dataset that looks like this

year month age
2007 1     17
2007 1     18
2007 1     19
2007 1     30
2007 1     31
2007 2     18
2007 2     19
2007 2     30
2008 2     41
2008 2     52
2008 2     49  
2008 3     23
2008 3     19
2008 3     39

And I'm stuck trying to find quartile group by each year and month.

The results should be like:

2007 1 Q1 Q2 Q3 Q4
2007 2 Q1 Q2 Q3 Q4

etc..

Thanks

Était-ce utile?

La solution 2

Your question is a bit confusing. It only takes three cutpoints to separate into quartiles. So what do you really want in those Q1, Q2, Q3,Q4 columns? If you want counts it would seem to be a bit boring. I'm going to assume you want the min, 25th.pctile, median, 75th.pctile, and max:

do.call ( rbind, with( dfrm, tapply(age, interaction(year=year , month=month), quantile, 
                                                           probs=c(0, .25,.5, 0.75, 1) ) ) )
#---------------------
       0%  25% 50%  75% 100%
2007.1 17 18.0  19 30.0   31
2007.2 18 18.5  19 24.5   30
2008.2 41 45.0  49 50.5   52
2008.3 19 21.0  23 31.0   39

Autres conseils

Aggregate does this.

> aggregate(.~year + month, data=age, FUN=fivenum)
  year month age.1 age.2 age.3 age.4 age.5
1 2007     1  17.0  18.0  19.0  30.0  31.0
2 2007     2  18.0  18.5  19.0  24.5  30.0
3 2008     2  41.0  45.0  49.0  50.5  52.0
4 2008     3  19.0  21.0  23.0  31.0  39.0


> dput(age)
structure(list(year = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L), month = c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), age = c(17L, 
18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L
)), .Names = c("year", "month", "age"), class = "data.frame", row.names = c(NA, 
-14L))
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top