Question

I have a dataset that looks like this:

year month age
2007 1     17
2007 1     18
2007 1     19
2007 1     30
2007 1     31
2007 2     18
2007 2     19
2007 2     30
2008 2     41
2008 2     52
2008 2     49  
2008 3     23
2008 3     19
2008 3     39

I'm stuck trying to compute the quartiles of age for each year and month group.

The results should be like:

2007 1 Q1 Q2 Q3 Q4
2007 2 Q1 Q2 Q3 Q4

etc.

Thanks


Solution

Your question is a bit ambiguous. It takes only three cutpoints to split data into quartiles, so what do you really want in those Q1, Q2, Q3, Q4 columns? Counts per quartile would not be very informative. I'm going to assume you want the minimum, 25th percentile, median, 75th percentile, and maximum, with the data stored in a data frame named dfrm:

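# Group age by every year/month combination, take five quantiles per group,
# then stack the per-group vectors into a matrix (rbind drops the empty groups).
do.call(rbind, with(dfrm, tapply(age, interaction(year = year, month = month),
                                 quantile, probs = c(0, 0.25, 0.5, 0.75, 1))))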
#---------------------
       0%  25% 50%  75% 100%
2007.1 17 18.0  19 30.0   31
2007.2 18 18.5  19 24.5   30
2008.2 41 45.0  49 50.5   52
2008.3 19 21.0  23 31.0   39
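
For a self-contained run, the snippet below rebuilds the question's data as dfrm (the name this answer assumes) and computes the same five quantiles with split() and sapply(). It is a sketch of one equivalent approach rather than the answer's original code; the names groups and q are illustrative only.

```r
# Rebuild the question's data; `dfrm` is the name assumed in the answer above.
dfrm <- data.frame(
  year  = c(rep(2007L, 8), rep(2008L, 6)),
  month = c(rep(1L, 5), rep(2L, 6), rep(3L, 3)),
  age   = c(17, 18, 19, 30, 31, 18, 19, 30, 41, 52, 49, 23, 19, 39)
)

# drop = TRUE discards year/month combinations that have no rows (e.g. 2008.1).
groups <- split(dfrm$age, interaction(dfrm$year, dfrm$month, drop = TRUE))

# sapply() returns one column per group; transpose to get one row per group.
q <- t(sapply(groups, quantile, probs = c(0, 0.25, 0.5, 0.75, 1)))
print(q)
```

Passing drop = TRUE to interaction() means only the year/month pairs present in the data are ever visited, so nothing relies on rbind() silently skipping empty groups.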

Other tips

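aggregate() does this in one call, using fivenum() (Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum) as the summary function. For this data the hinges coincide with the default quartiles from quantile(), although the two can differ in general. Here the data frame is named age, the same as its column: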

> aggregate(.~year + month, data=age, FUN=fivenum)
  year month age.1 age.2 age.3 age.4 age.5
1 2007     1  17.0  18.0  19.0  30.0  31.0
2 2007     2  18.0  18.5  19.0  24.5  30.0
3 2008     2  41.0  45.0  49.0  50.5  52.0
4 2008     3  19.0  21.0  23.0  31.0  39.0


> dput(age)
structure(list(year = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L), month = c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), age = c(17L, 
18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L
)), .Names = c("year", "month", "age"), class = "data.frame", row.names = c(NA, 
-14L))
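
One detail that can trip people up: the age.1 to age.5 labels in the printed result are how a single matrix-valued column named age is displayed, so the result has three columns rather than seven. The sketch below, which assumes the data frame is stored as age as in this answer, expands that matrix into ordinary columns with do.call(data.frame, ...). The names res and flat and the column labels are illustrative, not part of the original answer.

```r
# Same data as the dput() output above; named `age` to match this answer.
age <- data.frame(
  year  = c(rep(2007L, 8), rep(2008L, 6)),
  month = c(rep(1L, 5), rep(2L, 6), rep(3L, 3)),
  age   = c(17L, 18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L)
)

res <- aggregate(age ~ year + month, data = age, FUN = fivenum)
ncol(res)  # 3: year, month, and a five-column matrix stored in `age`

# Expand the matrix column into five ordinary columns.
flat <- do.call(data.frame, res)

# fivenum() returns hinges, so label them as such rather than as quartiles.
names(flat)[-(1:2)] <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(flat)
```

After flattening, each statistic can be accessed directly, for example flat$median.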
License: CC-BY-SA with attribution
Not affiliated with StackOverflow