Question

I have a dataset that looks like this:

year month age
2007 1     17
2007 1     18
2007 1     19
2007 1     30
2007 1     31
2007 2     18
2007 2     19
2007 2     30
2008 2     41
2008 2     52
2008 2     49  
2008 3     23
2008 3     19
2008 3     39

I'm stuck trying to find the quartiles of age, grouped by year and month.

The results should be like:

2007 1 Q1 Q2 Q3 Q4
2007 2 Q1 Q2 Q3 Q4

etc.

Thanks


Solution 2

Your question is a bit confusing. It only takes three cutpoints to separate data into quartiles, so what do you really want in those Q1, Q2, Q3, Q4 columns? Counts would be a bit boring, since each quartile holds roughly the same number of observations. I'm going to assume you want the minimum, 25th percentile, median, 75th percentile, and maximum:

# dfrm is the data frame holding the year, month and age columns
do.call(rbind, with(dfrm, tapply(age, interaction(year = year, month = month), quantile,
                                 probs = c(0, 0.25, 0.5, 0.75, 1))))
#---------------------
       0%  25% 50%  75% 100%
2007.1 17 18.0  19 30.0   31
2007.2 18 18.5  19 24.5   30
2008.2 41 45.0  49 50.5   52
2008.3 19 21.0  23 31.0   39
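
If only the three cutpoints mentioned above are wanted, the same approach can be narrowed down. The following is a minimal sketch, not the answerer's own code: it assumes the sample data from the question is stored in a data frame called dfrm (the name used above), and it adds drop = TRUE so that empty year/month combinations are never created. The name cuts is made up for illustration.

```r
# Sample data from the question, stored under the name the answer assumes.
dfrm <- data.frame(
  year  = c(rep(2007L, 8), rep(2008L, 6)),
  month = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  age   = c(17L, 18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L)
)

# One row per year.month group, one column per cutpoint (25%, 50%, 75%).
# drop = TRUE keeps only the year/month combinations present in the data.
cuts <- do.call(rbind, with(dfrm,
  tapply(age, interaction(year, month, drop = TRUE), quantile,
         probs = c(0.25, 0.5, 0.75))))

print(cuts)
```

The result is a numeric matrix whose row names are the year.month labels, so a single group can be looked up with, for example, cuts["2007.2", ].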

Other tips

aggregate() does this, here with fivenum() as the summary function:

> aggregate(.~year + month, data=age, FUN=fivenum)
  year month age.1 age.2 age.3 age.4 age.5
1 2007     1  17.0  18.0  19.0  30.0  31.0
2 2007     2  18.0  18.5  19.0  24.5  30.0
3 2008     2  41.0  45.0  49.0  50.5  52.0
4 2008     3  19.0  21.0  23.0  31.0  39.0


The data frame used above (named age here), as printed by dput():

> dput(age)
structure(list(year = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L), month = c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), age = c(17L, 
18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L
)), .Names = c("year", "month", "age"), class = "data.frame", row.names = c(NA, 
-14L))
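
One caveat about the aggregate() call above: fivenum() returns Tukey's hinges, not the quartiles that quantile() computes by default. The two agree on the sample data, but they can differ for other group sizes. A small sketch on a made-up vector x (not part of the original data) shows the difference:

```r
# Hypothetical four-value vector, chosen because the two methods disagree on it.
x <- c(1, 2, 3, 4)

hinges <- fivenum(x)    # min, lower hinge, median, upper hinge, max
quarts <- quantile(x)   # 0%, 25%, 50%, 75%, 100% (default type 7)

print(hinges)  # 1.0 1.5 2.5 3.5 4.0
print(quarts)  # 1.00 1.75 2.50 3.25 4.00
```

If the quantile() definition is the one needed, passing FUN = quantile to aggregate() in place of fivenum should give results consistent with the first answer.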
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow