Question

I have a dataset that looks like this:

year month age
2007 1     17
2007 1     18
2007 1     19
2007 1     30
2007 1     31
2007 2     18
2007 2     19
2007 2     30
2008 2     41
2008 2     52
2008 2     49  
2008 3     23
2008 3     19
2008 3     39

I'm stuck trying to compute quartiles grouped by year and month.

The result should look like this:

2007 1 Q1 Q2 Q3 Q4
2007 2 Q1 Q2 Q3 Q4

etc.

Thanks


Solution 2

Your question is a bit confusing. It takes only three cutpoints to split data into quartiles, so what do you actually want in those Q1, Q2, Q3, Q4 columns? If you want counts per quartile, that would seem a bit boring. I'm going to assume you want the minimum, 25th percentile, median, 75th percentile, and maximum:

# dfrm is the data frame holding the year, month and age columns
do.call(rbind, with(dfrm, tapply(age, interaction(year = year, month = month),
                                 quantile, probs = c(0, 0.25, 0.5, 0.75, 1))))
#---------------------
       0%  25% 50%  75% 100%
2007.1 17 18.0  19 30.0   31
2007.2 18 18.5  19 24.5   30
2008.2 41 45.0  49 50.5   52
2008.3 19 21.0  23 31.0   39
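If what you actually need is only the three cutpoints, or which quartile each row falls into (so you can count rows per quartile), a minimal base-R sketch along the same lines could look like this. It rebuilds the example data as dfrm; the column names Q1 to Q3, the helper quartile_group(), and the rule that a value sitting exactly on a cutpoint goes into the lower quartile are illustrative assumptions, not part of the answer above.

```r
# Example data from the question, in a data frame named dfrm
dfrm <- data.frame(
  year  = c(rep(2007L, 8), rep(2008L, 6)),
  month = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  age   = c(17L, 18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L)
)

# Only the three cutpoints per year/month group (quantile() default, type 7)
cuts <- aggregate(age ~ year + month, data = dfrm,
                  FUN = quantile, probs = c(0.25, 0.5, 0.75))
cuts <- do.call(data.frame, cuts)        # flatten the matrix column
names(cuts)[3:5] <- c("Q1", "Q2", "Q3")  # hypothetical column names
print(cuts)

# Hypothetical helper: quartile number (1-4) of each value within its group.
# Assumption: a value equal to a cutpoint is put in the lower quartile.
quartile_group <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  findInterval(x, q, left.open = TRUE) + 1L
}

# drop = TRUE leaves out year/month combinations that have no rows
grp <- interaction(dfrm$year, dfrm$month, drop = TRUE)
dfrm$quartile <- ave(dfrm$age, grp, FUN = quartile_group)

# Number of rows per group and quartile
print(table(group = grp, quartile = factor(dfrm$quartile, levels = 1:4)))
```

Passing an already-dropped grouping factor to ave() matters here: otherwise the helper would also be called on empty year/month combinations, where quantile() returns NA and findInterval() stops with an error.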

Other tips

aggregate() with fivenum() does this in a single call (here the data frame itself is named age):

> aggregate(.~year + month, data=age, FUN=fivenum)
  year month age.1 age.2 age.3 age.4 age.5
1 2007     1  17.0  18.0  19.0  30.0  31.0
2 2007     2  18.0  18.5  19.0  24.5  30.0
3 2008     2  41.0  45.0  49.0  50.5  52.0
4 2008     3  19.0  21.0  23.0  31.0  39.0
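The age column of that result is a single matrix column, which is why it prints as age.1 to age.5; it can be awkward to work with afterwards. A minimal sketch, assuming you want ordinary columns, flattens it with do.call(data.frame, ...); the new column names are my own choice. It also shows why this output matches the quantile() answer only on this data: fivenum() returns Tukey's hinges, while quantile() defaults to type-7 percentiles, and the two can differ.

```r
# The question's data, named age as in the dput() output
age <- data.frame(
  year  = c(rep(2007L, 8), rep(2008L, 6)),
  month = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  age   = c(17L, 18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L)
)

res <- aggregate(. ~ year + month, data = age, FUN = fivenum)
# res$age is one 4 x 5 matrix column; turn it into ordinary columns
res <- do.call(data.frame, res)
# Illustrative column names, not part of the original answer
names(res)[3:7] <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(res)

# Tukey's hinges vs. quantile() type 7 differ, e.g. for 1:4
fivenum(c(1, 2, 3, 4))                  # 1.0 1.5 2.5 3.5 4.0
quantile(c(1, 2, 3, 4), c(0.25, 0.75))  # 25% = 1.75, 75% = 3.25
```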


> dput(age)
structure(list(year = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L), month = c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), age = c(17L, 
18L, 19L, 30L, 31L, 18L, 19L, 30L, 41L, 52L, 49L, 23L, 19L, 39L
)), .Names = c("year", "month", "age"), class = "data.frame", row.names = c(NA, 
-14L))
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow