Question
I have a data frame of three columns that looks like this:
> head(d)
YYYYMM State Weight
1 200812 AL 1876.356
2 200812 AL 2630.503
3 200812 AL 2763.981
4 200812 AL 2693.110
5 200812 AL 2905.784
6 200812 AL 3511.313
It has 51 states and goes from 2008-04 until 2010-04, so it has 25 unique YYYYMM elements:
> levels(factor(d$YYYYMM))
[1] "200804" "200805" "200806" "200807" "200808" "200809" "200810"
[8] "200811" "200812" "200901" "200902" "200903" "200904" "200905"
[15] "200906" "200907" "200908" "200909" "200910" "200911" "200912"
[22] "201001" "201002" "201003" "201004"
Using table(d$YYYYMM, d$State), I get a contingency table of counts:
head(table(d$YYYYMM,d$State))
ME NH VT MA RI CT NY NJ PA OH IN IL ...
200804 2018 2340 1501 1651 1781 2373 4550 2181 3328 2949 1631 3242 ...
200805 2002 2332 1556 1648 1770 2360 4521 2217 3294 2936 1671 3193 ...
200806 1999 2369 1552 1676 1803 2390 4578 2221 3331 2997 1642 3181 ...
200807 1988 2354 1605 1601 1769 2362 4530 2165 3318 2973 1592 3271 ...
200808 1998 2348 1649 1667 1812 2411 4417 2191 3302 2975 1627 3198 ...
200809 2032 2343 1679 1670 1865 2367 4599 2185 3320 2914 1625 3155 ...
...
However, instead of counts I want each cell to hold the sum of weights. For example, for 200804 and state ME I want not the count, but this value:
> sum(d[d$YYYYMM==200804 & d$State=="ME",]$Weight)
[1] 1063323
I tried calculating that with a "for" loop, but it took far too long. Is there a way to modify the table() function to accomplish that? If not, what other options are there? Eventually I want to calculate percentages, but that is trivial once I know how to get the sums of weights by YYYYMM and state. Thank you. Below is a summary of the data in case you need it. Let me know if more clarification is necessary.
> summary(d)
YYYYMM State Weight
Min. :200804 CA : 221244 Min. : 0
1st Qu.:200810 TX : 132650 1st Qu.: 1176
Median :200904 NY : 114282 Median : 2496
Mean :200887 FL : 106116 Mean : 2226
3rd Qu.:200910 PA : 82482 3rd Qu.: 3139
Max. :201004 IL : 80816 Max. :16822
(Other):1906523
Solution
I think tapply is the function you're looking for:
tapply(d$Weight, list(d$State, d$YYYYMM), sum)
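Since the question asks about adapting table(), base R's xtabs() may be the closest match: with a variable on the left-hand side of the formula, it sums that variable within each cell instead of counting rows. Below is a minimal sketch on a made-up toy data frame. The name `toy` and its values are invented for illustration and are not taken from the question's data.

```r
# Toy data with the same column names as d; the values are invented.
toy <- data.frame(
  YYYYMM = c(200804, 200804, 200804, 200805, 200805),
  State  = c("ME", "ME", "NH", "ME", "NH"),
  Weight = c(10, 20, 5, 7, 3)
)

# Rows are months, columns are states, cells hold the summed weights.
sums <- xtabs(Weight ~ YYYYMM + State, data = toy)

# The same sums via tapply, with months as rows to match
# the layout of table(d$YYYYMM, d$State).
sums_tapply <- tapply(toy$Weight, list(toy$YYYYMM, toy$State), sum)
```

One difference to keep in mind: for a month/state combination with no rows, xtabs() gives 0 while tapply() gives NA.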
Other suggestions
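First, reshape your data frame into wide format, summing the weights as you cast:
library(reshape2)
df <- dcast(d, YYYYMM ~ State, value.var = "Weight", fun.aggregate = sum)
Each YYYYMM/State pair has many rows, so dcast needs fun.aggregate = sum; without it, dcast falls back to length and returns counts. The result already has one row per month and one column per state, so no separate aggregate() step is needed. If you prefer base R, aggregate can compute the same sums in long format:
aggregate(Weight ~ YYYYMM + State, data = d, FUN = sum)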
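For the percentages mentioned in the question, prop.table() can be applied to the matrix of summed weights. This is only a sketch: it assumes the percentages are wanted within each month (margin = 1), which the question does not specify, and `toy` is invented data rather than the real d.

```r
# Toy data with the same column names as d; the values are invented.
toy <- data.frame(
  YYYYMM = c(200804, 200804, 200804, 200805, 200805),
  State  = c("ME", "ME", "NH", "ME", "NH"),
  Weight = c(10, 20, 5, 7, 3)
)

# Summed weights with months as rows and states as columns.
sums <- tapply(toy$Weight, list(toy$YYYYMM, toy$State), sum)

# Each state's share of the month's total weight, in percent.
# margin = 1 normalises within rows; margin = 2 would give shares within each state.
pct <- 100 * prop.table(sums, margin = 1)
```

If some month/state combinations have no rows, the NA cells from tapply() will propagate through prop.table(), so they may need to be replaced with 0 first.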