This is easy using the data.table package:
library(data.table)
dt <- data.table(DF)  # your DF
setkeyv(dt, c("ProvinceID", "CityID", "House"))
dt[, list(HouseIncome = as.integer(sum(WorkingStatus) > 0)), by = key(dt)]
    ProvinceID CityID House HouseIncome
 1:         10  10001     1           1
 2:         10  10002     1           0
 3:         20  20001     1           0
 4:         20  20002     1           1
 5:         20  20002     2           1
 6:         30  30001     1           0
 7:         30  30001     2           0
 8:         40  40001     1           1
 9:         40  40001     2           0
10:         50  50001     1           1
Very nice answer from @ChristianBorck, +1. Just a couple of tips on improving it further.
setDT(DF)[, list(HouseIncome = any(WorkingStatus == 1L) * 1L),
          by = list(ProvinceID, CityID, House)]
1) You can use setDT instead of as.data.table(.) or data.table(.). It converts your data.frame to a data.table by reference (without copying), so it avoids unnecessary memory usage and is effectively instant.
2) You can, but don't have to, use setkey for aggregation/grouping. Setting a key is only needed if you'd really like to get the data sorted.
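To make both tips concrete, here is a minimal sketch on made-up data. Only the column names come from the question; the values in DF and the names res and res_sorted are hypothetical. It also uses keyby=, which is not mentioned above but is the usual way to get a sorted result without a separate setkey call.

```r
library(data.table)

# Hypothetical sample data: column names as in the question, values made up.
DF <- data.frame(
  ProvinceID    = c(10, 10, 20, 20),
  CityID        = c(10001, 10001, 20001, 20001),
  House         = c(1, 1, 1, 2),
  WorkingStatus = c(0, 1, 0, 0)
)

# Tip 1: convert the data.frame to a data.table by reference (no copy).
setDT(DF)

# Tip 2: group with by= directly; no key is set beforehand.
# Groups come back in order of first appearance.
res <- DF[, list(HouseIncome = as.integer(any(WorkingStatus == 1L))),
          by = list(ProvinceID, CityID, House)]

# If a sorted result is wanted, keyby= groups and sorts in one step,
# and sets the grouping columns as the key of the result.
res_sorted <- DF[, list(HouseIncome = as.integer(any(WorkingStatus == 1L))),
                 keyby = list(ProvinceID, CityID, House)]

print(res)
```

With by=, the result has no key; with keyby=, the result is keyed on ProvinceID, CityID and House, which is what setkey followed by by=key(dt) gives you in the first answer.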