Select a group of rows uniquely identified by more than one column

Question 1

This is easy using the data.table package:

library(data.table)
dt <- data.table(DF) # your DF
setkeyv(dt, c("ProvinceID", "CityID", "House"))

dt[, list(HouseIncome = as.integer(sum(WorkingStatus)>0)), by=key(dt)]


    ProvinceID CityID House HouseIncome
 1:         10  10001     1           1
 2:         10  10002     1           0
 3:         20  20001     1           0
 4:         20  20002     1           1
 5:         20  20002     2           1
 6:         30  30001     1           0
 7:         30  30001     2           0
 8:         40  40001     1           1
 9:         40  40001     2           0
10:         50  50001     1           1

Very nice answer from @ChristianBorck, +1. Just a couple of tips for improving it further.

setDT(DF)[, list(HouseIncome = any(WorkingStatus == 1L)*1L), 
                    by=list(ProvinceID, CityID, House)]

1) You can use setDT instead of as.data.table(.) or data.table(.). It converts your data.frame to a data.table by reference (without copying), which avoids unnecessary memory usage and is effectively instant.

2) You can, but don't have to, use setkey for aggregation/grouping, unless you really want the result sorted.
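
To make both tips concrete, here is a minimal sketch on a small, deliberately unsorted toy data set (the values are made up for illustration and are not the asker's data). It assumes a data.table version that provides setDT (1.9.2 or later). Grouping with by= needs no key and keeps groups in order of first appearance; keyby= is one way to get the result sorted by the grouping columns without a separate setkey call.

```r
library(data.table)

# Hypothetical toy data, deliberately out of order
DF <- data.frame(
  ProvinceID    = c(20, 10, 20, 10),
  CityID        = c(20002, 10001, 20002, 10001),
  House         = c(1, 1, 1, 1),
  WorkingStatus = c(0, 1, 1, 0)
)

# Converts DF in place: no assignment, DF itself becomes a data.table
setDT(DF)

# No key needed: groups come back in order of first appearance (20, then 10)
unsorted <- DF[, list(HouseIncome = any(WorkingStatus == 1) * 1L),
               by = list(ProvinceID, CityID, House)]

# keyby= groups and also sorts the result by the grouping columns (10, then 20)
sorted <- DF[, list(HouseIncome = any(WorkingStatus == 1) * 1L),
             keyby = list(ProvinceID, CityID, House)]
```

If the sort order of the result does not matter, the by= form is enough.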

Question 2

It's quite easy with the plyr package (or any function that offers split-apply-combine functionality):

library(plyr)
ddply(DF, .(ProvinceID, CityID, House), 
        summarise, HouseIncome=as.numeric(any(WorkingStatus==1)))
#    ProvinceID CityID House HouseIncome
# 1          10  10001     1           1
# 2          10  10002     1           0
# 3          20  20001     1           0
# 4          20  20002     1           1
# 5          20  20002     2           1
# 6          30  30001     1           0
# 7          30  30001     2           0
# 8          40  40001     1           1
# 9          40  40001     2           0
# 10         50  50001     1           1
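
As an illustration of the "any split-apply-combine function" remark, here is a sketch of the same aggregation in base R with aggregate(). It is not part of the plyr answer; the data frame is rebuilt from the example values used in this thread so the block runs on its own, and the final order() call is only there to make the row order match the output above.

```r
# Example data (same values as used elsewhere in this thread)
DF <- data.frame(
  ProvinceID    = c(10, 10, 10, 20, 20, 20, 30, 30, 40, 40, 40, 40, 50),
  CityID        = c(10001, 10001, 10002, 20001, 20002, 20002, 30001,
                    30001, 40001, 40001, 40001, 40001, 50001),
  House         = c(1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1),
  WorkingStatus = c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1)
)

# Split by the three ID columns, apply any(), combine into one row per house
res <- aggregate(
  WorkingStatus ~ ProvinceID + CityID + House,
  data = DF,
  FUN  = function(x) as.numeric(any(x == 1))
)

# aggregate() keeps the name of the summarised column, so rename it
names(res)[names(res) == "WorkingStatus"] <- "HouseIncome"

# Sort by province, city and house
res <- res[order(res$ProvinceID, res$CityID, res$House), ]
```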

Question 3

To complete the set, here's an answer with dplyr. First, I'll create the data in a safer way. You should never use cbind() to make data frames because it coerces all inputs to the same type:

df <- data.frame(
  ProvinceID = c(10, 10, 10, 20, 20, 20, 30, 30, 40, 40, 40, 40, 50),
  CityID = c(10001, 10001, 10002, 20001, 20002, 20002, 30001, 30001, 40001, 40001, 40001, 40001, 50001),
  House = c(0001, 0001, 0001, 0001, 0001, 0002, 0001, 0002, 0001, 0001, 0001, 0002, 0001),
  Person = c(000101, 000102, 000101, 000101, 000101, 000101, 000101, 000101, 000101, 000102, 000103, 000101, 000101),
  WorkingStatus = c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1)
)

With dplyr, you use group_by() to set up the grouping, and mutate() to add a new column. I think you're better off leaving the variable as a logical vector, rather than converting it to 0/1.

library(dplyr)
df %>%
  group_by(ProvinceID, CityID, House) %>%
  mutate(HouseIncome = any(WorkingStatus == 1))
#> Source: local data frame [13 x 6]
#> Groups: ProvinceID, CityID, House
#> 
#>    ProvinceID CityID House Person WorkingStatus HouseIncome
#> 1          10  10001     1    101             1        TRUE
#> 2          10  10001     1    102             0        TRUE
#> 3          10  10002     1    101             0       FALSE
#> 4          20  20001     1    101             0       FALSE
#> 5          20  20002     1    101             1        TRUE
#> 6          20  20002     2    101             1        TRUE
#> 7          30  30001     1    101             0       FALSE
#> 8          30  30001     2    101             0       FALSE
#> 9          40  40001     1    101             1        TRUE
#> 10         40  40001     1    102             1        TRUE
#> 11         40  40001     1    103             0        TRUE
#> 12         40  40001     2    101             0       FALSE
#> 13         50  50001     1    101             1        TRUE
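
If you'd rather have one row per house than a new column on every person, a possible variant is to swap mutate() for summarise(). This is a sketch rather than part of the answer above: it assumes a dplyr release that has the %>% pipe, and it rebuilds the data (without the Person column) so it runs on its own.

```r
library(dplyr)

df <- data.frame(
  ProvinceID    = c(10, 10, 10, 20, 20, 20, 30, 30, 40, 40, 40, 40, 50),
  CityID        = c(10001, 10001, 10002, 20001, 20002, 20002, 30001,
                    30001, 40001, 40001, 40001, 40001, 50001),
  House         = c(1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1),
  WorkingStatus = c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1)
)

# summarise() collapses each group to a single row;
# ungroup() drops the grouping that summarise() leaves behind
house_income <- df %>%
  group_by(ProvinceID, CityID, House) %>%
  summarise(HouseIncome = any(WorkingStatus == 1)) %>%
  ungroup()
```

Wrapping the expression in as.integer() gives 0/1 instead of TRUE/FALSE if you really need it.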

Question 4

Something like this perhaps, which returns TRUE/FALSE instead of the 1/0 that you want:

library(data.table) ## >= 1.9.2
setDT(DF)[, list(HouseIncome = sum(WorkingStatus) > 0), 
                       by = list(ProvinceID,CityID,House)]

#     ProvinceID CityID House HouseIncome
#  1:         10  10001     1        TRUE
#  2:         10  10002     1       FALSE
#  3:         20  20001     1       FALSE
#  4:         20  20002     1        TRUE
#  5:         20  20002     2        TRUE
#  6:         30  30001     1       FALSE
#  7:         30  30001     2       FALSE
#  8:         40  40001     1        TRUE
#  9:         40  40001     2       FALSE
# 10:         50  50001     1        TRUE