Question

I have 2 dataframes. Truncated examples of these dataframes are:

dataSC_ds

SURVEY_DATE SITE
2012-07-01  Site 1
2012-08-10  Site 2
2012-09-15  Site 1
2012-09-20  Site 1
...

and

dataSC

 SURVEY_DATE  FISHING_SITE  DATA_COLLECTION_SITE  SHIFT  TIME_BLOCK
 2012-07-01   Site 1                              AM     9
 2012-07-01                 Site 1                AM     9
 2012-07-02   Site 2                              AM     11
 2012-07-02                 Site 2                AM     11
 2012-07-15   Site 3                              PM     15
 2012-07-15                 Site 3                PM     15
 2012-08-10   Site 2                              PM     16
 2012-08-10                 Site 2                PM     16
 2012-08-20   Site 2                              AM     11
 2012-08-20                 Site 2                AM     11
 2012-09-15   Site 1                              AM     9
 2012-09-15                 Site 1                AM     9
 2012-09-15   Site 1                              AM     10
 2012-09-15                 Site 1                AM     10
 2012-09-20   Site 1                              PM     13
 2012-09-20                 Site 1                PM     13
 2012-09-20   Site 3                              PM     15
 2012-09-20                 Site 3                PM     15
 ...

I would like to subset dataSC to retain only the rows whose combination of date and site appears in dataSC_ds. The complicated part is that, given

2012-07-01  Site 1

in dataSC_ds, I would like to retain in dataSC the rows with 2012-07-01 in which Site 1 is either a FISHING_SITE or a DATA_COLLECTION_SITE.

Please let me know if you have any ideas about how I can do this. Thanks in advance.


Solution

Your data seems to contain some redundancy: couldn't the fishing site and the data collection site be recorded in the same observation? Nevertheless, you can use mapply to subset dataSC once for each date and site pair in dataSC_ds.

# Function that returns the rows of dataSC for date y in which site x is
# either the FISHING_SITE or the DATA_COLLECTION_SITE
select <- function(x, y) dataSC[dataSC$SURVEY_DATE == y & (dataSC$FISHING_SITE == x | dataSC$DATA_COLLECTION_SITE == x), ]

# Apply the function to each (SITE, SURVEY_DATE) pair in dataSC_ds
subsets <- mapply(select, x = dataSC_ds$SITE, y = dataSC_ds$SURVEY_DATE, SIMPLIFY = FALSE)

# Name each data.frame in the list by its date and site
subsets <- setNames(subsets, paste(dataSC_ds$SURVEY_DATE, dataSC_ds$SITE))

This will give you a list with all the subsets:

subsets

$`2012-07-01 Site 1`
  SURVEY_DATE FISHING_SITE DATA_COLLECTION_SITE SHIFT TIME_BLOCK
1  2012-07-01       Site 1                         AM          9
2  2012-07-01                            Site 1    AM          9

$`2012-08-10 Site 2`
  SURVEY_DATE FISHING_SITE DATA_COLLECTION_SITE SHIFT TIME_BLOCK
7  2012-08-10       Site 2                         PM         16
8  2012-08-10                            Site 2    PM         16

$`2012-09-15 Site 1`
   SURVEY_DATE FISHING_SITE DATA_COLLECTION_SITE SHIFT TIME_BLOCK
11  2012-09-15       Site 1                         AM          9
12  2012-09-15                            Site 1    AM          9
13  2012-09-15       Site 1                         AM         10
14  2012-09-15                            Site 1    AM         10

$`2012-09-20 Site 1`
   SURVEY_DATE FISHING_SITE DATA_COLLECTION_SITE SHIFT TIME_BLOCK
15  2012-09-20       Site 1                         PM         13
16  2012-09-20                            Site 1    PM         13
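
If a single data frame is wanted rather than a list, one possible alternative is to match on a combined "date site" key. The sketch below is not part of the original answer. It uses small stand-in data, and it assumes that both tables store dates in the same format and that blank site cells are empty strings. The names wanted, fish_key, coll_key and dataSC_kept are made up for illustration.

```r
# Stand-in data modelled on the question (blank site cells assumed to be "")
dataSC_ds <- data.frame(
  SURVEY_DATE = c("2012-07-01", "2012-08-10", "2012-09-15", "2012-09-20"),
  SITE        = c("Site 1", "Site 2", "Site 1", "Site 1"),
  stringsAsFactors = FALSE
)
dataSC <- data.frame(
  SURVEY_DATE          = rep(c("2012-07-01", "2012-07-02",
                               "2012-09-20", "2012-09-20"), each = 2),
  FISHING_SITE         = c("Site 1", "", "Site 2", "", "Site 1", "", "Site 3", ""),
  DATA_COLLECTION_SITE = c("", "Site 1", "", "Site 2", "", "Site 1", "", "Site 3"),
  stringsAsFactors = FALSE
)

# Build one "date site" key per wanted pair and per site column of dataSC
wanted   <- paste(dataSC_ds$SURVEY_DATE, dataSC_ds$SITE)
fish_key <- paste(dataSC$SURVEY_DATE, dataSC$FISHING_SITE)
coll_key <- paste(dataSC$SURVEY_DATE, dataSC$DATA_COLLECTION_SITE)

# Keep a row if either of its keys is among the wanted pairs;
# %in% always returns TRUE or FALSE, so no NA rows can appear in the result
dataSC_kept <- dataSC[fish_key %in% wanted | coll_key %in% wanted, ]
```

Unlike the list of subsets, dataSC_kept keeps the original row order and row names of dataSC, and each row appears at most once.
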
Licensed under: CC-BY-SA with attribution