I believe this can be accomplished with a combination of two rolling joins, a feature of data.table.
Define both datasets as data.tables and set the keys for matching them by region and start (lower bound). This way, each color in df2 will be matched to the nearest start in df1 that is smaller or equal.
library(data.table)
df1 <- data.table(df1, key='region,start')
df2 <- data.table(df2, key='region,start')
df.start <- df1[df2, roll=TRUE, allow.cartesian=TRUE]
We do the same thing for the end, but we reverse the direction of the match, so each color is matched to the next end in df1 that is larger or equal (the upper end of the spectrum):
setkey(df1, region, end) ## reset the keys
setkey(df2, region, end)
df.end <- df1[df2, roll=-Inf, allow.cartesian=TRUE]
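To see what the two roll settings do in isolation, here is a minimal sketch on made-up data. The tables ref and qry and their id and q columns are hypothetical and only serve to illustrate the direction of each match.

```r
library(data.table)

# Hypothetical toy data: three reference positions and two query positions.
ref <- data.table(region = "A", start = c(10, 20, 30),
                  id = c("r1", "r2", "r3"), key = c("region", "start"))
qry <- data.table(region = "A", start = c(12, 25),
                  q = c("q1", "q2"), key = c("region", "start"))

# roll = TRUE: each query is matched to the last ref start at or below it.
fwd <- ref[qry, roll = TRUE]
fwd$id  # "r1" "r2"

# roll = -Inf: each query is matched to the next ref start at or above it.
bwd <- ref[qry, roll = -Inf]
bwd$id  # "r2" "r3"
```

The roll only applies to the last key column (start here); region still has to match exactly.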
The solution you want is the intersection of the two results, df.start and df.end. This can be found with an inner join (in database terms). We first need to set the keys so that they identify each combination uniquely.
setkey(df.start, sub_region, refID)
setkey(df.end, sub_region, refID)
df.start[df.end, list(colorDescrip), nomatch=0]
The last line returns the result you want, and you can save that in d3. The syntax can appear a bit cryptic if you have never seen it before, but data.table is well worth looking into.
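For reference, the whole sequence can be run end to end on made-up data. The column names are the ones used in the code above, but the values are invented for illustration, and the sketch assumes the sub-regions within a region do not overlap. It also returns sub_region and refID alongside colorDescrip so the matches are easier to read.

```r
library(data.table)

# Hypothetical inputs: two sub-regions and three colored intervals.
df1 <- data.table(region = "A", sub_region = c("s1", "s2"),
                  start = c(0, 100), end = c(50, 200))
df2 <- data.table(refID = c("c1", "c2", "c3"), region = "A",
                  start = c(10, 40, 120), end = c(20, 60, 150),
                  colorDescrip = c("red", "green", "blue"))

# Match each color's start to the nearest sub-region start at or below it.
setkey(df1, region, start)
setkey(df2, region, start)
df.start <- df1[df2, roll = TRUE, allow.cartesian = TRUE]

# Match each color's end to the nearest sub-region end at or above it.
setkey(df1, region, end)
setkey(df2, region, end)
df.end <- df1[df2, roll = -Inf, allow.cartesian = TRUE]

# Keep only the (sub_region, refID) pairs found by both joins.
setkey(df.start, sub_region, refID)
setkey(df.end, sub_region, refID)
res <- df.start[df.end, list(sub_region, refID, colorDescrip), nomatch = 0]
print(res)  # rows for c1 (red, in s1) and c3 (blue, in s2)
```

In this toy data c2 (green) starts inside s1 but ends past it, so the two joins disagree on its sub-region and the inner join drops it.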
Edit: Noticed the part about region matching and updated the code to reflect that.