Question

I have two dataframes. In the first one, I have threshold values (FROM and TO) for each GROUP.

FROM TO GROUP
1   99  1
100 199 2
200 399 3

In the second dataframe, I have some values in column X, and I would like to assign the corresponding group to each value.

X       
50  
150     
250

I would like to obtain the following output:

X   GROUP
50  1
150 2
250 3

I have managed to do it with a for loop, but my real dataframe has more than 200,000 rows, so it takes a long time, and I have to repeat this operation several times.

Any help would be appreciated. Thank you!


Solution

Assuming your FROM/TO ranges leave no gaps and never overlap, all your x values are integers, and FROM is sorted in increasing order, this should work nicely.

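# Lookup table: one row per group, sorted by FROM
dd <- data.frame(
    FROM  = c(1, 100, 200),
    TO    = c(99, 199, 399),
    GROUP = c(1, 2, 3)
)
x <- c(50, 150, 250, 20, 350, 110)
# findInterval() gives, for each x, the index of the last FROM that is <= x
g <- dd$GROUP[findInterval(x, dd$FROM)]
cbind(x, g)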

See ?findInterval for more information. It's a useful function in situations like this. You might also be interested in something like cut.
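If the assumptions do not hold for your data, the findInterval approach can be guarded. The following is a minimal sketch, not part of the original answer: it assumes you want NA for any value that lies below the first FROM, above the last TO, or in a gap between ranges. The test values in x are made up for illustration.

```r
dd <- data.frame(
    FROM  = c(1, 100, 200),
    TO    = c(99, 199, 399),
    GROUP = c(1, 2, 3)
)
# Illustrative values: three in range, one below, one above, one in a gap
x <- c(50, 150, 250, 0, 400, 99.5)

idx <- findInterval(x, dd$FROM)
# findInterval() returns 0 for values below the first FROM;
# indexing with 0 would silently drop them, so mark them NA instead
idx[idx == 0] <- NA

g <- dd$GROUP[idx]
# Blank out values that lie beyond the TO of the matched row
g[(!is.na(idx)) & (x > dd$TO[idx])] <- NA

cbind(x, g)
```

Without the `idx == 0` step, the result would be shorter than x whenever a value falls below the first FROM, so cbind would misalign the rows.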

OTHER TIPS

As mentioned by MrFlick, here's a solution that uses cut.

range.df <- data.frame(FROM=c(1,100,200),
                       TO=c(99,199,399),
                       GROUP=c(1,2,3))

value.df <- data.frame(ROW=c(1,2,3,4,5,6,7),
                       X=c(50,150,250,100,90,300,275))

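# right=FALSE makes each interval [FROM, next FROM);
# include.lowest=TRUE keeps X equal to max(TO) in the last interval
cbind(value.df,GROUP=cut(x=value.df$X,
                         breaks=c(range.df$FROM,max(range.df$TO)),
                         labels=range.df$GROUP,
                         right=FALSE,
                         include.lowest=TRUE))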
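
Note that cut returns a factor, not a number. If you need GROUP as a numeric column in the second dataframe, as in the output the question asks for, one way is to convert via character. This is a small sketch using the question's three X values; the variable name grp is only for illustration.

```r
range.df <- data.frame(FROM = c(1, 100, 200),
                       TO = c(99, 199, 399),
                       GROUP = c(1, 2, 3))
value.df <- data.frame(X = c(50, 150, 250))

grp <- cut(value.df$X,
           breaks = c(range.df$FROM, max(range.df$TO)),
           labels = range.df$GROUP,
           right = FALSE,
           include.lowest = TRUE)

# as.numeric() on a factor returns the internal level codes,
# so go through as.character() to recover the original labels
value.df$GROUP <- as.numeric(as.character(grp))
value.df
```

Here the level codes happen to equal the labels, but that will not be true in general (for example, with GROUP values of 10, 20, 30), which is why the conversion goes through as.character.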

Licensed under: CC-BY-SA with attribution