Filtering a data frame on a vector [duplicate]

https://stackoverflow.com/questions/9350025

27-10-2019

Question

This question already has an answer here:

Filter data.frame rows by a logical condition (8 answers)

I have a data frame df with an ID column holding values such as A, B, etc. I also have a vector containing certain IDs:

L <- c("A", "B", "E")

How can I filter the data frame to keep only the rows whose ID is present in the vector? For a single ID, I would use

subset(df, ID == "A")

but how do I filter on a whole vector?

Solution

You can use the %in% operator:

> df <- data.frame(id=c(LETTERS, LETTERS), x=1:52)
> L <- c("A","B","E")
> subset(df, id %in% L)
   id  x
1   A  1
2   B  2
5   E  5
27  A 27
28  B 28
31  E 31

If your IDs are unique, you can use match():

> df <- data.frame(id=c(LETTERS), x=1:26)
> df[match(L, df$id), ]
  id x
1  A 1
2  B 2
5  E 5
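
The uniqueness condition matters because match() returns only the first position of each value, and gives NA for any value it cannot find. A minimal sketch of both caveats, assuming a made-up ID "zz" that does not occur in df (the names L2, dup and the result variables are illustrative only):

```r
# Hypothetical lookup vector: "zz" is assumed not to occur in df$id
L2 <- c("A", "B", "zz")
df <- data.frame(id = LETTERS, x = 1:26)

idx <- match(L2, df$id)          # position of each L2 element in df$id, NA if absent
with_na <- df[idx, ]             # the unmatched ID yields a row of NAs
clean <- df[idx[!is.na(idx)], ]  # drop unmatched positions before indexing

# With repeated IDs, match() reports only the first occurrence of each value
dup <- data.frame(id = c(LETTERS, LETTERS), x = 1:52)
first_only <- dup[match(c("A", "B", "E"), dup$id), ]  # picks rows 1, 2, 5 only
```

If the IDs may repeat, or the lookup vector may contain values missing from the data frame, the %in% form above is probably the safer choice.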

Alternatively, make the IDs the row names of your data frame and extract by row name:

> rownames(df) <- df$id
> df[L, ]
  id x
A  A 1
B  B 2
E  E 5

Finally, for more advanced users, and if speed is a concern, I'd recommend looking into the data.table package.

OTHER TIPS

I reckon you need to use 'match'. It matches the values in one vector to the values in another vector, and gives NA where there's no match. So then you subset based on !is.na of the match.

See ?match and you can probably work it out for yourself, in which case you'll learn more than from the exact answer someone will do shortly which will just encourage you to cut n paste :)
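
For reference, the match-based idea described in this answer can be sketched as follows. It reuses the example data from the solution above, and pos and res are names chosen here for illustration:

```r
# Example data, same shape as in the solution above
df <- data.frame(id = c(LETTERS, LETTERS), x = 1:52)
L <- c("A", "B", "E")

# For each df$id, match() gives its position in L, or NA when it is not in L
pos <- match(df$id, L)

# Keep only the rows whose id was found in L
res <- df[!is.na(pos), ]
```

This should select the same rows as id %in% L, since %in% is itself defined in terms of match().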

Licensed under: CC-BY-SA with attribution
