Lookup table with a query of arbitrary length, without using a for loop, in R

Question 1

For a true lookup table, the result should be the length of the query and also deal with replication in the query. The approaches using match(...) are the only ones that do this:

query4 <- c("jack","sam", "dan","sam","jack")
dt[match(query4,dt$name),]$age
# [1] 20 28 13 28 20

This is because match(LHS,RHS) returns an integer vector of length(LHS) which contains the row numbers of the RHS which match the corresponding element of LHS.
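A minimal sketch of what match() returns, assuming dt is a four-row data frame like the one implied by the outputs in this answer. Jill's age is never shown, so the 25 below is a placeholder, and "bob" is a made-up name that is deliberately absent from the table:

```r
# Assumed reconstruction of dt; jill's age (25) is a placeholder.
dt <- data.frame(name = c("jack", "jill", "sam", "dan"),
                 age  = c(20, 25, 28, 13),
                 stringsAsFactors = FALSE)

query4 <- c("jack", "sam", "dan", "sam", "jack")

idx <- match(query4, dt$name)   # one row number per query element
idx
# [1] 1 3 4 3 1
length(idx) == length(query4)
# [1] TRUE

# A name that is not in the table yields NA rather than being dropped
match(c("jack", "bob"), dt$name)
# [1]  1 NA
```

Because unmatched names come back as NA, the result keeps the length of the query even when some lookups fail.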

The approaches based on comparison (==) will generally not work. This is because, when comparing two vectors of different lengths, R recycles the shorter one as many times as needed to match the length of the longer one, and then does an element-by-element comparison. So in the case of dt$name==query1, for example, the RHS is recycled to length 4 and the comparison is between c("jack","jill","sam","dan") and c("jack","dan","jack","dan").

dt$name==query1   # RHS is: c("jack","dan","jack","dan")
# [1]  TRUE FALSE FALSE  TRUE
dt$name==query2   # RHS is: c("sam","sam","sam","sam")
# [1] FALSE FALSE  TRUE FALSE
dt$name==query3   # RHS is: c("jack","sam", "dan","jack") with warning
# [1]  TRUE FALSE FALSE FALSE
# with warning:   longer object length is not a multiple of shorter object length
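The recycling can be made explicit with rep_len(), which reproduces by hand what == does implicitly. A small sketch, assuming dt$name and the queries hold the values implied by the outputs above:

```r
# Assumed values, inferred from the outputs shown in this answer
name   <- c("jack", "jill", "sam", "dan")
query1 <- c("jack", "dan")
query3 <- c("jack", "sam", "dan")

# Recycle each query to the length of name, as == does implicitly
recycled1 <- rep_len(query1, length(name))   # jack, dan, jack, dan
recycled3 <- rep_len(query3, length(name))   # jack, sam, dan, jack

# Comparing against the recycled vector gives the same result as ==
identical(name == query1, name == recycled1)
# [1] TRUE
name == recycled3   # same values as name == query3, but no warning
# [1]  TRUE FALSE FALSE FALSE
```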

On the other hand, LHS %in% RHS gives a logical vector of length(LHS), where each element is TRUE or FALSE depending on whether the corresponding element of LHS is present anywhere in RHS.

dt$name %in% query1
# [1]  TRUE FALSE FALSE  TRUE
query1 %in% dt$name
# [1] TRUE TRUE

Note that dt$name %in% query1 and dt$name==query1 appear to give the same result, but that's an artifact of query1 being recycled in the latter comparison. See, for example:

dt$name %in% query3
# [1]  TRUE FALSE  TRUE  TRUE
dt$name  ==  query3
# [1]  TRUE FALSE FALSE FALSE
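It may help to see how the two functions relate: in base R, x %in% table is essentially match(x, table, nomatch = 0) > 0, so it records only whether a match exists and discards its position. A short sketch, with values assumed from the outputs above:

```r
name   <- c("jack", "jill", "sam", "dan")   # assumed contents of dt$name
query3 <- c("jack", "sam", "dan")

match(name, query3, nomatch = 0)       # position in query3, 0 if absent
# [1] 1 0 2 3
match(name, query3, nomatch = 0) > 0   # same as name %in% query3
# [1]  TRUE FALSE  TRUE  TRUE
```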

Question 2

You want %in%. It returns a logical vector that can be used to subset the data frame:

dt[dt$name %in% query3,"age"]
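One caveat, consistent with the point made under Question 1: subsetting with %in% returns rows in the order they appear in dt, once each, regardless of the order or repetition in the query. A sketch comparing the two approaches, assuming the dt implied above (jill's age of 25 is a placeholder):

```r
dt <- data.frame(name = c("jack", "jill", "sam", "dan"),
                 age  = c(20, 25, 28, 13),   # 25 is a placeholder
                 stringsAsFactors = FALSE)
query4 <- c("jack", "sam", "dan", "sam", "jack")

dt[dt$name %in% query4, "age"]     # table order, no repeats
# [1] 20 28 13
dt$age[match(query4, dt$name)]     # query order, repeats kept
# [1] 20 28 13 28 20
```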

Question 3

There are a lot of ways to do this, but I'll throw out one that I find useful: match(). @jlhoward's answer goes into more detail and explains why my previous == examples were wrong.

> match(query1, dt$name) #these give us the index of the *first* matching value
[1] 1 4
> match(query2, dt$name)
[1] 3

> dt$age[match(query1, dt$name)]
[1] 20 13
> dt$age[match(query2, dt$name)]
[1] 28
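A related idiom, not used in the answers here but built on the same idea, is to turn the table into a named vector and index it by name. This is only a sketch under the same assumed dt (jill's age is a placeholder), and it relies on the names being unique:

```r
dt <- data.frame(name = c("jack", "jill", "sam", "dan"),
                 age  = c(20, 25, 28, 13),   # 25 is a placeholder
                 stringsAsFactors = FALSE)

ages <- setNames(dt$age, dt$name)   # named numeric vector: name -> age

query1 <- c("jack", "dan")
unname(ages[query1])
# [1] 20 13
unname(ages[c("sam", "bob")])       # "bob" is a made-up, absent name
# [1] 28 NA
```

As with match(), the result has one element per query element, and absent names come back as NA.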

You can also use %in%. Unlike match, it returns TRUE or FALSE for each element of its left-hand side, depending on whether that element exists in the right-hand side. Be sure to get the order right: dt$name %in% query1 returns TRUE FALSE FALSE TRUE, while query1 %in% dt$name returns TRUE TRUE.

> dt[dt$name %in% query1, 'age']
[1] 20 13

With dplyr you can use filter

> require(dplyr)
> filter(dt, name %in% query1)
  name age
1 jack  20
2  dan  13
> filter(dt, name %in% query1)$age
[1] 20 13