For a true lookup table, the result should be the length of the query and also deal with replication in the query. The approaches using match(...)
are the only ones that do this:
query4 <- c("jack","sam", "dan","sam","jack")
dt[match(query4,dt$name),]$age
# [1] 20 28 13 28 20
This is because match(LHS,RHS)
returns an integer vector of length(LHS) which contains the row numbers of the RHS which match the corresponding element of LHS.
The approaches based on comparison (==) will generally not work. This s because, when comparing two vectors, R tries to replicate the shorter one however many times are needed to make it the same length as the longer one, and then does an element-by-element comparison. So in the case of dt$name==query1
, for example, the RHS gets replicated twice and the comparison is between c("jack","jill","sam","dan")
and c("jack","dan","jack","dan")
.
dt$name==query1 # RHS is: c("jack","dan","jack","dan")
# [1] TRUE FALSE FALSE TRUE
dt$name==query2 # RHS is: c("sam","sam","sam","sam")
# [1] FALSE FALSE TRUE FALSE
dt$name==query3 # RHS is: c("jack","sam", "dan","jack") with warning
# [1] TRUE FALSE FALSE FALSE
# with warning: longer object length is not a multiple of shorter object length
On the other hand, using LHS %in% RHS
gives a result with length(LHS) and either T or F depending on whether that element is present in RHS.
dt$name %in% query1
# [1] TRUE FALSE FALSE TRUE
query1 %in% dt$name
# [1] TRUE TRUE
Note that it looks like df$name %in% query1
and df$name==query1
give the same result, but that's an artifact of query1
being replicated twice in the latter comparison. See, for example:
dt$name %in% query3
# [1] TRUE FALSE TRUE TRUE
dt$name == query3
# [1] TRUE FALSE FALSE FALSE