Understanding .I in data.table in R

Question

I was playing around with data.table and I came across a distinction that I'm not sure I quite understand. Given the following dataset:

library(data.table)

set.seed(400)
DT <- data.table(x = sample(LETTERS[1:5], 20, TRUE), key = "x"); DT

Can you please explain to me the difference between the following expressions?

1) DT[J("E"), .I]

2) DT[ , .I[x == "E"] ]

3) DT[x == "E", .I]

Solution

set.seed(400)
library(data.table)

DT <- data.table(x = sample(LETTERS[1:5], 20, TRUE), key = "x"); DT

DT[  , .I[x == "E"] ] # [1] 18 19 20

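returns an integer vector, not a data.table. No i is supplied, so .I holds the row numbers of the whole of DT, and indexing it with x == "E" keeps the row numbers of the "E" rows in the ORIGINAL dataset DT.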
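A minimal sketch of the same idea on a small hand-built table, so the result does not depend on the random seed. The name `toy` and its values are made up for illustration and are not part of the question.

```r
library(data.table)

# Hypothetical toy data, keyed (and therefore sorted) by x: A A B E E E
toy <- data.table(x = c("E", "A", "E", "B", "A", "E"), key = "x")

# With no i, .I is just the row numbers of the whole table
all_rows <- toy[, .I]            # 1 2 3 4 5 6

# Indexing .I by a condition keeps the row numbers where it holds
e_rows <- toy[, .I[x == "E"]]    # 4 5 6 (the E rows sort last)

# Same positions as base R's which() on the column
same_as_which <- identical(e_rows, which(toy$x == "E"))
```

In this form the filtering happens inside j, so .I is never evaluated on a reduced set of rows.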

DT[J("E")  , .I]   # [1] 1 2 3

DT["E"     , .I]   # [1] 1 2 3

DT[x == "E", .I]   # [1] 1 2 3

all return the same thing. In each case the rows are selected in i first, so .I is evaluated on the subset, and the result is a vector of the row numbers of the "E" rows within the NEW subsetted data.
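If the aim is to filter in i and still get row numbers in the original table, one option not mentioned in the answer is the `which = TRUE` argument of `[.data.table`, which returns the matching row numbers instead of the rows themselves. A sketch, again on hypothetical toy data (the name `toy` is an assumption for illustration):

```r
library(data.table)

# Hypothetical toy data, keyed (and therefore sorted) by x: A A B E E E
toy <- data.table(x = c("E", "A", "E", "B", "A", "E"), key = "x")

# which = TRUE returns the row numbers of toy matched by i (j must be omitted)
by_filter <- toy[x == "E", which = TRUE]   # 4 5 6
by_join   <- toy[J("E"), which = TRUE]     # 4 5 6, via the keyed join

# Both agree with indexing .I inside j
in_j <- toy[, .I[x == "E"]]                # 4 5 6
```

The first two calls keep the condition in i, where the key can be used for the lookup, while still reporting positions in the full table.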

Licensed under: CC-BY-SA with attribution
