NA/NaN/Inf in data.table 1.9.2

https://stackoverflow.com/questions/22137486

r
data.table

19-10-2022

Question

After checking the new features of data.table 1.9.2, I'm not quite clear about the new handling of NA/NaN/Inf.

The NEWS says:

NA, NaN, +Inf and -Inf are now considered distinct values, may be in keys, can be joined to and can be grouped. data.table defines: NA < NaN < -Inf
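To make the ordering in that NEWS item concrete, here is a minimal base-R sketch that sorts a numeric vector in the order it describes, extended with the finite values and +Inf: NA < NaN < -Inf < finite < +Inf. The helper `dt_key_order()` is hypothetical; it only mimics the documented ordering and is not how data.table sorts internally.

```r
# Hypothetical helper (not part of data.table): order a numeric vector
# as NA < NaN < -Inf < finite values < +Inf.
dt_key_order <- function(x) {
  # Classify each value: 0 = NA, 1 = NaN, 2 = everything else.
  # is.nan() is TRUE only for NaN; is.na() is TRUE for both NA and NaN,
  # so NaN must be tested first.
  kind <- ifelse(is.nan(x), 1L, ifelse(is.na(x), 0L, 2L))
  # Sort by kind first, then by value; -Inf and +Inf sort naturally.
  order(kind, x)
}

x <- c(3, NaN, -Inf, NA, Inf, 1)
x[dt_key_order(x)]
# returns NA, NaN, -Inf, 1, 3, Inf
```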

I don't understand what is meant by "can be joined to and can be grouped".

library(data.table)
DT <- data.table(A = c(NA, NA, 1:3), B = c("a", NA, letters[1:3]))

Now we have NAs in both columns A and B, but I'm a little lost on how to proceed and what the purpose of this new feature is. Could you provide an example to illustrate it?

Thanks a lot!

Solution

In previous versions of data.table, NA, NaN and Inf values could exist in the key, but you could not join to them or use binary search to select those rows consistently with other key values.
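Base R's own comparison rules show why these values are awkward to select with an ordinary vector scan. The sketch below uses plain vectors only (no data.table): `==` never yields TRUE for NA or NaN, and `is.na()` is TRUE for both, so telling NA apart from NaN takes an extra `is.nan()` test.

```r
# Plain base R: why NA and NaN need special treatment when selecting rows.
x <- c(NA, NaN, Inf, 1)

x == NA     # NA NA NA NA: any comparison with NA is NA, never TRUE
is.na(x)    # TRUE TRUE FALSE FALSE: NaN also counts as missing
is.nan(x)   # FALSE TRUE FALSE FALSE: only the NaN is NaN
x == Inf    # NA NA TRUE FALSE: Inf itself compares like a normal number

# Separating NA from NaN therefore needs two tests:
which(is.na(x) & !is.nan(x))  # 1 (the NA)
which(is.nan(x))              # 2 (the NaN)
```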

See "Select NA in a data.table in R" and "data.table subsetting by NaN doesn't work" for examples of SO questions that deal with these issues (you can trace the history through the answers to the feature requests filed in the data.table project).

Now, in 1.9.2 (and above) such things will work.

library(data.table)
# an example data set
DT <- data.table(A = c(NA, NaN, Inf, Inf, -Inf, NA, NaN, 1, 2, 3),
                 B = letters[1:10], key = 'A')
# selection using binary search
DT[.(Inf)]
#     A B
# 1: Inf c
# 2: Inf d
DT[.(-Inf)]
#       A B
# 1: -Inf e
# note that you need to use the right kind of NA (NA_real_ for a numeric column)
DT[.(NA_real_)]
#     A B
# 1: NA a
# 2: NA f
DT[.(NaN)]
#      A B
# 1: NaN b
# 2: NaN g
# grouping works
DT[,.N,by=A]
#       A N
# 1:   NA 2
# 2:  NaN 2
# 3: -Inf 1
# 4:    1 1
# 5:    2 1
# 6:    3 1
# 7:  Inf 2
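Applying the "right kind of NA" advice to the DT from the question: its column A is integer (a bare `NA` combined with `1:3` is coerced to integer) and B is character, so the matching typed constants are `NA_integer_` and `NA_character_`. The check below uses plain vectors standing in for `DT$A` and `DT$B`, so it runs in base R without data.table.

```r
# Plain vectors standing in for the question's DT$A and DT$B.
A <- c(NA, NA, 1:3)             # logical NA combined with integers -> integer
B <- c("a", NA, letters[1:3])   # character

typeof(A)              # "integer"
typeof(B)              # "character"
typeof(NA)             # "logical": a bare NA is logical
typeof(NA_integer_)    # "integer": the typed NA to use for A
typeof(NA_character_)  # "character": the typed NA to use for B
```

With data.table loaded and the question's DT, the keyed lookups would then follow the same pattern as the `NA_real_` example above, for instance `setkey(DT, A); DT[.(NA_integer_)]` and `setkey(DT, B); DT[.(NA_character_)]`, and `DT[, .N, by = B]` should report the NA in B as a group of its own.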

Licensed under: CC-BY-SA with attribution
