Question

I noticed that frequency tables created by data.table in R do not seem to distinguish between very small numbers and zero. Can I change this behavior, or is this a bug?

Reproducible example:

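library(data.table)
# One numeric column (auto-named V1) holding a tiny value, two ordinary values and zero
DT <- data.table(c(0.0000000000000000000000000001, 2, 9999, 0))
# Distinct values according to unique()
test1 <- as.data.frame(unique(DT[, V1]))
# Frequency table: count of rows per distinct V1
test2 <- DT[, .N, by = V1]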

As you can see, the frequency table (test2) does not distinguish between 0.0000000000000000000000000001 and 0 and puts both observations in the same group.

Data.table version: 1.8.10
R: 3.0.2


Solution

It is worth reading R FAQ 7.31 and thinking about the accuracy of floating point representations.
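As a small illustration of the FAQ 7.31 point, here is a base R sketch (not taken from the original answer) that contrasts exact comparison with comparison within a tolerance. The variable names are made up for this example.

```r
# Exact comparison of doubles can surprise: 0.1 + 0.2 is not stored as exactly 0.3
x <- 0.1 + 0.2
exact_match <- x == 0.3                       # FALSE
close_match <- isTRUE(all.equal(x, 0.3))      # TRUE, compared within a tolerance

# The value from the question is tiny but is not zero
tiny <- 1e-28
tiny_is_zero   <- tiny == 0                   # FALSE
tiny_near_zero <- isTRUE(all.equal(0, tiny))  # TRUE under all.equal's default tolerance

print(c(exact_match, close_match, tiny_is_zero, tiny_near_zero))
```

So whether 1e-28 and 0 count as "the same" depends entirely on which comparison rule is applied.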

I can't reproduce this in the current CRAN version (1.9.2), using:

R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

My guess is that the change in behaviour is related to this NEWS item:

o Numeric data is still joined and grouped within tolerance as before but instead of tolerance being sqrt(.Machine$double.eps) == 1.490116e-08 (the same as base::all.equal's default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a global option (e.g. 2, 1 or 0 byte rounding) in a future release.
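To make the difference between the two schemes concrete, here is a rough base R sketch. It assumes that signif(x, 11) is an acceptable decimal stand-in for "about 11 significant figures"; data.table itself rounds the binary significand, so this is an analogy and not its actual implementation. All variable names are illustrative.

```r
# Old scheme: a fixed absolute tolerance, the same as all.equal's default
tol <- sqrt(.Machine$double.eps)
print(tol)                                    # 1.490116e-08

# Anything this close to zero falls within that tolerance, so 1e-28 and 0 looked equal
old_same <- abs(1e-28 - 0) < tol              # TRUE

# New scheme, approximated here in decimal: keep roughly 11 significant figures.
# 1e-28 keeps its single significant figure and stays different from 0.
new_same <- signif(1e-28, 11) == signif(0, 11)  # FALSE

# Values that differ only beyond the 11th significant figure still collapse together
a <- 1.2345678901234
b <- 1.2345678901239
noise_same <- signif(a, 11) == signif(b, 11)    # TRUE

print(c(old_same, new_same, noise_same))
```

The design point is that rounding the significand is relative to the magnitude of each number, whereas the old tolerance was a single absolute cut-off.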


Update from Matt

Yes this was a deliberate change in v1.9.2 and data.table now distinguishes 0.0000000000000000000000000001 from 0 (as user3340145 rightly thought it should) due to the improved rounding method highlighted above from NEWS.

I've also added the for loop test from Rick's answer to the test suite.

Btw, #5369 is now implemented in v1.9.3 (although neither of these is needed for this question):

o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs.

o New function setNumericRounding() may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', #5369. See example in ?setNumericRounding and NEWS item from v1.9.2. getNumericRounding() returns the current setting.

Notice that rounding is now (as from v1.9.2) about the accuracy of the significand; i.e. the number of significant figures. 0.0000000000000000000000000001 == 1.0e-28 is accurate to just 1 s.f., so the new rounding method doesn't group this together with 0.0.
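A minimal sketch of how the setting might be inspected and changed, assuming a data.table version that provides setNumericRounding() and getNumericRounding() (v1.9.3 or later per the NEWS item above). The column name V1 and the variable names are chosen for this example only.

```r
library(data.table)

# Remember the current setting so it can be restored afterwards
old <- getNumericRounding()

DT <- data.table(V1 = c(1e-28, 2, 9999, 0))

# 2-byte rounding: the behaviour introduced in v1.9.2
setNumericRounding(2L)
res2 <- DT[, .N, by = V1]

# 0-byte rounding: no rounding, numerics are grouped on their exact value
setNumericRounding(0L)
res0 <- DT[, .N, by = V1]

# Put the original setting back
setNumericRounding(old)

print(res2)
print(res0)
```

With either setting, 1e-28 and 0 should land in separate groups, which matches the statement that neither function is needed for this question.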

In short, the answer to the question is: upgrade from v1.8.10 to v1.9.2 or later.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow