Question

In R I am finding some odd behaviour that I can't explain and I am hoping someone here can. I believe that the value of 100! is this big number.

A few lines from the console showing expected behaviour...

>factorial( 10 )
[1] 3628800
>prod( 1:10 )
[1] 3628800
> prod( as.double(1:10) )
[1] 3628800
> cumprod( 1:10 )
[1]       1       2       6      24     120     720    5040   40320  362880 3628800

However when I try 100! I get (notice how the resulting numbers begin to differ about 14 digits in):

> options(scipen=200) #set so the whole number shows in the output
> factorial(100)
[1] 93326215443942248650123855988187884417589065162466533279019703073787172439798159584162769794613566466294295348586598751018383869128892469242002299597101203456
> prod(1:100)
[1] 93326215443944102188325606108575267240944254854960571509166910400407995064242937148632694030450512898042989296944474898258737204311236641477561877016501813248
> prod( as.double(1:100) )
[1] 93326215443944150965646704795953882578400970373184098831012889540582227238570431295066113089288327277825849664006524270554535976289719382852181865895959724032
> all.equal( prod(1:100) , factorial(100) , prod( as.double(1:100) ) )
[1] TRUE

If I do some tests against a variable set to the 'known' number of 100! then I see the following:

# This is (as far as I know) the 'true' value of 100!
> n<- as.double(93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000)
> factorial(100) - n
[1] -1902315522848807765998160811905210717565551993186466795054798772271710903343294674760811531554315419925519536152107160826913610179566298858520576
> prod(1:100) - n
[1] -48777321098687378615337456715518223527321845979140174232174327494146433419058837814379782860367062049372295798771978482741374619988879457910784
> prod(as.double(1:100)) - n
[1] 0

The final result evaluates to zero, but the number returned for prod( as.double( 1:100 ) ) does not display as I would expect, even though it correctly evaluates prod( as.double( 1:100 ) ) - n where n is a variable set to the value of 100!.

Can anyone explain this behaviour to me please? It should not be related to overflow etc as far as I am aware, as I am using a x64 system. Version and machine info below:

> .Machine$double.xmax
[1] 1.798e+308
> str( R.Version() )
List of 14
 $ platform      : chr "x86_64-apple-darwin9.8.0"
 $ arch          : chr "x86_64"
 $ os            : chr "darwin9.8.0"
 $ system        : chr "x86_64, darwin9.8.0"
 $ status        : chr ""
 $ major         : chr "2"
 $ minor         : chr "15.2"
 $ year          : chr "2012"
 $ month         : chr "10"
 $ day           : chr "26"
 $ svn rev       : chr "61015"
 $ language      : chr "R"
 $ version.string: chr "R version 2.15.2 (2012-10-26)"
 $ nickname      : chr "Trick or Treat"

Can anyone explain this to me? I don't doubt that R does everything correctly and this is most likely useR related. You might point out that since prod( as.double( 1:100 ) ) - n evaluates correctly what am I bothered about, but I am doing Project Euler Problem 20 so I needed the correct digits displayed.

Thanks

Was it helpful?

Solution 2

Your test with all.equal does not produce what you expect. all.equal can only compare two values. The third argument is positionally matched to tolerance, which gives the tolerance of the comparison operation. In your invocation to all.equal you give it a tolerance of 100! which definitely leads to the comparison being true for absurdly differing values:

> all.equal( 0, 1000000000, prod(as.double(1:100)) )
[1] TRUE

But even if you give it two arguments only, e.g.

all.equal( prod(1:100), factorial(100) )

it would still produce TRUE because the default tolerance is .Machine$double.eps ^ 0.5, e.g. the two operands have to match to about 8 digits which is definitely the case. On the other hand, if you set the tolerance to 0, then neither three possible combinations emerge equal from the comparison:

> all.equal( prod(1:100), factorial(100), tolerance=0.0 )
[1] "Mean relative difference: 1.986085e-14"
> all.equal( prod(1:100), prod( as.double(1:100) ), tolerance=0.0 )
[1] "Mean relative difference: 5.22654e-16"
> all.equal( prod(as.double(1:100)), factorial(100), tolerance=0.0 )
[1] "Mean relative difference: 2.038351e-14"

Also note that just because you've told R to print 200 significant numbers doesn't mean that they are all correct. Indeed, 1/2^53 has about 53 decimal digits but only the first 16 are considered meaningful.

This also makes your comparison to the "true" value flawed. Observe this. The ending digits in what R gives you for factorial(100) are:

...01203456

You subtract n from it, where n is the "true" value of 100! so it should have 24 zeroes at the end and hence the difference should also end with the same digits that factorial(100) does. But rather it ends with:

...58520576

This only shows that all those digits are non-significant and one should not really look into their value.

It takes 525 bits of binary precision in order to exactly represent 100! - that's 10x the precision of double.

OTHER TIPS

This has to do not with the maximum value for a double but with its precision.

100! has 158 significant (decimal) digits. IEEE doubles (64 bit) have 52 bits of storage space for the mantissa, so you get rounding errors after about 16 decimal digits of precision have been exceeded.

Incidentally, 100! is in fact, as you suspected,

93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

so all of the values R calculated are incorrect.

Now I don't know R, but it seems that all.equal() converts all three of those values to floats before comparing, and so their differences are lost.

I will add a third answer just to graphically describe the behaviour you are encountering. Essentially, the double precision for factorial calculation is sufficient up to 22!, then it starts diverging more and more from the real value.

Around the 50!, there is a further distinction between the two methods factorial(x) and prod(1:x), with the latter yielding, as you indicated, values more similar to the "real" factor.

Factorial calculation precision in R

Code attached:

# Precision of factorial calculation (very important for the Fisher's Exact Test)
library(gmp)
perfectprecision<-list()
singleprecision<-c()
doubleprecision<-c()
for (x in 1:100){
    perfectprecision[x][[1]]<-factorialZ(x)
    singleprecision<-c(singleprecision,factorial(x))
    doubleprecision<-c(doubleprecision,prod(1:x))
}


plot(0,col="white",xlim=c(1,100),ylim=c(0,log10(abs(doubleprecision[100]-singleprecision[100])+1)),
        ,ylab="Log10 Absolute Difference from Big Integer",xlab="x!")
for(x in 1:100) {
    points(x,log10(abs(perfectprecision[x][[1]]-singleprecision[x])+1),pch=16,col="blue")
    points(x,log10(abs(perfectprecision[x][[1]]-doubleprecision[x])+1),pch=20,col="red")
}
legend("topleft",col=c("blue","red"),legend=c("factorial(x)","prod(1:x)"),pch=c(16,20))

Well, you can tell from the body of factorial that it calls gamma, which calls .Primitive("gamma"). What does .Primitive("gamma") look like? Like this.

For large inputs, .Primitive("gamma")'s behaviour is on line 198 of that code. It's calling

exp((y - 0.5) * log(y) - y + M_LN_SQRT_2PI +
            ((2*y == (int)2*y)? stirlerr(y) : lgammacor(y)));

which is just an approximation.


By the way, the article on Rmpfr uses factorial as its example. So if you're trying to solve the problem, "just use the Rmpfr library".

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top