Question

In the console, go ahead and try

> sum(sapply(1:99999, function(x) { x != as.character(x) }))
0

For every value from 1 through 99999, the comparisons "1" == 1, "2" == 2, ..., 99999 == "99999" are all TRUE. However,

> 100000 == "100000"
FALSE

Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check if every element in an atomic character vector is in fact numeric? Right now I was trying to check whether x == as.numeric(x) for each x, but that fails on certain datasets due to the above problem!


Solution

Have a look at as.character(100000). Its value is not "100000", and R is essentially just telling you so.

as.character(100000)
# [1] "1e+05"

Here, from ?Comparison, are R's rules for applying relational operators to values of different types:

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.

Those rules mean that when you test whether 1 == "1", say, R first converts the numeric value on the LHS to a character string, and then tests the character strings on the LHS and RHS for equality. In some cases those will be equal, but in other cases they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
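A minimal sketch of that coercion, assuming the default scipen = 0 (set explicitly below so the results do not depend on the current session). With that setting, R prefers fixed notation unless the fixed form is wider than the scientific one. So 100000 (six characters) becomes "1e+05" (five characters), while 99999 and 123456 stay in fixed notation:

```r
# Assumption: default scipen; set it explicitly and restore it afterwards.
old <- options(scipen = 0)

as.character(99999)    # "99999"  (fixed is no wider than "9.9999e+04")
as.character(100000)   # "1e+05"  (scientific is narrower than "100000")
as.character(123456)   # "123456" (fixed is narrower than "1.23456e+05")

# The numeric side is coerced to character, then the strings are compared.
small <- 99999  == "99999"    # TRUE
big   <- 100000 == "100000"   # FALSE
sci   <- 100000 == "1e+05"    # TRUE
print(c(small = small, big = big, sci = sci))

options(old)  # restore the previous setting
```

This is why the failures start exactly at 100000 and affect only "round" numbers whose scientific form is shorter than their fixed form.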

So, when you type 100000 == "100000", it is as if you were actually performing the following test. (Note that internally, R may well use something other than as.character() to perform the conversion.)

as.character(100000)=="100000"
# [1] FALSE
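
As for the workaround the question asks for, one option is to coerce in the other direction, from character to numeric, so that the string formatting of large numbers never enters the comparison. The following is a sketch under stated assumptions, not a definitive answer: is_numeric_string is a hypothetical helper name, and it counts anything as.numeric() can parse as numeric, which includes "1e+05" and strings with surrounding whitespace.

```r
# Hypothetical helper: TRUE for each element that as.numeric() can parse.
# Unparseable strings become NA (with a warning, suppressed here).
is_numeric_string <- function(x) {
  !is.na(suppressWarnings(as.numeric(x)))
}

x <- c("1", "100000", "1e+05", "abc", "3.14")
flags <- is_numeric_string(x)
print(flags)        # TRUE TRUE TRUE FALSE TRUE
print(all(flags))   # FALSE, because "abc" is not numeric

# Comparing on the numeric side avoids the string conversion entirely.
num_cmp <- 100000 == as.numeric("100000")               # TRUE

# If a string is really needed, request fixed notation explicitly.
str_cmp <- format(100000, scientific = FALSE) == "100000"  # TRUE
print(c(num_cmp = num_cmp, str_cmp = str_cmp))
```

Note that this check also returns FALSE for missing values and for "NaN", since both yield NA under is.na(); adjust it if those should count as numeric in your data.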
Licensed under: CC-BY-SA with attribution