Question

Since upgrading from R 3.0.3 to 3.1.0, I have had trouble with read.csv; something seems to have changed in the underlying behaviour of read.table.

More precisely, I have a lot of CSV files that were once written using numpy. In general, these CSV files contain nothing more than a few columns of real values, e.g.:

foo,bar,baz
 1.162372390042962556e+00, 2.578863142444774326e+00, 9.740731078696458098e+02
-1.162361054912456337e+00, 6.006949912541799108e-01, 9.740731078696458098e+02
 1.327779088525234963e+00, 2.448484270423362030e+00, 9.664414899055957449e+02

Up to R 3.0.3, everything worked just fine when reading these files. Now I get this:

> tmp <- read.csv("foo.csv")
> str(tmp)
'data.frame':   3 obs. of  3 variables:
 $ foo: Factor w/ 3 levels " 1.162372390042962556e+00",..: 1 3 2
 $ bar: Factor w/ 3 levels " 2.448484270423362030e+00",..: 2 3 1
 $ baz: Factor w/ 2 levels " 9.664414899055957449e+02",..: 2 2 1

Will I have to change all of my codebase? Or is this merely a bug in 3.1.0?


Solution 2

The NEWS file explains a change to the default behaviour for unrepresentable decimal numbers:

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
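As a minimal sketch of that advice, the call below reads the sample data from the question with colClasses set to "numeric". The data is inlined through the text argument only to keep the example self-contained (the csv_text variable is an assumption for illustration); with a real file you would pass the file name instead.

```r
# Sample rows copied from the question, inlined so the example runs on its own.
csv_text <- "foo,bar,baz
 1.162372390042962556e+00, 2.578863142444774326e+00, 9.740731078696458098e+02
-1.162361054912456337e+00, 6.006949912541799108e-01, 9.740731078696458098e+02
 1.327779088525234963e+00, 2.448484270423362030e+00, 9.664414899055957449e+02"

# colClasses = "numeric" is recycled across all columns, so every column is
# parsed as a double instead of being passed through type.convert().
tmp <- read.csv(text = csv_text, colClasses = "numeric")
str(tmp)
```

For a file on disk the equivalent call would be read.csv("foo.csv", colClasses = "numeric"). Note that this only works if every column really is numeric; for mixed files you would give one class per column.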

Your numbers have 18 decimal places (19 significant digits), whereas a double can accurately represent only about 15 to 17 significant digits.
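To see where the limit comes from, here is a rough sketch that derives the number of guaranteed decimal digits from the mantissa size R reports in .Machine, and prints one of the values from the question back at full length. The variable names are just for illustration.

```r
# A double has a 53-bit mantissa on IEEE 754 platforms.
mantissa_bits <- .Machine$double.digits

# Decimal digits that are guaranteed to survive a round trip through a double.
guaranteed_digits <- floor((mantissa_bits - 1) * log10(2))
print(guaranteed_digits)  # 15

# One value from the question, stored as a double and printed with 18 decimals.
# Digits beyond roughly the 17th significant one come from the nearest
# representable double and need not match what was written in the file.
x <- 1.162372390042962556e+00
print(sprintf("%.18e", x))
```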

OTHER TIPS

This is not a bug and yes, you have to change your code.

From the CRAN website:

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
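If you would rather not touch every read.csv call in a large codebase, one option is a small wrapper that sets colClasses for you. This is only a sketch: read_numeric_csv is a hypothetical helper name, not part of R, and it assumes all columns in your files are numeric, as in the question.

```r
# Hypothetical helper (not part of base R): read a CSV whose columns are all
# numeric, forcing double parsing regardless of how many digits were written.
read_numeric_csv <- function(file, ...) {
  read.csv(file, colClasses = "numeric", ...)
}

# Usage on a temporary file containing values in the style of the question.
path <- tempfile(fileext = ".csv")
writeLines(c("foo,bar",
             " 1.162372390042962556e+00, 2.578863142444774326e+00",
             "-1.162361054912456337e+00, 6.006949912541799108e-01"),
           path)
tmp <- read_numeric_csv(path)
str(tmp)
```

Existing calls can then be switched over with a search and replace of the function name, while any extra arguments are still passed through to read.csv.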

Licensed under: CC-BY-SA with attribution