Since upgrading from R 3.0.3 to 3.1.0, I have had trouble with read.csv; something seems to have changed in the underlying behaviour of read.table.

More precisely, I have a lot of CSV files that were once written using numpy. In general, these CSV files contain nothing more than a few columns of real values, e.g.:

foo,bar,baz
 1.162372390042962556e+00, 2.578863142444774326e+00, 9.740731078696458098e+02
-1.162361054912456337e+00, 6.006949912541799108e-01, 9.740731078696458098e+02
 1.327779088525234963e+00, 2.448484270423362030e+00, 9.664414899055957449e+02

Up to R 3.0.3, everything worked just fine when reading these files. Now I get this:

> tmp <- read.csv("foo.csv")
> str(tmp)
'data.frame':   3 obs. of  3 variables:
 $ foo: Factor w/ 3 levels " 1.162372390042962556e+00",..: 1 3 2
 $ bar: Factor w/ 3 levels " 2.448484270423362030e+00",..: 2 3 1
 $ baz: Factor w/ 2 levels " 9.664414899055957449e+02",..: 2 2 1

Will I have to change all of my codebase? Or is this merely a bug in 3.1.0?


Solution

The NEWS file explains a change to the default behaviour for unrepresentable decimal numbers:

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".

Your numbers have 18 decimal places (19 significant digits), whereas a double can accurately represent only about 15 significant digits.
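
To make the digit count concrete, here is a minimal sketch in R that counts the significant digits in one value from the question and compares that with what a double can guarantee. The variable names are illustrative only, and the 53-bit figure assumes a typical IEEE 754 platform.

```r
# First value of column "foo" from the question, as written by numpy.
txt <- "1.162372390042962556e+00"

# Count the significant digits in the mantissa (everything before the "e").
mantissa <- sub("e.*$", "", txt)
n_digits <- nchar(gsub("[^0-9]", "", mantissa))
n_digits                                    # 19

# Bits in a double's significand (assumption: IEEE 754, so 53).
bits <- .Machine$double.digits

# Decimal digits that are guaranteed to survive a round trip through a double.
safe_digits <- floor((bits - 1) * log10(2))
safe_digits                                 # 15 when bits is 53

# An explicit conversion still works; it simply rounds to the nearest double.
x <- as.numeric(txt)
is.numeric(x)                               # TRUE
```

The text carries more digits than a double can guarantee, which is the situation the NEWS entry describes.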

Other tips

This is not a bug and yes, you have to change your code.

From the CRAN website:

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
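
As a rough sketch of what that change could look like, the snippet below recreates the sample file from the question in a temporary location (the `csv_path` name is just for illustration) and reads it with `colClasses = "numeric"`, as the NEWS entry suggests. It assumes every column in the file is meant to be numeric; a file with mixed column types would need a vector of classes instead.

```r
# Recreate the sample file from the question in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "foo,bar,baz",
  " 1.162372390042962556e+00, 2.578863142444774326e+00, 9.740731078696458098e+02",
  "-1.162361054912456337e+00, 6.006949912541799108e-01, 9.740731078696458098e+02",
  " 1.327779088525234963e+00, 2.448484270423362030e+00, 9.664414899055957449e+02"
), csv_path)

# Ask for doubles explicitly instead of letting type.convert() choose the type.
# A single class name is recycled across all columns.
tmp <- read.csv(csv_path, colClasses = "numeric")

# Every column should now be numeric rather than a factor.
str(tmp)
```

Since `read.csv()` passes `colClasses` through to `read.table()`, existing calls only need the one extra argument.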

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow