Question

I'm importing a large .csv file into R (about 0.5 million rows), so I've been trying to use fread() from the data.table package as a faster alternative to read.table() and read.csv(). However, fread() returns a data frame with all of the data from the rows inside one row, even though it has the correct number of columns. I found a bug report from 2013 showing this is related to the integer64 data class:

http://r-forge.r-project.org/tracker/index.php?func=detail&aid=2786&group_id=240&atid=975

Are there any fixes or ways to get around this?

The .csv file I'm trying to read is entirely integers ranging from 0 - 10000, with no missing data. I'm using R version 2.15.2 on a Windows 7 computer, with version 1.8.8 of the data.table package.

The code I'm running is:

library(data.table)
pre <- fread("pre2012_alldatapoints.csv", sep = ",", header = TRUE)
head(pre)

1: 1 22 -105 22 -105
2: 2 22 -105 22 -105
3: 3 20 -105 20 -105
4: 4 21 -105 21 -105
5: 5 21 -105 21 -105
6: 6 21 -105 21 -105

dim(pre)
[1] 12299  5 #dim returns the correct number of dimensions
#this is a subset of the file I want to import that I've confirmed imports correctly with read.csv
   
pre[,1]
[1] 1 #but trying to print a column returns this

length(pre[,1])
[1] 1 #and length for any column returns a row length of 1

Solution

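fread creates a data.table, not a plain data.frame, and [ behaves differently on a data.table. In the version used in the question, the second argument (j) is evaluated as an expression, so pre[,1] evaluates the constant 1 and returns 1 instead of the first column. The data is not collapsed into one row, which is why dim(pre) is correct. The data.table package comes with a number of vignettes that cover this; see the website to read more: https://rdatatable.gitlab.io/data.table/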

Since this question was posted, the internals of data.table have changed (as of version 1.9.8) such that pre[ , 1] now returns the first column, as originally expected.
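For anyone on an older version, or who wants code that does not depend on the version, here is a minimal sketch of column access that should behave the same either way. The temporary file and the column names (id, a, b, c, d) are made up for illustration and stand in for the real CSV, whose header is not shown in the question.

```r
library(data.table)

# Hypothetical stand-in for the CSV in the question: five integer columns.
# The column names are assumptions; the real file's header is not shown.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,a,b,c,d",
             "1,22,-105,22,-105",
             "2,22,-105,22,-105",
             "3,20,-105,20,-105"), csv_path)

pre <- fread(csv_path, sep = ",", header = TRUE)

# fread returns a data.table, which also inherits from data.frame.
class(pre)

# Extract the first column as a plain vector (list-style access).
first_col <- pre[[1]]
length(first_col)

# Select the first column by position as a one-column data.table.
first_dt <- pre[, 1, with = FALSE]

# Refer to a column by its unquoted name inside j.
a_values <- pre[, a]

# If data.frame semantics are preferred, convert explicitly.
pre_df <- as.data.frame(pre)
length(pre_df[, 1])
```

Using [[ or with = FALSE avoids relying on how j is interpreted, so the same lines should work on both old and current releases.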

Licensed under: CC-BY-SA with attribution