Question

I have a dataset stored in a text file with 997 columns and 45,000 rows. All values are doubles except the row names and column names. I use RStudio with the read.table command to read the data file, but it seems to take hours, so I aborted it.

By comparison, Excel opens the same file in about 2 minutes.

RStudio seems inefficient at this task. Are there any suggestions for making it faster? I don't want to read the text file every time.

I plan to load it once and store it as an .RData object, which should make loading faster in the future. But the first load does not seem to work.

I am not a computer science graduate, so any help would be much appreciated.


Solution

I recommend data.table's fread(), although you will end up with a data.table rather than a plain data frame. If you would rather not work with a data.table, you can simply convert the result back to a normal data frame.

library(data.table)                     # provides fread()
data <- fread("yourpathhere/yourfile")
data <- as.data.frame(data)             # optional: convert back to a plain data.frame
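Once the file has been read, the load-once plan from the question can be handled with base R's saveRDS() and readRDS(), which store a single object in R's binary format (save() and load() are the .RData equivalents for several objects). A minimal sketch, assuming a small simulated data frame stands in for the real data; the file name mydata.rds is hypothetical:

```r
# Stand-in for the data frame produced by fread() or read.table();
# here it is a small simulated frame, not the real data.
dat <- data.frame(matrix(rnorm(500 * 10), ncol = 10))

# Hypothetical cache location; use a permanent path for real data.
cache_path <- file.path(tempdir(), "mydata.rds")

# One-time step: serialize the object to disk (compressed by default).
saveRDS(dat, cache_path)

# Later sessions: reload the binary file instead of re-parsing the text file.
dat2 <- readRDS(cache_path)
```

readRDS() returns the object, so it can be assigned to any name; load() instead restores objects under the names they were saved with.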

OTHER TIPS

As documented in the ?read.table help file, three arguments can dramatically speed up the import and/or reduce the memory it requires. First, colClasses: telling read.table what type of data each column contains avoids the overhead of read.table guessing the type of each column. Second, nrows: telling read.table how many rows the file has avoids allocating more memory than is actually required. Finally, comment.char: if the file contains no comments, setting comment.char = "" tells R not to look for them, which reduces the resources required to import the data (read.csv already uses comment.char = "" by default; read.table does not). Setting colClasses and comment.char, I was able to read a .csv file with 997 columns and 45000 rows in under two minutes on a laptop with relatively modest hardware:
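A minimal sketch of passing all three arguments to read.table, assuming a whitespace-separated file laid out like the one in the question: a header line, row names in the first column, and numeric values everywhere else. The file is a small simulated stand-in written to a temporary path, and n_rows and n_cols are illustrative. Note that colClasses counts the row-name column too, hence the leading "character".

```r
# Illustrative sizes; the real file would use 45000 rows and 997 columns.
n_rows <- 1000L
n_cols <- 20L

# Simulated stand-in: numeric matrix with row and column names.
m <- matrix(rnorm(n_rows * n_cols), nrow = n_rows,
            dimnames = list(paste0("r", seq_len(n_rows)),
                            paste0("V", seq_len(n_cols))))
path <- tempfile(fileext = ".txt")
write.table(m, path)  # header line plus a row-name column, space-separated

x <- read.table(path,
                header = TRUE,
                row.names = 1,                                        # first column holds row names
                colClasses = c("character", rep("numeric", n_cols)),  # no type guessing
                nrows = n_rows,                                       # no over-allocation
                comment.char = "")                                    # no comment scanning
```

Per ?read.table, even a mild over-estimate of nrows still helps memory usage, so an approximate row count is fine.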

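# simulate a 45000 x 997 numeric data set and write it to disk
tmp <- data.frame(matrix(rnorm(997*45000), ncol = 997))
write.csv(tmp, "tmp.csv", row.names = FALSE)

# time the import with every column declared numeric and comment scanning off
system.time(x <- read.csv("tmp.csv", colClasses="numeric", comment.char = ""))
#   user  system elapsed 
#115.253   2.574 118.471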

I also tried reading the file with the default read.csv arguments, but gave up after about 30 minutes.
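When the columns are not all of one type, a common way to supply colClasses is to read a few rows, record the classes read.csv guessed for them, and pass those classes to the full read. A minimal sketch on a small simulated file with hypothetical columns id, label and value; it assumes the first rows are representative, since a column that looks like integers in the sample but holds decimals further down would make the full read fail.

```r
# Small simulated file with mixed column types (hypothetical columns).
n <- 2000L
df <- data.frame(id = seq_len(n),
                 label = paste0("s", seq_len(n)),
                 value = rnorm(n))
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Step 1: let read.csv guess the types from the first 100 rows only.
sample_rows <- read.csv(path, nrows = 100, stringsAsFactors = FALSE)
classes <- vapply(sample_rows, class, character(1))

# Step 2: reuse the guessed classes so the full read skips type detection.
full <- read.csv(path, colClasses = classes, nrows = n)
```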

Licensed under: CC-BY-SA with attribution