Question

I am running into a big problem importing data into R. The original dataset is over 5 GB, which I cannot possibly read on my laptop with only 4 GB of RAM in total. The dataset has an unknown number of rows (at least thousands). Could I load, say, just the first 2000 rows into R so that the data still fits into my working memory?


Solution

As Scott mentioned, you can limit the number of rows read from a text file with the nrows argument to read.table (and its variants such as read.csv).

You can use this in conjunction with the skip argument to read later chunks of the dataset.

my_file <- "my file.csv"
chunk <- 2000
first <- read.csv(my_file, nrows = chunk)
# Later chunks must also skip the header line; read them without a header
# and reuse the column names from the first chunk.
second <- read.csv(my_file, nrows = chunk, skip = 1 + chunk, header = FALSE)
names(second) <- names(first)
third <- read.csv(my_file, nrows = chunk, skip = 1 + 2 * chunk, header = FALSE)
names(third) <- names(first)
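
If you need to walk through the whole file this way, the same idea extends to a loop. The sketch below assumes a hypothetical process_chunk() function standing in for whatever you want to do with each piece; note that read.csv re-scans the file from the top on every call, so later chunks get progressively slower.

my_file <- "my file.csv"
chunk <- 2000
header <- names(read.csv(my_file, nrows = 1))  # grab the column names once
offset <- 1                                    # lines already consumed (the header)
repeat {
  piece <- tryCatch(
    read.csv(my_file, nrows = chunk, skip = offset,
             header = FALSE, col.names = header),
    error = function(e) NULL)                  # read.csv errors once we skip past the end
  if (is.null(piece) || nrow(piece) == 0) break
  process_chunk(piece)                         # hypothetical: replace with your own processing
  offset <- offset + chunk
}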

You may also want to read the "Large memory and out-of-memory data" section of the CRAN High-Performance and Parallel Computing task view.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow