Question

So the story is: I have a 30 GB text file that needs to be read into R. It contains two columns and about 2 billion rows of integers! I don't want to load the whole thing in one go; sizeable chunks will suffice.

I've tried using read.table with arguments like nrows = 10000000 and skip = some stupidly large number,
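
Roughly what I ran (the file name and the skip value here are placeholders, not my real ones):

    # what I tried, more or less -- "huge.txt" and the skip value
    # stand in for the real file and offset
    chunk <- read.table("huge.txt", nrows = 10000000, skip = 1500000000)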

but I get the following error once I'm far enough into the file:

Error in readLines(file, skip):
    cannot allocate vector of length 1800000000

Please help me get at the data. Thanks in advance!


Solution

It seems to me that you may need to split the text file into manageable chunks before trying to process them. The Unix split command should do the trick (it can split on a fixed line count with its -l option), but I don't know whether you're on a platform where that command exists.
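
If you'd rather stay inside R, an alternative worth trying is to open the file as a connection once and read it sequentially: read.table on an already-open connection continues from wherever the previous read stopped, so no skip= pass over the file is needed. A minimal sketch, assuming the file is called huge.txt and a chunk size of ten million rows (both are placeholders):

    # read the file in sequential chunks over one open connection;
    # "huge.txt" and the chunk size are assumptions, not your real values
    con <- file("huge.txt", open = "r")
    repeat {
      chunk <- tryCatch(
        read.table(con, nrows = 10000000, header = FALSE,
                   colClasses = c("integer", "integer")),
        error = function(e) NULL)  # read.table errors once no lines remain
      if (is.null(chunk)) break
      # ... process chunk here, then let it be garbage-collected ...
    }
    close(con)

The colClasses argument keeps your two columns as plain integer vectors instead of letting type detection pick something wider, which matters at this scale.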

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow