Question

I have a very large text file that I cannot read into R with read.table() because of its size. I know that readLines() lets you specify how many lines to import, but I need to import one line at a time in a for loop and either save it to a new file or store it in a vector, list, or similar.

In Python, that would be something like:

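with open("myfile.txt", mode="r") as myfile:
    for line in myfile:
        line = line.strip()        # drop the trailing newline and surrounding whitespace
        line = line.split("\t")    # split the line into tab-separated fields
        print(line)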

Is that possible with R?


Solution

Give scan() a try. The skip argument skips the lines that have already been read, and nlines sets how many lines to read, so you can loop through the file one line at a time.

> large <- 10000
> m <- matrix(sample(c(0,1),3*7,replace=TRUE), ncol=3)
> write.table(m, "test.txt")

> for(i in 0:large) {
+     l <- scan("test.txt", what = character(), skip = i, nlines = 1)
+     if(length(l) == 0) break
+     print (l)
+ }

Read 3 items
[1] "V1" "V2" "V3"
Read 4 items
[1] "1" "0" "1" "0"
Read 4 items
[1] "2" "0" "0" "0"
Read 4 items
[1] "3" "0" "0" "0"
Read 4 items
[1] "4" "0" "1" "1"
Read 4 items
[1] "5" "1" "1" "1"
Read 4 items
[1] "6" "1" "0" "1"
Read 4 items
[1] "7" "0" "0" "1"
Read 0 items

The code illustrates how to apply scan() and how to tell when to stop reading.
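Because scan() is given a file name here, every call reopens the file and skips forward from the top, which gets slow on a very large file. A minimal sketch of an alternative, assuming a tab-separated file, is to open a connection once and call readLines() with n = 1 until it returns nothing. The sample file, the path variable, and the fields list are made up for illustration.

```r
# Hypothetical sample data: a small tab-separated file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6"), path)

# Open the connection once so each read continues where the last one stopped.
con <- file(path, open = "r")
fields <- list()
repeat {
  line <- readLines(con, n = 1)
  if (length(line) == 0) break   # zero lines returned: end of file
  # Mirror the Python snippet: strip whitespace, then split on tabs.
  fields[[length(fields) + 1]] <- strsplit(trimws(line), "\t", fixed = TRUE)[[1]]
}
close(con)

print(fields)
```

Only one line is held in memory at a time; storing every line in fields is done here just to show the result, and on a real large file you would process or write out each line inside the loop instead.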

Other tips

While Яaffael's answer is enough, this is a typical use case for the iterators package.

With this package you iterate over the file line by line without loading all the data into memory. As an example, I will work through the Airlines data with this method. Download the 1988 file (1988.csv.bz2) and run this code:

> install.packages('iterators')
> library(iterators)
> con <- bzfile('1988.csv.bz2', 'r')

OK, now you have a connection to your file. Let's create an iterator:

> it <- ireadLines(con, n=1) ## read just one line from the connection (n=1)

Just to test:

> nextElem(it)

and you will see something like:

[1] "1988,1,9,6,1348,1331,1458,1435,PI,942,NA,70,64,NA,23,17,SYR,BWI,273,NA,NA,0,NA,0,NA,NA,NA,NA,NA"

> nextElem(it) 

and you will see the next line, and so on.

If you want to read line by line till the end of the file you can use

> tryCatch(expr=nextElem(it), error=function(e) return(FALSE))

for example. When the file ends, it returns a logical FALSE.
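Putting the pieces together, a full pass over a file could look like the sketch below. It assumes the iterators package is installed and relies on nextElem() signalling an error once the input is exhausted, as described above. The sample file and the n_lines counter are invented for illustration.

```r
library(iterators)

# Hypothetical sample data standing in for the large file.
path <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), path)

con <- file(path, open = "r")
it <- ireadLines(con, n = 1)   # one line per call to nextElem()

n_lines <- 0
repeat {
  # At end of file nextElem() raises an error, which is turned into FALSE here.
  line <- tryCatch(nextElem(it), error = function(e) FALSE)
  if (identical(line, FALSE)) break
  n_lines <- n_lines + 1       # replace with the real per-line processing
}
close(con)

print(n_lines)
```

Lines come back as character vectors, so a line whose text is "FALSE" is not identical to the logical FALSE and does not end the loop early.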

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow