How to read file in reverse order in python3.2 without reading the whole file to memory? [duplicate]

StackOverflow https://stackoverflow.com/questions/22286332

  •  11-06-2023

Question

I am parsing log files of 1 to 10 GB with Python 3.2. I need to search for lines matching a specific regex (a kind of timestamp), and I want to find the last occurrence.

I have tried to use:

for line in reversed(list(open("filename"))):

which resulted in very bad performance (in the good cases) and MemoryError in the bad cases.

In the thread "Read a file in reverse order using python" I did not find a good answer.

The solution in "python head, tail and backward read by lines of a text file" looked very promising; however, it does not work in Python 3.2 and fails with the error:

NameError: name 'file' is not defined

I later tried replacing File(file) with File(TextIOWrapper), since that is the type of object the built-in open() returns, but that produced several more errors (I can elaborate if someone suggests this is the right way :)).


Solution

This function does what you're looking for:

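def reverse_lines(filename, BUFSIZE=4096, encoding="utf-8"):
    # The file is opened in binary mode so that seeking to arbitrary byte
    # offsets is allowed; in Python 3 the buffers are therefore bytes, not str.
    # The encoding parameter is an assumption (UTF-8 or another
    # ASCII-compatible encoding); adjust it to match your log files.
    with open(filename, "rb") as f:
        f.seek(0, 2)              # jump to the end of the file
        p = f.tell()              # offset where the unread region ends
        remainder = b""           # partial line carried over from the last block
        while p > 0:
            sz = min(BUFSIZE, p)
            p -= sz
            f.seek(p)
            buf = f.read(sz) + remainder
            if b"\n" not in buf:
                remainder = buf   # no complete line yet, keep accumulating
            else:
                i = buf.index(b"\n")
                # Everything after the first newline consists of complete lines.
                for L in buf[i+1:].split(b"\n")[::-1]:
                    yield L.decode(encoding)
                remainder = buf[:i]
        yield remainder.decode(encoding)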

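It works by reading a buffer from the end of the file (4 KiB by default) and yielding all the complete lines in it in reverse order. It then moves back by another 4 KiB and repeats until it reaches the beginning of the file. The partial line at the start of each buffer is carried over and prepended to the next one, so the code may need to keep more than 4 KiB in memory when the section being processed contains no linefeed (very long lines). If the file ends with a newline, the first value yielded is an empty string.

You can use the code like this: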

for L in reverse_lines("my_big_file"):
   ... process L ...
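
For the original goal (finding the last line that matches a regex), the loop can simply stop at the first hit when iterating backwards. Below is a minimal sketch of that idea, not a definitive implementation. It repeats a bytes-based variant of the reverse reader so that the snippet runs on its own; the name find_last_match, the timestamp pattern in the usage note, and the UTF-8 decoding are all assumptions made for illustration.

```python
import re


def reverse_lines(filename, bufsize=4096):
    """Yield the lines of a file as bytes, last line first."""
    with open(filename, "rb") as f:
        f.seek(0, 2)                      # move to the end of the file
        pos = f.tell()
        remainder = b""
        while pos > 0:
            size = min(bufsize, pos)
            pos -= size
            f.seek(pos)
            buf = f.read(size) + remainder
            first_nl = buf.find(b"\n")
            if first_nl == -1:
                remainder = buf           # no complete line in this block yet
                continue
            for line in reversed(buf[first_nl + 1:].split(b"\n")):
                yield line
            remainder = buf[:first_nl]
        yield remainder


def find_last_match(filename, pattern):
    """Return the last line matching `pattern` (a bytes regex), or None."""
    regex = re.compile(pattern)
    for line in reverse_lines(filename):
        if regex.search(line):
            # Assumption: the log is UTF-8; undecodable bytes are replaced.
            return line.decode("utf-8", errors="replace")
    return None
```

A call such as find_last_match("my_big_file", rb"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}") would then read only as many blocks from the end of the file as are needed to reach the first match. The pattern shown is a hypothetical timestamp format, not the one from the question.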

Other tips

If you don't want to read the whole file you can always use seek. Here is a demo:

 $ cat words.txt 
foo
bar
baz
[6] oz123b@debian:~ $ ls -l words.txt 
-rw-r--r-- 1 oz123 oz123 12 Mar  9 19:38 words.txt

The file size is 12 bytes. Each line takes 4 bytes (three letters plus the newline), so you can skip to the last entry by moving the cursor 8 bytes forward:

In [3]: w=open("words.txt")
In [4]: w.seek(8)
In [5]: w.readline()
Out[5]: 'baz\n'

To complete my answer, here is how you print these lines in reverse:

 w=open('words.txt')

In [6]: for s in [8, 4, 0]:
   ...:     _= w.seek(s)
   ...:     print(w.readline().strip())
   ...:     
baz
bar
foo

You will have to explore your file's data structure and the size of each line. Mine was quite simple, because it was meant to demonstrate the principle.
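
When the line lengths are not known in advance, one way to obtain the offsets is to scan the file once and record where each line starts. The sketch below illustrates that under some assumptions: the helper names line_offsets and lines_in_reverse are made up for this example, the file is assumed to be UTF-8, and it is opened in binary mode because text-mode seek() is only guaranteed to work with offsets returned by tell(). Note that this still reads the whole file once (line by line, not into memory) and keeps one integer per line.

```python
def line_offsets(filename):
    """Scan the file once and return the byte offset where each line starts."""
    offsets = []
    pos = 0
    with open(filename, "rb") as f:
        for line in f:                 # lazy iteration, one line at a time
            offsets.append(pos)
            pos += len(line)
    return offsets


def lines_in_reverse(filename):
    """Yield lines last-to-first by seeking to each recorded offset."""
    offsets = line_offsets(filename)
    with open(filename, "rb") as f:
        for start in reversed(offsets):
            f.seek(start)
            # Assumption: UTF-8 content; strip the trailing line ending.
            yield f.readline().decode("utf-8").rstrip("\r\n")
```

For the words.txt file above, line_offsets returns [0, 4, 8], the same offsets used by hand in the demo. If only the last matching line is needed, reading blocks from the end of the file avoids the full scan.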

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow