Question

I need to merge sort text files that are about 150 MB each and add up to about 5 GB in total.

The problem is that I can't do the merge sort with readlines(), since the last step would need to load 5 GB into memory, and with only the

for line1 in file1, line2 in file2:
    while( line1 & line2 )...

command, I can't tell Python to get only the next line of file 1 while keeping the current line of file 2, so I am unable to write a merge sort.

I read something about setting the read buffer really low on readlines() so that only a single line is loaded into memory, but then I can't delete the first line from the file.

Is there any other memory-efficient way to get only the first line of a file and delete it, or is there an existing function somewhere that merge sorts two text files?


Solution

"... I can't tell Python to get only the next line of file 1 while keeping the current line of file 2, so I am unable to write a merge sort."

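Yes, you can. Call readline() on each file separately, so that only the file whose line you just wrote advances: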

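# file1 and file2 are sorted files opened for reading; file3 is opened for writing
line1 = file1.readline()
line2 = file2.readline()
# readline() returns '' at end of file, so the loop stops when either file runs out
while line1 and line2:
    if line1 < line2:
        file3.write(line1)
        line1 = file1.readline()
    else:
        file3.write(line2)
        line2 = file2.readline()

# copy whatever is left of file 1 into file 3
while line1:
    file3.write(line1)
    line1 = file1.readline()
# copy whatever is left of file 2 into file 3
while line2:
    file3.write(line2)
    line2 = file2.readline()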
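
As for the question of whether a ready-made function exists: the standard library's heapq.merge does the same kind of lazy merge over any number of sorted iterables, so it can be applied to open file objects directly. The following is only a minimal sketch. The function name merge_sorted_files and its parameters are made up for illustration, and it assumes every input file is already sorted and that every line, including the last one, ends with a newline.

```python
import heapq
from contextlib import ExitStack


def merge_sorted_files(input_paths, output_path):
    """Merge already-sorted text files into one sorted file, line by line.

    Hypothetical helper: assumes each input is sorted and newline-terminated.
    """
    with ExitStack() as stack:
        # Iterating over a file object yields one line at a time.
        inputs = [stack.enter_context(open(path)) for path in input_paths]
        output = stack.enter_context(open(output_path, "w"))
        # heapq.merge is lazy: it keeps only the current line of each input in memory.
        output.writelines(heapq.merge(*inputs))
```

It could be called as merge_sorted_files(["part1.txt", "part2.txt"], "merged.txt"), where the file names are placeholders. If the individual 150 MB files are not sorted yet, each one would need to be sorted on its own first, which should fit in memory at that size.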
Licensed under: CC-BY-SA with attribution