Question
I am very new to Python. In a Python script I need to check whether an input string is present in the set 'titles', which I load from newline-separated strings in the file 'titles'. This consumes a huge amount of memory. I chose a set because there is an if inputstring in titles: check later on.
Line #    Mem usage    Increment   Line Contents
================================================
     1    6.160 MiB    0.000 MiB   @profile
     2                             def loadtitles():
     3  515.387 MiB  509.227 MiB       titles = open('titles').read().split()
     4  602.555 MiB   87.168 MiB       titles = set(titles)
Q1. Is there any other object type that stores this large data more memory-efficiently?
One solution I can come up with is to load the file as a single string; that consumes exactly as much memory as the file size, which is 100% optimal memory consumption.
Line #    Mem usage    Increment   Line Contents
================================================
     1    6.160 MiB    0.000 MiB   @profile
     2                             def loadtitles():
     3  217.363 MiB  211.203 MiB       titles = open('titles').read()
Then I can do if inputstring + '\n' in titles:
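A minimal sketch of that approach (variable names from the question; has_title is a hypothetical helper, and the leading newline guard is an addition, needed so a short query cannot match the tail of a longer title):

# Pad the haystack once so the first and last lines are also newline-delimited.
titles = '\n' + open('titles').read() + '\n'

def has_title(inputstring):
    # '\n' on both sides prevents 'Foo' from matching inside 'BarFoo'.
    return ('\n' + inputstring + '\n') in titles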
Q2. Is there a faster alternative to this?
Solution
You can either:
- use a key/value store if you look up lots of keys.
- iterate over the file line by line and check each key's existence if there are only a few keys to look up.
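A minimal sketch of both options using only the standard library (the database file name titles.db and the function names are assumptions, not from the question):

import dbm

def build_title_db(titles_path='titles', db_path='titles.db'):
    # One-time step: copy every title into an on-disk key/value store,
    # so later lookups never load the whole file into memory.
    with dbm.open(db_path, 'n') as db, open(titles_path) as f:
        for line in f:
            title = line.strip()
            if title:
                db[title] = '1'  # the value is irrelevant; only the key matters

def title_in_db(inputstring, db_path='titles.db'):
    # Many lookups: each one is a cheap on-disk hash probe.
    with dbm.open(db_path, 'r') as db:
        return inputstring in db

def title_in_file(inputstring, titles_path='titles'):
    # Few lookups: scan line by line with near-zero memory use.
    with open(titles_path) as f:
        return any(line.rstrip('\n') == inputstring for line in f)

Opening the database on every call is only for brevity; with many lookups you would open it once and reuse the handle.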
Other tips
Iterating over the file (processing it line by line) instead of reading its full contents will reduce memory consumption (combined with a set comprehension):
def loadtitles():
    with open('titles') as f:
        # Build the set one line at a time; the full file is never in memory at once.
        return {word for line in f for word in line.split()}
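A usage sketch (the lookup is the same membership test as in the question; 'some title' is a placeholder):

titles = loadtitles()
# Set membership is an average-case O(1) hash lookup, so repeated
# checks stay fast no matter how many titles were loaded.
if 'some title' in titles:
    print('found')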