Question

I am very new to Python. In a Python script I need to check whether an input string is present in the set titles, which I load from a file named 'titles' containing newline-separated strings. Loading it consumes a huge amount of memory. I chose a set because there is an if inputstring in titles: check later on.

Line #    Mem usage    Increment   Line Contents
================================================
     1    6.160 MiB    0.000 MiB   @profile
     2                             def loadtitles():
     3  515.387 MiB  509.227 MiB     titles = open('titles').read().split()
     4  602.555 MiB   87.168 MiB     titles = set(titles)

Q1. Is there a more memory-efficient object type for storing this large data?

One solution I can come up with is to load the file as a single string; that consumes almost exactly as much memory as the file size, which is optimal.

Line #    Mem usage    Increment   Line Contents
================================================
     1    6.160 MiB    0.000 MiB   @profile
     2                             def loadtitles():
     3  217.363 MiB  211.203 MiB     titles = open('titles').read()

Then I can do if inputstring + '\n' in titles: instead.
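
As a minimal sketch of this substring approach (assuming inputstring holds the string to look up; the padding detail is an assumption, not part of the question above):

def loadtitles():
    # Pad with '\n' on both ends so the first and last titles match too.
    return '\n' + open('titles').read() + '\n'

titles = loadtitles()
# Searching for '\n' + inputstring + '\n' avoids false positives where the
# input is only the tail of a longer title (e.g. 'bar' matching 'foobar').
if '\n' + inputstring + '\n' in titles:
    print('found')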

Q2. Is there a faster alternative to this?


Solution

You can either:

  • use a key/value store if you look up lots of keys, or
  • iterate over the file line by line and check for each key's existence if there are only a few keys to look up.

Sketches of both options follow this list.
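
A rough illustration of both options, using the standard library's dbm module for the key/value store (the 'titles.db' file name and the title_in_file helper are placeholders, and inputstring is assumed as in the question):

import dbm

# One-time build: store each title as a key in an on-disk key/value store.
# The values are unused; only key existence matters.
with dbm.open('titles.db', 'c') as db, open('titles') as f:
    for line in f:
        db[line.rstrip('\n')] = b'1'

# Lookups then hit the on-disk index instead of an in-memory set:
with dbm.open('titles.db', 'r') as db:
    if inputstring.encode() in db:  # dbm keys are stored as bytes
        print('found')

# With only a few keys to check, a single streaming pass uses O(1) memory:
def title_in_file(inputstring, path='titles'):
    with open(path) as f:
        return any(line.rstrip('\n') == inputstring for line in f)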

Other tips

Iterating over the file (processing it line by line) instead of reading its full contents will reduce memory consumption, combined here with a set comprehension:

def loadtitles():
    with open('titles') as f:
        # Build the set directly from the stream; the whole file is
        # never held in memory at once.
        titles = {word for line in f for word in line.split()}
    return titles
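
This lowers the peak usage: open('titles').read().split() briefly holds the full file contents and the list of every word at the same time, whereas the streaming version only ever builds the set. The finished set itself costs the same either way.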