Question

Following the suggestions given here, I have stored my data using ZODB, created by the following piece of code:

# structure of the data [around 3.5 GB on disk]
btree_container = {key1:[ [2,.44,0], [1,.23,0], [4,.21,0] ...[10,000th element] ], key2:[ [3,.77,0], [1,.22,0], [6,.98,0] ..[10,000th element] ] ..10,000th key:[[5,.66,0], [2,.32,0], [8,.66,0] ..[10,000th element]]}

# Code used to build the above-mentioned data set
import transaction
from persistent.list import PersistentList

for Gnodes in G.nodes():      # Gnodes iterates over 10000 values
    Gvalue = someoperation(Gnodes)
    for i, Hnodes in enumerate(H.nodes()):  # Hnodes iterates over 10000 values
        Hvalue = someoperation(Hnodes)
        score = SomeOperation(Gvalue, Hvalue)  # placeholder for the scoring step
        # build a list corresponding to every value of Gnodes (key)
        btree_container.setdefault(Gnodes, PersistentList()).append([Hnodes, score, 0])
        if i % 5000 == 0:     # save the data temporarily to disk
            transaction.savepoint(True)
transaction.commit()          # flush all the data to disk

Now, in a separate module, I want to (1) modify the stored data and (2) sort it. This is the code I was using:

import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage

storage = FileStorage('Data.fs')
db = DB(storage)
connection = db.open()
root = connection.root()
sim_sorted = root[0]

# Replace the last element (the 0 above) in every list of every key with 1
# This code exhausts all the memory; it never gets to the 2nd part, i.e. the sorting
for x in sim_sorted.iterkeys():
    for i, y in enumerate(sim_sorted[x]):
        y[2] = 1              # each record has three elements, so the last index is 2
        if i % 5000 == 0:
            transaction.savepoint()

# Sort the list under every key in descending order, using the middle element (the score) as the sort key
for keys in sim_sorted.iterkeys():
    sim_sorted[keys].sort(key=lambda x: -x[1])

However, the code that edits the values eats up all the memory, so it never gets to the sorting. I am not sure how this works, but I have a feeling that there is something badly wrong with my code and that ZODB is pulling everything into memory. What would be the correct way to do the substitution and sorting of elements stored in ZODB without running into memory issues? The code is also very slow; any suggestions for speeding it up?

[Note: It's not necessary for me to write these changes back to the database]

EDIT: Memory usage improves a little when I call connection.cacheMinimize() after the inner loop, but after some time the entire RAM is consumed again, which leaves me puzzled.


Solution

Are you certain it's not the sorting that's killing your memory?

Note that I'd expect that each PersistentList has to fit into memory; it is one persistent record so it'll be loaded as a whole on access.
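To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes CPython and records shaped like `[int, float, int]`, as in the question. `estimate_list_bytes` is a hypothetical helper, and `sys.getsizeof` ignores object sharing (small ints are cached) and allocator overhead, so treat the figures as an order of magnitude only.

```python
import sys


def estimate_list_bytes(n_records):
    """Rough CPython size of a list holding n_records [int, float, int] records."""
    sample = [7, 0.44, 0]
    # Size of one inner list plus the three objects it refers to.
    per_record = sys.getsizeof(sample) + sum(sys.getsizeof(v) for v in sample)
    # Size of the outer list itself (header plus one pointer per record).
    outer = sys.getsizeof([None] * n_records)
    return outer + n_records * per_record


per_key = estimate_list_bytes(10000)   # one key's list, loaded as a whole
all_keys = per_key * 10000             # every key's list held at the same time
print("one key:  %.1f MB" % (per_key / 1e6))
print("all keys: %.1f GB" % (all_keys / 1e9))
```

If the estimate is in the right ballpark, a single key's list is small, and the trouble only starts once many of them stay in memory together.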

I'd modify your code to run like this and see what happens:

for x in sim_sorted.iterkeys():
    for y in sim_sorted[x]:
        y[2] = 1              # last of the three elements in each record
    sim_sorted[x].sort(key=lambda y: -y[1])
    transaction.savepoint()

Now you process the whole list in one go and sort it; after all, it's already loaded into memory as a whole. After processing, you tell the ZODB you are done with this stage and the whole changed list will be flushed to temporary storage. There is little point in flushing it when you are only halfway done.

If this still doesn't fit into memory for you, you'll need to rethink your data structure and split up the large lists into smaller persistent records, so you can work on chunks at a time without loading the whole thing at once.
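One way to split the lists, as a minimal sketch: store each key's records as a sequence of fixed-size chunks, so that each chunk can become its own persistent record. The helper below is plain Python, and `split_into_chunks`, `flag_and_sort_chunk`, `chunk_size` and `make_chunk` are hypothetical names. With ZODB you would presumably pass `PersistentList` as `make_chunk` and keep the chunks in a persistent container; that part is an assumption and is not exercised here.

```python
def split_into_chunks(records, chunk_size, make_chunk=list):
    """Split records into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [make_chunk(records[start:start + chunk_size])
            for start in range(0, len(records), chunk_size)]


def flag_and_sort_chunk(chunk):
    """Set the last field of every record to 1, then sort by score, descending."""
    for record in chunk:
        record[2] = 1
    chunk.sort(key=lambda record: -record[1])


records = [[2, .44, 0], [1, .23, 0], [4, .21, 0], [7, .90, 0], [5, .10, 0]]
chunks = split_into_chunks(records, chunk_size=2)
for chunk in chunks:
    flag_and_sort_chunk(chunk)   # only one chunk needs attention at a time
```

Each chunk is sorted on its own here; a globally sorted order across chunks would need an extra merge step, for instance with `heapq.merge`.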

Licensed under: CC-BY-SA with attribution