Question

I have three Python object databases, built with the ZODB module, which I would like to merge into one. I have three rather than one because each object belongs to one of three populations, and was added to its database once my code had finished analysing it. The analysis of each object can definitely be done in parallel. My code takes a few days to run, so to keep this from becoming a week-long endeavor, I have three computers each processing objects from one of the three populations, and each writing out a single ZODB database once it has completed. I couldn't have three computers adding results for different populations to the same database because of the way ZODB handles conflicts. Essentially, until you close the database, it is locked from the inside.

My questions are: 1) How can I merge multiple .fs database files into a single master database? The structure of each database is exactly the same, meaning the nested dictionary layout is identical across all three. As an example, MyDB may represent the ZODB database structure of the first population:

root['MyDB']['ID123456']['property1']
        ... ['ID123456']['property2']
        ...       ...        ...

root['MyDB']['ID123457']['property1']
        ... ['ID123457']['property2']
        ...       ...        ...

... 

where the ellipses represent more of the same. The key names 'property1', 'property2', etc. are the same for every 'IDXXXXXX' key within the database, though the values will certainly vary.

2) What would have been the smarter thing to do to run this code in parallel while still resulting in a single ZODB structure?

Please let me know if clarification is needed.

Thanks!


Solution 2

Okay, since ZODB object databases are essentially just dictionaries of Python objects, this post happens to be the answer I was looking for. It talks about how to add dictionaries together, and in doing so literally adds together the values of any keys the two have in common. It still works for my case because my databases share no keys, so the result is a single ZODB database containing the unmodified entries of the originals.

Other answers

The smarter thing would have been to use ZEO to share the ZODB storage among the processes.

ZEO shares a ZODB database across the network and extends the conflict resolution across multiple clients, which can reside on the same machine or elsewhere.

Alternatively, you could use the RelStorage backend to store your ZODB instead of using the standard FileStorage; this backend uses a traditional relational database to provide concurrent access instead.

See the question "zc.lockfile.LockError in ZODB" for some usage examples for either option.
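As a rough illustration of the ZEO option, here is a minimal sketch of what each worker machine might run. It assumes a ZEO server is already serving the shared storage at some address; the names `record_result` and `worker`, the `root_key` default, the retry count, and the choice of an `OOBTree` as the top-level container are all assumptions made for this example, not something taken from the question.

```python
def record_result(population, object_id, properties):
    """Store one analysed object under its ID in a mapping-like container."""
    population[object_id] = dict(properties)


def worker(address, results, root_key="MyDB", max_retries=5):
    """Write (object_id, properties) pairs to a shared ZEO-backed database.

    `address` is a hypothetical (host, port) tuple of a running ZEO server.
    """
    # Imported lazily so record_result can be used without ZODB/ZEO installed.
    import transaction
    import ZODB
    from BTrees.OOBTree import OOBTree
    from ZEO.ClientStorage import ClientStorage
    from ZODB.POSException import ConflictError

    db = ZODB.DB(ClientStorage(address))
    conn = db.open()
    try:
        root = conn.root()
        for object_id, properties in results:
            for _ in range(max_retries):
                try:
                    if root_key not in root:
                        root[root_key] = OOBTree()
                    record_result(root[root_key], object_id, properties)
                    transaction.commit()
                    break
                except ConflictError:
                    # Another client committed first: discard and retry
                    # against the refreshed state.
                    transaction.abort()
            else:
                raise RuntimeError(f"could not store {object_id}")
    finally:
        transaction.abort()  # drop any uncommitted changes before closing
        conn.close()
        db.close()
```

Each machine would call something like `worker(("zeo-host", 8100), analysed_objects)`, where the host, port, and iterable are placeholders. Committing once per object keeps each transaction small, so a conflict costs only one retry.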

The ZODB data structures are otherwise merely persisted Python data structures; merging the three ZODB data structures requires you to open each of the databases and merge the nested structures as needed.
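The merge described above could look roughly like the following sketch. It assumes each file holds a top-level 'MyDB' mapping keyed by ID, as in the question, and that no ID appears in more than one file. The function names and the `root_key` parameter are invented for this example. Records are deep-copied because an object that belongs to one database connection cannot be stored directly in another.

```python
import copy


def merge_population(master, source):
    """Copy every ID record from `source` into `master`.

    Both arguments are mapping-like (plain dicts or ZODB mappings).
    Raises KeyError if an ID already exists in `master`.
    """
    for object_id, record in source.items():
        if object_id in master:
            raise KeyError(f"duplicate ID across databases: {object_id}")
        # Copy so the record is detached from the source database.
        master[object_id] = copy.deepcopy(record)


def merge_filestorages(master_path, source_paths, root_key="MyDB"):
    """Merge several .fs files into the FileStorage at `master_path`."""
    # Imported lazily so merge_population can be used without ZODB installed.
    import transaction
    import ZODB
    import ZODB.FileStorage
    from persistent.mapping import PersistentMapping

    master_db = ZODB.DB(ZODB.FileStorage.FileStorage(master_path))
    master_conn = master_db.open()
    try:
        master_root = master_conn.root()
        if root_key not in master_root:
            master_root[root_key] = PersistentMapping()
        for path in source_paths:
            storage = ZODB.FileStorage.FileStorage(path, read_only=True)
            source_db = ZODB.DB(storage)
            source_conn = source_db.open()
            try:
                merge_population(master_root[root_key],
                                 source_conn.root()[root_key])
                transaction.commit()  # one commit per source file
            finally:
                transaction.abort()  # no-op after a successful commit
                source_conn.close()
                source_db.close()
    finally:
        master_conn.close()
        master_db.close()
```

A call might look like `merge_filestorages("master.fs", ["pop1.fs", "pop2.fs", "pop3.fs"])`, with the file names being placeholders. Raising on a duplicate ID is a deliberate choice here: the populations are expected to be disjoint, so a collision most likely signals a mistake rather than something to overwrite silently.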

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow