Multiprocessing objects with namedtuple - Pickling Error

https://stackoverflow.com/questions/22304815

12-06-2023

Question

I am having trouble using namedtuples in objects that I want to pass to multiprocessing, and I am getting a pickling error. I tried a couple of things from other Stack Overflow posts without success. Here is the structure of my code:

package_main, test_module

 import myprogram.package_of_classes.data_object_module
 import ....obj_calculate

 class test(object):
       if __name__ == '__main__':
             my_obj=create_dataobj('myobject',['f1','f2'])
             input = multiprocessing.Queue()
             output = multiprocessing.Queue()
             input.put(my_obj)
             j=Process(target=obj_calculate, args=(input,output))
             j.start()

package_of_classes, data_object_module

 import collections
 import ....load_flat_file

 def get_ntuple_format(obj):
     nt_fields=''
     for fld in obj.fields:
         nt_fields=nt_fields+fld+', '
     nt_fields=nt_fields[0:-2]
     ntuple=collections.namedtuple('ntuple_format',nt_fields)
     return ntuple

 class Data_obj:
    def __init__(self, name,fields):
        self.name=name
        self.fields=fields
        self.ntuple_form=get_ntuple_format(self)  

    def calculate(self):
        self.file_read('C:/files','division.txt')

    def file_read(self,data_directory,filename):
        output=load_flat_file(data_directory,filename,self.ntuple_form)
        self.data=output

utils_package,utils_module

def create_dataobj(name,fields):
    # Build and return the object directly; writing into locals() is unreliable.
    return Data_obj(name,fields)

def obj_calculate(input,output):   
    obj=input.get()
    obj.calculate()
    output.put(obj)

loads_module

def load_flat_file(data_directory,filename,ntuple_form):
     csv.register_dialect('csvrd', delimiter='\t', quoting=csv.QUOTE_NONE)
     ListofTuples=[]
     with open(os.path.join(data_directory,filename), 'rb') as f:
          reader = csv.reader(f,'csvrd')
          for line in reader:
               if line:
                   ListofTuples.append(ntuple_form._make(line))
     return ListofTuples

And the error I am getting is:

PicklingError: Can't pickle <class '__main__.ntuple_format'>: it's not the same object as __main__.ntuple_format

P.S. As I extracted this sample code from a large project, please ignore minor inconsistencies.

Solution

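You cannot pickle a class (in this case, a named tuple) that you create dynamically (via get_ntuple_format). pickle stores a class by reference, as its module and name, rather than by value, so for a class to be picklable it has to be defined at the top level of an importable module, under the same name it was created with.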
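To see the failure in isolation, here is a minimal sketch that skips multiprocessing and calls pickle directly, which is what a multiprocessing.Queue does to each object put on it. The helper name make_format and the sample values are made up for illustration; only the typename ntuple_format comes from the question.

```python
import collections
import pickle


def make_format(fields):
    # The class is created inside a function and never bound to a
    # module-level name, so pickle cannot find it again by name.
    return collections.namedtuple('ntuple_format', fields)


# Illustrative row; the field names mirror the question's ['f1', 'f2'].
row = make_format(['f1', 'f2'])('a', 'b')

try:
    pickle.dumps(row)
    pickled_ok = True
    error_text = ''
except pickle.PicklingError as exc:
    # pickle looked for a module-level attribute named 'ntuple_format'
    # and did not find it.
    pickled_ok = False
    error_text = str(exc)
```

The exact wording of the error differs between Python versions, but the cause is the same: the class cannot be located by module and name.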

If you only have a few kinds of tuples you need to support, consider defining them all in advance, at the top level of a module, and then picking the right one dynamically. If you need a fully dynamic container format, consider just using a dict instead.
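Both suggestions can be sketched roughly as follows. The type names DivisionRow and RegionRow, the FORMATS registry, and the sample values are assumptions made for this example, not part of the original project.

```python
import collections
import pickle

# Hypothetical record types, defined once at module top level. Each variable
# name matches its typename so pickle can find the class by name.
DivisionRow = collections.namedtuple('DivisionRow', ['f1', 'f2'])
RegionRow = collections.namedtuple('RegionRow', ['code', 'label'])

# Registry used to pick one of the predefined types at run time.
FORMATS = {
    ('f1', 'f2'): DivisionRow,
    ('code', 'label'): RegionRow,
}


def get_ntuple_format(fields):
    """Return the predefined namedtuple class for these field names."""
    return FORMATS[tuple(fields)]


# A row built from a predefined type survives a pickle round trip.
row = get_ntuple_format(['f1', 'f2'])._make(['a', 'b'])
restored = pickle.loads(pickle.dumps(row))

# Fully dynamic alternative: a plain dict needs no class lookup at all.
dict_row = dict(zip(['f1', 'f2'], ['a', 'b']))
restored_dict = pickle.loads(pickle.dumps(dict_row))
```

With this layout, Data_obj can keep calling get_ntuple_format, but it receives a class that the child process is able to import by name.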

Other tips

I'd argue you can pickle a namedtuple, as well as a class defined in __main__.

>>> import dill as pickle
>>> import collections
>>> 
>>> thing = collections.namedtuple('thing', ['a','b'])
>>> pickle.loads(pickle.dumps(thing))
<class '__main__.thing'>

Here's the same thing, used inside a method of a class.

>>> class Foo(object):
...   def bar(self, a, b):
...     thing = collections.namedtuple('thing', ['a','b'])     
...     thing.a = a 
...     thing.b = b
...     return thing 
... 
>>> f = Foo()
>>> q = f.bar(1,2)
>>> q.a
1
>>> q.b
2
>>> q._fields
('a', 'b')
>>> 
>>> pickle.loads(pickle.dumps(Foo.bar))
<unbound method Foo.bar>
>>> pickle.loads(pickle.dumps(f.bar))
<bound method Foo.bar of <__main__.Foo object at 0x10dbf5450>>

You just have to use dill instead of pickle.

Get dill here: https://github.com/uqfoundation

Licensed under: CC-BY-SA with attribution
