Question

I have a large dictionary whose structure looks like:

dcPaths = {'id_jola_001': CPath instance}

where CPath is a self-defined class:

class CPath(object):
    def __init__(self):
        # some attributes
        self.m_dAvgSpeed = 0.0
        ...
        # a list of CNode instances
        self.m_lsNodes = []

where m_lsNodes is a list of CNode:

class CNode(object):
    def __init__(self):
        # some attributes
        self.m_nLoc = 0

        # a list of Apps
        self.m_lsApps = []

Here, m_lsApps is a list of CApp, which is another self-defined class:

class CApp(object):
    def __init__(self):
        # some attributes
        self.m_nCount = 0
        self.m_nUpPackets = 0

I serialize this dictionary by using cPickle:

def serialize2File(strFileName, strOutDir, obj):
    if len(obj) != 0:
        strOutFilePath = "%s%s" % (strOutDir, strFileName)
        with open(strOutFilePath, 'w') as hOutFile:
            cPickle.dump(obj, hOutFile, protocol=0)
        return strOutFilePath
    else:
        print("Nothing to serialize!")

It works fine, and the serialized file is about 6.8 GB. However, when I try to deserialize this object:

def deserializeFromFile(strFilePath):
    obj = 0
    with open(strFilePath) as hFile:
        obj = cPickle.load(hFile)
    return obj

I find that it consumes more than 90 GB of memory and takes a long time.

  1. Why would this happen?
  2. Is there any way I could optimize this?

BTW, I'm using Python 2.7.6.


Solution

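You can try specifying the pickle protocol. Your code uses protocol=0, the original text format; passing -1 selects the highest protocol available (2 in Python 2.7), which is binary and, according to the Python documentation, much more efficient for new-style classes such as yours. Using the latest protocol is no problem as long as you pickle and unpickle with the same Python version. Open the file in binary mode ('wb' to write, 'rb' to read) when you use a binary protocol.

cPickle.dump(obj, file, protocol=-1)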

EDIT: As noted in the comments, load detects the protocol itself, so it takes only the file object:

cPickle.load(file)
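
Putting the two calls together, here is a minimal sketch of a binary round trip. The helper names dump_binary and load_binary, the file name, and the sample dictionary are made up for illustration; the import fallback is only there so the sketch also runs on Python 3.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3 fallback so the sketch runs on both


def dump_binary(path, obj):
    # Hypothetical helper: -1 selects the highest protocol available.
    # Binary protocols need the file opened in binary mode ('wb').
    with open(path, 'wb') as out_file:
        pickle.dump(obj, out_file, protocol=-1)


def load_binary(path):
    # load() detects the protocol itself; it takes only the file object.
    with open(path, 'rb') as in_file:
        return pickle.load(in_file)


# Small stand-in for the real dictionary (made-up values).
sample = {'id_jola_001': {'m_dAvgSpeed': 0.0, 'm_lsNodes': [1, 2, 3]}}
tmp_path = os.path.join(tempfile.mkdtemp(), 'paths.pkl')
dump_binary(tmp_path, sample)
restored = load_binary(tmp_path)
```

How much this helps with a 6.8 GB file depends on the data, so it is worth timing both protocols on a smaller slice of the dictionary first.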

OTHER TIPS

When you store complex Python objects, pickle by default saves each instance's whole __dict__ (every attribute name and value), which can include data you do not need to restore.

To reduce the memory consumption of the unpickled data, you should pickle only Python built-in types. You can do this by implementing two methods on your classes: object.__getstate__() and object.__setstate__(state).

See "Pickling and unpickling normal class instances" in the Python documentation.
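
As a rough sketch of that idea, assuming CApp has only the two attributes shown in the question, the class could hand pickle a plain tuple instead of its __dict__. The tuple layout is an illustrative choice, not something the question prescribes.

```python
try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3 fallback so the sketch runs on both


class CApp(object):
    def __init__(self):
        self.m_nCount = 0
        self.m_nUpPackets = 0

    def __getstate__(self):
        # Pickle a plain tuple of built-in values instead of the __dict__.
        return (self.m_nCount, self.m_nUpPackets)

    def __setstate__(self, state):
        # Rebuild the attributes from the tuple, in the same order.
        self.m_nCount, self.m_nUpPackets = state


app = CApp()
app.m_nCount = 3
app.m_nUpPackets = 42

data = pickle.dumps(app, -1)   # -1: highest protocol available
clone = pickle.loads(data)     # __setstate__ restores the attributes
```

The same pattern would apply to CNode and CPath. Note that this changes what is written to the pickle; the restored objects still carry a normal __dict__, so measure memory on a sample before relying on it.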

Licensed under: CC-BY-SA with attribution