Question

I can save a serialized corpus to foobar.mm, but when I try to load it I get an UnpicklingError. Loading the dictionary works fine, though. Does anyone know how to resolve this, and why it occurs?

>>> from gensim import corpora
>>> docs = ["this is a foo bar", "you are a foo"]
>>> texts = [[i for i in doc.lower().split()] for doc in docs]
>>> print texts
[['this', 'is', 'a', 'foo', 'bar'], ['you', 'are', 'a', 'foo']]

>>> dictionary = corpora.Dictionary(texts)
>>> dictionary.save('foobar.dic')
>>> print dictionary
Dictionary(7 unique tokens)
>>> corpora.Dictionary.load('foobar.dic')
<gensim.corpora.dictionary.Dictionary object at 0x329f910>

>>> corpus = [dictionary.doc2bow(text) for text in texts]
>>> corpora.MmCorpus.serialize('foobar.mm', corpus)
>>> corpus = corpora.MmCorpus.load('foobar.mm')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.8.6-py2.7.egg/gensim/utils.py", line 166, in load
    return unpickle(fname)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.8.6-py2.7.egg/gensim/utils.py", line 492, in unpickle
    return cPickle.load(open(fname, 'rb'))
cPickle.UnpicklingError: invalid load key, '%'.

Solution

See the documentation at http://radimrehurek.com/gensim/tut1.html#corpus-formats

You are storing the corpus in MatrixMarket format (a plain-text format) with serialize, and then trying to load it through the binary save/load interface, which expects a pickle.

To load a serialized MatrixMarket corpus, pass the file name to the MmCorpus constructor instead: corpus = corpora.MmCorpus('foobar.mm')
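The '%' in the error message comes from the file itself: a MatrixMarket file is text whose first line starts with "%%MatrixMarket", and pickle does not recognize '%' as a valid first byte. The sketch below reproduces this with the standard library only. The file contents are a hand-written stand-in for what serialize writes (the entries are illustrative, not the real output for the corpus above), and try_unpickle is a hypothetical helper, not part of gensim.

```python
import os
import pickle
import tempfile

# Hand-written stand-in for a serialized corpus (illustrative entries only).
# A MatrixMarket file is plain text; its first line starts with "%%MatrixMarket".
MM_TEXT = (
    "%%MatrixMarket matrix coordinate real general\n"
    "2 3 3\n"   # 2 documents, 3 terms, 3 non-zero entries
    "1 1 1\n"   # document 1, term 1, count 1
    "1 2 1\n"
    "2 3 1\n"
)


def try_unpickle(path):
    """Return the message pickle raises for `path`, or None if it loads."""
    try:
        with open(path, "rb") as handle:
            pickle.load(handle)
    except pickle.UnpicklingError as err:
        return str(err)
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foobar.mm")
    with open(path, "w") as handle:
        handle.write(MM_TEXT)

    # Inspect the first byte, then attempt the same load the question did.
    with open(path, "rb") as handle:
        first_byte = handle.read(1)
    message = try_unpickle(path)

print(first_byte)  # the leading '%' of the text header
print(message)     # pickle's complaint about that byte
```

This is why Dictionary.load works on foobar.dic (it was written by save, so it is a pickle) while MmCorpus.load fails on foobar.mm (it was written by serialize, so it is text).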

Other tips

Since gensim's corpora module (whatever it is) uses pickle, as the stack trace reveals, you can only store a limited set of data types. For more details, see "What can be pickled and unpickled?" in the Python docs.

If this does not apply (i.e. if what you want to pickle and unpickle should be picklable), I fear you may have found a bug in the pickle module. You might then be able to solve the issue by upgrading to a newer Python version.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow