Question

I'm trying to use multiprocessing.Pool to speed up parsing a file with pyparsing, but I get a multiprocessing.pool.MaybeEncodingError exception whenever I try.

I've narrowed it down to returning a dictionary (ParseResults.asDict()): with asList() the error doesn't occur. The input I'm actually parsing is pretty complex, though, so ideally I'd like to use asDict().

The actual data being parsed is an Erlang list of tagged tuples, which I want to map to a Python list. The grammar for this is pretty complex, so I've put together a simplified test case instead (updated to include a nested dict):

#!/usr/bin/env python2.7
from pyparsing import *
import multiprocessing

dictionary = Forward()
key = Word(alphas)
sep   = Suppress(":")
value = ( key | dictionary )
key_val = Group( key + sep + value )
dictionary <<= Dict( Suppress('[') + delimitedList( key_val ) + Suppress(']') )

def parse_dict(s):
    p = dictionary.parseString(s).asDict()
    return p

def parse_list(s):
    return dictionary.parseString(s).asList()

# This works (list)
data = ['[ foo : [ bar : baz ] ]']
pool = multiprocessing.Pool()
pool.map(parse_list, data)

# This fails (dict)
pool.map(parse_dict, data)

Fails with:

Traceback (most recent call last):
  File "lib/python/nutshell/multi_parse.py", line 19, in <module>
    pool.map(parse, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[{'foo': ([(['bar', 'baz'], {})], {'bar': [('baz', 0)]})}]'. Reason: 'TypeError("'str' object is not callable",)'

Solution

Update: The question changed significantly when it was updated. The original point about the result not being picklable still stands and is left below.

You say in your grammar you use a delimitedList, so let's add that to our test case:

data = ['[ foo : [ bar : baz ], cat:dog ]']

Your "dictionary" grammar object does not have to produce a Python dict; what delimitedList yields is really a list. If that's not what you meant, you'll have to change delimitedList to something else. I've updated the grammar so that the results pickle properly, using a parse action:

dictionary = Forward()
key   = Word(alphas)
LP, RP, sep = map(Suppress, "[]:")
value = key | dictionary
key_val = key("key") + sep + value("val")
dictionary <<= LP + delimitedList( key_val ) + RP

def parse_key_val(x): return {x.key:x.val}
key_val.setParseAction(parse_key_val)

def parse_dict(s):
    # Yes, it's a list, not a dict!
    return dictionary.parseString(s).asList()

def parse_list(s):
    return dictionary.parseString(s).asList()

This gives a working answer in parallel:

[[{'foo': {'bar': 'baz'}}, {'cat': 'dog'}]]
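
If you would rather end up with one dict per input string instead of a list of single-key dicts, the pieces can be merged after the pool returns. This is only a sketch that works on the plain output shown above; merge_pairs is a hypothetical helper name, not part of pyparsing, and a later duplicate key overwrites an earlier one.

```python
def merge_pairs(pairs):
    """Merge a list of single-key dicts into one dict (hypothetical helper)."""
    merged = {}
    for pair in pairs:
        # Later keys win if the same key appears twice.
        merged.update(pair)
    return merged

# The result of pool.map(parse_dict, data) as shown above.
parsed = [[{'foo': {'bar': 'baz'}}, {'cat': 'dog'}]]

# One merged dict per parsed input string.
flattened = [merge_pairs(result) for result in parsed]
```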

Original answer: I think multiprocessing fails because it can't pickle the returned object. You think you have a dict, but look at the type of the nested value:

def parse_dict(s):
    val = lang.parseString(s).asDict()
    print type(val["foo"])
    return val

You'll find that the inner value is a <class 'pyparsing.ParseResults'>, not a dict. I'm not sure how to apply Dict recursively, but a really simple fix would be to change your grammar:

value = ( Word(alphas) )
sep   = Suppress(":")
key_val = Group( value + sep + value )
lang = Dict( Suppress('[') + delimitedList( key_val ) + Suppress(']') )

This lets Dict operate properly, at the cost of no longer supporting nested dictionaries. For what it's worth, I've found that many of my multiprocessing woes come from an object that can't be properly serialized, so it's usually the first place I look.
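
One way to do that check is to try pickling the return value in the parent process before involving a pool at all. The following is a minimal sketch using only the standard library; find_unpicklable is a hypothetical helper name, and it only descends into plain dicts, lists and tuples, so anything else that fails is reported as a whole.

```python
import pickle

def find_unpicklable(obj, path="result"):
    """Return (path, type name) pairs for the parts of obj that fail to pickle."""
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return []  # the whole object pickles fine
    except Exception:
        pass  # something inside fails; look closer below

    problems = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            problems.extend(find_unpicklable(value, "%s[%r]" % (path, key)))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            problems.extend(find_unpicklable(value, "%s[%d]" % (path, index)))

    if not problems:
        # No child could be blamed, so this object itself is the culprit.
        problems.append((path, type(obj).__name__))
    return problems
```

Calling it on the value returned by parse_dict should point at the nested entry rather than at the outer dict, which is usually enough to see which part of the grammar needs attention.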

A useful and related question:

Can't get pyparsing Dict() to return nested dictionary
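
If the nested grammar has to stay, another option is to convert the parse result into plain dicts and lists inside the worker before returning it. The sketch below is duck-typed rather than tied to pyparsing: it assumes the nested objects offer keys(), item access by name and iteration, as ParseResults does, and to_plain is a hypothetical helper name. I have not checked it against every pyparsing version, so treat it as a starting point.

```python
try:
    string_types = (basestring,)  # Python 2
except NameError:
    string_types = (str, bytes)   # Python 3

def to_plain(obj):
    """Recursively convert a nested result into plain dicts, lists and strings."""
    if isinstance(obj, string_types):
        return obj
    keys = getattr(obj, "keys", None)
    if callable(keys):
        names = list(keys())
        if names:
            # Named entries present: treat the object as a mapping.
            return dict((name, to_plain(obj[name])) for name in names)
        if isinstance(obj, dict):
            return {}
    try:
        items = iter(obj)
    except TypeError:
        return obj  # a scalar such as an int
    return [to_plain(item) for item in items]

# Assumed usage inside the worker (requires the grammar from the question):
#     return to_plain(dictionary.parseString(s))
```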

Licensed under: CC-BY-SA with attribution