I tried outputting a Python set from a mapper in mrjob, and changed the function signatures of my combiners and reducers accordingly.

However, I get this error:

Counters From Step 1
Unencodable output:
TypeError: 172804

When I change the sets to lists, the error disappears. Are there certain Python types that cannot be output by mappers in mrjob?


Solution

Values are moved between stages of the MapReduce using Protocols, generally Raw, JSON or Pickle.

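You must make sure that the values being moved around can be handled by the Protocol you pick. JSON has no set type, so a JSON protocol cannot encode a Python set, which is most likely why the error goes away when you switch to lists. I would imagine the Raw protocols can't represent a set either, since they pass values through as plain strings.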
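As a quick illustration of the difference, here is a minimal sketch using only the standard library `json` and `pickle` modules rather than mrjob itself. It assumes mrjob's JSON and Pickle protocols behave like these modules for this purpose; the variable names are made up for the example.

```python
import json
import pickle

value = {1, 2, 3}  # a set, like the one yielded from the mapper

# JSON has arrays and objects but no set type, so encoding fails.
try:
    json.dumps(value)
    json_ok = True
except TypeError:
    json_ok = False

# Pickle can serialize a set and restore it unchanged.
restored = pickle.loads(pickle.dumps(value))

print("JSON could encode the set:", json_ok)
print("Pickle round trip preserved the set:", restored == value)
```

The same check is a handy way to test any other type you plan to yield before running a full job.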

Try setting INTERNAL_PROTOCOL to PickleProtocol, like so:

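from mrjob.job import MRJob
from mrjob.protocol import PickleProtocol

class yourMR(MRJob):
    # Pickle can serialize sets passed between steps
    INTERNAL_PROTOCOL = PickleProtocol

    def mapper(self, key, value):
        # mapper logic goes here
        pass

    def reducer(self, key, values):
        # reducer logic goes here
        pass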

Note: MRJob will handle pickling and unpickling for you, so don't worry about that aspect. INTERNAL_PROTOCOL only covers data passed between steps; you can also set INPUT_PROTOCOL and OUTPUT_PROTOCOL if necessary (for example, if the final reducer itself outputs a set).
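If you would rather keep JSON, the workaround from the question (lists instead of sets) can be made explicit by converting at the boundaries. This is only a sketch with the standard library: `encode_set` and `decode_set` are hypothetical helper names, not part of mrjob, and sorting assumes the elements can be compared with each other.

```python
import json

def encode_set(s):
    # Hypothetical helper: turn a set into a sorted list so JSON can encode it.
    # Sorting makes the output deterministic; it assumes comparable elements.
    return sorted(s)

def decode_set(items):
    # Hypothetical helper: rebuild the set on the receiving side.
    return set(items)

words = {"b", "a", "c"}

# What the mapper would yield (a list) and how it looks once JSON-encoded.
wire = json.dumps(encode_set(words))

# What the reducer would do after the value is decoded from JSON.
back = decode_set(json.loads(wire))

print(wire)
print(back == words)
```

In a job, the mapper would yield `encode_set(...)` and the reducer would call `decode_set(...)` on each incoming value.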

Licensed under: CC-BY-SA with attribution