Question

I tried outputting a Python set from a mapper in mrjob. I changed the function signatures of my combiners and reducers accordingly.

However, I get this error:

Counters From Step 1
Unencodable output:
TypeError: 172804

When I change the sets to lists, this error disappears. Are there certain Python types that cannot be output by mappers in mrjob?

Solution

Values are moved between the stages of a MapReduce job using protocols, generally Raw, JSON or Pickle.

You must make sure that the values being moved around can be handled by the protocol you pick. mrjob uses JSON between stages by default, and JSON has no representation for a set, which would explain the "Unencodable output" error. The raw protocols deal in plain strings, so I would not expect them to accept a set either.
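The difference is easy to reproduce outside mrjob with the standard library alone. The sketch below makes no claim about mrjob's internals; it only shows that `json.dumps` rejects a set while `pickle` round-trips it, which is the likely reason a JSON-based protocol fails here. The sample value is made up for illustration.

```python
import json
import pickle

value = {1, 2, 3}  # hypothetical mapper output

# JSON has no set type, so the standard encoder raises TypeError
try:
    json.dumps(value)
    json_ok = True
except TypeError:
    json_ok = False

# pickle can serialize a set and restore it unchanged
restored = pickle.loads(pickle.dumps(value))

# a list holding the same items is valid JSON
as_list = json.loads(json.dumps(sorted(value)))
```

This matches what the question observed: the same data passes once it is a list.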

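Try setting INTERNAL_PROTOCOL to PickleProtocol, like so (note that the methods mrjob looks for are named mapper and reducer):

from mrjob.job import MRJob
from mrjob.protocol import PickleProtocol

class yourMR(MRJob):
    # Pickle can serialize sets between stages; JSON cannot
    INTERNAL_PROTOCOL = PickleProtocol

    def mapper(self, key, value):
        # placeholder mapper: the emitted value is a set
        yield key, {value}

    def reducer(self, key, values):
        # placeholder reducer: merge the sets, emit a JSON-friendly list
        yield key, list(set().union(*values))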

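Note: MRJob will handle pickling and unpickling for you, so don't worry about that aspect. INTERNAL_PROTOCOL covers everything passed between mapper, combiner and reducer, including between the steps of a multi-step job. You can also set INPUT_PROTOCOL and OUTPUT_PROTOCOL if necessary. For example, if the final reducer itself yields a set, OUTPUT_PROTOCOL must be able to encode it too.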
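If you would rather stay on JSON, the workaround the question already found (lists instead of sets) can be wrapped in two small helpers. This is only a sketch using the standard `json` module; `to_jsonable` and `from_jsonable` are hypothetical names, not part of mrjob, and the sample data is invented.

```python
import json

def to_jsonable(items):
    """Turn a set into a sorted list so a JSON encoder can handle it."""
    return sorted(items)

def from_jsonable(items):
    """Rebuild the set on the receiving side."""
    return set(items)

# what a mapper would emit, and what a reducer would rebuild from it
emitted = json.dumps(to_jsonable({"b", "a", "c"}))
received = from_jsonable(json.loads(emitted))
```

Sorting is optional, but it keeps the emitted output deterministic, which makes job results easier to compare between runs.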

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow