Question
I tried outputting a Python set from a mapper in mrjob. I changed the function signatures of my combiners and reducers accordingly.
However, I get this error:
Counters From Step 1
Unencodable output:
TypeError: 172804
When I change the sets to lists, this error disappears. Are there certain Python types that cannot be output by mappers in mrjob?
Solution
Values are passed between the stages of a MapReduce job using protocols, typically Raw, JSON, or Pickle.
You must make sure that the values being passed around can be handled by the protocol you pick. JSON has no native representation of a set, and mrjob's default internal protocol is JSON, which is why the set fails while a list works. The raw protocols are meant for plain strings, so they most likely cannot carry a set either.
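The difference between the two encodings can be checked without mrjob at all. The following is a minimal sketch that uses the standard library's json and pickle modules as stand-ins for the JSON and Pickle protocols; the helper name json_encodable is made up for illustration, and mrjob may use a different JSON encoder internally.

```python
import json
import pickle

value = {1, 2, 3}

def json_encodable(obj):
    """Return True if the standard json module can serialize obj."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False

# A set has no JSON representation, so encoding it raises TypeError.
set_ok = json_encodable(value)

# The same data as a list encodes without trouble.
list_ok = json_encodable(sorted(value))

# Pickle serializes the set and restores it unchanged.
round_tripped = pickle.loads(pickle.dumps(value))
```

Here set_ok ends up False while list_ok is True, which matches the behavior described in the question.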
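Try setting INTERNAL_PROTOCOL to PickleProtocol, like so (note that mrjob looks for methods named mapper and reducer, not map and reduce):
from mrjob.job import MRJob
from mrjob.protocol import PickleProtocol

class yourMR(MRJob):
    INTERNAL_PROTOCOL = PickleProtocol

    def mapper(self, key, value):
        yield key, {value}  # sets are fine as intermediate values now

    def reducer(self, key, values):
        # final output still goes through OUTPUT_PROTOCOL (JSON by default)
        yield key, sorted(set().union(*values))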
Note: mrjob handles pickling and unpickling for you, so you don't need to do it yourself. You can also set INPUT_PROTOCOL and OUTPUT_PROTOCOL if necessary (for example, for jobs with multiple steps, or to output a set from the reducer).
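If you would rather keep the default JSON protocol, the workaround from the question (lists instead of sets) can be made explicit at the boundary. This is only a sketch under that assumption: encode_set and decode_set are hypothetical helpers, not part of mrjob, and the standard json module again stands in for the protocol.

```python
import json

def encode_set(values):
    """Turn a set into a sorted list so that JSON can carry it."""
    return sorted(values)

def decode_set(items):
    """Rebuild the set on the receiving side."""
    return set(items)

original = {"b", "a", "c"}

# What a mapper would yield in place of the raw set.
wire = json.dumps(encode_set(original))

# What a reducer would do with each incoming value.
restored = decode_set(json.loads(wire))
```

Sorting is optional, but it makes the encoded output deterministic, which helps when comparing job results.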