Question

By default, mrjob writes each output record as the key and the value separated by a tab, i.e. in key[tab]value format.

This happens even if the key (or the value) is empty, null, or otherwise uninteresting. Suppose my key/value pair is None, {"a":1, "b":2}. Then I get this:

None    {"a":1, "b":2}
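As a rough illustration of why the key shows up, here is a minimal standard-library sketch, assuming a JSON key/value encoding like the one JSONProtocol uses: both the key and the value are JSON-encoded and joined with a tab. The function name encode_key_value_line is hypothetical and is not mrjob's actual code; note that a JSON encoder writes a Python None key as null, and exact spacing may differ from mrjob's own encoder.

```python
import json


def encode_key_value_line(key, value):
    """Hypothetical mimic of a JSON key/value protocol (not mrjob's code).

    Both halves are JSON-encoded independently and joined with a tab,
    so even a None key produces a visible first column.
    """
    return json.dumps(key) + "\t" + json.dumps(value)


line = encode_key_value_line(None, {"a": 1, "b": 2})
print(line)  # the key column reads "null", followed by a tab and the dict
```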

Is there a way to suppress the key or the value? I just want this:

{"a":1, "b":2}

BTW, I've already tried the following. Am I missing something?

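import mrjob.protocol
from mrjob.job import MRJob


class MyMrJobClass(MRJob):
    # Protocol used to serialize the final step's output lines.
    OUTPUT_PROTOCOL = mrjob.protocol.JSONProtocol

    def step1_mapper(self, _, line):
        # Parse the input line into a key/value pair.
        ...
        yield my_key, my_value

    def step1_reducer(self, key, values):
        # Combine all values that share a key.
        for v in values:
            ...
        # The key is deliberately None: only my_data matters.
        yield None, my_data

    def steps(self):
        # self.mr() is the older step API; newer mrjob releases use mrjob.step.MRStep.
        return [
            self.mr(
                mapper=self.step1_mapper,
                reducer=self.step1_reducer,
            ),
        ]


if __name__ == "__main__":
    MyMrJobClass.run()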

NB: I know that I don't need to override steps() for a single-step job. This will eventually be a multi-step job, so it's important to build the class that way.

Thanks!


Solution

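You can use mrjob.protocol.JSONValueProtocol (note the "Value" in the name; see the documentation) as the output protocol instead of mrjob.protocol.JSONProtocol. It JSON-encodes only the value and discards the key, so each output line contains just the value.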

The documentation has more information on using custom protocols.
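In the job above, the fix amounts to setting OUTPUT_PROTOCOL = mrjob.protocol.JSONValueProtocol on the class; the reducer can keep yielding None, my_data. To show what that changes in the output, here is a minimal standard-library sketch that mimics a value-only JSON protocol. The function names are hypothetical stand-ins, not mrjob's implementation, and mrjob may use a different JSON library, so exact spacing can differ. Reading such a line back yields None as the key, which matches how JSONValueProtocol is documented to behave.

```python
import json


def encode_value_only_line(key, value):
    """Hypothetical mimic of a value-only JSON protocol (not mrjob's code).

    The key is ignored entirely; only the JSON-encoded value is written.
    """
    return json.dumps(value)


def decode_value_only_line(line):
    """Reading a value-only line back gives no key, so None is returned for it."""
    return None, json.loads(line)


line = encode_value_only_line(None, {"a": 1, "b": 2})
print(line)  # {"a": 1, "b": 2}
key, value = decode_value_only_line(line)
```

Because the key is dropped on write, this kind of protocol suits the final step of a job; intermediate steps in a multi-step job still need a key to group values in the shuffle.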

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow