Question

By default, mrjob writes each output record in key[tab]value format.

This happens even if the key (or the value) is empty, null, or otherwise not interesting. Suppose my key, value pair is None, {"a":1, "b":2}. Then I get this:

None    {"a":1, "b":2}

Is there a way to suppress the key or the value? I just want this:

{"a":1, "b":2}

BTW, I've already tried this. Am I missing something...?

import mrjob.protocol
from mrjob.job import MRJob

class MyMrJobClass(MRJob):
    OUTPUT_PROTOCOL = mrjob.protocol.JSONProtocol

    def step1_mapper(self, _, line):
        ...
        yield my_key, my_value

    def step1_reducer(self, key, values):
        for v in values:
            ...
        yield None, my_data

    def steps(self):
        return [
            self.mr(
                mapper=self.step1_mapper,
                reducer=self.step1_reducer,
            ),
        ]

NB: I know that I don't need to overwrite steps for a single-step job. This will eventually be a multistep job, so it's important to build the class that way.

Thanks!


Solution

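Use mrjob.protocol.JSONValueProtocol (note the Value in the name) as the output protocol instead of mrjob.protocol.JSONProtocol. JSONProtocol writes the JSON-encoded key, a tab, and the JSON-encoded value; JSONValueProtocol writes only the JSON-encoded value and discards the key.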

The documentation has more information on using custom protocols.
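For illustration, a custom protocol can be as small as a class with read() and write() methods. The sketch below is a hypothetical stand-in (the name ValueOnlyJSONProtocol is not part of mrjob) that mimics the value-only behaviour; it assumes Python 3, where lines are passed and returned as bytes.

```python
import json


class ValueOnlyJSONProtocol:
    """Hypothetical protocol: one JSON value per line, no key."""

    def read(self, line):
        # There is no key on the line, so report None as the key.
        return None, json.loads(line.decode("utf-8"))

    def write(self, key, value):
        # Ignore the key and serialise only the value.
        return json.dumps(value).encode("utf-8")


proto = ValueOnlyJSONProtocol()
print(proto.write(None, {"a": 1, "b": 2}).decode("utf-8"))
# prints {"a": 1, "b": 2}
```

A class like this would be assigned to OUTPUT_PROTOCOL in the same way as the built-in protocols. In practice the built-in JSONValueProtocol already covers this case, so a hand-written protocol mostly makes sense when the output needs a different encoding.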

Licensed under: CC-BY-SA with attribution