Question

By default, mrjob writes each output pair in key[tab]value format.

This happens even if the key (or the value) is empty, null, or otherwise uninteresting. Suppose my key/value pair is None, {"a": 1, "b": 2}. Then I get this:

None    {"a":1, "b":2}
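For reference, the key[tab]value line above can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming mrjob's JSONProtocol JSON-encodes the key and the value separately and joins them with a tab; `write_key_value` is a hypothetical helper, not an mrjob function. One side effect of JSON encoding is that a None key is written as null.

```python
import json


def write_key_value(key, value):
    """Hypothetical stand-in for a JSONProtocol-style writer.

    Both the key and the value are JSON-encoded and joined by a tab,
    so a key column is always present, even when the key is None.
    """
    return (json.dumps(key) + "\t" + json.dumps(value)).encode("utf-8")


line = write_key_value(None, {"a": 1, "b": 2})
print(line.decode("utf-8"))  # null<TAB>{"a": 1, "b": 2}
```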

Is there a way to suppress the key or the value? I just want this:

{"a":1, "b":2}

BTW, I've already tried the following. Am I missing something?

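from mrjob.job import MRJob
from mrjob.step import MRStep
import mrjob.protocol


class MyMrJobClass(MRJob):
    # JSONProtocol writes both the key and the value: key<TAB>value
    OUTPUT_PROTOCOL = mrjob.protocol.JSONProtocol

    def step1_mapper(self, _, line):
        ...
        yield my_key, my_value

    def step1_reducer(self, key, values):
        for v in values:
            ...
        yield None, my_data

    def steps(self):
        # MRStep is the current way to declare steps (older code used self.mr)
        return [
            MRStep(
                mapper=self.step1_mapper,
                reducer=self.step1_reducer,
            ),
        ]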

NB: I know that I don't need to override steps() for a single-step job. This will eventually be a multistep job, so it's important to build the class that way.

Thanks!


Solution

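You can use mrjob.protocol.JSONValueProtocol (note the "Value"; see the documentation) as the output protocol instead of mrjob.protocol.JSONProtocol. JSONValueProtocol writes only the JSON-encoded value and discards the key, so each output line is just {"a": 1, "b": 2}.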
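As a rough illustration of the difference, the sketch below mimics a value-only protocol in plain Python without importing mrjob. It assumes JSONValueProtocol behaves as documented: `write` ignores the key and emits only the JSON-encoded value, and `read` returns None as the key. The class name `ValueOnlyJSON` is hypothetical; in a real job you would simply set `OUTPUT_PROTOCOL = mrjob.protocol.JSONValueProtocol`.

```python
import json


class ValueOnlyJSON:
    """Hypothetical stand-in mimicking a JSON value-only protocol.

    mrjob protocols are objects with read(line) and write(key, value);
    this one drops the key on write and reports None as the key on read.
    """

    def read(self, line):
        # The whole line is the JSON-encoded value; there is no key.
        return None, json.loads(line)

    def write(self, key, value):
        # The key is deliberately ignored, so no "key<TAB>" prefix is written.
        return json.dumps(value).encode("utf-8")


protocol = ValueOnlyJSON()
print(protocol.write(None, {"a": 1, "b": 2}).decode("utf-8"))  # {"a": 1, "b": 2}
```

As far as I know, OUTPUT_PROTOCOL only applies to the job's final output (intermediate steps use INTERNAL_PROTOCOL), so switching it should not drop the keys passed between steps once the job becomes multistep.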

The documentation has more information on using custom protocols.
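If you want something in between, for example keeping the key when it is meaningful and dropping it only when it is None, a custom protocol is one option. The following is a sketch, assuming mrjob's convention that a protocol is any object with read(line) and write(key, value) methods working on bytes (as in recent versions); `OptionalKeyJSONProtocol` is a hypothetical name, not part of mrjob.

```python
import json


class OptionalKeyJSONProtocol:
    """Hypothetical custom protocol: omit the key when it is None.

    Lines look like key<TAB>value when a key is present, or just value
    otherwise. json.dumps never emits a raw tab (tabs inside strings are
    escaped as \\t), so splitting on the first tab is unambiguous.
    """

    def read(self, line):
        if b"\t" in line:
            raw_key, raw_value = line.split(b"\t", 1)
            return json.loads(raw_key), json.loads(raw_value)
        # No tab: the line holds only a JSON value, so the key was None.
        return None, json.loads(line)

    def write(self, key, value):
        encoded_value = json.dumps(value)
        if key is None:
            return encoded_value.encode("utf-8")
        return (json.dumps(key) + "\t" + encoded_value).encode("utf-8")


protocol = OptionalKeyJSONProtocol()
print(protocol.write(None, {"a": 1, "b": 2}).decode("utf-8"))  # {"a": 1, "b": 2}
```

In the job class this would be wired up as `OUTPUT_PROTOCOL = OptionalKeyJSONProtocol`.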

Licensed under: CC-BY-SA with attribution