Problem

By default, mrjob writes each output key and value as a single line in key[tab]value format.

This happens even if the key (or the value) is empty, null, or otherwise uninteresting. Suppose my key-value pair is None, {"a":1, "b":2}. Then I get this:

null	{"a":1, "b":2}
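To make the format concrete, here is a minimal stdlib-only sketch of roughly what a JSONProtocol-style writer does with one pair. write_key_value is a hypothetical helper, not part of mrjob; it assumes both key and value are JSON-encoded and joined with a tab. Note that JSON encodes Python's None as null.

```python
import json


def write_key_value(key, value):
    """Hypothetical stand-in for a JSONProtocol-style writer (not mrjob code):
    JSON-encode both key and value, then join them with a tab."""
    return json.dumps(key) + "\t" + json.dumps(value)


line = write_key_value(None, {"a": 1, "b": 2})
print(line)  # prints: null<TAB>{"a": 1, "b": 2}
```

Because the key is always encoded, even a None key still occupies the first column of every line.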

Is there a way to suppress the key or the value? I just want this:

{"a":1, "b":2}

BTW, I've already tried the following. Am I missing something?

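import mrjob.protocol
from mrjob.job import MRJob
from mrjob.step import MRStep


class MyMrJobClass(MRJob):
    OUTPUT_PROTOCOL = mrjob.protocol.JSONProtocol

    def step1_mapper(self, _, line):
        ...
        yield my_key, my_value

    def step1_reducer(self, key, values):
        for v in values:
            ...
        yield None, my_data

    def steps(self):
        # MRStep is the current way to declare a step (replaces self.mr())
        return [
            MRStep(
                mapper=self.step1_mapper,
                reducer=self.step1_reducer,
            ),
        ]


if __name__ == '__main__':
    MyMrJobClass.run()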

NB: I know that I don't need to override steps() for a single-step job. This will eventually be a multistep job, so it's important to build the class that way.

Thanks!


Solution

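Use mrjob.protocol.JSONValueProtocol (note the Value) as the output protocol instead of mrjob.protocol.JSONProtocol, i.e. set OUTPUT_PROTOCOL = mrjob.protocol.JSONValueProtocol. It JSON-encodes only the value and discards the key, so each output line is just the value, e.g. {"a":1, "b":2}. See the documentation.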

The documentation has more information on using custom protocols.
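As a rough illustration of what a value-only output protocol does, here is a minimal sketch of a custom protocol. It assumes the protocol interface described in the mrjob docs (an object whose read(line) returns a (key, value) pair and whose write(key, value) returns a bytes line). ValueOnlyJSONProtocol is a hypothetical name, and this is not the source of mrjob's own JSONValueProtocol.

```python
import json


class ValueOnlyJSONProtocol:
    """Hypothetical value-only JSON protocol (a sketch, not mrjob's code).

    Assumes mrjob's protocol interface:
      read(line: bytes) -> (key, value)
      write(key, value) -> bytes
    """

    def read(self, line):
        # No key is stored on disk, so report None as the key.
        return None, json.loads(line.decode("utf-8"))

    def write(self, key, value):
        # Drop the key entirely; emit only the JSON-encoded value.
        return json.dumps(value).encode("utf-8")


protocol = ValueOnlyJSONProtocol()
out = protocol.write(None, {"a": 1, "b": 2})
print(out)                 # b'{"a": 1, "b": 2}'
print(protocol.read(out))  # (None, {'a': 1, 'b': 2})
```

To try it in a job, you would assign the class itself, e.g. OUTPUT_PROTOCOL = ValueOnlyJSONProtocol. In practice the built-in JSONValueProtocol already does this, so a custom class is only worth writing when you need a different encoding.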

License: CC-BY-SA with attribution
Not affiliated with StackOverflow