Question
By default, mrjob writes each output record in key[tab]value format.
This happens even if the key (or the value) is empty, null, or otherwise not interesting. Suppose my key, value pair is None, {"a":1, "b":2}. Then I get this:
None {"a":1, "b":2}
Is there a way to suppress the key or the value? I just want this:
{"a":1, "b":2}
BTW, I've already tried this. Am I missing something...?
    import mrjob.protocol
    from mrjob.job import MRJob

    class MyMrJobClass(MRJob):
        # JSONProtocol writes both the key and the value, separated by a tab
        OUTPUT_PROTOCOL = mrjob.protocol.JSONProtocol

        def step1_mapper(self, _, line):
            ...
            yield my_key, my_value

        def step1_reducer(self, key, values):
            for v in values:
                ...
            yield None, my_data

        def steps(self):
            return [
                self.mr(
                    mapper=self.step1_mapper,
                    reducer=self.step1_reducer,
                ),
            ]
NB: I know that I don't need to override steps for a single-step job. This will eventually be a multistep job, so it's important to build the class that way.
Thanks!
Solution
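You can use mrjob.protocol.JSONValueProtocol (notice the Value) as the output protocol instead of mrjob.protocol.JSONProtocol. Set OUTPUT_PROTOCOL = mrjob.protocol.JSONValueProtocol in your job class. JSONValueProtocol writes only the JSON-encoded value and ignores the key, so a pair like None, {"a":1, "b":2} comes out as just {"a":1, "b":2}.
The documentation has more information on using custom protocols.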
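To make the idea of a value-only protocol concrete, here is a minimal sketch of the interface mrjob expects from a protocol: an object with a read(line) method returning a (key, value) pair and a write(key, value) method returning one line of bytes. The class name ValueOnlyJSONProtocol is hypothetical and is not part of mrjob. It only illustrates roughly what a value-only JSON protocol does, under the assumption that lines are passed as bytes. In a real job you would normally just use the built-in JSONValueProtocol.

```python
import json


class ValueOnlyJSONProtocol(object):
    """Hypothetical stand-in (not part of mrjob) for a value-only JSON protocol."""

    def read(self, line):
        # Decode one line of bytes into a (key, value) pair.
        # A value-only format carries no key, so the key is always None.
        return None, json.loads(line.decode("utf-8"))

    def write(self, key, value):
        # Ignore the key entirely and emit only the JSON-encoded value.
        return json.dumps(value).encode("utf-8")


protocol = ValueOnlyJSONProtocol()
line = protocol.write(None, {"a": 1, "b": 2})
print(line.decode("utf-8"))  # prints {"a": 1, "b": 2}
```

Because write() never looks at the key, it does not matter what the reducer yields as the key. The same reasoning applies to the built-in protocol: once OUTPUT_PROTOCOL is a value-only protocol, yield None, my_data produces a line containing only my_data.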