Question
By default, mrjob writes each output record in key[tab]value format.
This happens even if the key (or the value) is empty, null, or otherwise not interesting. Suppose my key, value pair is None, {"a":1, "b":2}. Then I get this:
None {"a":1, "b":2}
Is there a way to suppress the key or the value? I just want this:
{"a":1, "b":2}
BTW, I've already tried this. Am I missing something...?
import mrjob.protocol
from mrjob.job import MRJob

class MyMrJobClass(MRJob):
    OUTPUT_PROTOCOL = mrjob.protocol.JSONProtocol

    def step1_mapper(self, _, line):
        ...
        yield my_key, my_value

    def step1_reducer(self, key, values):
        for v in values:
            ...
        yield None, my_data

    def steps(self):
        return [
            self.mr(
                mapper=self.step1_mapper,
                reducer=self.step1_reducer,
            ),
        ]
NB: I know that I don't need to override steps for a single-step job. This will eventually be a multistep job, so it's important to build the class that way.
Thanks!
Solution
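You can use mrjob.protocol.JSONValueProtocol (notice the Value) as the output protocol instead of mrjob.protocol.JSONProtocol. That is, set OUTPUT_PROTOCOL = mrjob.protocol.JSONValueProtocol in your job class. JSONValueProtocol writes only the JSON-encoded value and drops the key, so each output line is just the value.
The documentation has more information on using custom protocols.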
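To make the difference between the two protocols concrete, here is a minimal standard-library sketch of what each one writes for the record from the question. It is not mrjob's actual code: the function names write_json_protocol and write_json_value_protocol are hypothetical, and mrjob's real output may differ in details such as byte encoding.

```python
import json

# Illustration only: these helpers imitate the *shape* of the output lines,
# they are not part of mrjob.

def write_json_protocol(key, value):
    """Imitates key<TAB>value output, with both sides JSON-encoded."""
    return json.dumps(key) + "\t" + json.dumps(value)

def write_json_value_protocol(key, value):
    """Imitates value-only output: the key is ignored entirely."""
    return json.dumps(value)

record = (None, {"a": 1, "b": 2})

print(write_json_protocol(*record))        # key, a tab, then the value
print(write_json_value_protocol(*record))  # the value alone
```

In the job class itself, the only change should be the OUTPUT_PROTOCOL line. The mapper, reducer, and steps stay as they are, and the reducer can keep yielding None as the key.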