Question
I have a program that iterates a mapper and a reducer n times consecutively. However, for each iteration, the mapper of each key-value pair computes a value that depends on n.

from mrjob.job import MRJob


class MRWord(MRJob):

    def mapper_init_def(self):
        # Runs once before the first step's mapper
        self.count = {}

    def mapper_count(self, key, value):
        self.count[key] = 0
        print(self.count[key])
        # prints correctly
        yield key, value

    def mapper_iterate(self, key, value):
        yield key, value
        print(self.count[key])
        # error: AttributeError raised here

    def reducer_iterate(self, key, value):
        yield key, value

    def steps(self):
        return [
            self.mr(mapper_init=self.mapper_init_def, mapper=self.mapper_count),
            self.mr(mapper=self.mapper_iterate, reducer=self.reducer_iterate)
        ]


if __name__ == '__main__':
    MRWord.run()

I defined a two-step mapper-reducer job in which the first step sets an attribute, self.count, in its mapper_init. The program fails with AttributeError: 'MRWord' object has no attribute 'count'. It seems that each step creates an independent MRJob object, so the variable cannot be shared between steps. Is there another way to accomplish this?
Solution
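Why don't you try defining count at class level, so that it exists on every MRWord object?

class MRWord(MRJob):
    count = {}

Then drop the init method (and the mapper_init argument that refers to it in steps()):

def mapper_init_def(self):
    self.count = {}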
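
To see why this removes the AttributeError, here is a minimal plain-Python sketch. It does not use mrjob at all: StepTask, init_hook and per_instance are hypothetical names standing in for "one job object per step", on the assumption (taken from the question) that each step gets its own object.

```python
class StepTask:
    """Hypothetical stand-in for the job object created for one step."""

    # Class attribute: present on every instance without any init hook.
    count = {}

    def init_hook(self):
        # Instance attribute: exists only on the object that ran this hook.
        self.per_instance = {}


first_step = StepTask()
first_step.init_hook()          # only the first step runs the hook
first_step.count["a"] = 0

second_step = StepTask()        # fresh object, hook never called

print(hasattr(second_step, "per_instance"))  # attribute set in the hook
print(hasattr(second_step, "count"))         # attribute defined on the class
```

The second object lacks the attribute created in the hook but does have the class-level one, which is what the error in the question comes down to. Note that this sketch runs in a single process. When steps run as separate processes, as they typically do on a cluster, each process most likely starts from the initial class-level value, so anything written to count in one step should not be expected to reach the next; data that must cross steps is safer passed through the yielded key-value pairs.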