How can I iterate an mrjob mapper and reducer in Python when the iteration counter is part of the computation inside the loop?

StackOverflow https://stackoverflow.com/questions/19068553

29-06-2022

Question

I have a program that runs a mapper and a reducer n times in succession. In each iteration, the mapper computes, for every key-value pair, a value that depends on the iteration number n.

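import sys

from mrjob.job import MRJob
from mrjob.step import MRStep


class MRWord(MRJob):

    def mapper_init_def(self):
        # Runs once per mapper task, in the first step only.
        self.count = {}

    def mapper_count(self, key, value):
        self.count[key] = 0
        # Prints correctly. Debug output goes to stderr because
        # stdout carries the job's key-value data.
        print(self.count[key], file=sys.stderr)
        yield key, value

    def mapper_iterate(self, key, value):
        yield key, value
        # Error: AttributeError, self.count was never set on this step's job object
        print(self.count[key], file=sys.stderr)

    def reducer_iterate(self, key, values):
        # The reducer receives an iterator over all values for the key.
        for value in values:
            yield key, value

    def steps(self):
        return [
            MRStep(mapper_init=self.mapper_init_def, mapper=self.mapper_count),
            MRStep(mapper=self.mapper_iterate, reducer=self.reducer_iterate),
        ]


if __name__ == '__main__':
    MRWord.run()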

I defined a two-step job in which the first step's mapper_init sets an instance attribute, self.count. The second step fails with AttributeError: 'MRWord' object has no attribute 'count'. It seems each step runs with its own, independent MRJob object, so the attribute cannot be shared between steps. Is there another way to accomplish this?
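The failure can be reproduced without Hadoop in a small plain-Python model. This is a sketch, not mrjob's API: the `Job` class and the `run_steps` driver are hypothetical stand-ins that assume, as runners launching a separate process per task do, that every step works on a freshly constructed job object. Under that assumption only the pairs a step yields reach the next step; attributes set by an earlier step's init function do not.

```python
class Job:
    """Hypothetical stand-in for the MRWord job (not mrjob's API)."""

    def mapper_init_def(self):
        self.count = {}

    def mapper_count(self, key, value):
        self.count[key] = 0
        yield key, value

    def mapper_iterate(self, key, value):
        # Needs self.count, which only mapper_init_def creates.
        yield key, (value, self.count[key])


def run_steps(job_cls, steps, records):
    """Run each (init, mapper) step on a brand-new job object."""
    for init_name, mapper_name in steps:
        job = job_cls()  # fresh object per step, like a fresh task process
        if init_name is not None:
            getattr(job, init_name)()
        mapper = getattr(job, mapper_name)
        # Only the yielded pairs are handed to the next step.
        records = [pair for key, value in records for pair in mapper(key, value)]
    return records


steps = [("mapper_init_def", "mapper_count"), (None, "mapper_iterate")]

# Step 1 on its own works: its object ran mapper_init_def.
print(run_steps(Job, steps[:1], [("a", 1), ("b", 2)]))

# Adding step 2 fails: its object never ran mapper_init_def.
error = None
try:
    run_steps(Job, steps, [("a", 1), ("b", 2)])
except AttributeError as exc:
    error = str(exc)
print(error)
```

The model makes the constraint explicit: state that a later step needs has to travel in the yielded key-value pairs, not in attributes of the job object.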


Solution

Why not define count as a class attribute instead?

class MRWord(MRJob):
    # A dict rather than a list, because the mappers index it by key.
    count = {}

and drop the mapper_init_def method, along with the mapper_init=self.mapper_init_def argument in steps():

def mapper_init_def(self):
    self.count = {}
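A class attribute removes the AttributeError, but when each task runs in its own process (as on Hadoop), every process starts with a fresh, empty dict, so values written in step 1 are not visible in step 2. One approach that does not rely on shared memory is to carry the state through the key-value pairs. The plain-Python sketch below models n chained map/shuffle/reduce rounds and stores the iteration counter inside each value, so the mapper can use it in its computation. The functions `mapper_init`, `mapper_iterate`, `reducer_iterate`, `run_round`, `run_iterations` and the `value * (i + 1)` formula are hypothetical placeholders, not mrjob's API; the mapper and reducer follow the same generator shape as MRJob step methods.

```python
from itertools import groupby
from operator import itemgetter


def mapper_init(key, value):
    """First step (hypothetical): attach iteration counter 0 to every value."""
    yield key, (0, value)


def mapper_iterate(key, state):
    """Use the counter carried in the value, then pass it on."""
    i, value = state
    # Placeholder computation that depends on the iteration number i.
    yield key, (i, value * (i + 1))


def reducer_iterate(key, states):
    """Sum the values for a key and advance the counter for the next round."""
    states = list(states)
    i = states[0][0]  # every value in one round carries the same counter
    yield key, (i + 1, sum(value for _, value in states))


def run_round(records, mapper, reducer):
    """One map -> shuffle (sort and group by key) -> reduce round."""
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    mapped.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reducer(key, (value for _, value in group)))
    return out


def run_iterations(records, n):
    """Attach the counter, then run n map/reduce rounds."""
    records = [pair for key, value in records for pair in mapper_init(key, value)]
    for _ in range(n):
        records = run_round(records, mapper_iterate, reducer_iterate)
    return records


print(run_iterations([("a", 1), ("b", 2), ("a", 3)], n=3))
# [('a', (3, 24)), ('b', (3, 12))]
```

With mrjob's default JSON protocols, a tuple yielded as a value arrives in the next step as a list, so the unpacking `i, value = state` still works.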
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow