How can I write an iteration in Python using mrjob mapper reducer, for which the counter is a part of the computation in the loop?

StackOverflow https://stackoverflow.com/questions/19068553

  •  29-06-2022

Question

I have a program that runs a mapper and a reducer n times in a row. However, in each iteration the mapper computes, for every key-value pair, a value that depends on n.

from mrjob.job import MRJob


class MRWord(MRJob):

    def mapper_init_def(self):
        # Registered as mapper_init for the first step only.
        self.count = {}

    def mapper_count(self, key, value):
        self.count[key] = 0
        print(self.count[key])  # prints correctly
        yield key, value

    def mapper_iterate(self, key, value):
        yield key, value
        print(self.count[key])  # error: AttributeError, no 'count' in this step

    def reducer_iterate(self, key, values):
        # A reducer receives an iterator of values for each key.
        for value in values:
            yield key, value

    def steps(self):
        return [
            self.mr(mapper_init=self.mapper_init_def, mapper=self.mapper_count),
            self.mr(mapper=self.mapper_iterate, reducer=self.reducer_iterate),
        ]


if __name__ == '__main__':
    MRWord.run()

I defined a two-step job in which the first step sets an instance attribute, self.count. The program fails with AttributeError: 'MRWord' object has no attribute 'count'. It seems that each step runs in an independent MRWord object, so the attribute cannot be shared between steps. Is there another way to accomplish this?


Solution

Why don't you try defining count at the class level? Since your mappers index it by key, make it a dict:

class MRWord(MRJob):
    count = {}

and drop the init hook:

def mapper_init_def(self):
    self.count = {}
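
To see why this removes the AttributeError, here is a minimal sketch in plain Python. The Job class is a hypothetical stand-in for an MRJob subclass, not the mrjob API, and the seen attribute is invented for illustration. It also shows an alternative that does not rely on shared state at all: carrying the counter inside the value passed from one step to the next. Note the assumption that both objects live in the same process. When a runner launches each task in a separate process, changes made to a class attribute in one step are not visible in another, so only the value-carrying variant passes the counter along.

```python
class Job(object):
    """Hypothetical stand-in for an MRJob subclass (not the mrjob API)."""

    count = {}  # class-level: present on every instance, no init hook needed

    def mapper_init_def(self):
        self.seen = {}  # instance-level: exists only on objects where this ran

    def mapper_count(self, key, value):
        self.count[key] = 0
        # Carry the counter in the value so the next step receives it as data.
        return key, (value, self.count[key])

    def mapper_iterate(self, key, value_and_count):
        value, count = value_and_count
        return key, (value, count + 1)


step1 = Job()  # object that runs the first step
step1.mapper_init_def()
record = step1.mapper_count("a", 10)

step2 = Job()  # a fresh object runs the second step
has_seen = hasattr(step2, "seen")        # the init hook never ran on step2
shared = step2.count                     # shared only within this one process
result = step2.mapper_iterate(*record)   # the counter arrives as part of the value
```

In this sketch step2 has no seen attribute, which mirrors the error in the question, while count is reachable because it belongs to the class. The counter in result travels with the data, so it does not depend on which object or process handles the second step.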
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow