city population difference

https://stackoverflow.com/questions/23356556

11-07-2023

Question

I have an input file

 Chicago 500
 Newyork 200
 California 100

I need the difference of the second column for every ordered pair of cities as output:

 Chicago Newyork 300
 Chicago California 400
 Newyork Chicago -300
 Newyork California 100
 California Chicago -400
 California Newyork -100

I have tried a lot but cannot figure out the correct way to implement this in MapReduce. Please suggest a solution.

Solution

Here is some pseudocode, written in a Python-like style since that is what I use most. For this to work, you must know the total number of lines (i.e., cities here) before running the job and use it as N.

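map(dummy, line):
  city, pop = line.split()
  pop = int(pop)  # parse the population so it can be subtracted later
  for idx in 0:N-1  # 0-based, inclusive: send every city to each of the N keys
     emit(idx, (city, pop))

reduce(idx, city_data):
  city_data.sort() # sort by city to ensure indices are consistent
  city, pop = city_data[idx]  # the city this reducer call is responsible for
  for i in 0:N-1
     if idx != i:
        c, p = city_data[i]
        diff = pop - p
        emit(city, (c, diff))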
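
To check the idea without a cluster, the pseudocode can be simulated locally in plain Python. This is only a rough sketch: the names map_line, reduce_key and run_job are made up for illustration and are not part of Hadoop or any other MapReduce API, and the grouping loop in run_job merely stands in for the framework's shuffle phase.

```python
from collections import defaultdict


def map_line(line, n):
    """Copy one (city, population) record to each of the n reducer keys."""
    city, pop = line.split()
    for idx in range(n):
        yield idx, (city, int(pop))


def reduce_key(idx, city_data):
    """Emit the differences between city number idx and every other city."""
    # Sort by city name so that every reducer sees the same index order.
    city_data = sorted(city_data)
    city, pop = city_data[idx]
    for i, (other, other_pop) in enumerate(city_data):
        if i != idx:
            yield city, other, pop - other_pop


def run_job(lines):
    """Local stand-in for the framework: map, group by key, then reduce."""
    lines = [ln for ln in lines if ln.strip()]
    n = len(lines)  # N has to be known before the job starts
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line, n):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reduce_key(key, groups[key]))
    return results


if __name__ == "__main__":
    sample = ["Chicago 500", "Newyork 200", "California 100"]
    for city, other, diff in run_job(sample):
        print(city, other, diff)
```

Note that every record is copied N times in the map step, so the amount of shuffled data grows with N squared. That is fine for a small list of cities but gets expensive for large inputs.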

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow