Get max salary employee name using Hadoop MapReduce

Question 1

I would make the map emit the full tuple of the employee with the max salary. For that, create a class (for the value) that implements the Writable interface (http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/io/Writable.html). Maybe TupleWritable suits your needs (it is not very complex).

Since you will have only 1 value emitted per map, network traffic is not an issue, and it is fine to receive all the tuple data in the reducer. Your reducer then just has to pick the top tuple from these per-map "max" values.
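To make the idea concrete, here is a rough plain-Java sketch of it, not a real Hadoop job. `EmployeeRecord`, `maxOf` and `roundTrip` are names I made up for illustration. The `write`/`readFields` methods have the same signatures as the ones in Hadoop's Writable interface, so in a real job the class would simply declare `implements Writable`. I'm also assuming each record is just a name plus a salary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

class MaxSalarySketch {

    // Hypothetical value class. In a real job it would declare
    // "implements org.apache.hadoop.io.Writable"; the two methods below
    // match that interface's write/readFields signatures.
    static class EmployeeRecord {
        String name = "";
        double salary;

        EmployeeRecord() {}  // Hadoop needs a no-arg constructor for deserialization

        EmployeeRecord(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }

        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeDouble(salary);
        }

        public void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            salary = in.readDouble();
        }
    }

    // Used twice: by each mapper over its own split (emitting the result once,
    // e.g. from cleanup()), and by the single reducer over the per-map maxima.
    static EmployeeRecord maxOf(List<EmployeeRecord> records) {
        EmployeeRecord best = null;
        for (EmployeeRecord r : records) {
            if (best == null || r.salary > best.salary) {
                best = r;
            }
        }
        return best;
    }

    // Serializes and deserializes a record, standing in for the map-to-reduce transfer.
    static EmployeeRecord roundTrip(EmployeeRecord r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            r.write(new DataOutputStream(bytes));
            EmployeeRecord copy = new EmployeeRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two made-up input splits, one per simulated mapper.
        List<EmployeeRecord> split1 = Arrays.asList(
                new EmployeeRecord("alice", 5000.0), new EmployeeRecord("bob", 9000.0));
        List<EmployeeRecord> split2 = Arrays.asList(
                new EmployeeRecord("carol", 7000.0), new EmployeeRecord("dave", 8500.0));

        // "Map" phase: one local maximum per split.
        EmployeeRecord local1 = roundTrip(maxOf(split1));
        EmployeeRecord local2 = roundTrip(maxOf(split2));

        // "Reduce" phase: global maximum over the local maxima.
        EmployeeRecord global = maxOf(Arrays.asList(local1, local2));
        System.out.println(global.name + "\t" + global.salary);
    }
}
```

Note that the strict `>` keeps only the first record seen when several employees share the same salary.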

For more complex problems, you will have to think about chaining jobs (http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining).

Question 2

I can suggest the following solution:

1. Find the max salary using your MapReduce job

2. Read the max salary from HDFS (it should be in a file in the output folder of your job)

3. Save the max salary to the configuration, say `configuration.set("max.salary", maxSalary);`

4. Create a new mapper-only job. The mapper of this job should read the maxSalary value from the configuration in the setup method and, in the map method, keep only the employees whose salary equals maxSalary. Pass your data to this job.
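Steps 3 and 4 could look roughly like the plain-Java sketch below. It is a simulation, not Hadoop code: `java.util.Properties` stands in for Hadoop's `Configuration`, `FilterMapper` and `run` are hypothetical names, and I'm assuming the input lines look like `name,salary`. In a real job the value would be read in `Mapper.setup` via `context.getConfiguration().get("max.salary")`, and the job would be made mapper-only with `job.setNumReduceTasks(0)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class MaxSalaryFilterSketch {

    // Stand-in for the mapper of the second job.
    // Assumed (hypothetical) input format: one "name,salary" pair per line.
    static class FilterMapper {
        private double maxSalary;

        // Mirrors Mapper.setup(Context): in Hadoop the value would come from
        // context.getConfiguration().get("max.salary").
        void setup(Properties conf) {
            maxSalary = Double.parseDouble(conf.getProperty("max.salary"));
        }

        // Mirrors Mapper.map(...): emit the line only if its salary equals the max.
        void map(String line, List<String> output) {
            String[] fields = line.split(",");
            double salary = Double.parseDouble(fields[1].trim());
            if (Double.compare(salary, maxSalary) == 0) {
                output.add(line);
            }
        }
    }

    // Runs the simulated mapper-only job over the given lines.
    static List<String> run(String maxSalary, List<String> lines) {
        Properties conf = new Properties();
        conf.setProperty("max.salary", maxSalary);  // step 3: store the max in the configuration

        FilterMapper mapper = new FilterMapper();
        mapper.setup(conf);                         // step 4: read it back once per mapper

        List<String> output = new ArrayList<>();
        for (String line : lines) {
            mapper.map(line, output);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "alice,5000", "bob,9000", "carol,9000", "dave,7000");
        for (String hit : run("9000", lines)) {
            System.out.println(hit);
        }
    }
}
```

Comparing parsed doubles for equality works here only because the max and the salaries are parsed from the same textual form. If your salaries are stored with varying precision, you may want to compare them differently.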

As a result, you'll get all the employees whose salary equals the max salary.

P.S. As a better way, though, I'd recommend using Hive or Pig for this kind of task. If a task doesn't involve complicated math or business logic, it is much easier to implement in high-level tools like Hive and Pig (and some others).