My MapReduce job become Fails

https://stackoverflow.com/questions/18587152

27-06-2022
|

Question

a have a mapreduce program in Eclipse. and I want to run it.. I follow the program from below url:

http://www.orzota.com/step-by-step-mapreduce-programming/

I do all things that the page says and run the program. but it show me error and my job fails.. the program create output folder but it is empty.. here is my cod:

package org.orzota.bookx.mappers;

import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyHadoopMapper extends MapReduceBase implements Mapper <LongWritable,  Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);

public void map(LongWritable _key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    String st = value.toString();
    String[] bookdata = st.split("\";\"");
    output.collect(new Text(bookdata[3]), one);
  }

   }

public class MyHadoopReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>{

public void reduce(Text _key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    Text key = _key;
    int freq = 0;
    while (values.hasNext()){
        IntWritable value = (IntWritable) values.next();
        freq += value.get();
    }
    output.collect(key, new IntWritable(freq));
  }
  }


public class MyHadoopDriver {

public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(
            org.orzota.bookx.mappers.MyHadoopDriver.class);
    conf.setJobName("BookCrossing1.0");


    // TODO: specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);


    // TODO: specify a mapper
    conf.setMapperClass(org.orzota.bookx.mappers.MyHadoopMapper.class);

    // TODO: specify a reducer
    conf.setReducerClass(org.orzota.bookx.mappers.MyHadoopReducer.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);


    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));


    client.setConf(conf);
    try {
        JobClient.runJob(conf);
    } catch (Exception e) {
        e.printStackTrace();
    }
  }

   }

and here is the errors:

13/09/03 12:19:11 INFO util.ProcessTree: setsid exited with exit code 0
13/09/03 12:19:11 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3c2378
13/09/03 12:19:11 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclip/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:11 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:12 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:12 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:12 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:12 INFO mapred.JobClient:  map 0% reduce 0%
13/09/03 12:19:13 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:14 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:14 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000000_0 is done. And is in the process of commiting
13/09/03 12:19:14 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:14 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000000_0' done.
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000000_0
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:14 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@15dd910
13/09/03 12:19:14 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:14 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:14 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:14 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:14 INFO mapred.JobClient:  map 20% reduce 0%
13/09/03 12:19:15 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:15 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:15 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000001_0 is done. And is in the process of commiting
13/09/03 12:19:15 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:15 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000001_0' done.
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000002_0
13/09/03 12:19:15 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7c3885
13/09/03 12:19:15 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Book-Ratings.csv:0+30682276
13/09/03 12:19:15 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:15 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000003_0
13/09/03 12:19:16 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11d2572
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Users.csv:0+12284157
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:16 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@164b09c
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.JobClient:  map 40% reduce 0%
13/09/03 12:19:17 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:17 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:17 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000004_0 is done. And is in the process of commiting
13/09/03 12:19:17 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:17 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000004_0' done.
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Map task executor complete.
13/09/03 12:19:17 WARN mapred.LocalJobRunner: job_local1379860058_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:17)
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
13/09/03 12:19:17 INFO mapred.JobClient:  map 60% reduce 0%
13/09/03 12:19:17 INFO mapred.JobClient: Job complete: job_local1379860058_0001
13/09/03 12:19:17 INFO mapred.JobClient: Counters: 16
13/09/03 12:19:17 INFO mapred.JobClient:   File Input Format Counters 
13/09/03 12:19:17 INFO mapred.JobClient:     Bytes Read=77795631
13/09/03 12:19:17 INFO mapred.JobClient:   FileSystemCounters
13/09/03 12:19:17 INFO mapred.JobClient:     FILE_BYTES_READ=178484057
13/09/03 12:19:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6981917
13/09/03 12:19:17 INFO mapred.JobClient:   Map-Reduce Framework
13/09/03 12:19:17 INFO mapred.JobClient:     Map output materialized bytes=2971356
13/09/03 12:19:17 INFO mapred.JobClient:     Map input records=271380
13/09/03 12:19:17 INFO mapred.JobClient:     Spilled Records=271380
13/09/03 12:19:17 INFO mapred.JobClient:     Map output bytes=2428578
13/09/03 12:19:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=883687424
13/09/03 12:19:17 INFO mapred.JobClient:     CPU time spent (ms)=0
13/09/03 12:19:17 INFO mapred.JobClient:     Map input bytes=77787439
13/09/03 12:19:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=306
13/09/03 12:19:17 INFO mapred.JobClient:     Combine input records=0
13/09/03 12:19:17 INFO mapred.JobClient:     Combine output records=0
13/09/03 12:19:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient:     Map output records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Job Failed: NA  java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.orzota.bookx.mappers.MyHadoopDriver.main(MyHadoopDriver.java:44)

I think the error is from this line:

  output.collect(new Text(bookdata[3]), one);

but I don't know what it says.. can anyone help me please? thanks..

Solution

I checked the link you provided. I think the best thing you can do is do a system.out.println() of your input key value pairs (on a small subset of your input dataset), just to be sure. If the input file you are using contains a '\n' then it might be possible that the csv record is broken into 2 seperate records which contain fewer than 8 substrings. The ArrayOutOfBoundsException seems to point in this direction. I don't think it is a mapreduce error. You could also add the following line to your map function:

if (bookdata.length!=8){
  System.out.println("Warning, bad entry");
  return; 
}

If the simulation survives you have isolated the problem..

OTHER TIPS

Most probably the input file you are reading has a row that doesn't have 4 columns.

So when you split the row into an Array,

String[] bookdata = st.split("\";\"");

And you want to access the 4th element

output.collect(new Text(bookdata[3]), one);

It fails.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow