You can catch Exception in both the mapper and the reducer, and increment a counter inside the catch block, like the following:
catch (Exception ex) {
    // One counter per distinct exception message, in the CUSTOM_COUNTER group
    context.getCounter("CUSTOM_COUNTER", ex.getMessage()).increment(1);
    // Also log the payload that caused the exception
    System.err.println(GENERIC_INPUT_ERROR_MESSAGE + key + "," + value);
    ex.printStackTrace();
}
If the exception message is one you expected and the counter's value is acceptable, you can go ahead with the results; otherwise, investigate the logs. I know catching Exception isn't advised, but if you want to "continue on error", it amounts to pretty much the same thing. Since cluster costs are at stake here, I think we are better off catching Exception instead of specific exceptions.
There may be side effects, though: your code might run on entirely wrong input, and without the catch it would have failed much earlier. But the chances of something like this happening are very low.
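To make the "is the counter's value acceptable" check concrete, here is a minimal sketch in plain Java. It is not Hadoop code: a map stands in for the job's counter group so it runs without a cluster, and the class name ErrorBudgetSketch, its methods, and the Integer.parseInt placeholder for the real map logic are all hypothetical. It also keys the counts by exception class rather than by message, which is a swap from the snippet above, made on the assumption that messages often embed the bad value and would otherwise create one counter per bad record (Hadoop limits how many distinct counters a job may create).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Stand-in for a job's custom counter group. A plain map replaces
 * Hadoop's context counters; every name here is hypothetical.
 */
final class ErrorBudgetSketch {
    private final Map<String, Long> errorCounts = new TreeMap<>();
    private long recordsSeen = 0;

    /** Process one record; on any exception, count it and carry on. */
    void process(String value) {
        recordsSeen++;
        try {
            Integer.parseInt(value.trim()); // placeholder for the real map logic
        } catch (Exception ex) {
            // Keyed by exception class instead of message (assumption, see lead-in).
            errorCounts.merge(ex.getClass().getSimpleName(), 1L, Long::sum);
        }
    }

    /** Number of failures recorded under the given exception class name. */
    long errorCount(String name) {
        return errorCounts.getOrDefault(name, 0L);
    }

    /** Total failures across all exception classes. */
    long totalErrors() {
        long total = 0;
        for (long count : errorCounts.values()) {
            total += count;
        }
        return total;
    }

    /**
     * True only if every recorded error type was expected and the
     * fraction of failed records does not exceed maxRate.
     */
    boolean acceptable(Set<String> expected, double maxRate) {
        if (!expected.containsAll(errorCounts.keySet())) {
            return false; // an unexpected exception type: go read the logs
        }
        return recordsSeen == 0 || (double) totalErrors() / recordsSeen <= maxRate;
    }
}
```

After the job finishes you would feed the real counter values into a check like acceptable(...) and only use the results if it passes. The threshold and the set of expected exception types are choices you have to make for your own data.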
EDIT:
For your point #2, you can set the maximum number of allowed failures per tracker using the following:
conf.setMaxTaskFailuresPerTracker(noFailures);
OR
The config property you need to set is mapred.max.tracker.failures. As you may know, the default is 4. For all other mapred configurations see here.