Question

I have written a job using Scalding that runs great in local mode. But when I try to execute it in HDFS mode (on the same file), it doesn't do anything. More precisely, the first step has no tasks (no mappers or reducers), so the following steps obviously do nothing either.

I tried grepping the logs for exceptions and also wrapped my code in try-catch (in Scalding the job definition is in the constructor, and I also wrapped the run method).
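For reference, the shape of the job looks roughly like this (a minimal sketch using the fields API, with TextLine as a stand-in for the actual Avro source; class and argument names are hypothetical):

    import com.twitter.scalding._

    class MyAvroJob(args: Args) extends Job(args) {
      // In Scalding the pipe assembly ("the job definition") is built in the constructor.
      TextLine(args("input"))          // stand-in for the real Avro source
        .read
        .project('line)
        .write(TextLine(args("output")))

      // run is wrapped as well, to surface any exception the framework might swallow.
      override def run: Boolean =
        try super.run
        catch { case e: Throwable => e.printStackTrace(); throw e }
    }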

Maybe for some reason Cascading decides to ignore the input file? It is an Avro deflate file.

UPDATE: Digging more, I can see this line:

2014-04-28 04:49:23,954 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201404280448_0001 = 0. Number of splits = 0

In the job xml, the mapred.input.dir property is set to the path to my file.

It looks like JobInProgress is getting its information from mapred.job.split.file, which doesn't exist in the job XML file.


Solution

It turns out that my Avro file is named sample.avro.deflate. Avro 1.7.4 silently ignores any input files that don't end with '.avro'. In 1.7.6, they added a property, avro.mapred.ignore.inputs.without.extension, that controls this behaviour.
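So the fix is either to rename the file so it ends in .avro, or, on Avro 1.7.6+, to turn off the extension filter via that property. A minimal sketch of the latter in Scalding (assuming the Job.config override available in recent Scalding versions; the class name is hypothetical):

    import com.twitter.scalding._

    class MyAvroJob(args: Args) extends Job(args) {
      // Pass the Avro 1.7.6+ property through to the Hadoop configuration so the
      // input format no longer skips files that don't end in ".avro".
      override def config: Map[AnyRef, AnyRef] =
        super.config ++ Map("avro.mapred.ignore.inputs.without.extension" -> "false")

      // ... pipe assembly as before ...
    }

Passing -Davro.mapred.ignore.inputs.without.extension=false on the hadoop command line should have the same effect, and simply renaming the file (e.g. to sample.deflate.avro) sidesteps the issue without a version upgrade.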
