I have written a job using Scalding that runs great in local mode. But when I try to execute it in HDFS mode (on the same file), it doesn't do anything. More precisely, the first step has no tasks (no mappers and no reducers), so the subsequent steps obviously do nothing either.

I tried grepping the logs for exceptions and also wrapping my code in try-catch (in Scalding the job definition lives in the constructor, and I wrapped the run method as well).
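For reference, the job has roughly this shape. This is a minimal sketch assuming nothing beyond stock Scalding; SampleJob is a placeholder name and TextLine stands in for the Avro source the real job reads:

```scala
import com.twitter.scalding._

class SampleJob(args: Args) extends Job(args) {
  // In Scalding the job definition runs in the constructor, so the
  // try/catch mentioned above wraps the flow definition itself.
  try {
    TextLine(args("input"))        // the real job reads an Avro source here
      .read
      .write(Tsv(args("output")))
  } catch {
    case e: Throwable =>
      e.printStackTrace()          // surface anything thrown while defining the flow
      throw e
  }

  // run executes the Cascading flow; wrapping it catches runtime failures too.
  override def run: Boolean =
    try super.run
    catch { case e: Throwable => e.printStackTrace(); throw e }
}
```

Neither try/catch ever fires, which is consistent with the flow being built and submitted with zero input splits rather than failing outright.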

Maybe for some reason Cascading decides to ignore the input file? It is a deflate-compressed Avro file.

UPDATE: Digging more, I can see this line:

2014-04-28 04:49:23,954 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201404280448_0001 = 0. Number of splits = 0

In the job xml, the mapred.input.dir property is set to the path to my file.

It looks like JobInProgress is getting its information from mapred.job.split.file, which doesn't exist in the job XML file.


Solution

It turns out that my Avro file is named sample.avro.deflate. Avro 1.7.4 silently ignores any input files that don't end with '.avro'. In 1.7.6 they added a property, avro.mapred.ignore.inputs.without.extension, to control this behavior.
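If renaming the file to end in .avro is not an option, one way to apply the 1.7.6 behavior from a Scalding job is to push that property into the job's Hadoop configuration. A hedged sketch, assuming Avro 1.7.6+ on the classpath and a Scalding version where Job.config is the no-argument overridable map (earlier versions take an implicit Mode); SampleJob is again a placeholder:

```scala
import com.twitter.scalding._

class SampleJob(args: Args) extends Job(args) {
  // Tell Avro's input format not to drop inputs that lack the .avro
  // extension (property added in Avro 1.7.6; it has no effect on 1.7.4).
  override def config: Map[AnyRef, AnyRef] =
    super.config + ("avro.mapred.ignore.inputs.without.extension" -> "false")

  // ... flow definition as before ...
}
```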
