Question

I may be wrong, but all(?) the examples I've seen with Apache Hadoop take as input a file stored on the local file system (e.g. org.apache.hadoop.examples.Grep).

Is there a way to load and save the data on the Hadoop Distributed File System (HDFS)? For example, I put a tab-delimited file named 'stored.xls' on HDFS using hadoop-0.19.1/bin/hadoop dfs -put ~/local.xls stored.xls. How should I configure the JobConf to read it?

Thanks.

Solution

JobConf conf = new JobConf(getConf(), ...);
...
FileInputFormat.setInputPaths(conf, new Path("stored.xls"));
...
JobClient.runJob(conf);
...

setInputPaths will do it. Note that the path is relative, so it resolves against your working directory on the default filesystem, which on HDFS defaults to /user/<your-username>.
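For context, here is a minimal sketch of a complete driver built around that call, using the old org.apache.hadoop.mapred API from the 0.19 era. The class name HdfsJobDriver and the output path are illustrative; the identity mapper and reducer simply copy the input records through unchanged:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Minimal driver: reads 'stored.xls' from the user's HDFS home
// directory and writes the records back out unchanged.
public class HdfsJobDriver extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), HdfsJobDriver.class);
    conf.setJobName("read-from-hdfs");

    // Relative paths resolve against the default filesystem
    // (HDFS, once fs.default.name points at your NameNode).
    FileInputFormat.setInputPaths(conf, new Path("stored.xls"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    // TextInputFormat (the default) yields <LongWritable, Text>
    // records, which the identity mapper/reducer pass through.
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new HdfsJobDriver(), args));
  }
}

Run it with something like bin/hadoop jar yourjob.jar HdfsJobDriver; once fs.default.name points at your NameNode (see the tip below), both stored.xls and output/ live on HDFS rather than the local disk.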

OTHER TIPS

Pierre, the default configuration for Hadoop is to run in local mode, rather than in distributed mode. You likely just need to modify some configuration in your hadoop-site.xml. It looks like your default filesystem is still the local one, when it should be hdfs://youraddress:yourport. Look at your setting for fs.default.name, and see the setup help on Michael Noll's blog for more details.
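As a rough sketch, the relevant hadoop-site.xml entry looks like the following; the hostname and port are placeholders that must match your actual NameNode:

<configuration>
  <property>
    <!-- URI of the NameNode; this makes HDFS the default filesystem -->
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>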

FileInputFormat.setInputPaths(conf, new Path("hdfs://hostname:port/user/me/stored.xls"));

This will also work; just make sure the host and port match your fs.default.name setting.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow