Question

I am running a map-reduce job through oozie. The command I use is as follows.

oozie job -verbose -oozie http://myoozieurl -config job.properties -run

How can I view the logs generated by the hadoop job? Is there a way to see the generated logs, or to redirect them so they print in the terminal window?

If I run the job using the (MapR) hadoop command, I can see the output of the log statements on the terminal.

New to hadoop and oozie. So this may be a newbie oversight.


Solution

This post explains how logs are managed during MapReduce jobs:

https://discuss.zendesk.com/hc/en-us/articles/201925118

Once the job has completed, the NodeManager keeps the log for each container for ${yarn.nodemanager.log.retain-seconds} (10800 seconds, i.e. 3 hours, by default) and deletes it once that period has expired. However, if ${yarn.log-aggregation-enable} is set to true, the NodeManager instead immediately concatenates all of its containers' logs into one file, uploads that file to HDFS under ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>, and deletes the logs from the local userlogs directory. Log aggregation is enabled by default in PHD, which makes log collection convenient.
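As a rough sketch of the path layout described above, the following shell snippet assembles the aggregated-log directory and lists it. The function name aggregated_log_dir is hypothetical, and the sample values (/yarn/apps, gpadmin, and the application ID) are assumptions borrowed from the example listing in this answer; substitute the values from your own cluster configuration.

```shell
# Hypothetical helper: build the HDFS directory that holds the aggregated logs,
# following ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>.
aggregated_log_dir() {
  # $1 = value of yarn.nodemanager.remote-app-log-dir
  # $2 = user that submitted the job
  # $3 = YARN application ID
  printf '%s/%s/logs/%s\n' "$1" "$2" "$3"
}

# Assumed sample values; replace them with your own.
log_dir=$(aggregated_log_dir /yarn/apps gpadmin application_1389385968629_0025)
echo "$log_dir"

# List the per-NodeManager log files, but only if the hdfs client is installed here.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$log_dir" || echo "could not list $log_dir" >&2
fi
```

The listing step is guarded so the snippet still runs on a machine without a Hadoop client; on a cluster node it should show one file per NodeManager that ran a container.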

The following example shows the result when log aggregation is enabled. We know that 4 containers were executed in this MapReduce job because "-m" specified 3 mappers and the fourth container is the application master. Each NodeManager got at least one container, so all of them uploaded a log file.

[gpadmin@hdm1 ~]$ hdfs dfs -ls /yarn/apps/gpadmin/logs/application_1389385968629_0025/
Found 3 items
-rw-r-----   3 gpadmin hadoop       4496 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw1.hadoop.local_30825
-rw-r-----   3 gpadmin hadoop       5378 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw2.hadoop.local_36429
-rw-r-----   3 gpadmin hadoop    1877950 2014-02-01 16:54 /yarn/apps/gpadmin
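To print the aggregated logs on the terminal instead of browsing HDFS, the yarn logs command can be used once the application has finished. This is a minimal sketch, assuming the job ran on YARN with log aggregation enabled and that you know the Hadoop job ID. For an Oozie workflow, that ID is usually reported as the action's external ID by oozie job -info, though this is an assumption about your setup. The helper name to_application_id is hypothetical, and the sample job ID is derived from the application ID in the listing above.

```shell
# Hypothetical helper: turn a MapReduce job ID (job_<timestamp>_<sequence>)
# into the matching YARN application ID (application_<timestamp>_<sequence>).
to_application_id() {
  case "$1" in
    job_*) printf 'application_%s\n' "${1#job_}" ;;
    *)     printf '%s\n' "$1" ;;   # leave any other value unchanged
  esac
}

# Assumed sample job ID; replace it with the ID of your own job.
app_id=$(to_application_id job_1389385968629_0025)
echo "$app_id"

# Dump all container logs of the application to the terminal,
# but only if the yarn client is installed here.
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$app_id" || echo "could not fetch logs for $app_id" >&2
fi
```

Going through yarn logs rather than reading the files in HDFS directly avoids having to decode the aggregated log files by hand.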
Licensed under: CC-BY-SA with attribution