Question

Following on from an earlier question...

I have an Oozie workflow with a shell action that invokes a Python script, and the script fails with the following error:

IOError: [Errno 13] Permission denied: '/home/test/myfile.txt'

All the Python script (hello.py) tries to do is open a file. This code works fine when executed outside of Hadoop.

if __name__ == '__main__':
    print ('Starting script')

    filein = '/home/test/myfile.txt'

    file = open(filein, 'r')

Here is my Oozie workflow.

<workflow-app xmlns="uri:oozie:workflow:0.4" name="hello">
    <start to="shell-check-hour" />
    <action name="shell-check-hour">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>hello.py</exec>
            <file>hdfs://localhost:8020/user/test/hello.py</file>
            <capture-output />
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end" />
</workflow-app>

If I try to give an absolute path to where the file is, I get permission denied.

filein = '/home/test/myfile.txt'

If I try just the file name, I get file not found. I don't understand this: the Python script and the file are in the same HDFS location.

filein = 'myfile.txt'

Maybe I need to modify my Oozie script to add the file as a param as well?
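
One way to narrow down both errors is to have the script report how it sees the file when it runs inside the shell action. The sketch below is only a diagnostic aid, not part of the original script: describe_access is a hypothetical helper name, and it assumes nothing beyond the Python standard library. It reports the user the process runs as, its working directory and the contents of that directory, which should show whether the action runs as a different user than expected or in a directory that does not contain myfile.txt.

```python
import getpass
import os


def describe_access(path):
    """Return a dict describing how the current process sees `path`.

    Hypothetical diagnostic helper; it only inspects the local filesystem
    of whichever node the script happens to run on.
    """
    try:
        user = getpass.getuser()
    except Exception:
        # Some environments have no name registered for the current uid.
        user = "unknown"
    cwd = os.getcwd()
    return {
        "user": user,
        "cwd": cwd,
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "cwd_listing": sorted(os.listdir(cwd)),
    }


if __name__ == "__main__":
    # Print each fact on its own line so it shows up in the action's log.
    for key, value in sorted(describe_access("/home/test/myfile.txt").items()):
        print("%s: %s" % (key, value))
```

If the reported user is not the owner of /home/test, that would be consistent with the permission error; if the directory listing does not include myfile.txt, that would be consistent with the file-not-found error.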


Solution

It turns out that I needed to make a slight modification to my Python script so that it opens the file from HDFS. Here is sample code to open and read the file:

import subprocess

''''''

# Stream the file out of HDFS with the hadoop CLI and read it line by line.
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"], stdout=subprocess.PIPE)
for line in cat.stdout:
    print(line)
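
The same idea can be wrapped in a small function that also checks whether the command succeeded, since a failing hadoop fs -cat would otherwise just produce no lines. This is a rough sketch rather than the code I actually ran: read_lines and read_hdfs_lines are hypothetical names, it assumes the hadoop CLI is on the PATH of the node running the action, and it reads the whole output into memory, so it suits small files only.

```python
import subprocess
import sys


def read_lines(command):
    """Run `command` and return its stdout as a list of text lines.

    Raises RuntimeError if the command exits with a non-zero status.
    """
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # communicate() waits for the process and collects all of its output.
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            "command failed (%d): %s"
            % (proc.returncode, err.decode("utf-8", "replace"))
        )
    return out.decode("utf-8").splitlines()


def read_hdfs_lines(hdfs_path):
    """Return the lines of an HDFS file by shelling out to `hadoop fs -cat`.

    Assumption: the `hadoop` executable is available on PATH.
    """
    return read_lines(["hadoop", "fs", "-cat", hdfs_path])
```

With this in place, the script would call read_hdfs_lines("/user/test/myfile.txt") and loop over the returned list instead of reading cat.stdout directly.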
Licensed under: CC-BY-SA with attribution