Question

I am on Windows 7. I installed mrjob, and when I run the example word_count file from the website, it works fine on my local machine. However, I get an error when I try to run it on Amazon EMR. I also tested connecting to Amazon S3 with just boto, and that works.

mrjob.conf file

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxxxx
    aws_region: us-east-1
    aws_secret_access_key: xxxxxxxx
    ec2_key_pair: bzy
    ec2_key_pair_file: C:\aa.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 3
    s3_log_uri: s3://myunique/
    s3_scratch_uri: s3://myunique/

running the following in my cmd

python word_count.py -c mrjob.conf -r emr mytext.txt

it produces the following error:

[image: screenshot of the error output]

Following suggestions that this was a Windows path issue, I double-checked parse.py in the source code, and it seems to have the relevant check for handling Windows paths:

# Used to check if the candidate uri is actually a local windows path.
WINPATH_RE = re.compile(r"^[aA-zZ]:\\")


def is_windows_path(uri):
    """Return True if *uri* is a windows path."""
    if WINPATH_RE.match(uri):
        return True
    else:
        return False


def is_uri(uri):
    """Return True if *uri* is any sort of URI."""
    if is_windows_path(uri):
        return False

    return bool(urlparse(uri).scheme)

What I don't understand is why I still get the error even with this updated code, and I'm not sure how to move forward.
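To make the purpose of the check quoted above concrete, here is a minimal sketch, assuming Python 3 and only the standard library. It copies the quoted pattern and adds a hypothetical stricter pattern (STRICT_WINPATH_RE, not part of mrjob) for comparison. It shows that urlparse reads the drive letter of a Windows path as a URI scheme, which is what the guard is there to prevent. It does not reproduce the EMR error itself.

```python
import re
from urllib.parse import urlparse

# Same pattern as quoted in the question. Note that the class [aA-zZ]
# contains the range A-z, which also covers [ \ ] ^ _ and the backtick.
WINPATH_RE = re.compile(r"^[aA-zZ]:\\")

# Hypothetical stricter variant (not from mrjob): ASCII letters only.
STRICT_WINPATH_RE = re.compile(r"^[a-zA-Z]:\\")


def is_uri(uri, winpath_re=WINPATH_RE):
    """Return True if *uri* has a scheme and is not a Windows path."""
    if winpath_re.match(uri):
        return False
    return bool(urlparse(uri).scheme)


key_file = r"C:\aa.pem"

# Without the guard, the drive letter is parsed as a URI scheme.
drive_scheme = urlparse(key_file).scheme

print(repr(drive_scheme))
print(is_uri(key_file))
print(is_uri("s3://myunique/"))
# The quoted pattern also accepts a non-letter "drive" such as "_".
print(bool(WINPATH_RE.match("_:\\x")), bool(STRICT_WINPATH_RE.match("_:\\x")))
```

If the path reaches this check with a forward slash or in some other form, the pattern will not match and the value may still be treated as a URI, so it is worth printing the exact string that mrjob receives.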


Solution

The problem you are experiencing is caused by Windows paths using the backslash (\), which is also an escape character. Double it and the error should go away.

Change your mrjob.conf file to:

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxxxx
    aws_region: us-east-1
    aws_secret_access_key: xxxxxxxx
    ec2_key_pair: bzy
    ec2_key_pair_file: C:\\aa.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 3
    s3_log_uri: s3://myunique/
    s3_scratch_uri: s3://myunique/

For more information, see: http://yaml.org/spec/1.2/spec.html#id2770814
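As a rough illustration of the escaping idea, here is a small sketch that uses Python's standard json module as a stand-in for a parser that treats the backslash as an escape character. This is an assumption made for demonstration only: it does not claim that mrjob reads your file this way, and the helper name try_parse is hypothetical.

```python
import json

# Raw strings keep the backslashes exactly as they would appear in a file.
single = r'{"ec2_key_pair_file": "C:\aa.pem"}'
doubled = r'{"ec2_key_pair_file": "C:\\aa.pem"}'


def try_parse(text):
    """Return the parsed key file path, or None if the text is rejected."""
    try:
        return json.loads(text)["ec2_key_pair_file"]
    except json.JSONDecodeError:
        return None


# "\a" is not a valid escape sequence here, so the single-backslash form fails.
single_result = try_parse(single)

# The doubled backslash is read as one literal backslash.
doubled_result = try_parse(doubled)

print(single_result)
print(doubled_result)
```

In a parser like this, the doubled form yields the intended path with a single backslash, while the single form is rejected outright.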

OTHER TIPS

I had a similar problem. In my case, the cause was that my job included code from several other files, referenced by file path. If your job does the same, the error noted above will also occur.

Licensed under: CC-BY-SA with attribution