Question

I have set up AWS EMR and I SSH into the master node. I want to copy a file into the HDFS system. The small line of code in my program that does this is:

    os.system('/home/hadoop/bin/hdfs dfs -put %s PATH_to_HADOOP' % tmp_output)
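For what it is worth, the same call written with subprocess at least raises an error when the hdfs command fails. This is just a sketch; put_to_hdfs and hdfs_dest are names I made up, and the destination is still the placeholder I am asking about:

    import subprocess

    # Same call as above, but check_call raises CalledProcessError
    # if the hdfs command exits with a non-zero status.
    def put_to_hdfs(local_path, hdfs_dest):
        subprocess.check_call(
            ['/home/hadoop/bin/hdfs', 'dfs', '-put', local_path, hdfs_dest])

    put_to_hdfs(tmp_output, 'PATH_to_HADOOP')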

I want to fill in the path to my HDFS file system.

I do

    [ec2-user@ip-172-31-0-185 input]$ /home/hadoop/bin/hdfs dfs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2014-04-14 22:21 /hbase
    drwxrwx---   - hadoop supergroup          0 2014-04-14 22:19 /tmp

I try

    [ec2-user@ip-172-31-0-185 input]$ /home/hadoop/bin/hdfs dfs -mkdir /tmp/stockmarkets
    mkdir: Permission denied: user=ec2-user, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwxrwx---

So, to allow ec2-user to use Hadoop, I followed these instructions:

http://cloudcelebrity.wordpress.com/2013/06/05/handling-permission-denied-error-on-hdfs/

But when I run (substituting ec2-user for their ubuntu):

    sudo adduser ec2-user hadoop

instead of a confirmation message, I get:

    Usage: useradd [options] LOGIN
    Options:
      -b, --base-dir BASE_DIR       base directory for the home directory of the
                                    new account
      -c, --comment COMMENT         GECOS field of the new account
      -d, --home-dir HOME_DIR       home directory of the new account
      -D, --defaults                print or change default useradd configuration
      -e, --expiredate EXPIRE_DATE  expiration date of the new account
      -f, --inactive INACTIVE       password inactivity period of the new account
      -g, --gid GROUP               name or ID of the primary group of the new
                                    account
      -G, --groups GROUPS           list of supplementary groups of the new
                                    account
      -h, --help                    display this help message and exit
      -k, --skel SKEL_DIR           use this alternative skeleton directory
      -K, --key KEY=VALUE           override /etc/login.defs defaults
      -l, --no-log-init             do not add the user to the lastlog and
                                    faillog databases
      -m, --create-home             create the user's home directory
      -M, --no-create-home          do not create the user's home directory
      -N, --no-user-group           do not create a group with the same name as
                                    the user
      -o, --non-unique              allow to create users with duplicate
                                    (non-unique) UID
      -p, --password PASSWORD       encrypted password of the new account
      -r, --system                  create a system account
      -s, --shell SHELL             login shell of the new account
      -u, --uid UID                 user ID of the new account
      -U, --user-group              create a group with the same name as the user
      -Z, --selinux-user SEUSER     use a specific SEUSER for the SELinux user mapping

So I am completely confused. Please help!


Solution

SSH in as hadoop@(publicIP) for Amazon EMR.

From there you can do anything you like with HDFS without having to "su." I just did an mkdir and ran distcp and a streaming job. I do everything as hadoop@, as per the EMR instructions.
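For example, from your local machine (the key path and host name below are placeholders for your own values):

    ssh -i /path/to/your-key.pem hadoop@<master-public-DNS>

Once logged in as hadoop, the hdfs dfs -mkdir and -put commands from the question run without any permission changes.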

OTHER TIPS

If you look at the permissions on the HDFS directory /tmp, you can see that /tmp is owned by user hadoop and that ec2-user has no permission to create files or directories inside /tmp.

(Incidentally, the adduser command in the question printed the useradd usage text because on Amazon Linux adduser is simply an alias for useradd, which does not understand the Debian-style adduser USER GROUP form; the rough equivalent there is sudo usermod -a -G hadoop ec2-user.)

To assign the proper permissions to the directory /tmp, use the following command:

    [ec2-user@ip-172-31-0-185 input]$ sudo -su hadoop /home/hadoop/bin/hdfs dfs -chmod 777 /tmp
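You can confirm the change took effect by listing the root directory again; the mode on /tmp should now read drwxrwxrwx:

    [ec2-user@ip-172-31-0-185 input]$ /home/hadoop/bin/hdfs dfs -ls /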

Now try creating the directory inside the /tmp HDFS location again:

    [ec2-user@ip-172-31-0-185 input]$ /home/hadoop/bin/hdfs dfs -mkdir /tmp/stockmarkets
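With the directory created and world-writable, the line from the question can point at it:

    os.system('/home/hadoop/bin/hdfs dfs -put %s /tmp/stockmarkets' % tmp_output)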