Question

I have run into this problem several times. I SSH into an Amazon EC2 or EMR machine from the command line and run a Hive query in interactive mode, which shows the gradual progress of the mapper and reducer phases. But suppose a network problem disconnects me from the EC2 or EMR machine. Will my Hive query still be running? If yes, can I check the progress report again, the way it appears on the Hive console?


Solution

So, three things you can do:

  1. Use the web interface. Amazon gives you access to this as detailed here

  2. Run the query inside screen. If you get disconnected, just reconnect and reattach to your previous session. You can also send the logging to a file instead of stdout, so you can reopen that file when you log back on to the machine.

  3. Run the query using nohup, so it is not attached to any session and keeps running on its own even after you get kicked off. Again, redirect all logging to a file rather than stdout, then check that file or tail it once you log back on.
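
Options 2 and 3 could look roughly like the sketch below. The helper name `run_detached`, the query file `my_query.hql`, and the log file `hive_query.log` are made up for illustration. The sketch assumes the query is saved to a file and run with `hive -f`. Both stdout and stderr are redirected, on the assumption that some of the progress output may go to stderr.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive or EMR): run a command so that it
# survives a dropped SSH session, with all output written to a log file.
run_detached() {
    log_file=$1
    shift
    # nohup makes the command ignore the hangup signal sent on disconnect;
    # '&' puts it in the background; stdout and stderr both go to the log.
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    # Print the PID so the job can be found or killed later.
    echo $!
}

# Option 3 (nohup), with placeholder file names:
#   pid=$(run_detached hive_query.log hive -f my_query.hql)
#   tail -f hive_query.log      # after reconnecting, follow the progress
#
# Option 2 (screen), typed interactively:
#   screen -S hive              # start a named session, then run hive inside it
#   (detach with Ctrl-a d, or simply lose the connection)
#   screen -r hive              # reattach after logging back in
```

With screen you get the live console back exactly as you left it. With nohup you only have the log file, but nothing extra needs to be installed on the machine.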

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow