Question

I need to run a Java program on many remote machines. I'm using ssh in a loop and calling a remote script that runs the Java program.

As you can imagine, this is used for testing a distributed system on a cluster.

The problem is that the script hangs right after I enter the password for the first ssh session. It's probably a bash error, as the Java program runs fine locally.

The exact structure is this: a local bash script runs many remote bash scripts. Each remote script compiles and runs a Java program. This Java program starts a separate thread to do some work. When a SIGINT signal is received, the Java thread is informed so it can exit cleanly.

I made a simplified working example.

EDIT: the code below now works (fixed for posterity)

Please, if you want to answer, don't change the structure of the code too much, or it won't resemble the original one and I won't be able to understand what's wrong.

Bash script that's run by hand

#!/bin/bash

function startBatch()
{
    #the problem was using -n
    ssh -f "$1" "cd $projectDir;./startBatch.sh $2"
}

function stopBatch()
{
    #the problem was using -n
    ssh -f "$1" "pkill -f jnode_.*"
}

projectDir=NetBeansProjects/Runner

#start nodes
nodeNumber=0
while read node; do
    startBatch "$node" "$nodeNumber"
    nodeNumber=$(($nodeNumber + 1))
done < ./nodes.txt

sleep 3

#stop nodes
while read node; do
    stopBatch "$node"
done < ./nodes.txt

Bash script that is run by the other script

#!/bin/bash

#this is a simplified working example
myNumber=$1
$(exec -a jnode_"$myNumber" java -cp build/classes runner.Runner "$myNumber.txt")

Here's a less simplified version of the above script. Check the second part of the accepted answer if you want proper logging.

#!/bin/bash

batchNumber=$1
procNumber=0
batchSize=3
while [ "$procNumber" -lt "$batchSize" ]; do
    procName="$batchNumber"_"$procNumber"
    #this line was no good
    #$(exec -a jnode_"$procName" java -cp build/classes runner.Runner "$procName.txt" &)
    #this line works fine
    exec -a jnode_"$procName" java -cp build/classes runner.Runner "$procName.txt" 1>/dev/null 2>/dev/null &
    procNumber=$(($procNumber + 1))
done

Java Runner (the thing that starts the thread)

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;

public class Runner {

    public static void main(String[] args) throws FileNotFoundException, InterruptedException {
        //redirect all outputs to a given file
        PrintStream output = new PrintStream(new File(args[0]));
        System.setOut(output);
        System.setErr(output);

        //controlled object
        final MyRunnable myRunnable = new MyRunnable();

        //shutdown the controlled process on command
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                myRunnable.stop = true;
            }
        });

        //run the process
        new Thread(myRunnable).start();
    }
}

Java MyRunnable (the running thread)

public class MyRunnable implements Runnable {

    public boolean stop = false;

    @Override
    public void run() {
        while (!stop) {
            try {
                System.out.println("running");
                Thread.sleep(1000);
            } catch (InterruptedException ex) {
                System.out.println("interrupted");
            }
        }
        System.out.println("stopping");
    }
}

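Do not use System.exit() in your Java program, or the shutdown hook may not be properly called (or completely executed). Stop the program with a signal sent from outside instead, such as SIGINT; pkill, as used in stopBatch above, sends SIGTERM by default, which also runs the shutdown hook.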


As mentioned in the comments, entering passwords over and over is tedious. Password-less RSA keys are an option, but we can do better. Let's add some security features.

Create the public/private key pair

ssh-keygen -t rsa
Enter file in which to save the key (/home/your_user/.ssh/id_rsa): [input ~/.ssh/nameOfKey]
Enter passphrase (empty for no passphrase): [input a passphrase not weaker than your ssh password]

Add the public key to the authorized_keys file of the remote hosts, so it can be authenticated.

#first option (use proper command)
ssh-copy-id -i ~/.ssh/nameOfKey.pub user@123.45.67.89

#second option (append the key at the end of the file)
cat ~/.ssh/nameOfKey.pub | ssh user@123.45.67.89 "cat >> ~/.ssh/authorized_keys"

Now, if we use ssh-agent, each passphrase is asked for only once (when the key is added to the agent). Note that it asks for the passphrases (the ones entered when creating the keys), not for the actual ssh passwords.

#activate the agent
eval `ssh-agent`

#add the key, its passphrase will be asked
ssh-add ~/.ssh/keyName1

#add more keys, if needed
ssh-add ~/.ssh/keyName2

You now have a very simple yet functional testing framework for your distributed system. Have fun.


Solution

When executing a remote command, SSH won't exit until that command is complete. Your remote script won't exit until the Java program is complete, a Java program won't exit until all its non-daemon threads exit, and your Java program runs forever. Therefore, the ssh invocation in your local script runs forever (well, until you kill it through some other means) and your script hangs.

You need to decide on a way to make your SSH remote command return immediately. You have options. The easiest is probably to append & to the remote command in the local script:

ssh -n "$1" "cd $projectDir;./startBatch.sh $2 &"

A more robust option is to invoke java with & inside the remote script and leave the ssh call in the local script as you have it now (no &). That way you have a chance to read everything the remote script prints, e.g. its error messages.
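Backgrounding alone may not be enough: a session generally stays open while a background process still holds its output streams, which matches the question's finding that the line redirecting to /dev/null was the one that worked. The effect can be reproduced locally without ssh. This is a minimal sketch, assuming a command substitution is an acceptable stand-in for the ssh session and `sleep 2` for the long-running Java program:

```shell
#!/bin/bash
# Stand-ins (assumptions for illustration): a command substitution plays the
# role of the ssh session, and "sleep 2" plays the long-running Java program.

# Case 1: the background child inherits stdout, so the reader has to wait
# until the child exits and the stream is finally closed.
start=$SECONDS
attached=$(bash -c 'sleep 2 & echo started')
attached_secs=$((SECONDS - start))

# Case 2: the child's streams go to /dev/null, so the reader sees end-of-file
# as soon as the launching script itself exits.
start=$SECONDS
detached=$(bash -c 'sleep 2 >/dev/null 2>&1 & echo started')
detached_secs=$((SECONDS - start))

echo "inherited streams:  '$attached' after about ${attached_secs}s"
echo "redirected streams: '$detached' after about ${detached_secs}s"
```

The first case should take roughly as long as the background process lives, the second should return almost at once.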

Side Note: You will ultimately have to deal with the password itself once you get past the current hurdle. As mentioned in my comment on the question, one possibility is to create a passwordless key (ssh-keygen -t rsa) on your machine and add its public key to ~/.ssh/authorized_keys (or authorized_keys2 on older setups) on each of the remote machines. Then you won't be asked for passwords when connecting from your machine, which matters because the SSH password prompt tends to wreak havoc on scripts. This comes with associated security pitfalls, but they may not matter for your situation.


Responding to comments below. You have a couple of options. If you want to capture everything to the same log file, with append, don't redirect your program outputs, and just redirect everything the while loop does to a log, e.g.:

while [ "$procNumber" -lt "$batchSize" ]; do
    procName="$batchNumber"_"$procNumber"
    exec -a jnode_"$procName" java -cp build/classes runner.Runner "$procName.txt" &
    procNumber=$(($procNumber + 1))
done >> "$myLog" 2>&1

If you want one log per process, with append:

while [ "$procNumber" -lt "$batchSize" ]; do
    procName="$batchNumber"_"$procNumber"
    exec -a jnode_"$procName" java -cp build/classes runner.Runner "$procName.txt" >> "$myLog.$procNumber" 2>&1 &
    procNumber=$(($procNumber + 1))
done

You could also combine the above two, if you want to separate application output from the output of other commands in the loop.
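A combined version might look like the sketch below. The values of batchNumber, batchSize and myLog are hypothetical, and a small bash command stands in for the java invocation so the example can run anywhere:

```shell
#!/bin/bash
# Hypothetical values for illustration; in the real script batchNumber comes
# from $1 and myLog is whatever log path you chose.
batchNumber=0
batchSize=3
myLog=$(mktemp)

procNumber=0
while [ "$procNumber" -lt "$batchSize" ]; do
    procName="${batchNumber}_${procNumber}"
    # Output of the loop's own commands ends up in the shared log.
    echo "starting jnode_$procName"
    # Stand-in for the java command: its output goes to a per-process log.
    exec -a jnode_"$procName" bash -c 'echo "app output from $1"' _ "$procName" >> "$myLog.$procNumber" 2>&1 &
    procNumber=$((procNumber + 1))
done >> "$myLog" 2>&1

wait    # only needed here so the demo can inspect the logs afterwards
cat "$myLog" "$myLog".*
```

The redirection on the individual command takes precedence over the one on the loop, so application output lands in the per-process files and everything else in the shared log.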

Other tips

The man page for ssh suggests that using -n will not work if ssh needs to ask for a password. You should be using -f, or set up passwordless ssh so you don't need to enter the passwords.

Quoting from the Mac OS X man page for ssh gives:

 -n      Redirects stdin from /dev/null (actually, prevents reading from stdin).  This must be used when ssh is run in the background.  A common trick is to use this to run X11
         programs on a remote machine.  For example, ssh -n shadows.cs.hut.fi emacs & will start an emacs on shadows.cs.hut.fi, and the X11 connection will be automatically
         forwarded over an encrypted channel.  The ssh program will be put in the background.  (This does not work if ssh needs to ask for a password or passphrase; see also the -f
         option.)

And also:

 -f      Requests ssh to go to background just before command execution.  This is useful if ssh is going to ask for passwords or passphrases, but the user wants it in the
         background.  This implies -n.  The recommended way to start X11 programs at a remote site is with something like ssh -f host xterm.

         If the ExitOnForwardFailure configuration option is set to ``yes'', then a client started with -f will wait for all remote port forwards to be successfully established
         before placing itself in the background.
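
The stdin redirection that -n (and -f, which implies it) performs also matters for the while read loops in the question: a command that reads stdin inside such a loop consumes the rest of nodes.txt, so only the first node would be processed. A minimal local sketch, using cat as a hypothetical stand-in for an ssh call and a temporary file in place of nodes.txt:

```shell
#!/bin/bash
# Temporary stand-in for nodes.txt with three hypothetical node names.
nodes=$(mktemp)
printf 'node1\nnode2\nnode3\n' > "$nodes"

# "cat" stands in for ssh without -n: it inherits the loop's stdin and
# swallows the remaining lines of the file.
greedy=0
while read -r node; do
    cat > /dev/null
    greedy=$((greedy + 1))
done < "$nodes"

# Redirecting stdin from /dev/null, which is what -n does, leaves the
# remaining lines for the next read.
polite=0
while read -r node; do
    cat > /dev/null < /dev/null
    polite=$((polite + 1))
done < "$nodes"

echo "stdin inherited:      $greedy iteration(s)"
echo "stdin from /dev/null: $polite iteration(s)"
rm -f "$nodes"
```

With inherited stdin the loop runs once; with stdin taken from /dev/null it runs once per node.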
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow