Question

I have two versions of a program with the same purpose: to calculate how many prime numbers there are between 0 and n.

The first version uses concurrency: a Callable class "does the math", and the results are retrieved through an array of Futures. There are as many threads created as there are processors in my computer (4).

The second version is implemented via RMI. All four servers are registered on the local host. The servers work in parallel as well, obviously.

I would expect the second version to be slower than the first, because I'd guess the network introduces latency, while the other version just runs the program concurrently.

However, the RMI version is around twice as fast as the parallel version... Why is this happening?!

I didn't paste any code because it'd be huge, but ask for it in case you need it and I'll see what I can do...

EDIT: adding the code. I added comments in the sections where unnecessary code was omitted.

Parallel version

public class taskPrimes implements Callable<Long>
{
  private final long x;
  private final long y;
  private long total = 0;

  public taskPrimes(long x, long y)
  {
    this.x = x;
    this.y = y;
  }

  public static boolean isPrime(long n)
  {
    if (n<=1) return false ;
    for (long i=2; i<=Math.sqrt(n); i++)
      if (n%i == 0) return false;
    return true;
  } 

  public Long call()
  {
    for (long i=x; i<=y; i++)   // the fields are x and y, not linf/lsup
      if (isPrime(i)) total++;
    return total;
  }
}

public class paralelPrimes 
{
  public static void main(String[] args) throws Exception 
  {
    // here some variables...
    int nTasks = Runtime.getRuntime().availableProcessors();

    ArrayList<Future<Long>> partial = new ArrayList<Future<Long>>(); 
    ExecutorService ept = Executors.newFixedThreadPool(nTasks); // ThreadPoolExecutor has no no-arg constructor
    for(int i=0; i<nTasks; i++)
    {
      partial.add(ept.submit(new taskPrimes(x, y))); // x and y are the limits of the range
      // sliding window here
    }  
    for(Future<Long> iterator:partial)
      try { total +=  iterator.get(); } catch (Exception e) {}
  }   
}

RMI version

Server

public class serverPrimes 
    extends UnicastRemoteObject 
        implements interfacePrimes
{
    public serverPrimes() throws RemoteException {}

    @Override
    public int primes(int x, int y) throws RemoteException
    {
        int total = 0;
        for(int i=x; i<=y; i++)
            if(isPrime(i)) total++;
        return total;
    }

    @Override
    public boolean isPrime(int n) throws RemoteException
    {
        if (n<=1) return false;
        for (int i=2; i<=Math.sqrt(n); i++)
            if (n%i == 0) return false ;
        return true;
    }

    public static void main(String[] args) throws Exception 
    {
        LocateRegistry.createRegistry(1099); // a registry must be running or the binds below fail
        interfacePrimes RemoteObject1 = new serverPrimes();
        interfacePrimes RemoteObject2 = new serverPrimes();
        interfacePrimes RemoteObject3 = new serverPrimes();
        interfacePrimes RemoteObject4 = new serverPrimes();

        Naming.bind("Server1", RemoteObject1);
        Naming.bind("Server2", RemoteObject2);
        Naming.bind("Server3", RemoteObject3);
        Naming.bind("Server4", RemoteObject4);
    }
}

Client

public class clientPrimes implements Runnable
{
    private int x;
    private int y;
    private interfacePrimes RemoteObjectReference;
    private static AtomicInteger total = new AtomicInteger();

    public clientPrimes(int x, int y, interfacePrimes RemoteObjectReference)
    {
        this.x = x;
        this.y = y;
        this.RemoteObjectReference = RemoteObjectReference;
    }

    @Override
    public void run()
    {
        try
        {
            total.addAndGet(RemoteObjectReference.primes(x, y));
        }
        catch (RemoteException e) {}
    }

    public static void main(String[] args) throws Exception
    {
        // some variables here...
        int nServers = 4;
        ExecutorService e = Executors.newFixedThreadPool(nServers);

        long t = System.nanoTime(); // nanoTime() returns a long
        for (int i=1; i<=nServers; i++)
        {
            e.submit(new clientPrimes(xVentana, yVentana, (interfacePrimes)Naming.lookup("//localhost/Server"+i)));
            // sliding window here
        }
        e.shutdown();
        e.awaitTermination(1, TimeUnit.MINUTES); // avoids busy-waiting on isTerminated()
        t = System.nanoTime()-t;
    }
}

Solution

One interesting thing to consider is that, by default, the JVM runs in client mode. This means that threads won't be spread over the cores in the most aggressive way. Running the program with the -server option can influence the result, although, as mentioned, the algorithm design is crucial and the concurrent version may have bottlenecks. Given the problem, there is little chance of a bottleneck in your algorithm, but it still needs to be considered.

The RMI version truly runs in parallel because each object runs in its own server. Since this tends to be a processing problem more than a communication problem, latency plays an unimportant part.

[UPDATE]

Now that I've seen your code, let's get into some more detail.

You are relying on the ThreadPoolExecutor and Future to perform thread control and synchronization for you. This means (per the documentation) that your runnable objects are allocated onto existing threads, and once an object finishes its computation the thread is returned to the pool; the Future, on the other hand, periodically checks whether the computation has finished so it can collect the value.

This scenario would best fit a computation that is performed periodically, where the thread pool can improve performance by keeping threads pre-allocated (paying the thread-creation overhead only the first time, when the threads don't exist yet).
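To illustrate the pre-allocation point, here is a minimal sketch (the class name and the numbers are mine, not from the question) that pre-starts the pool's core threads so later submissions don't pay the thread-creation cost:

```java
import java.util.concurrent.*;

// Minimal sketch: pre-start the pool's core threads so that later
// submissions land on already-created threads. prestartAllCoreThreads()
// is part of the standard ThreadPoolExecutor API.
public class PrewarmedPool
{
    public static void main(String[] args) throws Exception
    {
        ThreadPoolExecutor pool =
            (ThreadPoolExecutor) Executors.newFixedThreadPool(4);

        int started = pool.prestartAllCoreThreads(); // creates all 4 threads now
        System.out.println("pre-started threads: " + started);

        Future<Integer> f = pool.submit(() -> 2 + 2); // runs on an existing thread
        System.out.println(f.get());                  // prints 4
        pool.shutdown();
    }
}
```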

Your implementation is correct, but it is centered more on programmer convenience (there is nothing wrong with that; I am always defending this point of view) than on system performance.

The RMI version performs differently due (mainly) to two things:

1 - You said you are running on the same machine. Most OSes recognize localhost, 127.0.0.1, or even the machine's own real IP address as a self-address and apply optimizations to the communication, so there is little network overhead here.

2 - The RMI system creates a separate thread for each server object you created (as I mentioned before), and these servers start computing as soon as they are called.

Things you should experiment with:

1 - Try running your RMI version on a real network; if you can configure it at 10 Mbps, even better for seeing the communication overhead (although, since it is a one-shot communication, the impact may be too small to notice; you could change your client application to call the calculation multiple times, and then you will see the latency along the way).

2 - Try changing your parallel implementation to use Threads directly, with no Future (you could use Thread.join to monitor the end of execution), and then use the -server option on the machine (although sometimes the JVM checks whether the machine configuration truly qualifies as a server and declines to switch to that profile). The main problem is that if your threads don't get to use all the computer's cores, you won't see any performance improvement. Also try performing the calculations many times, to overcome the overhead of thread creation.
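Here is a minimal sketch of point 2: plain Threads monitored with join(), no Future. The isPrime test mirrors the one in the question; the class name and the way I split the range are illustrative, not from the original code:

```java
// Sketch of suggestion 2: raw Threads plus Thread.join(), no Future.
// Each thread writes its partial count into its own array slot, so no
// locking is needed; join() makes those writes visible to the main thread.
public class RawThreadPrimes
{
    static boolean isPrime(long n)
    {
        if (n <= 1) return false;
        for (long i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static long countPrimes(long n, int nThreads) throws InterruptedException
    {
        Thread[] workers = new Thread[nThreads];
        long[] partial = new long[nThreads]; // one slot per thread

        long chunk = n / nThreads;
        for (int t = 0; t < nThreads; t++)
        {
            final int slot = t;
            final long from = t * chunk;
            final long to = (t == nThreads - 1) ? n : from + chunk - 1;
            workers[t] = new Thread(() -> {
                long count = 0;
                for (long i = from; i <= to; i++)
                    if (isPrime(i)) count++;
                partial[slot] = count;
            });
            workers[t].start();
        }

        long total = 0;
        for (int t = 0; t < nThreads; t++)
        {
            workers[t].join(); // monitor execution end, as suggested above
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException
    {
        System.out.println(countPrimes(100_000, 4)); // prints 9592
    }
}
```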

Hope that helps to elucidate the situation :)

Cheers

Other suggestions

It depends on how your algorithms are designed for the parallel and concurrent solutions. There is no criterion saying that parallel must be better than concurrent or vice versa. For example, if your concurrent solution has many synchronized blocks, that can drop your performance; on the other hand, if the communication in your parallel algorithm is minimal, then there is no network overhead.

If you can get a copy of the book by Peter Pacheco, it can clarify some ideas: http://www.cs.usfca.edu/~peter/ipp/

Given the details you provided, it will mostly depend on how large a range you're using, and how efficiently you distribute the work to the servers.

For instance, I'll bet that for a small range N you will probably see no speedup from distributing via RMI. In this case, the RMI (network) overhead will likely outweigh the benefit of distributing over multiple servers. As N becomes large, and with an efficient distribution algorithm, this overhead becomes more and more negligible compared to the actual computation time.

For example, assuming homogeneous servers, a relatively efficient distribution could be to tell each server to compute the primes for all the numbers n such that n % P = i, where n <= N, P is the number of servers, i is an index in the range [0, P-1] assigned to each server, and % is the modulo operation.
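As a sketch of that distribution (the names are mine, and a sequential loop stands in for the P real servers), server i scans the numbers congruent to i modulo P, so each server gets a similar mix of cheap small numbers and expensive large ones:

```java
// Sketch of the modulo distribution: server i handles every n <= N
// with n % P == i. A plain loop stands in for the P distributed servers.
public class ModuloSplit
{
    static boolean isPrime(int n)
    {
        if (n <= 1) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // The share of work assigned to server i: a strided scan of [0, N].
    static int countForServer(int N, int P, int i)
    {
        int count = 0;
        for (int n = i; n <= N; n += P)
            if (isPrime(n)) count++;
        return count;
    }

    public static void main(String[] args)
    {
        int N = 1000, P = 4, total = 0;
        for (int i = 0; i < P; i++)
            total += countForServer(N, P, i); // sum the per-server counts
        System.out.println(total);            // prints 168, the primes up to 1000
    }
}
```

Because the stride interleaves the work, no server ends up with only the large, slow-to-test numbers, which keeps the load roughly balanced.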

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow