Question

I am modelling and solving a linear program (LP) using single-threaded CPLEX with Java (under Linux). My aim is to solve multiple small LPs in parallel threads, ideally with each core independently solving one LP.

The problem is that solving two or more LPs in parallel is a lot slower than solving a single LP. In a very simple test, I concurrently started multiple identical processes that would solve the same LP. The difference in runtimes between starting a single process and starting multiple ones is huge:

  • 1 process: 180 s
  • 2 processes: 225 s
  • 3 processes: 280 s

Likewise, starting multiple threads from the same process to concurrently solve multiple LPs was a lot slower than solving a single LP.

I suspected memory access could be the bottleneck, but testing a piece of code that would often read and write to memory resulted in comparable runtimes:

  • 1 process: 87 s
  • 2 processes: 85 s
  • 3 processes 88 s

Any idea what could be causing the slowness?

The machine I am running this has 6 cores and plenty of memory to avoid any swapping. The IBM ILOG Cplex library is version 12.5.

Was it helpful?

Solution

If you have multiple threads enabled, CPLEX will use the concurrent optimizer. That means it runs the dual-simplex in one thread and the barrier algorithm in the remaining other threads. CPLEX reports the answer of the algorithm that finishes first. So if you have 6 cores, then it it likely using 1 thread for dual simplex and 5 threads for barrier. When you try to solve 3 LPs simultaneously, they are stealing cycles from each other. You can use just the dual simplex (which always uses just 1 thread) by setting the parameter RootAlg to 2. You could also set the parameter AuxRootThreads to limit the number of threads used by each LP solve. You should check which of the algorithms (Primal, Dual, or Barrier) work better on your problem. If Barrier, which can take advantage of multiple threads, works the best, you might be better off running each model serially and letting CPLEX do the parallelization.

OTHER TIPS

First, let me highlight that the comment by T.J.Crowder is right. Generally speaking, the real speed-up could not be linear in a multi-thread approach. In fact, consider that: 1) there are other processes that runs on the machine, not only yours; 2) the CPU architecture is very important, as well as other hardware (mother board, RAM, etc.), then these may affect the performance of your algorithm.

I suggest to play with the several options included in CPLEX, e.g. with parallel mode switch.

Finally, another important consideration is about your question: you are presuming that your implementation is good, but there is no evidence about that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top