Question

I would like to know the optimal number of threads I can run. Normally, this equals Runtime.getRuntime().availableProcessors().

However, the returned number is twice as high on a CPU that supports hyper-threading. For some tasks hyper-threading helps, but for others it does nothing. In my case, I suspect it does nothing, so I wish to know whether I have to divide the number returned by Runtime.getRuntime().availableProcessors() by two.

For that I have to determine whether the CPU uses hyper-threading. Hence my question: how can I do it in Java?

Thanks.

EDIT

OK, I have benchmarked my code. Here is my environment:

  • Lenovo ThinkPad W510 (i.e. i7 CPU with 4 cores and hyperthreading), 16G of RAM
  • Windows 7
  • 84 zipped CSV files with zipped sizes ranging from 16M to 105M
  • All the files are read one by one in the main thread - no multithreaded access to the HD.
  • Each CSV file row contains some data, which is parsed and a fast context-free test determines whether the row is relevant.
  • Each relevant row contains two doubles (representing longitude and latitude, for the curious), which are coerced into a single Long, which is then stored in a shared hash set.

Thus the worker threads do not read anything from the HD, but they do occupy themselves with unzipping and parsing the contents (using the opencsv library).

Below is the code, w/o the boring details:

public void work(File dir) throws IOException, InterruptedException {
  Set<Long> allCoordinates = Collections.newSetFromMap(new ConcurrentHashMap<Long, Boolean>());
  int n = 6;
  // NO WAITING QUEUE !
  ThreadPoolExecutor exec = new ThreadPoolExecutor(n, n, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
  StopWatch sw1 = new StopWatch();
  StopWatch sw2 = new StopWatch();
  sw1.start();
  sw2.start();
  sw2.suspend();
  for (WorkItem wi : m_workItems) {
    for (File file : dir.listFiles(wi.fileNameFilter)) {
      MyTask task;
      try {
        sw2.resume();
        // The only reading from the HD occurs here:
        task = new MyTask(file, m_coordinateCollector, allCoordinates, wi.headerClass, wi.rowClass);
        sw2.suspend();
      } catch (IOException exc) {
        System.err.println(String.format("Failed to read %s - %s", file.getName(), exc.getMessage()));
        continue;
      }
      boolean retry = true;
      while (retry) {
        int count = exec.getActiveCount();
        try {
          // Fails if the maximum of the worker threads was created and all are busy.
          // This prevents us from loading all the files in memory and getting the OOM exception.
          exec.submit(task);
          retry = false;
        } catch (RejectedExecutionException exc) {
          // Wait for any worker thread to finish
          while (exec.getActiveCount() == count) {
            Thread.sleep(100);
          }
        }
      }
    }
  }
  exec.shutdown();
  exec.awaitTermination(1, TimeUnit.HOURS);
  sw1.stop();
  sw2.stop();
  System.out.println(String.format("Max concurrent threads = %d", n));
  System.out.println(String.format("Total file count = %d", m_stats.getFileCount()));
  System.out.println(String.format("Total lines = %d", m_stats.getTotalLineCount()));
  System.out.println(String.format("Total good lines = %d", m_stats.getGoodLineCount()));
  System.out.println(String.format("Total coordinates = %d", allCoordinates.size()));
  System.out.println(String.format("Overall elapsed time = %d sec, excluding I/O = %d sec", sw1.getTime() / 1000, (sw1.getTime() - sw2.getTime()) / 1000));
}

public class MyTask<H extends CsvFileHeader, R extends CsvFileRow<H>> implements Runnable {
  private final byte[] m_buffer;
  private final String m_name;
  private final CoordinateCollector m_coordinateCollector;
  private final Set<Long> m_allCoordinates;
  private final Class<H> m_headerClass;
  private final Class<R> m_rowClass;

  public MyTask(File file, CoordinateCollector coordinateCollector, Set<Long> allCoordinates,
                Class<H> headerClass, Class<R> rowClass) throws IOException {
    m_coordinateCollector = coordinateCollector;
    m_allCoordinates = allCoordinates;
    m_headerClass = headerClass;
    m_rowClass = rowClass;
    m_name = file.getName();
    m_buffer = Files.toByteArray(file);
  }

  @Override
  public void run() {
    try {
      m_coordinateCollector.collect(m_name, m_buffer, m_allCoordinates, m_headerClass, m_rowClass);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

Please find the results below (I have slightly changed the output to omit the repeating parts):

Max concurrent threads = 4
Total file count = 84
Total lines = 56395333
Total good lines = 35119231
Total coordinates = 987045
Overall elapsed time = 274 sec, excluding I/O = 266 sec

Max concurrent threads = 6
Overall elapsed time = 218 sec, excluding I/O = 209 sec

Max concurrent threads = 7
Overall elapsed time = 209 sec, excluding I/O = 199 sec

Max concurrent threads = 8
Overall elapsed time = 201 sec, excluding I/O = 192 sec

Max concurrent threads = 9
Overall elapsed time = 198 sec, excluding I/O = 186 sec

You are free to draw your own conclusions, but mine is that hyperthreading does improve the performance in my concrete case. Also, having 6 worker threads seems to be the right choice for this task and my machine.

Solution

For Windows, if the number of logical processors is higher than the number of physical cores, you have hyper-threading enabled.

You can use wmic to find this information:

C:\WINDOWS\system32>wmic CPU Get NumberOfCores,NumberOfLogicalProcessors /Format:List


NumberOfCores=4
NumberOfLogicalProcessors=8

Hence, my system has hyper-threading: the number of logical processors is double the number of cores.

But you may not even need to know. Runtime.getRuntime().availableProcessors() already returns the number of logical processors.

A full example of getting the physical core count (Windows only):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class PhysicalCores
{
    public static void main(String[] arguments) throws IOException, InterruptedException
    {
        int physicalNumberOfCores = getPhysicalNumberOfCores();
        System.out.println(physicalNumberOfCores);
    }

    private static int getPhysicalNumberOfCores() throws IOException, InterruptedException
    {
        ProcessBuilder processBuilder = new ProcessBuilder("wmic", "CPU", "Get", "NumberOfCores");
        processBuilder.redirectErrorStream(true);
        Process process = processBuilder.start();
        String processOutput = getProcessOutput(process);
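        // wmic pads its output with blank lines and spaces and prints one value per
        // CPU socket, so sum every numeric line instead of relying on a fixed index.
        int cores = 0;
        for (String line : processOutput.split(System.lineSeparator()))
        {
            if (line.trim().matches("\\d+"))
            {
                cores += Integer.parseInt(line.trim());
            }
        }
        return cores;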
    }

    private static String getProcessOutput(Process process) throws IOException, InterruptedException
    {
        StringBuilder processOutput = new StringBuilder();

        try (BufferedReader processOutputReader = new BufferedReader(
                new InputStreamReader(process.getInputStream())))
        {
            String readLine;

            while ((readLine = processOutputReader.readLine()) != null)
            {
                processOutput.append(readLine);
                processOutput.append(System.lineSeparator());
            }

            process.waitFor();
        }

        return processOutput.toString().trim();
    }
}

Other tips

Unfortunately, this is not possible from Java. If you know that the app will run on a modern Linux variant, you can read the file /proc/cpuinfo and infer whether HT is enabled.

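The following command counts the physical CPU packages (sockets); comparing the "siblings" and "cpu cores" fields of the same file then tells you whether HT is enabled: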

grep -i "physical id" /proc/cpuinfo | sort -u | wc -l
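
The same file can also be read from Java. Below is a minimal sketch, assuming the x86 Linux layout of /proc/cpuinfo where each processor entry has "siblings" (logical CPUs per package) and "cpu cores" (physical cores per package) fields. The class and method names are hypothetical, and on platforms where these fields are missing the method simply returns false.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CpuInfoHyperThreading {

    /**
     * Returns true if the given /proc/cpuinfo text reports more siblings
     * (logical CPUs per package) than cpu cores (physical cores per package).
     * Returns false if either field is missing.
     */
    public static boolean looksHyperThreaded(String cpuInfo) {
        int siblings = -1;
        int cores = -1;
        for (String line : cpuInfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator line between processor entries
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (key.equals("siblings")) {
                siblings = Integer.parseInt(value);
            } else if (key.equals("cpu cores")) {
                cores = Integer.parseInt(value);
            }
            if (siblings > 0 && cores > 0) {
                return siblings > cores;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Only works where /proc/cpuinfo exists (Linux).
        byte[] raw = Files.readAllBytes(Paths.get("/proc/cpuinfo"));
        String text = new String(raw, StandardCharsets.UTF_8);
        System.out.println("Hyper-threading detected: " + looksHyperThreaded(text));
    }
}
```

Keeping the parsing in a method that takes the file content as a string makes it easy to check against sample text without a Linux machine.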

There is no reliable way to determine whether hyper-threading is on, off, or not supported at all.

Instead, a better approach is to run a calibration test the first time you run (or each time) to determine which approach to use.
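
Such a calibration could look roughly like the sketch below. It assumes you can supply a representative piece of your real workload as a Callable; the class name, the method names and the busy-loop sample in main are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadCountCalibration {

    /** Runs `tasks` copies of `work` on `threads` threads; returns elapsed nanoseconds. */
    public static long timeRun(int threads, int tasks, Callable<Long> work) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> batch = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                batch.add(work);
            }
            long start = System.nanoTime();
            for (Future<Long> f : pool.invokeAll(batch)) {
                f.get(); // propagate any failure from the sample task
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    /** Returns the candidate thread count with the shortest measured run. */
    public static int calibrate(int[] candidates, int tasks, Callable<Long> work) throws Exception {
        int best = candidates[0];
        long bestTime = Long.MAX_VALUE;
        for (int threads : candidates) {
            long elapsed = timeRun(threads, tasks, work);
            if (elapsed < bestTime) {
                bestTime = elapsed;
                best = threads;
            }
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        int logical = Runtime.getRuntime().availableProcessors();
        int half = Math.max(1, logical / 2);
        // Stand-in workload: replace with a small slice of the real job.
        Callable<Long> sample = () -> {
            long sum = 0;
            for (long i = 0; i < 20_000_000L; i++) {
                sum += i * i;
            }
            return sum;
        };
        int best = calibrate(new int[] {half, logical}, logical * 4, sample);
        System.out.println("Calibrated thread count = " + best);
    }
}
```

A single short run is noisy (JIT warm-up, other processes), so in practice you would probably repeat each measurement a few times before trusting the result.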

Another approach is to use all the processors even if hyper-threading doesn't help (provided it doesn't make the code dramatically slower).

A few more musings:

  • Hyperthreading may have more than 2 threads per core (Sparc can have 8)
  • Garbage collector needs CPU time to work as well.
  • Hyperthreading may help a concurrent GC - or may not; or the JVM may request to be exclusive (not hyperthreading) owner of the core. So hampering the GC to get your better results during a test could be hurting in the long run.
  • Hyperthreading is usually useful if there are cache-misses, so the CPU is not stalled but switched to another task. Hence, "to hyperthreading or not" would depend both on the workload and the CPU L1/L2 cache size/memory speed, etc.
  • OSes may have a bias towards/against some threads, and Thread.setPriority may not be honored (on Linux it's usually not honored).
  • It's possible to set affinity of the process, disallowing some cores. So knowing that there is hyperthreading won't be of any significant virtue in such cases.

That being said: you should have a setting for the number of worker threads and a recommendation on how to set it given the specifics of the architecture.

There is no way to determine that from pure Java (after all, a logical core is a core, whether it's implemented using HT or not). Beware that the solutions proposed so far can solve your requirement (as you asked), but Intel CPUs are not the only ones that offer a form of hyperthreading (Sparc comes to mind, and I'm sure there are others as well).

You also did not take into account that even if you determine the system uses HT, you will not be able to control a thread's affinity with the cores from Java. So you are still at the mercy of the OS's thread scheduler. While there are plausible scenarios where fewer threads could perform better (because of reduced cache thrashing), there is no way to determine statically how many threads should be used. After all, CPUs do have very different cache sizes: a range from 256KB on the low end to >16MB in servers can be reasonably expected nowadays, and this is bound to change with each new generation.

Simply make it a configurable setting; any attempt to determine this without exactly knowing the target system is futile.
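
One way to make it configurable is a system property with a fallback to the logical processor count. This is only a sketch; the property name worker.threads and the class name are made up for illustration.

```java
public class WorkerThreadsSetting {

    // Hypothetical property name; pass it as -Dworker.threads=6 on the command line.
    static final String PROPERTY = "worker.threads";

    /**
     * Returns the configured thread count if it is a positive integer,
     * otherwise falls back to the given logical processor count.
     */
    public static int resolve(String configured, int logicalProcessors) {
        if (configured != null) {
            try {
                int n = Integer.parseInt(configured.trim());
                if (n > 0) {
                    return n;
                }
            } catch (NumberFormatException ignored) {
                // Not a number: fall through to the default.
            }
        }
        return logicalProcessors;
    }

    public static void main(String[] args) {
        int threads = resolve(System.getProperty(PROPERTY),
                Runtime.getRuntime().availableProcessors());
        System.out.println("Worker threads = " + threads);
    }
}
```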

There is no way to do that. One thing you can do is create a thread pool of Runtime.getRuntime().availableProcessors() threads in your application and use them as and when requests come in.

This way you can have anywhere from 0 to Runtime.getRuntime().availableProcessors() threads.
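
A pool like that might be set up as in the sketch below. The class name, the 60-second idle timeout and the unbounded queue are my assumptions, not part of the answer; with allowCoreThreadTimeOut(true) the pool starts with no threads, grows to at most the given maximum, and shrinks back when idle.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ElasticPool {

    /** Creates a pool that holds between 0 and maxThreads threads. */
    public static ThreadPoolExecutor create(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                60L, TimeUnit.SECONDS,           // assumed idle timeout
                new LinkedBlockingQueue<Runnable>());
        // Let idle threads die so the pool can shrink back to zero.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = create(Runtime.getRuntime().availableProcessors());
        System.out.println("Threads before any work: " + pool.getPoolSize());
        pool.submit(() -> System.out.println("Handled on " + Thread.currentThread().getName())).get();
        System.out.println("Threads after one task: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

Note that the queue here is unbounded, so it does not protect against the out-of-memory concern raised in the question; a bounded queue or a rejection policy would be needed for that.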

You may not be able to query the OS or Runtime reliably, but you could run a quick benchmark.

Progressively increase the number of spin-lock threads, testing whether each new thread iterates as fast as the previous ones. Once the performance of one of the threads drops to around half of what it was in the previous tests (at least for Intel; I don't know about SPARC), you know you have started sharing a core with a hyperthread.
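
A rough sketch of such a probe follows. The names, the measurement window and the slowdown threshold are all assumptions of mine, and the result is sensitive to scheduler noise, turbo boost and other running processes, so treat it as a hint rather than a measurement.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinProbe {

    /** Spins `threads` threads for `millis` ms; returns the lowest per-thread iteration count. */
    public static long minIterations(int threads, long millis) throws InterruptedException {
        final AtomicBoolean stop = new AtomicBoolean(false);
        final long[] counts = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                long n = 0;
                while (!stop.get()) {
                    n++;
                }
                counts[id] = n; // made visible to the caller by join()
            });
            workers[i].start();
        }
        Thread.sleep(millis);
        stop.set(true);
        long min = Long.MAX_VALUE;
        for (int i = 0; i < threads; i++) {
            workers[i].join();
            min = Math.min(min, counts[i]);
        }
        return min;
    }

    /**
     * Adds threads one at a time and returns the largest count for which the
     * slowest thread still reached `threshold` times the single-thread rate.
     */
    public static int estimateFullSpeedThreads(int maxThreads, long millis, double threshold)
            throws InterruptedException {
        long baseline = minIterations(1, millis);
        for (int threads = 2; threads <= maxThreads; threads++) {
            if (minIterations(threads, millis) < baseline * threshold) {
                return threads - 1;
            }
        }
        return maxThreads;
    }

    public static void main(String[] args) throws InterruptedException {
        int logical = Runtime.getRuntime().availableProcessors();
        // 200 ms window and 0.7 threshold are arbitrary starting points.
        System.out.println("Threads before slowdown: "
                + estimateFullSpeedThreads(logical, 200, 0.7));
    }
}
```

The sketch compares the slowest thread rather than the average, because when only one pair of threads shares a core the average barely moves.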

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow