Question

I am aware that the Intel Xeon Phi coprocessor SE10X has 61 cores and that it is recommended to use only 60 of them, since one core is used by the offload daemon. The Intel Xeon Phi coprocessor 5110P has 60 cores. Is it likewise recommended to use only 59?


Solution

I evaluated the performance of my test code on an Intel Xeon Phi 7120P card. I observed that performance was best when the number of threads was a multiple of (number of cores - 1). This is because one of the cores is busy running the Linux micro-OS services.

In general:

Number of threads to create >= K * T * (N - 1)  
K = a positive integer (2 works well)  
T = number of hardware thread contexts per core (4 in my case)  
N = number of cores on the card
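A minimal sketch of the rule of thumb above, written as a small calculator. The function name `recommended_threads` is hypothetical; the core counts are the ones given in the question and answer, and K = 2, T = 4 are the values this answer reports.

```python
def recommended_threads(k: int, t: int, n: int) -> int:
    """Thread count from the rule of thumb K * T * (N - 1).

    k: positive integer multiplier (the answer reports k = 2 working well)
    t: hardware thread contexts per core (4 on these cards)
    n: number of physical cores on the card
    """
    if k < 1 or t < 1 or n < 2:
        raise ValueError("need k >= 1, t >= 1 and at least 2 cores")
    # One core is left out because the micro-OS services run on it.
    return k * t * (n - 1)


# Core counts as stated in the question and answer.
for name, cores in [("SE10X", 61), ("5110P", 60), ("7120P", 61)]:
    print(name, recommended_threads(k=2, t=4, n=cores))
```

With K = 2 this gives 480 threads for the 61-core cards and 472 for the 60-core 5110P, i.e. two threads per hardware context on every core except the reserved one.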

Other Tips

From this MIC-related FAQ:

Sensible Affinities

Under Intel MPSS many of the kernel services and daemons are affinitized to the “Bootstrap Processor” (BSP), which is the last physical core. This is also where the offload daemon runs the services required to support data transfer for offload. It is therefore generally sensible to avoid using this core for user code. (Indeed, as already discussed, the offload system does that automatically by removing the logical CPUs on the last core from the default affinity of offloaded processes).
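To make "removing the logical CPUs on the last core" concrete, here is a sketch that lists which logical CPUs remain for user code. It assumes the Knights Corner numbering in which core c (for all but the last core) owns logical CPUs 1 + T*c through T*c + T, while the last core (the BSP) owns logical CPU 0 plus the T - 1 highest-numbered CPUs. Treat that layout as an assumption and check it against the actual topology of your card; the function names are hypothetical.

```python
def cpus_by_core(n_cores: int, t: int = 4) -> list[list[int]]:
    """Map each physical core to its logical CPU ids (assumed KNC layout)."""
    total = n_cores * t
    # Assumed: core c (c < n_cores - 1) owns CPUs 1 + t*c ... t*c + t.
    cores = [[1 + t * c + j for j in range(t)] for c in range(n_cores - 1)]
    # Assumed: the last core (the BSP) owns logical CPU 0 plus the
    # t - 1 highest-numbered CPUs.
    cores.append([0] + list(range(total - (t - 1), total)))
    return cores


def user_cpus(n_cores: int, t: int = 4) -> list[int]:
    """Logical CPUs that remain after hiding the last (BSP) core."""
    return sorted(cpu for core in cpus_by_core(n_cores, t)[:-1] for cpu in core)


cpus = user_cpus(61)
print(len(cpus), cpus[0], cpus[-1])  # 240 1 240
```

Under this assumed layout, hiding the BSP core on a 61-core card leaves the contiguous range 1 to 240, which is why logical CPU 0 does not appear among the CPUs available to offloaded code.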

From this OpenMP on MIC guide:

Offloaded programs inherit an affinity map that hides the last core, which is dedicated to offload system functions. Native programs can use all the cores, making the calculations required for balancing the threads slightly different.
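The guide notes that the thread-balancing arithmetic differs between offloaded and native programs because they see a different number of cores. As a rough sketch (not the OpenMP runtime's actual placement algorithm), spreading a fixed thread count as evenly as possible over the usable cores shows the effect. The function name is hypothetical.

```python
def balanced_threads_per_core(n_threads: int, usable_cores: int, t: int = 4) -> list[int]:
    """Spread n_threads as evenly as possible over usable_cores cores."""
    if n_threads > usable_cores * t:
        raise ValueError("more threads than hardware contexts")
    base, extra = divmod(n_threads, usable_cores)
    # The first `extra` cores each get one additional thread.
    return [base + (1 if c < extra else 0) for c in range(usable_cores)]


native = balanced_threads_per_core(122, 61)   # all 61 cores visible
offload = balanced_threads_per_core(122, 60)  # last core hidden
print(native[:3], offload[:3])  # [2, 2, 2] [3, 3, 2]
```

On a 61-core card, 122 threads divide evenly when all cores are visible (native), but with the last core hidden (offload) two cores end up with three threads each, so thread counts that are multiples of the usable core count avoid this imbalance.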

Neither source is specific to a particular MIC model; both describe the architecture. So it seems that if you offload to the device and don't use the default affinity, you should indeed avoid the last core yourself.

When you run your workload in offload mode (the application runs on the host CPU and offloads some computation to the Xeon Phi), it is recommended to leave one core for the offload runtime. A COI daemon on the Xeon Phi side runs four service threads to manage offload activity. Keep in mind that one physical core on the Xeon Phi runs four hardware threads. In the native execution model, where the application is started directly on the Xeon Phi card, you can use all available cores, since there is no offload activity.
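Applied to the cards from the question, the offload/native distinction comes down to simple arithmetic. This sketch assumes four hardware threads per core, as stated above; the function name and the mode strings "offload" and "native" are hypothetical labels, not an Intel API.

```python
def usable_hw_threads(n_cores: int, mode: str, t: int = 4) -> int:
    """Hardware threads available to user code in the given execution mode."""
    if mode == "offload":
        # Leave one core for the offload runtime (COI daemon service threads).
        cores = n_cores - 1
    elif mode == "native":
        # No offload activity, so every core can run user threads.
        cores = n_cores
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cores * t


for name, n in [("SE10X", 61), ("5110P", 60)]:
    print(name, usable_hw_threads(n, "offload"), usable_hw_threads(n, "native"))
```

For the 60-core 5110P this gives 236 hardware threads (59 cores) in offload mode and 240 in native mode, matching the "use 59 cores" reading of the question for offload workloads.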

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow