Some newer x86/x86_64 CPUs have the "RDTSCP" variant of the RDTSC instruction:
http://ref.x86asm.net/coder32-abc.html#R
RDTSC EAX EDX IA32_TIM… 0F 31 P1+ f2 Read Time-Stamp Counter
RDTSCP EAX EDX ECX ... 0F 01 F9 7 C7+ f2 Read Time-Stamp Counter and Processor ID
In that table, "C7+" means that the "0x0F01F9" instruction was introduced with some "Core i7" processors.
Opcodes
Hex Mnemonic Encoding Long Mode Legacy Mode Description
0F 01 F9 RDTSCP A Valid Valid Read 64-bit time-stamp counter and 32-bit IA32_TSC_AUX value into EDX:EAX and ECX.
The OS should write the core id into IA32_TSC_AUX (Linux does), and this value can then be read with RDTSCP.
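To make the register usage concrete, here is a minimal sketch that reads both values in one go. It assumes GCC or Clang on x86/x86_64 (the `__rdtscp()` intrinsic from `<x86intrin.h>`) and a CPU that actually supports RDTSCP; the struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang header that declares __rdtscp() */

/* Hypothetical container for one RDTSCP result. */
struct tsc_sample {
    uint64_t tsc;  /* 64-bit time-stamp counter (EDX:EAX) */
    uint32_t aux;  /* raw IA32_TSC_AUX value (ECX), as written by the OS */
};

/* Assumes the CPU supports RDTSCP; without it the instruction faults (#UD). */
static inline struct tsc_sample read_tscp(void)
{
    struct tsc_sample s;
    unsigned int aux;

    s.tsc = __rdtscp(&aux);  /* intrinsic stores ECX through the pointer */
    s.aux = aux;
    return s;
}
```

What `aux` means is entirely up to the OS that filled IA32_TSC_AUX, so it should be treated as opaque unless you know the encoding.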
Linux encodes the NUMA node id (shifted left by 12) and the CPU number (low 12 bits) into TSC_AUX:
    if (cpu_has(&cpu_data(cpu), X86_FEATURE_RDTSCP))
            write_rdtscp_aux((node << 12) | cpu);

    /*
     * Store cpu number in limit so that it can be loaded quickly
     * in user space in vgetcpu. (12 bits for the CPU and 8 bits for the node)
     */
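Going the other way, user space can split the value back into its two fields. This is only a sketch of the `(node << 12) | cpu` layout shown in the kernel snippet; the macro and function names are hypothetical, and the layout is a Linux convention, not something the hardware defines.

```c
#include <stdint.h>

/* Hypothetical names; 12 bits for the CPU number, per the kernel comment. */
#define TSC_AUX_CPU_BITS 12u
#define TSC_AUX_CPU_MASK ((1u << TSC_AUX_CPU_BITS) - 1u)

/* Mirrors the kernel expression (node << 12) | cpu. */
static inline uint32_t tsc_aux_encode(uint32_t node, uint32_t cpu)
{
    return (node << TSC_AUX_CPU_BITS) | cpu;
}

/* Low 12 bits: CPU number. */
static inline uint32_t tsc_aux_cpu(uint32_t aux)
{
    return aux & TSC_AUX_CPU_MASK;
}

/* Bits above the CPU field: NUMA node id. */
static inline uint32_t tsc_aux_node(uint32_t aux)
{
    return aux >> TSC_AUX_CPU_BITS;
}
```

For example, node 3 and CPU 17 encode to `(3 << 12) | 17`, and the two accessors recover 17 and 3 again.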
Linux also provides a vsyscall getcpu ("__vdso_getcpu") that obtains the CPU id via rdtscp (if the CPU has the instruction) or via the GDT (GDT_ENTRY_PER_CPU); see __getcpu in include/asm/vsyscall.h as of 3.13. From the man page:
getcpu() was added in kernel 2.6.19 for x86_64 and i386.
Linux makes a best effort to make this call as fast as possible. The
intention of getcpu() is to allow programs to make optimizations with
per-CPU data or for NUMA optimization.
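If you just want the CPU and node numbers without touching RDTSCP yourself, you can ask the kernel. The sketch below assumes Linux and calls getcpu through the generic `syscall()` wrapper, so it takes the plain syscall path and not the vDSO fast path described above; glibc's `sched_getcpu()` is the more usual front end. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* syscall() */

/*
 * Returns 0 on success and fills *cpu and *node;
 * returns -1 on failure with errno set.
 */
static int current_cpu_and_node(unsigned int *cpu, unsigned int *node)
{
    /* The third argument (tcache) is passed as NULL. */
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

The answer can be stale by the time you look at it, since the scheduler may migrate the thread right after the call returns.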
From an Intel white paper: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf#page=15
3.2 Improvements Using RDTSCP Instruction
The RDTSCP instruction is described in the Intel® 64 and IA-32 Architectures
Software Developer’s Manual Volume 2B ([3]) as an assembly instruction that, at
the same time, reads the timestamp register and the CPU identifier. The value of
the timestamp register is stored into the EDX and EAX registers; the value of the
CPU id is stored into the ECX register (“On processors that support the Intel 64
architecture, the high order 32 bits of each of RAX, RDX, and RCX are cleared”).
What is interesting in this case is the “pseudo” serializing property of RDTSCP. The
manual states:
“The RDTSCP instruction waits until all previous instructions have been executed
before reading the counter. However, subsequent instructions may begin execution
before the read operation is performed.”
This means that this instruction guarantees that everything that is above its call in
the source code is executed before the instruction itself is called. It cannot,
however, guarantee that - for optimization purposes - the CPU will not execute,
before the RDTSCP call, instructions that, in the source code, are placed after the
RDTSCP function call itself. If this happens, a contamination caused by instructions
in the source code that come after the RDTSCP will occur in the code under
measurement.
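One common way to limit that contamination is to bracket the measured code with a serializing CPUID on both sides: CPUID then RDTSC before it, RDTSCP then CPUID after it. The following is a sketch under assumptions (x86_64, GCC/Clang inline assembly, RDTSCP available); the helper names are hypothetical and this is not presented as the only valid sequence.

```c
#include <stdint.h>

/* Start of the measured region: CPUID serializes, then RDTSC reads the counter. */
static inline uint64_t tsc_begin(void)
{
    uint32_t lo = 0, hi;  /* lo doubles as the CPUID leaf (0) on input */

    __asm__ __volatile__(
        "cpuid\n\t"
        "rdtsc\n\t"
        : "+a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/*
 * End of the measured region: RDTSCP waits for earlier instructions,
 * the result is saved, and CPUID keeps later instructions from starting early.
 */
static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__(
        "rdtscp\n\t"
        "mov %%eax, %0\n\t"
        "mov %%edx, %1\n\t"
        "cpuid\n\t"
        : "=r"(lo), "=r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Usage is simply `t0 = tsc_begin(); /* code under test */ t1 = tsc_end();` and then `t1 - t0` is the elapsed tick count, including the overhead of the bracketing instructions themselves. Note that CPUID is expensive under virtualization, so the overhead can be large in a VM.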
A description is also available at http://www.felixcloutier.com/x86/RDTSCP.html, which is a clone of https://github.com/zneak/x86doc
UPDATE: There will be a separate instruction, RDPID, that reads only the IA32_TSC_AUX register, without the timestamp counter that RDTSCP also returns:
https://hjlebbink.github.io/x86doc/html/RDPID.html
Reads the value of the IA32_TSC_AUX MSR (address C0000103H) into the destination register. The value of CS.D and operand-size prefixes (66H and REX.W) do not affect the behavior of the RDPID instruction.
F3 0F C7 /7 RDPID r32 M N.E./V RDPID Read IA32_TSC_AUX into r32.
F3 0F C7 /7 RDPID r64 M V/N.E. RDPID Read IA32_TSC_AUX into r64.
It will be available starting with the "Ice Lake" microarchitecture (2018), as stated in https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf (319433-030, October 2017).
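Since neither instruction is present on every CPU, it is worth checking CPUID before using them. A minimal sketch, assuming GCC or Clang with `<cpuid.h>` (a reasonably recent version, for `__get_cpuid_count()`); the function names are hypothetical. RDPID is reported in CPUID.(EAX=07H, ECX=0):ECX bit 22 and RDTSCP in CPUID.80000001H:EDX bit 27.

```c
#include <cpuid.h>    /* GCC/Clang helpers for the CPUID instruction */
#include <stdbool.h>

/* CPUID.(EAX=07H, ECX=0):ECX bit 22 reports RDPID support. */
static bool cpu_has_rdpid(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;  /* leaf 7 not available */
    return (ecx >> 22) & 1u;
}

/* CPUID.80000001H:EDX bit 27 reports RDTSCP support. */
static bool cpu_has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;  /* extended leaf not available */
    return (edx >> 27) & 1u;
}
```

A caller would typically prefer RDPID when `cpu_has_rdpid()` is true, fall back to RDTSCP when only `cpu_has_rdtscp()` is true, and otherwise use getcpu.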