Let me first explain how the spinlock code works. We have the variables

uint16_t inc = 0x0100;
lock->slock; // I'll just call this "slock"

In the assembler code, inc is referred to as %0 and slock as %1. Moreover, %b0 denotes the lower 8 bits of inc, i.e. inc % 0x100, and %h0 the upper 8 bits, i.e. inc / 0x100.
Now:
lock xaddw %w0, %1 ;; "inc := slock" and "slock := inc + slock"
;; simultaneously (atomic exchange and increment)
1:
cmpb %h0, %b0 ;; "if (inc / 256 == inc % 256)"
je 2f ;; " goto 2;"
rep ; nop ;; "yield();"
movb %1, %b0 ;; "inc = slock;"
jmp 1b ;; "goto 1;"
2:
Comparing the upper and lower byte of inc succeeds if the two bytes are equal. Since inc holds the original value of the lock, this happens if the lock was unlocked (the owner byte matched the next-ticket byte). In that case, the next-ticket byte of slock has already been incremented by the atomic exchange-and-add, so the two bytes of slock now differ and the lock reads as taken.
Otherwise, i.e. if the lock was already held, we pause a little, reload the lower byte of inc from the lock's current owner byte, and try again until the owner catches up with our ticket.
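The logic just described can be sketched in plain C (a rough equivalent using GCC's __atomic builtins; spinlock_t and spin_lock are hypothetical names, not the kernel's actual code):

```c
#include <stdint.h>

typedef struct { uint16_t slock; } spinlock_t;   /* hypothetical */

static void spin_lock(spinlock_t *lock)
{
    /* "lock xaddw": atomically fetch the old value and add 0x0100 */
    uint16_t inc = __atomic_fetch_add(&lock->slock, 0x0100, __ATOMIC_ACQUIRE);

    uint8_t ticket = inc / 0x100;   /* %h0: the ticket we drew     */
    uint8_t owner  = inc % 0x100;   /* %b0: who holds the lock now */

    while (ticket != owner) {       /* cmpb %h0, %b0 / je 2f       */
        /* "rep; nop" (pause) would go here */
        owner = __atomic_load_n(&lock->slock, __ATOMIC_RELAXED)
                % 0x100;            /* movb %1, %b0                */
    }
}
```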
(I believe there's actually a possibility of an overflow if 2^8 = 256 threads simultaneously attempt to get the spinlock. In that case, slock is updated to 0x0100, 0x0200, ..., 0xFF00, 0x0000, and would then appear to be unlocked. Maybe that's why the second version of the code uses a 16-bit wide ticket counter, which would require 2^16 simultaneous attempts.)
Now let's insert a counter:
uint32_t spincounter = 0;
asm volatile( /* code below */
    : "+Q" (inc), "+m" (lock->slock), "+r" (spincounter)
    : /* no inputs */
    : "memory", "cc");
Note that spincounter has to be a read-write output operand ("+r"), not an input, since we modify it; "=r" in the input list would be invalid. Now spincounter may be referred to as %2. We just need to increment the counter on each pass through the loop:
1:
inc %2
cmpb %h0, %b0
;; etc etc
I haven't tested this, but that's the general idea.