My spinlock has a busy-wait loop that spins while the lock cannot be acquired:
while (!try_lock())
{
    // Use _mm_pause() or _tpause() here?
}
I noticed I don’t have _mm_pause() inside the loop. As I understand it, omitting it can degrade performance because of memory barriers/fences/ordering?
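For reference, here is a minimal sketch of what the loop might look like with _mm_pause() added. The SpinLock class, its std::atomic<bool> flag, and the test-and-test-and-set try_lock() are assumptions for illustration, not my actual implementation. As far as I can tell from Intel's documentation, PAUSE tells the CPU that this is a spin-wait loop, which avoids the memory-order mis-speculation penalty when the loop finally exits.

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>  // _mm_pause

// Hypothetical spinlock, used only to show where _mm_pause() would go.
class SpinLock {
public:
    bool try_lock() noexcept {
        // Relaxed read first, so a contended lock is not hammered with
        // read-modify-write operations; exchange only if it looks free.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() noexcept {
        while (!try_lock()) {
            _mm_pause();  // spin-wait hint to the CPU
        }
    }

    void unlock() noexcept {
        // Debug-only sanity check: the lock must currently be held.
        assert(locked_.load(std::memory_order_relaxed) &&
               "unlock() called on an unlocked SpinLock");
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```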
Before adding _mm_pause(), I discovered _tpause():
https://www.felixcloutier.com/x86/tpause
However, from the Intel Intrinsics Guide, its usage seems slightly more complicated.
I would like to maximize performance and am not concerned with power consumption.
Which should I use, and if it’s _tpause(), how is it used correctly? I cannot find any example usage, even on GitHub.
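My best reading of the documentation so far is the sketch below, assuming GCC or Clang on x86-64. The helper names (cpu_has_waitpkg, tpause_for, spin_relax) and the cycle count passed in are hypothetical. _tpause() takes a control word and an absolute TSC deadline, so the deadline is computed from __rdtsc(). The instruction needs the WAITPKG feature (CPUID leaf 7, subleaf 0, ECX bit 5), hence the runtime check and the fallback to _mm_pause().

```cpp
#include <cassert>
#include <cpuid.h>      // __get_cpuid_count (GCC/Clang)
#include <x86intrin.h>  // _mm_pause, _tpause, __rdtsc (GCC/Clang)

// CPUID.(EAX=7,ECX=0):ECX bit 5 reports WAITPKG (TPAUSE/UMONITOR/UMWAIT).
inline bool cpu_has_waitpkg() noexcept {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;  // leaf 7 not available
    }
    return ((ecx >> 5) & 1u) != 0;
}

// Wait until roughly `tsc_cycles` TSC ticks from now, or until woken earlier.
// The target attribute enables WAITPKG for this function only, so the file
// does not have to be built with -mwaitpkg.
__attribute__((target("waitpkg")))
inline void tpause_for(unsigned long long tsc_cycles) noexcept {
    // Debug-only precondition: TPAUSE faults on CPUs without WAITPKG.
    assert(cpu_has_waitpkg() && "TPAUSE is not supported on this CPU");
    const unsigned long long deadline = __rdtsc() + tsc_cycles;
    // Control bit 0 = 1 requests C0.1 (faster wakeup);
    // 0 would request C0.2 (more power saving, slower wakeup).
    (void)_tpause(1u, deadline);
}

// One "relax" step for a spin loop. Returns true if TPAUSE was used,
// false if it fell back to PAUSE.
inline bool spin_relax(unsigned long long tsc_cycles) noexcept {
    static const bool has_waitpkg = cpu_has_waitpkg();  // checked once
    if (has_waitpkg) {
        tpause_for(tsc_cycles);
        return true;
    }
    _mm_pause();
    return false;
}
```

The loop body would then be something like spin_relax(N), where N is a tuning parameter I would have to measure. As I understand it, the OS can also cap the maximum wait time, so the actual wait may be shorter than requested.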
The target architecture is 2022+ Intel Xeon models.
EDIT:
I’ve just noticed that the _mm_pause() latency is 140 cycles?! I can’t see a figure for _tpause(), though.