I’m benchmarking a program and observing that cycle_activity.stalls_any is significantly higher than cycle_activity.stalls_l1_miss, which to me indicates that the CPU is spending time stalled on resources other than data access. What other resources might those be, and what perf counters correspond to them? In other words, I would expect that stalls_any = stalls_l1_miss + X + Y + ... What are X and Y?
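
To make the expectation concrete, here is a minimal sketch of the bookkeeping I have in mind: subtract the stall counters I can already explain from the total and look at what is left over. The helper name `unexplained_stalls` and the counter values are hypothetical placeholders, not measurements, and the sketch assumes the counters are additive and non-overlapping, which may not hold on real hardware.

```python
def unexplained_stalls(counts,
                       total="cycle_activity.stalls_any",
                       explained=("cycle_activity.stalls_l1_miss",)):
    """Return (residual cycles, residual as a fraction of the total).

    Assumes the 'explained' counters are disjoint subsets of 'total';
    that additivity is an assumption of this sketch, not a documented
    property of the events.
    """
    residual = counts[total] - sum(counts[name] for name in explained)
    return residual, residual / counts[total]


# Hypothetical counter values, for illustration only.
sample = {
    "cycle_activity.stalls_any": 1_000_000,
    "cycle_activity.stalls_l1_miss": 250_000,
}

residual, fraction = unexplained_stalls(sample)
print(f"unexplained stall cycles: {residual} ({fraction:.0%} of stalls_any)")
```

Any counter that answers the question (X, Y, ...) could be appended to `explained`; the residual should then shrink toward zero if the decomposition is complete.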