I want to optimize this loop:
for i in 0..len {
    slots[values[i].slot].count += 1;
}
(`slots` and `values` are both extremely long and have the same length.)
I have already optimized it like this (2.5x faster), prefetching each slot so that it is in cache by the time it is accessed:
const PREFETCH_DISTANCE: usize = 32;
for i in (0..len).rev() {
    if i >= PREFETCH_DISTANCE {
        let pfs = values[i - PREFETCH_DISTANCE].slot;
        let fetch = &slots[pfs] as *const Slot as *const i8;
        unsafe { _mm_prefetch(fetch, _MM_HINT_T0) }
    }
    slots[values[i].slot].count += 1;
}
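A minimal self-contained sketch of the two loops above, for anyone who wants to reproduce the comparison. The `Slot` and `Slottable` definitions, the helper names (`prefetch`, `count_naive`, `count_prefetched`) and the generated slot indices are assumptions made for illustration, not the real types or data. The prefetch is a no-op on non-x86_64 targets.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};

// Hypothetical stand-ins for the real types.
#[derive(Clone, Default)]
struct Slot {
    count: usize,
}

struct Slottable {
    slot: usize,
}

const PREFETCH_DISTANCE: usize = 32;

/// Hints the CPU to pull `slot` into cache; does nothing off x86_64.
#[inline(always)]
fn prefetch(slot: &Slot) {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        _mm_prefetch::<_MM_HINT_T0>(slot as *const Slot as *const i8);
    }
    #[cfg(not(target_arch = "x86_64"))]
    let _ = slot;
}

/// The original loop: one dependent random access per element.
fn count_naive(slots: &mut [Slot], values: &[Slottable]) {
    for v in values {
        slots[v.slot].count += 1;
    }
}

/// The prefetching loop: walks backwards and prefetches the slot
/// that will be touched PREFETCH_DISTANCE iterations later.
fn count_prefetched(slots: &mut [Slot], values: &[Slottable]) {
    for i in (0..values.len()).rev() {
        if i >= PREFETCH_DISTANCE {
            prefetch(&slots[values[i - PREFETCH_DISTANCE].slot]);
        }
        slots[values[i].slot].count += 1;
    }
}

fn main() {
    let len: usize = 1 << 16;
    // Scattered slot indices from a multiplicative hash (made-up test data).
    let values: Vec<Slottable> = (0..len)
        .map(|i| Slottable { slot: i.wrapping_mul(2654435761) % len })
        .collect();
    let mut a = vec![Slot::default(); len];
    let mut b = vec![Slot::default(); len];
    count_naive(&mut a, &values);
    count_prefetched(&mut b, &values);
    // Both loops must produce identical counts.
    assert!(a.iter().zip(&b).all(|(x, y)| x.count == y.count));
}
```

This only checks that both variants compute the same counts; timing them meaningfully would need a release build and arrays much larger than the last-level cache.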
However, it is still not as fast as I was expecting. For example, this code, for the same list length, is more than twice as fast, just because it doesn't index into anything:
for val in &*arr {
    let ptr = val as *const T;
    let val = unsafe { ptr.read() };
    let dif = integrated_expected_distribution(&val) - start_val;
    debug_assert!(dif >= 0.);
    let slot = (dif * end_mult) as usize;
    debug_assert!(slot < len);
    values.push(Slottable { value: val, slot });
}
Am I missing something else that I need to prefetch for that indexing? Or is something else causing the performance issue?