I wonder if it's possible to improve performance by getting the CPU to load something into cache while it is still working on something else. I'm not very knowledgeable about the inner workings of a CPU, and I'm not even sure whether it can start fetching some data while it is busy with other data that is already loaded. But I suspect the designers would have thought about something like this when designing the CPU cache.
For example, when traversing a linked list, I need to fetch the next node before I can work on it. What if it were possible to prime the CPU's cache with that node in advance, something like this:
typedef struct LinkedList {
    struct LinkedList *next; /* the typedef name isn't visible yet, so the struct tag is needed here */
    float data[64];
} LinkedList;

void do_stuff(LinkedList *li) {
    while (li) {
        __somehow_prefetch_this_into_cache_async(li->next);
        for (int i = 0; i < 64; i++) {
            // do some very complicated work while the next node is still being loaded into cache
            li->data[i] += 1.0f;
        }
        li = li->next;
    }
}
I'm working on some performance-sensitive code that requires me to traverse a linked list. After some profiling, I noticed that around 30% of the time is spent on the line that loads the next node from the pointer, which I suspect is simply memory latency.
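While searching I came across `__builtin_prefetch`, a GCC/Clang extension that seems to do roughly what I sketched above. I'm not sure it's the right tool, or whether it actually helps on modern out-of-order CPUs, but this is what a version of my loop using it might look like:

```c
#include <stddef.h>

typedef struct LinkedList {
    struct LinkedList *next;
    float data[64];
} LinkedList;

void do_stuff(LinkedList *li) {
    while (li) {
        /* Hint that the next node will be read soon.
           __builtin_prefetch is a GCC/Clang extension: the second argument
           (0) means "for reading", the third (3) means high temporal
           locality. It is only a hint and the compiler/CPU may ignore it. */
        if (li->next)
            __builtin_prefetch(li->next, 0, 3);
        for (int i = 0; i < 64; i++) {
            /* do some very complicated work while the next node is
               (hopefully) being loaded into cache */
            li->data[i] += 1.0f;
        }
        li = li->next;
    }
}
```

Is this the intended mechanism, and is it likely to help here, or will the hardware prefetcher already handle this?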