Pointer access and cache coherency

To my understanding, when you access a variable, that variable and the surrounding area of memory is put into the L1 cache. If I’m wrong here, please tell me.

Now my question is, say I have an array of pointers, and I want to iterate through them all performing operation X. If I first have to access the pointer, to get the address of the actual data, does this mean that the memory near the pointer will be put into cache, then the memory near the data, then back to the pointer, then to the data etc? i.e. Cache thrashing?

If this is the case, how might one go about keep cache coherency?

Whether or not the data will evict the pointer from cache depends on how how much memory you actually touch, how memory addresses are mapped to cache lines, and on the replacement policy. The most common replacement policy is Least Recently Used (sometimes just approximating the least used) where the new data will replace the data which was used the least recently in the cache. Memory addresses are commonly mapped in N-way sets where multiple memory addresses map to the same cache line, and each address can map to N cache lines (see here). N will probably be 2 or 4.

Under these conditions, you will almost assuredly not have cache thrashing if you are dereferencing a pointer then using the data IF the data you touch after dereferencing fits into the cache line. In both memory accesses, the least recently used cache line is definitely not the line with the pointer (in case of the data access), or the data (in case of the pointer dereferencing) since those were the last (most recently) touched cache lines. If you touch more memory than what is in the cache line with the data after dereferencing the pointer then it is possible that you will evict the pointer data from cache, but that depends on where the data is in memory, and how much you touch. It is possible that more data access will not evict the pointers if the addresses do not map to the cache line with the pointers or the pointer line is not the least recently used in the set of lines the addresses map to, but since the memory address of your data allocation is usually not known until the program runs it is hard to say either way just from source code.

It is probably possible to have your data and pointers allocated at specific memory addresses just to keep hitting cache as much as possible, but that would require you know the policies of the cache of the processor running your code, and that the compiler doesn’t make optimizations which change the data access patterns without you knowing. It would also be processor specific, with different cache policies on different processors making any micro optimization on this scale worthless. As I note below, it is very difficult to find information about specific policies on common processors so it is probably not worth thinking about this kind of optimization at all. In general, consecutive memory address accesses are the best way of hitting cache if you are only touching things once (like in your loop over the pointers). No other optimization is needed.

This is all assuming a single processor with a single thread. Multiple threads and multiple processors throw wrenches in reasoning about cache. Multiple threads on a single processor means that another thread could start running in the middle of your loop and when your thread starts running again, the cache is in a completely different state and the whole process starts over, but otherwise not much is different.

If you have a multi threaded loop (think one thread per pointer) running on multiple processors, then the whole cache thing is pretty difficult to reason about. The order of execution could be rearranged, or consecutive pointers could be dereferenced on different processors. Both situations means that the cache probably has not be primed by previous pointer dereferences or data accesses. In this case you should consider chunking data so a thread accesses a range of pointers instead of a single one so the cache can actually be used.

Remember that the memory you touch after dereferencing has to be SMALL. A few (read 2 or 4) ints would fit in a cache line, but an object with many members might not. If you access data stored more than a cache line’s size from the beginning of the object, you are evicting more from cache and thus have more chance of evicting the pointers.

Finding data about the caches in Intel and AMD processors is kind of difficult, but I found this link from 2010 saying that Intel uses a pseudo LRU policy on a 4-way associative cache for L2 cache, and the same policy for L1 cache (no set associativity is mentioned) on at least one of their processor architectures. It should be noted that the replacement policy is an integral component in making cache speed up programs. I would not be surprised in the least if they are more complicated or harder to reason about than what I presented above since unmodified LRU is not universally the best replacement policy. Actual replacement policies probably have much more complicated rules in the name of keeping cache hits high, but exact details would be scarce since they are a competitive edge in the performance benchmark wars.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: softwareengineering - @ 09:17

Thẻ: caching, pointers

Thiết kế website giá rẻ

Danh mục

Pointer access and cache coherency