I’m writing a paper on applications that are affected more by memory performance than by processor performance. I’ve written a lot about the gap between the two, but I can’t seem to find anything about which applications are affected more by memory performance than by processor speed.
I suppose these are applications that make a large number of memory references, but I have no idea what kinds of applications would make so many references that it stands out. Databases, perhaps?
Can you please give me some pointers on how to proceed, or links to papers? I’m really stuck.
Applications that keep either many objects or deep object graphs in their cache may be candidates for your paper.
Speculation about Google’s search engine architecture might make for a good paper. Here’s why:
- Google’s search engine crawler may store its results in a massively distributed data store, but when you make a search engine query, your HTTP request does not ultimately query a database (it would be too slow).
- Instead, your HTTP request ultimately queries some sort of distributed in-memory cache. There shouldn’t be too much to calculate, and therefore not too much for the CPU to do. Querying the cache is the important part, and therefore, performance is heavily based on the way the cache is indexed and, indeed, memory performance.
Note: Google may well provision state-of-the-art machines with state-of-the-art CPUs to crawl the web, but because they are clustered, they don’t have to be. The web crawling algorithm is distributed by nature, so adding more machines to the cluster would improve crawler performance.
Furthermore, crawling the web could conceivably be a CPU-intensive task, but querying for search results should not be.
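To make the memory-bound nature of that kind of lookup concrete, here is a minimal, hypothetical sketch in C (it is not based on Google’s actual code, just an illustration of the access pattern): a “query” against a large in-memory index boils down to one dependent load from an unpredictable location, so almost all of the time per lookup is memory latency rather than computation.

```c
/* Hypothetical sketch: a "query" against a large in-memory index is one
 * dependent load per lookup, so throughput is bound by memory latency,
 * not CPU speed. Not related to any real search engine's code.           */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N       (1u << 24)   /* 16M entries * 8 bytes = 128 MB, far larger than any cache */
#define LOOKUPS (1u << 24)   /* number of simulated "queries" */

int main(void)
{
    size_t *next = malloc((size_t)N * sizeof *next);
    if (!next) return 1;

    /* Build one big random cycle (Sattolo's shuffle) so every lookup jumps
       to an unpredictable cache line and the hardware prefetcher cannot help. */
    for (size_t i = 0; i < N; i++)
        next[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (((size_t)rand() << 15) ^ (size_t)rand()) % i;
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }

    /* The "query loop": one dependent load per lookup, almost no arithmetic. */
    clock_t t0 = clock();
    size_t pos = 0;
    for (size_t q = 0; q < LOOKUPS; q++)
        pos = next[pos];
    clock_t t1 = clock();

    printf("final position: %zu (printed so the loop is not optimized away)\n", pos);
    printf("~%.1f ns per lookup\n",
           1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / LOOKUPS);

    free(next);
    return 0;
}
```

Compiled with optimizations, each lookup typically costs on the order of main-memory latency (roughly 100 ns on commodity hardware), even though the loop body does essentially no arithmetic; shrink N until the table fits in cache and the same loop runs an order of magnitude faster. That gap is entirely memory performance.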
Matrix transpose, matrix multiplication, and convolution are all examples that come to mind. Many scientific computing problems are numerically quite simple (matrix multiplication is pretty trivial: just a few adds and multiplies), but computing a single entry of the result requires accessing an entire row of the first operand and an entire column of the second operand. The result is that, no matter how the matrices are stored in memory, it is difficult to implement these operations in a way that keeps the relevant data in cache (or, in some cases, even in memory).
A considerable amount of research has gone into developing cache-efficient versions of these operations, and the fastest implementations usually require system-dependent cache optimizations. See BLAS and FFTW (http://www.fftw.org/).
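What those cache-efficient implementations do at their core is reorder the computation so that data is reused while it is still in cache. Here is a minimal sketch in C (row-major, square matrices; the tile size BS is a placeholder that would normally be tuned to the target machine’s cache) comparing the naive triple loop with a blocked/tiled version. Both perform exactly the same floating-point operations; only the memory access order differs.

```c
/* Minimal sketch: naive vs. cache-blocked matrix multiplication (C = A*B,
 * n x n, row-major doubles). The arithmetic is identical in both versions;
 * only the order of memory accesses changes.                              */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BS 64   /* tile size in elements; a machine-dependent guess */

/* Naive triple loop: the inner loop walks a row of A (unit stride) and a
 * column of B (stride n), so B's data is evicted and re-fetched constantly. */
static void matmul_naive(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked version: operate on BS x BS tiles so the working set of A, B and
 * C stays in cache and each loaded cache line is reused many times.        */
static void matmul_blocked(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                for (int i = ii; i < ii + BS && i < n; i++)
                    for (int k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < jj + BS && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

int main(void)
{
    int n = 1024;   /* big enough that the matrices do not fit in cache */
    double *A = malloc((size_t)n * n * sizeof *A);
    double *B = malloc((size_t)n * n * sizeof *B);
    double *C = malloc((size_t)n * n * sizeof *C);
    if (!A || !B || !C) return 1;
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }

    clock_t t0 = clock(); matmul_naive(n, A, B, C);   clock_t t1 = clock();
    clock_t t2 = clock(); matmul_blocked(n, A, B, C); clock_t t3 = clock();

    printf("naive:   %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("blocked: %.2f s\n", (double)(t3 - t2) / CLOCKS_PER_SEC);
    printf("C[0][0] = %.0f (same result either way)\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}
```

On a typical machine the blocked version is noticeably faster for n = 1024 even though it executes the same 2·n³ floating-point operations; real BLAS implementations go much further (register blocking, SIMD, prefetching), which is why the fastest versions end up so machine-specific.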