I was reading this blog post: http://kjellkod.wordpress.com/2012/02/25/why-you-should-never-ever-ever-use-linked-list-in-your-code-again/
and there I found some code to run: http://ideone.com/62Emz
I compiled it using gcc 4.7.2 with g++ -std=c++11 on my old laptop with a T5450 CPU (two cores, 32 KB of L1 cache each, and 2 MB of (shared?) L2 cache), and I got these results:
********** Times in microseconds **********
Elements    ADD (List, Vector)     ERASE (List, Vector)
100         0, 0                   0, 0
200         0, 0                   15625, 0
500         0, 0                   0, 0
1000        15625, 0               0, 15625
4000        109375, 140625         46875, 31250
10000       750000, 875000         312500, 187500
20000       2968750, 3468750       1296875, 781250
40000       12000000, 13843750     5359375, 3156250
Exiting test. The whole measuring took 45375 milliseconds (45 seconds or 0 minutes)
Which actually says the opposite of what the author of that blog post claims, at least for the ADD operation: the list is faster at ADD than the vector. What conclusions should I draw from my results? Does this prove anything? What should I make of it?
First of all, congratulations for doing the right thing and measuring rather than believing advice about efficiency! With modern computer architecture, it is harder than ever to predict precisely how a small change in data structures will affect runtime, because of the many levels of the memory hierarchy, out-of-order execution, aggressive code optimizers, etc. If your use case is indeed what you have measured, then yes, you will be better off with a list.
That said, that program doesn't do much; in practice you will almost certainly access elements more often than you add or delete them, and probably in non-successive ways. I suspect that if you rerun the benchmark with a lot of random accesses, the results might turn around (a minimal sketch of such a measurement follows below), but… remember what I just said? Never assume. Always measure. Happy profiling!
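As an illustration only, here is one way such an access-heavy measurement could be sketched; the container size, element type, number of lookups, and timing helper are my own choices and are not part of the original benchmark:

    // Hypothetical sketch: time a batch of accesses at random positions in a
    // std::vector vs. a std::list holding the same contents.  The sizes and
    // the element type are arbitrary choices for illustration.
    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <iterator>
    #include <list>
    #include <random>
    #include <vector>

    template <typename Container>
    long long time_random_accesses(const Container& c,
                                   const std::vector<std::size_t>& positions) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t pos : positions) {
            auto it = c.begin();
            std::advance(it, pos);   // O(1) for vector, O(pos) for list
            sum += *it;
        }
        auto stop = std::chrono::steady_clock::now();
        std::cout << "checksum: " << sum << '\n';  // keep the work observable
        return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }

    int main() {
        const std::size_t n = 100000;            // arbitrary container size
        std::vector<int> v(n, 1);
        std::list<int> l(v.begin(), v.end());

        std::mt19937 gen(42);
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        std::vector<std::size_t> positions(10000);  // arbitrary number of lookups
        for (auto& p : positions) p = dist(gen);

        std::cout << "vector: " << time_random_accesses(v, positions) << " us\n";
        std::cout << "list:   " << time_random_accesses(l, positions) << " us\n";
    }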
The title of that article is intentionally disingenuous. The author is making three points: that linked list traversals and searches are slow, which is true; that most people don’t realize how much slower they are than even linear array searches due to cache misses, which is probably true; and that most people don’t take search time into account when selecting a linked list, which is probably false.
His benchmark program lumps the search time in with the adds and deletes, which is the disingenuous part. When I select a linked list, the very first thing I ask myself is whether the poor search time can be worked around or is an acceptable trade-off. I think that is true of most people.
For example, the last time I used a linked list was when I needed to keep some items sorted by how recently they were last accessed. The most common operation by far was moving an item from the middle of the list to the front, an O(1) operation for a linked list. However, there’s that pesky search time. It turned out to be convenient to store a pointer to the linked list node in another data structure I needed anyway, so the searches would be O(1) as well.
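As a rough illustration of that pattern (and only an illustration: the key type, the names, and the use of an unordered_map holding list iterators are my own choices, not the actual code I described), it might look something like this:

    // Hypothetical sketch of a "most recently used" structure: an external
    // index into a std::list so both lookup and move-to-front are O(1).
    #include <list>
    #include <string>
    #include <unordered_map>

    class RecencyList {
    public:
        // Touch an item: move it to the front, inserting it if it is new.
        void touch(const std::string& key) {
            auto it = index_.find(key);
            if (it != index_.end()) {
                // splice is O(1): it relinks the node at the front without copying.
                items_.splice(items_.begin(), items_, it->second);
            } else {
                items_.push_front(key);
                index_[key] = items_.begin();
            }
        }

        const std::string& most_recent() const { return items_.front(); }

    private:
        std::list<std::string> items_;                 // ordered by recency
        std::unordered_map<std::string,
                           std::list<std::string>::iterator> index_;  // O(1) lookup
    };

The key property is that std::list::splice relinks the node in O(1) and leaves the stored iterator valid, so the index never has to be updated when an item moves to the front.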
However, in other circumstances, I have just kept the O(n) searches because the semantics of a linked list simplified the code, and the performance hit was negligible. Your own test shows the difference measured in hundredths of a second for adding or deleting 4000 nodes. For most applications that’s completely unnoticeable.
The fact that you got different results than the author on his own benchmark is also very interesting, and it illustrates well why you should do your own measuring. Your compiler, operating system, standard library implementation, and even the other processes running on your system can all make a significant difference in things like how many cache misses your code generates.
First, a suggestion: you probably want to wait more than a couple of minutes before accepting an answer; you’ll likely get more answers that way.
Second: when benchmarking, you should always compile with optimizations turned on; something like g++ -std=c++11 -O3 -march=native should get you good results.
std::list really is a poor data structure, and I have not yet found a situation in which it is the best. For instance, consider the case where you want to maintain a sorted data structure. You may think that std::list, with O(1) insertion and deletion time, would be ideal, but in fact it is sub-optimal!
For these tests, my contained type is a trivial class that contains an array of 4-byte integers. I test a std::array of size 1, 10, and 100 (giving me element sizes of 4, 40, and 400 bytes). I chose a std::array because a move and a copy are the same thing. The initial element of the array is initialized to some random number between 0 and std::numeric_limits<uint32_t>::max(). I create a std::vector of some number of these elements (the x-axis), then I start the timer. I iterate over that std::vector and insert each element, in sorted order, into the container under test (sorted by the first element of the array, using operator<=). To help prevent the compiler from cleverly optimizing the work away, I then output the first element of the sorted container (which cannot be determined until the end) to a file and stop the timer.
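For concreteness, a stripped-down sketch of that kind of comparison might look like the following; the random seed, element count, the single 40-byte element size, and the plain less-than comparator (rather than the operator<= mentioned above) are my own simplifications, and this is not the exact program behind the graphs:

    // Hypothetical sketch: insert pre-generated elements into their sorted
    // positions in a std::vector and in a std::list, keyed on the first
    // integer of a std::array payload, and time each phase.
    #include <algorithm>
    #include <array>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <list>
    #include <random>
    #include <vector>

    using Element = std::array<std::uint32_t, 10>;  // 40-byte element, one of the tested sizes

    bool less_first(const Element& a, const Element& b) { return a[0] < b[0]; }

    int main() {
        std::mt19937 gen(12345);
        std::uniform_int_distribution<std::uint32_t> dist;  // 0 .. uint32_t max
        std::vector<Element> source(10000);                 // arbitrary element count
        for (auto& e : source) e[0] = dist(gen);

        auto start = std::chrono::steady_clock::now();
        std::vector<Element> sorted_vec;
        for (const auto& e : source)
            sorted_vec.insert(std::upper_bound(sorted_vec.begin(), sorted_vec.end(),
                                               e, less_first), e);
        auto mid = std::chrono::steady_clock::now();

        std::list<Element> sorted_list;
        for (const auto& e : source) {
            auto pos = std::find_if(sorted_list.begin(), sorted_list.end(),
                                    [&](const Element& x) { return less_first(e, x); });
            sorted_list.insert(pos, e);
        }
        auto stop = std::chrono::steady_clock::now();

        // Print the first elements so the work cannot be optimized away.
        std::cout << sorted_vec.front()[0] << ' ' << sorted_list.front()[0] << '\n';
        std::cout << "vector: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(mid - start).count()
                  << " ms, list: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - mid).count()
                  << " ms\n";
    }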
These are my results for various sizes of elements:
We see that for 4- and 40-byte elements, std::vector is better than std::list even at this inserting-into-the-middle workload, and for any element size you’re better off using a std::vector<std::unique_ptr> than a std::list.
In general, I cannot come up with a reason to use std::list over a class that wraps std::vector<std::unique_ptr> to make it appear as though it has value semantics, other than the ability to copy (which I hope to fix by submitting a value_ptr class to Boost if one isn’t added soon; there is already some discussion about that).
As an additional note, if this were real code and not a comparison/benchmark game, I would have written the std::vector version very differently: I would have copied the entire original container directly, then used std::sort. I intend to write a more complete analysis of data structures, with a focus on the importance of data locality, and I will include timings for the “correct” way to do it (the correct version blows all other methods out of the water, completing in much less than a second for 400,000 elements of size 400, which is 10 times more elements than I tested in my graphs).
I hope I explained everything well; these are from some tests I ran several months ago, and I haven’t yet finished my notes on the subject.
Tests were done on an Intel i5 machine with 4 GiB of RAM. I believe I was using Fedora 17 x64.