Although I suspect the answer to this question may be something along the lines of “popularity is irrelevant; each algorithm has its trade-offs”, I’m interested in a list of popular sorting algorithms used in practice.
I was browsing Wikipedia’s page on sorting algorithms and while there are not that many algorithms listed there, I still don’t see all of them being used. For example, surely selection sort wouldn’t be used in practice since it has a O(n^2)
right? So, which are used in practice a lot?
6
First, read What are the criteria for choosing a sort algorithm? We’ll wait.
Are you done reading? OK. By way of answering your actual question, Microsoft chose an “unstable” quicksort algorithm for their Array.Sort()
method. I figure that’s as good an endorsement of a sorting algorithm as any other.
First of all note that sorting is the most heavily studied problem in CS. And for good reason. At one point the majority of all computer time went into sorting. Hash lookups likely exceed that today, but I would expect that sorting is still in the top 2 productive uses of CPU time. (If you’re dealing with big data sets, it is almost certainly #1.)
Therefore there are a lot of sorting algorithms, and people have studied exactly when each is best. Anything you can think of from the cost of a branch misprediction to the speed of accessing different kinds of data storage has been heavily analyzed. People have tried a ton of sorts and found useful roles for them. Yes, even selection sort.
For small lists, quicksort was long the favorite. These days Timsort is gaining popularity (used by default in Python and Java). Many, many other sorting algorithms have been used, and in general whatever is the default in your language will not prove to be a bottleneck.
For large data sets, the clear winner is merge sort. Its ability to stream data (which works well with large disks), ease of parallelization, and lack of hot spots give it a huge boost. Of course there are many details that you need to figure out. And so there are many different specific sorts that are used, but under the hood the good ones are overwhelmingly based on merge sort.
See http://sortbenchmark.org/ for a picture of what is possible at the high end, and descriptions of how they do it.
Every serious programming language has built in sorting algorithms. For small lists you should just use that and you’ll be fine. If your data is larger, just use the Unix sort utility (which, for large data, will be a merge sort under the hood). If your data is so big that it needs to be parallelized, you should find something pre-built for your use case. (MapReduce frameworks use sort internally, so Hadoop is often “good enough”.) If you have some other custom need not met by those three use cases, my experience says that you’ll likely be implementing some variation on a merge sort.
1
The typical default sort implementation for most of the languages I’ve used is either Mergesort or Quicksort. Python and increasingly some other languages use Timsort which is a Mergesort that looks for preordered chunks to speed up the sort. Typically Mergesort would be picked if your worst case is important but your space isn’t and Quicksort if the reverse is true.
You’re going to only see a fairly small subset not just because of the time and space complexities but also because parallelization favors algorithms that divide and conquer which both Mergesort and Quicksort make use of.
Parallelization opens up new sorts though like the Bitonic Sort which operates in log^2(n) time though the space is nlog^2(n).
1