I am implementing an application in Haskell and, for sorting, I use the library function Data.List.sort
. However, I was wondering whether this is the fastest sort implementation in the Haskell standard library (perhaps lists are not the best choice for efficient sorting).
I have found different alternatives, e.g. heap sort on arrays, sort on sequences (but the documentation does not say what kind of algorithm is used).
My question is: what is the fastest sorting implementation (container type + sort function) provided by the Haskell standard library? Is there some documentation page listing all library sort functions and comparing them wrt performance?
EDIT
To provide some more context, I am running a benchmark. I have written a simple program in C, Java, Python and Haskell that
- Reads 1000000 strings (lines) from a text file.
- Sorts the strings using a built-in (library) sorting algorithm.
- Writes the sorted list of strings to a file.
For each implementation, I only measure the sorting time (leaving out the time needed for disk IO).
Running the benchmark on Ubuntu 12.04, I get
- C (gcc 4.6.3,
qsort
onchar **
): 0.890 s - Java (OpenJDK 64-Bit 1.7.0_09,
Collections.sort()
onjava.util.LinkedList<String>
): 1.307 s - Python (Python 2.7.3,
list.sort()
): 1.072 s - Haskell (GHC 7.4.1,
Data.List.sort
on[Data.ByteString.UTF8.ByteString]
): 11.864 s
So I wonder if there is another data type / library function in Haskell that can give better performance.
11
The problem with Data.List.sort
is that it uses merge sort, which creates new lists during each pass. So a lot of time is spent on allocating and freeing memory.
Surprisingly, AFAIK there isn’t a sorting library for mutable arrays, which are likely to be faster. So I tried to make one and put it on github: marray-sort. It needs rigorous testing, polishing and optimizing too, but so far it already seems to be significantly faster than Data.List.sort
.
If you make any experiments with it, I’d be happy to see the results. I put your (slightly modified) benchmark to src-test/Test.hs for convenience. Don’t forget to compile everything with -O2
to trigger the necessary compiler optimizations.
Edit: I found out now that there is an implementation if introsort for mutable vectors in vector-algorithms. According to my measurements, it is slightly faster (5-10%) than my attempt above for MArray
s
See also: How does one sort with Data.Vector.Generic.Mutable?.
2