I want to implement an in-memory datastore for a web service in Haskell. I want to run transactions in the STM
monad.
When I google hash table steam Haskell I only get this: Data. BTree. HashTable. STM.
The module name and complexities suggest that this is implemented as a tree. I would think that an array should be more efficient for mutable hash tables.
Is there a reason to avoid using an array for an STM
hashtable?
Do I gain anything with this steam hash table or should I just use a steam ref to an IntMap
?
6
The problem with a hash table implementation based directly on an array is that some of operations on it will inevitably require linear time array resizing (i.e., creating a bigger/smaller array and copying all data to it). There are multiple standard algorithms that approach this problem, like Linear Hashing or Cuckoo Hashing.
Not so long ago another algorithm named Hash Array Mapped Trie emerged, which gained a great popularity across functional languages like Clojure, Scala and, of course, Haskell (with the “unordered-containers” and “hamtmap” libraries) due to support of persistent data structures.
Not long ago I released an STM-specialized containers library based on that algorithm named “stm-containers”, which should fit your task perfectly. You can also check out an introductory blog post, covering a motivation behind the library and providing benchmarks.
1
The implementation you reference is part of a package for implementing a concurrent B-Tree. The HashTable itself is implemented as an array of TVars of Data.Map objects.
The quoted complexity values are worst-case. Remember that hashtables are usually O(N) worst case for lookup, insertion, and deletion. Using Map for the buckets brings it down to O(log(N)).