I want to implement a hash table using binary search trees to reduce the search complexity in the separate chaining process from O(n) (using a linked list) to O(log n) (using a BST). Can this be done, and if yes, how? It would be easier to understand if the solution were explained step by step, with an implementation of the logic.
I want to reduce the search time in the hash table (built using separate chaining), but without increasing the insert time. For my project I cannot change the hash function to reduce collisions, but as the data scales, collisions are happening. I am trying to find a workaround so that I can get the best possible access and insert times when a collision occurs, i.e. manage the current state of things rather than restructure the entire algorithm. If that does not pan out, I will have to restructure. Any ideas?
What you are asking for is possible given your constraints.
Analysis
A hash table’s strength is its fast lookup and insertion speed. To get that speed, one must forsake any semblance of order in the table: i.e. entries are all jumbled up. A list is acceptable to use as a table entry because while traversal is O(n), the lists tend to be short assuming the hash table is sufficiently large and the objects stored in the table are hashed using a good quality hashing algorithm.
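For reference, lookup within a list bucket is just a linear walk. A minimal sketch (in Java, chosen arbitrarily since the question names no language; the class and method names are illustrative):

```java
import java.util.List;
import java.util.Map;

final class ListBucketLookup {
    // Linear scan of one bucket: O(n) in the bucket's length, which stays small
    // when the table is large and the hash function spreads keys well.
    static <K, V> V findInBucket(List<Map.Entry<K, V>> bucket, K key) {
        for (Map.Entry<K, V> entry : bucket) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
        }
        return null;
    }
}
```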
A binary search tree (BST) has fast insertion and lookup, O(log₂ n) as long as the tree stays reasonably balanced. It also imposes a restriction on the elements it stores: there must be some way to order them. Given two elements A and B stored in the tree, it must be possible to determine whether A comes before B or whether they have equivalent order.
A hash table imposes no such ordering restriction; its elements need only two properties. First, there must be a way to determine whether two elements are equivalent; second, there must be a way to calculate a deterministic hash code. Order is not a requirement.
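For example, a key type that satisfies both sets of requirements might look like this (a hypothetical class, again in Java; equals/hashCode cover the hash table's needs, compareTo covers the BST's):

```java
import java.util.Objects;

// Hypothetical key type: equals/hashCode satisfy the hash table's requirements,
// while compareTo supplies the ordering a BST bucket needs.
final class AccountKey implements Comparable<AccountKey> {
    private final String id;

    AccountKey(String id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof AccountKey && id.equals(((AccountKey) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);              // deterministic hash code
    }

    @Override
    public int compareTo(AccountKey other) {
        return id.compareTo(other.id);        // total order for the tree
    }
}
```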
If your hash table elements do have an order, then you can use a BST as the bucket to hold objects with the same hash code (collisions). Because a balanced BST has O(log₂ n) lookup and insertion, the worst case for the whole structure (hash table plus BST) is better than using a list as the bucket. Depending on the BST implementation it will require more storage than a list, but likely not much more.
Please note that in real-world situations a BST used as a hash table bucket normally brings nothing that justifies its extra overhead, which is why the theoretically poor performance of a list is acceptable: the hash table compensates for the list's weakness by placing fewer items in each list (bucket). However, the problem specifically states that the hash table cannot increase in size, and collisions are more frequent than is typical for a hash table.
Implementation
I am not going to spell out a complete implementation here because honestly it is not really necessary, and you did not specify a language anyway.
What I would do is simply copy whatever standard hash table your language’s standard library contains into a new class, then change the table bucket type from a list to a tree. Depending on the language and its standard library this may be a very trivial thing to do.
Normally I would not advocate copy and paste coding like this. However, it is an easy way to get a battle-tested data structure very quickly.
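That said, here is a rough illustration of the bucket-as-tree idea (once more in Java, with illustrative names; java.util.TreeMap is a red-black tree, so operations within a bucket are O(log n)):

```java
import java.util.TreeMap;

// Minimal sketch: a fixed-size table whose buckets are trees instead of lists.
class TreeChainedTable<K extends Comparable<K>, V> {
    private final TreeMap<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    TreeChainedTable(int capacity) {
        buckets = new TreeMap[capacity];      // unchecked, but safe here
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new TreeMap<>();
        }
    }

    private int indexFor(K key) {
        // Placeholder; in a real port, keep whatever hash function you already use.
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        buckets[indexFor(key)].put(key, value);   // O(log n) within the bucket
    }

    V get(K key) {
        return buckets[indexFor(key)].get(key);   // O(log n) within the bucket
    }
}
```

Usage is the same as with a list-backed table, e.g. `new TreeChainedTable<AccountKey, String>(1024)` followed by put/get calls; the point is that only the bucket type changes, not the hashing.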
Using a binary tree for collision handling in a hash table isn’t just possible – it has been done.
Walter Bright is best known as the inventor of the D programming language, but he also wrote an ECMAScript variant called DMDScript. In the past, a headline claim of DMDScript (or possibly an ancestor; I seem to remember the name DScript) was that its hash tables tended to outperform those in a lot of similar languages. The reason: collision handling using binary trees.
I don’t remember exactly where I read this, but the trees used were naive binary trees with no rebalancing scheme (not AVL, red-black, or anything similar). That makes sense: assuming the hashtable itself gets resized when it becomes overfull, and you don’t hit absurdly improbable rates of hash collisions, the binary trees should always stay small. Basically, the worst case is still the same as using a linked list for collision handling (except that you pay the price of two pointers per node instead of one), but the average case reduces the amount of searching within each hash bucket.
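To make the naive-tree idea concrete, here is a rough sketch of an unbalanced BST node used as a bucket (in Java, purely illustrative; this is not DMDScript's actual code):

```java
// Unbalanced binary search tree node used as a bucket: no rebalancing, just two
// child pointers per node instead of a list's single next pointer.
class BucketNode<K extends Comparable<K>, V> {
    K key;
    V value;
    BucketNode<K, V> left, right;

    BucketNode(K key, V value) { this.key = key; this.value = value; }

    // Insert or update; average O(log n) for well-spread keys, O(n) worst case.
    static <K extends Comparable<K>, V> BucketNode<K, V> put(BucketNode<K, V> node, K key, V value) {
        if (node == null) return new BucketNode<>(key, value);
        int cmp = key.compareTo(node.key);
        if (cmp < 0)      node.left = put(node.left, key, value);
        else if (cmp > 0) node.right = put(node.right, key, value);
        else              node.value = value;
        return node;
    }

    // Search: the same comparisons, just without allocation.
    static <K extends Comparable<K>, V> V get(BucketNode<K, V> node, K key) {
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp < 0)      node = node.left;
            else if (cmp > 0) node = node.right;
            else              return node.value;
        }
        return null;
    }
}
```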