Certain data structures, like Python’s dictionary, are unordered: items are not written or read back in any predictable sequence.
Since programming in Python is iterative (and isn't programming in general?), how do these unordered data structures work?
I understand that Python’s dictionary is, in essence, a hash table. But I figured the data structure would be stored in memory in insertion order, and would therefore be read back in a similar order when iterating over the whole thing. This is not the case.
Aside from not understanding how this works, I also wonder what the benefit of it is. But that may become all too clear once I understand how it works 🙂
P.S. I didn’t think this was a Stack Overflow question, so I put it here…
Dictionaries/hashes are often implemented as an array of linked lists.
The key is first hashed to produce an index into the array; that array slot points to the first entry of a linked list.
Then the linked list is searched to see whether an entry with that key exists.
So for any reasonably large dictionary there will be an array of linked lists whose order is effectively random with respect to insertion order, because it is determined by the keys' hash values. Each individual linked list, however, is ordered by entry sequence.
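A minimal sketch of that layout in Python, assuming a fixed number of buckets and no resizing. `ChainedHashTable` is a hypothetical toy class, not how CPython's `dict` is built: CPython uses open addressing rather than linked lists, and since Python 3.7 its `dict` is guaranteed to iterate in insertion order, which is why the sketch hand-rolls its own table. Integer keys are used because `hash()` of a small non-negative int is the int itself, which makes the bucket order easy to predict.

```python
from __future__ import annotations

from typing import Hashable, Iterator, Optional


class Node:
    """One entry in a bucket's singly linked list."""

    def __init__(self, key: Hashable, value: object) -> None:
        self.key = key
        self.value = value
        self.next: Optional[Node] = None


class ChainedHashTable:
    """Toy hash table: a fixed-size array of linked lists (separate chaining)."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets: list[Optional[Node]] = [None] * num_buckets

    def _index(self, key: Hashable) -> int:
        # Hash the key, then reduce it to a valid array index.
        return hash(key) % len(self.buckets)

    def put(self, key: Hashable, value: object) -> None:
        i = self._index(key)
        node = self.buckets[i]
        if node is None:
            self.buckets[i] = Node(key, value)
            return
        while True:
            if node.key == key:
                node.value = value  # existing key: overwrite in place
                return
            if node.next is None:
                node.next = Node(key, value)  # append, so each chain keeps entry order
                return
            node = node.next

    def get(self, key: Hashable, default: object = None) -> object:
        # Jump straight to one bucket, then walk only that chain.
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default

    def __iter__(self) -> Iterator[Hashable]:
        # Iteration visits buckets in array order, then each chain front to back.
        for head in self.buckets:
            node = head
            while node is not None:
                yield node.key
                node = node.next


table = ChainedHashTable(num_buckets=8)
for k in [10, 3, 17, 8, 9]:
    table.put(k, str(k))

order = list(table)  # [8, 17, 9, 10, 3]: bucket order, not insertion order
```

Looking up `9` only walks bucket 1 (`17`, then `9`); the other buckets are never touched. That is the trade-off: a lookup costs roughly the length of one chain instead of the whole collection, and the price is that iteration follows the hash values rather than the order of insertion.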
A hash table has some parts that are ordered, such as the list of buckets, but you very rarely access them in that order because that order has little to no correlation with any useful part of your data.
Say you want to store a lot of strings. If your goal is to efficiently pull out the 73rd (or other arbitrarily selected) string, then you might use an array, which is ordered. Then getting the 73rd string is a matter of referencing the 73rd element in the array directly.
However, say your goal is to efficiently determine whether a particular string is in your set. With an unsorted array, you need to search through all of the strings. With a sorted array, a binary search only needs to examine about log(n) of the strings, which is better but still not great. So the solution is a hash table.
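A small Python comparison of the three membership tests described above. The helper names `in_unsorted` and `in_sorted` are illustrative; the sorted case uses the standard-library `bisect` module for the binary search, and the hash-based case uses Python's built-in `set`, which CPython implements as a hash table.

```python
from __future__ import annotations

import bisect

words = ["pear", "apple", "fig", "banana", "cherry", "kiwi"]


def in_unsorted(items: list[str], target: str) -> bool:
    # Linear scan: in the worst case every element is compared, O(n).
    for item in items:
        if item == target:
            return True
    return False


def in_sorted(sorted_items: list[str], target: str) -> bool:
    # Binary search: roughly log2(n) comparisons to find the insertion point.
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


sorted_words = sorted(words)
word_set = set(words)  # hash table: average O(1) membership test

found = (
    in_unsorted(words, "fig"),
    in_sorted(sorted_words, "fig"),
    "fig" in word_set,
)
missing = "grape" in word_set or in_sorted(sorted_words, "grape")
```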
Naively, you could make an array with 256 entries (each representing one character), each pointing to a list or array of the strings that start with that character. Now if you want to find out whether “Bob” is in your set of strings, you jump straight to the ‘B’ bucket and only look through those strings.
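The naive scheme can be sketched in Python as follows. `FirstCharSet` and `first_char_bucket` are hypothetical names; the "hash" is just the code of the first character, reduced modulo 256 so that characters outside the single-byte range still land in a valid bucket (an assumption, since the 256-entry description leaves that case open).

```python
from __future__ import annotations

NUM_BUCKETS = 256


def first_char_bucket(s: str) -> int:
    # Naive "hash": the first character's code point, folded into 0..255.
    return ord(s[0]) % NUM_BUCKETS if s else 0


class FirstCharSet:
    def __init__(self) -> None:
        # One list of strings per possible first character.
        self.buckets: list[list[str]] = [[] for _ in range(NUM_BUCKETS)]

    def add(self, s: str) -> None:
        bucket = self.buckets[first_char_bucket(s)]
        if s not in bucket:  # only this bucket is scanned for duplicates
            bucket.append(s)

    def __contains__(self, s: str) -> bool:
        # Jump straight to the matching bucket, e.g. 'B' for "Bob".
        return s in self.buckets[first_char_bucket(s)]

    def __iter__(self):
        # Walking the buckets in array order groups strings by first character.
        for bucket in self.buckets:
            yield from bucket


names = FirstCharSet()
for name in ["Carol", "Bob", "alice", "Bea"]:
    names.add(name)

listing = list(names)  # ["Bob", "Bea", "Carol", "alice"]
```

Uppercase letters come out before lowercase ones because of their character codes, and strings sharing a first letter stay in the order they were added, so the result is only "alphabetical" by first character.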
Of course, this example still hands the strings back in alphabetical order (at least by first character), which sounds nice. In the real world, though, you would use a hash function more complex than “the first character”, preferably one that distributes strings as evenly as possible between your buckets. As a result, simply reading through your hash table in order gives you a pseudo-random ordering of your strings.
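To see the pseudo-random effect, here is a sketch that swaps the first-character rule for a hash that mixes in every character. It uses `zlib.crc32` only because it is a deterministic standard-library checksum; it is an illustrative stand-in, not the hash Python's `dict` uses. (Python's built-in `hash()` for strings is randomized per process by default, so it would give a different order on each run.) The bucket count of 8 is arbitrary.

```python
import zlib

NUM_BUCKETS = 8


def bucket_index(s: str) -> int:
    # CRC-32 over the UTF-8 bytes depends on every character, not just the first.
    return zlib.crc32(s.encode("utf-8")) % NUM_BUCKETS


names = ["alice", "bob", "carol", "dave", "erin", "frank"]
buckets = [[] for _ in range(NUM_BUCKETS)]
for name in names:
    buckets[bucket_index(name)].append(name)

# Reading the table "in order" means walking the buckets front to back.
order = [name for bucket in buckets for name in bucket]
print(order)  # order is set by the CRC-32 values, not by the names themselves
```

Walking the buckets front to back still follows a fixed rule, but because the bucket depends on the whole string, the resulting order bears no visible relation to the names or to when they were added.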