So, quite often I’ll come across a situation where I’d like to process hostnames in a hierarchical manner.
For example, given a hostname “foo.bar.baz.example.com”, I might want to compare it against a data structure that contains a higher authority like “example.com” or “baz.example.com” and see if I get a match.
Usually, the way I handle this is to simply use a hashtable, and iteratively remove subdomains until I find a match (or, never find a match, whatever the case may be.)
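For reference, the hashtable approach looks roughly like this. It is only a minimal Python sketch: `find_authority` and the sample set of authorities are illustrative names, not taken from any real codebase.

```python
def find_authority(hostname, authorities):
    """Return the longest entry in `authorities` that equals `hostname`
    or is a parent domain of it, or None if nothing matches."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the full name first, then drop the leftmost label each round.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in authorities:
            return candidate
    return None


# Hypothetical sample data, for illustration only.
authorities = {"example.com", "baz.example.com"}
print(find_authority("foo.bar.baz.example.com", authorities))  # baz.example.com
```

Each round rebuilds and hashes a new string, so a hostname with n labels costs up to n joins and n lookups.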
But, I was thinking that something like a trie/prefix-tree might be a lot better suited for this (if I were to reverse the domain labels).
Of course, the problem is that a trie usually works on individual characters – whereas in this case I’d want to make each subdomain (or “label”, in DNS-speak), a single “unit”.
Is there a data-structure that might be well suited for something like this?
Use a tree in which each node is a single label (subdomain). The root is a start symbol with no label of its own. It branches into k nodes, where k is the number of distinct top-level domains in your data.
For instance, if your entire database held two entries, foo.bar.example.com and boo.car.ample.net, k would be 2 (net and com). The com and net nodes would each have one child: example and ample, respectively. And so on.
As another example, suppose you again have only two entries, but this time they are foo.bar.example.com and boo.bar.example.com. Now the tree would look something like this.
      root
        |
       com
        |
     example
        |
       bar
      /   \
    foo   boo
This will be your tree. If the input is example.com, you split on ‘.’ and reverse to get [com, example]. You then walk down the tree, matching the list against it one label at a time. Since ‘com’ is a child of the root, you take that path, and from ‘com’ you look for the next element of the input list, ‘example’. Since ‘example’ is a child of ‘com’ in the tree, you proceed. At that point the input list is exhausted, which means you have found a match.
If at any point in the traversal the current node has no child for the current input element, you are guaranteed that there is no match in your database. For instance, if the input string is zoo.bar.example.com, the input list is [com, example, bar, zoo]. You would traverse the tree as far as ‘bar’, but in the tree ‘bar’ has no child ‘zoo’. This implies that zoo.bar.example.com is not in the database.
You can find a match in O(n) time, where n is the number of labels in the hostname you’re trying to match, assuming each node can look up a child in constant time (for example, with a hash map of children).
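The traversal described above can be sketched in Python roughly as follows. This is a minimal sketch rather than a definitive implementation: the names `LabelTrie`, `insert`, `contains_path` and `longest_stored_suffix` are hypothetical. The `terminal` flag and `longest_stored_suffix` are assumptions added on top of the description, so that a stored parent such as example.com can be told apart from a node that merely lies on the way to a longer name.

```python
class LabelTrie:
    """Trie keyed on whole DNS labels, stored from the TLD downward."""

    def __init__(self):
        self.children = {}      # label -> LabelTrie
        self.terminal = False   # assumption: marks the end of a stored name

    @staticmethod
    def _labels(hostname):
        # "foo.bar.example.com" -> ["com", "example", "bar", "foo"]
        return hostname.lower().rstrip(".").split(".")[::-1]

    def insert(self, hostname):
        node = self
        for label in self._labels(hostname):
            node = node.children.setdefault(label, LabelTrie())
        node.terminal = True

    def contains_path(self, hostname):
        """True if every label of `hostname` can be followed from the root."""
        node = self
        for label in self._labels(hostname):
            node = node.children.get(label)
            if node is None:
                return False
        return True

    def longest_stored_suffix(self, hostname):
        """Longest stored name that equals `hostname` or is a parent of it."""
        node, matched, best = self, [], None
        for label in self._labels(hostname):
            node = node.children.get(label)
            if node is None:
                break
            matched.append(label)
            if node.terminal:
                best = ".".join(reversed(matched))
        return best


trie = LabelTrie()
trie.insert("foo.bar.example.com")
trie.insert("boo.bar.example.com")
print(trie.contains_path("example.com"))          # True
print(trie.contains_path("zoo.bar.example.com"))  # False
```

For the use case in the question, you would insert the authorities (for example example.com and baz.example.com) and call `longest_stored_suffix("foo.bar.baz.example.com")`, which walks the labels once instead of rebuilding and rehashing a string for every suffix.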