I have a theoretical social network where two people, A & C, can become friends if and only if, they have a common friend B amongst them.
The simplest solution is to iterate over the two lists and see if they have any friends in common. My question is, can I use bloom filters instead? I will have a bloom filter for every user with their friend list added. When I need to check if a common friend exists I check A’s friend list with C’s bloom filter first and if I get a positive result I actually perform iteration to confirm.
Given that the average friend list size if 100 friends, will this implementation be better than simple iteration?
5
No, bloom filters are not suited for your case.
Bloom filters use case is following:
- you have very large datasets that typically don’t fit in memory
- you want to check if it contains given elements, false positives being possible
In other words, it’s a trick to fit something in memory that is actually bigger, but on the downside we accept false positives. This is both overkill and ill-suited for your case where you want to compare two lists of friends.
What you should do:
Just put one list in a hash set for O(1) “contain” operations, and iterate with the other list.
EDIT:
A reviewer changed this with does not contain, this is neither wrong nor true, it’s just the other face of the medal. To make it more clear:
- if the bloom filter gives a hit: the item is probably inside
- if the bloom filter gives a miss: the item is certainly not inside
…in other words:
- if an item is inside: you always get a hit
- if an item is not inside: you probably get a miss
…yeah, that’s the thing about probabilistic data structures. You can also see it that way: there are more keys hitting than items inside.