I have messages that are grouped into separate sets – they are unordered String keywords.
I want to search for patterns in the messages by matching sets to other sets.
This is set intersection but I am not satisfied with the looping approach I took. I looped on all sets known and did a search through incoming set items and did a .has() hashmap lookup on them.
count = 0
for node in self.nodes:
print("searching {}".format(node.label))
for item in incoming:
if node.has(item):
count = count + 1
if count == len(incoming):
results.append(node)
count = 0
I was hoping there would be a constant time approach. Or searching for a set with a set.
I think this is a conjunction search.
I thought, can two bloom filters be compared for all the bits matching in other bloom filters? A bloom filters for the database and a bloom filter for each input message sets
You can create a bloom filter for each node and the incoming set, but you will have to account for false positives.
For each of the bloom filters for the nodes, you can check if all of the bits in the incoming set bloom filter are set through (node_bf & incoming_bf) == incoming_bf
.
Another solution that reduces the runtime to O(n), where n is the number of sets, but is not constant is by replacing the inner for loop with the issubset()
method.
count = 0
for node in self.nodes:
print("searching {}".format(node.label))
if incoming.issubset(node):
results.append(node)