To decrease the number of false positives, I came up with three possible solutions:
1. Create two different types of Bloom filter, e.g. filter1 using 3 hash algorithms and filter2 using 3 other hash algorithms.
2. Create two different lengths of one type of Bloom filter, i.e. the number of bits would differ, so the hash values would be reduced modulo a different size and map to different bit positions.
3. Create two different types of Bloom filter with two different lengths.
I believe #2 would be the fastest, but would it in theory be better to use #1 (or #3) to reduce the number of false positives as much as possible, while still being easy and quick to implement?
Most of the time, I would use this for short text-strings (up to approximately 20 characters) and URLs.
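To make the three options concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration only: the `BloomFilter` class, the helper names, the use of salted SHA-256 as a stand-in for "different hash algorithms", and the sizes 1024 and 1031 are not taken from any particular library.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; num_bits and salts are illustrative assumptions."""

    def __init__(self, num_bits, salts):
        self.num_bits = num_bits
        self.salts = salts  # one salt per simulated hash function
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Salted SHA-256 stands in for "different hash algorithms".
        for salt in self.salts:
            digest = hashlib.sha256(salt + item.encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def add_to_all(filters, item):
    """Insert the item into every filter of a combined pair."""
    for f in filters:
        f.add(item)


def might_contain(filters, item):
    """Report a hit only if every filter in the pair reports a hit."""
    return all(item in f for f in filters)


HASHES_A = [b"a", b"b", b"c"]
HASHES_B = [b"d", b"e", b"f"]

# Option 1: same length, different hash functions.
option1 = [BloomFilter(1024, HASHES_A), BloomFilter(1024, HASHES_B)]
# Option 2: same hash functions, different lengths.
option2 = [BloomFilter(1024, HASHES_A), BloomFilter(1031, HASHES_A)]
# Option 3: different hash functions and different lengths.
option3 = [BloomFilter(1024, HASHES_A), BloomFilter(1031, HASHES_B)]
```

In each case a lookup is treated as a hit only when both filters agree, so a false positive requires both filters to be wrong for the same item. In option 2 the hashes only need to be computed once per item and then reduced modulo each length, which is why I expect it to be the fastest.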