Dec 28, 2008

Bloom filter

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with an even distribution.

To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.

To query for an element (test whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions are 0, the element is not in the set – if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements.

False Positive can be understanded as a "failed positive assertion", e.g. If I report to teacher you cheated in the last exam but in fact you didn't. In bloom filter case, it means the filter may report you the keyword 'd' exists in the set {'a','b','c'}, but the filter will never tell you 'a' or 'b' or 'c' not exists in the same set.