The bloom filter is a probabilistic data structure that provides a compact
representation of a set of elements. To keep false positive probabilities low,
the size of the bloom filter must be dimensioned a priori to be linear in the
maximum number of keys inserted, with the linearity constant ranging typically
from one to few bytes. A bloom filter is most commonly used as an in-memory data
structure, hence its size is limited by the availability of RAM space on the
machine. However, if sufficient RAM space is not available, we may need to store
bloom filters in the secondary storage devices. Since magnetic disks provide very
slow performance for the random access operations, it is not suitable for a bloom
filter as its operations require lot of random accesses.

In this paper, we propose to use flash memory based storage devices to store
bloom filters. Flash memory provides faster performance for read operations
irrespective of random and sequential access patterns, while it provides poor
performance for random write operations. Here, we present BloomFlash, a bloom
filter designed for flash memory based storage, that is especially designed to
take advantage of the physical characteristics of flash memory. To reduce random
write operations, BloomFlash exploits two key design decisions: (i) buffering bit
updates in RAM and applying them in bulk to flash, and (ii) hierarchical bloom
filter design consisting of component bloom filters. With the help of two
real-world data traces taken from representative bloom filter applications, our
experimental results show that BloomFlash achieves bloom filter access times in
the range of few tens of microseconds, thus allowing up to approximately 28,500
operations per sec.