Synopsis: In this work, the authors propose to intelligently
offload indexing to commodity General Processing Units (GPUs). They
introduce algorithms for building compressed bitmap indexes in real
time on GPUs and achieve indexing throughputs of up to 185 millions
records per second, which is an improvement by one order of magnitude
compared to the state-of-the-art. Dr.
Fusco has worked on a number of projects using bitmap indexes.

Synopsis: In this paper, the authors exploited the property of
bitmap index and developed a very effective bitmap pruning strategy
for processing iceberg queries. Their index-pruning-based approach
eliminated the need of scanning and processing the entire data set
(table) and thus speeds up the iceberg query processing significantly.
Experiments showed that their approach was much more efficient than
existing algorithms commonly used in row-oriented and column-oriented
databases.

Synopsis: This journal paper describes the SB-index and
HSB-index for spatial data warehouses (SDW) developed on top of FastBit indexes. Tests show
the the new methods are able to reduce the query response time by 68 -
99% while adding less than 1% to the overall space requirement.
This is the journal version of a conference
paper published in 2009.

Synopsis: This article presents an
indepth analysis of a variety of bitmap indexing methods. Among the
popular bitmap indexing methods, the analysis showed that
the bit-sliced
index and the compressed equality
index occupy two extreme points on the space-time trade-off curve.
The analysis also shows a way to construct multi-level bitmap indexes
that offer better space-time trade-off than known methods.
The analysis concentrates on the worst-case scenario with the random
data. It considers only one compression method,
the Word-Aligned Hybrid code.

Synopsis: This paper presents a
compact structure for bitmaps. It extends a trick
from Byte-aligned
Bitmap Code to make Word-Aligned Hybrid
(WAH) code more compact. A WAH fill word represents a group of
identical bits (called a fill), however, there many cases where such a
fill is followed by a literal word with only a single bit that is
different. With 32-bit words, PLWAH uses 5 bits to represent the
position of this bit that is different. In both BBC and PLWAH, the
group of mixed bits following a fill is combined into the data
representing the fill.
For certain types of data, PLWAH can reduce the space requirement
of WAH by a factor of two.

Synopsis: This paper presents a
compact structure for bitmaps that is very similar
to PLWAH. Both of them modifies
the WAH fill word to absorb an adjacent
literal word. The key difference is that this technique combines the
literal word before the fill word to form a new fill word, while PLWAH
absorbs the literal word following the fill word.

Synopsis: This paper presents a
system for capturing and indexing streaming network traffic data. What
is really intriguing is that it is the third article on modifying
the WAH compression method in 2010,
following PLWAH
and Concise. This variation, called COMPAX,
goes one step further than both PLWAH
and Concise, it introduces more ways of
combining neighboring fills and literal words. It has four different
types of compressed words, which is two more than the
other WAH variations.

Synopsis: This paper presents a
formal analysis of how reordering the rows affects the Run-Length
compressed bitmap indexes. Despite what the title says, this article
actually covers a lot more material than
sorting.
For example, it started with a discussion on bitmap compression and
bitwise logical operations, both of which are critical building
blocks for a practical bitmap index implementation. It also has an
extensive discussion on k-of-N encoding, an idea that was mentioned
in an
earlier work by Wong et al. (1985), but has not been used much.

Synopsis (By Rasmus Pagh): This paper considers range queries
whose result is given as a compressed bitmap. A static data structure
is given that consists of a number of compressed bitmaps, from which
any range query can be answered. Assuming that each bitmap is
optimally compressed, the amount of data read is within a constant
factor of the size of the result of the range query. This
asymptotically improves previous results in cases where the query
result is highly compressible. However, no experimental results are
provided. The paper also presents some results on a similar, but much
more complex, data structure that allows updates.

Synopsis:This paper presents a strategy to index all possible
combinations of different column values. It demonstrates that for
data with very low attribute cardinalities, the proposed scheme
(called ranked Non-clustered Bitmaps) can outperform a method called
Secondary Sorted Bitmaps and an open-source software
called FastBit.

Synopsis: This paper proposes to use overlapped bins to
reduce the query processing time. To reduce the possible overlapping
choices to consider, they assume the queries to be processed are known
ahead of time.

Synopsis: This is an extension of an
earlier paper. It adds a more formal description of the problem
being addressed, and provides additional performance evaluations as
well. In the ealier version, the take-home message was that the new
SB-Index was much more compact; in this paper, the emphasis is on
the reduction of query response time. The new index can reduce the
query response time by 80 - 95%. Very impressive!

Synopsis: The Spatial Bitmap Index introduced in this paper
uses compressed bitmaps to represent Minimal Bounding Rectangles
(MBR) for complex objects in geographical data warehouses. Tests on
two versions of TPC-H benchmark (called Geographical Redundant Star
Schema Benchmark and Geographical Hybrid Star Schema Benchmark) show
that using the new index can reduce query answering time by 25 -
95%. Between the two benchmarks, the Geographical Redundant Star
Schema Benchmark seems to have denormalized the underlying data
unnecessarily, which make it taking 10 times the space as the
Geographical Hybrid Star Schema Benchmark. Using the later
benchmark, the reductions in query answering time are between 90 -
95%, which can be expressed as speedup of 10 - 20 times.

Synopsis: This article aims to provide a up-to-date
guidelines for choosing between the bitmap index and the B-tree index.
Rather than giving general advices such as those based on attribute
cardinality, it provides some concrete measurements with Oracle 11G.
One key observation is that bitmap indexes are more efficient in
answering queries than B-trees even on high cardinality data. This
agrees with some recent observations from others,
e.g. Wu et al., 2006
and Sharma 2005. At the same time, the authors
also cast some doubts on how the cardinality dominates the performance
characteristics of bitmap indexes. Another noteworthy observation
is that the bitmap indexes can be constructed 8 to 10 times faster
than the B-trees in six out of the seven test cases (shown in Table
IV). In the last case, the bitmap index was constructed 3 times as
fast as the B-tree on the same set of data.

Synopsis: This paper investigates how the data redundancy
affects the query performance in Geographical Data Warehousing
applications. For a specialist in bitmap indexing, there are two
take-home messages. (1) Compressed bitmap indexes are very
effective in this type of application. The compression used in this
work is the Word-Aligned Hybrid (WAH) code. (2) Join indexes are
much large than the regular column indexes. Since a join index is
built on the cross product of two tables, we would expect it to take
more space than the index built for a single column. However, we
were surprised by the difference. The examples given in this paper
showed that the join indexes are 1000 times larger than the regular
column indexes.

Synopsis: This paper proposes a hierarchical structure called
HS-bitmap index to represent document-term matrix. The authors
implemented their data structure on PostgreSQL and observed it to
perform better than an inverted index. A short-coming might be that
HS-bitmap index takes more space than the inverted index even after
compression.

Synopsis: By far the most important thing to get out of this
paper is that packing the bitmaps as tightly as possible is a good
approach. This is a big part of the improvements on the RIDBit code
during the project -- the text of the paper alluded to the improvements
twice. The second point to note is that it demonstrated again that
vertical partitioning is a good thing for read-only data.

Synopsis: This paper presents a set of multi-level binning
strategies. In their tests, the multi-level indexes were shown to be
about 10 times faster than the single-level indexes. Potentially, most
of the performance gain is from reduction in the number of bitmaps
accessed during query processing. The same authors
have published some timing measurements before [IPDPS 2006],
the key contribution of this paper is a systematic analysis of the
multi-level indexes. This is also a big part of Sinha's PhD thesis
on indexing scientific data.

The approach of using multi-level bitmap indexes was also reported
earlier in a tech report
in 2001.

Synopsis: This paper presents a
formal analysis of the WAH compressed (basic) bitmap index. It formally
establishes the WAH compressed index as one of the optimal indexing
methods, an exclusive club including the best of B-tree variants
such as B*-tree and B+tree. This puts the WAH compressed bitmap index
at the same theoretical standing as the best of the B-trees. However,
compressed bitmap indexes have an unique advantage. Because it is much
easier to combine the results from multiple bitmap indexes, it is more
efficient to answer a query using multiple bitmap indexes than using
multiple B-Tree indexes. Therefore, bitmap indexes are more appropriate
in query-intensive applications, such as data warehousing and business
intelligence.

Synopsis: This paper presents interesting example of 2-out-N
encoding postulated earlier by Wong et
al.. Compared with one-component equality, range, and interval
encoding, the Scatter Bitmaps are more compact. For equality and
membership queries, it is more efficient than these one-component
encodings. Compared with the binary encoding, it takes more space, but
can answer queries faster.

Synopsis: This paper presents an indexing technique based on
multiple hashes. This indexing method can reduce the index sizes as
well as the query response time. The trade-off is that it answers
queries approximately. The paper also shows a way to predict the rate
of false positives.
Note that the technique presented is similar to
Bloom filters.

Synopsis: This presentation describes a bitmap index
implementation effort at GreenPlum. Since Bizgres is based on Postgres
and bitmap
is making some headway into PostgreSQL, we may see full-fledged
bitmap index in PostgreSQL yet.

Synopsis: This paper introduces a new data type in ORACLE DBMS
called bitmap and a set of techniques to work with these bitmaps
efficiently. The Summary Bitmap Tree described in this paper is similar
to certain multi-level bitmap indexes with equality encoding.

Synopsis: This paper describe the critical algorithm to make
compressed bitmap index answers queries in time that is no worse
than a linear function of the number of hits (i.e., the number of
records selected by the query condition). In terms of
computational
complexity, this places WAH compressed bitmap index in the same
category of optimal indexing methods as the best of the
B-tree indexes.
The critical algorithm is to perform bitwise logical OR operations on a
large number of small compressed bitmap in time that proportional to the
total size of all the bitmaps involved.

Synopsis: The Hierarchical Bitmap Index proposed in this paper is
in fact of hierarchy of signatures that are designed to answer the
set-containment queries. It is a version of a
signature
tree. The signatures are organized into a tree structure with the
higher level nodes indicating the non-empty nodes at the next
level. The leaf nodes contains information about the presence of a
value in a set.

Synopsis: This is a brief write up about
WAH
compression and its performance on a small subset of data from a
high-energy project called STAR.
The work with STAR eventually lead to the development of
Grid
Collector.

Synopsis: This paper describes a number of algorithms for
operating on the bitmaps of a bit-sliced index to compute various
qualities such as SUM, MAX and so on. These operations are useful
for aggregate queries. This paper also addresses the issue of
Bit-Sliced Term-Match for
Information
Retrieval.

Synopsis: In this paper, the authors describe a bitmap indexing
technique for clustering XML documents. The BitCube is formed from
three dimensions of documents (d), XML-elements (p), and terms (w). The
BitCube can be used to efficiently locate documents containing
XML-elements and terms. In addition, it can also be used to computer
aggregate values such as averages and standard deviations.

Synopsis: This dissertation presents Spatial Bit-Sliced (SBS)
index based on Bit-Sliced indexing method. The main conclusion from
their studies is that the performance of SBS index structure is
“immune” to the query size, regardless of the query
size, while other indexing strategies such as R*-tree exhibit
significant variations depending on the query size.

Synopsis: This paper describes a set of performance
measurements comparing a number of different bitmap compression
techniques. In the tests, no compression technique was the clear
winner, even though the Byte-aligned Bitmap Code (BBC) was the most
efficient in many cases. The author eventually developed a strategy
to choose the most appropriate compression method based
characteristics of the bitmaps.

Synopsis: The Group Bitmap Index was designed to address the
association rule retrieval problem. Two variations are proposed in this
paper, the Simple Group Bitmap Index and Hash Group Bitmap Index . The
Simple Group Bitmap Index is essentially a bitmap join index between
item number and the group number (transaction ID). The Hash Group
Bitmap Index can be thought of as a one-component Bloom filter, thus
the same analysis apply.

Synopsis: This paper proposes Range Encoding and
multi-component encodings. From their analysis, they conclude that
using two-component encodings provides the optimal space-time
trade-off among the multi-component encodings.

Synopsis: This paper presents a short review Bitmap indexes,
and Bit-Sliced index and Projection index. A Projection index
materializes all values of a column in RID order, and a Bit-Sliced
index essentially takes an orthogonal bit-by-bit view of the same
data, both of which are available in
Sybase
IQ. This paper also present analysis that demonstrates important
performance advantages for variant indexes in some types of SQL
aggregation, predicate evaluation, and grouping.

Synopsis: This is the first paper to fully explain the bitmap
join indexes for multi-table joins based on foreign keys. Such indexes
are used in popular database management systems and are very effective
for data warehousing applications. This is an important reason why some
people claim that "the star schema and bitmap indexes are a marriage made in heaven."

Synopsis: This paper introduces the Bit Transposed File as
"an extreme version of the (attribute) transposed file."
It stores the bulk of the data as compressed bit vectors and answers
queries with bitwise logical operations. This establishes two key
characteristics of a bitmap
index: using bit vectors to store information and use bitwise
logical operations to answer queries. They did not present an
implementation of bitmap indexes.

Synopsis: This article describes the Encoded Vector Index
(EVI) implemented in IBM DB2. EVI can be considered as a version of
the binary
encoded bitmap index. Another name used in literature is the
bit-sliced index.

Synopsis: This article has detailed explanations of how to use
Auricles bitmap index feature. In the opening paragraph, it states that
"a bitmap index is always advisable for systems in which data is not
frequently updated by many concurrent systems." In other word, the
usefulness of a bitmap index is no longer limited to low cardinality
attributes any more! Zaker et al. (2008)
gave the concrete measurements to affirm this general observation.