Analysis of the Burrows-Wheeler Transform

The Burrows-Wheeler Transform (BWT) merely permutes the input string, and thus provides no compression on its own. Rather, the BWT reorganizes the input sequence so that symbols with similar contexts are clustered together in the output stream. In this sense the BWT can be seen as a preprocessing scheme that exposes potential redundancies in a given input and thereby improves subsequent compression by existing algorithms. The BWT can thus be viewed as a “compression booster”, since it allows relatively simple compression algorithms to perform better on most input sequences (Ferragina et al., 2005a). Interestingly, some of the more sophisticated compression methods do not perform as well on BWT output, because they are too “clever”: they search for complex patterns when the patterns present are very simple. This ability of the BWT to reorganize (sort) the input sequence by context is central to its relationship with other compression algorithms. It is also the key to its use in fields and applications other than data compression. Not surprisingly, this context-sorting stage is also the major computational bottleneck in performing the transformation on a given input sequence.
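The two properties described above, that the output is a permutation of the input and that symbols sharing a context end up adjacent, can be seen in a minimal sketch. It assumes the textbook construction that sorts all rotations of the input and appends an end-of-string sentinel `$` (an assumption made here for illustration). The helper names `bwt` and `count_runs` are hypothetical, and the quadratic-space sort is meant only for small examples, not as an efficient implementation.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, keep the last column.

    The sentinel is an illustrative assumption; it must not occur in the
    input, and here it sorts before the lowercase letters used below.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sorting the rotations is the context-sorting stage, and it is the
    # expensive step: this naive version materializes n strings of length n.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The last character of each sorted rotation is the symbol that
    # precedes that rotation's leading context.
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Count maximal runs of identical adjacent symbols."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


if __name__ == "__main__":
    source = "banana"
    transformed = bwt(source)
    print(transformed)                                   # annb$aa
    print(sorted(transformed) == sorted(source + "$"))   # True: a permutation
    print(count_runs(source + "$"), count_runs(transformed))  # 7 5
```

For the input `banana`, the output `annb$aa` contains exactly the same symbols as `banana$`, so nothing has been compressed. The number of runs nevertheless drops from 7 to 5, because the two `n` symbols that precede `a` are brought together, and this kind of clustering is what a simple downstream coder can exploit.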

In this chapter, we analyze the theoretical performance of the Burrows-Wheeler Transform. We consider its computational complexity, in both space and time. We also consider its theoretical performance in data compression, in terms of how closely and how quickly it approaches the theoretically optimal performance for a given source. We examine how the theoretical performance of the BWT, whether in computational complexity or in compression ability, is related to the sorted contexts used by the transform. We then relate the BWT to other compression methods, especially those based on contexts, since the BWT effectively partitions the input according to the contexts in which the characters occur.