The slides to my presentation of the paper on January 7th, 2013 in New Orleans, LA, USA is available: alenex13esais-slides.pdf . They contain little text and an example of the eSAIS algorithm with a simplified PQ.

We have also submitted a full version of the eSAIS paper to a journal. Due to long publication cycles, we make a pre-print of the journal article is available here: esais-preprint.pdf . The full paper contains more details on the inducing algorithm for the LCP array and additional experimental details.

Notice that most of these files are compressed with xz and must be recompressed with gzip for the esactest test program to be able to automatically decompress them on-the-fly.

Abstract

We consider text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory. Practical tests show that this outperforms the previous best EM suffix sorter [Dementiev et al., ALENEX 2005] by a factor of about two in time and I/O-volume. Our second contribution is to augment the first algorithm to also construct the array of longest common prefixes (LCPs). This yields the first EM construction algorithm for LCP arrays. The overhead in time and I/O volume for this extended algorithm over plain suffix array construction is roughly two. Our algorithms scale far beyond problem sizes previously considered in the literature (text size of 80 GiB using only 4 GiB of RAM in our experiments).