Abstract:

Sequence alignment is a fundamental operation for homology search in bioinformatics. For two DNA or protein
sequences of lengths m and n, full-matrix (FM) dynamic programming alignment algorithms such as
Needleman-Wunsch and Smith-Waterman take O(m × n) time and use a possibly prohibitive O(m × n) space.
Hirschberg's algorithm reduces the space requirements to O(min(m,n)), but requires approximately
twice the number of operations required by the FM algorithms.
The Fast Linear Space Alignment (FastLSA) algorithm adapts to the amount of space available by trading space
for operations. FastLSA can effectively adapt to use either linear or quadratic space, depending on the specific
machine. Our experiments show that, in practice, due to memory caching effects, FastLSA is always as fast as or
faster than Hirschberg and the FM algorithms.
To further improve the performance of FastLSA, we have parallelized it using a simple but effective form of
wavefront parallelism. Our experimental results show that Parallel FastLSA exhibits good speedups, almost linear for
8 or fewer processors, and also that the efficiency of Parallel FastLSA increases with the size of the sequences that are
aligned. Consequently, parallel and sequential FastLSA can be flexibly and effectively used with high performance
in situations where space and the number of parallel processors can vary greatly.
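
To make the space trade-off concrete, the following is a minimal illustrative sketch, not the FastLSA implementation. It computes a global (Needleman-Wunsch style) alignment score twice: once with the full O(m × n) matrix, and once keeping only two rows, which is the O(min(m,n)) building block that Hirschberg-style methods rely on. The function names and the scoring parameters (match=1, mismatch=-1, gap=-2) are assumptions chosen for illustration. The linear-space version returns only the score; recovering the alignment itself is what costs the extra operations discussed above.

```python
def nw_score_full(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score using the full (m+1) x (n+1) matrix."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, n + 1):
        dp[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # match or mismatch
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[m][n]


def nw_score_linear(a, b, match=1, mismatch=-1, gap=-2):
    """Same score, keeping only two rows: O(min(m, n)) space."""
    if len(b) > len(a):
        a, b = b, a                 # scoring is symmetric; rows span the shorter sequence
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                 # the older row is discarded
    return prev[len(b)]
```

Both functions fill the same recurrence and therefore return the same score for any pair of sequences; only the amount of the matrix retained differs.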