Sequence analysis and structure analysis are two fundamental areas of bioinformatics
research. This dissertation addresses problems in both areas: protein structure problems,
including protein structure alignment and query, and genome sequence problems, including
haplotype reconstruction and genome rearrangement. It first presents an algorithm
for pairwise protein structure alignment that is tested with structures from the Protein Data
Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and
CE. The preliminary algorithm is a graph-theoretic approach that uses the concept
of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then
improved by introducing "double-center stars" in the graph and applying a self-learning
strategy. The updated algorithm is tested on a much larger set of protein structures and
is shown to improve accuracy, especially in cases of weak similarity. A protein
structure query algorithm is designed to search for similar structures in the PDB, using the
improved alignment algorithm. Compared with SSM, it performs better, with
lower maximum and average Q-scores for missing proteins. An interesting problem dealing
with the calculation of the diameter of a 3-D sequence of points arose, and its connection
to sublinear-time computation is discussed. The diameter of a 3-D sequence
is approximated by a series of sublinear-time deterministic, zero-error, and bounded-error
randomized algorithms, and a series of separations concerning the power of
sublinear-time computation is obtained. This dissertation also discusses two genome sequence related
problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices
with incomplete and inconsistent errors. Experiments with simulated data show
both high accuracy and speed, consistent with the theoretically provable efficiency and accuracy
of the algorithm. Finally, a genome rearrangement problem is studied. The concept of
non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity
to within a factor of $n^{1-\epsilon}$ is proven to be NP-hard. For several practical cases, however,
polynomial-time algorithms are presented.
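
To make the diameter problem mentioned above concrete, the following is a minimal Python sketch of what "diameter of a set of 3-D points" means, alongside one simple way to approximate it. It is only an illustration under assumptions: the function names `exact_diameter` and `anchor_estimate` are hypothetical, the exact version checks all pairs in quadratic time, and the estimate is a standard linear-time bound based on the triangle inequality. Neither is one of the sublinear-time algorithms developed in the dissertation.

```python
import math
from itertools import combinations


def exact_diameter(points):
    """Largest pairwise Euclidean distance, found by checking every pair (O(n^2))."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def anchor_estimate(points, anchor_index=0):
    """Farthest distance from a single anchor point (O(n)).

    By the triangle inequality the true diameter D satisfies
    estimate <= D <= 2 * estimate. This is an illustrative bound only,
    not a method taken from the dissertation.
    """
    anchor = points[anchor_index]
    return max(math.dist(anchor, p) for p in points)


# Hypothetical toy input: four points in 3-D space.
pts = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
d = exact_diameter(pts)
est = anchor_estimate(pts)
print(d, est)
```

The gap between the two functions shows why approximation is attractive: the estimate reads each point once, whereas the exact computation compares every pair. A sublinear-time algorithm would have to go further and avoid reading most of the input.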

Access

Unrestricted;

Degree

Ph. D.;

Degree Program

Engineering and Applied Science;

Department

Dept. of Computer Science;

Major Professor

Summa, Christopher

Advisory Committee

Winters-Hilt, Stephen; Fu, Bin; Chen, Huimin; Zhu, Dongxiao;

Date Degree Awarded

2008-08-07;

Format

PDF

Rights

The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.