Tomasz Gambin defended his PhD thesis

Array-based Comparative Genomic Hybridization (aCGH) is a technique used to detect Copy Number Variations (CNVs) by hybridizing test and reference DNA samples to DNA microarrays. aCGH is widely used in medical and biological applications, including studies on cancer, constitutional disorders and genomic variation among populations. This thesis is devoted to the development of algorithms for the design of microarrays and the post-processing of results from aCGH experiments.

First, we consider issues related to the design of custom CGH arrays, such as identifying the target regions to be covered on the array and optimizing the probe coverage of these regions. We propose a new, efficient method for the prediction of Low Copy Repeats and rearrangement hotspots, using a suffix-tree-based genome alignment tool. For coverage generation, we introduce a probe selection procedure that ensures a stable probe density along the covered region. We argue that our density-based method yields array designs that provide better detectability of CNVs than algorithms that aim to maximize the resolution of the coverage. We then present our software system for gathering and analyzing aCGH data, which greatly improves the process of manual inspection of aCGH samples and provides the data source for further analysis.
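The idea of keeping probe density stable along a region can be illustrated with a minimal sketch. It is not the procedure from the thesis: the function name `select_even_probes`, the greedy nearest-to-target rule, and the input format (a list of candidate probe start positions) are all assumptions made for illustration.

```python
from bisect import bisect_left


def select_even_probes(candidates, start, end, n_probes):
    """Hypothetical helper: pick up to n_probes candidate positions that lie
    closest to evenly spaced target positions inside [start, end]."""
    candidates = sorted(p for p in candidates if start <= p <= end)
    if n_probes <= 0 or not candidates:
        return []
    step = (end - start) / n_probes
    chosen, used = [], set()
    for i in range(n_probes):
        # Centre of the i-th equal-width bin of the region.
        target = start + (i + 0.5) * step
        j = bisect_left(candidates, target)
        # Only the two candidates flanking the target need to be compared.
        neighbours = [k for k in (j - 1, j)
                      if 0 <= k < len(candidates) and k not in used]
        if not neighbours:
            continue  # both flanking candidates already taken
        best = min(neighbours, key=lambda k: abs(candidates[k] - target))
        used.add(best)
        chosen.append(candidates[best])
    return chosen


# Candidates cluster near the start of the region, but the selection is spread out.
print(select_even_probes([10, 12, 14, 16, 40, 60, 95], 0, 100, 4))
```

A resolution-maximizing strategy would tend to favor the dense cluster of candidates, whereas this sketch takes at most one probe per bin, which is one simple way to approximate uniform density.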

Next, we investigate three areas related to multiple-sample aCGH data analysis: empirical evaluation of array design performance, supervised classification, and detection of rare CNVs. In particular, we propose and validate a method for evaluating the functional performance of an array design by inspecting segmentation robustness. Moreover, we introduce a novel approach to supervised classification of aCGH data, based on the concept of limited Jumping Emerging Patterns, and we demonstrate its efficiency on synthetic and selected real datasets. Last but not least, we describe a new algorithm for the detection of rare, outstanding, and thus potentially pathogenic CNVs, which is robust to technological artifacts such as spurious probes and waviness. Finally, we present medically relevant outcomes from the analysis of experiments performed on custom array designs generated using our methods.
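For readers unfamiliar with the Jumping Emerging Pattern concept mentioned above, the following brute-force sketch shows the basic definition only: an itemset that occurs in at least one sample of one class and in no sample of the other. It does not implement the limited variant developed in the thesis; the function name, the `max_size` bound, and the encoding of samples as sets of "region:state" items are assumptions made for illustration.

```python
from itertools import combinations


def jumping_emerging_patterns(target, background, max_size=2):
    """Hypothetical helper: return minimal itemsets of size <= max_size that
    occur in at least one target sample and in no background sample."""
    target = [frozenset(t) for t in target]
    background = [frozenset(b) for b in background]
    items = sorted(set().union(*target)) if target else []
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            if any(f <= pattern for f in found):
                continue  # contains a smaller pattern already found, so not minimal
            in_target = any(pattern <= t for t in target)
            in_background = any(pattern <= b for b in background)
            if in_target and not in_background:
                found.append(pattern)
    return found


# Toy discretized samples: each item is an assumed "region:state" label.
cases = [{"r1:gain", "r2:loss"}, {"r1:gain", "r3:gain"}]
controls = [{"r1:gain"}, {"r2:loss", "r3:gain"}]
for pattern in jumping_emerging_patterns(cases, controls):
    print(sorted(pattern))
```

In this toy data no single item separates the classes, but two item pairs occur only among the cases. The exhaustive enumeration is exponential in `max_size` and is meant only to make the definition concrete.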