Detecting Structural Variations with Illumina, PacBio and Optical Maps Data by Computational Approaches
Doctor of Philosophy
Detecting structural variations (SVs) is important for deciphering variation in human DNA and the causes of genetic diseases such as cancer. Sequencing technologies make it possible to detect SVs computationally. Because different sequencing technologies produce data with different characteristics, computational approaches are typically designed for a specific technology. In this thesis I studied three technologies: Illumina, PacBio and Optical Maps. Because Illumina and PacBio reads have complementary strengths and weaknesses in read length and error rate, I proposed a new approach, HySA, that combines Illumina and PacBio data to detect SVs. HySA detected SVs that approaches using only Illumina or only PacBio data cannot detect. However, owing to the repetitiveness of human DNA and the existence of complex SVs, it remains challenging for HySA to detect complex SVs and some SVs in repetitive regions. To overcome this, I proposed a new approach that detects SVs from Optical Maps data, which have an advantage over Illumina and PacBio in read length, although they lack sequence information and have a distinct error profile. The SVs detected from Optical Maps alone complement those detected from Illumina and PacBio. Together, the two approaches I proposed help push towards a more complete characterization of SVs in human DNA.