Haplotype block and genetic association
Doctor of Philosophy
The recently identified block-like structure of the human genome (Daly et al. 2001; Patil et al. 2001) has attracted much attention because each haplotype block contains limited sequence variation, which can reduce the complexity of genetic mapping studies. This dissertation focuses on estimating haplotype block structures and on their application to genetic mapping using single nucleotide polymorphisms (SNPs) from unrelated individuals. Among other issues, the traditional single-marker association study leads to the problem of multiple testing, which is still not well understood in the context of genome-wide association studies. The haplotype-based approach is one way to lessen certain problems caused by multiple testing. There is also evidence that haplotype-based tests have higher statistical power. We first propose a novel approach to estimating haplotype blocks based on pairwise linkage disequilibrium (LD). Application to simulated data shows that the new approach has higher power than several existing methods in identifying haplotype blocks. We also examine the impact of marker density and of different tagging strategies on the estimation of haplotype blocks. We introduce a new statistic to measure the difference between two block partitions. Applying the new statistic to real and simulated data, we show that a higher marker density than previously expected is needed to recover the true block structure over a given region. Finally, we analyze a real SNP data set. A comparison of the haplotype-SNP based method with the more traditional single-SNP based method shows that the two methods tend to agree more when haplotype block sizes are small. On the other hand, the haplotype-SNP based approach does not always have higher power than the single-SNP based approach, a finding that is supported by theoretical considerations. Indeed, long haplotype blocks, where the LD structure might be very complex, can lead to inferior power compared to single-SNP approaches.
In practice, it is recommended that single-SNP analyses be run routinely, especially in the presence of moderate to long blocks.
Biostatistics; Genetics; Statistics