
dc.contributor.advisor Kimmel, Marek
dc.creator Hicks, Stephanie
dc.date.accessioned 2013-09-16T15:13:21Z
dc.date.accessioned 2013-09-16T15:13:33Z
dc.date.available 2013-09-16T15:13:21Z
dc.date.available 2013-09-16T15:13:33Z
dc.date.created 2013-05
dc.date.issued 2013-09-16
dc.date.submitted May 2013
dc.identifier.uri https://hdl.handle.net/1911/71965
dc.description.abstract Genetic and genomic data often contain unobservable or missing information. Probabilistic models such as mixture models and hidden Markov models (HMMs) have been widely used since the 1960s to make inferences about unobserved information from observed information, demonstrating the versatility and importance of these models. Biological applications of mixture models include gene expression data, meta-analysis, disease mapping, epidemiology, and pharmacology; applications of HMMs include gene finding, linkage analysis, phylogenetic analysis, and identifying regions of identity-by-descent. An important statistical and informatics challenge posed by modern genetics is to understand the functional consequences of genetic variation and its relation to phenotypic variation. In the analysis of whole-exome sequencing data, predicting the impact of missense mutations on protein function is an important factor in identifying disease susceptibility mutations and determining their clinical importance in the absence of independent data on their impact on disease. In addition to this interpretation, identifying co-inherited regions in related individuals with Mendelian disorders can further narrow the search for disease susceptibility mutations. In this thesis, we develop two probabilistic models for genetic and genomic data with missing information: 1) a mixture model to estimate a posterior probability of functionality for missense mutations and 2) an HMM to identify co-inherited regions in the exomes of related individuals. The first application combines functional predictions from available computational, or in silico, methods, which often disagree substantially and leave the user with conflicting results when assessing the pathogenic impact of missense mutations on protein function.
The second application considers extensions of a first-order HMM to include conditional emission probabilities that vary as a function of minor allele frequency and a second-order dependence structure between observed variant calls. We apply these models to whole-exome sequencing data and show how they can be used to identify disease susceptibility mutations. As disease-gene identification projects increasingly use next-generation sequencing, the probabilistic models developed in this thesis help identify relevant disease-causing mutations and associate them with human disorders. The purpose of this thesis is to demonstrate that probabilistic models can contribute to more accurate and dependable inference based on genetic and genomic data with missing information.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Statistics
dc.subject Statistical genomics
dc.subject Bioinformatics
dc.subject Mixture models
dc.subject Hidden Markov models
dc.title Probabilistic Models for Genetic and Genomic Data with Missing Information
dc.contributor.committeeMember Thompson, James R.
dc.contributor.committeeMember Nakhleh, Luay K.
dc.contributor.committeeMember Plon, Sharon E.
dc.date.updated 2013-09-16T15:13:33Z
dc.identifier.slug 123456789/ETD-2013-05-506
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Statistics
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy
dc.identifier.citation Hicks, Stephanie. "Probabilistic Models for Genetic and Genomic Data with Missing Information." (2013) Diss., Rice University. https://hdl.handle.net/1911/71965.

