
dc.contributor.advisor Vannucci, Marina
dc.creator Li, Qiwei
dc.date.accessioned 2017-08-01T15:24:30Z
dc.date.available 2017-08-01T15:24:30Z
dc.date.created 2016-12
dc.date.issued 2016-11-14
dc.date.submitted December 2016
dc.identifier.citation Li, Qiwei. "Bayesian Models for High-Dimensional Count Data with Feature Selection." (2016) Diss., Rice University. https://hdl.handle.net/1911/95966.
dc.identifier.uri https://hdl.handle.net/1911/95966
dc.description.abstract Modern big data analytics often involve large data sets in which the features of interest are measured as counts. My thesis considers the problem of modeling a high-dimensional matrix of count data and presents two novel Bayesian hierarchical frameworks, both of which incorporate a feature selection mechanism and account for the over-dispersion observed across samples as well as across features. For inference, I use Markov chain Monte Carlo (MCMC) sampling techniques, with Metropolis-Hastings schemes employed for Bayesian feature selection. In the first project, on Bayesian nonparametric inference, I propose a zero-inflated Poisson mixture model that incorporates model-based normalization through prior distributions with mean constraints. The model further allows us to cluster the samples into homogeneous groups, defined by a Dirichlet process (DP), while simultaneously selecting a parsimonious set of discriminatory features. I show how my approach improves the accuracy of the clustering relative to more standard approaches for the analysis of count data, by means of a simulation study and an application to a bag-of-words benchmark data set, where the features are represented by the frequencies of occurrence of each word. In the second project, on Bayesian integrative analysis, I propose a negative binomial mixture regression model that integrates several characteristics. In addition to feature selection, the model includes Markov random field (MRF) prior models that capture structural dependencies among the features. The model further allows the mixture components to depend on a set of selected covariates. The simulation studies show that employing the MRF prior improves feature selection accuracy. The proposed approach is also illustrated through an application to RNA-Seq gene expression and DNA methylation data for identifying biomarkers in breast cancer.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Statistics
Bayesian inference
High-dimensional data
Count data
Clustering
Feature selection
Regression
Integrative analysis
Bayesian nonparametric approaches
Dirichlet process
Markov chain Monte Carlo
Graphical network priors
Markov random field
dc.title Bayesian Models for High-Dimensional Count Data with Feature Selection
dc.date.updated 2017-08-01T15:24:30Z
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Statistics
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy
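Two building blocks named in the abstract can be illustrated concretely: the zero-inflated Poisson (ZIP) likelihood for a single count, and how an MRF prior on binary feature-selection indicators shifts the odds of selecting a feature given its selected neighbours. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation. The function names `zip_logpmf` and `mrf_inclusion_prob` are hypothetical, and the MRF prior is assumed to take the commonly used form p(gamma) ∝ exp(a * sum(gamma) + b * gamma' G gamma), with G a symmetric 0/1 feature graph with zero diagonal; the thesis's exact parameterization may differ.

```python
import math
from typing import Sequence


def zip_logpmf(y: int, lam: float, pi: float) -> float:
    """Log-probability of a count y under a zero-inflated Poisson (ZIP).

    pi  : probability of a structural (extra) zero, 0 <= pi < 1
    lam : mean of the Poisson component, lam > 0
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    if y == 0:
        # A zero arises either from the structural-zero component
        # or from the Poisson component itself.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Positive counts can only come from the Poisson component.
    log_poisson = y * math.log(lam) - lam - math.lgamma(y + 1)
    return math.log1p(-pi) + log_poisson


def mrf_inclusion_prob(j: int, gamma: Sequence[int],
                       G: Sequence[Sequence[int]], a: float, b: float) -> float:
    """P(gamma_j = 1 | all other gamma_k) under the ASSUMED MRF prior
    p(gamma) ∝ exp(a * sum(gamma) + b * gamma' G gamma),
    where G is a symmetric 0/1 feature graph with zero diagonal.
    """
    # Number of graph neighbours of feature j that are currently selected.
    selected_neighbours = sum(G[j][k] * gamma[k] for k in range(len(gamma)) if k != j)
    # Because G is symmetric, gamma_j appears twice in gamma' G gamma,
    # which gives the factor of 2 in the conditional log-odds.
    log_odds = a + 2.0 * b * selected_neighbours
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    G = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # feature 0 is linked to features 1 and 2
    print(zip_logpmf(0, 1.0, 0.3), zip_logpmf(4, 1.0, 0.3))
    print(mrf_inclusion_prob(0, [0, 0, 0], G, a=-2.0, b=0.5))  # no neighbours selected
    print(mrf_inclusion_prob(0, [0, 1, 1], G, a=-2.0, b=0.5))  # both neighbours selected
```

With b > 0, selecting a feature raises the prior probability of selecting its graph neighbours, which is one way such priors encode structural dependencies among features. In a Metropolis-Hastings add/delete move on gamma_j, this prior ratio would be combined with the corresponding likelihood ratio of the count model.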