Bayesian graphical models for complex biological networks
Doctor of Philosophy
In this thesis, we propose novel Bayesian methodologies for estimating graphical models from complex genomic and health data, for which traditional methods are often inefficient or unsuitable. Our approaches are motivated by applications including the construction of nonlinear gene regulatory networks, data integration, cancer surveillance, and precision medicine. The thesis consists of three projects.
First, we develop a novel semi-/non-parametric directed acyclic graphical model to reconstruct gene regulatory networks from cancer gene expression data. The regulatory relationships between genes are assumed to be sparse and are allowed to be nonlinear; they are modeled by penalized splines with a spike-and-slab selection prior. We impose a discrete mixture prior on the smoothing parameter of the splines, which allows us to distinguish between linear and nonlinear relationships. Simulation studies show that our approach performs well in comparison with competing methods. An application to glioblastoma multiforme (GBM) data reveals several interesting findings.
Second, we propose a multi-dimensional graphical model, based on a Cholesky-type decomposition of precision matrices, to study the conditional independence structure of multi-dimensional data, that is, data consisting of measurements along multiple axes. The proposed approach is a unified framework applicable to directed graphs, undirected graphs, and arbitrary combinations of the two. We develop an efficient sampling algorithm based on partially collapsed Gibbs samplers. Simulation studies show that our method performs favorably against both benchmark and state-of-the-art approaches. We apply our approach to ovarian cancer protein expression data and U.S. cancer mortality data.
Third, we propose a novel class of graphical models, graphical regression, which allows the graph structure to vary flexibly with additional covariates. We impose sparsity on both the graph structure and the covariates. Our approach produces subject-specific graphs as well as predictive graphs for new subjects. We establish theoretical properties of the method and demonstrate its good performance through simulation studies. Finally, we apply our approach to multiple myeloma gene expression data, taking prognostic factors as covariates, which reveals several interesting findings.
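As a point of orientation for the Cholesky-type decomposition of precision matrices used in the second project, the sketch below illustrates the standard modified Cholesky decomposition for a single ordered set of variables. It is not the multi-dimensional model of the thesis, only the textbook identity that links a precision matrix to sequential regressions: with strictly lower-triangular coefficients B and residual variances d, the precision matrix is (I - B)^T diag(1/d) (I - B), and a zero in B corresponds to a missing directed edge. The function name `modified_cholesky` and the toy chain graph are assumptions made for illustration.

```python
import numpy as np

def modified_cholesky(sigma):
    """Regress each variable on its predecessors in the given ordering.

    Returns a strictly lower-triangular coefficient matrix B and a vector
    of residual variances d such that
        inv(sigma) = (I - B).T @ diag(1 / d) @ (I - B).
    (Hypothetical helper written for this illustration.)
    """
    p = sigma.shape[0]
    B = np.zeros((p, p))
    d = np.zeros(p)
    d[0] = sigma[0, 0]  # the first variable has no predecessors
    for j in range(1, p):
        # Population regression coefficients of x_j on x_0, ..., x_{j-1}
        coef = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        B[j, :j] = coef
        # Residual variance of x_j given its predecessors
        d[j] = sigma[j, j] - sigma[j, :j] @ coef
    return B, d

# Toy directed chain x0 -> x1 -> x2 (illustrative values, not thesis data)
B_true = np.array([[0.0, 0.0, 0.0],
                   [0.8, 0.0, 0.0],
                   [0.0, -0.5, 0.0]])
d_true = np.array([1.0, 0.5, 2.0])

T = np.eye(3) - B_true
omega = T.T @ np.diag(1.0 / d_true) @ T   # precision matrix implied by the chain
sigma = np.linalg.inv(omega)              # corresponding covariance matrix

B_hat, d_hat = modified_cholesky(sigma)
T_hat = np.eye(3) - B_hat
omega_rebuilt = T_hat.T @ np.diag(1.0 / d_hat) @ T_hat
```

In this toy example the decomposition recovers the chain coefficients from the covariance matrix, and the (0, 2) entry of the precision matrix is zero, reflecting that x0 and x2 are conditionally independent given x1.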