Stochastic Modeling and Simulation of Biological Phenomena with Applications in Population Genetics and in Cell Populations
Deem, Michael W.
Doctor of Philosophy
Stochastic modeling and simulation play important roles in population genetics, statistical genetics, cell biology, molecular biology and evolutionary theory. This thesis explores four aspects of stochastic modeling and simulation of biological phenomena, with applications. The research focuses on two major themes. The first (Chapters 2 and 3) concerns the application of stochastic modeling in genetics, specifically to identify biases in the analysis of genetic data. The two problems considered are ascertainment bias in the estimation of microsatellite diversity in interspecies comparisons, and sample-selection bias in comparing different methods of rare variant analysis. The second theme (Chapters 4 and 5) concerns the application of Poisson and branching process models to understand various aspects of cell proliferation, using S-phase labeling. The two model systems are the transient dynamics of proliferation of neurogenic progenitors in the mouse brain, with emphasis on differentiation and apoptosis, and balanced growth under different assumptions concerning the DNA-replication pattern.
In the first part, we investigate factors that influence the ascertainment bias of microsatellite allele sizes and explore its impact on estimates of mutation rates. Microsatellite loci play an important role as markers for identification, disease gene mapping and evolutionary studies. The mutation rate, which is of fundamental importance, can be estimated from interspecies comparisons, which are, however, subject to ascertainment bias. This bias arises, for example, when a locus is selected based on its large allele size in the species in which it is first discovered (cognate species 1). It is reflected in the average allele length in any non-cognate species 2 being smaller than that in species 1.
We derive an analytical model based on coalescence theory to calculate the average allele length difference between species 1 and 2 under the effects of both ascertainment bias and intrinsic genetic influences, such as demography, genetic drift and mutation. The analytical results are confirmed by forward-time simulations using simuPOP. Re-analyzing literature data, we demonstrate that, despite the bias, the microsatellite mutation rate estimate in humans exceeds that in chimpanzees, and that population bottlenecks and expansions in recent human history have little impact on this conclusion.
The second part of the thesis introduces a simulation framework, SimRare, to generate sequence-based data for rare variant association studies and to evaluate association test methods. Currently, it is difficult to compare the rare variant association methods present in the literature because different methods are used to generate data. In any given study, the variant and/or disease model is often generated using a test set that makes a particular method appear superior to other methods. The SimRare program is developed to provide an easy way to generate validation data sets for both variant and phenotype data using realistic models, and to evaluate association methods, including novel methods, in an unbiased manner. Using SimRare, we validate existing association methods using data generated under various scenarios of demographic history and disease etiology. We demonstrate that the power of each method depends on the underlying model and that differences in power between methods are usually modest.
The third part is devoted to the study of the early stages of adult hippocampal neurogenesis, the formation of newborn neurons, which occurs throughout life in the hippocampus, which is responsible for learning and memory.
Most hippocampal neurogenesis studies have focused on the late stages, while little is known about the early stages that regulate the proliferation and differentiation of neural stem cells and progenitor cells to form neurons. Based on branching process theory, we develop a stochastic model and a simulation program to analyze cell labeling data obtained from BrdU pulse-and-chase labeling experiments. By fitting the data, our simulations reveal previously unknown but biologically meaningful parameters, such as the apoptotic rate and the duration of each stage, which allow us to predict the overall efficiency of hippocampal neurogenesis in both normal and diseased conditions.
The fourth part focuses on the modeling of DNA replication and bivariate cell labeling experiments. Understanding the kinetics of DNA replication gives insight into mechanisms underlying the specifics of normal and cancer cell proliferation. We propose a multiscale model of the stochastic events related to the measured labeling intensities of both DNA content and replication progression over various exposure times in proliferating cells. We demonstrate that the experimental asymmetry in DNA replication scatterplots is the hallmark of an increasing replication initiation rate in the S-phase of the cell cycle.
In summary, the research results support the hypothesis that the application of stochastic modeling, simulation and statistical analysis leads to results that would be impossible to obtain otherwise.