Approximate Modeling of Recombination within the Multispecies Coalescent
Elworth, Ryan Leo
Doctor of Philosophy
The coalescent with recombination is a powerful stochastic process for modeling genome evolution. However, statistical inference under this process is very challenging, particularly when it requires sampling the graphical structures that arise due to recombination. To address this challenge, approximations of this stochastic process have been introduced based on a process that operates along the genomes and that can be naturally captured by a hidden Markov model. Parameterizing such hidden Markov models based on the coalescent process and population parameters is itself difficult. In this thesis, we propose using gene tree topologies with integrated likelihoods for the states, and we parameterize the transition probabilities based on topological differences between the gene trees. This approximation avoids introducing too many states and comes with an automated procedure for parameterizing transitions; it provides good results, as we demonstrate on simulated and biological data. Furthermore, we show how the approximation can be modified slightly to account for gene flow. The work in this thesis provides a general framework for approximating coalescent-based computations.
hidden Markov model; phylogenetics; recombination; coalescent; ancestral recombination graph
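
The hidden Markov model outlined in the abstract can be illustrated with a minimal sketch. It assumes that each hidden state is one gene tree topology, that the per-site likelihood of each topology has already been computed, and that transitions favor topologically similar trees. The abstract does not give the actual parameterization, so the exponential weighting, the names `transition_matrix` and `forward_log_likelihood`, and the parameters `switch_prob` and `decay` are all hypothetical choices made for illustration only.

```python
import math


def transition_matrix(dist, switch_prob, decay=1.0):
    """Build HMM transition probabilities from pairwise topological distances.

    ASSUMPTION (not from the thesis): the chain stays in its current topology
    with probability 1 - switch_prob, and otherwise moves to topology j with
    probability proportional to exp(-decay * dist[i][j]).
    Requires at least two topologies.
    """
    k = len(dist)
    trans = []
    for i in range(k):
        # Unnormalized weights for leaving state i; closer trees weigh more.
        weights = [math.exp(-decay * dist[i][j]) if j != i else 0.0
                   for j in range(k)]
        total = sum(weights)
        row = [switch_prob * w / total for w in weights]
        row[i] = 1.0 - switch_prob
        trans.append(row)
    return trans


def forward_log_likelihood(emissions, trans, init):
    """Scaled forward algorithm.

    emissions[t][s] is the likelihood of site t under topology s
    (assumed precomputed), trans is a row-stochastic matrix, and
    init is the initial state distribution.
    """
    k = len(init)
    alpha = [init[s] * emissions[0][s] for s in range(k)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for site in emissions[1:]:
        alpha = [site[t] * sum(alpha[s] * trans[s][t] for s in range(k))
                 for t in range(k)]
        scale = sum(alpha)  # rescale at every site to avoid underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical example: three topologies with made-up pairwise distances
# and made-up per-site likelihoods.
dist = [[0, 2, 4], [2, 0, 2], [4, 2, 0]]
trans = transition_matrix(dist, switch_prob=0.1)
site_likelihoods = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward_log_likelihood(site_likelihoods, trans, [1 / 3] * 3))
```

Because the number of states equals the number of topologies considered, the state space in this sketch stays small, and the transition matrix is derived mechanically from the distance matrix instead of being set by hand. Any tree distance could be supplied for `dist`; which measure of topological difference the thesis uses is not stated in the abstract.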