Ab initio methods for protein structure prediction
Dousis, Athanasios Dimitri
Doctor of Philosophy
Recent breakthroughs in DNA and protein sequencing have unlocked many secrets of molecular biology. A complete understanding of gene function, however, requires a protein's structure in addition to its sequence. Modern protein structure determination methods such as NMR, cryo-EM and X-ray crystallography cannot keep pace with automated sequencing techniques, creating a serious gap between the numbers of available sequences and structures. This thesis describes several ab initio computational methods designed in the near term to facilitate structure determination experiments and, in the long term, to predict protein structure completely and reliably. First, VecFold is a novel method for predicting the global tertiary structure topologies of proteins. VecFold applies fragment assembly to construct structural models from a target sequence by folding a chain of predicted secondary structure elements; these elements are represented either as Cα-based rigid bodies or as vectors. Either the knowledge-based energy function OPUS-Ca or a knowledge-based geometric packing potential guides the folding process. The newest version of VecFold is shown to modestly outperform Rosetta, one of the leading ab initio predictors, on the CASP8 benchmark set. Second, in our protein domain boundary prediction method, OPUS-Dom, VecFold generates a large ensemble of folded structure models, and the domain boundaries of each model are labeled by a domain parsing algorithm. OPUS-Dom then derives consensus domain boundaries from the statistical distribution of the putative boundaries; the original version is also aided by three empirical sequence-based domain profiles. In terms of prediction sensitivity, the latest version of OPUS-Dom outperformed several state-of-the-art domain prediction algorithms on various multi-domain protein sets.
Even though many VecFold-generated structures contain large errors, collectively these structures provide a more robust delineation of domain boundaries. The success of OPUS-Dom suggests that the arrangement of protein domains is more a consequence of the limited coordination patterns per domain that arise from tertiary packing of secondary structure segments than of sequence-specific constraints. Finally, the knowledge-based energy function OPUS-Core was applied to the problem of protein folding core prediction, and it was shown to outperform two leading computational methods on a benchmark set of 29 well-characterized protein targets.
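The consensus step described for OPUS-Dom, in which boundaries labeled on many individual models are pooled into a statistical distribution, can be illustrated with a minimal sketch. This is not the OPUS-Dom implementation: the function name `consensus_boundaries`, the triangular smoothing window, and the parameters `window`, `min_support` and `min_separation` are all assumptions chosen for illustration, and the input is assumed to be one list of boundary residue positions per folded model.

```python
def consensus_boundaries(model_boundaries, seq_len, window=5,
                         min_support=0.3, min_separation=20):
    """Pool per-model domain boundaries into consensus positions.

    Hypothetical illustration only; all names and thresholds are assumptions.
    model_boundaries: one list of boundary residue positions (1-based) per model.
    Returns the sorted consensus boundary positions.
    """
    n_models = len(model_boundaries)
    if n_models == 0:
        return []

    # score[pos] accumulates, over all models, how strongly each model
    # supports a boundary at pos. A model contributes 1.0 at its own
    # boundary and linearly less out to `window` residues away.
    score = [0.0] * (seq_len + 1)
    for boundaries in model_boundaries:
        for pos in range(1, seq_len + 1):
            best = 0.0
            for b in boundaries:
                weight = 1.0 - abs(pos - b) / (window + 1)
                if weight > best:
                    best = weight
            score[pos] += best

    # Greedy peak picking: visit positions from highest to lowest support,
    # keep those above the support threshold that are far enough from
    # every boundary already chosen.
    candidates = sorted(range(1, seq_len + 1), key=lambda p: (-score[p], p))
    chosen = []
    for pos in candidates:
        if score[pos] / n_models < min_support:
            break
        if all(abs(pos - c) >= min_separation for c in chosen):
            chosen.append(pos)
    return sorted(chosen)


if __name__ == "__main__":
    # Five toy models of a 200-residue protein; most place a boundary near 100.
    models = [[98], [100], [102], [100, 150], [40]]
    print(consensus_boundaries(models, seq_len=200, min_support=0.5))
```

The sketch reflects the point made above: any single model may place its boundary a few residues off, or add a spurious one, yet the pooled distribution still peaks where most models agree.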
Biochemistry; Biomedical engineering; Biophysics