Template-based Protein Structure Prediction and its Applications
Doctor of Philosophy
Protein structure prediction, often referred to as the protein folding problem, is one of the most significant and challenging research areas in computational biophysics and structural bioinformatics. With the rapid growth of the Protein Data Bank (PDB), template-based modeling methods such as homology modeling and threading have become popular approaches to protein structure prediction. However, it is still difficult to detect good templates when the sequence identity between the target and the template is below 30%. In Chapter 1, a profile-profile alignment method is proposed. It uses evolutionary and structural profiles to detect homologs and a z-score-based method to rank templates. The performance of this method in the Critical Assessment of protein Structure Prediction (CASP) experiments is reported. In Chapter 2, p53 mutations are studied as an application of protein structure prediction. The TP53 gene encodes the tumor suppressor protein p53, and p53 mutations occur in about half of human cancers. Experimental studies have shown that p53 cancer mutants can be reactivated by additional mutations at other sites. Machine learning techniques were applied in this study: multiple classifiers were built to predict whether a p53 mutant (single-point or multiple-point) is transcriptionally active, based on features extracted from amino acid sequences and structures. The mutant structures were modeled using template-based protein structure prediction. These features were selected and analyzed using different feature selection methods, and the classifiers were built under different learning settings, such as supervised and semi-supervised learning. The performances of these classifiers were analyzed and compared. Beyond the study of single proteins, Chapter 3 studies protein complexes in yeast. Multiple classifiers were built to predict whether a given set of proteins can form a protein complex, based on features generated from amino acid sequences and the protein-protein interaction network. These features were selected and analyzed using different feature selection methods, and the classifiers were built under different learning settings, such as supervised learning and active learning. The performances of these classifiers were analyzed and compared.
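The z-score-based template ranking mentioned above can be illustrated with a minimal sketch. The abstract does not specify the background distribution, so this example assumes, for illustration only, that each template's raw profile-profile alignment score is standardized against the scores of the same query over the whole template library, z = (s - mean) / stdev, and that a higher z-score indicates a more significant hit. The function name `rank_templates_by_zscore` and the dictionary input format are hypothetical.

```python
import statistics


def rank_templates_by_zscore(scores):
    """Rank templates by the z-score of their raw alignment score.

    Hypothetical input: a dict mapping a template ID to the raw
    profile-profile alignment score of the query against that template.
    Assumption: the background distribution is the set of scores of this
    query against all templates in the library.
    """
    if not scores:
        return []
    values = list(scores.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        # All scores identical: no template stands out from the background.
        return [(t, 0.0) for t in sorted(scores)]
    z = {t: (s - mu) / sigma for t, s in scores.items()}
    # Highest z-score first, i.e. most significant template first.
    return sorted(z.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example = {"1abcA": 120.0, "2xyzB": 80.0, "3pqrC": 40.0}
    for template, zscore in rank_templates_by_zscore(example):
        print(f"{template}\t{zscore:.3f}")
```

Standardizing scores in this way makes hits comparable across queries of different lengths and compositions, where raw alignment scores are not; a real implementation might instead use shuffled sequences or a fitted extreme-value distribution as the background.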
Protein structure prediction