Examining the Use of Homology Models in Predicting Kinase Binding Affinity
Kavraki, Lydia E.
Master of Science
Drug design is a difficult and multi-faceted problem that has led to extensive interdisciplinary work in the field of computational biology. In recent years, several computational methods for drug design have emerged. The overall goal of these methods is to narrow down the number of leads that will be considered further for laboratory experimentation and clinical studies. Much of current drug design focuses on a family of proteins called kinases because they play a pivotal role in many of the cell signaling pathways in the human body. Drugs need to be designed to bind to specific kinases in the human kinome, inhibiting kinase functions that can cause diseases such as cancer. It is important for drugs to have high specificity, inhibiting only certain kinases and avoiding undesirable effects on the human body. Computational prediction methods can address this complex task by performing a comparative analysis of kinase binding sites, in both sequence and structure, to predict binding affinity with potential drugs. However, computational methods depend on existing protein data to make predictions, and structural data are scarce relative to the number of known proteins and protein sequences. A potential solution to this lack of information is to use computationally generated structural data called homology models. This thesis introduces a framework for integrating homology models with CCORPS, a semi-supervised learning method that identifies structural features in proteins that correlate with protein function. In our experiments, we discuss the effects of using homology models to supplement existing experimental structural data for kinases when predicting the binding affinity of kinases with various drugs. While the work in this thesis focuses on predicting kinase binding affinity, the framework can be generalized, showing the potential of using CCORPS with computationally generated data when experimental data are lacking.