Predicting protein-ligand interactions from primary structure
One of the key challenges in the post-genomic era is to understand protein-ligand interactions on a large scale. The question is: given the primary structures of a protein and a ligand, how well can we computationally predict whether the ligand will bind to the protein? Wet laboratory experiments using combinatorial peptide screens and phage display techniques have yielded positive and negative examples of protein-ligand binding (Sparks, Zucconi, Alexandropoulos). In this paper, we model the prediction of protein-ligand interactions from primary structure as a classification problem and train naive Bayes classifiers (Mitchell) to distinguish between positive and negative examples of protein-ligand interactions. Such a predictive model can screen large numbers of potential ligands and save laboratory time and costs. We demonstrate the power of our approach in predicting interactions between SH3 domains and proline-rich ligands. We use laboratory data gathered from combinatorial peptide library screening (Sparks) of 8 diverse SH3 domains to construct a body of positive and negative examples. We learn naive Bayes models of the ligand binding specificity of these SH3 domains and test them using a cross-validation approach. The models have prediction accuracies of 90% and higher, with low false-positive and false-negative rates. In addition, we visualize our classification model to reveal sites on both the ligand and the SH3 domain that contribute to the interaction. We use our classifiers to screen PxxP ligands from Swissprot for given SH3 domains. For 5 of the 8 SH3 domains considered in this paper, our naive Bayes classifiers eliminate over 80% of these ligands.
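To make the classification setup concrete, the following is a minimal sketch of a naive Bayes classifier over fixed-length peptide sequences, evaluated by k-fold cross-validation. It is an illustration under stated assumptions, not the implementation used in this paper: the feature encoding (one categorical feature per ligand position), the Laplace pseudocount `alpha`, the fold assignment, the names `PositionalNaiveBayes` and `cross_validate`, and the toy peptides are all hypothetical. In particular, the sketch uses ligand positions only, whereas the models described above also involve sites on the SH3 domain.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class PositionalNaiveBayes:
    """Naive Bayes over fixed-length sequences: one categorical feature per position."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace pseudocount (an assumption of this sketch)

    def fit(self, sequences, labels):
        self.length = len(sequences[0])
        self.classes = sorted(set(labels))
        self.class_counts = {c: 0 for c in self.classes}
        # counts[c][i][aa]: number of class-c examples with residue aa at position i
        self.counts = {
            c: [defaultdict(int) for _ in range(self.length)] for c in self.classes
        }
        for seq, y in zip(sequences, labels):
            self.class_counts[y] += 1
            for i, aa in enumerate(seq):
                self.counts[y][i][aa] += 1
        return self

    def log_scores(self, seq):
        """Unnormalized log posterior: log P(c) + sum_i log P(residue_i | c)."""
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            n_c = self.class_counts[c]
            score = math.log(n_c / total)
            for i, aa in enumerate(seq):
                num = self.counts[c][i].get(aa, 0) + self.alpha
                den = n_c + self.alpha * len(AMINO_ACIDS)
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


def cross_validate(sequences, labels, k=3):
    """Return the accuracy over k folds; example i is held out in fold i mod k."""
    n = len(sequences)
    correct = 0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [sequences[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = PositionalNaiveBayes().fit(train_x, train_y)
        correct += sum(model.predict(sequences[i]) == labels[i] for i in held_out)
    return correct / n


# Made-up toy peptides (NOT experimental data): label 1 = binder, 0 = non-binder.
binders = ["RPLPPLPA", "RALPPLPG", "KPLPPLPS", "RPLPALPT", "RSLPPLPV", "KALPPLPR"]
non_binders = ["GASTGDEA", "AGDSTEGS", "TSGAEDGA", "DEGSATGT", "SGTDAEGS", "EATGSDGA"]
toy_x = binders + non_binders
toy_y = [1] * len(binders) + [0] * len(non_binders)

toy_model = PositionalNaiveBayes().fit(toy_x, toy_y)
toy_accuracy = cross_validate(toy_x, toy_y, k=3)
print("cross-validated accuracy on toy data:", toy_accuracy)
print("prediction for RPLPPLPK:", toy_model.predict("RPLPPLPK"))
```

Because the per-position log-likelihood terms are additive, comparing `counts[1][i]` against `counts[0][i]` for each position `i` gives one simple way to inspect which sites drive a prediction, in the spirit of the visualization mentioned above.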