Show simple item record

dc.contributor.advisor Kavraki, Lydia E.
dc.creator Bryant, Drew
dc.date.accessioned 2013-05-13T19:24:26Z
dc.date.accessioned 2013-05-13T19:24:36Z
dc.date.available 2013-05-13T19:24:26Z
dc.date.available 2013-05-13T19:24:36Z
dc.date.created 2012-12
dc.date.issued 2013-05-13
dc.date.submitted December 2012
dc.identifier.uri https://hdl.handle.net/1911/71132
dc.description.abstract The protein kinases are a large family of enzymes that play a fundamental role in propagating signals within the cell. Because of the high degree of binding site similarity shared among protein kinases, designing drug compounds with high specificity among the kinases has proven difficult. However, computational approaches to comparing the 3-dimensional geometry and physicochemical properties of key binding site residues, referred to here as substructures, have been shown to be informative of inhibitor selectivity. This thesis introduces two fundamental approaches for the comparative analysis of substructure similarity and demonstrates the importance of each method on a variety of large protein structure datasets for multiple biological applications. The Family-wise Alignment of SubStructural Templates Framework (The FASST Framework) provides an unsupervised learning approach for identifying substructure clusterings. The substructure clusterings identified by FASST allow for the automatic evaluation of substructure variability, the identification of distinct structural conformations and the selection of anomalous outlier structures within large structure datasets. These clusterings are shown to be capable of identifying biologically meaningful structure trends across a diverse set of protein families. The FASST Live visualization and analysis platform provides multiple comparative analysis pipelines and allows the user to interactively explore the substructure clusterings computed by FASST. The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method provides a supervised learning approach for identifying structural features that are correlated with a given set of annotation labels. The ability of CCORPS to identify structural features predictive of functional divergence among families of homologous enzymes is demonstrated across 48 distinct protein families.
The CCORPS method is further demonstrated to generalize to the very difficult problem of predicting protein kinase inhibitor affinity. CCORPS is demonstrated to make perfect or near-perfect predictions for the binding ability of 12 of the 38 kinase inhibitors studied, while showing poor overall predictive ability for only 1 of the 38 compounds. Additionally, CCORPS is shown to identify shared structural features across phylogenetically diverse groups of kinases that are correlated with binding affinity for particular inhibitors; such instances of structural similarity among phylogenetically diverse kinases are also shown not to be rare. Finally, these function-specific structural features may serve as potential starting points for the development of highly specific kinase inhibitors. Importantly, both The FASST Framework and CCORPS implement a redundancy-aware approach to dealing with structure overrepresentation that allows for the incorporation of all available structure data. As shown in this thesis, surprising structural variability exists even among structure datasets consisting of a single protein sequence. By incorporating the full variety of structural conformations within the analysis, the methods presented here provide a richer view of the variability of large protein structure datasets.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Bioinformatics
Machine learning
Protein function prediction
Protein kinases
Enzymes
Drug design
dc.title Redundancy-aware learning of protein structure-function relationships
dc.contributor.committeeMember Nakhleh, Luay K.
dc.contributor.committeeMember Shamoo, Yousif
dc.date.updated 2013-05-13T19:24:36Z
dc.identifier.slug 123456789/ETD-2012-12-74
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Computer Science
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy
dc.identifier.citation Bryant, Drew. "Redundancy-aware learning of protein structure-function relationships." (2013) Diss., Rice University. https://hdl.handle.net/1911/71132.

