Mapping the structural landscape of protein families with geometric feature vectors
Kavraki, Lydia E.
Master of Science
This thesis describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to quantify the substructural variation within a protein family and, from that variation, to determine the family-wide sub-group organization. The results include automatically determined sub-groups that can be linked to phylogenetic distance between family members, segregation by ligation state, and organization by ancestry among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative template for each sub-group determined by FASST, building motif ensembles that, in a series of function prediction experiments, improve the predictive power of existing templates. This work provides an unbiased, automated assessment of the structural variability of identified substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function.
Biology; Bioinformatics; Computer science; Applied sciences
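The two-stage idea in the abstract (all-against-all comparison to find sub-groups, then one representative per sub-group) can be illustrated with a minimal sketch. It is not the thesis's implementation: the Euclidean distance over toy feature vectors, the single-linkage threshold grouping, the medoid as "representative", and every name in it (`family`, `protA`, `subgroups`, `representative`, the threshold of 1.0) are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def pairwise_distances(vectors):
    """All-against-all Euclidean distances between feature vectors (assumed metric)."""
    d = {}
    for a, b in combinations(sorted(vectors), 2):
        d[a, b] = d[b, a] = dist(vectors[a], vectors[b])
    for a in vectors:
        d[a, a] = 0.0
    return d


def subgroups(vectors, d, threshold):
    """Single-linkage grouping: members closer than `threshold` end up in one sub-group."""
    parent = {name: name for name in vectors}

    def find(x):
        # Follow parent links to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(vectors), 2):
        if d[a, b] < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in vectors:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def representative(group, d):
    """Medoid: the member with the smallest total distance to its sub-group (ties by name)."""
    return min(group, key=lambda m: (sum(d[m, other] for other in group), m))


# Hypothetical family: each member is reduced to a toy 3-D geometric feature vector.
family = {
    "protA": (0.0, 0.0, 1.0),
    "protB": (0.1, 0.0, 1.0),
    "protC": (0.0, 0.2, 1.0),
    "protD": (5.0, 5.0, 0.0),
    "protE": (5.1, 5.0, 0.0),
}

distances = pairwise_distances(family)
groups = subgroups(family, distances, threshold=1.0)
reps = [representative(g, distances) for g in groups]
print(groups)
print(reps)
```

In this sketch the list of representatives plays the role of a motif ensemble: one template per sub-group rather than a single template for the whole family. A real analysis would replace the toy vectors and the distance function with an actual substructure comparison.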