Knowledge-based prediction of chemical shift and recognition of protein native structure
Master of Science thesis
We designed and implemented a suite of programs that accurately and automatically predicts the chemical shifts of protein C-alpha nuclei from only the protein sequence and a low-resolution C-alpha trace conformation. We applied this knowledge-based prediction approach to a group of C-alpha structures generated by computational modeling methods and successfully identified the native structure by comparing the predicted shifts with unassigned observed NMR data. The automatic prediction begins with the construction of a knowledge-based protein structural profile library, which aims to capture the most significant structural features affecting chemical shifts, even from a highly coarse-grained C-alpha model. The library is populated with more than 5000 non-homologous proteins, using publicly accessible structures from the Protein Data Bank and more than 1.5 million chemical shifts pre-calculated by the widely used NMR prediction program SHIFTX. Given only this minimal sequence and structural information, the program predicts chemical shifts that are highly consistent with experimentally observed data from the NMR spectroscopy database BioMagResBank (BMRB). Overall, the proposed program achieves a correlation coefficient of 0.937 and an RMSD of 1.702 ppm against observed chemical shifts. These results are slightly less accurate than those achieved by the benchmark program SHIFTX, which uses semi-empirical hypersurfaces and semi-classical equations: on the same test sets, SHIFTX achieved a correlation coefficient of 0.945 and an RMSD of 1.599 ppm against experimental observations. However, like most other predictive methods, SHIFTX requires high-resolution protein structures with three-dimensional all-atom coordinates, which are normally exceedingly difficult to obtain, and its prediction accuracy is highly compromised without them.
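The two accuracy measures quoted above, the correlation coefficient and the RMSD between predicted and observed shifts, can be illustrated with a minimal sketch. It assumes the correlation is the Pearson coefficient and that both inputs are equal-length lists of C-alpha shifts in ppm, paired residue by residue; the function names are hypothetical and are not taken from the thesis software.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between paired predicted and observed shifts."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmsd(predicted, observed):
    """Root-mean-square deviation (same unit as the inputs, here ppm)."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Illustrative values only, not data from the thesis.
predicted_shifts = [58.1, 45.2, 62.3, 55.0, 52.4]
observed_shifts = [58.4, 45.5, 62.0, 55.3, 52.0]
print(pearson_r(predicted_shifts, observed_shifts))
print(rmsd(predicted_shifts, observed_shifts))
```

A higher correlation and a lower RMSD both indicate closer agreement with experiment, which is why the two figures are reported together.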
Using an optimization-based matching system built on the Monte Carlo method, we compared the predicted C-alpha chemical shifts with unassigned NMR data from BMRB and successfully identified the native fold topology from the resemblance between the two sets of chemical shifts. In summary, the proposed program is one of the few methods capable of predicting accurate chemical shifts even from low-resolution C-alpha protein structures, which are far more accessible and more readily obtained with currently available protein modeling methods. Based on the understanding that similar chemical shift patterns reflect the resemblance of two structures, we showed that this prediction-recognition approach not only fundamentally improves NMR-assisted computational protein modeling but is also effective in accelerating traditional protein structure determination and validation by NMR.
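The matching and recognition step can be sketched as follows. The abstract does not specify the cost function, move set, or acceptance rule, so everything here is an assumption made for illustration: the cost is the sum of squared differences between each residue's predicted shift and the observed peak paired with it, a move swaps the peaks of two residues, and moves are accepted by the Metropolis criterion at a fixed temperature. All function names are hypothetical.

```python
import math
import random


def matching_cost(predicted, observed, assignment):
    """Cost when residue i is paired with observed[assignment[i]]."""
    return sum((predicted[i] - observed[j]) ** 2 for i, j in enumerate(assignment))


def monte_carlo_match(predicted, observed, steps=20000, temperature=1.0, seed=0):
    """Search for a low-cost pairing of predicted shifts with unassigned peaks.

    Assumes both lists have the same length (at least 2).
    """
    rng = random.Random(seed)
    n = len(predicted)
    assignment = list(range(n))
    rng.shuffle(assignment)
    cost = matching_cost(predicted, observed, assignment)
    best_cost, best_assignment = cost, assignment[:]
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)
        # Change in cost if residues a and b exchange their observed peaks.
        old = ((predicted[a] - observed[assignment[a]]) ** 2
               + (predicted[b] - observed[assignment[b]]) ** 2)
        new = ((predicted[a] - observed[assignment[b]]) ** 2
               + (predicted[b] - observed[assignment[a]]) ** 2)
        delta = new - old
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            assignment[a], assignment[b] = assignment[b], assignment[a]
            cost += delta
            if cost < best_cost:
                best_cost, best_assignment = cost, assignment[:]
    # Recompute the cost to avoid accumulated floating-point drift.
    return matching_cost(predicted, observed, best_assignment), best_assignment


def recognize_native(candidates, observed):
    """Return the name of the candidate whose predicted shifts match best."""
    scores = {name: monte_carlo_match(shifts, observed)[0]
              for name, shifts in candidates.items()}
    return min(scores, key=scores.get), scores
```

With a single shift per residue and this squared-difference cost, pairing the sorted predicted values with the sorted observed values is already optimal, which gives a convenient check on the sampler. A stochastic search becomes useful once the cost includes terms that depend on more than one value per residue; whether the thesis uses such terms is not stated in this abstract.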