Computational Biology: Insights into Hemagglutinin and Polycomb Repressive Complex 2 Function
Kirk, Brian David
Doctor of Philosophy
Influenza B virus hemagglutinin (HA) is a major surface glycoprotein that undergoes frequent amino-acid substitutions. However, the role of antibody selection in driving these substitutions remains poorly understood. An analysis of 271 HA1 sequences from influenza B virus strains isolated between 1940 and 2007 found that all positively selected sites are located in the four major epitopes (the 120-loop, 150-loop, 160-loop and 190-helix), supporting a predominant role for antibody selection in HA evolution. Of particular significance is the involvement of the 120-loop in positive selection. Influenza B virus HA continues to evolve into new sublineages, and within these sublineages the four major epitopes were selectively targeted by positive selection. Thus, any newly emerging strain needs to be placed in the context of its evolutionary history in order to understand and predict its epidemic potential.

As key epigenetic regulators, polycomb group (PcG) proteins control cell proliferation and differentiation as well as stem cell pluripotency and self-renewal. To facilitate the experimental identification of PcG target genes, which remain poorly understood, we propose a novel computational method, EpiPredictor, which models transcription factor interactions using a non-linear kernel. The predicted targets suggest that networks of multiple transcription factors acting at cis-regulatory elements are critical for PcG recruitment, and that high GC content and high conservation levels are also important features of PcG target genes.

To translate EpiPredictor to human data, we performed a computational study using 22 human genome-wide ChIP data sets to identify DNA motifs and genomic features that may specify PRC2 targeting. The study combined five motif discovery algorithms, known transcription factor binding motifs from JASPAR, and other whole-genome data.
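EpiPredictor is described only as modeling transcription factor interaction with a non-linear kernel; the abstract does not name the kernel or its input features. As a minimal sketch, assuming a degree-2 polynomial kernel over per-promoter motif counts (the function names, the offset `c`, and the counts are hypothetical, not EpiPredictor's code), the example shows why such a kernel can capture pairwise interaction: it equals an inner product over an explicit feature space that contains every product x_i * x_j of two motif counts.

```python
import itertools
import math


def poly2_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel k(x, y) = (x . y + c)^2 (assumed kernel choice)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** 2


def poly2_features(x, c=1.0):
    """Explicit feature map phi with phi(x) . phi(y) == poly2_kernel(x, y, c).

    Terms: a constant, scaled linear terms, squared terms, and scaled pairwise
    products x_i * x_j (i < j). The pairwise products are what let a kernel
    classifier weight the co-occurrence of two motifs in one promoter.
    """
    feats = [c]
    feats += [math.sqrt(2 * c) * xi for xi in x]
    feats += [xi * xi for xi in x]
    feats += [
        math.sqrt(2) * x[i] * x[j]
        for i, j in itertools.combinations(range(len(x)), 2)
    ]
    return feats


if __name__ == "__main__":
    # Hypothetical motif counts (motif A, motif B, motif C) for two promoters.
    x, y = [2, 0, 1], [1, 3, 1]
    explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))
    # Both values are 16 up to floating-point rounding.
    print(poly2_kernel(x, y), explicit)
```

A kernel classifier such as a support vector machine can call `poly2_kernel` directly without building the quadratic feature vector, whose pairwise part grows with the square of the number of motifs.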
We found multiple motifs, within various subgroups of the experimental categories, that are much more highly enriched in ChIP-identified gene promoters than in random gene promoters. Specifically, we identified low-CpG-content CpG islands (LeGs) as critical for separating targets identified in cancer cell lines from targets identified in embryonic stem (ES) cell lines. Additionally, predictions for human and mouse ES cells made with the same motifs and features differ, suggesting relevant evolutionary divergence.
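The comparison of motif frequency in ChIP-identified promoters against random promoters can be scored in several ways, and the abstract does not state which statistic was used. A minimal sketch, assuming each promoter is simply scored as containing the motif or not, computes a fold enrichment and a one-sided Fisher's exact (hypergeometric upper-tail) p-value; the function names and counts are hypothetical.

```python
import math
from math import comb


def fold_enrichment(hits_target, n_target, hits_background, n_background):
    """Motif frequency in target promoters divided by frequency in background promoters."""
    if hits_background == 0:
        return math.inf
    # Cross-multiplied form keeps the division exact for integer counts.
    return (hits_target * n_background) / (n_target * hits_background)


def enrichment_pvalue(hits_target, n_target, hits_background, n_background):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    Pools target and background promoters, then asks how likely a random
    subset of size n_target is to contain at least hits_target promoters
    carrying the motif.
    """
    total = n_target + n_background
    with_motif = hits_target + hits_background
    denom = comb(total, n_target)
    upper = min(n_target, with_motif)
    tail = sum(
        comb(with_motif, i) * comb(total - with_motif, n_target - i)
        for i in range(hits_target, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 ChIP-identified promoters and
    # 10 of 100 random promoters contain the motif.
    print(fold_enrichment(30, 100, 10, 100))
    print(enrichment_pvalue(30, 100, 10, 100))
```

With these hypothetical counts `fold_enrichment` returns 3.0, and `enrichment_pvalue` gives the probability of seeing at least 30 motif-bearing promoters among the 100 targets by chance alone. In practice a p-value from many motifs would also need multiple-testing correction.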