Recent Submissions

  • PME: pruning-based multi-size embedding for recommender systems 

    Liu, Zirui; Song, Qingquan; Li, Li; Choi, Soo-Hyun; Chen, Rui; (2023)
    Embedding is widely used in recommendation models to learn feature representations. However, the traditional embedding technique that assigns a fixed size to all categorical features may be suboptimal due to the following reasons. In recommendation domain, the majority of categorical features' embeddings can be trained with less capacity without ...
  • A deep learning solution for crystallographic structure determination 

    Pan, T.; Jin, S.; Miller, M. D.; Kyrillidis, A.; Phillips, G. N. (2023)
    The general de novo solution of the crystallographic phase problem is difficult and only possible under certain conditions. This paper develops an initial pathway to a deep learning neural network approach for the phase problem in protein crystallography, based on a synthetic dataset of small fragments derived from a large well curated subset of ...
  • EnGens: a computational framework for generation and analysis of representative protein conformational ensembles 

    Conev, Anja; Rigo, Mauricio Menegatti; Devaurs, Didier; Fonseca, André Faustino; Kalavadwala, Hussain; (2023)
    Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully ...
  • Enabling accurate and early detection of recently emerged SARS-CoV-2 variants of concern in wastewater 

    Sapoval, Nicolae; Liu, Yunxi; Lou, Esther G.; Hopkins, Loren; Ensor, Katherine B.; (2023)
    As clinical testing declines, wastewater monitoring can provide crucial surveillance on the emergence of SARS-CoV-2 variants of concern (VoCs) in communities. In this paper we present QuaID, a novel bioinformatics tool for VoC detection based on quasi-unique mutations. The benefits of QuaID are three-fold: (i) provides up to 3-week earlier VoC ...
  • PepSim: T-cell cross-reactivity prediction via comparison of peptide sequence and peptide-HLA structure 

    Hall-Swan, Sarah; Slone, Jared; Rigo, Mauricio M.; Antunes, Dinler A.; Lizée, Gregory; (2023)
    Introduction: Peptide-HLA class I (pHLA) complexes on the surface of tumor cells can be targeted by cytotoxic T-cells to eliminate tumors, and this is one of the bases for T-cell-based immunotherapies. However, there exist cases where therapeutic T-cells directed towards tumor pHLA complexes may also recognize pHLAs from healthy normal cells. The process ...
  • Improved understanding of biorisk for research involving microbial modification using annotated sequences of concern 

    Godbold, Gene D.; Hewitt, F. Curtis; Kappell, Anthony D.; Scholz, Matthew B.; Agar, Stacy L.; (2023)
    Regulation of research on microbes that cause disease in humans has historically been focused on taxonomic lists of ‘bad bugs’. However, given our increased knowledge of these pathogens through inexpensive genome sequencing, 5 decades of research in microbial pathogenesis, and the burgeoning capacity of synthetic biologists, the limitations of this ...
  • Genome-Wide Analysis of Structural Variants in Parkinson Disease 

    Billingsley, Kimberley J.; Ding, Jinhui; Jerez, Pilar Alvarez; Illarionova, Anastasia; Levine, Kristin; (2023)
    Objective: Identification of genetic risk factors for Parkinson disease (PD) has to date been primarily limited to the study of single nucleotide variants, which only represent a small fraction of the genetic variation in the human genome. Consequently, causal variants for most PD risk are not known. Here we focused on structural variants (SVs), which ...
  • Intratumoral Heterogeneity and Clonal Evolution Induced by HPV Integration 

    Akagi, Keiko; Symer, David E.; Mahmoud, Medhat; Jiang, Bo; Goodwin, Sara; (2023)
    The human papillomavirus (HPV) genome is integrated into host DNA in most HPV-positive cancers, but the consequences for chromosomal integrity are unknown. Continuous long-read sequencing of oropharyngeal cancers and cancer cell lines identified a previously undescribed form of structural variation, “heterocateny,” characterized by diverse, interrelated, ...
  • A Chromosome-length Assembly of the Black Petaltail (Tanypteryx hageni) Dragonfly 

    Tolman, Ethan R; Beatty, Christopher D; Bush, Jonas; Kohli, Manpreet; Moreno, Carlos M; (2023)
    We present a chromosome-length genome assembly and annotation of the Black Petaltail dragonfly (Tanypteryx hageni). This habitat specialist diverged from its sister species over 70 million years ago, and separated from the most closely related Odonata with a reference genome 150 million years ago. Using PacBio HiFi reads and Hi-C data for scaffolding ...
  • FixItFelix: improving genomic analysis by fixing reference errors 

    Behera, Sairam; LeFaive, Jonathon; Orchard, Peter; Mahmoud, Medhat; Paulin, Luis F.; (2023)
    The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified ...
  • Fast Quantum State Reconstruction via Accelerated Non-Convex Programming 

    Kim, Junhyung Lyle; Kollias, George; Kalev, Amir; Wei, Ken X.; Kyrillidis, Anastasios (2023)
    We propose a new quantum state reconstruction method that combines ideas from compressed sensing, non-convex optimization, and acceleration methods. The algorithm, called Momentum-Inspired Factored Gradient Descent (MiFGD), extends the applicability of quantum tomography for larger systems. Despite being a non-convex method, MiFGD converges provably ...
  • The swan genome and transcriptome, it is not all black and white 

    Karawita, Anjana C.; Cheng, Yuanyuan; Chew, Keng Yih; Challagulla, Arjun; Kraus, Robert; (2023)
    Background: The Australian black swan (Cygnus atratus) is an iconic species with contrasting plumage to that of the closely related northern hemisphere white swans. The relative geographic isolation of the black swan may have resulted in a limited immune repertoire and increased susceptibility to infectious diseases, notably infectious diseases from ...
  • Streaming Quantiles Algorithms with Small Space and Update Time 

    Ivkin, Nikita; Liberty, Edo; Lang, Kevin; Karnin, Zohar; Braverman, Vladimir (2022)
    Approximating quantiles and distributions over streaming data has been studied for roughly two decades now. Recently, Karnin, Lang, and Liberty proposed the first asymptotically optimal algorithm for doing so. This manuscript complements their theoretical result by providing practical variants of their algorithm with improved constants. For a given ...
  • Analysis of bronchoalveolar lavage fluid metatranscriptomes among patients with COVID-19 disease 

    Jochum, Michael; Lee, Michael D.; Curry, Kristen; Zaksas, Victoria; Vitalis, Elizabeth; (2022)
    To better understand the potential relationship between COVID-19 disease and hologenome microbial community dynamics and functional profiles, we conducted a multivariate taxonomic and functional microbiome comparison of publicly available human bronchoalveolar lavage fluid (BALF) metatranscriptome samples amongst COVID-19 (n = 32), community acquired ...
  • Auto-GNN: Neural architecture search of graph neural networks 

    Zhou, Kaixiong; Huang, Xiao; Song, Qingquan; Chen, Rui; Hu, Xia; (2022)
    Graph neural networks (GNNs) have been widely used in various graph analysis tasks. As the graph characteristics vary significantly in real-world systems, given a specific scenario, the architecture parameters need to be tuned carefully to identify a suitable GNN. Neural architecture search (NAS) has shown its potential in discovering the effective ...
  • De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee 

    Liu, Yunxi; Elworth, R. A. Leo; Jochum, Michael D.; Aagaard, Kjersti M.; Treangen, Todd J. (2022)
    Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can potentially impact the interpretation of findings reported in microbiome studies, especially in low-biomass environments. Contamination from DNA extraction ...
  • Systematic Analysis of Mobile Genetic Elements Mediating β-Lactamase Gene Amplification in Noncarbapenemase-Producing Carbapenem-Resistant Enterobacterales Bloodstream Infections 

    Shropshire, W.C.; Konovalova, A.; McDaneld, P.; Gohel, M.; Strope, B.; (2022)
    Noncarbapenemase-producing carbapenem-resistant Enterobacterales (non-CP-CRE) are increasingly recognized as important contributors to prevalent carbapenem-resistant Enterobacterales (CRE) infections. However, there is limited understanding of mechanisms underlying non-CP-CRE causing invasive disease. Long- and short-read whole-genome sequencing was ...
  • Multiple genome alignment in the telomere-to-telomere assembly era 

    Kille, Bryce; Balaji, Advait; Sedlazeck, Fritz J.; Nute, Michael; Treangen, Todd J. (2022)
    With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative ...
  • Infectious Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in Exhaled Aerosols and Efficacy of Masks During Early Mild Infection 

    Adenaiye, Oluwasanmi O.; Lai, Jianyu; Bueno de Mesquita, P. Jacob; Hong, Filbert; Youssefi, Somayeh; (2022)
    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemiology implicates airborne transmission; aerosol infectiousness and impacts of masks and variants on aerosol shedding are not well understood. We recruited coronavirus disease 2019 (COVID-19) cases to give blood, saliva, mid-turbinate and fomite (phone) swabs, and 30-minute breath ...
  • Accelerating High-Order Stencils on GPUs 

    Sai, Ryuichi; Mellor-Crummey, John; Meng, Xiaozhu; Araya-Polo, Mauricio; Meng, Jie (2020)
    While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of the techniques work well for high-order stencils, such as those used for seismic imaging. In this paper, we study practical seismic imaging computations on GPUs using high-order stencils on large domains with meaningful boundary conditions. ...
