A massively parallel 3′ UTR reporter assay reveals relationships between nucleotide content, sequence conservation, and mRNA destabilization

  1. K. Mark Ansel1
  1. Department of Microbiology and Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, California 94143, USA;
  2. Department of Medicine and Lung Biology Center, University of California San Francisco, San Francisco, California 94143, USA;
  3. School of Medicine, Sun Yat-Sen University, Guangzhou, People's Republic of China, 510245;
  4. Department of Biochemistry and Biophysics, Department of Urology, and Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California 94143, USA
  • Corresponding author: Mark.Ansel@ucsf.edu
  • Abstract

    Compared to coding sequences, untranslated regions of the transcriptome are not well conserved, and functional annotation of these sequences is challenging. Global relationships between the nucleotide composition of 3′ UTR sequences and their sequence conservation have been appreciated since mammalian genomes were first sequenced, but the functional relevance of these patterns remains unknown. We systematically measured the effect on gene expression of the sequences of more than 25,000 RNA-binding protein (RBP) binding sites in primary mouse T cells using a massively parallel reporter assay. GC-rich sequences destabilized reporter mRNAs and came from more rapidly evolving regions of the genome. These sequences were more likely to be folded in vivo and contained a number of structural motifs that reduced accumulation of a heterologous reporter protein. Comparison of full-length 3′ UTR sequences across vertebrate phylogeny revealed that strictly conserved 3′ UTRs were GC-poor and enriched in genes associated with organismal development. In contrast, rapidly evolving 3′ UTRs tended to be GC-rich and derived from genes involved in metabolism and immune responses. Cell-essential genes had lower GC content in their 3′ UTRs, suggesting a connection between unstructured mRNA noncoding sequences and optimal protein production. By reducing gene expression, GC-rich RBP-occupied sequences act as a rapidly evolving substrate for gene regulatory interactions.

    Footnotes

    • Received August 1, 2018.
    • Accepted May 2, 2019.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
