Trends in Immunology
Volume 43, Issue 9, September 2022, Pages 741-756

Review
Short open reading frame genes in innate immunity: from discovery to characterization

https://doi.org/10.1016/j.it.2022.07.005

Highlights

  • New approaches have implicated hundreds of long noncoding RNAs as potential protein coding genes through overlooked short open reading frames (sORFs).

  • Many thousands of sORFs, supported by multiple lines of evidence for the production of sORF-encoded peptides (SEPs), have been compiled in databases but remain uninvestigated.

  • Confirmation of production and functional characterization can be nuanced, but thoughtful interrogation has already expanded the known proteome and our understanding of important biological pathways. This includes contributions to innate immune function in mouse and human.

  • Although the expanding proteome is likely to interest investigators from all fields, there is reason to believe that immunologists are particularly well positioned to make impactful discoveries.

Next-generation sequencing (NGS) technologies have greatly expanded the size of the known transcriptome. Many newly discovered transcripts are classified as long noncoding RNAs (lncRNAs), which are assumed to affect phenotype through their sequence and structure rather than through translated protein products, even though the vast majority of them harbor short open reading frames (sORFs). Recent advances have demonstrated that the noncoding designation is incorrect in many cases and that sORF-encoded peptides (SEPs) translated from these transcripts contribute importantly to diverse biological processes. Interest in SEPs is at an early stage, and there is evidence for the existence of thousands of SEPs that remain unstudied. We hope to pique interest in investigating this unexplored proteome by discussing SEP characterization generally and describing specific discoveries in innate immunity.

Section snippets

Gene annotation and sORFs

Beginning with the sequencing of the yeast Saccharomyces cerevisiae genome in the 1990s, the scientific community has shown considerable interest in the comprehensive identification and annotation of protein-coding genes in eukaryotes. Early efforts focused on ATG-initiated ORFs capable of encoding a polypeptide of at least 100 amino acids [1,2]. ORFs that did not meet this length cutoff, termed short open reading frames (sORFs; see Glossary), required additional evidence to merit a protein-coding
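The length-based annotation rule described above can be illustrated with a minimal sketch. It assumes a DNA sequence, scans only the forward strand in its three reading frames, reports for each in-frame stop codon only the most upstream ATG, and applies the 100-amino-acid cutoff from the text. The function names and the 10-amino-acid lower bound are hypothetical choices for illustration, not part of any published annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_atg_orfs(seq: str, min_aa: int = 10) -> list[tuple[int, int, int]]:
    """Return (start, end, length_aa) for ATG-initiated ORFs on the forward strand.

    start is the 0-based index of the A in ATG, end is the index just past the
    stop codon, and length_aa counts residues (ATG included, stop excluded).
    Only the most upstream ATG before each in-frame stop is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa >= min_aa:
                    orfs.append((start, i + 3, length_aa))
                start = None  # resume searching for the next ATG in this frame
    return sorted(orfs)


def label_by_length(orfs, cutoff_aa: int = 100) -> list[tuple[int, int, int, str]]:
    """Label each ORF 'sORF' if it falls below the length cutoff, else 'ORF'."""
    return [(s, e, n, "sORF" if n < cutoff_aa else "ORF") for s, e, n in orfs]


if __name__ == "__main__":
    # Hypothetical toy sequence: a 21-codon ORF followed by a 121-codon ORF.
    demo = "ATG" + "GCT" * 20 + "TAA" + "ATG" + "GCT" * 120 + "TAG"
    for orf in label_by_length(find_atg_orfs(demo)):
        print(orf)  # (0, 66, 21, 'sORF') then (66, 432, 121, 'ORF')
```

A real pipeline would also scan the reverse strand and consider non-AUG starts; the fixed cutoff is exactly what caused sORFs to be overlooked in early annotations.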

Sequence analysis

Discovering new SEPs begins with sequence analysis. In the standard scanning model of translation initiation, a preinitiation complex binds the mRNA 5′ cap and scans along the transcript until it reaches an AUG start codon, typically embedded in a Kozak consensus sequence. There, the remaining translational machinery engages and the polypeptide is elongated. Once a stop codon is encountered, translation terminates and the ribosome dissociates from the transcript [15]. This model implies a simple
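A simplified way to apply the scanning model computationally is to walk a transcript from its 5′ end, list every AUG, and grade the context of each. The sketch below assumes the commonly used simplification of Kozak context in which a purine at position −3 and a G at position +4 (numbering the A of AUG as +1) make a "strong" site, one of the two makes an "adequate" site, and neither makes a "weak" one. The function names are hypothetical.

```python
PURINES = {"A", "G"}


def kozak_context(mrna: str, pos: int) -> str:
    """Grade the context of the AUG whose A sits at 0-based index pos.

    Numbering the A of AUG as +1, position -3 is mrna[pos - 3] and
    position +4 is mrna[pos + 3]; missing positions count as unfavorable.
    """
    minus3 = mrna[pos - 3] if pos >= 3 else ""
    plus4 = mrna[pos + 3] if pos + 3 < len(mrna) else ""
    score = int(minus3 in PURINES) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def scan_for_start_codons(transcript: str) -> list[tuple[int, str]]:
    """List every AUG in 5'->3' order with its context grade, mimicking a scan."""
    mrna = transcript.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    pos = mrna.find("AUG")
    while pos != -1:
        hits.append((pos, kozak_context(mrna, pos)))
        pos = mrna.find("AUG", pos + 1)
    return hits


if __name__ == "__main__":
    print(scan_for_start_codons("GCCACCAUGGCUUUUAUGU"))  # [(6, 'strong'), (15, 'weak')]
```

Under the simple scanning model, the first AUG would be chosen. AUGs in weak context are the ones ribosomes are more likely to bypass by leaky scanning, which is one reason sequence-only predictions of translated sORFs remain provisional.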

Candidate validation, approaches, and drawbacks

Predictions of translated sORFs based on ribosome association likely misestimate the coding potential of transcripts and say nothing about the function of predicted SEPs (including whether they are functional at all). To confirm novel peptide production, candidate sORFs are typically validated via peptide tagging followed by microscopy or immunoprecipitation (IP). SEPs frequently influence phenotype by complexing with larger protein partners [16,17,70–72]; therefore, determining these partners
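One common way to shortlist candidate SEP partners from a tagged-SEP IP followed by MS is to compare spectral counts against a control IP (for example, the tag alone). The sketch below is a deliberately simple fold-change filter with a pseudocount; the threshold, pseudocount, function name, and protein names are hypothetical, and replicate-aware scoring tools such as SAINT are preferable for real data.

```python
import math


def enriched_partners(bait_counts: dict[str, int], control_counts: dict[str, int],
                      pseudocount: float = 1.0, min_log2fc: float = 2.0) -> dict[str, float]:
    """Return proteins whose log2 fold enrichment in the SEP IP meets the threshold.

    Spectral counts are compared with a control IP; the pseudocount prevents
    division by zero for proteins absent from the control.
    """
    scores = {}
    for protein in set(bait_counts) | set(control_counts):
        bait = bait_counts.get(protein, 0)
        control = control_counts.get(protein, 0)
        log2fc = math.log2((bait + pseudocount) / (control + pseudocount))
        if log2fc >= min_log2fc:
            scores[protein] = log2fc
    # Highest enrichment first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical spectral counts, for illustration only.
    sep_ip = {"PROT_A": 31, "PROT_B": 40, "PROT_C": 7}
    tag_only_ip = {"PROT_B": 38, "PROT_C": 1}
    print(enriched_partners(sep_ip, tag_only_ip))  # {'PROT_A': 5.0, 'PROT_C': 2.0}
```

Proteins that bind the tag or beads nonspecifically (like PROT_B here) appear in both IPs and drop out, which is why the choice of control matters as much as the bait.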

High-throughput validation

Individual, high-resolution characterization of SEPs will be an important part of correctly annotating the genome and characterizing SEP–protein interaction networks. However, the large number of putatively coding sORFs detected by Ribo-Seq and sequence analysis argues for the application of high-throughput methods to validate translation en masse. Broadly, there are two approaches: peptidomics via MS (Box 1) and genome editing with CRISPR-Cas (Box 2). In both cases the challenge is for the
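For MS-based peptidomics, a SEP is generally supported only by peptides that cannot be attributed to an already annotated protein. The sketch below performs an in silico tryptic digest using the simplified rule that trypsin cleaves after K or R unless the next residue is P, allows no missed cleavages, and treats I and L as indistinguishable because they are isobaric. The 7–25 residue length window, the function names, and the toy sequences are illustrative assumptions.

```python
import re


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 25) -> list[str]:
    """Cleave after K or R unless followed by P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())  # zero-width split (Python 3.7+)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def unique_sep_peptides(sep: str, reference_proteome: list[str],
                        min_len: int = 7, max_len: int = 25) -> list[str]:
    """Return SEP tryptic peptides not produced by any reference protein.

    I and L are merged before comparison because they have identical mass.
    """
    def norm(peptide: str) -> str:
        return peptide.replace("I", "L")

    known = {
        norm(p)
        for protein in reference_proteome
        for p in tryptic_digest(protein, min_len, max_len)
    }
    return [p for p in tryptic_digest(sep, min_len, max_len) if norm(p) not in known]


if __name__ == "__main__":
    sep = "MAGSTPEELKWVNDTAFGRLLPQSSEHR"  # hypothetical SEP sequence
    reference = ["MKWVNDTAFGRAAAAAAK", "GGGGKIIPQSSEHR"]  # hypothetical annotated proteins
    print(unique_sep_peptides(sep, reference))  # ['MAGSTPEELK']
```

Exact-match comparison of tryptic peptides is a simplification; a stricter check would also search for each candidate peptide as a substring of every annotated protein.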

An emerging class: bifunctional genes

Here we describe SEPs that were recently discovered and characterized in innate immune (and innate immune-derived) cell lines. We also note instances where the RNA itself is known to contribute to a phenotype distinctly from the SEP; this is the case in four of the five examples shown in Figure 2. Although the sample size is too small to make strong inferences, the high representation of these ‘coding-noncoding’ or ‘bifunctional’ [85,86] genes suggests that future studies of SEPs would do well

Concluding remarks

Nuanced biomolecular interrogation has allowed researchers to differentiate between SEP and RNA activity, and many databases contain thousands of sORFs and lncRNAs that are yet to be investigated (Table 2). Immunologists might find the study of this expanded proteome particularly fruitful. It is well established that inflammatory stimuli drastically reshape the transcriptome. Furthermore, it has been reported that multiple components of translation initiation

Acknowledgments

S.C. is supported by R01AI148413 from National Institute of Allergy and Infectious Diseases and R35GM137801 from the National Institute of General Medical Sciences. E.M. is supported by T32HG012344 and in part by R35GM137801.

Declaration of interests

S.C. is a paid consultant to NextRNA Therapeutics. No interests are declared by E.M.

Glossary

Dark proteome
understudied and under-characterized proteins and peptides, including those that arise from UTRs and noncoding RNAs.
FASTQs
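the standard text-based file format for short sequencing reads, storing each read's sequence together with per-base quality scores; the usual input for bioinformatic sequencing analysis.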
Homology-directed repair (HDR)
repair of DNA breaks using a homologous template, allowing the insertion of genetic material.
Lipopolysaccharide (LPS)
a PAMP component of the outer membrane of Gram-negative bacteria. In its purified form, it is commonly used as an inflammation-inducing ligand in

References (137)

  • E.W. Mills

    Dynamic regulation of a ribosome rescue pathway in erythroid cells and platelets

    Cell Rep.

    (2016)
  • U. Weill

    Assessment of GFP tag position on protein localization and growth fitness in yeast

    J. Mol. Biol.

    (2019)
  • T. Shibata

    Addition of an EGFP-tag to the N-terminal of influenza virus M1 protein impairs its ability to accumulate in ND10

    J. Virol. Methods

    (2018)
  • G. Vandemoortele

    Pick a tag and explore the functions of your pet protein

    Trends Biotechnol.

    (2019)
  • J.-W. Nam

    Incredible RNA: dual functions of coding and noncoding

    Mol. Cells

    (2016)
  • P. Kumari et al.

    cncRNAs: bi-functional RNAs with protein coding and non-coding functions

    Semin. Cell Dev. Biol.

    (2015)
  • B.J. Floyd

    Mitochondrial protein interaction mapping identifies regulators of respiratory chain function

    Mol. Cell

    (2016)
  • M.W. Potter

    Endotoxin (LPS) stimulates 4E-BP1/PHAS-I phosphorylation in macrophages

    J. Surg. Res.

    (2001)
  • K.-W. Min

    eIF4E phosphorylation by MST1 reduces translation of a subset of mRNAs, but increases lncRNA translation

    Biochim. Biophys. Acta Gene Regul. Mech.

    (2017)
  • P.M. Harrison

    A question of size: the eukaryotic proteome and the problems in defining it

    Nucleic Acids Res.

    (2002)
  • A. Goffeau

    Life with 6000 genes

    Science

    (1996)
  • J.-P. Couso et al.

    Classification and function of small open reading frames

    Nat. Rev. Mol. Cell Biol.

    (2017)
  • J.L. Rinn et al.

    Genome regulation by long noncoding RNAs

    Annu. Rev. Biochem.

    (2012)
  • X. Yang

    Long NONCODING RNA AW112010 promotes the differentiation of inflammatory T cells by suppressing IL-10 expression through histone demethylation

    J. Immunol.

    (2020)
  • E.K. Robinson

    lincRNA-Cox2 functions to regulate inflammation in alveolar macrophages during acute lung injury

    J. Immunol.

    (2022)
  • D. Papaioannou

    The long non-coding RNA HOXB-AS3 regulates ribosomal RNA transcription in NPM1-mutated acute myeloid leukemia

    Nat. Commun.

    (2019)
  • Z. Ji

    Many lncRNAs, 5’UTRs, and pseudogenes are translated and some are likely to express functional proteins

    eLife

    (2015)
  • D. Schlesinger et al.

    Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins

    FEBS J.

    (2022)
  • A.Z.-X. Leong

    Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures

    J. Biomed. Sci.

    (2022)
  • A.G. Hinnebusch

    The scanning mechanism of eukaryotic translation initiation

    Annu. Rev. Biochem.

    (2014)
  • L. Niu

    A micropeptide encoded by lncRNA MIR155HG suppresses autoimmune inflammation via modulating antigen presentation

    Sci. Adv.

    (2020)
  • A. Matsumoto

    mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide

    Nature

    (2017)
  • T. Kwan et al.

    Noncanonical translation initiation in eukaryotes

    Cold Spring Harb. Perspect. Biol.

    (2019)
  • E. Smith

    Leaky ribosomal scanning in mammalian genomes: significance of histone H4 alternative translation in vivo

    Nucleic Acids Res.

    (2005)
  • J.M. Acevedo

    Changes in global translation elongation or initiation rates shape the proteome via the Kozak sequence

    Sci. Rep.

    (2018)
  • S. Samandi

    Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

    eLife

    (2017)
  • J.M. Mudge

    Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci

    Genome Res.

    (2019)
  • J. Armstrong

    Whole-genome alignment and comparative annotation

    Annu. Rev. Anim. Biosci.

    (2019)
  • M.F. Lin

    PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions

    Bioinformatics

    (2011)
  • A. Siepel

    Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

    Genome Res.

    (2005)
  • A. Prakash et al.

    Measuring the accuracy of genome-size multiple alignments

    Genome Biol.

    (2007)
  • K.S. Pollard

    Detection of nonneutral substitution rates on mammalian phylogenies

    Genome Res.

    (2010)
  • W.J. Kent

    The human genome browser at UCSC

    Genome Res.

    (2002)
  • M. Blum

    The InterPro protein families and domains database: 20 years on

    Nucleic Acids Res.

    (2021)
  • H. Yoshikawa

    Efficient analysis of mammalian polysomes in cells and tissues using Ribo Mega-SEC

    eLife

    (2018)
  • N.T. Ingolia

    Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling

    Science

    (2009)
  • G.A. Brar et al.

    Ribosome profiling reveals the what, when, where and how of protein synthesis

    Nat. Rev. Mol. Cell Biol.

    (2015)
  • N.T. Ingolia

    The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments

    Nat. Protoc.

    (2012)
  • S. Lee

    Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution

    Proc. Natl. Acad. Sci. U. S. A.

    (2012)
  • X. Gao

    Quantitative profiling of initiating ribosomes in vivo

    Nat. Methods

    (2015)