Chapter One - Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a “Plug and Play” Domain

https://doi.org/10.1016/bs.mie.2018.06.004

Abstract

The radical SAM superfamily contains over 100,000 homologous enzymes that catalyze a remarkably broad range of reactions required for life, including metabolism, nucleic acid modification, and biogenesis of cofactors. While the highly conserved SAM-binding motif responsible for formation of the key 5′-deoxyadenosyl radical intermediate is a structural feature that simplifies identification of superfamily members, our understanding of their structure–function relationships is complicated by the modular nature of their structures, which exhibit varied and complex domain architectures. To gain new insight into these relationships, we classified the entire set of sequences into similarity-based subgroups that could be visualized using sequence similarity networks. This superfamily-wide analysis reveals important features that had not previously been appreciated from studies focused on one or a few members. Functional information mapped to the networks indicates which members have been experimentally or structurally characterized, their known reaction types, and their phylogenetic distribution. Despite the biological importance of radical SAM chemistry, the vast majority of superfamily members have never been experimentally characterized in any way, suggesting that many new reactions remain to be discovered. In addition to 20 subgroups with at least one known function, we identified additional subgroups made up entirely of sequences of unknown function. Importantly, our results indicate that even general reaction types fail to track well with our sequence similarity-based subgroupings, raising major challenges for function prediction for currently identified members and for new members that continue to be discovered. Interactive similarity networks and other data from this analysis are available from the Structure-Function Linkage Database.

Introduction: Overview of the Radical SAM Superfamily

The widely studied radical S-adenosylmethionine (SAM) superfamily (RSS) was originally defined in a bioinformatics survey of the 650 RSS members available at that time, which described a homologous group of enzymes united by their use of SAM in a radical mechanism (Sofia, Chen, Hetzler, Reyes-Spindola, & Miller, 2001). The original sequence set came from 126 species representing all kingdoms of life and included many of the first RSS enzymes to be characterized: l-lysine

Results and Discussion

In this study, we first describe the RSS from a structural perspective, including the known variations across MDAs that typify the superfamily. Next, we provide a global view of sequence similarity relationships among the RSS using SSNs to illustrate the subgroups into which we partitioned these sequences to establish a comprehensive classification of the entire superfamily based on sequence similarity (in contrast to the majority of previously published RSS classifications that are based on

Collection of RSS Sequences

To begin populating the RSS in the SFLD, in September 2012 we collected the full-length sequences associated with Pfam model PF04055 and InterPro family IPR007197, removed duplicate sequences, and resolved other differences. This set was last updated using the SFLD automated update protocol on 7/9/14. Functional domains for the superfamily's representative sequence set were last updated on 11/22/17.
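The merge-and-deduplicate step described above can be illustrated with a minimal sketch. It is not the SFLD protocol itself: the function name `merge_unique`, the toy records, and the choice to treat sequences as duplicates when they match after case and whitespace normalization are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def merge_unique(*sources: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Merge (identifier, sequence) records, keeping one entry per distinct sequence.

    Returns a mapping from normalized sequence to the first identifier seen for it.
    """
    unique: Dict[str, str] = {}
    for records in sources:
        for identifier, sequence in records:
            # Normalize case and strip whitespace so trivially different copies collapse.
            key = "".join(sequence.split()).upper()
            # setdefault keeps the first identifier encountered for each sequence.
            unique.setdefault(key, identifier)
    return unique


# Hypothetical toy records standing in for hits to PF04055 and IPR007197.
pfam_hits = [("seqA", "MKTCNLRCKYC"), ("seqB", "MSLCPHDCSFC")]
interpro_hits = [("seqA_copy", "mktcnlrckyc"), ("seqC", "MAACNFACEYC")]

merged = merge_unique(pfam_hits, interpro_hits)
print(len(merged))  # 3
```

Keying on the sequence rather than the identifier means the same protein retrieved under two accession names from the two sources is counted once, which is the behavior the deduplication step needs.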

Representative Networks

The full set of 113,776 sequences was clustered using CD-HIT (Li & Godzik, 2006) at 50% pairwise
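As a rough illustration of how clustering at a pairwise threshold reduces a large sequence set to representatives, the sketch below uses a greedy, longest-first scheme. It is a simplified stand-in, not CD-HIT: the `identity` function compares positions without any alignment, and the function names, toy sequences, and the 0.5 default threshold are assumptions chosen for the example.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions over the shorter sequence.

    A real tool would align the sequences first; this stand-in does not.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: List[str], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Assign each sequence to the first representative it matches at >= threshold.

    Sequences are processed longest first; a sequence that matches no existing
    representative becomes the representative of a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


# Hypothetical toy sequences.
toy = ["AAAAAAAAAA", "AAAAAAAAGG", "CCCCCCCC", "CCCCGGGG", "TTTT"]
clusters = greedy_cluster(toy, threshold=0.5)
print(list(clusters))  # ['AAAAAAAAAA', 'CCCCCCCC', 'TTTT']
```

Each key of `clusters` is a representative, so a network drawn over the keys alone has far fewer nodes than one drawn over every input sequence.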

Acknowledgments

This work was supported by NIH R01 GM60595 (P. Babbitt), NIH R01 GM-122595 (S. Booker), and NSF DBI-1356193 (P. Babbitt and G. Holliday). Some of the results described in this chapter were initially developed as part of a workshop on the RSS sponsored by the Enzyme Function Initiative with support from NIH U54GM093342 (J. Gerlt). The SFLD was developed as a joint project of the Babbitt lab with support from NIH R01GM60595 and NSF Grants DBI-0234768 and DBI-0640476 (P. Babbitt), NSF

Author Contributions

G.L.H. performed the analysis, directed the research, and wrote the manuscript. E.C.M. generated the structure comparisons provided in Figs. 2 and 3 and assisted with analysis and proofreading of the manuscript. E.A. and S.D.B. assisted with analysis and proofreading of the manuscript. S.C., U.P., and A.S. performed the 3D-structure prediction. S.J.B. provided expertise in RSS enzymology and in assigning functions based on the literature. P.C.B. oversaw the project and wrote the manuscript.

Competing Financial Interests Statement

None.

References (80)

  • T.A. Grell et al. SPASM and twitch domains in S-adenosylmethionine (SAM) radical enzymes. The Journal of Biological Chemistry (2015)
  • C. Kalyanaraman et al. Discovery of a dipeptide epimerase enzymatic function guided by homology modeling and virtual screening. Structure (2008)
  • J. Knappe et al. A radical-chemical route to acetyl-CoA: The anaerobically induced pyruvate formate-lyase system of Escherichia coli. FEMS Microbiology Reviews (1990)
  • N.D. Lanz et al. Identification and function of auxiliary iron-sulfur clusters in radical SAM enzymes. Biochimica et Biophysica Acta (2012)
  • F. Mancia et al. How coenzyme B12 radicals are generated: The crystal structure of methylmalonyl-coenzyme A mutase at 2 A resolution. Structure (1996)
  • M. Moss et al. The role of S-adenosylmethionine in the lysine 2,3-aminomutase reaction. The Journal of Biological Chemistry (1987)
  • Y. Nicolet et al. X-ray structure of the [FeFe]-hydrogenase maturase HydE from Thermotoga maritima. The Journal of Biological Chemistry (2008)
  • E. Pilet et al. The role of the maturase HydG in [FeFe]-hydrogenase active site synthesis and assembly. FEBS Letters (2009)
  • M.R. Reyda et al. Loss of iron-sulfur clusters from biotin synthase as a result of catalysis promotes unfolding and degradation. Archives of Biochemistry and Biophysics (2008)
  • L. Yang et al. Spore photoproduct lyase: The known, the controversial, and the unknown. The Journal of Biological Chemistry (2015)
  • E. Akiva et al. The structure-function linkage database. Nucleic Acids Research (2014)
  • S.F. Altschul et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research (1997)
  • M. Ashburner et al. Gene ontology: Tool for the unification of biology. The gene ontology consortium. Nature Genetics (2000)
  • H.J. Atkinson et al. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS One (2009)
  • P.R. Baker et al. Variant non ketotic hyperglycinemia is caused by mutations in LIAS, BOLA3 and the novel gene GLRX5. Brain (2014)
  • A.E. Barber et al. Pythoscape: A framework for generation of large protein similarity networks. Bioinformatics (2012)
  • A. Benjdia et al. Structural insights into recognition and repair of UV-DNA damage by spore photoproduct lyase, a radical SAM enzyme. Nucleic Acids Research (2012)
  • H. Berman et al. Announcing the worldwide Protein Data Bank. Nature Structural Biology (2003)
  • J.N. Betz et al. [FeFe]-hydrogenase maturation: Insights into the role HydE plays in dithiomethylamine biosynthesis. Biochemistry (2015)
  • A.J. Blaszczyk et al. Spectroscopic and electrochemical characterization of the iron-sulfur and cobalamin cofactors of TsrM, an unusual radical S-adenosylmethionine methylase. Journal of the American Chemical Society (2016)
  • J.B. Broderick et al. Radical S-adenosylmethionine enzymes. Chemical Reviews (2014)
  • S.D. Brown et al. A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biology (2006)
  • S. Calhoun et al. Prediction of enzymatic pathways by integrative pathway mapping. eLife (2018)
  • R.M. Cicchillo et al. Lipoyl synthase requires two equivalents of S-adenosyl-L-methionine to synthesize one equivalent of lipoic acid. Biochemistry (2004)
  • R.M. Cicchillo et al. Escherichia coli lipoyl synthase binds two distinct [4Fe-4S] clusters per polypeptide. Biochemistry (2004)
  • N.L. Dawson et al. CATH-Gene3D: Generation of the resource and its use in obtaining structural and functional annotations for protein sequences. Methods in Molecular Biology (2017)
  • T.A. de Beer et al. PDBsum additions. Nucleic Acids Research (2014)
  • S.R. Eddy. Accelerated profile HMM searches. PLoS Computational Biology (2011)
  • R.D. Finn et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Research (2017)
  • R.D. Finn et al. The Pfam protein families database: Towards a more sustainable future. Nucleic Acids Research (2016)

2 Current address: Medicines Discovery Catapult, Mereside, Alderley Park, Alderley Edge, Cheshire, United Kingdom.

3 Current address: Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States.

4 Current address: National Agricultural Library, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, United States.
