De Novo Molecular Formula Annotation and Structure Elucidation Using SIRIUS 4

Computational Methods and Data Analysis for Metabolomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2104)

Abstract

SIRIUS 4 is the best-in-class computational tool for metabolite identification from high-resolution tandem mass spectrometry data. It offers de novo molecular formula annotation with outstanding accuracy. When searching fragmentation spectra in a structure database, it reaches over 70% correct identifications. A predicted fingerprint, which indicates the presence or absence of thousands of molecular properties, helps to deduce information about the compound of interest even if it is not contained in any structure database. Here, we present best practices and describe how to leverage the full potential of SIRIUS 4, how to incorporate it into your own workflow, and how it adds value to the analysis of mass spectrometry data beyond spectral library search.
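
To make the idea of a presence/absence fingerprint concrete, the following is a minimal sketch, not the scoring that SIRIUS 4 or CSI:FingerID actually uses. It assumes a fingerprint can be represented as the set of indices of the molecular properties that are present, and it ranks hypothetical database candidates by plain Tanimoto similarity to a predicted fingerprint. All names (`threshold`, `tanimoto`, `rank_candidates`, the candidate labels) and all numbers are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modelled here as the set of property indices that are present.
# Real fingerprints cover thousands of properties; five are used for illustration.
Fingerprint = FrozenSet[int]


def threshold(probabilities: Dict[int, float], cutoff: float = 0.5) -> Fingerprint:
    """Turn per-property probabilities into a presence/absence fingerprint."""
    return frozenset(i for i, p in probabilities.items() if p >= cutoff)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Number of shared properties divided by properties present in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rank_candidates(
    predicted: Fingerprint, candidates: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Sort candidate structures by decreasing similarity to the prediction."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical predicted probabilities for properties 0..4 (made-up values).
predicted = threshold({0: 0.97, 1: 0.08, 2: 0.81, 3: 0.64, 4: 0.12})

# Hypothetical candidate structures with known fingerprints (made-up values).
candidates = {
    "candidate_A": frozenset({0, 2, 3}),
    "candidate_B": frozenset({0, 1, 4}),
    "candidate_C": frozenset({2, 3}),
}

ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Even when no candidate matches well, the predicted fingerprint itself still lists which properties are likely present, which is the sense in which it carries information about compounds absent from any structure database.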

Corresponding author

Correspondence to Sebastian Böcker.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Ludwig, M., Fleischauer, M., Dührkop, K., Hoffmann, M.A., Böcker, S. (2020). De Novo Molecular Formula Annotation and Structure Elucidation Using SIRIUS 4. In: Li, S. (ed) Computational Methods and Data Analysis for Metabolomics. Methods in Molecular Biology, vol 2104. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0239-3_11

  • DOI: https://doi.org/10.1007/978-1-0716-0239-3_11

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0238-6

  • Online ISBN: 978-1-0716-0239-3

  • eBook Packages: Springer Protocols
