pyPGCF: A Python Software for Phylogenomic Analysis, Species Demarcation, Identification of Core, and Fingerprint Proteins of Bacterial Genomes That Are Important for Plants

  • Protocol
Plant Functional Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2788))

Abstract

This computational protocol describes how to use pyPGCF, a Python software package that runs in the Linux environment, to analyze bacterial genomes and perform: (i) phylogenomic analysis, (ii) species demarcation, (iii) identification of the core proteins of a bacterial genus and of its individual species, (iv) identification of species-specific fingerprint proteins, which are found in all strains of a species and are absent from all other species of the genus, (v) functional annotation of the core and fingerprint proteins with eggNOG, and (vi) identification of secondary metabolite biosynthetic gene clusters (smBGCs) with antiSMASH. This software has already been used to analyze bacterial genera and species that are important for plants (e.g., Pseudomonas, Bacillus, Streptomyces). In addition, we provide a test dataset and example commands showing how to analyze 165 genomes from 55 species of the genus Bacillus. The main advantages of pyPGCF are that: (i) it uses adjustable orthology cut-offs, (ii) it identifies species-specific fingerprints, and (iii) its computational cost scales linearly with the number of genomes being analyzed. Therefore, pyPGCF can handle a very large number of bacterial genomes, in reasonable timescales, using widely available levels of computing power.
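The definitions of core and fingerprint proteins given in the abstract can be illustrated with plain set operations. The following is a minimal sketch, not pyPGCF's implementation: it assumes that orthology assignment has already been done and that each strain is represented by a set of orthologous-group identifiers. The function names (`core_groups`, `fingerprint_groups`), the nested-dictionary layout, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical input layout (an assumption, not pyPGCF's format):
# species name -> strain name -> set of orthologous-group IDs in that strain.
Presence = Dict[str, Dict[str, Set[str]]]


def core_groups(strains: Dict[str, Set[str]]) -> Set[str]:
    """Return the groups present in every strain of a single species."""
    sets = list(strains.values())
    return set.intersection(*sets) if sets else set()


def fingerprint_groups(presence: Presence, species: str) -> Set[str]:
    """Return the groups found in all strains of `species` and in no strain
    of any other species in `presence` (i.e., of the same genus)."""
    core = core_groups(presence[species])
    seen_elsewhere: Set[str] = set()
    for name, strains in presence.items():
        if name != species:
            for groups in strains.values():
                seen_elsewhere |= groups
    return core - seen_elsewhere


# Toy data: two species with two strains each (illustrative only).
toy: Presence = {
    "species_A": {"A1": {"g1", "g2", "g3"}, "A2": {"g1", "g2", "g4"}},
    "species_B": {"B1": {"g1", "g5"}, "B2": {"g1", "g4", "g5"}},
}

print(sorted(core_groups(toy["species_A"])))              # ['g1', 'g2']
print(sorted(fingerprint_groups(toy, "species_A")))       # ['g2']
print(sorted(fingerprint_groups(toy, "species_B")))       # ['g5']
```

In this toy example, group g1 is core to species_A but is not a fingerprint, because it also occurs in species_B; only g2 satisfies both conditions.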

Acknowledgments

Marios Nikolaidis would like to thank the University of Thessaly Research Committee (Ph.D. studentship: DEKA-UTH-259) for financial support.

Corresponding author

Correspondence to Grigorios D. Amoutzias.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Nikolaidis, M., Oliver, S.G., Amoutzias, G.D. (2024). pyPGCF: A Python Software for Phylogenomic Analysis, Species Demarcation, Identification of Core, and Fingerprint Proteins of Bacterial Genomes That Are Important for Plants. In: Maghuly, F. (eds) Plant Functional Genomics. Methods in Molecular Biology, vol 2788. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3782-1_8

  • DOI: https://doi.org/10.1007/978-1-0716-3782-1_8

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3781-4

  • Online ISBN: 978-1-0716-3782-1

  • eBook Packages: Springer Protocols
