Evaluating Programs for Predicting Genes and Transcripts with RNA-Seq Support in Fungal Genomes

  • Protocol
Fungal Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1775)

Abstract

The steps needed to computationally predict genes and transcripts in fungal genomes with support from RNA-Seq data are described in detail for three prediction programs: CodingQuarry, BRAKER1, and Harfang. The programs predicted between 86% and 92% of the genes in a manually curated reference set for Aspergillus niger strain NRRL3, with Harfang reaching the highest value (92%). Genes with little or no RNA-Seq read coverage were predicted less successfully than genes with adequate coverage.
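As a rough illustration of the kind of comparison summarized above, the following minimal sketch computes the fraction of reference genes that a prediction set reproduces, overall and split by read coverage. It is not the chapter's procedure. The exact-match criterion (same contig, strand, and coding intervals), the coverage threshold, and all function and variable names are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumed representation: a gene model is (contig, strand, set of CDS intervals).
GeneModel = Tuple[str, str, FrozenSet[Tuple[int, int]]]


def sensitivity(reference: Dict[str, GeneModel], predicted: Set[GeneModel]) -> float:
    """Fraction of reference genes whose model appears unchanged in the predictions."""
    if not reference:
        return 0.0
    hits = sum(1 for model in reference.values() if model in predicted)
    return hits / len(reference)


def sensitivity_by_coverage(
    reference: Dict[str, GeneModel],
    predicted: Set[GeneModel],
    coverage: Dict[str, float],
    min_coverage: float,
) -> Dict[str, float]:
    """Score reference genes separately below and at/above an assumed coverage cutoff."""
    low = {g: m for g, m in reference.items() if coverage.get(g, 0.0) < min_coverage}
    adequate = {g: m for g, m in reference.items() if g not in low}
    return {
        "low": sensitivity(low, predicted),
        "adequate": sensitivity(adequate, predicted),
    }
```

Exact matching is the strictest reasonable criterion; a real evaluation might instead score exons or nucleotides, or tolerate differences in untranslated regions.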

Acknowledgments

This work was supported by Genome Canada and Génome Québec.

Author information

Corresponding author

Correspondence to Ian Reid.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Reid, I. (2018). Evaluating Programs for Predicting Genes and Transcripts with RNA-Seq Support in Fungal Genomes. In: de Vries, R., Tsang, A., Grigoriev, I. (eds) Fungal Genomics. Methods in Molecular Biology, vol 1775. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7804-5_17

  • DOI: https://doi.org/10.1007/978-1-4939-7804-5_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7803-8

  • Online ISBN: 978-1-4939-7804-5

  • eBook Packages: Springer Protocols