MetaObtainer: A Tool for Obtaining Specified Species from Metagenomic Reads of Next-generation Sequencing

  • Original Research Article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Read classification is a fundamental problem in metagenomics. With the development of next-generation sequencing (NGS), metagenome samples can be generated at much lower cost and in much less time. However, the short reads produced by NGS make read classification much more difficult than before. None of the existing tools can accurately assign NGS short reads to each genome, which limits their use in real applications. Fortunately, in many applications it is unnecessary to separate all the species in a metagenome sample from one another, because we usually focus only on certain specified species categories in the sample and do not care about the others. No existing tool is designed specifically for obtaining specified species from short metagenome reads generated by NGS. In this paper, we propose a tool named MetaObtainer to obtain the specified species from NGS short reads. The tool combines some of the newest techniques for processing short reads, so it can achieve better performance than other tools. It can (1) handle NGS reads shorter than 100 bp with very high accuracy (both precision and recall exceed 90%); (2) find unknown species using the reference genomes of similar species; (3) perform well when reads of the specified species are very few in the dataset; (4) handle genomes of similar abundance levels as well as different abundance levels (1:10); and (5) obtain multiple species categories from a metagenome sample.
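The accuracy figures in the abstract are stated as precision and recall. As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed when a tool extracts the reads of one specified species from a mixed sample. It is not MetaObtainer's code; the function name `precision_recall` and the toy read IDs are assumptions made purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved read IDs.

    retrieved: read IDs a tool assigned to the specified species.
    relevant:  read IDs that truly originate from that species.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved reads that are correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of the species' reads that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: these read IDs are invented, not from the paper.
target_reads = {"r1", "r2", "r3", "r4"}     # reads truly from the species
extracted_reads = {"r1", "r2", "r3", "r9"}  # reads reported by some tool
precision, recall = precision_recall(extracted_reads, target_reads)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, the claim that both values exceed 90% would mean that more than 90% of the extracted reads belong to the specified species and more than 90% of that species' reads are recovered.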

Acknowledgments

We thank Jiaoyun Yang, Pengyu Nie, and Xingxing Zhang, who provided many helpful suggestions for our article. Constructive comments from the reviewers are also appreciated. This work is supported by the National Natural Science Foundation of China (No. 61033009 and No. 60970085) and Foreign Scholars in University Research and Teaching Programs (B07033).

Author information

Corresponding author

Correspondence to Yun Xu.

About this article

Cite this article

Pan, W., Chen, B. & Xu, Y. MetaObtainer: A Tool for Obtaining Specified Species from Metagenomic Reads of Next-generation Sequencing. Interdiscip Sci Comput Life Sci 7, 405–413 (2015). https://doi.org/10.1007/s12539-015-0281-x

  • DOI: https://doi.org/10.1007/s12539-015-0281-x
