Review
Sequencing our way towards understanding global eukaryotic biodiversity

https://doi.org/10.1016/j.tree.2011.11.010

Microscopic eukaryotes are abundant, diverse and fill critical ecological roles across every ecosystem on Earth, yet there is a well-recognized gap in understanding of their global biodiversity. Fundamental advances in DNA sequencing and bioinformatics now allow accurate en masse biodiversity assessments of microscopic eukaryotes from environmental samples. Despite a promising outlook, the field of eukaryotic marker gene surveys faces significant challenges: how to generate data that are most useful to the community, especially in the face of evolving sequencing technologies and bioinformatics pipelines, and how to incorporate an expanding number of target genes.

Section snippets

Microscopic eukaryotes: global dominance, scant knowledge

Microscopic eukaryotic taxa are abundant and diverse, playing a globally important role in the functioning of ecosystems [1, 2] and host-associated habitats [3]. Here, we consider taxa generally represented by individuals <1 mm in size; the term ‘microscopic eukaryotes’ thus encompasses meiofaunal metazoans (e.g. Nematoda, Platyhelminthes, Gastrotricha and Kinorhyncha; see Glossary), microbial representatives of fungi and deep protist lineages (Alveolata, Rhizaria, Amoebozoa, algal taxa in the

Emerging insight from environmental data

Following earlier 16S rRNA (reference GenBank accession X80721.1 for Escherichia coli) investigations of archaeal and bacterial communities [14, 22], high-throughput marker gene approaches were developed for different groups of microscopic eukaryotes using the 18S nuclear small subunit rRNA gene (nSSU; reference GenBank accession X03680.1 for Caenorhabditis elegans), focusing on protists [11, 12, 23–26] and meiofauna [9, 10, 27]. Similar to 16S investigations, these early 18S studies

Analyzing high-throughput data

Over the past few years, high-throughput sequencing studies have been shaped by rapid progress in sequencing technology, bioinformatics tools and analytical pipelines. Here, we present an overview of the analytical considerations for high-throughput studies. Following sample collection, extraction of environmental DNA, PCR and sequencing (Figure 1), large data sets can be processed using many existing tools (Table 1, Figure 2).
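As one illustration of the processing step described above, the sketch below groups reads into operational taxonomic units (OTUs) by greedy, abundance-sorted centroid clustering, in the spirit of clusterers such as UCLUST and CD-HIT. It is a minimal sketch under stated assumptions, not either tool's actual algorithm: it assumes reads are already quality-filtered and trimmed, measures identity crudely as one minus the Levenshtein distance divided by the longer read's length (rather than from a proper alignment), and uses a 97% threshold only because it is a common convention. The function names edit_distance, identity and greedy_otus, and the toy reads, are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (or match at no cost)
            ))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity: 1 - edit distance / length of the longer read (assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Each unique read joins the first existing centroid it matches at
    >= threshold identity; otherwise it seeds a new OTU as its centroid."""
    counts = Counter(reads)  # dereplicate identical reads
    # Most abundant unique reads first; ties broken alphabetically for determinism
    uniques = sorted(counts, key=lambda s: (-counts[s], s))
    otus: dict[str, list[str]] = {}
    for seq in uniques:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].extend([seq] * counts[seq])
                break
        else:  # no centroid was close enough
            otus[seq] = [seq] * counts[seq]
    return otus


# Toy example (invented 40 bp reads): a single-mismatch variant and an unrelated read
base = "ACGTACGTAC" * 4
variant = "T" + base[1:]                 # 1 mismatch -> 97.5% identity
unrelated = ("T" * 10 + "G" * 10) * 2
otus = greedy_otus([base] * 5 + [variant] * 2 + [unrelated] * 3)
print({centroid[:10]: len(members) for centroid, members in otus.items()})
# {'ACGTACGTAC': 7, 'TTTTTTTTTT': 3}
```

Because each unique read is compared with centroids in the order they were created, the outcome depends on input order; sorting by abundance first lets the most abundant (and probably least error-prone) sequences seed OTUs. Comparing every read with every centroid is quadratic in the worst case, which is why production clusterers add fast word-based filtering heuristics.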

The need for robust guide trees and reference databases

Limited eukaryote reference databases and inconsistent taxonomic levels currently hinder the development of robust computational pipelines for marker gene data (e.g. reference-based OTU picking and confident taxonomy assignments [60]), and limit the use of tree-based methods and deeper sequencing technologies with shorter sequence reads (such as those derived from the Illumina platforms). Microscopic eukaryotic taxa have been historically underrepresented in public repositories, with some
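How sparse or inconsistently labelled references limit confident taxonomy assignment can be sketched with a toy nearest-reference classifier. This is a minimal sketch under stated assumptions, not the RDP naive Bayesian classifier or any specific pipeline: it scores a query against each reference by the Jaccard similarity of their 8-mer sets, and when several references score within a small margin of the best hit it reports only their lowest common ancestor. The function names, the thresholds min_similarity and tie_margin, and the sequences and labels in TOY_REFERENCES are all hypothetical and invented for illustration.

```python
def kmers(seq: str, k: int = 8) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def common_prefix(paths: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Lowest common ancestor of several root-to-tip taxonomy paths."""
    prefix: list[str] = []
    for ranks in zip(*paths):
        if all(rank == ranks[0] for rank in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def assign_taxonomy(
    query: str,
    references: list[tuple[str, tuple[str, ...]]],
    k: int = 8,
    min_similarity: float = 0.5,
    tie_margin: float = 0.02,
) -> tuple[str, ...]:
    """Assign the query to the LCA of its best-scoring reference hits."""
    if not references:
        return ("Unassigned",)
    query_kmers = kmers(query, k)
    scores = [(jaccard(query_kmers, kmers(seq, k)), taxon) for seq, taxon in references]
    best = max(score for score, _ in scores)
    if best < min_similarity:
        return ("Unassigned",)  # no sufficiently close reference
    # Keep every reference that scores (nearly) as well as the best hit
    hits = [taxon for score, taxon in scores if best - score <= tie_margin]
    return common_prefix(hits) or ("Unassigned",)


# Hypothetical toy reference set: sequences and labels are invented.
NEMATODE_SEQ = "GCTTAGCATCGGATCCGTAACTGGTACCAGTTGCAAT"
FUNGAL_SEQ = "TTGACCGGTAAGCTAGCCTAGGCATTCGATGCAAGGT"
TOY_REFERENCES = [
    (NEMATODE_SEQ, ("Eukaryota", "Metazoa", "Nematoda", "Enoplea")),
    (NEMATODE_SEQ, ("Eukaryota", "Metazoa", "Nematoda", "Chromadorea")),
    (FUNGAL_SEQ, ("Eukaryota", "Fungi", "Ascomycota", "Sordariomycetes")),
]

print(assign_taxonomy(NEMATODE_SEQ, TOY_REFERENCES))
# ('Eukaryota', 'Metazoa', 'Nematoda')
print(assign_taxonomy("A" * 30, TOY_REFERENCES))
# ('Unassigned',)
```

In the toy data the same sequence carries two conflicting class-level labels, so the query can only be placed in Nematoda, and a query with no close reference is left unassigned. Gaps and inconsistencies in the reference set therefore translate directly into shallower or missing assignments, whatever the classifier.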

Future outlook and challenges

Although substantial progress is being made with high-throughput eukaryotic studies, many challenges lie ahead. A strong emphasis on morphological and environmental data collection, guide trees and reference sequence databases, and open-access repositories for high-throughput data sets is urgently needed. Large-scale sequencing methods offer substantial promise for basic and applied biodiversity research, yet the wider adoption of these approaches will probably hinge on the ease of use and

Concluding remarks

The promise and accessibility of high-throughput sequencing are now poised to attract increasing numbers of non-computationally trained researchers. With ongoing declines in the price of sequencing, deep sequencing will inevitably represent the most cost-effective approach for elucidating ecological and functional roles of complex communities. However, exploiting the data will require the continued refinement of bioinformatics pipelines and database resources, which will in turn require an

Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments, which significantly helped to improve an earlier version of the manuscript. Development of this manuscript was made possible by a Catalysis Meeting award (HB and WKT) from the National Evolutionary Synthesis Center. HB and WKT were supported through NSF (DEB-1058458) and NIH (NIH-1P20RR030360-01). SC was supported by a Natural Environment Research Council (NERC) New Investigator Grant (NE/E001505/1), a Post Genomic and

Glossary

454
common term for the Roche GS platforms that use bead emulsion methods and typically return approximately 1.2 million sequences per full plate run (reads currently averaging 350–450 bp).
Illumina
company producing the newest HiSeq and MiSeq platforms, which use bridge amplification to produce 1.6 billion sequences per eight-lane HiSeq flow cell (current maximum length for paired-end reads is 300 bp).
Marker gene surveys
high-throughput environmental sequencing utilizing homologous genetic loci (e.g.

References (100)

  • A. Groisillier

    Genetic diversity and habitats of two enigmatic marine alveolate lineages

    Aquat. Microb. Ecol.

    (2006)
  • A. Rosling

    Archaeorhizomycetes: unearthing an ancient class of ubiquitous soil fungi

    Science

    (2011)
  • M.D.M. Jones

    Discovery of novel intermediate forms redefines the fungal tree of life

    Nature

    (2011)
  • S. Creer

    Ultrasequencing of the meiofaunal biosphere: practice, pitfalls, and promises

    Mol. Ecol.

    (2010)
  • D.L. Porazinska

    Evaluating high-throughput sequencing as a method for metagenomic analysis of nematode diversity

    Mol. Ecol. Resour.

    (2009)
  • T. Stoeck

    Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water

    Mol. Ecol.

    (2010)
  • T. Stoeck

    Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities

    BMC Biol.

    (2009)
  • N.R. Pace

    A molecular view of microbial diversity and the biosphere

    Science

    (1997)
  • M.L. Sogin

    Microbial diversity in the deep sea and the unexplored ‘rare biosphere’

    Proc. Natl. Acad. Sci. U.S.A.

    (2006)
  • L. Amaral-Zettler

    A global census of marine microbes

  • N. Fierer

    The influence of sex, handedness, and washing on the diversity of hand surface bacteria

    Proc. Natl. Acad. Sci. U.S.A.

    (2008)
  • P.J. Turnbaugh

    A core gut microbiome in obese and lean twins

    Nature

    (2009)
  • C.D. Prokopowich

    The correlation between rDNA copy number and genome size in eukaryotes

    Genome

    (2003)
  • A.Y. Pei

    Diversity of 16S rRNA genes within individual prokaryotic genomes

    Appl. Environ. Microbiol.

    (2010)
  • J.O. Andersson

    Lateral gene transfer in eukaryotes

    Cell. Mol. Life Sci.

    (2005)
  • J.A. Huber

    Microbial population structures in the deep marine biosphere

    Science

    (2007)
  • L. Amaral-Zettler

    A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA genes

    PLoS ONE

    (2009)
  • R. Medinger

    Diversity in a hidden world: potential and limitation of next-generation sequencing for surveys of molecular diversity of eukaryotic microorganisms

    Mol. Ecol.

    (2010)
  • W. Orsi

    Protistan microbial observatory in the Cariaco Basin, Caribbean. II. Habitat specialization

    ISME J.

    (2011)
  • V. Edgcomb

    Protistan microbial observatory in the Cariaco Basin, Caribbean. I. Pyrosequencing vs Sanger insights into species richness

    ISME J.

    (2011)
  • H.M. Bik

    Metagenetic community analysis of microbial eukaryotes illuminates biogeographic patterns in deep-sea and shallow water sediments

    Mol. Ecol.

    (2012)
  • V. Nolte

    Contrasting seasonal niche separation between rare and abundant taxa conceals the extent of protist diversity

    Mol. Ecol.

    (2010)
  • J. Reeder et al.

    Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions

    Nat. Methods

    (2010)
  • V.G. Fonseca

    Second-generation environmental sequencing unmasks marine metazoan biodiversity

    Nat. Commun.

    (2010)
  • A. Behnke

    Depicting more accurate pictures of protistan community complexity using pyrosequencing of hypervariable SSU rRNA gene regions

    Environ. Microbiol.

    (2011)
  • B. Lecroq

    Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments

    Proc. Natl. Acad. Sci. U.S.A.

    (2011)
  • A. Chariton

    Ecological assessment of estuarine sediments by pyrosequencing eukaryotic ribosomal DNA

    Front. Ecol. Environ.

    (2010)
  • M. Hajibabaei

    Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos

    PLoS ONE

    (2011)
  • M.E. Pfrender

    Assessing macroinvertebrate biodiversity in freshwater ecosystems: advances and challenges in DNA-based approaches

    Q. Rev. Biol.

    (2010)
  • B. Emerson

    Phylogeny, phylogeography, phylobetadiversity and the molecular analysis of biological communities

    Philos. Trans. R. Soc. B: Biol. Sci.

    (2011)
  • C. Quince

    Removing noise from pyrosequenced amplicons

    BMC Bioinform.

    (2011)
  • F.A. Matsen

    pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

    BMC Bioinform.

    (2010)
  • S.A. Berger et al.

    Aligning short reads to reference alignments and trees

    Bioinformatics

    (2011)
  • R.C. Edgar

    Search and clustering orders of magnitude faster than BLAST

    Bioinformatics

    (2010)
  • P.D. Schloss

    Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities

    Appl. Environ. Microbiol.

    (2009)
  • W. Li et al.

    Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

    Bioinformatics

    (2006)
  • Y. Cai et al.

    ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time

    Nucleic Acids Res.

    (2011)
  • J.R. Cole

    The Ribosomal Database Project: improved alignments and new tools for rRNA analysis

    Nucleic Acids Res.

    (2009)
  • T. DeSantis

    Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB

    Appl. Environ. Microbiol.

    (2006)
  • J.G. Caporaso

    QIIME allows analysis of high-throughput community sequencing data

    Nat. Methods

    (2010)