Main

The classification system developed by Carl Linnaeus extended to animals, plants and rocks. Linnaeus did not classify microbes, used in this Review to refer collectively and exclusively to Bacteria and Archaea1, but since the mid-nineteenth century, binomial Linnaean names have been used by microbiologists to designate microbial species. The species level is where multiple disciplines intersect, including microbial systematics, ecology, population genetics, evolution and genomics. The explosion of data during the genomic era has been accompanied by debates that threaten the existence of any cohesive overview of the genetic nature of microbial species. Furthermore, the newly appreciated importance of the frequency of recombination, either in the form of homologous recombination2 or lateral gene transfer (LGT)3, has challenged our concepts in each field and has raised many questions. It is difficult to integrate these diverse sources of information, because microbiologists lack a widely accepted theoretical species concept that is comparable to the biological species concept proposed by Ernst Mayr4. According to the biological species concept, the existence of many animal species and some plant species is ensured by the cohesive evolutionary forces that result from pre-zygotic and post-zygotic barriers between eukaryotic species5. Cohesive evolutionary forces are also necessary to generate microbial species, because in their absence the accumulation of genetic variation would probably result in a genetic continuum and microbial species would represent an artificial classification rather than a natural, circumscribed biological grouping3.

Microbial species are currently defined by a pragmatic, polyphasic approach that is based on clear rules for both genotypic and phenotypic properties6 (Box 1). This pragmatic approach has served the community well, resulting in more than 7,031 accepted microbial species (G.M. Garrity, personal communication), and is being adapted to the genomic era. However, as currently practised, this approach faces serious problems, because a primary criterion for distinguishing species is a certain cut-off level for pairwise genomic DNA–DNA hybridization levels. This cut-off level is not based on any particular theoretical justification, but instead was chosen 20 years ago to match pre-existing species definitions. In addition, similar to the situation with eukaryotes7,8, most subsequent debates among microbiologists on the topic of microbial species have focused on methodologies for their definition9,10,11,12,13, rather than on explanations for their existence. Surveys of microbial diversity have equated species with operational taxonomic units (OTUs), based on 16S ribosomal RNA (rRNA) sequences14. However, 16S rRNA possesses insufficient genetic resolution for the reliable binning of microbes into species, and it might be preferable to use the average sequence diversity between all orthologous genes in pairs of genome sequences12,15 (Box 1). Recommending whole-genome comparisons for the definition of species may seem premature, considering that only approximately 600 completed microbial genomes are currently available and that the diversity of uncultured microbes in the environment is high. However, more than 1,400 bacterial genomes are currently being sequenced; genomic sequences have been obtained from uncultured bacteria by metagenomic studies16,17,18,19 and after whole-genome amplification of single cells20,21; and the rapid development of re-sequencing technologies will soon increase the number of available genome sequences by several orders of magnitude22. 
Alternatively, it might be possible to replace pairwise DNA–DNA hybridization by the identification of discrete sequence clusters based on multiple core genes2,10,23. But these are technical issues that do not address primary conceptual questions, namely, what is a microbial species and do they truly exist3,13? In this Review, we propose that microbiologists should adopt an abstract species concept that seems to apply to all biological organisms, we discuss potential cohesive forces at the species level and we illustrate these concepts through selected examples of the genetic diversity within and between species.
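
The whole-genome measure mentioned above, the average sequence diversity between all orthologous genes in a pair of genomes, can be illustrated with a minimal Python sketch. It assumes that orthologues have already been identified and pairwise aligned; the function names, the dictionary layout (a hypothetical orthologue identifier mapped to an aligned sequence) and the unweighted per-gene mean are illustrative assumptions, not the procedure of any published tool.

```python
from typing import Dict


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical sites between two pre-aligned sequences.

    Alignment columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return matches / compared


def average_ortholog_identity(genome_a: Dict[str, str],
                              genome_b: Dict[str, str]) -> float:
    """Unweighted mean identity over genes present in both genomes.

    Keys are hypothetical orthologue identifiers; genes found in only
    one genome are ignored.
    """
    shared = sorted(set(genome_a) & set(genome_b))
    if not shared:
        raise ValueError("no shared orthologues")
    total = sum(pairwise_identity(genome_a[g], genome_b[g]) for g in shared)
    return total / len(shared)
```

Average sequence diversity is then one minus the returned identity. Weighting each gene by its aligned length, rather than equally, would be an equally defensible choice.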

A species concept for microbes

Various concepts have been suggested for microbial species (Box 1), but none have been generally accepted9, possibly because all of these concepts include methodological considerations. In contrast to concepts based on particular methods, the method-free unitary species concept of de Queiroz7 (and similar considerations by Hey8) seems to apply well to microbes — that is, “species are metapopulation lineages”. Metapopulations are “sets of connected subpopulations” that are “maximally inclusive” and the limits of which are set by evolutionary cohesive forces. A lineage can be thought of as a metapopulation that extends through time (Fig. 1), “occupies an adaptive zone minimally different from that of any other lineage in its range” and “evolves separately from all lineages outside its range”. Unlike other species concepts, “metapopulation lineages do not have to be phenotypically distinguishable, or diagnosable, or monophyletic, or reproductively isolated, or ecologically divergent, to be species. They only have to be evolving separately from other such lineages.” Microbes that form distinct groups owing to a cohesive force are metapopulation lineages and thus form species, whereas microbes without limits imposed by a cohesive force do not.

Figure 1: A unifying species concept that consists of a metapopulation lineage over time.

Theoretical histograms indicating the frequencies of binned levels of pairwise genetic divergence are shown. A metapopulation consists of one or more related populations at any one time; this schematic shows two populations. Samples taken at different time points, indicated in different colours, yield the same metapopulation, which indicates that it is a lineage, even if quantitative genetic divergence within and between each population varies owing to microevolution. Figure modified, with permission, from Ref. 7 © (2005) National Academy of Sciences.

Notably, the only criterion for a species according to this concept is its evolutionary fate, and no methodological criterion is required for assigning species designations. If this concept were accepted by microbiologists, it would still be necessary to agree on practical approaches and criteria by which species can be recognized, which might differ from current methodologies. Yet the following description of models of population structures in microbes, and observations on their microevolution, support our interpretation that such a species concept is necessary as a guide for the definition of species. Further experimental data and experience will be needed, however, to determine whether this concept is sufficient for this purpose.

Ecotypes and periodic selection

Various ecotype-based population-structure models have been proposed by Cohan24 that could result in cohesive evolutionary forces in microbes (Fig. 2a–e). For asexual microbes, all related inhabitants of a unique ecological niche are thought to belong to a single, stable ecotype, within which overgrowth by fitter variants (periodic selection) repeatedly eliminates the genetic diversity that accumulates over time (Fig. 2a). Periodic selection places limits on the genetic diversity within an ecotype, but does not prevent the divergence of other, non-competing ecotypes in distinct niches, nor does it limit their diversity. Ecotypes are equated with true biological groupings24,25. By contrast, some described species are thought to be too broad, as they encompass multiple ecotypes. Under special conditions, however, single ecotypes can also encompass multiple clusters of genotypes. For example, discrete genotype clusters that have diverged during geographical separation (allopatry) can compete owing to facilitated global transmission (Fig. 2b) and, when the population size is small, genetic diversity can persist owing to genetic drift, instead of being eliminated by periodic selection (Fig. 2c). Alternatively, homologous recombination between ecotypes can slow the elimination of genetic diversity (Fig. 2d) and LGT can facilitate the continual emergence of new ecotypes, with concomitant, continuous extinction of other competing ecotypes (Fig. 2e).

Figure 2: Theoretical models of population structure versus population structures of genetically monomorphic bacterial pathogens.

a–e | Five types of ecotype models have been previously described in detail10. Basic characteristics of these models are indicated by their trivial names (for example, the Stable ecotype model and the Geotype and Boeing model). E1 and E2 represent ecotypes; G1 and G2 represent genotypes. Colours reflect genetic ancestry. Solid lines indicate extant lineages that exist today, whereas dotted lines indicate extinct lineages that have disappeared owing to overgrowth during episodes of periodic selection. f | Minimal spanning tree of genotypes within Salmonella enterica subsp. enterica serovar Typhi based on mutation discovery over 88 kb among 105 strains that were isolated in Africa, Asia and South America. The most striking aspect of these data is the apparent continuity of genetic diversity, which argues against periodic selection. The numbers along the edges that connect haplotypes are numbers of single nucleotide polymorphisms (SNPs); unlabelled edges represent single SNPs. An additional striking aspect is the current existence of ancestral haplotypes, including the ancestral node H45, which is circled in red. Finally, there is a lack of geographic specificity and many old haplotypes are global in their distribution, indicating multiple events of global spread. g | Conceptual phylogenetic tree based on genomic synonymous SNPs within 156 strains of Yersinia pestis that resembles the expectations of genetic drift (c), rather than periodic selection. The tree has a three-branch structure and contains eight populations, but the number of ecotypes, if any, is unknown. Numbers along branches represent numbers of synonymous SNPs (as in f); the branches are not drawn to scale. h | Patterns that resemble the species-less model (e) are caused by repeated extinction of genotype clusters (called 'genoclouds') among 502 isolates of Neisseria meningitidis serogroup A subgroup III during pandemic spread of cerebrospinal meningitis36. 
However, the rapid, apparently random, successions of genoclouds are attributed to geographic bottlenecks during epidemic spread that amplify the first strain to pass the bottleneck, even if it is a sequence variant. Genoclouds are represented by horizontal rectangles, the widths of which indicate the time periods during which they were isolated. Vertical lines terminated by dots indicate rare sequence variants of each genocloud. Sequence clusters 3, 5 and 8 were isolated globally, whereas others were restricted to China (1 and 4), Russia (2) or Africa (6, 7 and 9). Panels a–e reproduced, with permission, from Nature Reviews Microbiology Ref. 10 © (2005) Macmillan Publishers Ltd. Panel f modified, with permission, from Ref. 32 © (2006) American Association for the Advancement of Science. Panel g modified, with permission, from Ref. 34 © (2004) National Academy of Sciences.

The ecotype model is highly attractive as it describes evolutionary and ecological principles that might account for microbial population structures and speciation. For example, different individual genetic clusters of Bacillus simplex, Bacillus subtilis and Bacillus licheniformis that were preferentially associated with shady and sunny slopes of 'Evolution canyon' seem to represent numerous distinct ecotypes25,26. The ecotype model has also been invoked to explain genotype clusters in pathogenic microbes; for example, ecotypes have been equated with host specificity within the Mycobacterium tuberculosis complex27. But we think that some details of the ecotype concept lack support from biological observations or are even contradicted by them. For example, one prominent application of the ecotype concept was to provide ecological meaning to sequence clusters within marine genera (Pelagibacter, Vibrio and Prochlorococcus) that seemed to be associated with distinct distributions of environmental parameters28. Various ecotypes of Prochlorococcus are thought to vary in abundance in seawater according to gradients of light, temperature and nutrients, possibly reflecting an association of individual ecotypes with particular genomic islands29. However, unlike Cohan's ecotypes24, in which sequence diversity is limited, the nucleotide divergence between marine sequence clusters can be large, and can be greater than the limits that are normally associated with a microbial species by current definitions. Possibly, a designation other than ecotype might be more appropriate for these groupings within marine microbes; alternatively, further modifications of the ecotype concept might be needed to accommodate them. A further problem for the ecotype predictions is posed by recent data, from a range of bacterial pathogens, that represent the most extensive analyses that are currently available on population structure within microbial species. 
As described below, these data raise questions about the details of Cohan's ecotype models, and may also require changes in these models.

Genetically monomorphic pathogens

Bacterial pathogens with low synonymous sequence diversity (DS of <0.0002), so-called genetically monomorphic organisms, are promising models to reveal evolutionary mechanisms, and could even be more informative than microbes with greater sequence diversity, in which millions of years of evolutionary history might have blurred genomic signals of phylogenetic history through recombination30 or eliminated them through genomic reduction31.

Based on their low levels of neutral sequence diversity, all Salmonella enterica subsp. enterica serovar Typhi (S. Typhi; the cause of typhoid fever) strains are thought to have descended from a common ancestor that infected humans 10,000–50,000 years ago32. Various genotypes have evolved over that time period. However, unlike the expectations for periodic selection, which should leave gaps between sequence clusters, many of these old genotypes apparently continue to exist today and form a continuum without gaps (Fig. 2f). In addition, although mutants resistant to nalidixic acid have been selected by extensive disease therapy with fluoroquinolones, none of the S. Typhi genotypes have become uniformly resistant to nalidixic acid, and local populations consist of a mixture of sensitive and resistant organisms32. This observation also argues against periodic selection, which should lead to complete replacement by fitter variants. Periodic selection may not occur in S. Typhi, as a few cases of typhoid fever result in a subsequent healthy carrier state and therefore S. Typhi should be protected from competition with bacteria from acute-phase disease. Alternatively, periodic selection might be so rare that all S. Typhi strains are derived from a single ecotype that has not undergone periodic selection over many millennia. S. Typhi illustrates two main problems with the ecotype concept: our lack of understanding of how rapidly sequence diversity is purged and the paucity of observations from nature that support complete purging.

Yersinia pestis, the cause of plague, illustrates a second distinct population structure. Y. pestis is thought to represent a clone of Yersinia pseudotuberculosis, a gastrointestinal pathogen that is transmitted by the faecal–oral route. Y. pestis became capable of flea-borne transmission between rodents owing to the acquisition of two plasmids 12,000 years ago33,34 (Fig. 2g). Strains of Y. pestis are almost indistinguishable from Y. pseudotuberculosis, and would not form a distinct species by the criteria that are normally used to differentiate species (Box 1). Yet Y. pestis does deserve its designation as a distinct species, because it is almost certainly evolving independently of Y. pseudotuberculosis, with a totally distinct ecological niche and mechanism of transmission. Based on the few neutral (synonymous) sequence polymorphisms that differentiate genomes of modern strains34, the population genetic structure of Y. pestis (Fig. 2g) resembles the predictions associated with genetic drift (Fig. 2c). However, similar to S. Typhi, the limits of putative ecotypes remain unclear: do all Y. pestis strains correspond to a single ecotype or is each population a distinct ecotype? Thus, currently, the predictions of the ecotype models are too imprecise to reliably identify ecotypes among microbial populations.

We now turn to Neisseria meningitidis serogroup A subgroup III, the causal agent of epidemic cerebrospinal meningitis, the population structure of which seems to resemble that of the species-less ecotype model (Fig. 2e). Sequence diversity in these organisms accumulates rapidly over a period of several years, largely owing to homologous recombination with other neisserial species or lineages of N. meningitidis35,36. The most frequent genotype and its sequence variants are referred to as a 'genocloud'. The diversity within genoclouds is only transient, owing to repeated purging during waves of pandemic spread (Fig. 2h). However, rather than periodic selection, these purification events were attributed to the bottlenecks that are associated with the migration of only a few organisms to new geographical areas36; migration is necessary for the continued existence of these bacteria because herd immunity leads to their elimination after several years in any single location. Genoclouds are also transient, being replaced by the progeny of particularly fit single-cell variants (founders) that yield new genoclouds. Bottlenecks and founder events could also explain the lack of genetic diversity within Mycobacterium bovis strains from Britain27 and provide an alternative mechanism to periodic selection for the cohesive forces that generate populations.

Genotype clusters at the species level

How well does the metapopulation species concept match patterns of genetic diversity in classical species? In the 1980s, microbiologists thought that species consisted of asexual clones, or clusters of related clones, that could be differentiated by neutral markers, such as electrophoretic allozymes37. Subsequent extensive comparisons of sequences from multiple housekeeping gene fragments (multilocus sequence typing; MLST) have yielded comparable results38. MLST data have been generated for ≥48 species (see All Species MLST databases in Further information), primarily to analyse their epidemiological and environmental diversity. Within each species, individual strains are assigned to sequence types (STs), which are then grouped into genotype clusters that consist of chains of STs that differ from each other at only one or two loci39. The levels of genetic diversity in MLST genotype clusters are comparable to, or greater than, those of the individual populations described above for genetically monomorphic pathogens, whereas the genetic diversity of an entire species is typically much greater (DS ≤ 0.25) than those of such populations. Currently, the largest MLST database, that of Neisseria spp., contains data from >8,000 isolates in 43 genotype clusters and >5,000 STs (Fig. 3f). Eight other MLST databases also contain large numbers of isolates, STs and genotype clusters, and the size of all MLST databases is growing rapidly. The neutral genetic diversity within these datasets indicates that many classical species contain multiple populations and may well correspond to metapopulation lineages.
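
The grouping of STs into genotype clusters described above (chains of STs that differ from each other at only one or two loci) can be sketched as single-linkage clustering of allelic profiles. The following Python sketch is a simplified illustration under stated assumptions: the profile format (one integer allele number per locus), the function names and the `max_diff` parameter are hypothetical, and the code is not an implementation of eBURST or of any MLST database's own procedure.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # one allele number per locus (assumed layout)


def loci_differing(p: Profile, q: Profile) -> int:
    """Number of loci at which two allelic profiles carry different alleles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q))


def genotype_clusters(profiles: Dict[str, Profile],
                      max_diff: int = 2) -> List[List[str]]:
    """Group STs into chains in which neighbours differ at <= max_diff loci.

    Uses union-find, so two STs end up in the same cluster whenever a
    chain of sufficiently similar STs connects them, even if the two
    endpoints themselves differ at many loci.
    """
    parent = {st: st for st in profiles}

    def find(st: str) -> str:
        while parent[st] != st:
            parent[st] = parent[parent[st]]  # path halving
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if loci_differing(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(m) for m in groups.values()),
                  key=lambda m: (-len(m), m))
```

Because linkage is transitive, the first and last STs of a chain can differ at most of their loci while still belonging to one genotype cluster, which is why the choice of `max_diff` strongly affects the result.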

Figure 3: Microbial species and sequence clusters.

a | Nearly continuous genetic clines have arisen via recombination between five ancestral populations in Helicobacter pylori. The multi-coloured rectangle contains 769 horizontal lines, one per isolate, each of which is composed of segments that are colour coded by ancestral source. The length of each segment is proportional to the proportion of ancestry from that source. Designations of modern populations are indicated at the left, whereas the colour codes for ancestral populations are at the right. b | NJ (neighbour joining) tree of concatenated sequences of H. pylori, which shows that the hpAfrica2 population is almost as distinct from the other populations of H. pylori as is the distinct species Helicobacter acinonychis. c | NJ tree of concatenated sequences of Escherichia coli, Escherichia albertii and Salmonella enterica subsp. enterica serovar Typhi. The green circle encloses the main population within E. coli, which contains 460 isolates (enclosed by a blue circle), plus a minor second population, which contains only 2 isolates. The second population recombines with the main population even though it is almost as distinct as E. albertii, a completely distinct species. d | NJ tree of concatenated sequences from S. enterica and Salmonella bongori. The italicized designations of clusters within S. enterica indicate distinct subspecies. The S. Typhi population appears to be almost as distinct as a subspecies owing to its having imported one allele by recombination from one of the subspecies. e | Bayesian tree of third-position sites within concatenated sequences of Neisseria spp. Three species, Neisseria meningitidis (red), Neisseria lactamica (blue) and Neisseria gonorrhoeae (green), are well resolved, but various others (other colours) are not. This poor resolution may reflect problems with taxonomic assignments or an inability to separate distinct species purely by genotype clusters. 
f | Illustration of the complexity of genetic diversity within a microbial species by eBurst analysis of N. meningitidis sequence types (STs). Each ST is represented by a single dot. Dots are arbitrarily arranged concentrically in seemingly continuous circles unless they are linked to other STs in a genotype cluster (fractal-like elements). Genotype clusters that were already known in 1998 (Ref. 85) are labelled with their prior designations. Panel a modified, with permission, from Nature Ref. 45 © (2007) Macmillan Publishers Ltd. Panel b modified from Ref. 52. Panel c modified, with permission, from Ref. 30 © (2006) Blackwell Publishing. Panel d modified, with permission, from Ref. 47 © (2006) Royal Society Publishing. Panel e modified, with permission, from Ref. 50 © (2005) BioMed Central. Data in panel f courtesy of B.G. Spratt and E.J. Feil, based on 5,811 STs containing 7,973 isolates from the PubMLST database (see Further information).

The predominance of asexual vertical inheritance within microbial species has been debated since 1993, when Maynard Smith et al.40 posed the deceptively simple question “How clonal are bacteria?”, and then concluded that homologous recombination has influenced bacterial population genetic structures39. Traces of homologous recombination within a species have now been observed in a wide range of microbes, including marine bacterioplankton41, cyanobacterial mats42, intracellular symbionts31 and Archaea16,23,43,44, but the frequency of recombinants varies between species. In one extreme example, recombination has been so frequent in Helicobacter pylori that almost each isolate defines a new ST45 and 50% of the genome can be replaced by homologous recombination within 40–2,000 years46. H. pylori consists of a number of populations that accompanied ancient human migrations and then differentiated owing to isolation by distance45. But subsequent recombination has been so frequent between H. pylori from neighbouring geographical areas that these old genetic boundaries are now starting to blur (Fig. 3a). Signs of frequent recombination have even been detected within species that were previously thought to be largely clonal, such as Escherichia coli and S. enterica, resulting in star-like trees that contain little phylogenetic information30,47 (Fig. 3c,d).

Distinct species are separated by large genetic distances that can act as barriers to homologous recombination, resulting in discontinuities in the genetic-distance distributions of multilocus sequences that delineate the boundaries of individual species2,47,48. Such major gaps can also result from other neutral processes, even in the absence of recombination, including the random extinction of individual clades2,49. Thus, the identification of genetic gaps within multilocus sequence data from related strains has been proposed to represent a promising approach for future biological species definitions10,28,49,50.
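
One simple way to look for the discontinuities in genetic-distance distributions described above is to compute all pairwise distances between concatenated multilocus sequences and locate the widest interval that contains no observed distance. The Python sketch below assumes pre-aligned sequences of equal length and uses the uncorrected proportion of differing sites as the distance; the function names are hypothetical, and a gap found in this way is only a candidate boundary, because such gaps can also arise from neutral processes.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def largest_gap(seqs: Dict[str, str]) -> Tuple[float, float]:
    """Return (lower, upper) bounds of the widest empty interval between
    consecutive distinct pairwise distances."""
    dists = sorted({p_distance(seqs[a], seqs[b])
                    for a, b in combinations(seqs, 2)})
    if len(dists) < 2:
        raise ValueError("need at least two distinct pairwise distances")
    return max(zip(dists, dists[1:]), key=lambda pair: pair[1] - pair[0])
```

Pairs of isolates with distances at or below the lower bound would fall within a candidate cluster, and pairs at or above the upper bound would fall between clusters. With real data, a histogram of the distances should be inspected before any such threshold is trusted.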

Delimited clusters of concatenated multilocus sequences do correlate well with classical species definitions for species of Burkholderia and Streptococcus50,51, and have been used to define species of Ferroplasma48. However, for other 'fuzzy' species, not all isolates fall into a sharply circumscribed genotype cluster because of occasional recombination between the species clusters23,48,50. For example, N. meningitidis, Neisseria gonorrhoeae and Neisseria lactamica form distinct genotype clusters, but strains assigned to various other Neisseria spp. simply form an ill-defined group of twigs at the tips of the tree (Fig. 3e). Few species have been tested rigorously by these criteria and additional analyses and extensive data from global sources are needed. However, it is already clear that current species definitions do not always correspond to single, circumscribed genotype clusters. Most H. pylori strains fall into one genotype cluster, but a second distinct genotype cluster (hpAfrica2) has been found in South Africa45. This second cluster is genetically almost as distinct from the primary cluster as is Helicobacter acinonychis, a separate species that arose by a host jump to large felines 200,000 years ago52 (Fig. 3b). Similarly, two distinct genetic populations exist within E. coli30, one of which is almost as distant from the main population as is Escherichia albertii (Fig. 3c). Another problem with this approach to species definition is that multilocus sequence data based on seven loci might not have sufficient resolution for reliably recognizing genotype clusters. For example, S. Typhi seems to be distinct from other serovars of S. enterica (Fig. 3d), but this reflects the import of one allele from a distinct S. enterica subspecies47 rather than speciation. The opposite situation is observed with sequence clusters that are based on rRNA sequences from environmental sources of Francisella spp., in which one sequence cluster extends over various classical species rather than being limited to a single species53. Finally, genetic differentiation between species is a gradual process by which genes that are involved in speciation, and proximate genes that are linked to them, differentiate more quickly than others that are less important for adaptation to different niches54.

If genotype clusters were to become a primary criterion for recognizing species boundaries, species containing several distinct genotype clusters would need to be subdivided into multiple species that each correspond to a single genotype cluster. However, we note that metapopulation lineages within the theoretical species concept do not preclude multiple genotype clusters, but simply dictate that multiple discrete populations within a metapopulation lineage do not evolve separately. Consistent with the metapopulation concept, homologous recombination continues to occur between genotype clusters in both H. pylori55 and E. coli30. Discrete populations (genotype clusters) can also represent stages in speciation, such that the criteria needed for their designation as a separate species have not yet been met, but might be in the future54.

LGT, gene loss and adaptation

Possibly the greatest challenge to the species concept3 has been driven by the recent recognition that LGT has occurred on innumerable occasions and potentially represents a major disruptive force that could invalidate the existence of microbial species. LGT has been recognized as a major cause of reduced susceptibility to antibiotics since the 1960s. For example, 28–67 kb long SCCmec genetic elements that encode resistance to meticillin and other antibiotics have been imported into Staphylococcus aureus on at least 20 independent occasions56. But many genes other than those involved in antibiotic resistance have also been imported by LGT, including selfish DNA on plasmids and prophages, resulting in unexpectedly high levels of variation in gene content within individual species. Genomic comparisons between strains of E. coli revealed that 8–21% of the genes within each genome were specific to single strains and many strain-specific genes were associated in genomic islands, each containing several genes54,57,58. Genomic islands are often associated with LGT and can introduce complete metabolic pathways and environmental adaptations through a single quantum event59,60. Genomic islands are probably widespread in microbes, and have also been identified by metagenomic studies of sea water29, marine sediments61 and acid-mine-drainage biofilms44.

The diversity in gene content between related microbial genomes can be strikingly high. Extrapolations from 8 genomes of Streptococcus agalactiae62 indicated that each additional genome would add at least 30 novel genes to the pan-genome. These extrapolations also indicated that the gene content of the pan-genome would still not be completely known after hundreds of genomes had been sequenced.
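
The quantity that underlies such extrapolations, the number of previously unseen genes contributed by each additional genome, can be illustrated with a short Python sketch. It assumes that each genome has already been reduced to a set of gene-family identifiers; the function names are hypothetical, and averaging over every possible genome order is an illustrative simplification that is only feasible for a handful of genomes. It is not the fitting procedure used in the cited study.

```python
from itertools import permutations
from typing import Iterable, List, Sequence, Set


def novel_genes_per_genome(genomes: Iterable[Set[str]]) -> List[int]:
    """Count the genes each genome adds to the pan-genome, in the given order."""
    seen: Set[str] = set()
    counts: List[int] = []
    for genes in genomes:
        new = set(genes) - seen
        counts.append(len(new))
        seen |= new
    return counts


def mean_novel_by_position(genomes: Sequence[Set[str]]) -> List[float]:
    """Average novel-gene count for the 1st, 2nd, ... genome added,
    taken over all orderings (factorial cost: small inputs only)."""
    n = len(genomes)
    totals = [0] * n
    orders = 0
    for order in permutations(genomes):
        for i, count in enumerate(novel_genes_per_genome(order)):
            totals[i] += count
        orders += 1
    return [t / orders for t in totals]
```

If the averaged count levels off at a value above zero as more genomes are added, the pan-genome keeps growing with every new genome, which is the pattern that the extrapolations above describe.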

The differential existence of variable genes or genomic islands in some strains is compatible with a concept of rapid environmental adaptation owing to LGT. However, with the notable exception of antibiotic resistance, little evidence exists for recent, real-time evolution that is due to frequent and ongoing LGT. Strains of E. coli in which the genomes have been compared57,58 differentiated millions of years ago30, and attempts to enumerate how often and when genes were introduced into an individual species indicate that most imported genes were probably introduced only rarely and before written history63,64. For example, one 'recently' introduced pathogenicity island (PAI), the cagPAI of H. pylori, seems to have been acquired only once, early in the history of this species, and most of the variably present genes in H. pylori seem to reflect differential gene loss65. Similarly, the well-known Salmonella pathogenicity islands 1 and 2 were apparently introduced into Salmonella spp. millions of years ago, long before sub-differentiation into modern subspecies and serovars54,66,67.

Although selective pressures can amplify rare genetic variants over extremely short time periods, and much more rapidly than neutral processes32,68, most sequence changes are eventually lost, especially if they reduce fitness. Examples of rapid disappearance owing to a lessening of fitness include: isolates of Mycobacterium tuberculosis that are resistant to rifampin69; variants of E. coli with increased adhesion to the bladder epithelium owing to changes in the FimH adhesin70; non-synonymous polymorphisms that arose in Pseudomonas aeruginosa during chronic infection of a human lung (cystic fibrosis)71; and antigenic variants in N. meningitidis that were selected by herd immunity36. Loss of function during adaptation to a new host has also been observed after a host jump of Helicobacter spp. that occurred 200,000 years ago52, as well as in other pathogens, mutualists and symbionts72. Thus, rather than reflecting adaptive increases in fitness, a considerable proportion of the genetic diversity between individual strains within a species, including genes introduced by LGT, is transient and lost over time63. Given that circumscribed sequence clusters have already been observed in numerous species, we conclude that the existence of occasional LGT is not sufficient grounds to cast doubt on the concept of microbial species.

Environmental microbes

The examples of microbial population structures described above are compatible with the metapopulation species concept, but almost all are from bacteria that infect animal hosts. Within vertebrate-associated microbes, including both pathogens and commensals, branches of genetic diversity increase exponentially in frequency near the tips of dendrograms, at the level of species or strains73. However, the frequency of appearance of new branches may be much more continuous within environmental isolates73, whether from terrestrial or aquatic environments, which might explain why doubt has been cast on the existence of microbial species among environmental isolates3. This problem is particularly acute because millions of microbial species are thought to exist that have not yet been cultivated74,75, and we are only beginning to appreciate just how much microbial genetic diversity exists in nature. More than 85 novel bacterial phyla have been discovered since 1987 through molecular tools of microbial ecology (Fig. 4). For most of these phyla, we have not yet cultured even one representative isolate nor recovered a single genome sequence (Fig. 4). Furthermore, nothing is yet known about the population structure, frequency of recombination and cohesive forces for the vast majority of microbes on the Earth.

Figure 4: Numbers of phyla and genomic sequences among Bacteria and Archaea since 1987.

Each schematic tree shows the numbers of known phyla — defined as groups of organisms that share <85% sequence identity in their 16S ribosomal RNA with other groups86 — that had been recognized by each indicated year. Each phylum is represented by a vertical line, the colour of which indicates whether any of its members have been cultivated (blue) or not (green). Coloured horizontal bars span phyla for which at least one genomic sequence is complete (red) or in progress (yellow). Numbers above each line indicate the numbers of phyla that they span. Genome data for 2007 were extracted from the GOLD Genomes OnLine Database87 (see Further information) in March 2007. Figure modified, with permission, from Ref. 88 © (2005) American Society for Microbiology.

Many descriptions of genetic diversity in environmental samples have been published, but most of these have attempted to estimate the numbers of distinct OTUs rather than examining the diversity within individual clusters of related sequences. However, recent metagenomic approaches are beginning to address this issue. For example, there has been extensive sequencing of microbial DNA from the ocean surface at various geographical locations19. Similarly, genomic sequence diversity has been compared between Prochlorococcus spp. from various depths of the ocean and different geographical sources29,76, and sequence diversity within one gene has been determined over a 1-year period among Vibrio splendidus strains from coastal bacterioplankton77. A crucial test of the metapopulation species concept would be facilitated by understanding the global population structure of multiple environmental microbial taxa that might equate to species. Unfortunately, such data are still rare18,43,78.
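The distinction drawn above, between counting OTUs and examining the diversity within one cluster of related sequences, can be made concrete with a small calculation. The following is a minimal sketch, assuming a set of pre-aligned, equal-length sequences that have already been assigned to a single cluster; it computes the mean pairwise proportion of differing sites. The function name and the toy sequences are hypothetical and are not taken from any of the cited studies.

```python
from itertools import combinations


def mean_pairwise_diversity(aligned_seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences.

    Assumes the sequences are already aligned and of equal length
    (an assumption of this sketch; no alignment is performed here).
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(aligned_seqs, 2):
        # Count mismatched positions for this pair and normalise by length.
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        total += mismatches / length
        n_pairs += 1
    return total / n_pairs


# Invented toy cluster of three aligned 10-bp sequences (illustration only).
toy_cluster = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
diversity = mean_pairwise_diversity(toy_cluster)
```

A count of OTUs would treat this toy cluster as a single unit, whereas the within-cluster value summarizes how much variation that unit contains, which is the kind of information a test of population structure would need.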

Implications of a method-free species concept

What would be the consequences of accepting the abstract concept of microbial species that we describe? First, the concept does not specify a 'magic' level of diversity. N. meningitidis is a metapopulation that consists of numerous, semi-discrete lineages that are linked by recombination50. E. coli is a metapopulation that contains two populations, one of which results from the admixture of four ancestral populations30. Second, this concept provides a rationale for maintaining the species designations for distinct ecotype-like groupings, such as Y. pestis, which is largely indistinguishable from Y. pseudotuberculosis at the sequence level34, but which represents a distinct lineage that is unlikely to merge with its parent owing to their distinct ecological niches. If Prochlorococcus ecotypes were linked by cohesive forces, such as recombination, this concept would also support grouping them into one species, despite their extremely pronounced sequence differences29. However, the metapopulation species concept does have a potentially negative aspect, namely, it would place the onus on traditional systematics to justify why 70% DNA–DNA hybridization or 95% average nucleotide identity should represent a magic and invariable definition of microbial species. Optimistically, analyses undertaken to meet this onus might reveal biological mechanisms that can explain these magic levels. Pessimistically, some species designations might need to be reconsidered if they did not reflect evolutionarily distinct metapopulation lineages, resulting in the splitting of some currently accepted taxa and the merging of others. The taxonomic issue of how to designate microbes that do not form a species would also need to be addressed.
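To illustrate why a fixed cut-off can behave like a 'magic' number, the following minimal sketch groups genomes by pairwise average nucleotide identity at a chosen threshold. Single-linkage grouping is an assumption of this sketch rather than a method prescribed in this Review, and the genome labels and identity values are invented for illustration.

```python
from itertools import combinations


def cluster_by_cutoff(names, identity, cutoff=95.0):
    """Single-linkage grouping: two genomes share a group if a chain of
    pairwise identities >= cutoff connects them (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if identity[frozenset((a, b))] >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise identities (%) between four invented genomes.
names = ["A", "B", "C", "D"]
identity = {
    frozenset(("A", "B")): 98.5,
    frozenset(("A", "C")): 95.2,
    frozenset(("B", "C")): 94.6,
    frozenset(("A", "D")): 80.1,
    frozenset(("B", "D")): 79.8,
    frozenset(("C", "D")): 80.4,
}

groups_at_95 = cluster_by_cutoff(names, identity, cutoff=95.0)
groups_at_96 = cluster_by_cutoff(names, identity, cutoff=96.0)
```

With these invented values, genome C joins A and B at a 95% cut-off but is separated from them at 96%, even though nothing about the underlying biology has changed. The grouping is a property of the threshold, which is why the onus falls on showing that a given level corresponds to real cohesive forces.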

The acceptance of a method-free species concept would free microbiologists to choose, from a range of techniques, those that are most suitable for the target organism, as the concept specifies neither a specific methodology nor a specific cut-off, but rather invokes a general evolutionary criterion for speciation. Such freedom would potentially allow the definition of species among some uncultivated microbes from the environment, while possibly demonstrating that others do not form species. A method-free concept would certainly stimulate methodological discussions that would need to be linked to evolutionary theory, a development that we fully endorse. However, changing to a theory-based approach to species definitions will not happen overnight. It will take considerable time for microbiologists to reach a consensus on the population genetic patterns that are reliable markers of the boundaries of a coherent microbial species. Time will also be needed for the examination of potential taxa, particularly among environmental samples. During this interim period, current methodologies will remain the only available tools for the definition of species and should be replaced only gradually, once it is clear which methodologies correlate most strongly with the metapopulation concept.

Our credo is that many microbes fall into natural species with cohesive properties, but that one size cannot fit all. Species are biological units, some simple and others more complicated. The extent of these biological units is determined by the environment and their genomic potential. Concerted, interdisciplinary, in-depth comparisons of the nature of various microbial species are needed to provide the data that are essential for microbiologists to focus on unifying principles and understand exceptions.