Introduction

Loss of genetic diversity is a major concern in conservation and evolutionary biology, as genetic diversity is the raw material upon which natural selection acts to produce adaptive evolutionary change. Further, it is related to inbreeding and loss of reproductive fitness in random mating populations and ultimately to elevated extinction risks.

Neutral theory (Kimura, 1983) is widely used to predict changes in genetic diversity over time and to predict equilibrium levels for populations and species. However, Maynard Smith and Haigh (1974) and Gillespie (2000) proposed that linked selection (hitchhiking or genetic draft) may be a more important stochastic factor than genetic drift in natural populations. Thus, quantitative analyses of how well neutral theory and genetic draft predict genetic diversity across a range of circumstances are necessary.

I review recent insights into loss of genetic diversity in finite populations, concentrating on the magnitude of deviations from neutral predictions. First, I review data on correlations between genetic diversity and population size. Second, I evaluate the magnitude of deviations from neutral predictions for loss of genetic diversity in small populations over short durations (50 generations). Third, I evaluate the effects on equilibrium genetic diversity of selection on linked loci for chromosomes with low versus relatively normal recombination. To allow quantitative comparisons among different chromosomes and genomes, data on the third issue are converted to a common currency: the ratio of observed diversity to that predicted by neutral theory (adjusted for copy number and mutation rate differences, as described below), for which the neutral expectation is 1.

If coding, non-coding and synonymous sites commonly behave non-neutrally, as the results indicate, there are broad implications, including the following:

  • Predictions based on neutral theory in conservation and evolutionary biology, and in animal and plant breeding, may often be inaccurate.

  • Estimates of effective population sizes may vary for different regions of genomes, especially those with different recombination rates (Charlesworth, 2009).

  • Estimates of migration rates may vary for different regions of the genome.

  • Phylogenies inferred from sequence data on non-recombining chromosomes (for example, mitochondrial (mtDNA) and chloroplast DNA (cpDNA)) may be distorted by selection.

  • Molecular clock dating may be incorrect.

  • Tests for selection based on the use of non-coding loci or synonymous substitutions may miss selected loci and provide inaccurate estimates of selection coefficients.

Predictions of neutral theory

For neutral loci, the proportion of initial genetic diversity (expected heterozygosity) retained after t generations (Ht/H0) in a diploid population is predicted to be related to effective population size (Ne), generations (t) and the inbreeding coefficient at generation t (Ft) in random mating populations, as follows (see Wright, 1969; Falconer and Mackay, 1996):

$$\frac{H_t}{H_0} = \left(1 - \frac{1}{2N_e}\right)^{t} = 1 - F_t \tag{1}$$

In the long term, the predicted equilibrium genetic diversity (He) due to the balance between neutral mutations and random genetic drift for the infinite alleles model is (Crow and Kimura, 1970):

$$H_e = \frac{4N_e u}{4N_e u + 1} \tag{2}$$

where u is the neutral mutation rate. The factor 4 in equation (2) is altered to 1 for Y and W chromosomal loci, chloroplast DNA (cpDNA) and mtDNA, and to 3 for loci on X and Z chromosomes in species with separate sexes (assuming random variation in offspring numbers, an equal sex ratio and transmission of single mtDNA and cpDNA genomes per female gamete).
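As a numerical illustration of equations (1) and (2), the sketch below assumes the standard forms Ht/H0 = (1 − 1/(2Ne))^t and He = kNeu/(kNeu + 1), with k = 4 for autosomal loci, 3 for X or Z loci and 1 for Y, W, mtDNA and cpDNA loci. The function names, the dictionary keys and the example parameter values are hypothetical and chosen only for illustration.

```python
# Copy-number factor k in He = k*Ne*u / (k*Ne*u + 1), following the text.
# The keys are illustrative labels, not terms from the source.
COPY_FACTOR = {"autosome": 4, "X_or_Z": 3, "Y_W_or_organelle": 1}


def proportion_retained(ne: float, t: int) -> float:
    """Expected Ht/H0 for a neutral locus after t generations (equation 1)."""
    if ne <= 0 or t < 0:
        raise ValueError("ne must be positive and t non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** t


def equilibrium_diversity(ne: float, u: float, locus: str = "autosome") -> float:
    """Expected mutation-drift equilibrium heterozygosity (equation 2)."""
    theta = COPY_FACTOR[locus] * ne * u
    return theta / (theta + 1.0)


if __name__ == "__main__":
    # Illustrative values only: Ne = 50 maintained for 50 generations.
    print(round(proportion_retained(50, 50), 3))
    # A single-pair bottleneck (Ne = 2) lasting one generation.
    print(proportion_retained(2, 1))
    # Same Ne and u, different copy-number factors.
    for locus in COPY_FACTOR:
        print(locus, round(equilibrium_diversity(1e5, 1e-6, locus), 3))
```

For the same Ne and u, the organelle and Y or W expectations are lower than the autosomal one simply because k is smaller, which is why the ratios reviewed later are adjusted for copy number before being compared with 1.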

Predicted impacts of selection at linked loci on genetic diversity

The equations above are based upon single locus models that ignore the impacts of selection at linked loci. However, there is now substantial theoretical and empirical evidence that linked selected loci often affect genetic diversity of nearby inherently neutral loci, owing to associative balancing selection (see Latter, 1998; Charlesworth, 2006), selective sweeps (hitchhiking) (see Maynard Smith and Haigh, 1974) and background (purifying) selection (see Charlesworth et al., 1993).

Directional selection

When a new favourable mutation goes to fixation (positive selection) it will remove initial genetic diversity for all loci on a non-recombining chromosome within a closed population. This effect was first designated as periodic selection in asexual bacteria (Atwood et al., 1951; Cohan, 2005) and later referred to in eukaryotes as hitchhiking (Maynard Smith and Haigh, 1974), selective sweeps (Berry et al., 1991) or genetic draft (Gillespie, 2000). The impact of selective sweeps on long-term effective population size (Nl), and thus on genetic diversity (through equations (1) and (2)) in a non-recombining segment of chromosome, is predicted to be (Gillespie, 2000):

$$N_l = \frac{N_e}{1 + 2N_e\delta} \tag{3}$$

where Ne is the effective population size in the absence of selective sweeps and δ is the rate of selective sweeps. Selective sweeps also reduce genetic diversity in regions with recombination but to a lesser degree, as recombination breaks down linkage disequilibrium (Gillespie, 2000). There are many examples of reduced genetic diversity in regions flanking loci subject to directional selection (see Nurminsky, 2005). For example, reduced diversity occurs around the tb1 locus involved in maize domestication (Clark et al., 2004), around the waxy locus involved in domestication of japonica rice (Olsen et al., 2006) and around the lactase locus in human populations that adopted dairying and evolved adult lactose persistence (Burger et al., 2007). Sweeps originating from new favourable mutations are referred to as hard sweeps, whereas those due to pre-existing polymorphisms are referred to as soft sweeps and typically have lesser impacts (Hermisson and Pennings, 2005).
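The effect of recurrent sweeps on a non-recombining segment can be illustrated numerically. The sketch below assumes the form Nl = Ne/(1 + 2Neδ), which is the no-recombination case of Gillespie's (2000) pseudohitchhiking result; the function name and the parameter values are hypothetical.

```python
def long_term_ne(ne: float, delta: float) -> float:
    """Long-term effective size Nl with sweeps at rate delta per generation.

    Assumes Nl = Ne / (1 + 2*Ne*delta) for a non-recombining segment.
    """
    if ne <= 0 or delta < 0:
        raise ValueError("ne must be positive and delta non-negative")
    return ne / (1.0 + 2.0 * ne * delta)


if __name__ == "__main__":
    # Illustrative sweep rate only: one sweep per 10,000 generations.
    for ne in (1e3, 1e5, 1e7):
        print(int(ne), round(long_term_ne(ne, delta=1e-4)))
```

Under this assumed form, Nl approaches 1/(2δ) as Ne becomes large, so the diversity of a non-recombining segment becomes nearly insensitive to population size once sweeps are frequent relative to drift.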

Natural selection against new deleterious mutations also reduces genetic diversity at linked loci (Charlesworth et al., 1993; Charlesworth, 1996). The proportionate reduction in nucleotide diversity (π/π0) at a neutral locus owing to background selection depends upon the deleterious mutation rates (ud), the selection coefficients against heterozygotes for deleterious alleles (s) and the recombination rates (c), and the effects are summed across all linked loci (i) that mutate to deleterious alleles, as follows (Nordborg et al., 1996):

$$\frac{\pi}{\pi_0} \approx \exp\left(-\sum_i \frac{u_{d,i}}{s_i\left[1 + (1 - s_i)\,c_i/s_i\right]^{2}}\right) \tag{4}$$

This is a weak form of selection that reduces Ne. Over many generations it reduces genetic diversity at linked loci, especially in regions of low recombination.
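To show how recombination moderates background selection, the sketch below assumes the approximation π/π0 ≈ exp(−Σ ud/(s[1 + (1 − s)c/s]²)), summed over linked selected loci, as given by Nordborg et al. (1996). The function name and all parameter values are hypothetical.

```python
import math
from typing import Iterable, Tuple


def background_selection_ratio(loci: Iterable[Tuple[float, float, float]]) -> float:
    """Return pi/pi0 given one (ud, s, c) tuple per linked selected locus.

    ud: deleterious mutation rate at the locus
    s:  selection coefficient against heterozygotes (must be > 0)
    c:  recombination rate between the locus and the neutral site
    """
    total = 0.0
    for ud, s, c in loci:
        if s <= 0:
            raise ValueError("s must be positive")
        total += ud / (s * (1.0 + (1.0 - s) * c / s) ** 2)
    return math.exp(-total)


if __name__ == "__main__":
    # Illustrative values: 100 selected loci, ud = 1e-5 and s = 0.01 each.
    no_recombination = [(1e-5, 0.01, 0.0)] * 100
    some_recombination = [(1e-5, 0.01, 0.01)] * 100
    print(round(background_selection_ratio(no_recombination), 3))
    print(round(background_selection_ratio(some_recombination), 3))
```

With c = 0 each term reduces to ud/s, so the reduction depends only on the total deleterious mutation rate relative to s; any recombination shrinks every term and moves π/π0 back towards 1.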

In what follows, I typically do not distinguish the impacts of selective sweeps from those of background selection, as both reduce genetic diversity compared with neutral expectations and both usually operate simultaneously. In addition, selection at one locus impedes selection response at linked loci, especially in regions of low recombination (Hill and Robertson, 1966), and interacts with other forms of selection, especially background selection (Charlesworth et al., 2009).

Balancing selection

Balancing selection in a region will usually lead to nearby neutral loci retaining more genetic diversity than expected with neutrality (see Latter, 1998; Charlesworth, 2006). Associative balancing selection can be due to linked loci showing balancing selection or to the short-term effects of deleterious alleles at several loci in linkage disequilibrium (Latter, 1998), which contrast with the long-term effects involved in background selection. For example, loci near self-incompatibility loci (which are subject to frequency-dependent selection) in Arabidopsis lyrata, but not involved in self-incompatibility themselves, have elevated levels of genetic diversity compared with other loci in the genome (Kamau et al., 2007; Ruggiero et al., 2008). Similarly, the major histocompatibility complex in vertebrates and the complementary sex-determining locus in Hymenoptera are both subject to balancing selection, and flanking sequences that are not themselves subject to selection also show elevated levels of genetic diversity (O’hUigin et al., 2000; Hasselmann and Beye, 2006).

Correlations of genetic diversity with population size and fitness

Relationships of genetic diversity and effective population size

Equations (1) and (2) predict that Hardy–Weinberg expected genetic diversity across populations will be positively correlated with effective population sizes, provided mutation rates are similar. This prediction has been verified experimentally in pedigreed Drosophila populations for allozymes (correlation between He and log Ne, r=0.59; Montgomery et al., 2000) and microsatellites (r=0.91; Montgomery et al., 2010). Further, Palstra and Ruzzante (2008) reported a correlation of 0.73 between He for microsatellites and log Ne across 26 closed populations of diverse species.

Effective population sizes and Ne/N ratios

For most species Ne is unknown, so inferences are often based upon census sizes (N) and Ne/N ratios from other species. Based on a meta-analysis, ratios in unmanaged wild populations with all relevant variables included averaged 0.11 (Frankham, 1995). No consistent significant differences were detected in ratios across a broad range of major taxa, indicating that positive correlations between genetic diversity and census population sizes are expected. Palstra and Ruzzante (2008) reviewed temporal estimates of Ne/N and reported a median value of 0.14, based on 64 estimates from diverse animal and plant taxa. However, species with high fecundity have significantly reduced Ne/N ratios, on the order of 10^−3 to 10^−6, based on data from fish, oysters, shrimp and seaweed (see Coyer et al., 2008; Palstra and Ruzzante, 2008). Presumably variances in family sizes increase with average fecundity, as Ne/N ratios decline as variances in family sizes increase (Wright, 1969).
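The dependence of Ne/N on family-size variance can be illustrated with the textbook approximation for a population of stable size, Ne ≈ (4N − 2)/(Vk + 2), where Vk is the variance in lifetime family size. This formula is an assumption introduced here for illustration, and the values of N and Vk below are hypothetical.

```python
def ne_from_family_variance(n: float, vk: float) -> float:
    """Approximate Ne for a stable population of N breeders.

    Assumes Ne = (4N - 2) / (Vk + 2), with mean family size 2 and
    variance in family size Vk.
    """
    if n <= 0 or vk < 0:
        raise ValueError("n must be positive and vk non-negative")
    return (4.0 * n - 2.0) / (vk + 2.0)


if __name__ == "__main__":
    n = 1000
    # Vk = 2 corresponds to Poisson-distributed family sizes.
    for vk in (2, 20, 2000):
        ne = ne_from_family_variance(n, vk)
        print(vk, round(ne, 1), round(ne / n, 4))
```

With Poisson variation Ne is close to N, whereas very large variances of the kind plausible in highly fecund species drive the ratio down by orders of magnitude.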

Relationship of genetic diversity to census population size

Equation (2) predicts that genetic diversity will be positively correlated with census population sizes, provided current N reflects long-term Ne, and that mutation rates are similar across populations. Empirical estimates of these correlations are overwhelmingly positive (Table 1). Soulé (1976) reported a correlation of 0.7 between genetic diversity for allozymes and logarithm of N across animal species. Subsequent studies that had sufficient statistical power (particularly meta-analyses) have confirmed Soulé's conclusions across a broad range of major taxa and population sizes (up to 10^20).

Table 1 Correlations between genetic diversity and population size, or its surrogates

Threatened species have, by definition, small or declining population sizes, so are expected to have reduced genetic diversity. Threatened species across a broad taxonomic range had on average 35% less genetic diversity than taxonomically related non-threatened species (Spielman et al., 2004), whereas the reduction was 25% for birds (Evans and Sheldon, 2008) and 30% for tetrapods (Flight, 2010).

A significant correlation between genetic diversity and log N within species was also reported by Frankham (1996), based on a meta-analysis (mean r=0.46). Correlations between genetic diversity and population size are not always significant, but this seems to be due primarily to statistical power. In my meta-analysis, significant correlations were found in only seven studies, but 22 of the 23 reported correlations were positive (highly significant sign test). Several studies have also reported correlations between surrogates of population size and genetic diversity, such as range size in plants, island area (both positive), rates of chromosomal evolution and body size (both negative) (see Frankham, 1996; Table 1). Further, average genetic diversity of island populations (presumed to have smaller N) is lower than that for mainland populations (Frankham, 1997).

Populations subject to short size bottlenecks typically show reduced genetic diversity for molecular markers, compared with non-bottlenecked populations (see England et al., 2003; Garner et al., 2005; Frankham et al., 2010, Chapter 8). In Drosophila, the effect of a single pair bottleneck on microsatellite diversity was close to the theoretical reduction of 25% in genetic diversity (England et al., 2003) and the expected reduction in allelic diversity (Frankham et al., 2010, p. 172).

Positive correlations between genetic diversity and population size are also expected for loci that are subject directly to weak balancing selection (Robertson, 1962). These have been observed for loci subject to balancing selection (Table 1), including the major histocompatibility complex in vertebrates, self-incompatibility loci in plants and inversions in Drosophila (Montgomery et al., 2000).

Overall, there is overwhelming evidence for positive correlations between genetic diversity and population size for nuclear markers. The correlations are less than 1, but this is expected owing to sampling variation and variation in Ne/N ratios.

However, the relationship between mtDNA genetic variation and population size is controversial (Table 1). Frankham (1996) reported a correlation of 0.45, based on data for 18 vertebrate populations from 12 species. Conversely, Bazin et al. (2006) reported a non-significantly negative correlation (Kendall τ=−0.14) between mtDNA sequence diversity and nuclear allozyme genetic diversity across eight major animal groups encompassing 1683 species, while finding a significant positive correlation between nuclear DNA diversity and allozyme diversity (τ=0.87) across the same groups. Subsequent analyses using the same data set revealed significant positive correlations between mtDNA and allozyme genetic diversity across eutherian mammalian orders (Mulligan et al., 2006) and within mammals (Nabholz et al., 2008b). Several other studies have also reported positive correlations between mtDNA diversity and population size or its surrogates (Table 1), but these have all involved data across a narrower taxonomic range than used by Bazin et al. (2006).

Selection and variation in mutation rates have been suggested as the reason for the equivocal relationship between mtDNA diversity and population size (Bazin et al., 2006; Eyre-Walker, 2006). First, there are multiple lines of evidence that selective sweeps and/or background selection affect mtDNA (see Rand, 2001; Ballard and Rand, 2005; Bazin et al., 2006; Kivisild et al., 2006; Meiklejohn et al., 2007; Wares, 2009).

Second, mtDNA mutation rates differ across animal taxa (Nabholz et al., 2008a, 2009, but see Charlesworth, 2010) and Bazin et al. (2006) did not correct for such differences. mtDNA silent site substitution rates and presumably mutation rates fall into high (flatworms, molluscs, annelid worms, bryozoans, arthropods, nematodes, echinoderms, tunicates and vertebrates) and low (angiosperms, fungi, sponges, corals, sea fans and Medusozoa) rates, with the former rates being about 10 times nuclear substitution rates and the latter about an order of magnitude slower (Hellberg, 2006). Further, there have been several independent transitions from slow to fast rates, in both plants and animals, possibly owing to inactivation of mtDNA proof-reading or repair enzymes. Within mammals, mtDNA shows mutational hotspots and site-specific mutation rates that vary rapidly over time (Galtier et al., 2006), whereas plants show rates that vary among species, over time and between loci (see Sloan et al., 2009). Evolutionary changes in mutation rates are well known in asexual bacteria (see Denamur and Matic, 2006). High mutation rates are favoured during adaptation to new environments, and lowered rates once the costs of deleterious mutations exceed the benefits from new advantageous mutations, as predicted by theory (Leigh, 1970). Mutation rates are also expected to evolve in a related manner in mtDNA, cpDNA and other non-recombining chromosomes and species, provided that loci controlling mutation rates lie within the genome being affected. Conversely, in recombining genomes, mutation rates typically evolve towards minima (see Sniegowski et al., 2000).

Correlations between fitness and genetic diversity

As loss of genetic diversity in random mating populations is directly related to the population average inbreeding coefficient (equation (1)), and inbreeding has deleterious impacts on reproductive fitness that are approximately linearly related to F (see Lynch and Walsh, 1998), a positive correlation is expected between population average genetic diversity and population average fitness for small populations in similar environments. This prediction has been supported in meta-analyses of data from animal and plant species (r=0.45 (Reed and Frankham, 2003); r≈0.3 (Leimu et al., 2006); r=0.40, 0.49 (Markert et al., 2010)).

By contrast, little relationship is expected between individual multi-locus genetic diversity for near-neutral loci and reproductive fitness within populations, unless there is heterozygote advantage, inbreeding or population structure. A positive correlation is expected if the heterozygote fitness for marker loci exceeds the weighted mean fitness of the homozygote genotypes (Deng and Fu, 1998). Meta-analyses have revealed only very weak relationships (Britten, 1996; David, 1998; Coltman and Slate, 2003; Chapman et al., 2009). For example, Chapman et al. (2009) found a correlation of only 0.036 and effects that did not fit causal relationships between individual markers and fitness, based on a meta-analysis of 628 estimates. Further, Szulkin et al. (2010) concluded the available data are qualitatively and quantitatively consistent with the inbreeding hypothesis, based upon theoretical analyses and a review of the empirical evidence.

Deviations from neutral expectations for loss of genetic diversity over generations

Quantitative analyses reveal that there are frequently significant deviations from neutral predictions for changes in genetic diversity over generations in finite populations. Rigorous tests for deviation from neutrality that do not rely on extraneous assumptions can be performed by regressing Ht/H0 on the pedigree inbreeding coefficient of populations: neutrality yields a slope of −1 (equation (1)), whereas directional selection results in faster than neutral declines in genetic diversity, and balancing selection usually results in slower than neutral declines.
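The regression test can be sketched as an ordinary least-squares fit of Ht/H0 on the pedigree inbreeding coefficient, using the relation Ht/H0 = 1 − F so that the neutral slope is −1. The data below are hypothetical and exact by construction; they are not values from any of the studies cited, and the function name is illustrative.

```python
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical pedigree inbreeding coefficients for five populations.
    fp = [0.05, 0.10, 0.20, 0.30, 0.40]
    neutral = [1.0 - f for f in fp]          # exact neutral expectation
    slower = [1.0 - 0.8 * f for f in fp]     # slower than neutral decline
    faster = [1.0 - 1.1 * f for f in fp]     # faster than neutral decline
    for label, ht_h0 in (("neutral", neutral), ("slower", slower), ("faster", faster)):
        slope, intercept = ols_slope_intercept(fp, ht_h0)
        print(label, round(slope, 3), round(intercept, 3))
```

In practice the fitted slope would be compared with −1 using its standard error; a slope between −1 and 0 indicates a slower than neutral decline and a slope below −1 a faster one.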

Slower than neutral declines in genetic diversity

Declines of average allozyme genetic diversity with pedigree inbreeding coefficients (Fp) are often slower than predicted by neutral theory (see Rumball et al., 1994; Gilligan et al., 2005). For example, the regression coefficient of Ht/H0 on Fp for 40 captive Drosophila populations maintained for 50 generations with diverse effective population sizes (all derived from the same wild population) was −0.79±0.10, significantly slower than the neutral expectation of −1 (Gilligan et al., 2005). These slower declines could be due to either balancing selection on the loci themselves (see Kreitman and Hudson, 1991) or short-term associative balancing selection (see Latter, 1998; Charlesworth, 2006). If associative balancing selection is involved, the direction of deviations from neutrality for different markers will be the same. As microsatellites showed a faster than neutral decline (Figure 1), the associative balancing selection hypothesis was rejected. This implicates balancing selection on the allozymes themselves. As temporal or spatial variations in selection are improbable in our constant laboratory environment, heterozygote advantage or frequency-dependent selection are the most probable explanations for the allozyme results. The above conclusions relate to average genetic diversity and concealed some allozyme loci showing neutral behaviour and others subject to selective sweeps (Montgomery et al., 2010). Molecular studies have detected selection on several allozyme loci with intermediate frequency polymorphisms in wild populations (see Eanes, 1999; Hey, 1999). However, our results reflect only the last 50 generations in a different, captive environment. Analyses indicate that balancing selection probably affects only a small proportion of loci in human genomes (Asthana et al., 2005; Bubb et al., 2006; Andrés et al., 2009) and of mouse allozyme loci (Storz and Nachman, 2003).

Figure 1

Relationship between proportion of microsatellite genetic diversity retained (Ht/H0) and pedigree inbreeding coefficient (Fp) for 23 populations maintained with effective population sizes of between 25 and 500 for 48 generations (from Montgomery et al., 2010). The solid line is the fitted regression to the data and the dotted line is the neutral expectation.

Selective sweeps in populations adapting to new environments

Genetic diversity for non-coding microsatellites declined at a 12% faster than neutral rate in 23 Drosophila populations adapting to captivity over 48 generations (Figure 1; Montgomery et al., 2010). Further, allele frequency changes were 33% greater than the neutral expectation, and variation among replicate populations was 25% greater than predicted by neutrality. Direct selection on the microsatellite loci themselves would have caused drift variances among replicates to be less than neutral predictions. Our populations experienced conditions that were highly conducive to selective sweeps. Adaptations to captivity were highly deleterious when populations were returned to simulated wild conditions (Woodworth et al., 2002), indicating that adaptation was due largely to initially rare alleles that were deleterious in the wild, but beneficial in captivity. Linkage disequilibrium around alleles in mutation–selection balance is expected because of the turnover of deleterious alleles. All eight microsatellite loci spread throughout the Drosophila genome showed at least some signals of selective sweeps, implying that the effects were genome-wide. Given the short duration and the large deviations from neutral predictions, it is improbable that background selection caused the deviation from neutrality (Charlesworth, 1996). Selective sweeps are expected in all genetically variable populations subject to environmental change (Montgomery et al., 2010). Widespread selective sweeps across genomes may be missed in many current analyses, such as those based on outlier analyses (Hahn, 2007), as regions affected by sweeps may be subsumed into the common ‘control’ group, and only sequences with extreme behaviour identified as selected. Rigorous controls, such as those provided by pedigrees, provide powerful means for detecting widespread selective sweeps.

Low genetic diversity in regions with low recombination rates

Nucleotide diversity across chromosomes is positively correlated with recombination rate in Drosophila (Begun and Aquadro, 1992; Shapiro et al., 2007; Kulathinal et al., 2008), humans (Hellmann et al., 2005), white-throated sparrows (Huynh et al., 2010), tomatoes (Stephan and Langley, 1998) and maize (Tenaillon et al., 2001), but not in A. lyrata (Wright et al., 2006). In several Drosophila species, nucleotide diversity differs by 10-fold between regions of high and low recombination (Stephan et al., 1992; Aquadro et al., 1994; Begun et al., 2007). Selective sweeps and/or background selection could account for these results.

The hypothesized relationship between genetic diversity and recombination leads to the prediction that genetic diversity should be lower than expected from neutral theory for loci on chromosomes with very low recombination (W and Y chromosomes, mtDNA and cpDNA in eukaryotes, and small fourth chromosomes in several Drosophila species), provided the chromosomes retain some functional loci. Further, translocation of such a chromosome to a region of normal recombination should lead to a higher equilibrium level of genetic diversity.

By contrast, any deviations from neutrality owing to recombination rate differences between X chromosomes and autosomes should be modest and of variable direction, as the effective rates of recombination on X chromosomes versus autosomes should be 2/3:1 in species with recombination in both sexes (most species) and 4/3:1 in species such as Drosophila where there is no recombination in males (assuming that recombination rates in sex chromosomes and autosomes are otherwise similar on average).
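The 2/3:1 and 4/3:1 ratios follow from how much time each chromosome spends in the recombining sex. The sketch below assumes an equal sex ratio, an X chromosome that recombines only in females (where two-thirds of X copies reside) and equal per-meiosis rates otherwise; the function name is hypothetical.

```python
from fractions import Fraction


def x_to_autosome_recombination(male_recombination: bool) -> Fraction:
    """Ratio of sex-averaged recombination rates, X chromosome : autosome.

    Per-meiosis rates are set to 1 wherever recombination occurs.
    """
    # Two-thirds of X copies are in females, the only sex in which X recombines.
    x_rate = Fraction(2, 3) * 1
    # Autosomes spend half their time in each sex.
    male_rate = 1 if male_recombination else 0
    autosome_rate = Fraction(1, 2) * 1 + Fraction(1, 2) * male_rate
    return x_rate / autosome_rate


if __name__ == "__main__":
    print(x_to_autosome_recombination(True))   # recombination in both sexes
    print(x_to_autosome_recombination(False))  # no recombination in males
```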

Many studies have reported effects of linked selection on genetic diversity, but I am unaware of any compilation of the magnitude of these effects for regions of low recombination. Below I review levels of nucleotide diversity in chromosomes with low rates of recombination and compare them with data for chromosomes with more normal recombination rates. To minimize variation due to differences among populations in Ne and other extraneous variables, I compared ratios of nucleotide diversity in chromosomes with low rates of recombination (W, Y, dot fourth, cpDNA) on a within-species basis with ‘normally’ recombining autosomal, X or Z chromosomal loci, or mtDNA and cpDNA with nuclear DNA. Further, ratios for relatively normally recombining chromosomes (X and Z) with autosomal loci were computed. All ratios were adjusted for differences in copy number of the chromosomes and for mutation rate differences (based on genetic divergences from another species at silent sites, or using the method of Ellegren (2007) that is based upon differences in male and female mutation rates derived from numbers of germ cell divisions in females versus males), so that the neutral expectation is 1. Ellegren (2009) reported X:A and Z:A ratios across species, but did not correct them for mutation rate differences.

I avoided using data where introgression from other species or sub-species was suspected and from recently bottlenecked populations, as these may distort comparisons (see Pool and Nielsen, 2007). Uninformative comparisons where mtDNA and nuclear diversity were both 0 are not included (Zhou et al., 2010). As ratios are often not normally distributed, I present both means and medians, and tests for deviation from neutral predictions using non-parametric tests. Statistical tests were performed both for all estimates and for species means. Tests were one-tailed for chromosomes with low recombination rates and two-tailed for those with ‘normal’ rates.

W and Y chromosomes

Adjusted ratios of genetic diversity for non-recombining regions on W and Y chromosomes compared with that on autosomes or Z or X chromosomes were much lower than predicted by neutrality (Tables 2 and 3). For example, the ratio of nucleotide diversity for non-coding regions on the W chromosome to that on the autosomes in domestic chickens is 0.0108, and 0.075 after adjusting for the four-fold higher autosomal Ne and the 1.75-fold higher mutation rate in autosomes than W chromosomes (Berlin and Ellegren, 2004).
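The adjustment in the chicken example is a simple rescaling, sketched below under the assumption that the observed ratio is multiplied by the copy-number factor and by the mutation-rate factor so that the neutral expectation becomes 1. The function and parameter names are hypothetical.

```python
def adjusted_ratio(observed_ratio: float,
                   copy_number_factor: float,
                   mutation_rate_factor: float) -> float:
    """Rescale an observed diversity ratio so that neutrality predicts 1.

    copy_number_factor:   fold excess of Ne for the comparison chromosome
    mutation_rate_factor: fold excess of its mutation rate
    """
    return observed_ratio * copy_number_factor * mutation_rate_factor


if __name__ == "__main__":
    # W:autosome ratio in domestic chickens, with the factors quoted in the text.
    print(round(adjusted_ratio(0.0108, 4.0, 1.75), 4))
```

With the rounded inputs quoted in the text this gives about 0.076, in line with the reported 0.075.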

Table 2 Nucleotide diversity for Y, W, dot fourth chromosomes (compared with diversity for X or Z chromosomal or autosomal loci) and mtDNA and cpDNA (compared with nuclear loci) as a proportion of that expected from neutral theory (corrected for differences in copy number and mutation rates)
Table 3 Mean and median nucleotide diversity ratios (adjusted for copy number and mutation rate differences) as compared with neutral expectation of 1 for comparisons of chromosomes for regions of low and ‘normal’ recombination

All 12 W chromosome ratios were much less than 1 (sign test P<0.0002), the mean ratio being 0.011±0.007 and the median 0.00 (Tables 2 and 3). Similarly, 22 of 23 estimates of adjusted genetic diversity for non-recombining regions on Y chromosomes were less than 1 (sign test P<0.0001), with a mean of 0.226±0.058 and a median of 0.110 (Tables 2 and 3).

Very low adjusted ratios of W chromosome to autosomal variation in birds are only realistically attributable to selection, as any effects of male polygamy are expected to bias the ratio upwards (Berlin and Ellegren, 2004). Conversely, adjusted ratios for Y chromosomes may be biased downwards owing to lower effective population sizes in males than females (see Frankham et al., 2010). Handley et al. (2006) showed that the Y chromosome in the greater white-toothed shrew still showed a deficit in variation relative to the X, after accounting for mutation rate, copy number and demography, leaving directional selection as the probable explanation. Further, Gerrard and Filatov (2005) reported positive selection for two of three Y chromosomal loci across 6–12 mammalian species.

How does selection have such large impacts when W and Y chromosomes typically have few functional loci and low polymorphism? While Drosophila melanogaster has only 10–20 protein-coding loci on the Y chromosome, Lemos et al. (2008) reported that it harbours substantial polymorphic regulatory variation that affects hundreds of X linked and autosomal loci with important functions.

Drosophila chromosome-4

The small dot chromosome-4 found in several species of Drosophila has only about 1/100th the recombination rate of the rest of the nuclear genome. All 12 estimates of the adjusted ratio of its nucleotide diversity, compared with other recombining autosomes or X chromosomes in six species of Drosophila, were less than 1 (sign test P=0.002), with a mean of 0.083±0.032 and a median of 0.055 (Tables 2 and 3). A causal relationship between very low recombination and low genetic diversity has been established, as translocation of the dot chromosome onto another autosome increased recombination rates and resulted in normal levels of genetic diversity for dot chromosomal loci (Tables 2 and 3; Powell et al., 2011). The difference between the adjusted ratios for free and fused fourth chromosomes was highly significant (Mann–Whitney test P=0.006).

mtDNA and cpDNA diversity

In all of the cases in Table 2 mtDNA and cpDNA are maternally inherited or transmitted by only one mating type (Chlamydomonas). The criteria applied to detect deviations from neutrality for nucleotide diversity in mtDNA and cpDNA versus nuclear DNA were very stringent, as they assume a four-fold difference in ploidy in species with two sexes and double in hermaphrodites (Wright et al., 2008), that only one mtDNA and cpDNA haplotype is transmitted per female gamete, that there are no paternal contributions, that none of the mtDNA sequences are integrated into the nucleus, that there has been no introgression between taxa and that male and female effective population sizes are equal. Consequently, the adjusted ratios of mtDNA and cpDNA nucleotide diversity compared with nuclear loci will typically be biased upwards by higher effective population sizes in females than males (Frankham et al., 2010), widespread occurrence of low levels of paternal leakage and recombination (see Ballard and Rand, 2005; White et al., 2008), nuclear integration of mtDNA, introgression between taxa and multiple mtDNA genomes being transmitted across generations (30 in Drosophila melanogaster, Haag-Liautard et al. (2008); 100 in humans and Chinook salmon and 200 in mice, White et al. (2008)).

Most estimates of the adjusted ratio of mtDNA diversity to nuclear diversity were less than 1 (17 out of 19, sign test P=0.007), but one species had a large excess of mtDNA diversity compared with nuclear levels (Table 2). Further, Hellberg (2006) reported 0 within-population mtDNA nucleotide variation in the coral Balanophyllia elegans, but high levels of allozyme variation at nuclear loci. Substantial effects on mtDNA diversity in arthropods are caused by infections with maternally transmitted endosymbionts, such as Wolbachia bacteria (Hurst and Jiggins, 2005). Typically invertebrates infected with one strain of endosymbiont have reduced levels of mtDNA variation, whereas those infected with multiple strains may have elevated levels of mtDNA variation. For example, Drosophila recens infected with the maternally transmitted Wolbachia shows much lower mtDNA variation than predicted by neutrality, whereas the closely related uninfected Drosophila quinaria has more normal levels of mtDNA variation. Conversely, Adalia bipunctata is infected with three endosymbiont strains and has highly elevated mtDNA variation overall, but very low mtDNA variation in beetles infected with a particular endosymbiont strain (Jiggins and Tinsley, 2005).

The two adjusted ratios for cpDNA were lower than the neutral expectation (Table 2). Further, Banks and Birky (1985) reported very low cpDNA variation in Lupinus texensis, a species with very high levels of nuclear-encoded allozyme variation (He=0.41). cpDNA has been shown to display selective sweeps (Muir and Filatov, 2007). However, there are insufficient estimates to be sure about general conclusions.

There are strong opportunities for selection among organelle genomes within cells, among cells within individuals and among individual females within populations, especially given that mtDNA codes for critical functions in energy metabolism and cpDNA codes for critical proteins involved in photosynthesis (Rand, 2001; White et al., 2008). There is substantial evidence of selection on mtDNA (see Ballard and Whitlock, 2004; White et al., 2008) and on the cpDNA-encoded rbcL locus (which codes for the large subunit of Rubisco, an enzyme with a critical role in photosynthesis) in most analysed land plants (Kapralov and Filatov, 2007).

Additional comparative estimates of mtDNA and cpDNA versus nuclear variation are required across a broader range of taxa.

X and Z chromosomes

Adjusted ratios of X:autosomal genetic diversity varied on either side of the neutral expectation, with a mean of 1.091±0.050 and a median of 1.02 (Tables 3 and 4). Seventeen estimates were less than 1 and 20 greater (sign test P=0.74). Part of the variation in these ratios is associated with differences in recombination and distance from coding loci (and effects of selection). Hammer et al. (2010) found that the adjusted ratio increased from 0.89 to 1.48 as the recombination distance from functional loci increased. Further, Vicoso and Charlesworth (2009) reported an adjusted X:autosome ratio of 1.57 for nucleotide diversity in non-coding regions of Drosophila melanogaster, but the adjusted ratio was 0.97 for regions with similar effective recombination rates on the X and autosomes.

Table 4 Nucleotide diversity for X and Z chromosome (compared with diversity for autosomal loci) as a proportion of that expected from neutral theory (corrected for differences in copy number and mutation rates)

Deviations from neutral expectation for X chromosomal genetic diversity may reflect different effective population sizes in the two sexes, selection on linked loci, or modest differences in effective rates of recombination on X and autosomes.

Adjusted ratios of Z chromosome nucleotide diversity compared with the autosomes in birds were all substantially less than 1 (Tables 3 and 4), but with only four estimates they do not differ significantly from 1 (mean 0.363±0.100 and median 0.40). However, the Z:A ratios were significantly lower than the X:A ones (Mann–Whitney test P=0.007). The Z chromosome in chickens has a 60% lower recombination rate than the autosomes (Sundström et al., 2004). The lower ratios for Z than X chromosomes are probably attributable to male polygamy and stronger selection effects for the Z chromosome owing to its lower recombination rate.

Discussion

Correlations between population size and genetic diversity

There is compelling evidence for positive associations between genetic diversity and population size both across species and within species. These correlations extend from non-coding sequences and allozymes to loci subject to balancing selection. The only equivocal case is mtDNA in animals and here most data sets report significant correlations, especially for analyses restricted to specific groups of organisms. Overall, these results support neutral theory, or background selection, rather than the proposal of Gillespie (2000, 2001) that there is little relationship between genetic diversity and population size owing to genetic draft (selective sweeps). Further, the results constrain the range of acceptable models of selective sweeps.

Variation in mutation rates and different selection scenarios, rather than lack of drift effects, may explain why correlations between mtDNA diversity and population sizes are non-significant in broadly based surveys, but significantly positive for more narrowly based ones. First, wide variation in mtDNA mutation rates has been observed (see above). Second, mtDNA and the W chromosome are associated in birds and selection on either affects genetic diversity for the other, but there is no association between mtDNA and the Y chromosome in mammals. Third, maternally transmitted Wolbachia endosymbionts lead to large selective changes and reduced genetic diversity for mtDNA in many invertebrate species infected with a single strain, whereas species infected with multiple strains may have elevated mtDNA diversity (Hurst and Jiggins, 2005).

Widespread deviations from neutral expectations for loci in regions with low recombination and high linkage disequilibrium

There is overwhelming evidence of reduced genetic diversity compared with neutral expectations in circumstances with high linkage disequilibrium (Tables 3 and 5). The ratios differed significantly between the combined low-recombination and the combined ‘normal’ recombination data set (Mann–Whitney test P<0.0001). Of the 65 estimates for low-recombination situations, as compared with ‘normally’ recombining controls, 62 were less than the neutral expectation (sign test P<0.0001). By contrast, deviations from neutral expectation for genetic diversity on X chromosomes, Z chromosomes and fused dot chromosomes (that all have much higher recombination rates) varied on either side of neutral expectations, 22 being above 1 and 22 below (sign test P=1.00). Even here there was evidence that low recombination was associated with low genetic diversity, as described above.
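
The sign tests quoted above can be checked with a short sketch. It assumes a two-sided exact binomial test with a success probability of 0.5 and with ties (ratios exactly equal to 1) excluded; the function name is hypothetical, and the tests behind the reported values may have differed in detail.

```python
from math import comb


def sign_test_two_sided(n_below, n_above):
    """Two-sided exact sign test P value for counts below and above 1.

    Under the null hypothesis each estimate is equally likely to fall on
    either side of the neutral expectation, so the smaller count follows
    a Binomial(n, 0.5) distribution.
    """
    n = n_below + n_above
    k = min(n_below, n_above)
    # Probability of a count at least as extreme as k in the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap the result at 1.
    return min(1.0, 2 * tail)


print(round(sign_test_two_sided(17, 20), 2))  # X:autosome ratios
print(round(sign_test_two_sided(22, 22), 2))  # X, Z and fused dot chromosomes
print(sign_test_two_sided(62, 3) < 0.0001)    # low-recombination estimates
```

Under these assumptions the three calls agree with the values quoted for the corresponding comparisons (P=0.74, P=1.00 and P<0.0001).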

Table 5 Summary of levels of genetic diversity as compared with neutral predictions for comparisons of chromosomes in relation to levels of recombination and other causes for deviations

The conclusions above are not due to the overrepresentation of estimates from humans, laboratory species and model species, as the magnitude of effects is similar for all estimates and for species means, and the conclusions are the same (Table 3). Some of the species have structured populations where sex-specific differences in migration rates could yield biases in ratios. However, this issue does not apply to the fourth chromosome data or to the correlations between genetic diversity and recombination rates within chromosomes. Its effects must be modest overall given the consistent signal of low versus normal recombination rates on genetic diversity.

Related, but less extreme, effects are found in populations adapting to new environments owing to alleles that were previously deleterious and subject to mutation–selection balance (but are now favoured) and that are expected to show substantial initial linkage disequilibrium with flanking loci owing to the recent mutational origin of most mutant alleles.

While the effects of very low recombination rates on genetic diversity vary across species owing to species-specific life-history attributes, the common factor causing deviations from neutrality is selection. Reduced genetic diversity in regions with low recombination is expected with either selective sweeps (Maynard Smith and Haigh, 1974; Gillespie, 2000) or background selection (Charlesworth et al., 1993), and both are probably implicated.

All non-recombining regions share many features with asexual bacteria in chemostats (with no recombination and no gene transfer among genotypes) that undergo periodic selection and evolutionary changes in mutation rates. These analogies in evolutionary behaviour deserve more attention, especially as mtDNA and cpDNA derive from captured bacteria. All are expected to show periodic selective sweeps, linkage disequilibrium, genetic diversity that is lower than simple neutral predictions, background selection, Hill–Robertson effects and evolutionary changes in mutation rates. Quantitative deviations from neutrality will differ among them according to effective population sizes; advantageous, deleterious and neutral mutation rates; effectiveness of selection (largely determined by Ne) and recombination rates (Maruyama and Birky, 1991). The assumption of neutral behaviour is not credible for any of these chromosomes: the default hypothesis should be that they are being affected by selection (see also Ballard and Whitlock, 2004; Ballard and Rand, 2005; Hurst and Jiggins, 2005; Wares, 2009).

At first it may seem surprising that W chromosomal diversity appears, if anything, to deviate more from neutrality than Y chromosomal diversity, whereas male polygamy would cause a difference in the opposite direction. However, the W chromosome and mtDNA in birds (and Lepidoptera) are both maternally inherited and selection on one affects the other and vice versa (Berlin et al., 2007). Conversely, such combined effects are not expected for paternally transmitted Y chromosomes. A similar combined impact of selection is expected in plants with maternally inherited cpDNA and mtDNA (Mohanty et al., 2003).

mtDNA often yields phylogenies that are correct in spite of background selection and periodic selective sweeps (see above), but there are many reported conflicts between phylogenies derived from mtDNA versus nuclear loci (see Ballard and Whitlock, 2004; Hurst and Jiggins, 2005; Kapralov and Filatov, 2007). Levels of mtDNA diversity within populations and divergences among populations will represent the net effects of mutation, drift, selection, gene flow and any recombination. Despite reduced levels of mtDNA diversity compared with neutral predictions, most animal species show mtDNA polymorphisms (Bazin et al., 2006), due largely to the high mutation rate of mtDNA. Phylogenies will often be adversely affected by selective sweeps on mtDNA and cpDNA in the initial phase of divergence from a polymorphic common ancestor (Hickerson et al., 2006). Selective sweeps of new mutations later in the process are unlikely to affect the phylogenetic structure, but may distort branch lengths. Background selection will probably have only a modest effect on branch lengths and little effect on their relative values. The use of relaxed models that account for differences in substitution rates among lineages is expected to reduce problems due to selection and mutation rate differences, and they seem to provide better results (Whelan et al., 2001). However, phylogenies based on mtDNA can, at best, be considered as generating a hypothesis about relationships. They need to be independently corroborated with analyses of DNA sequences for multiple nuclear loci or irreversible transposon insertions.

DNA barcoding, the molecular identification system that is being used to discover new species and to estimate the approximate number of animal species on Earth, is based on sequencing a section of the mtDNA cytochrome oxidase-I locus (see Hebert et al., 2003). Its efficacy is affected by selective sweeps, especially those associated with Wolbachia (see Hurst and Jiggins, 2005). Barcoding is less than 70% successful in identifying Dipteran species, a group susceptible to this bacterium (Meier et al., 2006; Whitworth et al., 2007).

Some inferences based on genetic data will suffer only mild distortions when markers show behaviour that deviates from neutrality, but for others distortions may be large. Great care is required in using markers on non-recombining chromosomes (Y and W chromosomes, mtDNA and cpDNA) for making inferences about populations. Inferences may also be distorted in populations that have recently moved to new environments.

What can be done about deviations from neutrality for markers?

  1. Routinely test non-coding ‘neutral’ markers for signals of selection (see Frankham et al., 2010).

  2. Test conclusions from non-recombining chromosomal loci against results for nuclear autosomal loci (from regions with ‘normal’ recombination).

  3. Test the robustness of conclusions to deviations from neutrality of non-coding loci that are being used as controls (for example, by using simulations).

  4. To reflect genome-wide measures derived from genetic data (Ne, migration rates, population structures, etc.), non-coding loci need to be sampled from across the genome, and to encompass loci from regions of the genome with high, medium and low recombination rates in a representative manner. This is difficult for species that have not been genetically mapped and sequenced.
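
One way to sample loci representatively across recombination classes, as the last recommendation above suggests, is a stratified draw. The sketch below is illustrative only: it assumes that a recombination rate estimate is available for every candidate locus, it defines the low, medium and high classes as terciles of the ranked loci, and its function name and input layout are hypothetical.

```python
import random


def stratified_locus_sample(loci, n_total, seed=0):
    """Sample loci from low, medium and high recombination terciles.

    loci    -- list of (locus_id, recombination_rate) pairs
    n_total -- approximate number of loci to draw in total; rounding
               within strata means the final count can differ slightly
    Each stratum is sampled in proportion to its share of the loci.
    """
    ranked = sorted(loci, key=lambda item: item[1])
    n = len(ranked)
    cuts = [0, n // 3, 2 * n // 3, n]  # tercile boundaries
    rng = random.Random(seed)          # fixed seed for repeatability
    sample = {}
    for name, lo, hi in zip(("low", "medium", "high"), cuts, cuts[1:]):
        stratum = ranked[lo:hi]
        k = min(len(stratum), round(n_total * len(stratum) / n))
        sample[name] = [locus_id for locus_id, _ in rng.sample(stratum, k)]
    return sample


# Hypothetical input: 30 loci with arbitrary recombination rates (cM/Mb).
candidates = [(f"locus{i}", 0.1 * i) for i in range(30)]
chosen = stratified_locus_sample(candidates, n_total=9)
print({name: len(ids) for name, ids in chosen.items()})
```

Terciles are a convenience here. Class boundaries based on the recombination map of the species concerned would be more appropriate where such a map exists.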

Conclusions

  1. There is extensive empirical evidence for positive correlations between genetic diversity and population size, as predicted by neutral theory.

  2. It is not credible to assume that loci on W and Y chromosomes, mtDNA and cpDNA, and other non-recombining chromosomes will follow neutral predictions, as they lose genetic diversity at a faster than neutral rate owing to selective sweeps and background selection.

  3. Inherently neutral loci will typically show higher genetic diversity and slower loss of genetic diversity in regions flanking loci subject to balancing selection, especially those surrounding complementary sex-determining loci (in Hymenoptera), major histocompatibility complex and self-incompatibility loci.

  4. Even inherently neutral loci may not behave neutrally, especially those in regions of low recombination or in populations adapting to new environments where faster than neutral declines may be evident for loci across the genome.

  5. In populations adapting to new environments, use of non-coding sequences and synonymous sites as neutral controls to detect selected loci as outliers may miss many selected loci, owing to the occurrence of many selective sweeps that affect control loci.

  6. Sampling of loci to estimate effective population sizes, dispersal rates, population structure, etc., is important. It should avoid regions known to be subject to balancing or directional selection, or to have very low recombination rates.