Introduction

To determine the effects of local extinctions and recolonizations on genetic diversity and effective size, Slatkin (1977) defined two models, the ‘migrant pool’ and ‘propagule pool’ models. In the former, recolonizers come from the whole metapopulation; in the latter, they preferentially come from a single population. These models were subsequently investigated in a number of studies (eg Wade and McCauley, 1988; Whitlock and McCauley, 1990; Whitlock and Barton, 1997; Pannell and Charlesworth, 1999). However, the methods previously used to compute effective size in these models contain several inaccuracies. This short paper discusses several methods for computing effective size, all of which yield a single expression (equation (7)) that has not been given previously. First, a simple coalescent argument for this result is provided. Then, alternative derivations are used to explain discrepancies with earlier works, and the quantitative importance of these discrepancies is briefly discussed. A Mathematica (Wolfram, 1999) notebook performing the computations described in this paper is available on request.

A simple coalescent argument

Effective size is defined to give the asymptotic rate of coalescence of pairs of genes. For pairs of genes in different demes, this rate can be deduced by a two-step argument. First, the two ancestral lineages must gather in the same deme. Then, they may coalesce or separate again in different demes.

No local extinctions

In the island model without local extinctions, this argument develops as follows. Let m be the dispersal probability. The probability that two genes in different demes come from a single deme in the previous generation is

$$\frac{1-(1-m)^2}{n_d} + O\!\left(\frac{1}{n_d^2}\right)$$

where nd is the number of demes. Thus, on a timescale of nd generations, the rate at which genes in different demes come from a single deme is 1−(1−m)². The ancestral lineages then coalesce immediately on this timescale if they have the same parent (probability 1/N, where N is the number of adults, here haploid, per deme), or if they coalesce in this deme in the recent past rather than separate again into different demes. The probability of the latter event may be written in terms of Wright's FST measure of population structure as (1−1/N)FST, since in the island model FST is approximately (to O(1/nd)) the probability of recent coalescence of genes within demes (Hudson, 1998; Rousset, 2002). The overall rate of coalescence is then

$$\left[1-(1-m)^2\right]\left[\frac{1}{N}+\left(1-\frac{1}{N}\right)F_{ST}\right]$$

per nd generations. For Wright's island model,

$$F_{ST} = \frac{(1-m)^2}{N-(N-1)(1-m)^2}$$

and the whole expression can be written as (1−FST)/N. Hence, the effective size is Nnd/(1−FST), as already given by Wright (1943).
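The no-extinction argument can be checked numerically. The sketch below is illustrative only: it assumes haploid demes of N adults and Wright's island-model equilibrium FST = (1−m)²/(N−(N−1)(1−m)²); the function names and parameter values are arbitrary choices, not from the original.

```python
def island_fst(N: float, m: float) -> float:
    """Equilibrium F_ST of Wright's island model, haploid, no extinction."""
    a = (1.0 - m) ** 2  # both lineages descend from non-migrant parents
    return a / (N - (N - 1.0) * a)


def rate_per_nd_generations(N: float, m: float) -> float:
    """Two-step rate: lineages gather in one deme, then coalesce."""
    fst = island_fst(N, m)
    gather = 1.0 - (1.0 - m) ** 2                # gathering rate per nd generations
    coalesce = 1.0 / N + (1.0 - 1.0 / N) * fst   # same parent, or recent coalescence
    return gather * coalesce


def effective_size(N: float, m: float, nd: int) -> float:
    """Inverse of the per-generation rate (1 - F_ST) / (N nd)."""
    return N * nd / (1.0 - island_fst(N, m))


N, m, nd = 20, 0.1, 1000
print(rate_per_nd_generations(N, m), (1.0 - island_fst(N, m)) / N)  # agree up to rounding
print(effective_size(N, m, nd))
```

With m=1 every gene is a migrant, FST=0 and Ne reduces to the census number Nnd; any m<1 gives Ne>Nnd.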

This derivation of effective size is in line with other computations of effective size in ‘structured coalescents with two time scales’ (eg Nordborg, 2001; Wakeley and Aliacar, 2001; Nordborg and Krone, 2002). Here the separation of time scales is obtained for a large number of demes, and the expression for effective size is correct to leading order in 1/nd. Wakeley and Aliacar (2001) have already used such an approach (together with additional approximations) to obtain an expression for the effective size of a metapopulation. Their result is consistent with the more general one presented below.

Using a timescale of nd generations to derive the effective size is in line with such previous arguments. Alternatively, we can compute the probability that two lineages coalesce in a given ‘current’ generation as the sum over t of the probabilities that (1) the two ancestral lineages gathered in the same deme t generations earlier (probability (1−(1−m)²)/nd), and (2) given t, they coalesce in the current generation rather than separate again into different demes. The term 1/N+(N−1)FST/N then appears as the sum over t of the probabilities of the second event.
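The sum over t described above can be checked numerically. This is a minimal sketch, assuming that in each generation two lineages in gametes of one deme have a common parent with probability 1/N and otherwise remain together, uncoalesced, with probability (1−1/N)(1−m)²; truncating the series at t_max is an illustrative shortcut.

```python
def second_event_sum(N: float, m: float, t_max: int = 5000) -> float:
    """Sum over t of P(lineages gathered t generations earlier coalesce now)."""
    stay = (1.0 - 1.0 / N) * (1.0 - m) ** 2  # together, no coalescence, both local
    # remain together for t generations, then share a parent
    return sum(stay ** t / N for t in range(t_max))


def closed_form(N: float, m: float) -> float:
    """1/N + (N - 1) F_ST / N with Wright's island-model F_ST."""
    a = (1.0 - m) ** 2
    fst = a / (N - (N - 1.0) * a)
    return 1.0 / N + (N - 1.0) * fst / N


print(second_event_sum(20, 0.1), closed_form(20, 0.1))
```

The series is geometric, so its sum 1/(N−(N−1)(1−m)²) coincides with 1/N+(N−1)FST/N.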

Local extinctions

With local extinctions, the two demes where the genes are sampled may both have become extinct in the previous generation (probability e²), neither may have become extinct (probability (1−e)²), or only one may have done so (probability 2e(1−e)). If neither became extinct, the probability that the two lineages come from a single deme in the previous generation is

$$\frac{1-(1-m)^2}{(1-e)\,n_d} + O\!\left(\frac{1}{n_d^2}\right)$$

where (1−e)nd is the number of parental demes that contribute to the next generation. If one or both demes became extinct, the probability is simply 1/((1−e)nd).

Taking the different cases into account, the overall rate of coalescence is
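
$$\frac{1}{N_e} \cong \left\{(1-e)^2\,\frac{1-(1-m)^2}{(1-e)\,n_d} + \left[1-(1-e)^2\right]\frac{1}{(1-e)\,n_d}\right\}\left[\frac{1}{N}+\frac{N-1}{N}F_{ST}\right] \qquad (4)$$

where ≅ means that the computation is correct to leading order in 1/nd. This can be written as

$$\frac{1}{N_e} \cong \frac{1-(1-e)^2(1-m)^2}{(1-e)\,n_d}\,Q_R \qquad (5)$$
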
where QR≡1/N+(N−1)FST/N is the ‘identity by descent’ among gametes produced by adults within a deme, that is the relatedness of such gametes relative to gametes produced in different demes. QR can be computed, for example, as in Wade and McCauley (1988), or can be deduced from the recursions detailed below. One then obtains that for an infinite number of demes
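
$$Q_R = \frac{\dfrac{1}{N}+\left(1-\dfrac{1}{N}\right)\dfrac{e}{k}}{1-\left(1-\dfrac{1}{N}\right)\left[(1-e)(1-m)^2+e\left(1-\dfrac{1}{k}\right)\phi\right]} \qquad (6)$$

where k is the number of recolonizers, and where φ=0 for the migrant pool model and φ=(1−m)² for the propagule pool model. Hence,

$$N_e \cong \frac{(1-e)\,n_d\left\{1-\left(1-\dfrac{1}{N}\right)\left[(1-e)(1-m)^2+e\left(1-\dfrac{1}{k}\right)\phi\right]\right\}}{\left[1-(1-e)^2(1-m)^2\right]\left[\dfrac{1}{N}+\left(1-\dfrac{1}{N}\right)\dfrac{e}{k}\right]} \qquad (7)$$
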
This result is consistent with approximation (23) derived by Wakeley and Aliacar (2001) for φ=0, large N, and small m and e. Equation (7) is also obtained by the following argument, which is longer but allows a step-by-step comparison with the earlier analyses of Pannell and Charlesworth (1999).
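Equations (6) and (7) are easy to evaluate directly. The sketch below codes them as written here, to leading order in 1/nd; the function names and parameter values are illustrative assumptions.

```python
def q_r(N: float, k: float, m: float, e: float, phi: float) -> float:
    """Identity among gametes produced within a deme, nd -> infinity (eq. (6))."""
    numerator = 1.0 / N + (1.0 - 1.0 / N) * e / k
    local = (1.0 - e) * (1.0 - m) ** 2 + e * (1.0 - 1.0 / k) * phi
    return numerator / (1.0 - (1.0 - 1.0 / N) * local)


def ne_over_n_nd(N: float, k: float, m: float, e: float, phi: float) -> float:
    """Ne / (N nd) to leading order in 1/nd (eqs. (5) and (7))."""
    gather = 1.0 - (1.0 - e) ** 2 * (1.0 - m) ** 2
    return (1.0 - e) / (N * gather * q_r(N, k, m, e, phi))


N, k, m, e = 20, 5, 0.1, 0.2
print("migrant pool  :", ne_over_n_nd(N, k, m, e, phi=0.0))
print("propagule pool:", ne_over_n_nd(N, k, m, e, phi=(1 - m) ** 2))
```

Setting e=0 recovers Wright's Nnd/(1−FST); for fixed N, k, m and e, a larger φ (propagule pool) gives the smaller effective size.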

Matrix formulation

Effective size is obtained as (1−λ)⁻¹, where λ is the largest eigenvalue of the matrix that describes the decrease of gene diversities in the absence of mutation, so that 1−λ is the asymptotic rate of coalescence (eg Hill, 1972; Ewens, 1982; Whitlock and Barton, 1997). To construct this matrix, we first consider a system of recursions for probabilities of identity within and among demes, comparable to those of Slatkin (1977) and Pannell and Charlesworth (1999). Like Slatkin's, these recursions include mutation, but they describe the decrease of gene diversities when the mutation rate is set to zero.

The life cycle considered in these models is as follows (Slatkin, 1977). In the absence of extinction, events occur in the following order: gamete production, dispersal, and population regulation, after which N adult offspring survive. In each generation, any deme can independently become extinct with probability e. Extinction occurs before reproduction, so that the adults of an extinct deme contribute nothing to the next generation. An extinct deme is immediately recolonized by k colonizers, which reproduce immediately so that all demes, recolonized or not, always have N adults in the next generation. Thus, recolonizers experience two rounds of dispersal and reproduction within one ‘generation’, that is, within the time in which only one round is considered for demes that do not become extinct. This assumption was thought to simplify the computation of FST, but it can be relaxed (see Discussion). In the ‘migrant pool’ model, colonizers in the propagules are independently sampled from all other demes. In the ‘propagule pool’ model, propagules of k colonizers are formed in nonextinct demes after gamete production and dispersal, and each extinct deme is recolonized by the k members of a single propagule. As noted by Pannell and Charlesworth (1999), the extinct and colonized habitats need not actually be the same: it is only assumed that a constant number of demes become extinct and that an equal number of habitats are colonized in each generation.

For conciseness, I consider N and k genes where previous authors considered 2N and 2k genes. Let Qi be the probability of identity of genes from different adults, within demes (Q1) and in different demes (Q2). Here we consider the probability of allelic identity in the infinite-alleles model. Let μ be the mutation rate. We write the recursions for next-generation identities Q1′ and Q2′ as

where

Here:

• Equations (8) and (9) are as in Slatkin (1977) and Pannell and Charlesworth (1999) (except for obvious typos in the latter). For nd → ∞, equilibrium FST is the solution Q1 of the recursion deduced from equation (8) with Q2=0:
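
$$Q_1 = (1-e)(1-m)^2\left[\frac{1}{N}+\left(1-\frac{1}{N}\right)Q_1\right] + e\left\{\frac{1}{k}+\left(1-\frac{1}{k}\right)\phi\left[\frac{1}{N}+\left(1-\frac{1}{N}\right)Q_1\right]\right\}$$
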
This yields equation (6).

C is a shorthand for a term already considered by these authors.

• Equation (10) is modified from Slatkin (1977) as suggested by Pannell and Charlesworth (1999).

• Equation (11) is modified from Pannell and Charlesworth's equation (A.3) so as to be consistent with the exact recursions in Nagylaki (1983) for e=0. Nevertheless, this difference affects neither FST nor effective size to leading order in 1/nd, because equation (11) and their equation (A.3) are identical to first order in 1/nd: they differ only by terms of order 1/nd². Likewise, the O(1/nd) term in equation (10) affects neither FST nor effective size to leading order in 1/nd. Thus, this term, which represents the probability that two genes immigrating into the same deme come from a common parental deme, can also be neglected in later analyses.

• In equation (12), I use the notation φ for the probability that two colonizers have parent(s) in the same deme. If propagule pools are formed after dispersal of gametes (consistent with Slatkin's verbal description of the life cycle), then φ=(1−m)²+O(1/nd). If propagule pools are formed from locally produced gametes, then φ=1. In the migrant pool model, φ=O(1/nd). It may be checked that the O(1/nd) terms in φ have no bearing on the results, and they are ignored below.

There are some inconsistencies among different analyses of the propagule pool model. Slatkin analyzed two different scenarios, ‘model I’ and ‘model II’ (respectively, an island–mainland model without dispersal among the islands, and a finite island model with migration). As noted by Wade and McCauley (1988), FST should be the same in both models when nd → ∞. For the propagule pool model, equation (15) of Wade and McCauley (1988) is consistent with equation (6) of Slatkin (1977); both imply φ=(1−m)². But Slatkin's and Pannell and Charlesworth's systems of recursions for model II yield different results; they imply φ=1. Both cases can of course be considered, provided they are distinguished.

The probability of common origin of colonizers φ was first considered by Whitlock and McCauley (1990), but I was unable to match their equations with the above ones. In particular, their equation (4) for identity among gametes implies that there are two successive reproductions at recolonization if φ=1 (as seen from terms of order 1/(kN) in the recursions, and consistently with Slatkin's description of the life cycle), while there is only one reproduction at recolonization if φ=1 (only terms of order 1/k appear, so gametes are not produced by N adults at some stage). Thus, their equations may not correspond to a well-defined life cycle.

• The main discrepancy with earlier expressions is in equation (13), which has the factor Q1+(1−Q1)/N where Slatkin (1977) and Pannell and Charlesworth (1999) have Q1. This difference takes into account that when genes from different demes originate from parent(s) in a single deme, they may actually have the same parent and coalesce. This does not affect FST values, but it does affect Ne values.
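The equilibrium FST recursion for nd → ∞ (with Q2=0) can be iterated numerically and compared with the closed form for QR in equation (6). A minimal sketch, assuming the recursion Q1 = (1−e)(1−m)²QR + e[1/k+(1−1/k)φQR] with QR = 1/N+(1−1/N)Q1; the function names and test values are illustrative.

```python
def fst_by_iteration(N, k, m, e, phi, tol=1e-14, max_iter=1_000_000):
    """Iterate the nd -> infinity recursion for Q1 (with Q2 = 0) to equilibrium."""
    q1 = 0.0
    for _ in range(max_iter):
        qr = 1.0 / N + (1.0 - 1.0 / N) * q1  # identity among gametes of one deme
        new_q1 = ((1.0 - e) * (1.0 - m) ** 2 * qr              # deme not extinct, both local
                  + e * (1.0 / k + (1.0 - 1.0 / k) * phi * qr))  # deme recolonized
        if abs(new_q1 - q1) < tol:
            return new_q1
        q1 = new_q1
    raise RuntimeError("iteration did not converge")


def q_r_closed_form(N, k, m, e, phi):
    """Equation (6)."""
    num = 1.0 / N + (1.0 - 1.0 / N) * e / k
    den = 1.0 - (1.0 - 1.0 / N) * ((1.0 - e) * (1.0 - m) ** 2 + e * (1.0 - 1.0 / k) * phi)
    return num / den


for params in [(20, 5, 0.1, 0.2, 0.0), (20, 5, 0.1, 0.2, 0.81), (50, 2, 0.05, 0.3, 1.0)]:
    N = params[0]
    fst = fst_by_iteration(*params)
    print(params, 1.0 / N + (1.0 - 1.0 / N) * fst, q_r_closed_form(*params))
```

The iteration is a contraction because (1−1/N)[(1−e)(1−m)²+e(1−1/k)φ] < 1, so it converges from any starting value.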

From the above equations, one can derive the probabilities gij that a pair of genes of type i (i=1 for pairs within a deme and 2 for pairs in different demes) derives without coalescence from a pair of genes of type j. They are

Ne is obtained as (1−λ)⁻¹, where λ is the largest eigenvalue of G≡(gij). To leading order in 1/nd, the expression for Ne can be deduced from a perturbation approximation for λ near the limit nd → ∞. Classical expressions for perturbation approximations (eg Horn and Johnson, 1985; Charlesworth, 1994; Caswell, 2001) here take the form

$$\lambda \approx 1 + \frac{\mathbf{y}\,\mathrm{d}G\,\mathbf{e}}{\mathbf{y}\,\mathbf{e}}$$

where y and e are the left and right eigenvectors associated with the largest eigenvalue (here 1) of the unperturbed matrix (here $\lim_{n_d\to\infty} G$), and dG comprises the terms of order 1/nd in $G-\lim_{n_d\to\infty} G$.
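The perturbation formula can be illustrated on a deliberately simplified matrix. This sketch does not use the paper's gij: it assumes a hypothetical two-state G in which a pair of lineages in one deme stays there uncoalesced with probability f=(1−1/N)[(1−e)(1−m)²+e(1−1/k)φ], separates with probability g12=(1−1/N)(1−e/k)−f, and a pair in different demes gathers with probability r=[1−(1−e)²(1−m)²]/((1−e)nd). It compares the exact leading eigenvalue, the first-order formula, and equation (7).

```python
import math


def leading_eigenvalue(g):
    """Largest eigenvalue of a 2x2 matrix with non-negative off-diagonals."""
    (a, b), (c, d) = g
    half_trace = (a + d) / 2.0
    discriminant = ((a - d) / 2.0) ** 2 + b * c  # non-negative, so roots are real
    return half_trace + math.sqrt(discriminant)


def toy_matrix(N, k, m, e, phi, nd):
    """Hypothetical G: state 1 = same deme, state 2 = different demes."""
    f = (1 - 1 / N) * ((1 - e) * (1 - m) ** 2 + e * (1 - 1 / k) * phi)
    g12 = (1 - 1 / N) * (1 - e / k) - f                     # separate, no coalescence
    r = (1 - (1 - e) ** 2 * (1 - m) ** 2) / ((1 - e) * nd)  # lineages gather
    return [[f, g12], [r, 1 - r]], f, g12, r


def first_order_eigenvalue(f, g12, r):
    """lambda ~ 1 + y dG e / (y e) around the limit matrix [[f, g12], [0, 1]]."""
    y = (0.0, 1.0)              # left eigenvector for eigenvalue 1
    ev = (g12 / (1 - f), 1.0)   # right eigenvector for eigenvalue 1
    dG = ((0.0, 0.0), (r, -r))  # O(1/nd) part of the toy matrix
    y_dG_e = sum(y[i] * dG[i][j] * ev[j] for i in range(2) for j in range(2))
    y_e = sum(y[i] * ev[i] for i in range(2))
    return 1.0 + y_dG_e / y_e


N, k, m, e, phi, nd = 20, 5, 0.1, 0.2, 0.81, 100_000
g, f, g12, r = toy_matrix(N, k, m, e, phi, nd)
ne_exact = 1 / (1 - leading_eigenvalue(g))
ne_first_order = 1 / (1 - first_order_eigenvalue(f, g12, r))
qr = (1 / N + (1 - 1 / N) * e / k) / (1 - f)  # equation (6)
ne_eq7 = (1 - e) * nd / ((1 - (1 - e) ** 2 * (1 - m) ** 2) * qr)
print(ne_exact, ne_first_order, ne_eq7)
```

In this toy matrix the first-order term reproduces equation (7) exactly, while the exact eigenvalue differs only by a relative amount of order 1/nd.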

By this method, one again obtains equation (7). It differs from the result implied by the same method with Pannell and Charlesworth's system of recursions, as expected from the difference in the definition of B1 (equation (13)). Differences between the rates of coalescence predicted from the above recursions and from Pannell and Charlesworth's are only of order e(e+m), and the largest differences I have found numerically were by factors of ≈1.5 to 2.2 for k=N, large m, and e from 0.25 to 0.7 (details not shown). Approximations comparable to those of their Table 2 for total diversity πT can be derived from equation (7). It appears that a factor (1+Ne/k) is missing from the denominator of their central and right-hand approximations for πT when em. This may imply some reductions in effective size.

Alternative coalescent arguments

Alternative derivations of effective size also allow comparison with Whitlock and Barton's methods. First, the rate of coalescence may be obtained as the probability ξ that two different lineages that have not already coalesced are in gametes from the same deme, times the probability πc that two gametes from one deme coalesce within one generation. Here, distinguishing whether the deme has just been recolonized or not,

$$\pi_c = (1-e)\,\frac{1}{N} + e\left[\frac{1}{N}+\left(1-\frac{1}{N}\right)\frac{1}{k}\right]$$

and ξ is the equilibrium solution of the recursion
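
$$\xi = \left(1-\frac{1}{N}\right)\left[(1-e)(1-m)^2+e\left(1-\frac{1}{k}\right)\phi\right]\xi' + \frac{1-(1-e)^2(1-m)^2}{(1-e)\,n_d}$$
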
Here ξ′ is the probability ξ considered one generation later, and the factor of ξ′ is the probability that two gametes produced in the same deme originate from two different gametes produced in a single deme one generation before (compare with the denominator of QR in equation (6)). The remainder is the probability that genes in different demes come from gametes in the same deme (see equations (4) and (5)). It is straightforward to check that ξπc is 1/Ne as given by equation (7).
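The check that ξπc equals 1/Ne can be done numerically. A minimal sketch, assuming πc = (1−e)/N + e[1/N+(1−1/N)/k] and, to leading order in 1/nd, ξ = (1−1/N)[(1−e)(1−m)²+e(1−1/k)φ]ξ′ + [1−(1−e)²(1−m)²]/((1−e)nd); the function names and parameters are illustrative.

```python
def xi_times_pi_c(N, k, m, e, phi, nd, n_iter=10_000):
    """Iterate xi = factor * xi' + remainder to equilibrium, then multiply by pi_c."""
    factor = (1 - 1 / N) * ((1 - e) * (1 - m) ** 2 + e * (1 - 1 / k) * phi)
    remainder = (1 - (1 - e) ** 2 * (1 - m) ** 2) / ((1 - e) * nd)
    xi = 0.0
    for _ in range(n_iter):
        xi = factor * xi + remainder
    # pi_c: common parent, or (just-recolonized deme) common colonizer
    pi_c = (1 - e) / N + e * (1 / N + (1 - 1 / N) / k)
    return xi * pi_c


def inv_ne_eq5(N, k, m, e, phi, nd):
    """1/Ne from equation (5), with QR from equation (6)."""
    local = (1 - e) * (1 - m) ** 2 + e * (1 - 1 / k) * phi
    q_r = (1 / N + (1 - 1 / N) * e / k) / (1 - (1 - 1 / N) * local)
    return (1 - (1 - e) ** 2 * (1 - m) ** 2) / ((1 - e) * nd) * q_r


print(xi_times_pi_c(20, 5, 0.1, 0.2, 0.81, 1000), inv_ne_eq5(20, 5, 0.1, 0.2, 0.81, 1000))
```

The agreement follows because πc is the numerator of QR in equation (6) and 1 minus the factor of ξ′ is its denominator.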

A variant of this argument is to compute ξ as (1−FST)ρ where ρ is the equilibrium solution of the recursion

obtained from equation (2) by ignoring the coalescence terms 1/N and 1/k. The rationale for the computation of ρ is given in the Appendix. This approach again yields equation (7), but now 1/Ne is expressed in the form (1−FST)ρπc, which allows a comparison with an argument on p. 434 of Whitlock and Barton (1997). They derive 1/Ne from their equation (13), in the form

However, if effective size were of this form, then their ϑx should depend on k (as ρπc does), which is not the case. The resulting formula tends to overestimate effective size, possibly by a factor of 100 or more for φ=1 and em (details not shown). It also conflicts with the approximation 1/Ne≈2(m+e)FST/nd given by Whitlock and Barton (1997) and further considered by Pannell and Charlesworth (1999). Much the same can be said of their equation (22), which is correct only when k → ∞ (for φ=0). A possible explanation for these discrepancies is that their results are derived from their equation (3), which does not hold in Slatkin's models (see the Appendix).

The approximation 1/Ne≈2(m+e)FST/nd is valid but has lacked a general argument. It can be deduced simply by expressing QR as a function of FST using equation (15), plugging the result into equation (5), and simplifying for small m and e.
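A quick numerical check of this approximation, assuming the leading-order QR of equation (6), FST recovered from QR = 1/N+(1−1/N)FST, and 1/Ne from equation (5). The parameter values (large N, propagule pool with φ=(1−m)²) are arbitrary illustrative choices.

```python
def q_r(N, k, m, e, phi):
    """Equation (6), nd -> infinity."""
    num = 1 / N + (1 - 1 / N) * e / k
    den = 1 - (1 - 1 / N) * ((1 - e) * (1 - m) ** 2 + e * (1 - 1 / k) * phi)
    return num / den


def compare(N, k, m, e, phi, nd):
    """Return (1/Ne from equation (5), 2(m+e)F_ST/nd)."""
    qr = q_r(N, k, m, e, phi)
    fst = (qr - 1 / N) / (1 - 1 / N)  # invert QR = 1/N + (1 - 1/N) F_ST
    leading = (1 - (1 - e) ** 2 * (1 - m) ** 2) / ((1 - e) * nd) * qr
    approx = 2 * (m + e) * fst / nd
    return leading, approx


for m, e in [(0.002, 0.002), (0.05, 0.05), (0.3, 0.3)]:
    leading, approx = compare(1000, 10, m, e, (1 - m) ** 2, 500)
    print(m, e, leading / approx)  # ratio near 1 only for small m and e
```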

Discussion

It should be a relief to everyone that effective size can be obtained by the simple coalescent argument leading to equation (4). Such arguments efficiently yield expressions for effective size in more complex metapopulations with variable deme size (Rousset, in press). However, the coalescent argument has been obscured by earlier analyses (except Wakeley and Aliacar, 2001), which conflict with the present results. Previous recursions for probabilities of identity in Slatkin (1977), Whitlock and McCauley (1990) and Pannell and Charlesworth (1999) are inconsistent with Slatkin's life cycle and do not correspond to another well-defined life cycle. These discrepancies affect expressions for FST in Whitlock and McCauley (1990) and Whitlock and Barton (1997) and for effective size in Whitlock and Barton (1997) and Pannell and Charlesworth (1999). Quantitatively, effective size differs slightly from the expression resulting from Pannell and Charlesworth's system of recursions, and may differ substantially from equation (22) of Whitlock and Barton (1997) (for φ=0) or from results based on their equation (13).

As expected, the present results support the intuitive conclusion, already reached in previous works, that extinctions reduce the effective size. The simple coalescent argument easily yields equation (5), which shows that propagule size k and probability of common origin φ affect effective size only through their effects on QR, that is, on FST. Also as expected, lower k and higher φ reduce the effective size.

The assumption that two successive reproduction events occur right after extinction, when only one occurs in nonextinct demes, may seem unnatural and is easily relaxed (eg Whitlock et al, 1993), but results will then depend on additional assumptions about the life cycle, that is, whether demes of k colonizers produce as many juveniles as demes of N individuals. If so, equation (5) is still valid, giving Ne in terms of the identity QR among gametes produced within a deme. QR obeys a recursion of the form
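
$$Q_R' = (1-e)\left[\frac{1}{N}+\left(1-\frac{1}{N}\right)(1-m)^2 Q_R\right] + e\left[\frac{1}{k}+\left(1-\frac{1}{k}\right)\phi\,Q_R\right]$$
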
A concrete illustration of the different formulas is obtained by applying equation (5) to two sets of estimates of demographic parameters from the literature. Whitlock (1992) estimated 2N=21.7 (gene copies), m=0.31, φ=0.5, e=0.1 and 2k=10.6 (gene copies) in the beetle Bolitotherus cornutus. The ratio Ne/(Nnd) is 0.67 or 0.72, depending on whether or not an intercalary generation is assumed at recolonization. Ingvarsson et al (1997) estimated 2N=22.2 (gene copies), m=0.366, φ=0.5, e=0.255 and 2k=8 (gene copies) in the beetle Phalacrus substriatus. The ratio Ne/(Nnd) is likewise 0.35 or 0.40. Thus, the overall effect of population structure seems to be a moderate reduction of effective size, whatever formula is used. Substantially larger reductions in effective size may occur when the number k of colonizers is low relative to N. How often this occurs is an empirical question.
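The two beetle examples can be recomputed as a sketch. It assumes haploid N and k equal to the published gene-copy numbers 2N and 2k, uses equation (6) for the life cycle with an intercalary generation, and, for the life cycle without one, the equilibrium of QR′ = (1−e)[1/N+(1−1/N)(1−m)²QR] + e[1/k+(1−1/k)φQR], a form reconstructed to match the Discussion rather than quoted. The function names are illustrative.

```python
def qr_intercalary(N, k, m, e, phi):
    """Equation (6): colonizers reproduce once before the deme's N adults do."""
    num = 1 / N + (1 - 1 / N) * e / k
    den = 1 - (1 - 1 / N) * ((1 - e) * (1 - m) ** 2 + e * (1 - 1 / k) * phi)
    return num / den


def qr_no_intercalary(N, k, m, e, phi):
    """Equilibrium of QR' = (1-e)[1/N + (1-1/N)(1-m)^2 QR] + e[1/k + (1-1/k) phi QR]."""
    num = (1 - e) / N + e / k
    den = 1 - (1 - e) * (1 - 1 / N) * (1 - m) ** 2 - e * (1 - 1 / k) * phi
    return num / den


def ne_over_n_nd(N, m, e, qr):
    """Equation (5) rearranged: Ne / (N nd) = (1 - e) / (N [1 - (1-e)^2 (1-m)^2] QR)."""
    return (1 - e) / (N * (1 - (1 - e) ** 2 * (1 - m) ** 2) * qr)


# Haploid N and k are the published gene-copy numbers 2N and 2k.
estimates = {
    "B. cornutus": dict(N=21.7, k=10.6, m=0.31, e=0.1, phi=0.5),
    "P. substriatus": dict(N=22.2, k=8.0, m=0.366, e=0.255, phi=0.5),
}

results = {}
for name, p in estimates.items():
    with_gen = ne_over_n_nd(p["N"], p["m"], p["e"], qr_intercalary(**p))
    without_gen = ne_over_n_nd(p["N"], p["m"], p["e"], qr_no_intercalary(**p))
    results[name] = (with_gen, without_gen)
    print(f"{name}: {with_gen:.2f} with, {without_gen:.2f} without intercalary generation")
```

Rounded to two decimals, the printed ratios are 0.67 and 0.72 for B. cornutus and 0.35 and 0.40 for P. substriatus, matching the values quoted above.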