Introduction

The analysis of systems involving many loci is relevant in population and quantitative genetics; for example, in studies of gene networks using multi-locus data. An important problem is that of measuring departures of the joint distribution of genotypes at two loci from independence. This is known as ‘linkage disequilibrium’ (LD) and arises, for instance, in genome-enabled prediction of quantitative traits using markers (for example, Meuwissen et al., 2001) or in LD mapping. Association among alleles at various loci stem from physical proximity within a chromosome (linkage), mutation, random drift or selection favoring epistatic combinations. LD can also arise due to correlated allelic frequencies (Ohta, 1982) or when a covariance structure among allelic frequencies results from sub-structure within a population (Gianola et al., 2012). An example of this is in dairy cattle, where there are distinct and large half-sib families. LD has received enormous attention in population genetics (for example, Hill, 1974, 1975; Hedrick, 1987; Lewontin, 1988; McVean, 2007) and has become an increasingly important object of study in the light of genomic data, for example, use of single-nucleotide polymorphism (SNP) markers and sequence information enabling the study of evolution of genomic blocks and of genome-wide associations with quantitative traits.

Here, we address the measurement of the statistical association between allelic states at two loci, that is, ‘gametic’ or ‘linkage disequilibrium’, via the pair-wise correlation measures that are typically used for characterizing this association. The most widely used measure of pair-wise LD is the squared correlation between allelic content or r2 (Hill and Robertson, 1968); additional estimators are discussed by Hedrick (1987). Lewontin (1988) cautioned about the interpretation and behavior of these correlations (for example, lack of some ‘invariance’ properties correlations have in a multivariate normal distribution) and warned that the distributions of the standard estimators are frequency dependent. For example, comparisons of the strength of LD structure between populations are confounded by variation in allelic frequencies.

Problems introduced by frequency-dependent estimators of parameters are well-known in quantitative genetic analysis of binary traits with threshold models (Dempster and Lerner, 1950; Gianola, 1982). For example, if a trait is coded as ‘present’ or ‘absent’, the variance of the trait depends on its mean value, and heritability is a function of the fraction of the population having the ‘present’ modality. This problem has been addressed by postulating an underlying (for example, Gaussian) distribution on which gene substitution takes place (Wright, 1934), followed by either transforming estimates from linear models using a formula suggested first by Alan Robertson (Dempster and Lerner, 1950) or by use of generalized linear models (Gianola and Foulley, 1983). The resulting estimators apply to a conceptual Gaussian scale, but their distribution is not completely frequency independent either. For example, Moreno et al. (1997), using simulation, found that the performance of Bayesian estimators of variance parameters in a threshold model depended on the extent of sparsity in the data. Although the distribution of estimates in a threshold model is frequency dependent, the parameterization allows for comparisons among populations via use of a scale in which frequencies do not intervene, at least explicitly.

Widely used estimators of LD, such as r2, calculated from (0, 1) binary data at each of the two loci (‘1’ denotes the presence of an allele ‘1’ on a locus A, say) are frequency dependent as well. For instance, the disequilibrium parameter D=Pijpiqj, where Pij is the population frequency (probability) of haplotype ij and pi and qj are the probabilities (allelic frequencies) of observing a ‘1’ at either locus, is frequency dependent itself, by definition. Lewontin (1988) stated:

Our conclusion is that there are no generally gene frequency independent measures of association between loci, and that, indeed, the concept itself is an ill-defined one. In any particular case, we may be able to find a measure of association that is preserved under particular conditions, but the search for a “pure” measure of gametic disequilibrium is doomed to failure.

Even if a frequency-independent measure of LD may not exist, it would be desirable to develop one that is less affected by frequencies than r2. An obvious alternative for a 2 × 2 contingency table, where rows and columns denote allelic states at each of two loci and entries are counts of the observed haplotypes, is a tetra-choric correlation (Pearson, 1901). This is the correlation parameter of a standard bivariate normal latent process representing forces that create association between alleles, for example, selection favoring some combination of alleles. Apart from the threshold-liability models of animal breeding, tetra-choric correlations have been used in human genetics, for example, heritability of migraines and of dependency on opioids (Nyholt et al., 2005; Gelernter et al., 2006). Typically, estimates of tetra-choric correlations are larger than standard correlation statistics (for example, Ueberseax, 2010). However, it seems that the tetra-choric correlation has not been used for LD, at least with genomic data. This correlation can be estimated via maximum likelihood (Tallis, 1962) or by Bayesian methods. Although the calculations are fairly trivial by today’s computing standards, this is not the case in the context of genome-wide data. For example, consider a LD analysis using 2.5 million Drosophila SNP markers (Ober et al., 2012); a maximum likelihood or Bayesian analysis would require iterating many times over each of a huge number of marker pairs.

The objective of this article is to introduce a new estimator of a LD parameter (ρ2) based on a metric proposed by Plackett (1965) that is much easier to compute than a maximum likelihood (or Bayesian) estimate of a tetra-choric correlation. Based on arguments given by Plackett (1965), the estimator approximates well the tetra-choric correlation, and we conjectured that the sampling distribution of is less frequency dependent than that of the estimator of r2. The manuscript is organized as follows. After this introduction, the ‘new’ estimator is presented and discussed. The following section presents an empirical evaluation of LD in Holstein–Friesian cattle using 771 SNP markers, followed by an application to a data set from HapMap III representing a sample of 88 unrelated Tuscan people assessed for 21 991 SNPs (chromosome 3). Subsequently, 1600 haplotypes over a region of 1 Mb were simulated under the coalescent, to estimate LD using the two measures. Finally, a modest simulation study contrasted the new estimator with r2 with respect to statistical properties using several scenarios of LD and allelic frequencies. The paper concludes with the statement that provides a useful metric for the study of LD, especially when comparing populations or groups that differ in allelic frequencies.

A novel estimator of LD

Consider a pair of bi-allelic loci and let (0, 1) denote the two possible allelic states at each of the loci. A standard assumption in LD analysis is that a random sample of size n is drawn from a homogeneous, unstructured population. The n data points (haplotypes) can be organized into the 2 × 2 contingency table below

where n00, n01, n10 and n11 are haplotype counts. As noted earlier, one possible model for association postulates an underlying Gaussian process creating a correlation (tetra-choric correlation) in a latent scale, as in threshold-liability models. Here, there would be two standard normal liability variates with correlation ρ. Tallis, (1962) presented a maximum likelihood procedure for estimating ρ from data in the contingency table. Although this is computationally fairly simple, calculating the association parameter for each pair of whole-genome markers does not scale well, because iteration must be carried out for each pair of loci. Hence, it would be desirable to produce an approximation that is simple to compute from highly dimensional genomic data. If such procedure leads to a characterization of LD that is less frequency dependent than that provided by the standard estimator of r2, this would be desirable as well.

Define the cross-product ratio

approaching in large numbers

here, the Ps are true, unknown, gametic frequencies. Plackett (1965) introduced bivariate distributions indexed by a single parameter ψ that, in the case of the 2 × 2 table, takes the form ψ=. The relationship between the tetra-choric correlation and ψ is given by

When xLarge is <1, negative disequilibrium results; values of xLarge between 1/5 and 5 yield a set of values of ρ in the interval (−0.56, 0.56). Rutledge (1977) showed that Equation (2) is precisely the tetra-choric correlation of Pearson (1901).

Let y and z be the frequencies of alleles ‘0’ at each of loci A and B, respectively. In the standard parameterization of LD (for example, Hill, 1974; Lewontin, 1988), one has

Using this representation, algebraic manipulation of Equation (1) leads to

Now, a standard metric for characterizing LD (for example, Hill and Robertson, 1968; Lewontin, 1988) is the correlation between allelic contents (0, 1) in the discrete scale

Using this in Equation (3) produces the relationship

Next, define the allelic odd ratios at loci A and B and parameter γ as

Then

Equation (4) is valid for values of r different from those of γ or from its reciprocal and, of course, provided that xLarge is positive; note the discontinuity in the relationship between r and xLarge at r=γ or r=γ−1.

In view of Equations (2) and (3), the following relationship between ρ and r can be established

with a discontinuity at r=γ or r=γ−1. The relationship is illustrated at γ=1, 2 or 3 in Figure 1. In cases a and b, the relationship is monotonic. Further, variations in r do not translate into the same amount of variation in ρ. For instance, when r is negative, parameter ρ is essentially insensitive with respect to variation in r. When r is positive, ρ is more sensitive, especially for γ=2. The discontinuity does not show up in cases a and b because ρ reaches 1 before r=1 or r=0.5, respectively. Also, note that there is always agreement in sign: if r is negative, so is ρ, and likewise if r is positive. Case c (γ=3) shows the effect of the discontinuity: when r reaches 1/3, ρ is no longer finite, and at larger values, a positive r translates into negative values of ρ. This is of course absurd: if Equation (4) is evaluated at γ=3, it can be verified that there are values of r >1/3 that produce a negative xLarge; this is not possible because this parameter is a ratio of products of probabilities, so it cannot take negative values.

Figure 1
figure 1

Relationship between the tetra-choric correlation (rho) and the correlation between binary allelic codes at loci A and B at selected values of (Gamma). (a) Gamma=1. (b) Gamma=2. (c) Gamma=3.

The estimator of ρ suggested by Plackett (1965) is

where for n10>0 and n01 >0, so its calculation is trivial. Rutledge (1977) used simulation to evaluate Equation (6) in the context of repeatability of (0, 1) traits, that is, exactly as in the LD setting, and found that it had a minor upward bias, whereas the standard product-moment correlation (akin to r in LD) gave a sizable upward bias when inferring ρ. Garcia and Toro (1989) used Plackett’s estimator in a Monte Carlo evaluation of four estimators of heritability of threshold traits; its performance tended to be better when the number of families and progeny group sizes were large. A derivation of the approximate standard error of the estimates obtained with Equation (6) is given in Appendix 1; Plackett (1965) gives the end results without presenting their derivation.

Empirical comparisons of 2 versus r2

Cattle data

Genotypes from 771 SNPs across Bos taurus autosome 27 (BTA 27) with minor allelic frequencies (MAF)0.05 were used; the distribution of SNPs over MAF bins was fairly uniform. To obtain reliable estimates of LD metrics, we used a sample of 806 Holstein–Friesian cattle, including 469 bulls and 341 bull dams; a detailed description of the data is in Qanbari et al. (2010). Mean neighbor marker distance was estimated at 72 kb. We reconstructed haplotypes for each chromosome using default options in fastPHASE (Scheet and Stephens, 2006). All 296835 SNP pairs on chromosome 27 were used to estimate pair-wise LD values as conveyed by the ρ2 and r2 metrics; densities of the distributions of the estimates are in Figure 2. Irrespective of the estimator used, most pairs of SNPs had a very weak association. However, the tetra-choric correlation (left panel) produced a less sharp distribution of estimates and a higher frequency of values >0.2. This is consistent with what is often observed when Gaussian latent scale models are used to infer heritability, also a squared correlation (Dempster and Lerner, 1950; Gianola, 1982).

Figure 2
figure 2

Cattle data: density of the distribution of estimates of LD parameters: Plackett’s estimator at the left, and the standard metric at the right.

It is well known that there is a decay of LD measures as physical distance between markers increases, with LD expected to be a function of distance. We examined this relationship by plotting all r2 and ρ2 estimates against physical distance between marker pairs. As shown in Figure 3, both metrics showed a similar pattern; however, the magnitude of ρ2 values was markedly larger than that of r2 for all pair-wise intervals, and the decay of LD was clearly gentler. In order to visualize the decay of LD and the proportion of marker pairs in ‘useful’ (in the sense of genome-enabled prediction of complex traits) LD, we stacked the estimates and plotted these as a function of inter-marker distance classes (<0.025, 0.025–0.05, 0.05–0.075, 0.075–0.12, 0.12–0.2, 0.2–0.5, 0.5–1.5, 1.5–3 and 3–5 Mbp). The bar plot (Figure 4) illustrates the rate at which LD decays with physical distance and gives a more clear comparison between the two metrics. Figure 4a, based on r2, conveys a picture of weaker LD than that resulting from Figure 4b, corresponding to ρ2. A mean value of r2=0.20±0.32 was estimated for pair-wise distances of 50–75 kb compared with ρ2=0.58±0.36, which includes the average inter-marker space in this study. Despite the strong relationship between LD and distance, the proportion of physically close marker pairs with low values of r2 was large. As shown in Figure 4, >70% of marker pairs at 25 kb apart are not in ‘sufficient’ LD to be useful for association studies as deemed by estimates of r2; Ardlie et al. (2002) and Qanbari et al. (2010) discuss this issue further. With the ρ2 metric, a larger proportion of marker pairs displayed strong LD; the level at which ρ2 would be ‘useful’ for association studies needs to be determined in future research.

Figure 3
figure 3

Cattle data: LD decay as a function of distance between pairs of SNPs over the entire chromosome (standard metric at the left, Plackett’s estimator at the right).

Figure 4
figure 4

Cattle data: comparison of fraction of marker pairs with different LD levels (<0.1, 0.25, 0.4, 0.6 and >0.6, depicted by different colors) for marker pairs in different distance bins of maximum 5 Mb. Panels (a) and (b) represent r2 and ρ2, respectively.

When a new mutation occurs in a finite population, LD is created to a degree that depends on the frequency of the allele that is haplotyped together with the new mutation. It has been shown that estimates of LD between SNPs with a low MAF are biased upwards, and that high-frequency polymorphisms are better for accurate estimation of LD (Reich et al., 2001). Although this is probably due to the frequency-dependent nature of the LD estimators (Lewontin, 1988; Pritchard and Przeworski, 2001), it should be kept in mind that low-frequency SNPs have a higher probability of having arisen recently (Nordborg and Tavaré, 2002), so discarding low frequency SNPs from an analysis may give a distorted evolutionary picture. To examine this issue, we evaluated the decay of LD as a function of the difference in MAF (ΔMAF) between the two loci. Most of the pairs had a ΔMAF that was <0.40. To explore the dependence of LD on allele frequencies, we calculated the average ρ2 statistic for all SNP pairs within five bins of ΔMAF and compared results with those obtained by averaging r2. As ΔMAF increased, average r2 decreased. On the other hand, average values of ρ2 increased with a growing ΔMAF. As shown in Figure 5a, average ρ2 was 0.065 for SNP pairs of ΔMAF0.1, increasing to 0.1 for those in the largest ΔMAF bin. In contrast, r2 started at 0.016 for frequency-matched SNPs and decreased to 0.009 for the bin with the largest ΔMAF. The emerging picture is that the two estimators are dependent on ΔMAF. Now, as estimates of ρ2 were larger than those of their r2 counterparts, a scale-free comparison is more sensible. Co-scaling the LD values as indicated in Figure 5b produced an increase of ≈30% in magnitude of ρ2 between frequency-matched SNPs relative to those with an extreme MAF. In the case of r2, co-scaling produced an 80% increase. Hence, the dependency of r2 on ΔMAF was evidently stronger than that of ρ2. Furthermore, the trend of ρ2 for SNPs of low or moderate ΔMAF was fairly flat, and considering that only 5% of the SNP pairs had extreme ΔMAFs, the dependency of ρ2 on allele frequency difference is probably negligible (Figure 5, panel b).

Figure 5
figure 5

Cattle data: (a) Average observed LD as a function of difference in the frequency of minor alleles at each pair of SNPs. (b) Using xi=LDi/max (LD), for i=1,2,..., 5 ΔMAF bins, co-scaled values for each LD metric were computed as max (x)−xi. A full color version of this figure is available at the Heredity journal online.

In addition, we calculated the average of r2 estimates within three bins of physical distance at five different ΔMAF bins and compared results with the average of ρ2 estimates (Figure 6). As shown in Figure 6a, for pairs of SNPs <100 kb apart, a mean value of r2=0.23 was observed for frequency-matched SNPs, and this sharply dropped to 0.04 for the pairs with the largest ΔMAF. In contrast, an overall mean value of ρ2= 0.47 was observed for frequency-matched SNPs <100 Kb apart, which rose to 0.57 for those with the largest ΔMAFs. Co-scaling the LD metrics (Figure 6b) again indicated a stronger dependency of r2 on allele frequencies. In particular, pairs of SNPS at short physical distance had r2 values that were much more affected by ΔMAF than the corresponding ρ2, a trend that was observed for all ΔMAF bins. This dependency of co-scaled r2 values was almost four times larger than that of ρ2. For markers within a distance range of 100 kb, the magnitude of ρ2 increased from 26% to 39% as ΔMAF augmented, but for SNP pairs that were far from each other, the dependency was negligible. Furthermore, the dependency of co-scaled ρ2 on allele frequency did not vary much with respect to physical distance between pairs of SNPs.

Figure 6
figure 6

Cattle data: panels (a) and (b) present the average and co-scaled measures of LD, respectively, as a function of ΔMAF for pairs of SNPs at three different physical distances. A full color version of this figure is available at the Heredity journal online.

HapMap data

The Italian Tuscan population sample from the HapMap III data set was used to examine the behavior of the new metric when applied to human genomic data. The Italian Tuscan population involves 88 unrelated individuals, and ρ2 and r2 were estimated using the 21 991 SNPs across chromosome 3 (out of a total of 1 387 486). Figure 7 displays the relationships between the estimates of ρ2 (right panel) and r2 (left panel) as a function of physical distance and absolute value of the MAF difference between SNP pairs. As expected, the level of LD as conveyed by the two metrics decreased with increasing distance, although more gently for ρ2 than for r2. It is evident in the plot that estimates of ρ2 were essentially insensitive with respect to ΔMAF, whereas estimates of r2 were clearly frequency dependent. In the case of r2, SNP pairs in short physical distance were more affected by ΔMAF.

Figure 7
figure 7

Three-dimensional surface plots depicting the behavior of estimates of r2 (left panel) and ρ2 (right panel) as a function of inter-marker distance (Mb) and MAF interval (dMAF) for SNPs in chromosome 3 of the Italian Tuscan population in HapMap III. A full color version of this figure is available at the Heredity journal online.

This empirical study with cattle and human data indicates that ρ2 also decays as a function of physical distance, a similar behavior to that of other LD estimators. However, the new metric is clearly less dependent on the MAF difference between SNP pairs than r2 and provides a more sensible and potentially useful new metric for analyzing LD across the genome. Although an entirely frequency-independent measure of LD does not seem possible to attain (Lewontin 1988), ρ2 removes a major source of statistical noise when assessing LD structure. As already noted, further studies are needed to determine the ‘useful’ level of ρ2 in association studies.

Monte carlo comparison of 2 versus r2

Statistical properties

We examined small sample (N=200 individuals) properties of the estimator of ρ2 and compared these with those of r2 via two simulations. In the first simulation, referred to here are as ‘ρ-induced’, the strength of LD was fixed at the latent scale, at values of ρ2=0.04, 0.16, 0.36 and 0.64. In the second simulation, termed ‘r-induced’, we took values of r2=0.04, 0.16, 0.36 and 0.64, but at the observable (0, 1) scale. In both simulations, the allelic frequencies used were (0.1, 0.1), (0.25, 0.25) and (0.50, 0.50), where the numbers in each set refer to the frequencies of the ‘0’ allele for the two loci in question; only positive disequilibrium was considered; unequal frequencies were examined as well, leading to similar qualitative pictures, so results are not reported. For each LD by allelic frequency combination, 4000 independent samples were generated, and LD was estimated in each using either or as indicated below.

The rationale for the two different simulations is as follows: we attempted to avoid favoring a given estimator because the resulting haplotype frequencies depend non-trivially on the process that creates association, as this takes place either at the level of the latent scale (ρ-induced) or at that of the observed scale (r-induced). If disequilibrium is due to some latent process, then ρ is a relevant parameter; if there is no such latent process, then ρ lacks meaning. As shown subsequently, the joint distributions of alleles strongly depend on how parameters are elicited, and the same is true of the behavior of the corresponding estimators.

In the two simulations, allelic frequencies were estimated from the observed number of haplotypes in each realization, but with some shrinkage to avoid null frequency estimates. For instance, . This is equivalent to a Bayesian posterior mean in which the allelic frequency is assigned a Beta (0.001, 0.001) previous distribution. Likewise, the joint frequency in the (0,0) cell was estimated as For the sample size considered (N=200), the effect of this shrinkage away from 0 is mild, and essentially negligible at larger sample sizes (results not presented). As the estimator of ρ is not defined when either or n10 are null, if a simulation produced null values, these cell counts were set to 1.

ρ-induced simulation

As before, let y and z be the true frequencies of the ‘0’ alleles at the two loci and note that P01=yP00, P10=zP00 and P11=1−P00−(yP00)−( From the definition of xLarge given in Equation (1), one can write

so that, given y, z and xLarge, and assuming equal allelic frequencies at the two loci (y=z), the following equation (with the unknown being P00), is satisfied

This equation has three P00 roots, namely:

The first root is the linkage equilibrium solution, P00=y2, which is trivial. The other two roots can be shown to approach y as xLarge goes to infinity; one of these two roots does approach to y from ‘above’ and the other one from ‘below’. As P00 is a probability, bounded between 0 and 1, we verified that

generated a valid probability distribution, consistent with the allelic frequencies (both equal to y) used as input for solving the equation. Now, from the definition of the tetra-choric correlation , it follows that

and, given ρ, this equation can be solved numerically for an xLarge value that is consistent with the true level of disequilibrium in the latent scale. Finally, given xLarge and y, P00 in Equation (7) can be calculated readily, and the probability distribution generated has frequencies P00=Equation (7), P01=yP00, P10=zP00 and P11=1+P00yz. For example, let ρ=0.40 and y=0.10; then, with arccos (−0.4)=1.9823132,

Solving the above equation numerically yields xLarge=2.9239, and using this together with y produces the ‘true’ probability distribution (after rounding) , and . It is important to note that while parameter ρ is fixed externally to the allelic (y) and haplotype frequencies (P’s), the latter depend on ρ. In other words, the simulation input parameters are ρ and y.

From this probability distribution, the induced value of the correlation in the (0,1) scale is For the example given above, after rounding, compared with a ‘parental’ value of ρ=0.40.

r-induced simulation

Here, the probability distribution was generated from the parametric definition (assuming equal allelic frequencies at the two loci)

so that, assuming positive disequilibrium, P00=y[y+r(1−y)]. From this, given r and y one can solve for P00 and generate the entire haplotype distribution, as well as xLarge. Then, the ρ parameter is assessed from the relationship between ρ and xLarge but with xLarge evaluated as indicated previously. To illustrate, for r=0.40 and y=0.10, the probability distribution obtained is P00=0.046, P01=0.054, P01=0.054 and P11=0.846, with

and ρ=0.78, after rounding.

Results

Table 1 gives the results of the ρ-induced simulation. When disequilibrium was simulated at the latent level, the estimators of both ρ2 and r2 had an upward bias. However, as r2 values were much smaller than those of ρ2, the relative bias was larger for the estimator of r2 than for that of ρ2. The relative bias decreased as allelic frequency increased and also as disequilibrium became stronger, for example, for ρ2=0.16 versus 0.64. When disequibrium was very weak (ρ2 =0.04), the empirical bias of the estimator of r2 did not vary as allelic frequency increased, but the relative bias decreased from 100% of the parameter value (for allele frequency equal to 0.10) to 19% (frequency equal to 0.50). On the other hand, for strong disequilibrium (ρ2 =0.64), the relative bias of the estimator of r2 decreased from 3% to 1%. Clearly, the estimator of ρ2 had a smaller relative bias throughout, which was practically insensitive with respect to allelic frequencies when its true value was 0.36 and 0.64. The inter-quartile ranges were smaller for the estimator of r2 than for that of ρ2, a reflection of the larger values of the latter parameter. However, the relative precision, as measured by the coefficient of variation, was always smaller for the estimator of ρ2. Precision was higher as allelic frequencies increased and as latent level of disequilibrium became stronger. This simulation suggests that, to the extent that disequilibrium is caused by forces operating in a latent scale, such as selection (typically, selection operates on phenotypes and not on genotypes), ρ2 provides a more sensible metric, conceptually and statistically, with the corresponding estimator being fairly insensitive to allelic frequencies except at mild levels of LD.

Table 1 Mean, inter-quartile (third–first) range, coefficient of variation (CV, %) and relative absolute empirical bias (RAB, percentage of ‘true’ parameter value) of estimators of r2 and ρ2 at varying allelic frequencies at the two-loci (y, z) and levels of latent level linkage disequilibrium; ρ2 is the disequilibrium parameter driving the process and r2 is the induced value

Results for the r-induced simulation are not presented, and we give a brief summary of the results. Here, disequilibrium was created and parameterized at the (0,1) scale: the true joint frequencies followed from the values of the true-positive disequilibrium at the r-scale and from the allelic frequencies. The bias of estimation of r2 was negligible, except when LD was weak, whereas that of ρ2 was very large, except at very strong levels of disequlibrium. Even though the estimator of ρ2 had better relative precision, its strong bias negated this advantage. The explanation is that disequilibrium is not viewed as following from latent forces, so that ρ2 does not have a clear mechanistic meaning.

A question is the extent to which the two estimators might lead to a drastically different inference about LD. Not surprisingly, the answer is frequency and sample size dependent. A comparison between the estimates of r2 and ρ2 at the three allelic frequencies considered and at sample sizes of N=200 or 800 individuals is shown in Figures 8 and 9 for the ρ-induced (ρ2=0.36) and r-induced simulations (r2=0.25), respectively. The estimates varied concomitantly, but the strength of their association depended more on allelic frequencies than on sample size. At intermediate allelic frequencies, the two sets of estimates were almost linearly related; at smaller frequencies, scatter was more patent. As seen throughout this study, to the extent that disequilibrium is the result of forces acting on some latent scale, the r2 metric conveys a weaker picture of LD than ρ2.

Figure 8
figure 8

Relationship between estimates (4000 samples of size N=200 or 800 each) of LD parameters r2 and ρ2. ‘True’ ρ2=0.36. Joint frequencies elicited from the knowledge of allelic frequencies and simulated value of ρ.

Figure 9
figure 9

Relationship between estimates (4000 samples of size N=200 or 800 each) of LD parameters r2 and ρ2. ‘True’ r2=0.25. Joint frequencies elicited from the knowledge of allelic frequencies and simulated value of r.

Coalescent simulation

We also simulated a sample of 1600 haplotypes over a region of 1 Mb based on the coalescent model in the ‘ms’ simulator (Hudson, 2002) and estimated all pairwise LD using the two measures. Figure 10 displays the corresponding estimates of the two LD measures plotted against inter-marker distance and ΔMAF. The picture was fairly similar to what it was found with the cattle and HapMap data sets. The simulation and empirical results confirmed that ρ2 produces a quantitatively and qualitatively different picture of LD from that resulting from r2 and that the distribution of the estimates of ρ2 was frequency independent for most practical purposes.

Figure 10
figure 10

Estimates of ρ2 and of r2 obtained with a simulation under the coalescent (details are in the text). A full color version of this figure is available at the Heredity journal online.

Conclusion

A novel metric for assessing pair-wise LD was developed and evaluated in an empirical data set using markers in a sample of dairy cattle, in a Tuscan population available in HapMap III and in small-scale simulation studies. The new estimator targets a parameter (ρ) reflecting an association in a conceptual bivariate normal distribution, the tetra-choric correlation of Pearson (1901); on the other hand, r2 relates to the chi-square statistic for large samples. The ρ parameter can be estimated in a simple manner, using haplotype frequencies and a cosine transformation. Contrary to the method of maximum likelihood for estimating the tetra-choric correlation, which requires iteration (Tallis, 1962), ρ can be estimated without computational difficulty and it scales well when LD involves an analysis of high-dimensional genomic data. We found that, to the extent that it is sensible to view LD as stemming from forces that operate in a latent scale, the estimator is more precise, and it is less affected by differences in MAFs than the estimator of r2, the most commonly used metric of LD. Although the distribution of the estimates still remains somewhat dependent on frequencies, ρ is an appealing measure and provides a picture of stronger LD. This is similar to what occurs with threshold-liability models of quantitative genetics, which are now common in genetic analysis of binary data: estimates of heritability are larger in a latent scale (Dempster and Lerner, 1950; Gianola and Foulley, 1983; Harville and Mee, 1984; Gilmour et al., 1985). The method does not generalize easily to multiple alleles, but the concept extends naturally to more than two loci by introducing a higher-order latent Gaussian distribution. This would allow to examine LD in many different settings, introducing measures of conditional (given sets of loci) LD. In this case, the estimation procedure would probably have to be maximum likelihood or Bayesian, so computations would be considerably more involved.

In conclusion, our study supports the view that ρ2 is a useful metric, one that facilitates comparison of levels of LD among populations differing in allelic frequencies; such comparison is distorted by the frequency-dependent nature of r2. On the other hand, the expected value of r2 can be related to genetic parameters, such as effective population size (Sved, 1971). This implies that the distribution of estimates of effective size is frequency dependent as well. Perhaps similar relationships can also be developed for ρ2 and this may be a subject for future research.

Data archiving

Data deposited in the Dryad repository: doi:10.5061/dryad.t9f2g.