Introduction

During the 100th anniversary year of the Hardy–Weinberg (HW) law, it hardly seems necessary to remind readers of its fundamental role in population genetics.1, 2, 3 The simple HW rules for computing genotype frequencies from allele frequencies undergird almost all linkage and association studies. Deviations from Hardy–Weinberg equilibrium (HWE) occur when there is selection, drift, population admixture, or forms of non-random mating. Tests of HWE have been used to fine map susceptibility loci,4, 5 to screen for association in genome-wide data,6, 7 and to identify genotyping errors.7, 8, 9

As HWE testing has been thoroughly reviewed elsewhere,10, 11, 12, 13, 14, 15, 16 we only cover background material directly pertinent to our new test. When the number of alleles m is small, it is feasible to test for HWE by comparing observed genotype numbers to expected genotype numbers using either a Pearson's χ2-test or a likelihood ratio test (LRT).10 The degrees of freedom of either test are m(m−1)/2. As m increases in these tests, power decreases and expected cell counts fall below recommended levels.17 It is possible to construct a more parsimonious test by aggregating all heterozygotes and all homozygotes and computing the observed and expected numbers in the two aggregates.18 Superficially, the statistic

is appealing, where n is the sample size, ni/j is the number of observed genotypes of type i/j, and p̂i is the gene counting estimate of the allele frequency pi. Unfortunately, the GOF statistic does not follow a χ2 distribution when the pi are unequal.18 Weir18 suggests replacing the GOF test with the homozygosity test19

The test statistic H asymptotically follows a χ2 distribution with one degree of freedom.19

Very few of the HWE tests have been extended to pedigree data. The obvious LRT extension compares the likelihood using arbitrary genotype frequencies for the founders to the likelihood using HW genotype frequencies for the founders. This test also has m(m−1)/2 degrees of freedom and suffers from overparameterization for large m. Bourgain et al.20 developed a parsimonious quasi-score test that is parameterized by a heterozygous–homozygous fixation index.21 Their model does not readily lend itself to maximum likelihood estimation because of the complex bounds on the fixation index. Inspired by the Bourgain et al. model, we propose a model motivated by a simple acceptance–rejection mechanism. Our heterozygote–homozygote test (HH test) admits pedigree data, inbreeding internal to each pedigree and markers with dominant or recessive alleles (non-codominant markers). With no segregation distortion, complete typing of pedigree members, and a codominant marker, the HH test can dispense with the nonfounders in the pedigree and concentrate solely on the founders.

Heterozygote–homozygote model

Suppose we are interested in testing HWE at a locus with m alleles and corresponding allele frequencies (p1,…,pm). In our model, we postulate that nature operates by first sampling the genotypes i/i and i/j with the usual HW probabilities and then subjecting a proposed genotype to an acceptance–rejection step. Let α be the acceptance probability for homozygotes, and let β be the acceptance probability for heterozygotes. If γ=β/α, then the various genotypes are ultimately accepted with the probabilities

The parameters α and β cannot both be estimated. This identifiability crisis is overcome by estimating their ratio γ. When the individuals in the sample are unrelated, then the distribution of the genotype count vector n=(n1/1,…,n1/m,…,nm/m) is multinomial with likelihood proportional to

By declaring isolated individuals to be degenerate pedigrees, the model easily extends to samples consisting of independent pedigrees or mixtures of pedigrees and unrelateds. As the log-likelihoods of independent pedigrees add, it suffices to consider the likelihood of a single pedigree. If this pedigree has s members, then the likelihood formula

of Ott22 applies, where Xi is person i's marker phenotype and Gi is his/her marker genotype. The product on j is taken over all founders, and the product on {k,l,m} is taken over all parent–offspring triples. The prior Prior(Gj) is given by Eq. (2) or (3). The penetrance for a marker Pen(XG) is either zero or one. The transmission probability Tran(GmGk,Gl) follows the traditional Mendelian law of independent segregation.

Computational efficiency is improved by lumping the alleles not seen in a pedigree. Our implementation of the HH test in the pedigree analysis package Mendel23 employs a combination of genotype elimination24 and allele lumping.23 Alleles observed in a pedigree are preserved; the remaining alleles are lumped. Computation of founder priors must be performed carefully to preserve the correct form of the likelihood (Eq. (4)). Suppose the first k alleles are observed, and the remaining m-k alleles are missing in a pedigree. Eqs. (2) and (3) hold without modification for founder genotypes formed from the k observed alleles. Let q denote the lumped allele, and let

denote its frequency. To account for lumping, we adjust the remaining genotype probabilities as follows:

The numerator of Eq. (5) accounts for the fact that genotype q/q is a composite of homozygous and heterozygous genotypes.

We maximize the sum of the log-likelihoods subject to the constraints pi≥0, and γ≥0. In our computer program Mendel,23 maximum likelihood estimation is accomplished efficiently by a quasi-Newton algorithm. To form the likelihood ratio statistic, we maximize the log-likelihood both under the null hypothesis (γ=1) and the alternative hypothesis (γ≠1). Under the null hypothesis, twice the difference in the two maximum log-likelihoods follows an asymptotic χ2 distribution with one degree of freedom.

For a biallelic marker, the HH test is fully equivalent to the LRT based on genotypes. Proof of this fact just boils down to showing that one model is a reparameterization of the other. We have already demonstrated how to pass from the allele frequencies p1 and p2 and ratio γ of our model to the genotype probabilities Pr(1/1), Pr(1/2), and Pr(2/2) of the standard model. The reverse transformation simply amounts to

Type-one error rate and power

Now that we have presented our basic theory, we concentrate on demonstrating the flexibility of the HH test and on showing that it has the appropriate type-one error rates and comparable power to the Homozygosity test (H test). To address the question of type-one error rates, we simulated 1000 datasets under the null hypothesis of HWE. Each dataset includes 100 unrelated individuals typed at a marker with four equally frequent alleles. Using a significance level of 0.05, we find that the HH test has a type-one error rate of 0.047 (SD=0.007). In comparison, the benchmark H test has a type-one error rate of 0.042 (SD=0.006).

To assess the power of the HH test to detect deviations from HWE, we conducted two sets of simulations under our model. Each involved (a) a codominant marker with four alleles, (b) a grid of γ values ranging from 0.31 to 3.0, and (c) 1000 trials at each grid point. The first set of simulations used 100 unrelated individuals and assumed equally frequent alleles; the second used 200 unrelated individuals and assumed the allele frequency vector (0.8, 0.1, 0.05, 0.05). We chose these conditions so that in both cases the power was similar, approximately 0.80, when γ=2. As power depends on allele frequencies, we had to double the sample size for the skewed allele frequencies.

To generate genotypes under the alternative hypothesis γ≠1, we sampled with replacement as dictated by the model. This was realized for γ<1 by defining the acceptance probabilities α=1 and β=γ and for γ≥1 by defining α=1/γ and β=1. Genotypes were successively sampled for each individual until the first acceptance. We computed power at each grid point as the fraction of the datasets that reject the null hypothesis at the 0.05 level. Figure 1 plots power as a function of γ for both the HH test (solid line) and the H test (dashed line) for equally frequent alleles; Figure 2 does the same for the allele frequency vector (0.8, 0.1, 0.05, 0.05). For both sets of simulation conditions, the power curves of the H test and the HH test are not substantially different. At the point γ=1, both tests continue to show the correct type-one error rate.

Figure 1
figure 1

Comparison of the power of the HH and H tests with equally frequent alleles. Results of simulations to determine power as a function of γ. Each simulation is comprised of 1000 datasets each with n=100 unrelated individuals and 4 equally frequent alleles. The solid line denotes the power of the HH test. The dotted line denotes the power of the H test.

Figure 2
figure 2

Comparison of the power of the HH and H tests with skewed allele frequencies. Results of simulations to determine power as a function of γ. Each simulation is comprised of 1000 datasets with n=200 unrelated individuals and allele frequency vector=(0.8, 0.1, 0.05, 0.05). The solid line denotes the power of the HH test. The dotted line denotes the power of the H test.

A robust test should also possess good power against alternatives not covered by the model. We accordingly simulated another 1000 datasets with (a) n=200 unrelated individuals, (b) 3 codominant alleles with frequencies p1=0.5 and p2=p3=0.25, and (c) acceptance probabilities α1/1=1.0, α2/2=0.5, α3/3=0.25, β1/2=1.0, β1/3=0.5, and β2/3=0.25. In these datasets, the rejection rate for the H test is 0.328 and the rejection rate for the HH test is 0.343.

In practice, violations of HWE often stem from population stratification. We therefore conducted 100 simulations combining 100 random individuals from each of two different populations separately in HWE. In population 1 the frequency vector of the three codominant alleles is (0.4, 0.3, 0.3); in population 2 it is (0.8, 0.1, 0.1). If we keep the two sets of 100 individuals separate, the average estimated γ in the first set is 0.99 and 1.07 in the second set. When we analyze the combined datasets, the averaged estimated γ is 0.71. In 86 out of the 100 combined datasets, the HH test has a P value less than 0.05.

Finally to validate the HH test in family data, we simulated nuclear families with missing founder genotypes. Specifically we simulated 1000 sets of 100 nuclear families under various scenarios. Each simulation scenario assumed (a) 3 fully genotyped siblings, (b) a codominant marker with 3 alleles, and (c) the allele frequency vector (0.5, 0.25, 0.25). In the first scenario, we took γ=1. In this case the type-one error rate is 0.046 and the average estimate of γ is 1.07, with 95.2% of the 95% confidence intervals covering γ=1 (95% coverage rate =95.2%.) In the second scenario γ=1.5. Here the average estimate of γ is 1.64 and the 95% coverage rate is 93.0%. In the third scenario γ=0.50. Now the average estimate of γ is 0.52 and the 95% coverage rate is 93.4%. These results appear reasonable.

Real data examples

It is also revealing to consider real data. Our first example involves the classical survey by Waaler25, 26 of X-linked recessive color blindness. These European data consist of 40 affected women, 725 affected men, 9032 unaffected women, and 8324 unaffected men. We analyze the data assuming that the color-blindness locus has two alleles, d and D, and that men with genotype d and women with genotype d/d are color blind. The likelihood contributions for affected and unaffected men are pd and pD, respectively. Note that these contributions do not depend on γ. The likelihood of an affected woman is

An unaffected woman is either a D/D homozygote or a D/d heterozygote. Thus the likelihood of an unaffected woman is

As all individuals are unrelated, the likelihood for the full data is

Our computer program Mendel can maximize likelihoods of this sort involving dominant and recessive alleles. All the user needs to do is define the X-linked locus as non-codominant in Mendel's definition file. Details can be found in the Mendel manual available at the UCLA web site mentioned later.

One of the virtues of the HH test on this problem is that it correctly incorporates allele frequency information contributed by men. The estimate γ̂=4.087 (SD=1.661), and corresponding likelihood ratio statistic of 5.119 output by Mendel give an asymptotic P value of 0.0237. This apparent violation of HWE is hardly surprising because there are actually two different forms of color blindness, protanopia and deuteranopia, located on the X chromosome.

As a more computationally challenging example, we now consider 1296 highly polymorphic markers on chromosomes 1, 2, and 3 deposited in the CEPH database version 10 (website: www.cephb.fr/cephdb/). The number of families, individuals, founders, and missing founder genotypes vary from marker to marker. The minimum number of genotyped families is 8, the minimum number of genotyped individuals is 84, and the minimum number of genotyped founders is 24. The maximum number of families is 40, the maximum number of genotyped individuals is 510, and the maximum number of genotyped founders is 131. To avoid needless computation, we trimmed ungenotyped family members unless they were critical in determining relationships between genotyped family members.27 We also removed all Mendelian genotyping errors.8, 23, 24, 28 Under suboption 1 of the Mistyping option of Mendel, evidence of a genotyping error at a particular marker in a particular family causes the program to delete all genotypes at the marker in the family.

Testing of HWE is conducted by comparing maximum likelihoods under the null hypothesis (Ho: γ=1) and the alternative hypothesis (Ha: γ unrestricted). In both cases allele frequencies are estimated. The usual likelihood ratio statistic

asymptotically follows a χ2 distribution with one degree of freedom. For problems involving this many markers, Mendel takes tens of minutes of computing time. Exact times depend on the computing platform and compiler.

The resulting histogram of P values is shown in Figure 3. Except for a small spike near 0, the distribution appears uniform. Figure 4 compares H and HH P values based on founders alone. As expected from our previous simulation results, P values under the two tests are similar. As the HH test estimates allele frequencies as well as the heterozygote/homozygote ratio γ, it can provide a more sensitive test of HWE when an excess in the number of homozygous genotypes of one kind is balanced by a deficit in the number of homozygous genotypes of another kind. Violation of HWE in this situation is masked in the H test because the test statistic uses only the total counts of heterozygotes and homozygotes.18

Figure 3
figure 3

The distribution of HH test P values using data from the CEPH families. The results are from 1296 markers, each with 3 to 26 alleles. The number of families ranges from 8 to 40, the number of genotyped founders ranges from 24 to 131, and the number of genotyped individuals ranges from 84 to 510.

Figure 4
figure 4

Comparison of the HH test and H test P values using the genotyped founders from the CEPH families. The results are for the same 1296 highly polymorphic markers genotyped in the CEPH families (Figure 3). The number of unrelated individuals ranges from 24 to 131.

Finally, we illustrate the versatility of the HH test in handling tightly linked SNPs. Combining neighboring SNPs is a standard tactic for increasing their information content. When HWE holds, the frequency of the multi-locus phenotypes remain constant from generation to generation and can be predicted using haplotype frequencies. Owing to phase uncertainties, the supermarkers constructed from tightly linked SNPs are non-codominant.

For the sake of simplicity, let us consider the case of two neighboring SNPs. More SNPs can be handled by the same tactics. We denote a bi-locus phenotype as i1i2j1j2, where i1/i2 is the unordered genotype at the first SNP and j1/j2 is the unordered genotype at the second SNP. There are nine possible phenotypes. The corresponding alleles are the four haplotypes, 1-1, 1-2, 2-1, and 2-2. Eight of the nine bi-locus phenotypes correspond to a single haplotype pair. For example, the bi-locus phenotype 1112 is consistent with only the single haplotype pair 1-1/1-2. The doubly heterozygous phenotype 1212 is consistent with the two haplotype pairs 1-1/2-2 and 1-2/2-1. To convert two tightly linked SNPs into one non-codominant marker with four alleles, one can use the Combining_Loci option of Mendel.23, 29 The Mendel manual supplies the full syntax for constructing input files and interpreting output files.

As discussed in the X-linked color-blindness example, non-codominant markers can be easily handled in computing the pedigree likelihood (Eq. (4)); one simply sums over all possible genotypes consistent with each observed marker phenotype.

For example with a supermarker constructed from two neighboring SNPs, whenever an individual with phenotype 1212 is encountered, both of the genotypes 1-1/2-2 and 1-2/2-1 are visited in likelihood calculation. The priors shown in Eqs. (2) and (3), the null hypothesis (γ=1), and the alternative hypothesis (γ unrestricted) are interpreted exactly the same regardless of whether a marker is codominant or not. As an example, we combined SNPs rs998132 and rs434410 in the CEPH data. Although these two SNPs are approximately 3.5 Mb apart, no recombination events are observed between them. Table 1 summarizes our HH test results with the combined markers. Notice here that the evidence against HWE increases when we analyze the two markers together.

Table 1 Combined SNP results

Discussion

In summary, the HH test has the advantages of allowing (a) a mixture of pedigree and random sample data, (b) non-codominant markers, and (c) highly polymorphic markers. It is motivated by a simple to understand acceptance–rejection model. It is slightly conservative and almost always as powerful as the benchmark H test. In some settings, for example in pedigree data with untyped founders, it is more powerful than the H test. The HH test does entail considerably more computing, but this is hardly a consideration for moderate-sized datasets. Inbred children are allowed but not inbred or related founders. Mendelian segregation is assumed to hold in the transmission of genes from parent to offspring. If departures from these assumptions are suspected of occuring, the methods of Bourgain et al.20 will be more appropriate.

The HH test is now implemented in the publicly available software package Mendel (version 9.0 or later at www.genetics.ucla.edu/software) as part its Allele_Frequencies option. The option is fully documented and illustrated by specific numerical examples. We hope that this free software implementation will prove attractive to users.