A key problem in human genetics is how to identify polymorphisms that confer differences in the expression of genes. In contrast to coding polymorphisms, which are relatively easy to catalog comprehensively by resequencing well-defined exonic sequences across individuals1, regulatory polymorphisms are difficult to pinpoint among the sea of polymorphisms located in the vicinity of genes2. Many diverse strategies will probably be needed to create databases of putative regulatory polymorphisms.

Regulatory single-nucleotide polymorphism (rSNP). Two allelic variants of the same gene (shown as a red and a yellow arrow) are transcribed in different amounts as a consequence of an adjacent polymorphism. In this example, allele G, located upstream of the gene, has a higher transcript level than does allele T.

Current high-throughput strategies include large-scale genetic studies that compare gene expression patterns in individuals3,4 and in silico annotation of non-coding polymorphisms that are located in regions of DNA that show conservation among species or have a high probability of containing regulatory elements5. These approaches provide a relatively coarse and insensitive means to identify noncoding single-nucleotide polymorphisms (SNPs) that may alter gene expression.

In contrast, popular functional studies, such as transient or stable transfection6, have never been validated for high-throughput screening and cannot be used to assess the effects of potential regulatory SNPs (rSNPs) in their normal chromatin context. On page 469 of this issue, Julian Knight and colleagues7 describe a new approach for the functional association of DNA variation with expression by examining the binding of the transcriptional machinery of a gene and correlating it with DNA variation at the locus.

The target for this proof-of-concept experiment is phosphorylated RNA polymerase II (Pol II). Phosphorylation of the Pol II C-terminal domain releases the enzyme from the preinitiation complex and is a key step in initiating transcript synthesis at active promoters. The allelic differences in transcriptional activity of a specific gene caused by cis-acting regulatory motifs are theoretically measurable by comparing the amounts of Pol II bound to chromatin for each allelic copy of a gene. In the method of Knight et al.7, dubbed haploChIP, antibodies against Pol II are used to isolate chromatin crosslinked with Pol II. Gene-based polymorphisms can be identified using the co-immunoprecipitated DNA, and heterozygous SNPs can be assessed to see whether they maintain the expected 1:1 ratio for each allele. The theory not only works but also provides a highly sensitive means to accurately detect small deviations in allele ratios. In addition, the assay detects different alleles using dideoxy-terminated primer extension reactions followed by mass spectrometry—a strategy that could easily be adapted to create a multiplexed high-throughput assay.
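The 1:1 comparison described above can be illustrated with a minimal sketch. It assumes that each measurement yields a pair of allele-specific peak areas, and that the allele ratio in immunoprecipitated DNA is normalized to the ratio in input genomic DNA from the same heterozygous sample. That normalization step, the function names and all numbers are illustrative assumptions, not details taken from Knight et al.

```python
import math
from statistics import mean


def allele_ratio(peak_a, peak_b):
    """Ratio of allele A signal to allele B signal for one measurement."""
    if peak_a <= 0 or peak_b <= 0:
        raise ValueError("peak areas must be positive")
    return peak_a / peak_b


def normalized_log2_ratio(chip_pairs, input_pairs):
    """Mean log2 allele ratio in ChIP DNA minus that in input DNA.

    Each argument is a list of (peak_a, peak_b) replicate measurements.
    A value near 0 corresponds to the expected 1:1 loading; a positive
    value means relatively more signal on allele A.
    Normalizing to input DNA is an assumption of this sketch.
    """
    chip = mean(math.log2(allele_ratio(a, b)) for a, b in chip_pairs)
    ref = mean(math.log2(allele_ratio(a, b)) for a, b in input_pairs)
    return chip - ref


# Made-up illustrative numbers, not data from the study.
chip_replicates = [(120.0, 60.0), (110.0, 55.0), (130.0, 65.0)]
input_replicates = [(100.0, 100.0), (98.0, 98.0), (102.0, 102.0)]

skew = normalized_log2_ratio(chip_replicates, input_replicates)
print(f"normalized log2(A/B) = {skew:.2f}")
```

Working on a log scale treats a twofold excess of either allele symmetrically, which makes small deviations from 1:1 easier to compare across SNPs.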

A controversial rSNP

The application of the method to the investigation of a putative regulatory SNP at the gene encoding tumor necrosis factor (TNF) gave surprising results: the controversial TNF−308A promoter polymorphism is correlated not with transcription of TNF but with transcription of the gene encoding lymphotoxin-α (LTA), the upstream neighbor of TNF. The authors show that Pol II loading is similar for both the A and the G alleles of TNF−308, casting doubt on the functional effect of this polymorphism. In contrast, Pol II loading differs between specific LTA haplotypes that were also shown to differ in LTA transcriptional activity by allele-specific RT–PCR analysis; these haplotypes incorporate the TNF−308A allele. There is no doubt that this whodunit will intrigue aspiring Hercule Poirots and that a number of labs will want to reconsider TNF as the susceptibility gene for asthma, malaria and other diseases. LTA recently became prominent as a susceptibility gene for myocardial infarction in a Japanese population8 and was independently shown to be transcriptionally regulated by a SNP in intron 1.

The evidence added to the TNF/LTA story by haploChIP leads one to wonder how many other similar cases there are. The high level of correlation between closely spaced polymorphisms9 makes it probable that a number of existing genetic associations were obtained with indirect markers in linkage disequilibrium with the functional risk variant. For example, a large common haplotype of the cytokine gene cluster that confers risk of Crohn disease10 contains many genes and many highly correlated SNPs, but the causative variant is difficult to pinpoint because it seems to be genetically indistinguishable from other variants embedded in this haplotype block. The advent of whole-genome association studies using the haplotype map11 will probably yield a substantial number of new disease associations with indirect markers that are in linkage disequilibrium with rSNPs.
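The correlation between neighboring SNPs mentioned above is commonly summarized by the r² measure of linkage disequilibrium. The sketch below computes it from phased haplotype counts for two biallelic SNPs. The function name, the argument layout and the counts are hypothetical and serve only to show why a tightly linked marker can stand in for a functional variant.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at SNP 1 and B at SNP 2,
    n_Ab counts allele A with allele b, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B         # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Made-up counts: the two SNPs always travel together, so r^2 is 1 and
# an association study cannot tell them apart.
perfect = ld_r_squared(n_AB=60, n_Ab=0, n_aB=0, n_ab=40)

# Made-up counts: alleles combine at random, so r^2 is 0.
independent = ld_r_squared(n_AB=25, n_Ab=25, n_aB=25, n_ab=25)

print(round(perfect, 3), round(independent, 3))
```

When r² is close to 1, a marker and the functional variant are statistically interchangeable in an association study, which is why a functional assay is needed to separate them.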

Associations in search of a cause

Provided that Pol II (or another DNA-binding protein with similar properties) is equally useful for testing any human gene, haploChIP has the potential to allow each gene contained in a risk haplotype block to be tested for differential transcriptional activity. HaploChIP requires that suitable cells or tissues that express the gene are available and that the regulatory variant affecting the test gene is heterozygous. Another surrogate approach to in vivo transcriptional activity is allele-specific RT–PCR12,13. This is probably easier to carry out but has the extra requirement that the RT–PCR product contain polymorphisms, which is not possible for genes (such as TNF) that lack exonic SNPs. By comparing both alleles of a gene in the same sample, both methods decrease the effects of environmental and other confounding factors. They also have advantages over in vitro techniques, such as transient transfection assays with allele-specific promoter constructs, as such constructs are tested outside their normal chromosomal environment.

The haploChIP assay is useful as a surrogate marker of allelic variation in gene transcription, although it does not directly identify the cis-acting polymorphism or mechanism that is responsible for this variation. This is not surprising, as there are many SNPs on the haplotype associated with higher LTA transcriptional activity. The task is also difficult because of the extensive distances at which transcriptional control elements can be found14. Should human geneticists care about finding the true regulatory SNP when the risk gene is identified and there already are many highly correlated SNPs that tag the disease haplotype? Yes! The full knowledge of the risk variant is important, as it helps refine the knowledge of gene transcription and points to a target for therapy. Thus, the genetic community continues to require more tools to efficiently pinpoint risk variants that affect gene expression. There is no doubt that many more inventive approaches will be developed for this. Ultimately, the manipulation of regulatory mutations that affect expression levels should be easier than repairing or modulating the effects of an abnormal protein.