Introduction

Understanding the genetic basis of ecologically important traits – traits that increase an organism's ability to survive and reproduce in natural environments – has been and continues to be a central goal for ecological and evolutionary genetics (Feder and Mitchell-Olds, 2003). Identifying the genes for ecologically relevant traits will allow a host of important genetic and ecological questions to be answered: how many genes influence ecologically important traits, and what are their relative effect sizes (Orr and Coyne, 1992; Orr, 1998)? Do these genes show evidence of non-neutral evolution at the sequence level (Stahl et al., 1999; Tian et al., 2002; Mauricio et al., 2003)? What ecological and evolutionary forces lead to the maintenance of variation at these loci (Mitchell-Olds and Schmitt, 2006)? Do ecologically similar environments favor the same genes (Calboli et al., 2003; Colosimo et al., 2004, 2005; Protas et al., 2006), or is it possible to achieve a similar phenotype with different genetic mechanisms (Hoekstra and Nachman, 2003; Hoekstra et al., 2006)? Answering these questions is not trivial, and identifying the genes that influence ecologically important traits is a prerequisite for making progress on them. In addition, these questions must be answered in a number of organisms, including and extending beyond traditional model systems – representing diverse taxonomic groups, life histories and ecological roles – before a clear picture of the ecology and genetics of adaptation emerges. Here we review recent contributions of a relatively new approach, population genomics, and an old mainstay, quantitative genetics, to the challenge of finding genes that underlie ecologically important traits. We argue that combining these approaches provides a powerful and promising way to move from chromosomal regions to genes and even to mutations underlying adaptive phenotypic variation.

What is population genomics?

At its core, population genomics is simply population genetics writ large – that is, population genetic analyses of a large number of loci, distributed throughout the genome (Black et al., 2001; Luikart et al., 2003; Schlotterer, 2003). Population genomics can be narrowly defined as separating locus-specific effects (recombination, selection, mutation and so on) that affect one or a few loci at a time from genome-wide demographic effects (genetic bottlenecks, founder events, inbreeding and so on). By utilizing a large number of loci spread throughout the genome, the effects of selection on a beneficial mutation and neutral variation at flanking sites (genetic hitch-hiking; Maynard Smith and Haigh, 1974) can be compared to genome-wide demographic effects, which are not locus specific. As such, the population genomic approach can be described in four phases (Luikart et al., 2003): (1) sample many individuals, (2) genotype this large population sample for many independent loci, (3) identify statistical ‘outlier’ loci and (4) either estimate demographic parameters and statistics (e.g., FST, phylogeographic structure, evidence of past bottlenecks) in a large data set with outlier loci removed, or alternatively, study the outlier loci specifically in an attempt to infer potential selective mechanisms underlying them.

Population genomics relies on two key factors. First, it requires genotyping of a large number of loci, whether through amplified fragment length polymorphisms (AFLPs), microsatellites, single-nucleotide polymorphisms (SNPs) or sequences. The current explosion of molecular techniques and genomic tools available suggests that this is unlikely to be a rate-limiting step, even for non-model species. One key working assumption of population genomics approaches, particularly important for studies using anonymous markers, is that the loci are independent. Even with markers of known locations in the genome, as the number of markers increases, a degree of auto-correlation will be introduced, potentially resulting in misleading inferences (see Hahn, 2006). Second, the population genomics approach requires a reliable means to detect outlier loci that may indicate regions that have been under selection – either to remove these loci to study genome-wide effects, or to identify such loci as the focus of study (Figure 1a). Because local adaptation and directional selection should have locus-specific effects of reducing genetic variability within populations and increasing differentiation between populations, loci that are outliers for these characteristics are strong candidate regions for involvement in adaptation. Whether an individual locus behaves as an outlier can be statistically evaluated with a battery of approaches, among them: testing whether FST is significantly different from either zero or neutral expectations (Lewontin and Krakauer, 1973; Beaumont and Nichols, 1996; Vitalis et al., 2001; Beaumont and Balding, 2004); the lnRV and lnRH statistics (natural log of the ratio of the variance and heterozygosity of alleles between two populations; Schlotterer, 2002); and the Ewens–Watterson test (Ewens, 1972; Watterson, 1978; Vigouroux et al., 2002); see Storz (2005) for a recent review of such tests. Importantly, the statistical significance of these estimates can be determined from the genome-wide empirical distributions of the test statistics (Akey et al., 2002), or by comparing observed statistics to a distribution generated by neutral coalescent simulations (Beaumont and Nichols, 1996; Beaumont and Balding, 2004), or neutral, non-equilibrium simulations using parameters estimated from the data (Thornton and Andolfatto, 2006).
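As a rough illustration of the empirical-distribution route to outlier detection, the following Python sketch computes a per-locus FST for biallelic markers in two populations and flags the loci in the upper tail of the genome-wide distribution. It is a minimal sketch under stated assumptions, not the procedure of any study cited here: the function names, the equal weighting of the two populations, the simple (HT - HS)/HT estimator without sample-size correction and the 5% tail threshold are all illustrative choices.

```python
from typing import List, Sequence


def fst_biallelic(p1: float, p2: float) -> float:
    """F_ST for one biallelic locus from the allele frequencies p1 and p2
    in two equally weighted populations, computed as (H_T - H_S) / H_T.
    Assumption: no correction for sample size is applied."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    if h_t == 0.0:
        return 0.0  # monomorphic overall: treat the locus as uninformative
    return (h_t - h_s) / h_t


def empirical_outliers(fst_values: Sequence[float], tail: float = 0.05) -> List[int]:
    """Indices of the loci whose F_ST lies in the upper `tail` fraction of
    the genome-wide empirical distribution (at least one locus is returned)."""
    n = len(fst_values)
    n_keep = max(1, int(round(n * tail)))
    ranked = sorted(range(n), key=lambda i: fst_values[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Hypothetical allele frequencies: 19 undifferentiated loci and one
    # strongly differentiated locus (index 19).
    freqs = [(0.5, 0.5)] * 19 + [(0.9, 0.1)]
    fst = [fst_biallelic(a, b) for a, b in freqs]
    print(empirical_outliers(fst, tail=0.05))
```

Because this rule always flags a fixed fraction of loci, an empirical tail identifies candidates for follow-up rather than demonstrating selection; the simulation-based alternatives instead compare each locus with a neutral expectation.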

Figure 1

Conceptual model for the integration of population genomics and quantitative genetics. In (a), using population genomics approaches, outlier loci can be identified as those with FST values that either exceed confidence limits or intervals based on neutral coalescent simulations (dashed line, left panel) or fall in the tails of the empirical, genome-wide distributions (filled portions of distribution, right panel). Once these outlier loci (shown in red), which are some unknown distance from the causal mutations, have been identified statistically, the next step of identifying the causal gene and mutation(s) can be pursued using genetic mapping techniques common to quantitative genetics (b). These mapping approaches can entail genetic crosses, or identification of homologous regions/candidate loci in related model organisms, or both.

Applications: estimating genome-wide effects

One consequence of gathering anonymous genome-wide polymorphism data is the potential it offers for investigators to separate locus- and region-specific selective effects from genome-wide effects such as demography. Several recent studies provide new methods to distinguish the effects of demography and selection in shaping genome-wide levels of polymorphism. These studies also caution that for candidate genes or loci linked with so-called outlier loci (see below), the challenge of distinguishing between purely demographic factors and the combined effects of demography and selection will be difficult (Przeworski et al., 2005; Teshima et al., 2006).

Because many demographic factors can affect patterns of nucleotide polymorphism in a way similar to the effects of selection, methods that can differentiate the effects of these forces are necessary before inferences can be made about their relative importance. Indeed, several recent studies have detected genome-wide departures from predictions of equilibrium neutral models in standard tests of selection (see Ford, 2002 for a review of such tests), presumably because of the effects of population genetic structure and demography (Andolfatto and Przeworski, 2000; Nordborg et al., 2005; Schmid et al., 2005, 2006). Although application in more traditional ecological settings is limited, three recent papers have used alternative approaches to distinguish between demographic and selective forces in shaping human polymorphism levels (Nielsen et al., 2005; Stajich and Hahn, 2005; Williamson et al., 2005), and some generalizations appear to be emerging. First, purely demographic factors can generate much of the observed variation in the amount and frequency of polymorphism in human populations. Based on this result, it seems likely that demography can have a large effect on genetic variability in many species that have similar ecological, demographic and genetic histories. Second, against this backdrop of demographic factors, it is still possible to detect loci that appear to have been under natural selection, either because patterns of variation at individual loci show a poor fit to a purely demographic model (Stajich and Hahn, 2005), because models incorporating selection provide a better fit to the data than demographic models parameterized with putatively neutral non-coding SNPs (Williamson et al., 2005), or because individual regions of the genome show allele frequency distributions that differ from global, genome-wide allele frequency distributions (Nielsen et al., 2005). Importantly, an ongoing challenge will be to distinguish whether patterns of variation at these loci truly show evidence of natural selection, or could as easily be explained by slightly more complicated (yet still realistic) demographic models.

In situations in which an ancestor-descendant relationship exists between different species or samples within a species (e.g., colonization of an island or novel habitat, domestication), it is possible to gain additional information by utilizing data from the ancestral population (Ometto et al., 2005; Wright et al., 2005; Yamasaki et al., 2005). In the case of maize and its wild ancestor, teosinte, Wright et al. (2005) used a simulation approach to partition selective and demographic effects on polymorphism levels at 774 genes. By running coalescent simulations conditioned on the simulated data fitting multiple summaries of teosinte data, the authors were able to control for the shared history of demography, mutation, and recombination of the maize and teosinte lineages before domestication. Within this context, the severity of the bottleneck that accompanied domestication was estimated for each locus to arrive at a multilocus (genome-wide) estimate of the bottleneck severity. By comparing these models to other models that allowed a fraction of loci to show evidence of a more severe bottleneck that is indicative of artificial selection, Wright et al. (2005) estimated that approximately 2–4% of genes in the maize genome were targets of artificial selection. Importantly, these candidate loci were then aligned with published linkage and quantitative trait locus (QTL) maps, showing a significant clustering between candidate loci and QTL for morphological differences between teosinte and maize.

Applications: detecting outlier loci

Many applications of the population genomics approach have concentrated on attempts to detect outlier loci, either by screening a large number of anonymous loci or by comparing test statistics between candidate genes and a random sample of unlinked loci. There have been numerous applications of both approaches utilizing data from humans (Payseur et al., 2002; Akey et al., 2004; Hahn et al., 2004; Rockman et al., 2004, 2005; Storz et al., 2004; Voight et al., 2006), Drosophila (Harr et al., 2002; Glinka et al., 2003; Kauer et al., 2003; Orengo and Aguade, 2004; Schofl and Schlotterer, 2004; Pool et al., 2006), Mus musculus (Ihle et al., 2006) and Arabidopsis thaliana (Cork and Purugganan, 2005). However, because in most of these cases neither the ecological context in which selection occurred nor the potential selective agent are known (but see Cork and Purugganan, 2005), here we focus on other recent applications.

Two clear cases in which the ‘ecological’ context and agent of selection are known are artificial selection/domestication and pesticide use. These cases provide a test for population genomics methods, at least in cases in which selection is strong and recent. To date, the population genomics approach has been used successfully to confirm loci that might have undergone a selective sweep in maize during domestication (Vigouroux et al., 2002), genes for coat color and shortened limbs in dog breed formation (Pollinger et al., 2005), chloroquine resistance in the malaria-inducing parasite Plasmodium falciparum (Wootton et al., 2002) and warfarin resistance in rats (Kohn et al., 2003). However, it is important to note that in all of these cases, strong artificial rather than natural selection is driving phenotypic divergence. In a ‘proof of concept’ paper, Anderson et al. (2005) compared FST for 10 non-synonymous mutations in four loci known to be involved in antimalarial drug resistance to FST for 10 synonymous mutations in housekeeping genes or genes of unknown function. They found that not only was FST higher for non-synonymous mutations in drug resistance loci than for synonymous mutations at other loci, but that it was higher than neutral coalescent simulations that had been based on their putatively neutral loci, confirming that in this case loci subject to natural selection indeed exhibit higher FST relative to neutral loci.

In more traditional ecological settings, the population genomics approach has been applied in several cases in which species show clinal variation or ecotypic differentiation. Although not at a genomic scale, Storz and Dubach (2004) showed a clear example of detecting outlier loci: the albumin (Alb) locus in the deer mouse Peromyscus maniculatus showed significant altitudinal differentiation that exceeded neutral expectations based on 18 other allozyme markers, although the precise selective agent remains unclear. Studies that implicate an environmental gradient as the selective force producing differentiation are clearly strengthened by multiple tests (e.g., multiple altitudinal or latitudinal transects), and preferably using multiple statistical approaches (Campbell and Bernatchez, 2004; Storz et al., 2004; Vasemagi and Primmer, 2005). However, identifying truly independent tests may prove to be a challenge because before population divergence, individual loci will share mutational environment and coalescent histories, potentially introducing some degree of correlation between populations.

To date, four studies, using anonymous genome-wide markers, have used multiple comparisons to test for consistent or repeatable outlier loci, using a variety of species (Table 1). For example, the common frog (Rana temporaria) exhibits altitudinal clines in a host of life history traits in Europe. Bonin et al. (2006) showed that approximately 2% of the AFLP loci they screened also exhibited elevated altitudinal differentiation; to guard against false positives, the authors only considered true outlier loci to be those that showed elevated differentiation in multiple tests. Regions in linkage with these AFLP loci would be strong candidates to contain genes underlying life history traits in this species that have been subject to altitudinally varying selection. Results from these studies (Table 1) suggest that <5–10% of loci screened show significantly elevated FST between differentiated ecotypes or populations, although the small number of examples available means generalizations are tentative.
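The replication criterion described above, in which a locus counts as a true outlier only if it shows elevated differentiation in more than one comparison, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the locus and transect labels and the default threshold of two comparisons are hypothetical, and how outliers are called within each comparison is left to whichever test the investigator prefers.

```python
from collections import Counter
from typing import Dict, Iterable, List


def repeated_outliers(outliers_by_comparison: Dict[str, Iterable[str]],
                      min_comparisons: int = 2) -> List[str]:
    """Return the loci flagged as outliers in at least `min_comparisons`
    of the supplied population comparisons (e.g. altitudinal transects)."""
    counts: Counter = Counter()
    for loci in outliers_by_comparison.values():
        counts.update(set(loci))  # count each locus once per comparison
    return sorted(locus for locus, c in counts.items() if c >= min_comparisons)


if __name__ == "__main__":
    # Hypothetical outlier calls from three independent transects.
    calls = {
        "transect_1": ["AFLP_012", "AFLP_107", "AFLP_233"],
        "transect_2": ["AFLP_107", "AFLP_301"],
        "transect_3": ["AFLP_107", "AFLP_233", "AFLP_410"],
    }
    print(repeated_outliers(calls))                     # loci seen in two or more transects
    print(repeated_outliers(calls, min_comparisons=3))  # loci seen in all three
```

Raising `min_comparisons` trades fewer false positives for a greater chance of missing loci that contribute to adaptation in only some populations.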

Table 1 Examples of recent studies using the population genomics approach to identify ‘outlier loci’ involved in differentiation between habitats

Limitations of population genomics

Despite the appeal of these methods, especially for non-model organisms, they suffer from three glaring weaknesses from the standpoint of ecological and evolutionary functional genomics when applied in isolation. First, and perhaps most importantly, in cases where anonymous genetic markers are used to scan the genome, it is extremely likely that any anonymous locus showing ‘outlier’ behavior is not the causal locus itself, but is either physically linked or in linkage disequilibrium (LD) with the selected site(s). The extent of LD between the marker locus and the functionally relevant mutation can vary dramatically across the genome and also study systems, and will be affected by population history, mating system, recombination rate, the age of the selected allele, the strength of selection and many other factors (Nordborg and Tavare, 2002), making it difficult to localize the functionally relevant mutation. Similarly, the size and position of the genomic regions that show differentiation will be unknown, at least for species without detailed linkage maps (see below). On their own, most population genomic studies in natural populations have been limited to detecting a few statistical outlier loci, often in regions of unknown position in the genome. Therefore, the next and most important step of moving from anonymous marker to functional gene/mutation is unclear.

Second, population genomic studies are usually carried out in the absence of any information about phenotype. Thus, although genetic loci that show significant differentiation may be indicative of the effects of natural selection and local adaptation, in many cases it is unclear which traits may differ between samples, and if any correspond to the differentiated loci. The absence of knowledge about the phenotype under selection limits both ecological investigation about the putative selective agents as well as any knowledge or future use of candidate genes (see below).

The third potential weakness of the approach is with the logical inference that loci showing patterns of high differentiation (or reduced variation) have been subject to selection, whereas loci that do not show these patterns have not. Existing evidence suggests that it is possible and even probable for some loci to show high levels of differentiation (or reduced within population variation) without having been targets of selection either owing to chance alone, or for instance, due to incorrect models of demographic history used in estimating parameters like FST (e.g., island versus stepping stone models; see Akey et al., 2004) or ascertainment bias (Thornton and Jensen, in press). Similarly, it is also possible for loci to be under selection without yielding statistically significant results in tests for selection (Gallavotti et al., 2004; McVean et al., 2005; Przeworski et al., 2005; Teshima et al., 2006). Simulation studies by Teshima et al. (2006) suggest that a sizable proportion of loci under selection will be missed in empirical genome-wide scans, especially if the loci selected had previously been neutral. In addition, requiring loci to show outlier behavior in independent population comparisons or transects, while helpful in guarding against false positives, implicitly assumes that the same loci will be fixed in response to similar environmental conditions (Bonin et al., 2006). Existing evidence demonstrates that this may not be the case even when both phenotypes and selective environments are very similar (Hoekstra and Nachman, 2003; Hoekstra et al., 2006), suggesting that this criterion will lead investigators to miss some loci involved with adaptation.

New contributions from quantitative genetics

Unlike population genomics, quantitative genetics is not a novel approach, but is instead rooted in a long history (Galton, 1869, 1889). More recently, molecular tools have reinvigorated quantitative genetics through LD and QTL mapping. Like population genomics approaches, both LD and QTL mapping require the survey of a large number of genome-wide molecular markers (Figure 2). Specifically, LD mapping relies on surveys of genetic polymorphism data from a collection of samples (inbred lines, accessions, individuals and populations) to test for statistical associations between these genetic markers and particular phenotypes, again based on the premise that the marker(s) is in LD with the causal locus, or less likely, is in fact the causal mutation itself (Box 1; see Mackay, 2001; Clark, 2003; Mitchell-Olds and Schmitt, 2006). By contrast, in a QTL mapping approach, statistical analyses of genome-wide molecular markers and phenotypes measured in progeny of controlled crosses are used to identify chromosomal regions contributing to phenotypic differentiation (reviewed in Mackay, 2001; Erickson et al., 2004).

Figure 2

Schematic illustration of the relationships between population genomics, LD mapping and QTL mapping, emphasizing the different types of data required.

Box 1 Recent approaches for gene mapping in populations without a known cross or pedigree structure

LD mapping and related methods (Box 1) offer the prospect of identifying genes for ecologically important traits. By utilizing naturally occurring variation sampled in wild populations that have accumulated hundreds to thousands of recombination events over time (compared to a few generations in laboratory crosses), LD mapping is expected to (1) necessitate more markers than traditional QTL studies to provide complete coverage of the genome, but (2) have substantially higher resolution for fine-scale mapping of genomic regions. This approach offers great potential, especially if candidate genes are available for association tests. However, one of the major hurdles facing LD mapping is the need to control for cryptic population structure or stratification, which can lead to false positives (see Pritchard et al., 2000a, 2000b; Cardon and Palmer, 2003; Marchini et al., 2004; Yu et al., 2005). The LD mapping approach has successfully been applied in Drosophila and maize (Long et al., 1998; Thornsberry et al., 2001; Palsson and Gibson, 2004), and is starting to be applied in ecological settings. For example, Stinchcombe et al. (2004, 2005) showed that accessions of Arabidopsis thaliana with putatively functional FRIGIDA alleles exhibited significant latitudinal clines for flowering time and vernalization sensitivity, as would be predicted based on FRIGIDA's role in the vernalization flowering time pathway (Simpson and Dean, 2002). In like fashion, Aranzana et al. (2005) showed that genome-wide association tests could successfully identify known flowering time and pathogen resistance genes in Arabidopsis thaliana, despite appreciable population structure. At present, most success stories in non-model organisms are limited to associations between a phenotype (inherited in a simple Mendelian manner) and one or a few candidate genes. For example, allelic variation at the melanocortin-1 receptor (Mc1r) was perfectly associated with coat color phenotype (melanic versus wild-type dorsal pelage) within populations (Nachman et al., 2003) and with environmental variation (dark-colored lava versus light-colored granitic habitat) among populations (Hoekstra et al., 2004); similar statistical associations were not observed at neutral mtDNA markers.
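To make the population-structure problem concrete, the Python sketch below tests a marker-phenotype association with a stratified permutation test: phenotypes are shuffled only within subpopulations, so a marker that merely tracks subpopulation membership gains no spurious support. This is a deliberately simple stand-in and not the structured-association or mixed-model methods of the papers cited above. It assumes that subpopulation labels are already known, that genotypes are coded 0/1 with both classes present, and that the function names and default settings are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence


def mean_diff(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """Mean phenotype of genotype class 1 minus that of class 0.
    Assumes both classes are present."""
    ones = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    zeros = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def stratified_permutation_p(genotypes: Sequence[int],
                             phenotypes: Sequence[float],
                             clusters: Sequence[Hashable],
                             n_perm: int = 2000,
                             seed: int = 0) -> float:
    """Two-sided permutation p-value for a genotype-phenotype association,
    permuting phenotypes only within each cluster (subpopulation)."""
    rng = random.Random(seed)
    observed = abs(mean_diff(genotypes, phenotypes))
    groups: Dict[Hashable, List[int]] = {}
    for i, c in enumerate(clusters):
        groups.setdefault(c, []).append(i)
    hits = 0
    for _ in range(n_perm):
        permuted = list(phenotypes)
        for idx in groups.values():
            vals = [phenotypes[i] for i in idx]
            rng.shuffle(vals)  # shuffle within this subpopulation only
            for i, v in zip(idx, vals):
                permuted[i] = v
        # small tolerance guards against floating-point rounding in the sums
        if abs(mean_diff(genotypes, permuted)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: the marker is fixed for different alleles in two
    # subpopulations that also differ in mean phenotype.
    g = [1, 1, 1, 1, 0, 0, 0, 0]
    y = [10, 11, 12, 13, 1, 2, 3, 4]
    pops = ["A"] * 4 + ["B"] * 4
    print(stratified_permutation_p(g, y, pops))  # confounded marker: no support
```

In the confounded example every within-subpopulation shuffle leaves the difference between genotype classes unchanged, so the test correctly reports no evidence of association beyond population structure.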

Unlike LD mapping, QTL approaches require the breeding of a large number of progeny, but thereby skirt the complications associated with genetic structure in natural populations. The genetic architecture of one phenotype, bristle number in Drosophila, has perhaps been the most intensively studied in a QTL context (reviewed by Mackay, 1995, 1996), and after tireless work, genes underlying bristle variation have been identified (Lai et al., 1994; Long et al., 1995). Although the precise molecular mechanisms remain elusive and the ecological relevance of bristle number is unclear, the progress in identifying the genes underlying bristle number suggests that moving from QTL to gene can be daunting even in model systems. Moreover, available data from both Drosophila melanogaster and Arabidopsis thaliana suggest that considerable heterogeneity exists in the causal mutations for ecologically important traits, either because different loci affect traits in natural populations than in mapping crosses (McDonald and Long, 2004), or because genotype × environment interactions lead to different loci being identified in field versus laboratory settings (Weinig et al., 2002).

For these reasons, most QTL studies have been limited to describing the genetic architecture of traits, with little progress in reaching the level of genes and mutations (Flint et al., 2005), especially in non-model systems. Nonetheless, a small but growing number of exceptions exist (e.g., Johanson et al., 2000; El-Assal et al., 2001; Shapiro et al., 2004; Colosimo et al., 2005; Balasubramanian et al., 2006; Protas et al., 2006), suggesting that QTL mapping is a feasible method of identifying the genes for ecologically important traits. And, although time intensive, costly and challenging, QTL approaches arguably represent the most comprehensive way to identify genomic regions and ultimately genes contributing to adaptive variation, especially for multigenic traits (Price, 2006).

There are three major ways in which genetic mapping approaches can interface with population genomics approaches in natural populations. First, data from genetic mapping studies (such as QTL studies) can be applied to population genomics studies. By scoring genetic markers in controlled crosses or pedigrees, genetic linkage maps can be generated, allowing for the possibility of linking outlier loci detected using population genomics approaches to ‘real’ chromosomal positions in the genome – representing a first step in localizing the genes of interest (Figure 1b). Second, by providing a large number of anonymous markers for study, the data gathered for population genomics approaches can do ‘double duty’ and be used to test and control for population genetic structure in subsequent studies using an LD mapping approach. Finally, population genomics approaches can be used to fine-scale map within the large chromosomal regions identified by lab-based QTL studies.

Applications: linkage map development and QTL mapping

A prerequisite for QTL mapping is the development of a linkage map, which allows investigators to associate phenotypes with specific identifiable regions of the genome. Although the development of a linkage map and QTL mapping are clearly distinct issues, and the development of linkage maps is no longer necessary in many model systems with complete genome sequences, generating linkage maps can remain a challenge in many novel systems. Recently much effort has been spent generating linkage maps in non-model species, using a variety of experimental approaches and a diversity of molecular markers, with great potential for identifying genes underlying ecologically relevant variation (Table 2). Species that can be maintained in captivity, bred in the lab, and have relatively large brood sizes are often ideal for generating linkage maps using traditional crosses (e.g., butterflies (Heliconius, Bicyclus), sticklebacks (Gasterosteus), deermice (Peromyscus), monkeyflowers (Mimulus) and columbines (Aquilegia)). In other cases, linkage maps can be generated by following large pedigrees in natural populations (e.g., red deer (Cervus elaphus), Soay sheep (Ovis aries), great reed warblers (Acrocephalus arundinaceus)); such long-term studies are time intensive and are only applicable to species that can be easily followed over time.

Table 2 A sampling of non-model species for which robust linkage maps have been developed

It is clear that many systems of ecological interest are not easily manipulated in the laboratory (i.e., genetic crosses are not feasible or generation times are prohibitively long). In many cases, ecological systems can take advantage of either closely related genetic model systems with genetic linkage maps or even complete genome sequences (e.g., Dawson et al., 2006; Windsor et al., 2006). For example, a recent study generated a predicted linkage map for passerine birds by taking advantage of the sequence similarity of available microsatellites and the draft chicken genome sequence (Dawson et al., 2006), and then evaluated the accuracy of the predicted linkage map by comparing it to a previously published map for the great reed warbler (Acrocephalus arundinaceus). Despite the fact that chickens and warblers are diverged by millions of years, 24 microsatellite markers were conserved between the linkage maps, and synteny was maintained across genomes, highlighting the utility of the chicken genome for generating genomic resources for other avian species. Similar levels of conserved linkage have been reported between model organisms and non-model relatives, including Drosophila and the apple maggot fly (Rhagoletis; Roethele et al., 2001), Mus and deer mice (Peromyscus; Steiner et al., in review), and zebrafish and salamanders (Voss et al., 2001). The availability of linkage maps for non-model species can be extremely useful for two primary reasons: (1) evenly spaced markers representing even coverage of the genome can be chosen for use in population genomic scans of the genome, or (2) alternatively, once regions of interest are identified, homologous regions in a closely related species (either a model with a complete genome sequence or one more amenable to controlled crosses and breeding) can be used to either design additional markers for fine-scale mapping or to search for candidate loci.

The benefits of combining the population genomics approach with traditional linkage maps can be seen in two studies that focused on closely related plant species (maize and teosinte: Vigouroux et al., 2002; pedunculate and sessile oak: Scotti-Saintagne et al., 2004). Both Vigouroux et al. (2002) and Scotti-Saintagne et al. (2004) detected loci that behaved as outliers in comparisons of population samples between closely related species. Because linkage maps have been made from experimental crosses, it is possible to determine (1) the genomic position in which these outliers occur, and (2) in some cases, test if loci showing elevated differentiation are also the loci closest to QTL for traits that are differentiated between the species. In the maize example, two of the outlier loci were located near known QTL for ear structure and endosperm weight, two traits that differ dramatically between maize and teosinte and could have been past targets of artificial selection (Vigouroux et al., 2002). In fact, because even the largest QTL mapping populations are limited by the number of recombination events, population genomic approaches may be useful in this context to fine-scale map genes.
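Once outlier loci have positions on a linkage map, checking whether they fall near known QTL is a simple interval lookup, sketched below in Python. This is an illustration under stated assumptions rather than the analysis of either study: the `QTL` record, the function name, the map positions in centimorgans and the optional flanking window are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class QTL(NamedTuple):
    trait: str
    linkage_group: str
    start_cm: float  # lower bound of the QTL support interval (cM)
    end_cm: float    # upper bound of the QTL support interval (cM)


def outliers_near_qtl(outlier_positions: Dict[str, Tuple[str, float]],
                      qtls: List[QTL],
                      window_cm: float = 0.0) -> Dict[str, List[str]]:
    """Map each outlier locus to the traits whose QTL interval, widened by
    `window_cm` on each side, contains the locus on the same linkage group.
    Loci that match no QTL are omitted from the result."""
    hits: Dict[str, List[str]] = {}
    for locus, (group, pos) in outlier_positions.items():
        traits = [q.trait for q in qtls
                  if q.linkage_group == group
                  and q.start_cm - window_cm <= pos <= q.end_cm + window_cm]
        if traits:
            hits[locus] = traits
    return hits


if __name__ == "__main__":
    # Hypothetical map positions and QTL intervals.
    outliers = {"marker_A": ("LG1", 42.0), "marker_B": ("LG3", 10.0)}
    qtls = [QTL("ear structure", "LG1", 35.0, 50.0),
            QTL("endosperm weight", "LG3", 60.0, 75.0)]
    print(outliers_near_qtl(outliers, qtls))
```

Because QTL support intervals are often wide, co-localization of this kind is suggestive rather than conclusive; a formal test would compare the observed overlap with that expected for randomly placed markers.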

Utilizing knowledge of candidate genes

One appeal of both population genomics and quantitative genetic approaches is that anonymous markers can easily be generated in non-model species and then scored in a large number of individuals without any a priori knowledge of the genetic or developmental mechanisms responsible for ecological differentiation. However, the use of candidate genes, although not necessary, can certainly aid in moving from the identification of a genomic region (a QTL) to a single gene or even a nucleotide mutation (a QTN). The vast majority of successes in identifying genes responsible for adaptive phenotypic variation arguably have involved either candidate loci in the initial genomic scan or the identification of candidate loci within a genomic region of interest. For example, population genomic approaches need not be restricted to completely anonymous markers (e.g., AFLPs or microsatellites), and instead can include markers in candidate loci themselves or a subset of loci chosen to include possible candidate genes (e.g., markers based on expressed sequence tags developed in an appropriate tissue type or from microarray experiments). Similarly, association studies in natural populations can include candidate loci; for example, Olsen et al. (2004) used this approach to assess how allelic variation at the photoperiod receptor gene CRY2 contributes to variation in flower timing in 95 wild accessions of Arabidopsis.

Even in large genetic crosses, candidate genes have played a major role in the success stories of linking adaptive phenotypic variation to genes. For example, in three-spine sticklebacks (Gasterosteus aculeatus), a QTL approach identified a 10 Mb region of large effect contributing to adaptive variation in pelvic morphology between oceanic and lake populations (Shapiro et al., 2004). A candidate gene, Pitx1, was identified in this region based on its knockout phenotype in laboratory mice, which affects pelvic morphology. When interrogated in sticklebacks, Pitx1 expression differences were associated with a reduced pelvis in lake populations, although the precise molecular change is yet to be identified. Other phenotypes, such as pigmentation variation, have been well explored in vertebrate systems, in part because the wealth of genetic and developmental information on pigmentation provides an extensive list of well-characterized candidate loci (Hoekstra, 2006). First, mutations in the tyrosinase-related protein 1 (Tyrp-1) gene have been mapped in a pedigreed population of Soay sheep (Ovis aries), and are associated with a naturally segregating light/dark coat color polymorphism (Gratten et al., 2007). Second, genetic crosses and exploration of candidate genes in Mexican tetra (Astyanax mexicanus) led to the discovery that multiple independent deletions in the oculocutaneous albinism 2 (Oca2) gene were responsible for parallel loss of pigmentation in cave-dwelling tetra populations (Protas et al., 2006). Finally, a QTL study of adaptive color pattern in beach mice (Peromyscus polionotus) identified several regions of major effect (Steiner et al., in review), one of which contained the candidate gene Mc1r. A single amino-acid change in the Mc1r coding region is associated with between 10 and 36% of the variation in several adaptive pigment traits, and the functional effects of this amino-acid change were verified in pharmacological assays (Hoekstra et al., 2006).

Future directions: combining data from laboratory crosses and natural populations

Both population genomics and quantitative genetic approaches have limitations, especially in non-model systems, which often lack complete genome sequences. Although population genomic studies have been largely successful in generating large-scale genomic data for comparisons between populations, disentangling the effects of demography and sifting through false positives have been major challenges. Beyond the statistical challenges, the next step of moving from anonymous markers to known genetic regions and eventually to genes is perhaps even more daunting. QTL studies, in turn, have successfully identified chromosomal regions contributing to phenotypic variation in species that are amenable to genetic crossing experiments. However, narrowing these regions to genes, especially for traits with few candidate loci, requires enormous sample sizes and a large number of genetic markers to detect rare recombination events (Flint et al., 2005; Slate, 2005). Because each method has its own limitations, combining these approaches has the potential to be extremely powerful for identifying genes responsible for ecologically relevant variation.

Here we highlight one example of how combining multiple approaches can yield more insight than a single method applied in isolation. Rogers and Bernatchez (2005) combined population genomic scans for outlier loci with QTL mapping to examine the genetic basis of growth rate differences between dwarf (limnetic) and normal (benthic) ecotypes of whitefish (Coregonus clupeaformis). By constructing a linkage map and performing QTL mapping using AFLP loci that had previously been used in population genomics scans (Campbell and Bernatchez, 2004), they were able to determine whether the loci closest to growth rate QTL were the same as loci showing elevated differentiation in genome-wide scans of natural populations. They found that eight loci closest to QTL for growth rate showed FST values outside the empirically determined 95% confidence limits estimated from 440 AFLP loci, suggesting that differentiation at these loci was due to selection on nearby growth rate loci. Moreover, because benthic and limnetic fish were sampled from four lakes, the authors were able to show that one AFLP locus corresponding to a growth rate QTL exhibited significantly higher levels of genetic differentiation between ecotypes than expected under neutrality in three of the four lakes, suggesting genetic parallelism in how growth rate differences have evolved in lake whitefish. By combining QTL mapping, population genomics and surveys of multiple populations, this study illustrates the potential utility of combining approaches to (1) link markers identified in population genomics scans to phenotype and (2) test for parallel evolution using comparative genomic scans. However, it is important to note that additional work in both natural and lab-based populations will be needed to narrow these genomic regions to genes and mutations.
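The two steps in this kind of analysis, flagging loci whose FST exceeds an upper limit of the genome-wide distribution and then asking whether the same locus is flagged in several lakes, can be sketched as follows. This is a simplified stand-in and not the authors' method: it uses a plain nearest-rank percentile of the observed FST values as the cutoff, and the function names, the dictionary layout and the min_lakes parameter are assumptions made for illustration.

```python
from typing import Dict, List, Set


def upper_percentile(values: List[float], percent: int = 95) -> float:
    """Nearest-rank percentile of the observed values (integer arithmetic only)."""
    if not values:
        raise ValueError("need at least one FST value")
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # ceiling, 1-based rank
    return ordered[rank - 1]


def outlier_loci(fst_by_locus: Dict[str, float], percent: int = 95) -> Set[str]:
    """Loci whose FST lies strictly above the empirical upper percentile."""
    cutoff = upper_percentile(list(fst_by_locus.values()), percent)
    return {locus for locus, fst in fst_by_locus.items() if fst > cutoff}


def parallel_outliers(fst_by_lake: Dict[str, Dict[str, float]],
                      percent: int = 95, min_lakes: int = 2) -> Dict[str, int]:
    """Count, per locus, the lakes in which it is an outlier; keep repeated ones."""
    counts: Dict[str, int] = {}
    for fst_by_locus in fst_by_lake.values():
        for locus in outlier_loci(fst_by_locus, percent):
            counts[locus] = counts.get(locus, 0) + 1
    return {locus: n for locus, n in counts.items() if n >= min_lakes}
```

A purely empirical cutoff of this kind always flags some loci, even in a genome with no selection at all, which is one reason genome scans usually compare observed values against a neutral expectation rather than relying on the ranking alone.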

Conclusions

Population genomics provides an alluring first glimpse into the genome of previously unexplored organisms. In isolation, this approach can provide estimates of the proportion of the genome that is inconsistent with simple patterns of neutrality and hints of the possibility of parallel evolution, but it is thus far limited in its ability to point us directly to genes underlying adaptive phenotypic variation. The recent explosion of genome-wide linkage maps in novel systems highlights the ease with which large-scale genomic markers can be generated, and represents a clear way in which population genomic data can be linked to genomic or chromosomal position, bringing us one step closer to the adaptive alleles themselves. It is clear from recent studies that combining data from natural populations (e.g., population genomics approaches or LD mapping) with information from lab-based experiments (e.g., linkage maps and QTL) provides a powerful approach for identifying the genes responsible for adaptive phenotypes (e.g., Colosimo et al., 2005).

Importantly, the identification of genes underlying ecologically relevant traits does not represent a scientific end point, but rather the beginning of a new set of questions. Are adaptations to similar environments due to the same genes or mutations, either within or between species? Do adaptive alleles emerge from standing genetic variation or as new mutations? How does the strength of selection affect the genetic architecture of adaptive traits? How do demographic and stochastic factors affect the ability of organisms to adapt to changing environments? Although the tools for non-model systems will by definition lag behind those for model systems, the ecological and evolutionary questions that can be answered in a diversity of novel systems will often be unique. These questions and others can be more directly addressed once ecologically relevant genes are in hand for a diversity of systems, and the answers will together provide important insight into both the ecology and evolution of adaptation.