Introduction

Gibberellins (GAs) are hormones which play crucial roles in plant growth and development such as seed germination, stem and hypocotyl elongation, leaf expansion, and floral and fruit development (Hooley 1994). Some GA-regulated genes have been analysed in detail, showing the presence of GA-responsive cis-elements in their promoters. Trans-acting factors able to bind to the cis-elements have been identified. One such factor identified in barley (Hordeum vulgare L.) is GAMYB, a GA-dependent MYB transcription factor (Gubler et al. 1995). Other than in barley, GAMYB homologs have been isolated from rice (Oryza sativa), wheat (Triticum aestivum), Lolium temulentum and Arabidopsis thaliana (Gubler et al. 1997; Gocal et al. 1999; Chen et al. 2001; Stracke et al. 2001). The Poaceae appear to have a single-copy GAMYB gene, whereas Arabidopsis has a small family of GAMYB-like genes with three copies (Stracke et al. 2001).

The physiological functions of the GAMYB gene and its interactions have been studied. In rice and barley, the GAMYB protein induces the expression of genes encoding hydrolytic enzymes needed for germination, such as α-amylases, proteinases and cell-wall degrading proteins, through direct binding to a highly conserved 21 bp GA-responsive element (GARE, TAACAA/GA) (Gubler and Jacobsen 1992; Gubler et al. 1997, 1999; Cercós et al. 1999). GAMYB also activates gene expression during endosperm development (Diaz et al. 2002). Moreover, the involvement of GAMYB as a trans-activator of GA signalling may not be restricted to the cereal aleurone. GAMYB genes are also involved in flower development (Gocal et al. 1999; Murray et al. 2003), seed maturation (Diaz et al. 2002), and stem elongation (Gocal et al. 2001; Chen et al. 2001). GAMYB thus plays multiple roles in the GA signalling cascade throughout plant growth. In rice, Tsuji et al. (2006) showed that GAMYB function differs in aleurone cells and flower organs. These diverse functions appear not to depend on GAMYB alone and they might result from differences in the organ-regulation of GAMYB expression (Tsuji et al. 2006) and interactions with other proteins (Diaz et al. 2002; 2005; Gubler et al. 2002; Isabel-LaMoneda et al. 2003; Washio 2003; Rubio-Somoza et al. 2006).

In animals, the DNA-binding domain characteristic of MYB proteins consists of three repeats of about 50 residues (R1, R2 and R3). Plant MYB homologs are structurally and functionally more variable. Their MYB domain usually contains two imperfect repeats (R2 and R3) characterized by three regularly spaced tryptophan residues which play a role in the folding of the hydrophobic core of the domain and are thus generally conserved (Kanei-Ishii et al. 1990; Martin and Paz-Ares 1997). Members of the MYB family of transcription factors have a highly conserved N-terminus corresponding to the MYB domain, but are very variable in the C-terminal region. In barley GAMYB, the typical R2 and R3 repeats are located in the N terminal region and are followed by two transcriptional activation domains (see Woodger et al. 2003).

To describe and compare sequence diversity of GAMYB within and between wheat and barley, we sequenced GAMYB genes from two collections of barley and wheat that represent a cross section of the genetic diversity of these species. Our results show that HvGAMYB and TaGAMYB share a high similarity at the nucleotide level. Nevertheless, there are marked differences in nucleotide and haplotype diversity between the three homoeologs in wheat and highly divergent patterns of linkage disequilibrium in wheat and barley.

Materials and methods

Plant material

In total 155 accessions were selected from the Barley Core Collection (BCC) and the Gaterslebener Genebank (HOR) (Table S1a) originating from Europe (N = 77), East Asia (N = 28), America (N = 24), and West Asia and North Africa (N = 26). This germplasm set includes two-rowed and six-rowed barleys and is part of a larger germplasm that is described in more detail in Haseneyer et al. (in preparation). Nine accessions were classified as “breeding/research material”, 33 accessions as “traditional cultivars/landrace” and 113 accessions represented “advanced/improved cultivars”.

A wheat collection of 42 lines from different origins (Table S1b) was chosen from the worldwide bread wheat core collection of 372 accessions (Balfourier et al. 2007) to represent the variability of this core collection as described by Ravel et al. (2006). The wheat lines used here were sampled by a strategy reported to maximise the diversity (Gouesnard et al. 2001). For each wheat line, seeds used in this work were provided by INRA-Clermont-Ferrand Genetic Resource Centre for Cereal Crops and came from a single, self-pollinated head. All plantlets for a given accession are thus considered as genetically identical.

Eight seeds from each accession were grown in the greenhouse and leaves from 2-week-old seedlings were harvested and pooled for genomic DNA extraction using the methods described in Stein et al. (2001) for barley and Tixier et al. (1998) for wheat. The different sample sizes chosen for barley and wheat were based on the different ploidy levels. Thus, the wheat sample is smaller than the barley sample but the number of sequences analysed is approximately the same.

Physical assignment of the GAMYB fragments obtained to chromosomes was done using a set of Chinese Spring aneuploid lines (Sears 1966; Endo and Gill 1996).

Gene amplification and sequencing

The oligonucleotides used for amplification and sequencing (Table S2) were designed using Primer 3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3.cgi) based on the reference sequences for wheat (AY615200) and barley (AY008692). As wheat is a hexaploid species, direct sequencing of genes from PCR products requires the design of locus-specific PCR primers to avoid co-amplification of the different copies (Ravel et al. 2006). PCR profiles are summarized in Table S2. For PCR from barley DNA a mix of 10× Qiagen® PCR Buffer, deionised H2O, 5 μM of each primer, 0.25 units of Taq DNA Polymerase (Qiagen) and 20 ng DNA was prepared. For wheat, PCR reactions were performed in a final volume of 25 μl containing 25 ng of genomic DNA, 250 μM of each dNTP, 0.4 μM of each primer, 1 unit of Taq polymerase (Qiagen) and 1× Taq polymerase buffer.

Sequencing of the barley and wheat amplicons was done according to the ABI PRISM® BigDye™ Terminator Cycle Sequencing protocol using AmpliTaq®. Both strands of each fragment were re-sequenced on the ABI3730xl DNA Analyzer system (Applied Biosystems). As the TaGAMYB-D1 fragment (Table S2) was approximately 2,000 bp long, an internal primer (5′-GAGCTGGATGATGAGCCTCT-3′) was used to obtain the complete sequence.

We used the Sequencher™ program Version 4.5 (Gene Codes Corporation) and the Staden package (Staden et al. 2000) for sequence alignment and editing in barley and wheat, respectively. Barley and wheat sequences were aligned by ClustalW (Thompson et al. 1994). The alignment was manually corrected using BioEdit software (Hall 1999). All positions given in the text correspond to the position in the haplotype sequence alignment relative to the start codon (Fig. S1).

Diversity analysis

Haplotype diversity (\( \hat h \)); nucleotide diversity π (Nei 1987), the mean number of pairwise differences per site; and θ (Watterson 1975), which is based on the number of segregating sites, were computed using the DNA-SP version 4.10 software package (Rozas et al. 2003). Because this software does not take alignment gaps into account, which can cause π and θ to be underestimated, insertion–deletion events (indels) were treated as single sites. To analyse the pattern of diversity we applied the sliding window method with a window size of 100 bp and a step size of 25 bp. The π and θ values were compared using Tajima’s D test (Tajima 1989) as implemented in DNA-SP.
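The two diversity estimators and the sliding-window scan can be illustrated with a minimal Python sketch. It is not the DNA-SP implementation: the function names and the toy alignment are hypothetical, and gap characters are simply compared like any other state, which is a simplification of the indel recoding described above.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Return indices of alignment columns that show more than one state."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def nucleotide_diversity(seqs):
    """Pi per site: mean number of pairwise differences / alignment length."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / len(seqs[0])

def watterson_theta(seqs):
    """Theta per site: S / a1 / L, with a1 = sum of 1/i for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return len(segregating_sites(seqs)) / a1 / len(seqs[0])

def sliding_windows(seqs, window=100, step=25):
    """Yield (start, pi, theta) for every full window along the alignment."""
    for start in range(0, len(seqs[0]) - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        yield start, nucleotide_diversity(chunk), watterson_theta(chunk)

# Toy alignment (hypothetical data, not GAMYB): 4 sequences, 8 columns.
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
pi = nucleotide_diversity(toy)
theta = watterson_theta(toy)
windows = list(sliding_windows(toy, window=4, step=2))
```

With real data the defaults (window of 100 bp, step of 25 bp) correspond to the settings given in the text; the toy call uses smaller values only because the example alignment is short.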

The π values obtained for each species were compared by resampling. For both species, 100 resampled files were generated using the bootstrap method implemented in seqboot (Phylip 3.6; Felsenstein 2005). Each file was used to calculate its π value. Such sampling allowed us to estimate the average and the standard deviation of the empirical distribution for π in each species. The averages obtained were compared using the SAS TTEST procedure (SAS Institute, Inc., USA).
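The resampling step can be sketched as follows, assuming that bootstrap replicates are built by drawing alignment columns with replacement, as seqboot does by default. The function names, the seed and the toy alignment are hypothetical, and the final comparison of means (done with the SAS TTEST procedure in this study) is not reproduced.

```python
import random
from itertools import combinations
from statistics import mean, stdev

def pi_per_site(seqs):
    """Mean number of pairwise differences divided by alignment length."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / len(seqs[0])

def bootstrap_pi(seqs, replicates=100, seed=0):
    """Resample columns with replacement and return (mean, SD) of pi."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    length = len(seqs[0])
    values = []
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = ["".join(s[c] for c in cols) for s in seqs]
        values.append(pi_per_site(resampled))
    return mean(values), stdev(values)

# Toy alignment (hypothetical data, not GAMYB).
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
boot_mean, boot_sd = bootstrap_pi(toy, replicates=100, seed=1)
```

Each species would be resampled separately, giving one mean and one standard deviation per species for the subsequent test.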

Linkage disequilibrium was estimated using squared allele–frequency correlations, R² (Hill and Robertson 1968), for pairs of polymorphic sites with minor allele frequencies over 5%. Fisher’s exact test was used to determine whether the associations between polymorphisms were significant. The analyses were performed by applying TASSEL version 9.3.1 (Buckler et al. 2006).
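As an illustration of the statistic itself, the sketch below computes R² for two biallelic sites and applies the 5% minor-allele-frequency filter. It assumes that each inbred line contributes one haplotype, so that haplotype frequencies can be counted directly; the function names and data are hypothetical, this is not TASSEL code, and Fisher’s exact test is omitted.

```python
def minor_allele_frequency(column):
    """Frequency of the rarer allele at a site (0.0 if monomorphic)."""
    counts = [column.count(a) for a in set(column)]
    if len(counts) < 2:
        return 0.0
    return min(counts) / len(column)

def r_squared(col_a, col_b):
    """Squared allele-frequency correlation between two biallelic sites."""
    n = len(col_a)
    a_ref, b_ref = col_a[0], col_b[0]      # arbitrary reference alleles
    p_a = sum(x == a_ref for x in col_a) / n
    p_b = sum(y == b_ref for y in col_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(col_a, col_b)) / n
    d = p_ab - p_a * p_b                   # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical alleles of eight lines at three sites (one character per line).
site_1 = "AAAATTTT"
site_2 = "CCCCGGGG"   # perfectly associated with site_1
site_3 = "CCGGCCGG"   # independent of site_1
rare = "AAAAAAAT"     # minor allele frequency 1/8

usable = [s for s in (site_1, site_2, site_3, rare)
          if minor_allele_frequency(s) > 0.05]
ld_complete = r_squared(site_1, site_2)
ld_absent = r_squared(site_1, site_3)
```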

Genetic mapping

HvGAMYB was genetically mapped in the Steptoe × Morex mapping population developed by Kleinhofs et al. (1993). Positions were determined on an updated Steptoe × Morex map (Stein et al. 2007; http://pgrc.ipk-gatersleben.de/transcript_map/momavis.php). We designed a cleaved amplified polymorphic sequence (CAPS) marker that requires the use of the restriction enzyme MspI based on the single nucleotide polymorphism (SNP) at position 2542 (Fig. S1). In wheat, genetic mapping was done using the International Triticeae Mapping Initiative (ITMI) segregating population. We genotyped the SNP at position 2037 (Fig. S1) using primer-extension chemistry to generate allele-specific products analysed by mass spectrometry as explained in Balfourier et al. (2006).
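The principle of a CAPS marker can be sketched in a few lines: a SNP that creates or destroys a restriction site changes the fragment pattern after digestion of the PCR product. MspI recognises CCGG and cuts after the first C; the two short amplicons below are hypothetical and are not GAMYB sequence.

```python
MSPI_SITE = "CCGG"   # MspI recognition sequence
CUT_OFFSET = 1       # MspI cuts between the first C and CGG

def digest(amplicon, site=MSPI_SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths after complete digestion of a linear amplicon."""
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical 30-bp amplicons differing at a single C/T SNP.
allele_cut = "ATGCATGCATCCGGTTAGCTAGCTAGGATC"    # carries the CCGG site
allele_uncut = "ATGCATGCATCTGGTTAGCTAGCTAGGATC"  # SNP abolishes the site

fragments_cut = digest(allele_cut)
fragments_uncut = digest(allele_uncut)
```

On a gel, the first allele would give two fragments and the second a single undigested band, which is what allows the parental alleles to be scored in the mapping population.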

Linkage was analysed using Mapmaker/exp 3.06 (Lander et al. 1987) in wheat and Joinmap 3.0 (Kyazma, The Netherlands) in barley. The Kosambi mapping function was applied to transform recombination frequencies into map distances given in centiMorgans (cM).
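For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d (in Morgans) as d = 0.25 ln((1 + 2r)/(1 − 2r)). The sketch below is a stand-alone illustration with hypothetical function names; it is not taken from Mapmaker or Joinmap.

```python
import math

def kosambi_cm(r):
    """Map distance in centiMorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_inverse(d_cm):
    """Recombination fraction for a map distance given in centiMorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)

distance = kosambi_cm(0.10)           # slightly more than 10 cM
recovered = kosambi_inverse(distance)  # back to r = 0.10
```

For small r the map distance is close to 100 × r cM; the correction grows as r approaches 0.5 because the function allows for interference between crossovers.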

Phylogenetic analysis

Phylogenetic analyses were conducted using MEGA version 2.1 (Kumar et al. 2001). A pair-wise Kimura (1980) distance matrix among the haplotype sequences of GAMYB was calculated and represented by a neighbour-joining tree. The reliability and goodness of fit of dendrograms obtained from sequences were tested through resampling (1,000 bootstrap samples).
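The Kimura (1980) two-parameter distance underlying the tree can be written as d = −0.5 ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitions and transversions between two sequences. The following sketch illustrates this formula only; it is not MEGA code, the function name and sequences are hypothetical, and columns with gaps or ambiguous bases are simply skipped (an assumption).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue                      # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1              # A<->G or C<->T
        else:
            transversions += 1            # purine <-> pyrimidine
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Hypothetical sequences: one transition and one transversion in 10 sites.
d_example = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
d_identical = k2p_distance("ACGTACGT", "ACGTACGT")
```

A full analysis would compute this distance for every pair of haplotypes and pass the resulting matrix to a neighbour-joining routine, repeating the procedure on bootstrap replicates.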

Results

Gene structure

GAMYB genes were amplified from 155 barley and 42 wheat lines from two germplasm collections that are representative of the diversity of these two species. The alignment of the different barley GAMYB sequences gave a total gene length of 3,332 bp including a gap of 25 bp, although sequences for exon1 and exon4 were incomplete (Fig. 1). For wheat, TaGAMYB-A, -B and -D correspond to A, B, and D homoeologous copies of GAMYB, respectively. The lengths of the consensus sequences obtained for TaGAMYB-A, -B and -D were 3,380, 3,395 and 3,368 bp, respectively. These three consensus sequences contained the entire coding sequence.

Fig. 1

Structure and fragments of the GAMYB gene that were sequenced in Hordeum vulgare and from each genome of Triticum aestivum. Numbers indicate nucleotide position relative to the start codon (ATG), which is 1; positions upstream of the initiation site are counted as negative, starting with –1. The asterisk shows the stop codon (alignment position 2722–2724 bp). The start and end points of the four sequences in the alignment are shown. The white boxes indicate the functional MYB domain

The structure of TaGAMYB and HvGAMYB was identical, with four exons and the coding sequence spanning from exon2 to exon4 (Fig. 1). However, the length of exons and introns differed slightly between HvGAMYB and TaGAMYB-B (Fig. S1). The alignment of sequences corresponding to the different haplotypes is provided as supplementary material (Fig. S1).

Nucleotide diversity

Within the 155 barley accessions studied, we detected 45 polymorphisms (Table S3a) including 13 indels, which corresponds to an average density of one polymorphic site per 74 bp. Two (position 1546, position 2017) of the seven single nucleotide polymorphisms (SNPs) in the coding sequence caused a change in the amino acid composition. Most of the remaining polymorphisms (N = 38) were in the noncoding sequence.

In the 42 wheat lines, we detected 0, 23, and 2 polymorphisms in GAMYB-A, -B and -D, respectively. All sequence information obtained for wheat is available at http://urgi.infobiogen.fr/Gnp. In the B genome, we found 19 SNPs and 4 indels (Table S3b) and observed on average one polymorphism per 178 bp. As for barley, most of the polymorphisms (17 out of the 23) were present in introns. Only the SNP (position 2468) in the fourth exon leads to an amino-acid change (glutamic acid to glutamine).

In barley, π and θ values were calculated using the sequence information of all 155 accessions and amounted to π = 2.23 × 10⁻³ and θ = 2.37 × 10⁻³. These π and θ values were not significantly different. When classified according to geographical origin, the nucleotide diversity ranged from π = 1.74 × 10⁻³ (East Asia) to π = 2.78 × 10⁻³ (America, Table S4).

In the B genome of wheat, π and θ values were 2.68 × 10⁻³ and 1.53 × 10⁻³, respectively. A significantly positive value was found for Tajima’s D test (P < 0.05). When classified by geographical region, the nucleotide diversity values of lines ranged from π = 2.51 × 10⁻³ (America) to π = 3.74 × 10⁻³ (West Asia, Table S4). Due to their lack of polymorphism, TaGAMYB-A and -D were not considered in this way. The mean values calculated for π by the bootstrap strategy were 2.36 (±0.49) × 10⁻³ and 2.68 (±0.57) × 10⁻³ for HvGAMYB and TaGAMYB-B, respectively. For θ, we obtained values of 2.51 (±0.37) × 10⁻³ and 1.52 (±0.30) × 10⁻³ for barley and the wheat B genome, respectively. The nucleotide diversity of TaGAMYB-B estimated by the π value was significantly higher than that of HvGAMYB (P < 0.0001). By contrast, the θ value of TaGAMYB-B was significantly lower than that of HvGAMYB (P < 0.0001).
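The sign of Tajima’s D follows directly from the comparison of the two estimators: D is positive when the mean number of pairwise differences exceeds the value expected from the number of segregating sites, and negative in the opposite case. A minimal sketch of the statistic (Tajima 1989) is given below; the function name and the input values are hypothetical and are not the values of this study.

```python
import math

def tajimas_d(n, s, k):
    """Tajima's D from sample size n, number of segregating sites s and
    mean number of pairwise differences k (totals, not per-site values)."""
    if n < 3 or s < 1:
        raise ValueError("need n >= 3 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its SD.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical inputs: 42 sequences, 20 segregating sites.
a1_42 = sum(1.0 / i for i in range(1, 42))
d_excess_common = tajimas_d(42, 20, 8.0)    # k above s / a1 gives D > 0
d_excess_rare = tajimas_d(42, 20, 2.0)      # k below s / a1 gives D < 0
d_neutral = tajimas_d(42, 20, 20 / a1_42)   # k equal to s / a1 gives D = 0
```

Significance is usually assessed against the distribution of D under the neutral model, which this sketch does not attempt.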

The distribution of GAMYB polymorphisms was similar in both collections (Fig. 2). In barley and wheat, most of the polymorphisms detected were in the first two introns, while the third intron was totally conserved. Exons 1 and 2 were conserved in wheat but showed one polymorphism in barley. The third exon contained no polymorphisms. In both species, we observed a high conservation of the functional domain, which contained no polymorphism in wheat and one synonymous SNP in barley.

Fig. 2

Nucleotide diversity values Pi (π, black line) and Theta (θ, grey line) in GAMYB using the sliding window approach (step size, 25 bp; window size, 100 bp) in H. vulgare (a) and T. aestivum B-genome (b). The structure of GAMYB is given between panels a and b; the open boxes show the position of the functional MYB domain

Haplotype analysis

Within the barley germplasm set, 18 haplotypes were classified (Table S3a; Fig. 3), resulting in 4 amino acid sequences. One dominant haplotype (haplotype 1) occurred at a frequency of 47.7%. Haplotypes 2–11 were less frequent but each represented more than 1% of lines. Haplotypes 12–18 were singletons. Haplotype 11 was only represented by American accessions. Almost 50% of the European accessions were haplotype 1. The diversity within the barley set was a result of diverse alleles being evenly distributed across all geographical origins. Despite the lower number of East Asian accessions (N = 45; 29.0%), these were classified in 8 of the 18 haplotypes and three haplotypes (10, 12, and 17) were specific to this region. The haplotype diversity based on geographical origins ranged from \( \hat h = 0.622 \) (East Asia) to \( \hat h = 0.837 \) (West Asia, Table S4). Overall, we observed a high haplotype diversity within the HvGAMYB gene with \( \hat h = 0.735 \pm 0.033 \).

Fig. 3

Haplotype frequencies of the GAMYB gene in barley (a) and wheat B-genome (b). Pie charts show the composition of each germplasm with numbers (N) of accessions per geographical origin. The origins are symbolised by different patterns as shown in the key (AM America, EA East Asia, EU Europe, OC Oceania, WA West Asia only, WANA West Asia and North Africa)

For TaGAMYB, one, seven and three haplotypes were identified for the A, B and D genomes, respectively. For the B genome, the seven haplotypes resulted in two different amino acid sequences. The haplotype diversity for this gene was high, \( \hat h = 0.746 \pm 0.003 \), which was almost identical to that observed in barley. The most frequent haplotype (haplotype 1, about 48%) included lines from all the origins studied (Fig. 3). Haplotypes 2 and 4 were each represented by one accession (Récital from France and A4 from Afghanistan). In this set of lines, the overall diversity was mainly due to the diversity present in European and Asian lines as reflected by the high haplotype diversity values for Europe (0.770 ± 0.043) and West Asia (1.000 ± 0.272, Table S4). For the D genome, we found three haplotypes (\( \hat h = 0.148 \pm 0.007 \)) that did not result in variation at the protein level.
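Haplotype diversity values such as those above can be reproduced from haplotype counts. The sketch below assumes the commonly used unbiased estimator of Nei (1987), ĥ = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n lines; the function name and the haplotype labels are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled lines are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))

# Hypothetical samples, one label per line.
all_same = haplotype_diversity(["H1"] * 5)
all_different = haplotype_diversity(["H1", "H2", "H3", "H4"])
mixed = haplotype_diversity(["H1", "H1", "H2", "H3"])
```

The estimator equals 0 when all lines share one haplotype and 1 when every sampled line carries a different haplotype, which is how a value of 1.000 can arise in a small regional subsample.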

Linkage disequilibrium (LD)

In barley, LD was generally significant along the sequenced part of the gene (on average R² = 0.3). Eighteen pairs of sites (among 120) revealed a high level of R² (>0.6) and 13 site pairs were in complete LD (Fig. 4). The remaining significant pairwise comparisons yielded moderate LD values. The plot of R² values as a function of the pairwise distance between polymorphic sites revealed a steep decay of LD within the GAMYB locus at a distance of about 2.5 kb (Fig. 5).

Fig. 4

Linkage disequilibrium matrix for SNPs within GAMYB in H. vulgare (top) and T. aestivum B-genome (bottom). SNPs with a minor allele frequency below 5% were not included. Polymorphic sites are plotted on the x-axis and y-axis. Pairwise comparison values of linkage disequilibrium are displayed as R² and the corresponding P values for Fisher’s exact test are given. The structure of the gene is shown between the two heat plots. Black lines indicate the positions of SNPs in the gene; open boxes indicate the functional MYB domain

Fig. 5

Linkage disequilibrium structure in HvGAMYB (filled circles) and TaGAMYB-B (crosses). The plot shows the pair-wise LD measurement R² related to physical distance (bp). The data consist of 23 polymorphic sites in the germplasm set of 155 barley accessions and 21 polymorphic sites in 42 wheat accessions. Only sites with a minor allele frequency > 0.05 are considered. Non-significant R²-values are indicated by open symbols

In the B genome of wheat (Fig. 4), we observed an absolute LD between all SNPs located in the first intron. Additionally, we detected a significant LD between pairs of polymorphic sites in the first intron and the third exon. A strong LD was found between the two polymorphic sites in the second intron and the site in the 3′UTR region, but no significant LD was found between these and other sites. No significant LD was detected between the site located in the coding part of the fourth exon and other sites. This result suggested that LD is gradually lost along TaGAMYB-B (Fig. 5). Unlike in barley, the pattern of LD in wheat allowed the identification of a set of SNPs, which can be used to distinguish between different haplotypes (except for haplotypes 2 and 4). This set should include at least four SNPs: the first SNP being chosen from those at positions −389, −317, −293, −261, −260, −200, −166, −156, −65, −52 or 1515; the second SNP being at positions 863 or 1000; the third at position 1749; and the last SNP at position 2079 or 2037.

Genetic mapping of the GAMYB genes

Using a CAPS marker designed for the polymorphic site at position 2542, HvGAMYB was mapped on the long arm of chromosome 3H in the Steptoe × Morex mapping population, where it co-segregated with the marker ABG453 at position 100.6 cM (Stein et al. 2007) (Fig. 6).

Fig. 6

Genetic map of GAMYB on chromosome 3H of H. vulgare (left and centre) and T. aestivum (B genome, right). Lines between maps connect syntenic marker positions

In wheat, the A, B and D homoeologs were assigned to the long arm of group 3 chromosomes (data not shown), in the bins 3AL3-0.42–0.78, 3BL7-0.63–1.00 and 3DL3-0.81–1.00. Our results suggested that these genes are likely to be located in the sub-telomeric region of the group 3 chromosomes. In the ITMI population, using the SNP at position 2037, TaGAMYB-B was mapped on 3BL between the markers bcd1418 and fbb378a (Fig. 6).

Comparison between HvGAMYB and homoeologs of TaGAMYB

The sequences of HvGAMYB and TaGAMYB homoeologs were compared phylogenetically. A neighbour-joining tree revealed two distinct haplotype groups (Fig. S2). In the first group, all sequences from HvGAMYB were clustered together while the second group contained the TaGAMYB sequences. Within this second group, there were three subgroups corresponding to the homoeologous groups of wheat. Pairwise comparisons between the coding sequences of each haplotype yielded similar results. Polymorphism between the individual TaGAMYB homoeologs was lower than between HvGAMYB and TaGAMYB alleles (Table 1). Despite a few differences in the coding sequence, the nucleotide identity between the consensus for HvGAMYB and TaGAMYB was high (about 95%, Table 1). In wheat, the A and D alleles were most similar.

Table 1 Range of sequence identity (%) from pairwise comparisons

Discussion

In the present study, a detailed comparison of GAMYB diversity in germplasm sets of wheat and barley is reported. The close relationship between wheat and barley is known from comparative genomics (Gale and Devos 1998; Feuillet and Keller 1999; Sandhu and Gill 2002), where differences in gene order and relative recombination in cereals were least between wheat and barley.

The germplasm analysed in this study includes accessions from different geographical origins and represents a wide range of diversity known for both species. Nevertheless, GAMYB genes showed a high level of conservation at the nucleotide level both within and between the two species. As expected, the level of conservation is higher in the exons than in the noncoding regions. We found that the nucleotide sequence of the functional MYB domain was almost completely conserved. Translation of the coding sequences revealed few amino acid differences within GAMYB from barley and the three wheat genomes. In agreement with Gubler et al. (1995), who reported that the N-terminus of GAMYB is highly conserved, most of the amino-acid polymorphisms in wheat and barley and between the species are at the C-terminus of the protein. However, the amino acid sequences of the two homologous tandem repeats R2 (42–94aa) and R3 (95–145aa), which constitute the functional MYB domain, are identical in both species, indicating conservation due to function. This corresponds to findings in Arabidopsis, where the functional domains of GAMYB-like proteins were shown to have been conserved during natural and artificial selection (Gocal et al. 2001).

Despite the similar distribution of GAMYB polymorphisms in both collections, none of the polymorphisms identified in barley and wheat was located at the same position. This indicates that all the polymorphisms may have occurred after the ancestors of H. vulgare and T. aestivum diverged (10–14 million years ago, Wolfe et al. 1989). On average, we observed 1 polymorphic site per 74 bp for barley and 1 SNP per 441 bp for wheat when all the wheat genomes are considered together. In general, SNP frequency in different sets of barley germplasm and across different loci varies from 1 SNP every 27–78 bp (Bundock and Henry 2004; Russell et al. 2004) to 1 SNP every 200–240 bp (Rostoks et al. 2005). Hence the level of sequence polymorphism found here fits well with previous reports based on a large number of lines including land races and wild barley (Bundock and Henry 2004; Russell et al. 2004). Results of the haplotype analysis confirm that the set of barley lines used here represents a wide range of genetic diversity of cultivated spring barley.

In wheat, the polymorphism density of GAMYB was lower than the value reported by Ravel et al. (2006), that is, 1 SNP per 212 bp for a set of 26 lines with different geographical origins, although the diversity of the B homoeolog is similar. In wheat, diversity within GAMYB was mainly caused by European and Asian accessions. European accessions were found in 5 of the 7 haplotypes. The two remaining haplotypes came from Asia. This agrees with reports on current worldwide wheat diversity showing the prevalent influence of lines of European and of Asian origin on diversity (Balfourier et al. 2007). The most frequent wheat haplotype comprised lines from all origins represented in the present collection. We observed that TaGAMYB-A was totally conserved and that TaGAMYB-B had more sequence variation than its D homoeolog. This agrees with previous reports indicating that the B genome is more polymorphic than the A and D genomes (e.g. Nelson et al. 1995; Ravel et al. 2006). However, despite the different levels of polymorphism observed between the three homoeologous genes of wheat, expression data from EST databases showed that the GAMYBs from all three genomes are expressed.

Tajima’s D test revealed no departure from the neutral equilibrium model for HvGAMYB. This indicates that DNA sequences of GAMYB in spring barley evolved randomly and were not under directional selection. In the wheat collection, a significantly positive Tajima’s D value was observed for TaGAMYB-B. This may reflect directional selection on TaGAMYB that was followed by a decrease of rare alleles and an excess of common alleles. This is supported by the reduced diversity and the higher frequency of minor alleles in wheat compared to HvGAMYB. In barley, only the total raw protein content is of importance because of the requirements for malting quality (low protein levels) and feed (high protein levels), whereas in wheat a balanced composition of storage proteins is very important with respect to baking quality. To what extent the protein composition in wheat is dependent on the orchestrated expression of the three TaGAMYB homoeologs and the selective forces acting on the B-genome alleles needs to be further studied. From the present data we hypothesize that GAMYB affects the protein composition rather than the total protein content in the grain.

Recombination and mutation influence LD, which decreases with distance. In selfing species such as wheat and barley, effective recombination is rare and thus more extensive LD is expected than in outcrossing species. In the present study, the level of LD detected within GAMYB was lower than expected both for wheat and barley (10 cM, Kraakman et al. 2004; up to 5.5 cM, Stracke et al. 2007; up to 5 cM, Breseghello and Sorrells 2006). Caldwell et al. (2006) found different patterns of LD for four gene loci of barley and observed that the extent of LD depends strongly on the population under study. Therefore, the level of LD within GAMYB may reflect the wide genetic basis of the germplasm as well as a high recombination frequency around the gene. Regarding the latter, the cosegregating marker ABG453 has been genetically and physically mapped to a chromosomal segment on barley chromosome 3HL showing medium recombination activity (2.1 Mbp/cM). However, a recombination hotspot (0.3 Mbp/cM) is present only about 1 cM distal to the gene (Kunzel et al. 2000). A similar situation is expected for wheat, where a comparable pattern of recombination has been observed (Akhunov et al. 2003).

The structural organization of GAMYB has been conserved in wheat and barley. Conservation of gene structure in cereals has been described previously for other genes (Bennetzen and Ma 2003), e.g. the Waxy genes, the genes coding for the Opaque2 sub-family of bZIP factors (Vicente-Carbajosa et al. 1998), and the genes encoding class II starch synthase (Li et al. 2003).

The rice homolog OsGAMYB (Gubler et al. 1997) was mapped on rice chromosome 1 (Kaneko et al. 2002), which across its entire length shows a collinear marker order with barley 3H (Stein et al. 2007). Similarly, wheat group 3 chromosomes share the highest homology with rice chromosome 1 (Devos and Gale 1993) despite some rearrangements (Munkvold et al. 2004).

TaGAMYB colocalized with quantitative trait loci (QTL) influencing pre-harvest sprouting and nitrogen tolerance (Groos et al. 2002; Laperche et al. 2007). This suggests that TaGAMYB-B may be a candidate gene affecting these traits. In barley, QTLs for grain protein content and heading date near the HvGAMYB candidate gene have been reported (Hayes et al. 1993). A role of GAMYB in pre-harvest sprouting would not be surprising, as GAMYB was previously shown to be expressed in barley during germination (Gubler et al. 1995). Likewise, a role of GAMYB in nitrogen metabolism and therefore grain protein content might be expected, as this transcription factor is also involved in storage protein synthesis (Diaz et al. 2002).

To conclude, the high level of conservation observed in GAMYB coding sequences in both wheat and barley may reflect the importance of this transcription factor in several developmental mechanisms. Transcription factors often act pleiotropically and are sometimes modulated through protein–protein interactions, as is the case for GAMYB (Diaz et al. 2002). Their molecular structures and hence their primary sequences may thus be subject to strong constraints, which can explain why regulatory genes may show less diversity than structural genes. Therefore, the polymorphisms identified in the present study will provide an opportunity to study the involvement of GAMYB in a variety of economically important phenotypic traits.