Introduction

During the last glacial period, European temperate forests were much more restricted than today and occupied mainly the southern European peninsulas, considered as main refugia (Bennett et al. 1991; Tzedakis et al. 2002). Actual spatial distribution of genetic diversity of forest tree species (but also other biota) is largely the result of the interplay between patterns of migration from refugia and the processes of adaptation to variable environmental conditions (Taberlet et al. 1998; Hewitt 1999).

Phylogeography is a powerful tool for inferring potential glacial refugia and post-glacial migration routes using spatial patterns of genetic polymorphism sampled from currently existing populations, as well as palaeodata (Hickerson et al. 2010). Many phylogeographic studies in angiosperms are based on chloroplast DNA polymorphisms for two main reasons. First, in the absence of DNA recombination during sexual reproduction and relatively low mutation rates, therefore, haplotypes remain mostly unchanged when passed to the next generation (King and Ferris 1998; Palmé 2002; Petit et al. 2002a, b). Second, chloroplast genome (cpDNA) is usually maternally inherited in angiosperms; therefore, cpDNA is dispersed through seeds and colonization patterns are not interfered by pollen flow (Rajora and Dancik 1992; Dumolin et al. 1995). However, deviations from strict maternal inheritance of cpDNA appear to be possible (McCauley et al. 2007), which may complicate the inference of phylogeographic structure. Knowledge of the historical distribution, postglacial migration, and expansion of a species is important to predict the potential fate of species in the future, especially under climate change conditions, as well as in conservation and management of its genetic resources (Zinck and Rajora 2016).

The European beech (Fagus sylvatica L.) is one of the most widespread species in deciduous forests in Europe (Leuschner et al. 2006; Packham et al. 2012). European beech contributes to both the high economic and ecological value through maintenance of European forest diversity and also provides diverse ecosystem services. For the first time, Demesure et al. (1996) studied phylogeography of beech using chloroplast DNA. They showed that a single haplotype predominates in a large part of the natural range of a species, but major parts of the range in the Apennines, Balkans, and Asia Minor were not included in that study. Nevertheless, it was proposed that the current distribution of beech in Central Europe derives from the post-glacial recolonization of populations originating mainly from the Balkan refugia (Demesure et al. 1996; Taberlet et al. 1998). In contrast, Magri et al. (2006), combining palaeo and genetic data, suggested that the main refugial areas, from where the recolonization to Central and Western Europe occurred, were located close to the Alps in Slovenia and south-western France and that the populations deriving from the Balkan Peninsula did not migrate to Northern Europe. Finally, both nuclear and chloroplast markers indicated that most of the beech populations of Central, Eastern, and Northern Europe have a very homogeneous genetic structure (Magri et al. 2006). Unfortunately, the available cpDNA markers did not detect differences among populations of F. sylvatica at a finer scale in Central Europe. On the other hand, considerable chloroplast diversity, geographically structured, was found in southern populations in Italy (Vettori et al. 2004) and in Greece (Hatziskakis et al. 2009; Papageorgiou et al. 2014). This raises the question if other cpDNA markers could be developed, which would contribute to a more detailed phylogeography of beech in Central Europe.

The increasing availability and affordability of next-generation DNA sequencing approaches (NGS) provides an impulse for researchers to switch from PCR-RFLP and microsatellite data analyses with relatively small numbers of loci to genome-wide sequencing methods that provide access to hundreds or thousands of loci for a number of individuals in a single run (Steele and Pires 2011). Now, sequencing of plastomes for hundreds of individuals is more easily achieved through whole-genome sequencing methods (Straub et al. 2012; Bock et al. 2014) or through targeted enrichment strategies (Mariac et al. 2014). With an increased number of chloroplast genomes sequenced, different studies showed many polymorphic sites when calling SNPs (Sabir et al. 2014). While the application of NGS methods for whole-genome population-scale studies may still be financially prohibitive, the chloroplast-based SNP markers identified in the representative sample of individuals using NGS methods may be further used to optimize SNP calling procedures targeting these loci in a future research.

In this study, we used two reduced representation libraries sequencing methods, i.e., restriction site-associated DNA sequencing (RAD-seq; Baird et al. 2008) and genotyping-by-sequencing (GBS; Elshire et al. 2011), to identify novel cpDNA polymorphisms in European beech, which would allow the detection of spatial genetic structure at a regional scale, particularly in Central Europe. For this purpose, we sampled 91 individuals from 47 European provenances, representing a wide geographical range of the species, with special attention to Central Europe. To detect novel chloroplast SNP markers, the DNA reads generated by the RAD-seq and GBS methods were subsequently aligned to the reference chloroplast genome of F. sylvatica. The identified SNP polymorphisms were tested for their nonrandom spatial distribution across the sampled area, discussing their usefulness in future phylogeographic studies.

Materials and methods

Plant material and DNA extraction

Samples of F. sylvatica were collected in April 2014 from one or two individuals representing 47 European provenances (91 individuals in total) growing in the common-garden experimental site located in Siemianice Experimental Forest District in south-central Poland (Fig. S1) (Barzdajn and Rzeznik 2002). Flushing leaf buds were sampled and dried at room temperature for 2 days. Desiccated tissue (30 mg) was cut into segments and placed in 2-ml Eppendorf tubes. Plates containing samples and 4-mm stainless steel grinding spheres were frozen and placed at − 20 °C overnight. Tissue samples were homogenized using grinding spheres in a mill (Mixer Mill MM 400, Retsch). DNA extraction was carried by using GeneMATRIXPlant & Fungi DNA Purification Kit (EURx, Poland), according to manufacturer’s protocol. DNA was quantified using the Eppendorf BioPhotometer (Eppendorf, Germany) and Quantus Fluorometer (Promega, Germany).

RAD-seq and GBS libraries preparation and sequencing

Restriction site-associated DNA sequencing (RAD-seq; Baird et al. 2008) and genotype-by-sequencing (GBS; Elshire et al. 2011) approaches were used to obtain single-nucleotide polymorphism (SNP) genetic markers. Extracted genomic DNA was normalized to a concentration of 30 ng/lL in 96-well plates and processed into the RAD-seq library using the restriction enzyme PstI, individually barcoded and sequenced by Floragenex Inc. (Eugene, OR, USA). The GBS library was prepared based on the ApeKI restriction enzyme, individually barcoded and sequenced by Cornell University, Biotechnology Resource Center (Ithaca, NY, USA). Libraries were sequenced in one lane run in 100-bp single-end mode on Illumina HiSeq2000 for both methods according to manufacturer’s instructions.

Quality control, demultiplexing, mapping, and SNP identification

Library quality and per-base quality were controlled with FastQC (Andrews 2010), and reads were checked for adaptor content, quality, quantity, and integrity of restriction enzyme cut sites. Raw sequence reads were demultiplexed based on the 10-bp barcodes in RAD-seq and 4–8 bp barcodes in GBS and were sorted individually using Stacks ver. 1.35 (Catchen et al. 2011). Only the reads with sufficiently high sequencing quality (Phred score ≥ 20) and with exact barcode match were retained for further analyses. The reads lengths were trimmed to 90 bp for the RAD-seq and to 64 bp for the GBS method to prevent false positives due to the decrease in base quality and adapter content towards the end of reads. Demultiplexed reads were subsequently aligned against the reference chloroplast genome of F. sylvatica (Vendramin et al., unpublished data) using BWA v.0.7.10 (Li 2013) with default parameters. Duplicate reads were removed using Picard (http://broadinstitute.github.io/picard/) and SAM files were converted to BAM files, sorted and indexed using SAMtools v.0.1.19 (Li et al. 2009). To identify SNP polymorphisms, we applied GATK v.3.3 (McKenna et al. 2010). Base quality score recalibration, indel realignment, SNP, and indel discovery and genotyping of all 91 samples were performed based on standard hard filtering parameters according to GATK Best Practices recommendations (DePristo et al. 2011; Van der Auwera et al. 2013). Only SNP polymorphisms with minimum depth of 10 reads, minor allele frequency (MAF) > 10%, and missing data per SNP < 25% were retained for further analyses.

Data analysis

To quantify the variation at chloroplast SNP loci, effective number of alleles and genetic diversity were calculated using FSTAT 2.9.3.2 software (Goudet 1995). The cpDNA was annotated using DOGMA (Dual Organellar GenoMe Annotator; Wyman et al. 2004). Haplotypes were determined as a combination of polymorphic SNP variants across the cpDNA loci. Haplotype Analysis ver. 1.05 software (Eliades and Eliades 2009) was used to estimate the following haplotype genetic diversity parameters: number of haplotypes (A), effective number of haplotypes (Ne), haplotypic richness (Rh; El Mousadik and Petit 1996), haplotypic diversity (H), and mean genetic distance between individuals (D2sh; Goldstein et al. 1995). Phylogenetic relationships among the established SNP haplotypes were depicted by a median-joining network analysis (Bandelt et al. 1999) within NETWORK ver 5.0.0.3 (available at http://www.fluxus-engineering.com/). Correlation between geographical and genetic distances between samples was tested based on Mantel tests (Mantel 1967) and Mantel’s correlograms, using the program PASSaGE ver.2 (Rosenberg and Anderson 2011). The significance of the Mantel test was evaluated based on 10,000 permutations.

Results

Genetic diversity

Mean number of sequence reads per individual passing QC filters was equal to 1.58 million for the RAD-seq method and 1.94 million for the GBS approach. Nearly 3.5% of the sequence reads were mapped to the cpDNA for RAD-seq and 1.32% for GBS. In order to improve the SNP discovery accuracy, we used only the uniquely mapped and non-redundant reads for downstream analyses. For the RAD-seq approach, 13 unique reads were mapped to chloroplast genome, with 6 SNPs detected. However, two pairs of SNPs appeared to be redundant and located within the same sequence reads; therefore, only a single SNP per read was selected for further analyses. After filtering, 94 sequence reads resulting from the GBS method were identified that mapped to cpDNA; however, among them, only 4 SNPs were identified. DNA sequences flanking the identified polymorphic sites are presented in Table S1. The mean coverage of the identified cpDNA SNPs (number of mapping sequence reads per individual) was 48 for the RAD-seq and 93.7 for the GBS data. We emphasize that sequence reads mapping to cpDNA still represents only a small fraction of the F. sylvatica chloroplast genome (size of 161,015 bp), as the sequence reads revealed 1.17 kb (0.73%) and 6.02 kb (3.74%)of the total cpDNA for RAD-seq and GBS, respectively. A total of 7 SNPs were identified in the regions of the four genes: psbA, psbD, rbcL, and atpA, and also 1 SNP was found in the intergenic region. The average genetic diversity across all 8 SNPs was 0.263 (Table 1).

Table 1 Genetic diversity of cpSNP loci in Fagus sylvatica. Ae effective number of alleles, D genetic diversity

Haplotypic diversity

The number of haplotypes identified was higher for the RAD-seq (7 haplotypes) than the GBS method (5 haplotypes). As a consequence, effective number of haplotypes, haplotype diversity, and mean genetic distance among individuals were all higher for the RAD-seq than that for the GBS approach (Table 2). However, the combination of alleles at all 8 polymorphic loci provided a set of 25 unique haplotypes. The haplotype diversity (H = 0.93) and haplotypic richness (Rh = 24) were high. The mean genetic distance between individuals was D2sh = 3.35 (Table 2). We constructed the median-joining networks among haplotypes defined based on RAD-seq, GBS, and a combination of both types of methods (Fig. S2). It appeared that the most frequent haplotypes were always located in the central parts of the network topologies indicating their ancestral character (Fig. S2).

Table 2 Measures of cpDNA haplotypic diversity for Fagus sylvatica based on SNP polymorphisms detected with the aid of restriction-site associated DNA sequencing (RAD-seq) and genotyping-by-sequencing (GBS) next-generation sequencing methods

Geographic distribution of chloroplast haplotypes

The geographic distribution of identified haplotypes suggests the presence of spatial structure of F. sylvatica in Central Europe (Fig. 1). Following the RAD-seq method, the most frequent haplotype H1 occurred widely across Europe, but was not present in Northern Europe (Fig. 1a). Haplotype H2 was present on the area from Denmark to south-eastern part of Poland; H3 appeared only in Central Europe; H4 and H5 occurred in several provenances originating from Poland and Germany, although H5 was also found in Slovakia. Haplotypes H6 and H7 were found mostly in Southern Europe.

Fig. 1
figure 1figure 1

Geographic distribution and frequency of chloroplast haplotypes in Fagus sylvatica based on RAD-seq (a), GBS (b), and combination of both types of markers (c)

In the case of the GBS method, the haplotype H1 appeared to be dominant and widespread over large parts of Europe; haplotype H2 was less frequent but it occurred on a large area from France, throughout Central Europe to Romania; haplotype H3 was particularly frequent near the border between Poland and Germany; finally, haplotypes H4 and H5 occurred mostly in Southeastern Europe, ranging from Slovakia to Romania (Fig. 1b).

Considering the most frequent haplotypes (n > 5) identified based on the combination of alleles obtained from both RRL methods (RAD-seq, GBS), their spatial distribution across Europe was also not random (Fig. 1c). Two of the 24 haplotypes (H1, H2) were found in more than 28% of the sampled individuals: H1 occurred in Northern and Southeastern Europe, while H2 was widespread over Central Europe from France throughout Germany to western Poland. Haplotypes H4 and H5 appeared in several provenances from Poland and Germany; H6 was found only in Poland and Slovakia; H7 occurred in Southern Europe and H8 was found only in Southwestern Europe. Rare haplotypes were mainly found in Southern Europe.

Spatial genetic structure

In general, correlations between genetic and geographic distances among sampled individuals appeared to be significant for the RAD-seq data (r = 0.083, p = 0.01), while non-significant for GBS (r = − 0.042) and combined datasets (r = 0.053), as revealed by Mantel test applied to all samples. However, when calculating Mantel r statistics for pairs of individuals within consecutive distance classes (often referred to as Mantel correlogram), significant positive correlations were found in the first three distance classes for RAD-seq and the combined dataset, and the first two distance classes for the GBS data (Table 3). Notably, the estimates of Mantel r statistics were higher for RAD-seq and the datasets generated based on both approaches jointly, than that for the GBS data alone, indicating stronger spatial genetic structure detected based on RAD-seq and combined datasets. In the latter case, the spatial autocorrelation appeared to be significant up to about 300 km.

Table 3 Summary of Mantel test statistics calculated within consecutive distance classes for SNP polymorphisms obtained based on RADseq, GBS, and both methods applied to sampled Fagus sylvatica individuals

Discussion

In recent years, cpDNA-based markers were efficiently used for phylogeographic studies in angiosperm forest tree species contributing to our understanding of their actual distribution and migratory routes since last glaciation (Demesure et al. 1996; Mousadik and Petit 1996; King and Ferris 1998; Dutech et al. 2000; Fineschi et al. 2000; Mohanty et al. 2001; Petit et al. 2002b; Heuertz et al. 2004; Bagnoli et al. 2015). However, inferences on phylogeography of Fagus sylvatica in Central Europe appeared to be difficult, mainly due to the lack of available polymorphic markers. In fact, in the previous study, Demesure et al. (1996) found only one PCR-RFLP chloroplast haplotype spanning across Central Europe, and similarly, based on cpDNA, Magri et al. (2006) identified only two PCR-RFLP haplotypes and just one microsatellite haplotype predominant in the same geographical areas.

In this paper, we found eight novel SNPs in the F. sylvatica chloroplast genome based on NGS sequencing of reduced representation genome libraries (RAD-seq and GBS methods), demonstrating that this approach can be an efficient way to identify novel SNPs in a chloroplast genome, once the reference cpDNA sequence is available. These novel markers provided considerable polymorphic information and indicated presence of spatial genetic structure across the central range of the species distribution, which was not possible with the already available markers (Demesure et al. 1996; Magri et al. 2006). Since Magri (2008) suggested that the recent postglacial expansion of F. sylvatica resulted from many small refugia, the availability of markers depicting geographic structure at a regional scale in Central Europe can be useful in defining routes and/or mechanisms of range expansion of this species. While the small sample size of this study prevents us from a detailed analyses and interpretation of phylogeographic structure, the identified haplotypes were not randomly distributed within the study area showing apparent spatial aggregation of haplotypes, on average up to about 250 km. Nevertheless, it might be expected that increasing the number of samples should provide a more detailed information about the routes and mechanisms of range expansion of F. sylvatica after the last glacial maximum towards central and northern parts of Europe.

For phylogeographic studies, SNPs may have some advantages over microsatellites or PCR-RFLP markers. First, SNPs are more densely and evenly distributed across the genome (Xing et al. 2005) and it has been shown that they display lower rates of genotyping errors (Montgomery et al. 2005). Moreover, SNPs are also less susceptible to homoplasy than microsatellites (Morin et al. 2009), which is important since homoplasy may affect the estimates of genetic distances (between individuals and populations), neighborhood size, and phylogenetic inference (Estoup et al. 2002). In particular, SNPs have an advantage over other types of genetic markers for the purposes of characterizing population divergence over long time scales.

We expect that the novel polymorphisms identified in this study will be useful for phylogeography studies in F. sylvatica. Although we provided DNA sequences flanking the identified SNPs (Table S1), these sequences are located within fairly conservative regions (7 SNPs located within genes) and can be used to design respective protocols for SNP genotyping. However, SNP genotyping by sequencing which was employed in this study is also possible. Other types of RRLs (such as ddRAD; Peterson et al. 2012) can be additional way for detecting further novel SNPs in the chloroplast genome. Ultimately, we anticipate that the resequencing of the full cpDNA genome of F. sylvatica at a larger number of individuals will provide more SNP data allowing for additional analyses of phylogeography of F. sylvatica in Europe based on chloroplast genome.