Exploring the utility of “next-generation” sequence data on inferring the phylogeny of the South American Valeriana (Valerianaceae)
Introduction
The phylogeny of Valerianaceae (Dipsacales) has received a fair amount of attention over the past 10 years, with recent studies recovering strong support for the major lineages within the group (Bell and Donoghue, 2005, Bell et al., 2012, Bell et al., 2015). Molecular phylogenetic studies suggest that, following an introduction into South America, the group radiated and diversified, primarily in high Andean habitats. Previous studies also find support for two South American subclades: one consisting of species in the north (primarily found in páramo and puna high-elevation habitats) and another made up of species in the southern Andes (Bell et al., 2012, Bell et al., 2015). This southern Andean clade is the focus of our study and consists of 40 described species that are distributed across a wide elevational and ecological gradient (Kutschker and Morrone, 2011). They occur on the east and west sides of the Andean Cordillera and at low and high elevations, encompassing many different habitat types. The group radiated recently and rapidly (Bell et al., 2012), and many of its species occur in one of the world’s biodiversity hotspots in central Chile (Myers et al., 2000). As such, the Valerianaceae represents a powerful model for studying how biogeography, ecology and genetics drive diversification, and the implications of that diversification for conservation. A well-supported, well-resolved phylogeny is essential for such studies. However, recent molecular phylogenetic studies (Bell et al., 2012, Bell et al., 2015) based on traditional molecular markers have had little success in resolving relationships within this subclade with any confidence.
Over the past decade, sequencing technologies have made significant progress, most recently with high-throughput sequencing (Mardis, 2008, Kircher and Kelso, 2010, Godden et al., 2013). These “next-generation” sequencing (NGS) methods produce large amounts of genomic sequence data quickly and more cost-effectively than traditional Sanger sequencing. Phylogeneticists have begun to take advantage of reduced-representation genomic approaches, such as restriction-site associated DNA sequencing (RADseq; Baird et al., 2008) and genotyping-by-sequencing (GBS; Elshire et al., 2011), which produce datasets of many short sequences from all over the genome, at restriction enzyme cut-sites (McCormack and Faircloth, 2013, Eaton and Ree, 2013, Jones et al., 2013, Wagner et al., 2013, Escudero et al., 2014, Hipp et al., 2014, Eaton et al., 2017, Hipp et al., 2018, Hauser et al., 2017). These “reduced-representation genome sequencing” methods are particularly useful for phylogenetic studies because they produce many loci that may be phylogenetically informative and because they can be used for non-model organisms, or taxa lacking a reference genome. Reduced-representation methods have shown promise for phylogenetic studies, especially among lineages that are <60 million years old (Rubin et al., 2012, Emerson et al., 2010, Cariou et al., 2013, Eaton et al., 2017). There are some drawbacks to these methods, however, including short sequence reads (50–100 bp), no distinction between orthologs and paralogs, locus dropout due to sampling error, disruption of restriction sites due to mutation, and the intensive bioinformatics needed to analyze such data. These drawbacks can limit the utility of reduced-representation methods for deeper timescale studies. Despite these limitations, such methods have successfully produced robust phylogenies for several different genera of plants (e.g., Eaton and Ree, 2013, Hipp et al., 2014, Escudero et al., 2014, Cavender-Bares et al., 2015, Boucher et al., 2016, Eaton et al., 2017, Fernandez-Mazuecos et al., 2017, Hauser et al., 2017).
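To make concrete how restriction sites define the sampled portion of the genome and how a mutation in a site causes locus dropout, the following is a minimal in silico sketch. It assumes the PstI recognition sequence CTGCAG named in the Methods and the enzyme's cut position after the A (CTGCA^G); the toy sequences and the function names are hypothetical illustrations and are not part of the study's pipeline.

```python
# Illustrative in silico digest. The toy sequences and function names are
# hypothetical; only the PstI recognition sequence comes from the Methods.
PSTI_SITE = "CTGCAG"   # recognition sequence (palindromic, so one strand suffices)
PSTI_CUT_OFFSET = 5    # PstI cuts after the A on the top strand: CTGCA^G


def find_sites(seq, site=PSTI_SITE):
    """Return 0-based start positions of every recognition site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site=PSTI_SITE, cut_offset=PSTI_CUT_OFFSET):
    """Split seq at each cut position and return the fragments in order."""
    seq = seq.upper()
    fragments = []
    previous_cut = 0
    for pos in find_sites(seq, site):
        cut = pos + cut_offset
        fragments.append(seq[previous_cut:cut])
        previous_cut = cut
    fragments.append(seq[previous_cut:])
    return fragments


# Toy example: two intact sites give three fragments.
toy = "AAACTGCAGTTTTCTGCAGGG"
intact_fragments = digest(toy)

# A single substitution in the second site (CTGCAG -> CTGCAT) removes that
# cut, so the locus flanking it would not be recovered from this individual.
mutated = "AAACTGCAGTTTTCTGCATGG"
mutated_fragments = digest(mutated)
```

Because reads are generated only next to intact sites, an individual carrying the mutated sequence contributes no data for the locus beside the lost site, which is one source of the missing data described above.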
This ability to obtain large numbers of sequences, from multiple individuals per species, has led phylogeneticists to start using multilocus, and especially multispecies coalescent-based, tree inference methods (e.g., BEST, Liu, 2008; STEM, Kubatko et al., 2009; *BEAST, Heled and Drummond, 2010; ASTRAL, Mirarab et al., 2014; SVDquartets, Chifman and Kubatko, 2014). Using a concatenated approach with multiple genes can result in a well-supported, but incorrect, phylogeny (Kubatko and Degnan, 2007, Edwards et al., 2007). Multispecies coalescent-based approaches, however, have had success in overcoming this problem by taking into account variation among gene histories (Delsuc et al., 2005, Rannala and Yang, 2008, Kumar et al., 2012). This becomes especially important for lineages that have diversified rapidly, as they are more likely to retain ancestral polymorphisms because there has been limited time to achieve reciprocal monophyly (Sanders et al., 2013, Eaton and Ree, 2013).
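The effect of rapid diversification on gene-tree discordance can be illustrated with a standard three-taxon result from coalescent theory, which is background and not an analysis from this study. For a rooted species tree ((A,B),C) whose internal branch has length T in coalescent units, a gene tree matches the species tree with probability 1 − (2/3)e^(−T), and each of the two discordant topologies has probability (1/3)e^(−T). The sketch below assumes one sampled lineage per species; the function names and the example branch lengths are hypothetical.

```python
import math


def concordance_probability(t_coalescent):
    """Probability that a gene tree matches a three-taxon species tree.

    t_coalescent is the internal branch length in coalescent units
    (an assumption of this sketch: one lineage sampled per species).
    """
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t_coalescent)


def discordant_probability(t_coalescent):
    """Probability of each one of the two discordant gene-tree topologies."""
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    return (1.0 / 3.0) * math.exp(-t_coalescent)


# Hypothetical branch lengths: shorter internal branches, as expected in a
# rapid radiation, leave more gene trees in conflict with the species tree.
for t in (0.1, 0.5, 1.0, 3.0):
    print(f"T = {t:>3}: P(concordant) = {concordance_probability(t):.3f}")
```

As T approaches zero the three topologies become equally likely, which is why many independent loci are needed to recover the species tree in a recent, rapid radiation.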
In this study, we examine the phylogenetic utility of GBS data for inferring the phylogeny of the southern Andean Valeriana L. clade. To gather phylogenetic data that spanned the clade, we sampled 14 of the 40 recognized species from this area. To infer a species tree for our sampled taxa, we used the hierarchical Bayesian model implemented in *BEAST (Heled and Drummond, 2010), because it explicitly models the discordance between gene trees and the species tree that arises from incomplete lineage sorting. In addition, we analyzed a concatenated GBS data set with traditional maximum likelihood (ML) methods. Although we included only a subset of the species in this subclade, this work serves as a starting point for assessing whether these data and methods can confidently resolve relationships, and whether further efforts will be valuable for understanding the evolutionary history of Valerianaceae.
Sampling & sequencing
For this study, we originally sampled 31 species of southern Andean valerians, with 48 total samples. We extracted genomic DNA from silica-dried plant tissues using CTAB methods (Doyle and Doyle, 1987, Cullings, 1992). We prepared the GBS libraries using the protocol outlined in Elshire et al. (2011). We used the restriction enzyme PstI (CTGCAG) to digest the extracted genomic DNA from each individual, and then ligated the resulting fragments to a barcode adaptor and a common adaptor with the
Sequences
Illumina sequencing returned 283,325,239 total reads made up of 13,339 Mbases. We chose to leave out some of the samples due to low coverage, possibly due to low quality of the extracted DNA, and ended up with 14 species, for a total of 18 samples (Table 1). Raw reads are available through NCBI Sequence Read Archive (BioProject ID: PRJNA295150).
Clustering of consensus sequences with our previously mentioned parameters in pyRAD revealed 8323 unique clusters, or loci, across all samples with 140
Phylogenetic studies and biological implications
Due to the limited taxon sampling for these analyses, direct comparisons to previous results are not always possible. However, resulting phylogenies from this study support several previous hypotheses concerning the evolution of the southern South American valerians. Based on molecular sequence data, Bell et al. (2012) inferred an initial appearance of Valerianaceae in the southern Andes during the Miocene (13.7 million years ago), a time that corresponds to the development of open
Acknowledgements
We thank C. Moreau and B. Rubin (Pritzker DNA Laboratory, Field Museum) for assistance in GBS library preparation. We also thank D. Eaton, R. Ree, and A. Hipp for help and advice on running pyRAD, and L. Coghill for additional help with assembly programs. Many of the specimens used in this study were kindly provided by S. Liede-Schumann (University of Bayreuth). This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References (57)
- et al. Phylogeny and biogeography of Valerianaceae (Dipsacales) with special reference to the South American valerians. Org. Div. Evol. (2005)
- et al. Phylogeny and diversification of Valerianaceae (Dipsacales) in the Southern Andes. Mol. Phylogenet. Evol. (2012)
- et al. Sequence capture using RAD probes clarifies phylogenetic relationships and species boundaries in Primula sect. Auricula. Mol. Phylogenet. Evol. (2016)
- et al. Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: A case study in Carex (Cyperaceae). Mol. Phylogenet. Evol. (2014)
- et al. Multilocus phylogeny and recent rapid radiation of the viviparous sea snakes (Elapidae: Hydrophiinae). Mol. Phylogenet. Evol. (2013)
- et al. Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. (2007)
- et al. The Mediterranean environment of Central Chile
- et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One (2008)
- et al. Resolving relationships within Valerianaceae (Dipsacales): New insights and hypotheses from low-copy nuclear regions. Syst. Bot. (2015)
- et al. Is RAD-seq suitable for phylogenetic inference? An in silico assessment and optimization. Ecol. Evol. (2013)
- Phylogeny and biogeography of the American live oaks (Quercus subsection Virentes): A genomic and population genetics approach. Mol. Ecol.
- Quartet inference from SNP data under the coalescent model. Bioinformatics
- Design and testing of a plant-specific PCR primer for ecological and evolutionary studies. Mol. Ecol.
- Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet.
- A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull.
- BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol.
- PyRAD: Assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics
- Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol.
- Misconceptions of missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants. Syst. Biol.
- Search and clustering orders of magnitude faster than BLAST. Bioinformatics
- High-resolution species tree without concatenation. Proc. Natl. Acad. Sci. U.S.A.
- A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One
- Resolving postglacial phylogeography using high-throughput sequencing. Proc. Natl. Acad. Sci. U.S.A.
- Local de novo assembly of RAD paired-end contigs using short sequencing reads. PLoS One
- Resolving recent plant radiations: power and robustness of genotyping-by-sequencing. Syst. Biol.
- Making next-generation sequencing work for you: approaches and practical considerations for marker development and phylogenetics. Plant Ecol. Div.
- Sequence capture versus restriction site associated DNA sequencing for shallow systematics. Syst. Biol.