Exploring the utility of “next-generation” sequence data on inferring the phylogeny of the South American Valeriana (Valerianaceae)

doi:10.1016/j.ympev.2018.02.014

Molecular Phylogenetics and Evolution

Volume 123, June 2018, Pages 44-49

Highlights

• We assessed the utility of GBS for resolving a radiation of a South American clade of Valeriana.
• We recovered over 3000 unique loci, with 140 loci being shared by all taxa sampled.
• Different phylogenetic methods inferred similar topologies with varying support.
• The supermatrix approach recovered the most well-resolved and well-supported phylogeny.

Abstract

This study investigated the phylogenetic utility of genotyping-by-sequencing (GBS) data in the southern South American subclade of Valerianaceae (Dipsacales). The variety of forms that has arisen in this clade, presumably over the past 5–10 million years, bears all the signatures of a rapid, adaptive radiation. While the phylogeny of Valerianaceae has received a great deal of attention in the last decade, species relationships have been hard to resolve using traditional phylogenetic markers. Here, we collected high-throughput genomic sequence data from reduced-representation libraries prepared with GBS protocols. Putative orthologs were identified by within- and among-sample clustering in the software pyRAD. We recovered over 3000 loci for 14 species of southern South American Valeriana, with 140 loci present across all samples. We analyzed a set of phylogenetic trees generated from each locus using maximum likelihood methods, as well as multispecies coalescent (*BEAST) methods. For comparative purposes, we also used a supermatrix approach to infer the phylogeny for these taxa. Across different methods and data sets, we recovered consistent relationships among the southern South American valerians that we sampled, with varying degrees of support.

Introduction

The phylogeny of Valerianaceae (Dipsacales) has received a fair amount of attention over the past 10 years, with recent studies recovering strong support among the major lineages within the group (Bell and Donoghue, 2005, Bell et al., 2012, Bell et al., 2015). Molecular phylogenetic studies suggest that, following an introduction into South America, the group radiated and diversified, primarily in high Andean habitats. Previous studies also find support for two South American subclades: one consisting of species in the north (primarily found in páramo and puna high-elevation habitats) and another made up of species in the southern Andes (Bell et al., 2012, Bell et al., 2015). This southern Andean clade is the focus of our study and consists of 40 described species that are distributed across a wide elevational and ecological gradient (Kutschker and Morrone, 2011). They occur on both the east and west sides of the Andean Cordillera and at low and high elevations, encompassing many different habitat types. The group radiated recently and rapidly (Bell et al., 2012), and many of its species occur in one of the world’s biodiversity hotspots in central Chile (Myers et al., 2000). As such, the Valerianaceae represents a powerful model to study how biogeography, ecology and genetics drive diversification and its implications for conservation. To conduct such studies, a well-supported, well-resolved phylogeny is essential. However, recent molecular phylogenetic studies (Bell et al., 2012, Bell et al., 2015) based on traditional molecular markers have had little success in confidently resolving relationships within this subclade.

Over the past decade, sequencing technologies have made significant progress, most recently with high-throughput sequencing (Mardis, 2008, Kircher and Kelso, 2010, Godden et al., 2013). These “next-generation” sequencing (NGS) methods produce large amounts of genomic sequence data quickly and more cost-effectively than traditional Sanger sequencing. Phylogeneticists have begun to take advantage of reduced-representation genomic approaches, such as restriction-site associated DNA sequencing (RADseq; Baird et al. 2008) and genotyping-by-sequencing (GBS; Elshire et al., 2011), which produce datasets of many short sequences from all over the genome, at restriction enzyme cut-sites (McCormack and Faircloth, 2013, Eaton and Ree, 2013, Jones et al., 2013, Wagner et al., 2013, Escudero et al., 2014, Hipp et al., 2014, Eaton et al., 2017, Hipp et al., 2018, Hauser et al., 2017). These “reduced-representation genome sequencing” methods are particularly useful for phylogenetic studies because they produce many loci that may be phylogenetically informative and can be used for non-model organisms, or taxa lacking a reference genome. Reduced-representation methods have shown promise for phylogenetic studies, especially among lineages that are <60 million years old (Rubin et al., 2012, Emerson et al., 2010, Cariou et al., 2013, Eaton et al., 2017). There are some drawbacks to these methods, however, including short sequence reads (50–100 bp), no distinction between orthologs and paralogs, locus dropout due to sampling error, disruption of restriction sites due to mutation, and the intensive bioinformatics needed to analyze such data. These drawbacks can limit the utility of reduced-representation methods for deeper timescale studies. Despite these limitations, the methods have successfully produced robust phylogenies for several different genera of plants (e.g., Eaton and Ree, 2013, Hipp et al., 2014, Escudero et al., 2014, Cavender-Bares et al., 2015, Boucher et al., 2016, Eaton et al., 2017, Fernandez-Mazuecos et al., 2017, Hauser et al., 2017).
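To illustrate why reduced-representation loci cluster at restriction enzyme cut-sites, and why a mutation in a site causes a locus to drop out, the following is a minimal in silico sketch. It is not part of the published pipeline: the function names, the toy sequence, and the read length are assumptions made for illustration, and the recognition sequence is that of PstI (CTGCAG), the enzyme used in this study. Real GBS reads start at the cut position within the site and extend in both directions; the sketch simply takes the stretch of sequence beginning at each site.

```python
import re

PSTI_SITE = "CTGCAG"  # PstI recognition sequence (enzyme used in this study)


def cut_site_positions(genome, site=PSTI_SITE):
    """Return 0-based start positions of every occurrence of the recognition site."""
    # A lookahead pattern reports every start position, including overlapping ones.
    pattern = "(?=" + re.escape(site) + ")"
    return [m.start() for m in re.finditer(pattern, genome.upper())]


def flanking_tags(genome, site=PSTI_SITE, read_len=10):
    """Collect the read_len bases starting at each site (a simplified 'locus')."""
    tags = []
    for pos in cut_site_positions(genome, site):
        tag = genome[pos:pos + read_len].upper()
        if len(tag) == read_len:  # skip sites too close to the end of the sequence
            tags.append(tag)
    return tags


# Hypothetical toy sequence with two PstI sites (at positions 3 and 21).
toy = "AAA" + "CTGCAG" + "TTTTGGGGCCCC" + "CTGCAG" + "ACGTACGT"

# The same sequence with a single substitution inside the second site.
mutant = "AAA" + "CTGCAG" + "TTTTGGGGCCCC" + "CTGAAG" + "ACGTACGT"

print(cut_site_positions(toy))     # two sites
print(cut_site_positions(mutant))  # the mutated site is no longer recognized
print(flanking_tags(toy))
```

In the mutant sequence the second site is lost, so the locus anchored there would be missing for that sample, which is the "disruption of restriction sites due to mutation" drawback listed above.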

This ability to obtain large numbers of sequences from multiple individuals per species has led phylogeneticists to adopt multilocus, and especially multispecies coalescent-based, tree inference methods (e.g., BEST, Liu, 2008; STEM, Kubatko et al., 2009; *BEAST, Heled and Drummond, 2010; ASTRAL, Mirarab et al., 2014; SVDquartets, Chifman and Kubatko, 2014). Concatenating multiple genes can yield a well-supported but incorrect phylogeny (Kubatko and Degnan, 2007, Edwards et al., 2007). Multispecies coalescent-based approaches have had success in overcoming this problem by taking variation in gene histories into account (Delsuc et al., 2005, Rannala and Yang, 2008, Kumar et al., 2012). This becomes especially important for lineages that have diversified rapidly, as they are more likely to retain ancestral polymorphisms because there has been limited time to achieve reciprocal monophyly (Sanders et al., 2013, Eaton and Ree, 2013).

In this study, we examine the phylogenetic utility of GBS data for inferring the phylogeny of the southern Andean Valeriana L. clade. To gather phylogenetic data that spanned the clade, we sampled 14 of the 40 recognized species from this area. We then inferred a species tree for our sampled taxa using the hierarchical Bayesian model implemented in *BEAST (Heled and Drummond, 2010), which explicitly models the discordance between gene trees and the species tree that arises from incomplete lineage sorting. In addition, we analyzed a concatenated GBS data set with traditional maximum likelihood (ML) methods. Although we included only a subset of the species in this subclade, this work serves as a starting point for assessing whether these data and methods can confidently resolve relationships, and whether further efforts will be valuable for understanding the evolutionary history of Valerianaceae.
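As a rough illustration of what a concatenated (supermatrix) data set looks like, the sketch below joins per-locus alignments end to end and pads taxa that lack a locus with a missing-data symbol. It is a simplified sketch under stated assumptions, not the authors' workflow: the function names, taxon labels, and sequences are hypothetical, and "N" as the missing-data symbol is an illustrative choice. A second helper counts loci present in every taxon, the kind of tally behind the statement that 140 loci were shared by all samples.

```python
def concatenate_loci(loci, missing="N"):
    """Build a supermatrix from per-locus alignments.

    loci: list of dicts mapping taxon name -> aligned sequence
          (all sequences within one locus must have equal length).
    Taxa absent from a locus are padded with the missing-data symbol.
    """
    taxa = sorted({taxon for locus in loci for taxon in locus})
    parts = {taxon: [] for taxon in taxa}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must be aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(locus.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def shared_by_all(loci, taxa):
    """Count loci in which every listed taxon is present."""
    return sum(all(taxon in locus for taxon in taxa) for locus in loci)


# Hypothetical toy data: three short loci, three taxa, uneven locus recovery.
toy_loci = [
    {"sp_A": "ACGT", "sp_B": "ACGA", "sp_C": "ACTT"},
    {"sp_A": "GG", "sp_C": "GA"},
    {"sp_B": "TTT", "sp_C": "TAT"},
]

supermatrix = concatenate_loci(toy_loci)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print("loci shared by all taxa:", shared_by_all(toy_loci, ["sp_A", "sp_B", "sp_C"]))
```

The padded columns make explicit how much of a supermatrix can consist of missing data when few loci are recovered for every sample.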

Sampling & sequencing

For this study, we originally sampled 31 species of southern Andean valerians, with 48 total samples. We extracted genomic DNA from silica dried plant tissues using CTAB methods (Doyle and Doyle, 1987, Cullings, 1992). We prepared the GBS libraries using the protocol outlined in Elshire et al. (2011). We used the restriction enzyme PstI (CTGCAG) to digest the extracted genomic DNA from each individual, and then ligated the resulting fragments to a barcode adaptor and a common adaptor with the

Sequences

Illumina sequencing returned 283,325,239 total reads made up of 13,339 Mbases. We chose to leave out some of the samples due to low coverage, possibly due to low quality of the extracted DNA, and ended up with 14 species, for a total of 18 samples (Table 1). Raw reads are available through NCBI Sequence Read Archive (BioProject ID: PRJNA295150).
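For context, the two sequencing totals reported above can be combined into an average number of bases per read. The short calculation below is only a back-of-the-envelope check; it assumes that both figures describe the same set of reads and that 1 Mbase equals 1,000,000 bases.

```python
total_reads = 283_325_239     # total reads reported for the Illumina run
total_bases = 13_339 * 10**6  # 13,339 Mbases, assuming 1 Mbase = 1,000,000 bases

mean_read_length = total_bases / total_reads
print(f"mean bases per read: {mean_read_length:.1f}")
```

Under these assumptions the run averages roughly 47 bases per read.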

Clustering of consensus sequences with our previously mentioned parameters in pyRAD revealed 8323 unique clusters, or loci, across all samples with 140

Phylogenetic studies and biological implications

Due to the limited taxon sampling for these analyses, direct comparisons to previous results are not always possible. However, resulting phylogenies from this study support several previous hypotheses concerning the evolution of the southern South American valerians. Based on molecular sequence data, Bell et al. (2012) inferred an initial appearance of Valerianaceae in the southern Andes during the Miocene (13.7 million years ago), a time that corresponds to the development of open

Acknowledgements

We thank C. Moreau and B. Rubin (Pritzker DNA Laboratory, Field Museum) for assistance in GBS library preparation. We also thank D. Eaton, R. Ree, and A. Hipp for help and advice on running pyRAD, and L. Coghill for additional help with assembly programs. Many of the specimens used in this study were kindly provided by S. Liede-Schumann (University of Bayreuth). This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References (57)

C.D. Bell et al. Phylogeny and biogeography of Valerianaceae (Dipsacales) with special reference to the South American valerians. Org. Div. Evol. (2005)
C.D. Bell et al. Phylogeny and diversification of Valerianaceae (Dipsacales) in the Southern Andes. Mol. Phylogenet. Evol. (2012)
F.C. Boucher et al. Sequence capture using RAD probes clarifies phylogenetic relationships and species boundaries in Primula sect. Auricula. Mol. Phylogenet. Evol. (2016)
M. Escudero et al. Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: A case study in Carex (Cyperaceae). Mol. Phylogenet. Evol. (2014)
K.L. Sanders et al. Multilocus phylogeny and recent rapid radiation of the viviparous sea snakes (Elapidae: Hydrophiinae). Mol. Phylogenet. Evol. (2013)
C. Ané et al. Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. (2007)
J.J. Armesto et al. The Mediterranean environment of Central Chile
N.A. Baird et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One (2008)
C.D. Bell et al. Resolving relationships within Valerianaceae (Dipsacales): New insights and hypotheses from low-copy nuclear regions. Syst. Bot. (2015)
M. Cariou et al. Is RAD-seq suitable for phylogenetic inference? An in silico assessment and optimization. Ecol. Evol. (2013)
Catchen, J.M., Amores, A., Hohenlohe, P., Cresko, W., Postlethwait, J.H., 2011. Stacks: building and genotyping loci de...
J. Cavender-Bares et al. Phylogeny and biogeography of the American live oaks (Quercus subsection Virentes): A genomic and population genetics approach. Mol. Ecol. (2015)
J. Chifman et al. Quartet inference from SNP data under the coalescent model. Bioinformatics (2014)
K.W. Cullings. Design and testing of a plant-specific PCR primer for ecological and evolutionary studies. Mol. Ecol. (1992)
F. Delsuc et al. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. (2005)
J.J. Doyle et al. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. (1987)
A. Drummond et al. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. (2007)
D.A.R. Eaton. PyRAD: Assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics (2014)
D.A.R. Eaton et al. Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. (2013)
D.A.R. Eaton et al. Misconceptions of missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants. Syst. Biol. (2017)
R.C. Edgar. Search and clustering orders of magnitude faster than BLAST. Bioinformatics (2010)
S.V. Edwards et al. High-resolution species tree without concatenation. Proc. Natl. Acad. Sci. U.S.A. (2007)
R.J. Elshire et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One (2011)
K.J. Emerson et al. Resolving postglacial phylogeography using high-throughput sequencing. Proc. Natl. Acad. Sci. U.S.A. (2010)
P. Etter et al. Local de novo assembly of RAD paired-end contigs using short sequencing reads. PLoS One (2011)
M. Fernandez-Mazuecos et al. Resolving recent plant radiations: power and robustness of genotyping-by-sequencing. Syst. Biol. (2017)
G.T. Godden et al. Making next-generation sequencing work for you: approaches and practical considerations for marker development and phylogenetics. Plant Ecol. Div. (2013)
M.G. Harvey et al. Sequence capture versus restriction site associated DNA sequencing for shallow systematics. Syst. Biol. (2016)
