Elsevier

Gene

Volume 540, Issue 2, 1 May 2014, Pages 201-209

The complete chloroplast genome sequence of Taxus chinensis var. mairei (Taxaceae): loss of an inverted repeat region and comparative analysis with related species

https://doi.org/10.1016/j.gene.2014.02.037

Highlights

  • We sequenced the complete chloroplast genome of Taxus chinensis var. mairei.

  • One of the large IRs was lost in Taxus chinensis var. mairei.

  • Several genes (rpl32, clpP, etc.) differ significantly from those of related species.

  • A 110 kb inversion was found, within which the gene content is not rearranged.

Abstract

Taxus chinensis var. mairei (Taxaceae) is a Chinese variety of yew. This plant is one of the sources of paclitaxel, a promising antineoplastic chemotherapy drug of the last decade. We have sequenced the complete chloroplast (cp) genome of T. chinensis var. mairei. The genome is 129,513 bp in length, with 113 single-copy genes and two duplicated genes (trnI-CAU, trnQ-UUG). Among the 113 single-copy genes, 9 contain introns. Compared to other land plant cp genomes, the T. chinensis var. mairei cp genome has lost one of the large inverted repeats (IRs) found in angiosperms, ferns, liverworts, and gymnosperms such as Cycas revoluta and Ginkgo biloba L. Compared to related species, the gene order of T. chinensis var. mairei shows a large inversion of ~110 kb spanning 91 genes (from rps18 to accD), within which the gene content is not rearranged. Repeat analysis identified 48 direct and 2 inverted repeats of 30 bp or longer with a sequence identity greater than 90%. Repeated short segments were found in the genes rps18, rps19 and clpP. The analysis also revealed 22 simple sequence repeat (SSR) loci, almost all of which are composed of A or T.
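
As an illustration of the kind of SSR survey summarized above, the following Python sketch scans a sequence for mononucleotide runs. It is not the authors' pipeline: the function name `find_mono_ssrs`, the minimum run length of 10 bases, and the toy sequence are all assumptions made for this example.

```python
import re


def find_mono_ssrs(seq, min_len=10):
    """Return (start, base, length) for each mononucleotide run of at least min_len bases.

    min_len=10 is an assumed threshold for illustration, not a value taken from the paper.
    """
    seq = seq.upper()
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq)]


# Toy sequence (hypothetical): an A run of 12, a T run of 10, and an A run of 5.
toy = "GCGC" + "A" * 12 + "CGTACG" + "T" * 10 + "GG" + "A" * 5

for start, base, length in find_mono_ssrs(toy):
    print(start, base, length)
```

On the toy sequence, only the two runs of 10 or more bases are reported; the final run of five A's falls below the assumed threshold. Di- and trinucleotide motifs would need additional patterns.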

Introduction

Since the first reports of the complete chloroplast (cp) genome sequences of tobacco and liverwort (Shinozaki et al., 1986), a number of land plant chloroplast genome sequences have been determined. The recent determination of complete cp genome sequences from various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies, including transcriptome analyses and pangenomes based on these data (Medini et al., 2005). Although few complete cp genome sequences of gymnosperm species have been published, unique characteristics such as genome-scale rearrangements and more frequent gene loss and gain events have been found in them (Jansen et al., 2007). The probability of genomic rearrangements and gene loss events in a land plant cp genome over the course of evolution is thought to be closely related to the size of the inverted repeats (IRs) (Wu et al., 2007). Large IRs can help stabilize the cp genome and reduce the likelihood of gene loss and rearrangements (Xiao et al., 2008). In most angiosperms, such as date palm (Phoenix dactylifera L.), the relative sizes of the large single-copy (LSC) region, the small single-copy (SSC) region and the IRs remain constant, and the gene order and organization are almost the same as in inferred ancestral angiosperm cp genomes (Yang et al., 2010). However, some gymnosperm clades, such as Pinaceae and Cupressaceae, have lost one of the large inverted repeats, which has led to more gene loss and structural rearrangements in their cp genomes (Kolodner and Tewari, 1979).

Taxus chinensis var. mairei is a Chinese variety in the genus Taxus of the yew family (Taxaceae). Its secondary metabolite paclitaxel (taxol) is a chemotherapy drug used to treat ovarian, breast and non-small cell lung cancer. It is one of the most promising antineoplastic agents of the last decade, with demonstrated activity in advanced and refractory ovarian, breast, lung, and head and neck cancers (Rowinsky et al., 1993). Paclitaxel was first isolated from the bark of the Pacific yew tree in the 1970s, but leaves of Taxus have also been examined as a source of paclitaxel and related taxoids (Ketchum et al., 1999). As breast cancer rates have increased, the unique medicinal value of Taxus has gradually been recognized. Access to the plastid genome of T. chinensis var. mairei will provide useful information for further transcriptomic and proteomic analyses, and pave the way for studying the enzymes that catalyze the biosynthesis of natural compounds in the chloroplast.

Currently, the gene content and genomic structure of some gymnosperm species are still poorly known, as there are only 3 published complete cp genome sequences of Taxaceae in GenBank (http://www.ncbi.nlm.nih.gov). Here, we report the complete cp genome sequence of T. chinensis var. mairei, the first reported cp genome in the genus Taxus. In this report, we describe the genome assembly, annotation, and simple sequence repeats (SSRs). Dot-plot and comparative genomic analyses were also performed to better understand the unique structure of the cp genome of T. chinensis var. mairei.
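
To make the idea of a dot-plot analysis concrete, here is a minimal Python sketch that collects the points such a plot would draw. It is a simplified stand-in and not the software used in the study: the helper names `reverse_complement` and `kmer_matches`, the word size `k=8`, and the three toy segments are assumptions for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def kmer_matches(ref, query, k=8):
    """Return (forward, reverse) lists of (ref_pos, query_pos) for shared k-mers.

    Forward matches line up along a rising diagonal in a dot plot; matches to the
    reverse complement line up along a falling diagonal, the signature of an inversion.
    """
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)

    forward, reverse = [], []
    for j in range(len(query) - k + 1):
        word = query[j:j + k]
        forward += [(i, j) for i in index.get(word, [])]
        reverse += [(i, j) for i in index.get(reverse_complement(word), [])]
    return forward, reverse


# Hypothetical toy genomes: the query carries the middle segment inverted.
left, middle, right = "ACGTTGCAAGGCTAGC", "TTGACCGATACGGATCCAGT", "GGCATTCGAACTGACT"
ref = left + middle + right
query = left + reverse_complement(middle) + right

fwd, rev = kmer_matches(ref, query, k=8)
print("forward matches:", len(fwd))
print("reverse-complement matches:", len(rev))
```

In this toy case the two flanking segments produce forward matches, while the inverted middle segment appears only among the reverse-complement matches. A real chloroplast comparison would use a much larger word size or an alignment-based tool.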

DNA sequencing and genome assembly

Fresh leaves of T. chinensis var. mairei were collected for genomic DNA extraction. Purified DNA (5 μg) was used to construct cp DNA libraries. The Solexa high-throughput sequencing system (Illumina Genome Analyzer II) was used to generate raw sequence reads for this project.

Since the original sequence reads are a mixture of DNA from the nucleus and organelles, BLAT software (Kent, 2002) was used to isolate chloroplast-related reads from the raw reads based on known reference

Genome assembly and validation

Using the Illumina HiSeq 2000 system, 49,743,352 paired-end reads were generated to assemble the cp genome of T. chinensis var. mairei. After filtering out low-quality reads (≤ Q20 bases) and aligning the remainder with reference cp genomes, we collected 1,802,286 reads (3.62% of the total), reaching 95× coverage of the cp genome (Supplementary Table 1). The unassembled reads (~96.38%) were mostly from the nuclear genome, because the raw reads were a mixture of DNA from the nucleus and organelles. We have manually
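
The read fractions quoted in this paragraph can be checked with a few lines of Python. The variable names are chosen here for illustration; only the two read counts come from the text.

```python
# Read counts as reported in the text.
total_reads = 49_743_352
cp_reads = 1_802_286

# Fraction of reads retained as chloroplast-related, and the remainder.
cp_fraction = cp_reads / total_reads
other_fraction = 1 - cp_fraction

print(f"chloroplast-related reads: {cp_fraction:.2%}")  # 3.62%
print(f"remaining reads: {other_fraction:.2%}")         # 96.38%
```

Both values agree with the percentages given above.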

Gene content differences between T. chinensis var. mairei and other gymnosperms

There are marked differences in gene content between T. chinensis var. mairei and several other gymnosperm cp genomes. The gene rps16 is absent from the cp genome of T. chinensis var. mairei. It is also absent from the cp genome of P. thunbergii (Tsudzuki et al., 1992), an early-diverging member of the Pinaceae family.

Another gene psbG, positioned between ndhJ and ndhK in C. wilsoniana (Wu et al., 2011) and G. biloba (Lin et al., 2012), is completely absent from the cp genome of

Conclusions

The complete chloroplast genome sequence of T. chinensis var. mairei reveals that this Taxus species has a cp genome distinct from previously reported gymnosperm cp genomes: it has lost one of the large inverted repeats (IRs), which has led to more gene loss events and structural rearrangements in its cp genome. The loss of the large IR and the numerous genome rearrangements that have occurred in the cp genome of T. chinensis var. mairei provide new insights into the evolutionary lineage

Conflict of interest

There is no conflict of interest.

Acknowledgments

This work was supported by the National Science Foundation of China (Grant Nos. 81274033, 81202424) and the Research Project of Chinese Ministry of Education (Grant No. 113037A).

References (32)

  • R.E. Ketchum et al. Efficient extraction of paclitaxel and related taxoids from leaf tissue of Taxus using a potable solvent system. Journal of Liquid Chromatography & Related Technologies (1999)

  • R. Kolodner et al. Inverted repeats in chloroplast DNA from higher plants. Proceedings of the National Academy of Sciences of the United States of America (1979)

  • M. Kuang. Construction of DNA fingerprinting and analysis of genetic diversity with SSR markers for cotton major cultivars in China. Scientia Agricultura Sinica (2011)

  • S. Kurtz et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Research (2001)

  • M.A. Larkin. Clustal W and Clustal X version 2.0. Bioinformatics (2007)

  • C.P. Lin et al. The complete chloroplast genome of Ginkgo biloba reveals the mechanism of inverted repeat contraction. Genome Biology and Evolution (2012)

1 These authors contributed equally to this work.
