Proceeding Paper

Transcriptome Characterization of Different Tissues of Stone Pine (Pinus pinea L.): De Novo Assembly †

1 Centro de Biotecnologia Agrícola e Agro-Alimentar do Alentejo (CEBAL), Instituto Politécnico de Beja (IPBeja), 7801-908 Beja, Portugal
2 Mediterranean Institute for Agriculture, Environment and Development (MED), Pólo da Mitra, Ap. 94, 7006-554 Évora, Portugal
3 The Centre for Ecology, Evolution and Environmental Changes (cE3c), Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
4 Laboratório Nacional de Referência de Saúde Animal, Instituto Nacional de Investigação Agrária e Veterinária, I.P. (INIAV, I.P.), 2780-157 Oeiras, Portugal
* Authors to whom correspondence should be addressed.
† Presented at the 2nd International Electronic Conference on Plant Sciences—10th Anniversary of Journal Plants, 1–15 December 2021. Available online: https://iecps2021.sciforum.net/.
These authors contributed equally to this work.
Biol. Life Sci. Forum 2022, 11(1), 77; https://doi.org/10.3390/IECPS2021-11937
Published: 30 November 2021

Abstract

Stone pine (Pinus pinea L.) is an emblematic tree distributed throughout the Mediterranean basin. The species is well known for the economic value of its timber, resins and edible seeds, the pine nuts marketed by the food industry. Despite its relevance, genomic information for the species is scarce, and no reference genome is available to date. The main purpose of this study was to characterize the stone pine transcriptome across seven different tissues by performing a de novo transcriptome assembly. A total of 55,328 genes were predicted and functionally annotated based on the SWISS-PROT and nr-NCBI databases and InterProScan signatures.

1. Introduction

Stone pine (Pinus pinea L.) is a Mediterranean species, distributed in coastal areas from the western Iberian Peninsula to Turkey [1]. It is valued for its pine nuts, or pine kernels, which are highly nutritious edible seeds and a good source of fat, proteins and vitamins, among other phytochemical characteristics [2,3,4]. In addition, the species is well known for the economic value of its timber and resins. Between 2010 and 2015, the Portuguese stone pine area increased by 20,700 ha, reaching 193,600 ha in 2015 [5], the second largest stone pine area in the world.
Advances in sequencing and assembly technologies have allowed rapid progress in characterizing the genomes of angiosperms. For gymnosperms such as conifers, progress has been slower because of the complexity and larger size of their genomes. For instance, the mean genome sizes in the Pinus genus and subgenus are 28.3 Gb and 26.4 Gb, respectively [6]. Recent advances in third-generation high-throughput sequencing technologies, together with the reduction in their cost, have allowed the sequencing of two genomes from the genus Pinus, P. lambertiana (GCA_001447015.2) and P. taeda [7].
Despite the scarce genomic information available for stone pine, the transcriptome of a species with no reference genome can still be characterized using RNA-Seq. Transcriptome differences between plant tissues have been well studied, and comparing tissues provides a comprehensive characterization of a species' transcriptome. Here, in order to explore the transcriptome differences between tissues of stone pine, needles, xylem, stem bark, terminal bud, first- and second-year pine cones, and pine nuts were characterized using a de novo transcriptome assembly. This study provides, for the first time, transcriptome resources for seven different tissues of stone pine, a valuable basis for further studies.

2. Materials and Methods

2.1. Sample Preparation, RNA Extraction, and Sequencing

Samples of different tissues (needles, xylem, stem bark, terminal bud, first- and second-year pine cones, and pine nuts) were collected from five trees located in Coruche (Portugal). Samples were immediately frozen in liquid nitrogen and stored at −80 °C until processing. RNA was extracted according to the method of Le Provost [8] with minor modifications. The extracted RNA was sequenced on two different Illumina platforms, NextSeq 550 and HiSeq 4000, producing paired-end (PE) reads of 75 bp and 100 bp in length, respectively.

2.2. Sequencing Data, Transcriptome Assembly, and Annotation

The raw reads were pre-processed with Trimmomatic v.0.38 [9], keeping reads with a minimum quality of 20 over a window of 10% of the read length and with a minimum length of 80% of the read length. The de novo transcriptome assembly was then performed using Mira v.4.0.2 [10], discarding contigs shorter than 200 bp.
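As an illustration of how the percentage-based thresholds above translate into concrete values for the two read lengths used here (75 bp and 100 bp), the following minimal sketch builds Trimmomatic step strings. It assumes the filtering was expressed with the SLIDINGWINDOW and MINLEN steps and that fractional values are rounded up; neither detail is stated in the text, and the function name is hypothetical.

```python
import math


def trimmomatic_steps(read_length, min_quality=20, window_pct=10, min_len_pct=80):
    """Return SLIDINGWINDOW and MINLEN step strings for one read length.

    Assumption: percentages are rounded up to whole bases; the paper only
    gives the percentages, not the rounding rule.
    """
    window = max(1, math.ceil(read_length * window_pct / 100))
    min_len = math.ceil(read_length * min_len_pct / 100)
    return [f"SLIDINGWINDOW:{window}:{min_quality}", f"MINLEN:{min_len}"]


if __name__ == "__main__":
    # 75 bp reads (NextSeq 550) and 100 bp reads (HiSeq 4000)
    for length in (75, 100):
        print(length, trimmomatic_steps(length))
```

Under these assumptions the two platforms end up with different window sizes and minimum lengths, which is the practical consequence of defining the thresholds relative to read length.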
Gene prediction and transcriptome annotation were performed using TransDecoder v.5.5 [11], following its guidelines. BlastP was used to functionally annotate the predicted genes by identifying homologous genes in the SWISS-PROT and nr-NCBI plant databases, and InterProScan was used to obtain protein domains, gene ontology (GO) terms, and KEGG pathways [12,13,14,15].

2.3. Tissue-Specific Characterization

In order to characterize the transcriptome at the tissue level, the pre-processed reads of all individuals were mapped against the assembled transcriptome with STAR v.2.7.3a [16], using the two-pass mode according to the manual guidelines. Uniquely mapped reads were retained and used to estimate the RNA abundance of the predicted genes with StringTie (parameter -e) [17]. The tissue-specific characterization considered only genes with an abundance ≥5 in at least one biological replicate of at least one tissue; these genes were regarded as expressed. The BiNGO plugin for Cytoscape was then used to identify GO terms overrepresented in the set of genes expressed in each tissue, applying a Benjamini–Hochberg (BH) multiple-testing correction and a p-value threshold of ≤ 0.05 [18].
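The expression filter described above can be sketched as follows. This is a minimal illustration on toy values, not the authors' script: the input layout (gene, then tissue, then a list of replicate abundances) and the function name are assumptions, and the abundance unit is left unspecified because the text does not state it. A gene is treated as expressed in a tissue when at least one replicate of that tissue reaches the threshold.

```python
def expressed_genes(abundance, threshold=5.0):
    """Map each tissue to the set of genes expressed in it.

    abundance: {gene: {tissue: [replicate abundance values]}}
    A gene counts as expressed in a tissue if any replicate is >= threshold.
    """
    per_tissue = {}
    for gene, tissues in abundance.items():
        for tissue, values in tissues.items():
            if any(value >= threshold for value in values):
                per_tissue.setdefault(tissue, set()).add(gene)
    return per_tissue


if __name__ == "__main__":
    # Toy abundances (hypothetical gene names and values)
    toy = {
        "gene_a": {"needles": [6.2, 0.0], "xylem": [1.0, 2.0]},
        "gene_b": {"needles": [0.5, 4.9], "xylem": [0.0, 0.0]},
        "gene_c": {"needles": [5.0, 7.1], "xylem": [9.3, 0.1]},
    }
    per_tissue = expressed_genes(toy)
    # The universe of expressed genes is the union over all tissues
    universe = set().union(*per_tissue.values())
    print(per_tissue)
    print(sorted(universe))
```

In this toy case gene_b never reaches the threshold in any replicate, so it is excluded from the universe of expressed genes.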

3. Results

3.1. Transcriptome Assembly and Annotation

A de novo transcriptome assembly was generated from the sampled tissues. Sequencing of cDNA from all samples on both platforms generated a total of 2,026,716,380 PE reads. After trimming low-quality bases and removing low-quality reads with Trimmomatic, 1,898,376,282 high-quality reads were kept, representing 93.7% of the raw reads (Table 1). The transcriptome assembly of stone pine resulted in 165,179 contigs of 200 bp or longer, with a cumulative assembly size of 81.31 Mb (Table 2). A total of 55,328 candidate genes were identified by TransDecoder, of which 41,839 had at least one homologous hit in the SWISS-PROT database. The remaining predicted genes were further searched against the nr-NCBI plant database, where 8322 genes had at least one homologous hit.
Functional categories in terms of GO terms and associated KEGG pathways were identified using InterProScan. A total of 28,258 (51.07%) predicted genes were assigned at least one GO term, covering 2079 different GO terms (BP, biological processes: 41.75%; MF, molecular function: 45.46%; CC, cellular components: 12.94%). Moreover, 4134 predicted genes were assigned to at least one of the 124 KEGG pathways identified, encoding 482 different enzymes.
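The headline figures in this section can be cross-checked with simple arithmetic. The short sketch below uses only numbers reported in the text and in Table 1 (per-platform read totals, gene counts); the variable names are illustrative.

```python
# Per-platform totals from Table 1 (HiSeq 4000 + NextSeq 550)
raw_reads = 913_839_150 + 1_112_877_230
clean_reads = 864_798_370 + 1_033_577_912
retained_pct = round(100 * clean_reads / raw_reads, 1)

# Annotation counts from the text
predicted_genes = 55_328
swissprot_hits = 41_839
nr_hits = 8_322
annotated_genes = swissprot_hits + nr_hits

# Share of predicted genes with at least one GO term
genes_with_go = 28_258
go_pct = round(100 * genes_with_go / predicted_genes, 2)

if __name__ == "__main__":
    print(raw_reads, clean_reads, retained_pct)
    print(annotated_genes, go_pct)
```

The sum of SWISS-PROT and nr-NCBI hits gives the number of functionally annotated genes quoted in the Conclusions.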

3.2. Tissue-Specific Characterization

After removing genes with low abundance, the universe of expressed genes considered for the transcriptome characterization comprised 54,627 genes, of which 30,137 were co-expressed in all tissues (Figure 1). In addition, 5738 genes were expressed exclusively in pine nuts, whereas the number of exclusively expressed genes in the other tissues was much lower (needles: 1087; stem bark: 212; xylem: 210; terminal bud: 143; first-year pine cone: 74; second-year pine cone: 21). In the pairwise comparisons of genes expressed per tissue, the pine nut was the tissue sharing the fewest expressed genes with the other tissues. For instance, 5862 genes were co-expressed in all tissues except pine nut. In contrast, the two pine cone tissues, first year and second year, showed the greatest similarity (the highest Jaccard index).
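The quantities compared above (genes co-expressed in all tissues, genes exclusive to one tissue, and the pairwise Jaccard index) are all set operations on the per-tissue lists of expressed genes. The following minimal sketch shows them on toy gene sets; the gene names and helper functions are hypothetical and only illustrate the definitions.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def exclusive_genes(per_tissue):
    """Genes expressed in one tissue and in no other tissue."""
    result = {}
    for tissue, genes in per_tissue.items():
        others = set().union(
            *(g for t, g in per_tissue.items() if t != tissue)
        )
        result[tissue] = genes - others
    return result


def core_genes(per_tissue):
    """Genes co-expressed in every tissue."""
    sets = list(per_tissue.values())
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    toy = {
        "needles": {"g1", "g2", "g3"},
        "xylem": {"g2", "g3", "g4"},
        "pine nut": {"g3", "g5", "g6"},
    }
    print(jaccard(toy["needles"], toy["xylem"]))
    print(exclusive_genes(toy))
    print(core_genes(toy))
```

A higher Jaccard index means that two tissues share a larger fraction of their combined expressed genes, which is how Figure 1 ranks tissue similarity.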
In order to identify overrepresented GO terms in each tissue, the proportion of genes carrying each GO term among the genes expressed in that tissue was compared with the proportion among all genes expressed across the transcriptome assembly. The analysis showed that 1170 GO terms were overrepresented across all tissues (BP: 683; MF: 284; CC: 203). Of these, 409 GO terms were overrepresented exclusively in a single tissue (Table 3).
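An overrepresentation analysis of this kind can be sketched as a one-sided test per GO term followed by a Benjamini–Hochberg adjustment. The example below is a simplified illustration, not a reproduction of the BiNGO run: it assumes a hypergeometric test (the text does not state which test was selected), and all counts are toy values.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): n genes drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    hits = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy counts: (genes with term in tissue, genes in tissue,
    #              genes with term in universe, genes in universe)
    terms = {
        "GO:term_1": (12, 100, 20, 1000),
        "GO:term_2": (3, 100, 30, 1000),
        "GO:term_3": (8, 100, 25, 1000),
    }
    names = list(terms)
    pvals = [hypergeom_upper_tail(*terms[name]) for name in names]
    for name, adj in zip(names, benjamini_hochberg(pvals)):
        print(name, adj, adj <= 0.05)
```

A term is called overrepresented in a tissue when its adjusted p-value is at or below the chosen threshold (0.05 in this study).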
Among the GO terms exclusively overrepresented in each tissue, most of those in pine nut were related to seed maturation, initiation of transcription, translation and stored nutrient mobilization, cell expansion, root development, and cell division, among others, whereas those in needles were related to photosynthesis and energy metabolism. In stem bark, the exclusively overrepresented GO terms were related to metabolic processes associated with coenzymes and cofactors, and in terminal buds to translation elongation factors, cell structure, and cell wall organization. Finally, the exclusively overrepresented GO terms in second-year pine cone were related to catalytic activities.
Regarding KEGG pathways, 482 different enzymes were encoded by 4057 genes expressed across the whole transcriptome (needles: 3769; stem bark: 3474; terminal bud: 3067; first-year pine cone: 3114; pine nut: 2909; second-year pine cone: 2900; xylem: 2400). The most representative KEGG pathways per tissue are shown in Figure 2. Clear differences were observed between needles and stem bark and the other tissues. Both contained a higher number of expressed genes directly associated with energy metabolism, such as glycolysis/gluconeogenesis and the metabolism of diverse sugars (galactose, fructose and mannose). The largest differences between these two tissues and the others were observed in the “Glyoxylate and dicarboxylate” metabolism and “Carbon fixation in photosynthetic organisms” pathways, which are usually more active in photosynthetic tissues.

4. Conclusions

This is the first time that a large-scale RNA-seq dataset has been generated from seven different tissues of stone pine, providing a broad transcriptome characterization. The data produced will be a useful resource for future studies on the species. The transcriptome assembly yielded a total of 55,328 predicted genes, of which 50,161 were functionally annotated. Further studies are ongoing to assess differences in gene expression between tissues in stone pine.

Author Contributions

This study was conceived by A.M.R. The collection and identification of field material was performed by A.U., B.M., M.A., C.L. and A.M.R. RNA extraction was performed by C.L. Bioinformatics data analyses were conducted by A.U., B.M. and M.A. Biological interpretation of the results was conducted by A.U., B.M. and L.M. The manuscript was written by A.U. and L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was co-financed by Program Alentejo 2020, through the European Fund for Regional Development, under the scope of SelectPinea: Development of genetic markers for relevant traits in stone pine (ALT20-03-0145-FEDER-000041). Contrato-Programa to L.M. (CEECINST/00131/2018) and UIDB/05183/2020 were funded by FCT.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Raw read files are available at NCBI Sequence Read Archive under the accession PRJNA827975.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Quézel, P.; Médail, F. Ecologie et Biogéographie des Forêts du Bassin Méditerranéen; Elsevier: Amsterdam, The Netherlands, 2003; p. 574. ISBN 2-84299-451-5.
  2. Nergiz, C.; Dönmez, I. Chemical composition and nutritive value of Pinus pinea L. seeds. Food Chem. 2004, 86, 365–368.
  3. Bolling, B.W.; Chen, C.Y.; McKay, D.L.; Blumberg, J.B. Tree nut phytochemicals: Composition, antioxidant capacity, bioactivity, impact factors. A systematic review of almonds, Brazils, cashews, hazelnuts, macadamias, pecans, pine nuts, pistachios and walnuts. Nutr. Res. Rev. 2011, 24, 244–275.
  4. Evaristo, I.; Batista, D.; Correia, I.; Correia, P.; Costa, R. Chemical profiling of Portuguese Pinus pinea L. nuts. J. Sci. Food Agric. 2010, 90, 1041–1049.
  5. ICNF. 2019. Available online: http://www2.icnf.pt/portal/florestas/ifn/resource/doc/ifn/ifn6/IFN6_Relatorio_completo-2019-11-28.pdf (accessed on 28 August 2021).
  6. Grotkopp, E.; Marcel, R.; Sanderson, M.J.; Rost, T.L. Evolution of genome size in pines (Pinus) and its life-history correlates: Supertree analyses. Evolution 2004, 58, 1705–1729.
  7. Zimin, A.V.; Stevens, K.A.; Crepeau, M.W.; Puiu, D.; Wegrzyn, J.L.; Yorke, J.A.; Langley, C.H.; Neale, D.B.; Salzberg, S.L. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 2017, 6, 1–4.
  8. Le Provost, G.; Herrera, R.; Paiva, J.A.P.; Chaumeil, P.; Salin, F.; Plomion, C. A micromethod for high throughput RNA extraction in forest trees. Biol. Res. 2007, 40, 291–297.
  9. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
  10. Chevreux, B.; Suhai, S. Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Ger. Conf. Bioinform. 1999, 99, 45–56.
  11. Haas, B.; Papanicolaou, A.J.G.S. TransDecoder (Find Coding Regions within Transcripts) Version 5.5.0. Available online: http://transdecoder.github.io (accessed on 28 August 2021).
  12. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T.L. BLAST+: Architecture and applications. BMC Bioinform. 2009, 10, 421.
  13. Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M.C.; Estreicher, A.; Gasteiger, E.; Martin, M.J.; Michoud, K.; O’Donovan, C.; Phan, I.; et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31, 365–370.
  14. Wheeler, D.L.; Barrett, T.; Benson, D.A.; Bryant, S.H.; Canese, K.; Chetvernin, V.; Church, D.M.; DiCuccio, M.; Edgar, R.; Federhen, S.; et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2018, 46, D8–D13.
  15. Jones, P.; Binns, D.; Chang, H.Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 2014, 30, 1236–1240.
  16. Dobin, A.; Davis, C.A.; Schlesinger, F.; Drenkow, J.; Zaleski, C.; Jha, S.; Batut, P.; Chaisson, M.; Gingeras, T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29, 15–21.
  17. Pertea, M.; Pertea, G.M.; Antonescu, C.M.; Chang, T.C.; Mendell, J.T.; Salzberg, S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015, 33, 290–295.
  18. Maere, S.; Heymans, K.; Kuiper, M. BiNGO: A Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks. Bioinformatics 2005, 21, 3448–3449.
Figure 1. Pairwise comparison per tissue. The total number of genes expressed per tissue is given in brackets. Each square shows the number of genes in common between two tissues and, below that number, the corresponding Jaccard index. The higher the index value, the more similar the two tissues compared.
Figure 2. The most representative KEGG pathways associated with all predicted genes.
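Table 1. Number of reads from the RNA-Seq data of different tissues of Pinus pinea, before and after quality control (QC).

| Platform | Tissue | No. of samples | No. of raw reads | No. of reads after QC | Reads after QC (%) |
|---|---|---|---|---|---|
| BGI, Illumina HiSeq 4000 | Needle | 5 | 146,326,868 | 138,159,558 | 94.4 |
| BGI, Illumina HiSeq 4000 | Xylem | 5 | 138,531,098 | 133,130,580 | 96.1 |
| BGI, Illumina HiSeq 4000 | Stem bark | 5 | 143,731,678 | 135,937,996 | 94.6 |
| BGI, Illumina HiSeq 4000 | Terminal bud | 5 | 135,949,880 | 130,136,482 | 95.7 |
| BGI, Illumina HiSeq 4000 | 1st year pine cone | 5 | 135,310,934 | 125,938,522 | 93.1 |
| BGI, Illumina HiSeq 4000 | 2nd year pine cone | 5 | 69,963,354 | 65,575,696 | 93.7 |
| BGI, Illumina HiSeq 4000 | Pine nut | 5 | 144,025,338 | 135,919,536 | 94.4 |
| BGI, Illumina HiSeq 4000 | Total | 35 | 913,839,150 | 864,798,370 | 94.6 |
| BIOCANT, Illumina NextSeq 550 | Needle | 2 | 131,986,590 | 121,104,596 | 91.8 |
| BIOCANT, Illumina NextSeq 550 | Xylem | 3 | 157,461,310 | 149,028,612 | 94.6 |
| BIOCANT, Illumina NextSeq 550 | Stem bark | 4 | 160,306,566 | 140,699,320 | 87.8 |
| BIOCANT, Illumina NextSeq 550 | Terminal bud | 2 | 150,995,220 | 141,263,030 | 93.6 |
| BIOCANT, Illumina NextSeq 550 | 1st year pine cone | 3 | 203,695,358 | 190,859,772 | 93.7 |
| BIOCANT, Illumina NextSeq 550 | 2nd year pine cone | 3 | 154,578,072 | 145,230,246 | 94.0 |
| BIOCANT, Illumina NextSeq 550 | Pine nut | 3 | 153,854,114 | 145,392,336 | 94.5 |
| BIOCANT, Illumina NextSeq 550 | Total | 20 | 1,112,877,230 | 1,033,577,912 | 92.9 |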
Table 2. General assembly metrics for the stone pine transcriptome.

| Metric | Value |
|---|---|
| Total number of contigs | 165,179 |
| No. of contigs ≥ 200 bp | 165,179 |
| No. of contigs ≥ 500 bp | 45,648 |
| No. of contigs ≥ 1000 bp | 13,912 |
| No. of contigs ≥ 2000 bp | 4043 |
| No. of contigs ≥ 4000 bp | 467 |
| No. of contigs ≥ 6000 bp | 58 |
| No. of contigs ≥ 8000 bp | 13 |
| Total length of contigs | 81,310,033 bp |
| Largest contig | 11,938 bp |
| GC% | 45.32 |
| N50 | 567 bp |
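For readers unfamiliar with the N50 metric reported in Table 2, the following minimal sketch shows how it is commonly computed from a list of contig lengths: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative, and this is the standard definition rather than the specific tool used by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the sum to half the total
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    toy_lengths = [200, 300, 500, 1000]
    print(n50(toy_lengths))
```

An N50 close to the minimum contig length, as in Table 2, indicates that most of the assembly is made up of short contigs.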
Table 3. Exclusive overrepresented GO terms per tissue classified by categories. BP: biological processes; MF: molecular functions; CC: cellular components.

| Tissue | BP | MF | CC | Total |
|---|---|---|---|---|
| Needles | 73 | 31 | 10 | 114 |
| Stem bark | 2 | 2 | 0 | 4 |
| Terminal bud | 7 | 3 | 1 | 11 |
| First year pine cone | 0 | 3 | 0 | 3 |
| Second year pine cone | 2 | 3 | 0 | 5 |
| Pine nut | 181 | 14 | 36 | 231 |
| Xylem | 30 | 6 | 5 | 41 |
| Total | 295 | 62 | 52 | 409 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Usié, A.; Mendes, B.; Antunes, M.; Leão, C.; Marum, L.; Ramos, A.M. Transcriptome Characterization of Different Tissues of Stone Pine (Pinus pinea L.): De Novo Assembly. Biol. Life Sci. Forum 2022, 11, 77. https://doi.org/10.3390/IECPS2021-11937
