Introduction

Camellia, family Theaceae, is well known as a tea and ornamental tree and is also considered as a valuable oil source (Robards et al. 2009). Camellia oil is extracted from a number of different species including Camellia oleifera, C. meiocarpa, C. sinensis, and C. semiserrata and has been used as a culinary oil for over two thousand years (Yang et al. 2016; He et al. 2016). Camellia meiocarpa and C. oleifera cultivated area are the largest in China; however, the former is known for its higher oil quantity and quality than the latter (Yuan et al. 2012; Xie et al. 2013).

Plant oils are of great importance for the agricultural industry and their overall consumption has been increasing by around 5% per year over the past half century (Harwood and Guschina 2013). China’s camellia oil is known as the “Eastern olive oil” (Lee and Yen 2006). Production reached 180,000 tons in 2014 with a monetary value of 42 billion Yuan and by 2020 it is expected to reach more than 250,000 tons with an output value of 100 billion Yuan (National Development and Reform Commission of China 2009). The oleic acid content of camellia oil could reach above 80%, along with high content of monounsaturated lipid, the lowest levels of saturated fats (Haiyan et al. 2007). It is reputed to aid cholesterol reduction, resistance to stress, oxidation and inflammation reduction, and neoplasma improving human immunity (Haro Bailon et al. 2014).

The amount and quality of the oil produced from camellia species can vary according to specie (Su et al. 2014; De et al. 2014), stage of development (Li et al. 2014a, b), as well as extraction (Fang et al. 2016) and drying methods (Liu et al. 2014; Hsieh et al. 2013). Before oil extraction, the drying process is a fundamental factor (Tibaldi et al. 2010) as the seed moisture content should decrease below 10% (Liu et al. 2014). It is proven that the fatty acid synthesis in the dry seed results in improving the quality of fatty acid through increasing the unsaturated fatty acid and decreasing the saturated fatty acid (Hsieh et al. 2013; Wang et al. 2011). Seed drying management not only improves the quality and quantity of oil in camellia species but also other oilseed plants (Tibaldi et al. 2010; Dušková et al. 2016; Evaristo et al. 2016; Tibaldi et al. 2013). Several transcriptome analyses showed that drying management affects transcript abundances and the genetic regulatory networks in seeds, resulting in the selective change of specific transcripts (Rohini et al. 2016; Fu et al. 2016; Pan et al. 2016). However, very little is known about the relationship with expression of genes involved in the fatty acid pathway and oil accumulation during drying management.

The main objective of the present study is to gain knowledge on the molecular mechanisms underlying the role of drying management on the fatty acid pathway and oil accumulation in camellia matured seed and to identify transcripts putatively related to lipid metabolism. To accomplish this goal, we performed comparative transcriptomic analyses to document the dynamics of five different levels of seed moisture content (10, 20, 30, 40 and 50%) in two camellia species C. oleifera and C. meiocarpa under natural drying condition. We expect that this research will provide insights into how natural drying contributes to lipid metabolism for the two camellia species.

Materials and methods

Plant material

In 2012, mature fruits of C. oleifera and C. Meiocarpa, with removed fuzz, deep and shiny color, and showing micro-cracking, were collected from the four crown cardinal directions of superior trees growing at the Minhou Tongkou State Forest Farm (26°05′N, 119°17′E), Fujian Province, China. For each species, one tree located on each of the upper, middle, and lower southeast slopes of the State Forest Farm were selected. Fruits were collected from healthy trees with known good fruit production (>15 kg for 3 consecutive years). The collected fruits were mixed per species, and placed in a ventilated room until they naturally cracked and seeds were extracted by manual shell cutting. The seed moisture content at the time of extraction was close to 50%. Seeds were naturally dried and their moisture content was determined daily. Over time, seed samples were collected at moisture content of 50, 40, 30, 20 and 10% and were sequentially identified as T01 to T05 and T06 and T10 for C. meiocarpa and C. oleifera, respectively, and were flash frozen in liquid nitrogen and stored at −80 °C until RNA extraction.

Ultrastructure

The seeds of different moisture content (10, 20, 30, 40, and 50%) of the two species were immersed in 2.5% glutaraldehyde solution. After three washes with phosphate buffer (0.1 M, pH 7.2), seeds were post-fixed for 4 h at room temperature under 1% osmium tetroxide. Seeds were washed three times with phosphate buffer (0.1 M, pH 7.2), dehydrated in graded ethanol series (30, 50, 70, 80, 90, 95, and 100%, 15 min each), then replaced by acetone. Seeds were infiltrated with epoxy resin 812, and polymerized at the 37 °C oven for 24 h, then at the 60 °C oven for 48 h. Ultrathin sections (100 nm) of seeds were cut with EM UC6 microtome (Leica, Germany), using diamond knives and picked up with copper grids (200 mesh). Sections were double stained with 2% uranyl acetate and 1% lemon lead acid, observed and photographed under Tecnai-12 transmission electron microscope (FEI, America).

To calculate the area fraction of lipid droplets in matured seeds during natural drying, photos were taken in same 72 × 72 lattice system based on calibration curves of 10 μm. Three images from each section were selected and analyzed by Image Analyzer software (1.37). The hit points of lipid droplets were marked manually on blinded images, and counted by the software. The area fractions were estimated by the ratio of hit points/(total points − cell wall points) (Li et al. 2014a, b).

RNA isolation

Following manufacturers’ instructions, total RNA was: (1) extracted from pooled samples of seeds using RNAsimple® kit (Tiangen Biotech, Beijing, China), (2) checked for degradation and contamination after monitoring on 1% agarose gels, (3) purity was determined using the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA), (4) concentration was measured using Qubit® RNA Assay Kit in Qubit® 2.0 Flurometer (Life Technologies, CA, USA), and (5) integrity was assessed using the RNA Nano 6000 Assay Kit using the (Agilent Bioanalyzer 2100 system, Agilent Technologies, CA, USA).

cDNA library construction, and high throughput sequencing

After total RNA extraction, mRNA was purified using poly-T oligo-attached magnetic beads (Connell et al. 2012). A total of 3 μg RNA was used as the input for each RNA sample preparation. Sequencing libraries were generated using a NEBNext UItra RNA Library Prep Kit (Illumina, NEB, USA) following the manufacturer’s instructions. Index codes were assigned to attribute the sequences to each sample. The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq 2000 platform and paired-end reads were generated (Akazawa et al. 2013).

Data analysis, transcriptome assembly, gene functional annotation

Raw reads of fastq format were first processed through in-house perl scripts. In this step, clean reads were obtained after removing reads containing adapters, low quality reads (with ambiguous “N” bases >5%, and those with >10% of Q value <20 bases) (Gao et al. 2014). At the same time, the clean reads from each library were evaluated for Q20, Q30, GC-content, N-content, CycleQ 20 (a base quality greater than 20 and an error probability of 0.01). All the downstream analyses were based on high quality clean data.

Clean reads from the ten libraries (T01–T10) were assembled using Trinity, which is an efficient method for de novo assembly of full-length transcripts without a reference genome (Grabherr et al. 2013). Then the clean reads were clustered based on nucleotide sequence with parameters set at K-mer length of 25 and a similarity of 80%. First, we combined all the clean reads to form contigs and calculated the distance and relation among these contigs using Trinity software. Then these contigs were connected to obtain transcripts with consensus sequences that could not be extended on either end, and further clustered them into unigenes for annotation. To calculate abundance estimation for each unigene, clean data were mapped back onto the assembled transcriptome, and read count for each unigene was obtained from the mapping results. Fragments per kilobase per transcript per million mapped reads (FPKM) was used to quantify gene expression abundance (Mortazavi et al. 2008).

The program BLAST was used to assign putative functions to the assembled unigenes. Unigenes were aligned against the Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (A manually annotated and reviewed protein sequence database), KO (KEGG Ortholog database), and GO (Gene Ontology) using the Blast algorithm with a E value cut-off of 10−5. Clean reads were mapped back onto the assembled transcriptome; read count for each unigene was obtained from the mapping results. To annotate the unigenes with GO terms, the best BLASTx hit from Nr database for each transcript was submitted to BLAST2GO platform (Conesa and Götz 2008), and GO terms for each unigene were retrieved based on the relationship between gene names and GO terms, EC number was assigned and parsed based on the BLAST2GO results. To determine metabolic pathways, KEGG mapping was used. KEGG pathways were retrieved from the KEGG web server (KAAS—http://www.genome.jp/kegg/) (Kanehisa et al. 2008). KAAS provides functional annotation of putative genes by BLAST comparison against the KEGG GENES database. The output includes KO (KEGG Orthology) assignments and automatically generated KEGG pathways that are populated with the KO assignments. For coding sequences (CDS) annotation, BLASTx alignments were carried out between unigenes against Nr, KEGG, Swiss-Prot, and COD protein databases, and the transcriptional directions and coding frame of unigenes were predicted from BLASTx results. Unigene CDS with no specific BLASTx matches were predicted by ESTScan (Iseli et al. 1999).

Differential expression analysis

Differential expression analysis of two samples was performed using the DEGseq R package (Wang et al. 2009). The P value was adjusted for multiple testing using the Benjamini–Hochberg method. P value <0.005 and |log2 (foldchange)| ≥1 was set as the threshold for significantly differential expression (Renfro et al. 2011). Gene Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was implemented by the GOseq R packages based on Wallenius non-central hyper-geometric distribution was used to test the statistical enrichment of DEGs in KEGG pathways (Mao et al. 2005; Young et al. 2010).

Quantitative RT-PCR analysis

Reverse transcription quantitative real-time PCR (qRT-PCR) of a set of 8 DGEs was carried out to validate the expression profile (Table 1). The concentration of the RNA samples was determined by Nanodrop 2000 and all cDNAs were synthesized from 1.5 μl of total RNA using HiFiScript Quick gDNA Removal cDNA Kit (CWBIO, Beijing, China). Specific primers were designed using Primer Premier 6 software (Zhang et al. 2013). Quantitative real-time PCR experiments were then performed in Bioer 96 plus using UltraSYBR Mixture. For the unigenes tested, 3 biological replicates were used and the reaction mixtures were performed in a final volume of 20 μL containing 10 μL of 2× UltraSYBR Mixture, 500 nM of each primer, 1 μL of cDNA as template and 8 μL of RNase-Free Water. The amplification program was the same for all unigenes tested: 95 °C for 10 min, 40 cycles of 15 s at 95 °C, 1 min at 60 °C. The 2−∆∆Ct method was used to calculate individual unigene’s relative expression levels (Livak and Schmittgen 2001). A calibrator sample was used in each plate to normalize the values obtained and the potential differences among plates. Normalization was carried out with reference gene ACTIN (Reddy et al. 2013; Shi et al. 2011; Zhou et al. 2013). PCR and agarose gel analyses were used to verify the absence of non-specific amplification prior to qRT-PCR. Additionally, following reactions DNA melt curves were created for each primer combination to confirm the presence of a single product. The average of two technical repeats was used for each reaction, and the standard error of the mean was calculated for the three biological replicates. All primers pairs for qRT-PCR are listed in Table 1.

Table 1 Primers used in RT-qPCR for validation of the expression profile

Results

Lipid drops in different moisture content

In both camellia species, the area fraction of lipid droplets showed steady increase in size with deceasing moisture content (Fig. 1; Table 2). Significant difference between all moisture contents’ area fraction of lipid droplets were observed for C. meiocarpa, with the largest change noted during T03-04 (20–30% moisture content), while only significant difference was noted between T07 (40%) and T08 (30%) for C. oleifera (Fig. 1; Table 2). At the 10% moisture content, C. meiocarpa showed more lipid droplets than C. oleifera, but this trend was reversed for the other moisture contents (Fig. 1; Table 2).

Fig. 1
figure 1figure 1

Ultrastructure section electron microscope of seeds lipid laden by natural drying treatment in two studied camellia species. a The cells contain accumulations of lipid droplets (LD) in C. meiocarpa seeds. b The cells contain accumulations of LD in C. Oleifera seeds. 15 The cells were in 50, 40, 30, 20 and 10% moisture content in two camellia species, respectively. CW cell wall, P plastids containing starch grains. Scale bars 10 μm

Table 2 Area fraction of lipid droplet at different moisture content in two studied species matured seeds

Illumina paired-end sequencing and de novo transcriptome assembly

We obtained a total of 78.18 Gb clean data from the ten (T01–T10) C. meiocarpa and C. oleifera libraries, with per library ≥7.19 Gb (Table 3). After stringent quality assessment and data filtering, we obtained >28.56 million reads with no N per library from end of cDNA fragment with 100% Q20 bases. The GC-content averages were 46.51 and 45.56% for C. meiocarpa and C. oleifera, respectively (Table 3). Furthermore, the CycleQ 20% was 100% for all libraries and a base quality score of >30 of clean reads was above 86.15% (Table 3). This suggested that the sequencing was highly accurate.

Table 3 Summary for raw reads from 10 (T01–T10) C. meiocarpa and C. oleifera libraries

Using the Trinity de novo assembly, all high-quality reads were assembled into transcripts, resulting in 197,381 (mean length of 1006.06 bp) and 209,913 (mean length of 1041.81 bp) for C. meiocarpa and C. oleifera, respectively (Table S1). Then these transcripts were clustered based on nucleotide sequence identity, we harvested a total of 111,156 unigenes with an N50 length of 1149 bp and mean length of 838.1 2 bp, which included 26,282 unigenes (23.64%) with lengths greater than 1 kb. C. meiocarpa and C. oleifera had 74,016 (N50 length of 1038 bp and average length of 789.61) and 76,374 (N50 length of 1046 bp and average length of 798.23) unigenes, respectively (Table 4, S1). These results indicated that the throughput and sequencing quality were high and suitable for the following analysis.

Table 4 Summary for unigenes for C. meiocarpa and C. oleifera (bp)

Functional annotation of transcriptome

A total of 98,398 coding sequence (CDS) were extracted from BLASTx results with 111,156 unigene sequences being translated into protein sequences (see Fig. 2a for the length distribution of CDS). To better assign unigene names and annotations, CDS and predicted proteins, all assembled unigenes were first searched against the NCBI Nr and Swiss-Prot databases using BLASTx with an E value cut-off of 10−5. And as a result, 12,932 (29%) unigenes showed significant similarity with sequences of Vitis vinifera, and about 5% of the mapped sequences have a high similarity with sequences of Theobroma cacao (2341 unigenes), Coffea canephora (2300 unigenes) and Nelumbo nucifera (1973 unigenes) (Fig. 2b). Only 44,886 unigenes (40.38%) were annotated with an E value threshold of 10−5 by performing BLASTX search against diverse protein databases, revealing that 11,958 (10.76%) unigenes had significant matches with sequences in COG databases, 23,650 (21.28%) in GO databases, 7931 (7.14%) in KEGG databases, 24,328 (21.89%) in KOG databases, 26,981 (24.27%) in Pfam databases, 29,160 (26.23%) in Swissprot databases, and 44,032 (39.61%) in Nr protein databases, respectively (see Table 5, for the overall functional annotation for the studied camellia species).

Fig. 2
figure 2

Summary of the camellia transcriptome coding sequence predictions. a Length distribution of CDS predicted from BLAST× with alignments against Nr, KEGG, Swiss-Prot, and COD protein databases and b Species distribution of BLAST hits of camellia sequences with other plant species

Table 5 Functional annotation of two camellia species

Functional classification by gene ontology and eukaryotic orthologous groups

Gene ontology (GO) was used to classify the functions of the assembled transcripts and describe gene products in terms of their associated biological processes, cellular components, and molecular functions. A total of 128,299 predicted proteins were categorized into 52 functional groups with 59,884 (46.68%), 39,149 (30.51%), and 29,266 (22.81%) unigenes grouped in biological processes, cellular components, and molecular functions, respectively (Fig. 3). In the biological processes category, metabolic (12.38%) and cellular processes (10.53%) were the predominant groups, followed by single-organism process (8.12%), response to stimulus (3.43%), and biological regulation (3.35%). In the cellular components category, cell part (7.13%) and cell (7.09%) were the most representative ones, followed by organelle (5.21%) and membrane (4.14%). With regard to molecular function category, the predominant categories were catalytic activity (10.13%) and binding (9.67%), followed by transporter activity (1.22%) (Fig. 3). These results indicated that most of unigenes were responsible for fundamental biological regulation and metabolism.

Fig. 3
figure 3

Gene ontology classification of assembled camellia unigenes

To further evaluate the completeness of the transcriptome and effectiveness of the annotation process, the annotated unigenes were compared with the Eukaryotic Orthologous Groups (KOG) database for functional prediction and classification. In total, 27,156 annotated putative proteins were classified functionally into 25 KOG groups (Fig. 4). Among the 25 categories, the largest group was “general function prediction only” (26.95%), second was “signal transduction mechanisms” (9.81%), followed by “posttranslational modification, protein turnover, chaperones” (8.64%), and “transcription” (5.24%) (Fig. 4). The three smallest groups were “cell motility” (0.04%), “nuclear structure” (0.40%), and “extracellular structures” (0.46%). However, there were 868 unigenes (3.20%) associated with “lipid transport and metabolism”.

Fig. 4
figure 4

Eukaryotic orthologous groups (KOG) classification of assembled camellia unigenes

Pathway analysis during camellia seed after-ripen

Pathway-based analysis was used to assist in understanding unigenes’ functions and interactions. Following the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, 7931 unigenes were assigned to 116 pathways (Table S2). Pathway mapping assigned 927 unigenes into 103 functional pathways (KO groups) (Table S3). Plant hormone signal transduction (ko04075, 44, 15.17%) had the largest number of KO identifiers, followed by RNA transport (ko03013, 40, 15.15%), protein processing in endoplasmic reticulum (ko04141, 38, 15.14%) and spliceosome (ko03040, 31, 10.65%) (Table S2). Notably, 10 pathways (85 unigenes) were closely linked to changes in oil content and composition, which were Glycolysis/Gluconeogenesis (19 unigenes), starch and sucrose metabolism (28 unigenes), pyruvate metabolism (20 unigenes), glycerophospholipid metabolism (10 unigenes), fatty acid metabolism (4 unigenes), glycerolipid metabolism (9 unigenes), fatty acid biosynthesis (12 unigenes), biosynthesis of unsaturated fatty acids (7 unigenes) and fatty acid elongation in mitochondria (1 unigenes) (Table 6, S2).

Table 6 KEGG of unigenes involved in fatty acid biosynthesis and accumulation in the studied two camellia species

Classifying unigenes by KEGG Orthology resulted in 10 pathway related with lipid metabolism, which had the same KO identifiers to each other (Table 6; Fig. 5). These pathways collectively configured a network for the development of oil body droplet (Fig. 5). The network of lipid metabolism sustained by starch and sucrose metabolism is divided into three parts, one part is related to the fatty acid synthesis supported by pyruvate metabolism which included fatty acid biosynthesis, fatty acid elongation and biosynthesis of unsaturated fatty acid. The second part is dedicated to triacylglycerol synthesis and included glycolysis/gluconeogenesis, glycerolipid metabolism and glycerophospholipid metabolism. These two parts combined to form the oil body droplet. The third part is associated with the fatty acid metabolism (Fig. 5). These lipids pathways provided critical clues to the identification and characterization of key functional genes involved in unsaturated fatty acid (FA) and triacylglyceride (TAG) biosynthesis during the natural seed drying process of the two camellia species.

Fig. 5
figure 5

Diagram depicting lipid metabolism network in the two studied camellia species. Ellipses contained the metabolic pathway and polygon oil droplets. Names on arrows refer to the key intermediate products produced from upstream to downstream metabolism. Brackets refer to the enzyme code roles (K00844 hexokinase, K01810 glucose-6-phosphate isomerase, K00627 dihydrolipoamide acetyltransferase, K00161 pyruvate dehydrogenase E1 component alpha subunit, K00128 aldehyde dehydrogenase, K00162 pyruvate dehydrogenase E1 component beta subunit, K00382 dihydrolipoamide dehydrogenase, K01962 acetyl-CoA carboxylase carboxyl transferase subunit alpha, K02160 acetyl-CoA carboxylase biotin carboxyl carrier protein, K01961 acetyl-CoA carboxylase, K03921 acyl-[acyl-carrier-protein] desaturase, K00059 3-oxoacyl-[acyl-carrier protein] reductase, K13506 glycerol-3-phosphate O-acyltransferase, K00901 diacylglycerol kinase)

Identification and characterization of unigenes involved in fatty acids and triacylglycerols biosynthesis

To identify unigenes showing changes in expression at the studied five moisture content levels, the expression level of each unigene was determined using Fragments Per Kilobase of transcripts per Million mapped reads (FPKM) method (Mortazavi et al. 2008). A total of 44,886 unigenes were annotated, with 244 unigenes involved in FA biosynthesis and TAG assemblage (Table 7, S4).

Table 7 Enzymes involved in fatty acid biosynthesis and catabolism identified by the annotation of the two camellia species transcriptome

Differentially expressed genes (DEGs) during after-ripening

Differentially expressed genes (DEGs) were identified by pairwise sample comparisons, and the majority of unigenes were identified as differentially expressed in at least one comparison. We calculated the number of up- and down-regulated unigenes according to decrease in moisture content (Table 8). The data sets from each moisture content level were compared, and Pearson product–moment correlation was determined between sets. The observed low correlation coefficients (range 0.230–0.972) indicated the presence of significant difference between the investigated moisture content levels (Fig. 6; Table S5). In this analysis, we used the tissues from a specific moisture content level as the control for comparison with its consecutive moisture content (i.e., lower was the control for its next higher moisture content sample). This was done within and across species and in this case C. oleifera served as the control. A total of 17,996 unigenes showed ≥twofold expression difference between the two libraries (Table 8).

Table 8 Comparison of the number of differentially expressed genes during the seed natural drying process between samples with different moisture content within and across the two camellia species
Fig. 6
figure 6

Heat-map correlation analysis of differentially expressed genes (DEGs) during the after-ripening process across the ten moisture content samples of two camellia species (the darker the colors, the higher the correlation)

In-depth analysis of fatty acid synthesis and accumulation by natural drying

DEGs provide clues about the molecular events related to the functional roles they play during the seed natural drying specifically those related to synthesis and accumulation of fatty acid. To evaluate the potential functions of genes that showed significant transcriptional changes in fatty acid synthesis and accumulation between the two contrast libraries, DEGs were further classified into subsets with Gene Ontology (GO) enrichment analysis. The results showed that 13 and 23 GO terms in biological processes and molecular function, respectively, were related with fatty acid synthesis and accumulation (Tables S6, S7).

Comparative analyses of the DEGs between any two consecutive moisture content samples within and between C. meiocarpa and C. oleifera are illustrated in Fig. 7. For C. meiocarpa, 77 genes were common to all four comparisons, whereas 1255, 777, 936, and 897 genes clusters were restricted to T01 vs T02, T02 vs T03, T01 vs T04, and T01 vs T05 comparisons, respectively (Fig. 7a). For C. oleifera, 24 genes also were common to all four comparisons, whereas 662, 1277, 721, and 1027 gene clusters were restricted to T06 vs T07, T07 vs T08, T08 vs T09, and T09 vs T10 comparisons, respectively (Fig. 7b). Between species comparisons produced 189 genes in common to all five comparisons, whereas 1892, 1208, 1470, 949, and 1248 gene clusters were restricted to T07 vsT02, T06 vs T09, T06 vs T10, and T10 vs T05 comparisons, respectively (Fig. 7c). To get key unigenes associated with fatty acid synthesis and accumulation by natural drying in the two camellia species, significant DEGs within each comparison were selected for further analysis (Tables S4, S8, S9, S10).

Fig. 7
figure 7

Venn diagram illustrating the number of transcripts differentially expressed between consecutive moisture content samples. a Within C. meiocarpa, b within C. oleifera, c between C. meiocarpa and C. oleifera

Validation of the differential expression profiles

According to the genes annotated in KEGG, eight DEGs were selected for expression analysis, in which 7 transcripts belonged to lipid metabolism and one to RNA regulation (Table 1). Transcript abundance was analyzed by qRT-PCR in C. oleifera and C. meiocarpa seed at the 5 different moisture content levels (Figs. 8, 9, S1; Table S11). Correlation coefficient between qRT-PCR and RNA-Seq in the two species was 0.77 on the average, 0.86 for C. meiocarp and 0.68 for C. oleifera. Most of the genes showed strong or moderately strong correlation. These results confirmed the validity of the transcriptome results.

Fig. 8
figure 8

Validation of the RNA-Seq transcript profiles in C. meiocarpa seed at different moisture content levels. Comparison of transcripts expression patterns from RNA-Seq data and from reverse transcription quantitative real-time PCR (qRT-PCR). The numbers above the graphics correspond to Pearson’s correlation value

Fig. 9
figure 9

Validation of the RNA-Seq transcript profiles in C. oleifera seeds at different moisture content level. Comparison of transcripts expression patterns from RNA-Seq data and from reverse transcription quantitative real-time PCR (qRT-PCR). The numbers above the graphics correspond to Pearson’s correlation value

Discussion

Fatty acids biosynthesis and accumulation

The annotated unigenes were involved in at least 32 plant enzymes in the fatty acid biosynthesis and accumulation (Table S4). These data were integrated and compiled to propose schematic metabolic pathways that lead to oil accumulation in the studied two camellia species (Fig. 10). These results suggested that fatty acid biosynthesis, elongation, desaturation, and TAG biosynthesis were all activated in the seed natural dry process (Fig. 10) (Wang et al. 2014). There were seven fatty acid enzymes (LACS, HCD, SAD, ∆12D, PAP, Oleosin and Caleosin) related to unigenes at an average of FPKM >100 in the two studied camellia species (Table S4), indicating that the C. Meiocarpa and C. Oleifera seed experienced fatty acid modification and accumulation during the natural drying process. This was confirmed by the increase in area fraction of lipid droplets accompanied with decrease in the moisture content in the two camellia species (Fig. 1; Table 2).

Fig. 10
figure 10

Scheme of the reactions involved in fatty acid biosynthesis and triacylglycerol assembly in the studied camellia species. (Lipid substrates abbreviation phosphatidylcholine (PC); fatty acids (FA); lysophosphatidylcholine (LPC); glycerol 3-phosphate (G3P); lysophosphatidic acid (LPA); phosphatidic acid (PA); diacylglycerol (DGA); triacylglycerol (TAG). Enzyme/protein abbreviations malonyl-CoA:ACP transacylase (MAT); biotin carboxyl carrier protein of acetyl-CoA carboxylase (BCCP); biotin carboxylase (BC); acetyl-coenzyme A carboxylase carboxyl transferase subunit (CT); 3-oxoacyl-[acyl-carrier-protein] synthase III (KASIII); 3-oxoacyl-[acyl-carrier-protein] reductase (KER); hydroxyacyl-ACP dehydratase (HAD); enoyl-(acyl-carrier-protein) reductase (EAR); 3-oxoacyl-[acyl-carrier-protein] synthase I (KASI); 3-oxoacyl-[acyl-carrier-protein] synthase II (KASII); acyl-[acyl-carrier-protein] desaturase (∆9D(SAD)); palmitoyl-acyl carrier protein thioesteraseB (FATB); palmitoyl-acyl carrier protein thioesterase A (TATA); long chain acyl-CoA synthetase (LACS); 3-ketoacyl-CoA synthase (KCS); 3-hydroxyacyl-CoA dehydrogenase (HCD); very-long-chain enoyl-CoA reductase (ECR); phospholipase A (PLA); lysophospholipid acyltransferase (LPCAT); omega-6 fatty acid desaturase (∆12D); omega-3 fatty acid desaturase (∆15D); delta(8)-fatty-acid desaturase (∆8D); fatty acid desaturase (4∆5D); sphingolipid delta(4)-desaturase (∆4D); glycerol-3-phosphate acyltransferase (GPAT); glycerol-3-phosphate O-acyltransferase (LPAAT); phosphatidate phosphatase (PAP); phosphatidylcholine:diacylglycerol cholinephosphotransferase (PDCT); diacylglycerol acyltransferase (DGAT); Phospholipid:diacylglycerol acyltransferase (PDAT))

We identified new fatty acid enzymes (BCCP, BC, LPCAT, ∆9D, ∆8D, ∆5D, Oleosin and Caleosin) compared to previously published C. Oleifera transcriptome dataset (Xia et al. 2014), and new fatty acid enzymes (KCS, HCD, CER, ∆9D, ∆15D, ∆8D, ∆5D, Oleosin and Caleosin) compared to C. reticulata transcriptome dataset (Yao et al. 2016). These observations were due to using different study material across studies (i.e., matured seed in the present study vs. buds, leaves, flowers, shoots, and immature fruits in the referenced studies) (Xia et al. 2014; Yao et al. 2016).In the fatty acid biosynthesis, we found 10 BC genes, 5 BCCP and 5 CT, which can freely combine to form the ACCs (Fig. 10; Table 7). A similar expression pattern was observed in tung tree (Galli et al. 2014), Jatropha curcas (Xu et al. 2011), and oil palm seeds (Nakkaew et al. 2008). This step is most likely a feedback inhibition in fatty acid synthesis (Galli et al. 2014; Andre et al. 2012). So expression of genes were low in the fatty acid biosynthesis in this study (Table S4), and BC, BCCP and CT failed to be identified in immature camellia seed (Yao et al. 2016) and free fatty acids were generated from fatty acids acyl-CoA under the action of long chain acyl-CoA synthetase (LACS) (Nobusawa et al. 2013) (Fig. 10). This process is related to 21 LACS during natural seed drying, similar to that observed in Sacha Inchi (Wang et al. 2012) and tree peony (Li et al. 2015). For the formation of unsaturated fatty acids, 34 unigenes encoding fatty acid desaturase were identified, including six kinds of FAD (SAD, ∆15D, ∆12D, ∆9D, ∆8D and ∆5D) (Table 7). However, some oil seed plants failed to identify ∆8D, ∆5D, such as Sacha Inchi (Wang et al. 2012), tung tree (Galli et al. 2014) and Yellow Horn (Liu et al. 2013). There were four unigenes expressed over 100 RPKM, in which three (CL32923Contig1, CL27152Contig1 and CL21434Contig1) were annotated to SAD, and the remaining one (CL32549Contig1) to ∆12D, which are correlated with fatty acid composition and content (Zeng et al. 2014), and corresponded to fast oil accumulation stage (Galli et al. 2014; Wang et al. 2012; Rajwade et al. 2014). This was verified by unsaturated fatty acid content, in which C18:1 was the highest and followed by C18:2 (Yang et al. 2016; Zhang et al. 2015; Long et al. 2012).

The major fate of newly synthesized fatty acids is for TAG assembly as acyl-CoA via two different pathways, catalyzed by phospholipid:diacylglycerolacyltransferase (PDAT) and diacyl-glyceroltransacylase (DGTA) (Simpson and Ohlrogge 2016), in which both pathways were present in the two camellia species’ seeds during the natural drying process. However, lack of cholinephosphotransferase (CPT) was observed compared with the normal biosynthesis of triacylglycerol in seed (Baud and Lepiniec 2010; Lock et al. 2009). DAG was converted to TAG in camellia seed with the involvement of 11 DGAT and 5 PDAT unigenes (Table 7), which indicated that DGAT was the main way to camellia seed oil formation. Drying management may brought about over expression of DGAT (Weselake et al. 2008). Oil bodies are the main form of oil in mature seed, internally containing liquid TAG and externally forming phospholipid monolayer (Jolivet et al. 2013). We identified the presence of oleosins and caleosins proteins in camellia seed, and involved in 4 oleosin unigenes (CL734Contig1, CL422Contig1, CL32127Contig1 and CL25464Contig1) with expression over 1000 RPKM in average (Table 7, S4). So oleosin is the most abundant structural protein in camellia oil bodies like other oilseed crops (Hyun et al. 2013; Yu et al. 2015; Parthibane et al. 2012).

Changes of fatty acids biosynthesis and accumulation in C. meiocarpa during natural drying process

During seed natural drying, C. meiocarpa exhibited 3324 more differentially expressed unigenes in T01 vs. T02 libraries, then the number of DEGs steadily declined to 3000 with seed moisture loss, but in T04 vs. T05 libraries the number of DEGs sharply declined to 1626 unigenes (Table 8). This indicates that seed metabolic activity decreased during the drying process and T04 (20% moisture content) was the turning point. Correlation coefficients of DEGs in T01 vs. T02, T02 vs. T03, T03 vs. T04, and T04 vs. T05 were 0.928, 0.972, 0.728, and 0.966, respectively, with T03 vs. T04 showing the lowest relation during natural drying (Fig. 6; Table S5). Additionally, the correlation of T01 with T04 was 0.595 showing the lowest value among all comparisons (Table S5), indicating that T04 (20% moisture) had the highest metabolism.

To understand the lipid metabolism at different moisture content in C. Meiocarpa seeds, we analyzed the role of DEGs with fatty acids biosynthesis and accumulation based on GO enrichment analysis. In the biological processes, stage T01 vs. T02 (40–50% moisture content) showed the highest DEGs number in lipid biosynthetic, long-chain fatty acid metabolic, phospholipid biosynthetic and glycerolipid biosynthetic processes (Table S7), indicating that the 40–50% moisture content had the highest triacylglycerol synthesis. T02 vs. T03 (30–40% moisture content) had the highest DEGs in the regulation of unsaturated fatty acid biosynthetic process and unsaturated fatty acid metabolic process with the highest unsaturated fatty acid synthetic. Additionally, stage T03 vs. T04 (20–30% moisture content) produced the highest DEGs number in 6 GO terms (fatty acid biosynthetic process, cellular lipid, fatty acid, and glycerolipid metabolic processes, lipid modification, and lipid storage) (Table S7). These indicate that maximum fatty acid synthetic and lipid storage occurred at 20–30% moisture content, which is identified by highest change in area fraction of lipid drops at 20–30% moisture content (Fig. 1; Table 2). However, T04 vs. T05 (10–20% moisture content), showed a dramatic reduction in the number of DEGs compared the other three libraries (Table S7), indicating obvious reduction in lipid metabolism. From the molecular function aspect, 5 GO terms over 2 DEGs were detected (fatty acid synthase, acetyltransferase, O-acyltransferase, and N-acyltransferase activities, and lipid binding), confirming that after-ripening processed the fatty acid synthesis and accumulation (Table S7). So for C. Meiocarpa, quantity increase of fatty acids mainly occurred in 30–50% moisture contents and quality increase mainly in 30–40% moisture content.

To identify key unigenes, we comparatively analyzed the DEGs between any two consecutive moisture content samples. A total of 41 significant DEGs in fatty acid synthesis and accumulation were detected, 12 significant DEGs expressed over 10.00 FPKM, in which Group1_Unigene_BMK.31364 (Oleosin) was largest, followed by Group1_Unigene_BMK.39872 (Oleosin) and CL24499Contig1(HAD) (Table S8). And four DEGs (CL21514Contig1(KAR), CL24499Contig1 (HAD), CL5860Contig1 (∆15D) and Group2_Unigene_BMK.25584 (∆8D)) had significance at three stages (Table S8). So during C. meiocarpa natural seed drying process, six unigenes (Group1_Unigene_BMK.31364, Group1_Unigene_BMK.39872, CL21514Contig1, CL24499Contig1, CL5860Contig1 and Group2_Unigene_BMK.25584) were key unigenes.

Changes of fatty acids biosynthesis and accumulation in C. Oleifera during natural drying process

Significant DEGs were observed during seed natural drying process with 2107 unigenes (1093 up-regulated and 1014 down-regulated) in the T07 vs. T08 libraries (Table 8), indicating that the seed vital metabolisms reached its peak during this moisture content transition (i.e., from 40 to 30%). The correlation coefficients of DEGs in T06 vs. T07, T07 vs. T08, T08 vs. T09, and T09 vs. T10 were 0.947, 0.947, 0.961, and 0.897, respectively (Fig. 6; Table S5), with little differences among them, indicating modest rate of the metabolism in C. oleifera.

To understand the lipid metabolism at different moisture content in C. oleifera seeds, DEGs of fatty acids biosynthesis and accumulation were subjected to GO enrichment analysis. The results indicated that in the biological processes, T07 vs. T08 (30–40% moisture) had the highest DEGs number representing 7 GO terms (lipid biosynthetic, fatty acid biosynthetic, lipid modification, fatty acid metabolic processes, cellular lipid metabolic, long-chain fatty acid metabolic and phospholipid biosynthetic) (Table S7). This confirms that the 30–40% moisture content had the highest lipid metabolism, identified by the highest change in area fraction of lipid droplets (Fig. 1; Table 2). The moisture content transition of 20–30% (T08 vs. T09) produced high DEGs that were related to unsaturated fatty acid biosynthetic and metabolic processes, while 10–20% (T09 vs. T10) DEGs were related to lipid storage and glycerolipid biosynthetic and metabolic processes (Table S7), indicating that 20–30% moisture content had highest fatty acid modification and 10–20% moisture content had highest triacylglycerol synthesis and lipid storage (Table S7). From the molecular function side, there were 3 GO terms over 2 DEGs which were involved in acetyltransferase activity, lipid binding and N-acyltransferase activity, indicating that the main activity was related to fatty acid accumulation (Table S7). So for C. Oleifera, quantity increase of fatty acids mainly occurred in 30–40% moisture contents, and quality increase mainly in 20–30% moisture content.

To identify key unigenes involved in fatty acid synthesis and accumulation, we comparatively analyzed DEGs between any two consecutive moisture content samples and a total of 23 significant DEGs showed the largest expression located in Group1_Unigene_BMK.39872 (Oleosin) followed by Group1_Unigene_BMK.31364 (Oleosin) and CL23502Contig1 (∆12D) (Table S9). So during C. oleifera seed drying process, three unigenes (Group1_Unigene_BMK.31364, Group1_Unigene_BMK.39872 and CL23502Contig1) were key unigenes.

Comparison of fatty acids biosynthesis and accumulation between C. meiocaipa and C. oleifera during natural drying process

The two camellia species differed substantially in the number of DEGs, up-regulated, and down-regulated unigenes with the highest number of 4819 unigenes in the T06 vs. T01 libraries (50% moisture) (Table 8). The correlation coefficients of DEGs in T06 vs, T01, T07 vs. T02, T08 vs. T03, T09 vs. T04, and T10 vs. T05 were 0.347, 0.483, 0.464, 0.901, and 0.703, respectively (Table S5; Fig. 6), with T06 vs. T01 being the lowest, indicating that 50% moisture content had the biggest difference in gene expressions in both species.

The two species utilized special enzymes for their lipid metabolism, with C. meiocaipa utilizing BC, BCCP, MAT, KAR, HAD, and FATB for fatty acid biosynthesis, SAD for fatty acid desaturation, GPAT and LPCAT for TAG biosynthesis, while C. oleifera utilized ∆12D for fatty acid desaturation and LPAAT for TAG biosynthesis. However, both species utilized the same enzymes for the very-long-chain fatty acid elongation and lipid storage (Tables S8, S9), indicating that during the drying process C. meiocarpa had more activity in fatty acid synthesis and elongation, fatty acid desaturation of plastid, but C. oleifera had more activity in fatty acid desaturation of ER. This clearly explains the observed higher oleic acid (C18:1, SAD(∆9D)) in C. meiocarpa and linoleic acid (C18:2, ∆9,12D) and stearic acid (C18:0) in C. Oleifera (Yang et al. 2016; Zhang et al. 2015; Long et al. 2012). There were three DEGs (CL6765Contig1 (PDAT), CL21434Contig1 (SAD), and CL19261Contig1 (LACS)) significantly expressed at all moisture content during drying process in the two species. The highest difference was observed for CL21434Contig1 (SAD) with 546.27 FPKM, followed by CL6765Contig1 (PDAT) with 23.18, FPKM and CL19261Contig1 (LACS) with 23.08 FPKM (Table S10). So CL21434Contig1 (SAD) represented the key unigenes in difference of the two camellia species for FA biosynthesis and TAG assemblage.

Conclusions

RNA-Seq analysis of transcript abundances during seed natural drying process in C. meiocarpa and C. Oleifera has facilitated a global investigation of unigene expression in fatty acid synthesis and accumulation at five specific moisture content levels and comprehensively describe the change of the differential transcriptional events that occurred within the two species. We harvested a total of 111,156 unigenes with 74,016 and 76,374 representing C. meiocarpa and C. Oleifera, respectively. A total of 98,398 coding sequence were extracted from BLASTx and were translated into protein sequences, of which 44,886 were annotated in protein databases. A total of 247 unigenes were involved in fatty acid biosynthesis and accumulation. These unigenes provided a comprehensive molecular biology background for fatty acid biosynthesis and accumulation in C. meiocarpa and C. Oleifera by drying management.

In summary, seed drying was helpful to increase the oil content and improve the quality of fatty acid. For C. Meiocarpa, quantity increase mainly occurred in 30–50% moisture contents and quality increase mainly in 30–40% moisture content. For C. Oleifera, quantity increase occurred in 30–40% moisture contents, and quality increase mainly in 20–30% moisture content. For lipid metabolism pathway, C. Meiocarpa focuses on fatty acid synthesis and accumulation, and C. Oleifera on fatty acid accumulation. For key different unigenes expressed in fatty acid biosynthesis and accumulation during natural drying process, there were six key unigenes (Group1_Unigene_BMK.31364, Group1_Unigene_BMK.39872, CL21514Contig1, CL24499Contig1, CL5860Contig1 and Group2_Unigene_BMK.25584) in C. meiocarpa seeds, three key unigenes (Group1_Unigene_BMK.31364, Group1_Unigene_BMK.39872 and CL23502Contig1) in C. Oleifera seeds, CL21434Contig1 between two species.

Finally, it is interesting to point out that while the two camellia species expressed the same unigenes, their expression level differed at different moisture contents. An explanation to this phenomenon is beyond the scope of the present study as further research related to the regulation of RNA expression during lipid metabolism is expected to shed light on the understanding of fatty acid biosynthesis and accumulation, and simultaneously lay a foundation for its artificial regulation during the after-ripening in oil seed.

Author contribution statement

Conceived and designed the experiment: J-LF, HC and YAK. Performed the experiment: J-LF. Data collection and Figures preparation: J-LF, Z-JY, W-W B, S-P C and W-Q X. Data analysis: J-LF. Wrote the manuscript: J-LF and YAK.