Introduction

There are several underutilized and neglected legumes with great potentiality in terms of its nutritional values. One among them is Psophocarpus tetragonolobus (L.) DC., the “soybean of tropics1”. This plant has exceptionally high protein content and has been suggested as a potential food source for the tropics2. Almost all parts of this plant like: leaves, pods, seeds and tubers are edible and rich in protein. Winged bean (Psophocarpus tetragonolobus (L.) DC. is synonymously known as: goa bean or princess bean. It is a diploid (2n = 2x = 18), self-fertilizing legume with diverse use3. Winged bean is a tropical legume and is listed under unutilized or under-exploited category4.

Apart from its nutritional value, P. tetragonolobus contains some pharmacologically active anti-nutrients like: inhibitors, haemagglutinins along with condensed tannins (proanthocyanidins). Condensed tannin (CT) is reported to be present in the seeds of P. tetragonolobus5,6,7,8. Modifications such as hydroxylation, methylation, acylation and glycosylation result in various kinds of flavonoid colours9,10,11,12,13. CTs are synthesized via phenylpropanoid and the flavonoid pathways and are widespread in the plant kingdom. It confers protection against predation and pathogen attack14,15. They contribute to the bitter flavor and astringency and have a significant influence on mouth feel16. But, till date, there are very few reports on condensed tannins (CTs) and its biosynthesis in any of the underutilized plant. The herbages reported to contain CT include sainfonin (Onobrychis viciifolia), horse foot trefoil (Trifolium arvense L.), birds foot trefoil (Lotus corniculatus) and grasslands maku lotus (Lotus pedunculatus). The presence of CT in over 8–10% of dietary dry matter, markedly reduces the nutritive value of the browse shrub mulga (Acacia aneura)17. However, the present study focuses on the role of various genes or contigs in the biosynthesis of CT in this underutilized P. tetragonolobus. The present study and its finding will open up new avenue for understanding the molecular basis of biosynthesis of CT for further improvement of this crop.

In recent years, several studies have successfully reported the generation of transcriptome data for gene discovery in non-model plants for which no reference genome sequences are available18. Due to the availability of quick, low cost sequencing and high quality annotation and assembly tools, it has become possible to analyze and understand the genome of non-model plants. As a non-model legume P. tetragonolobus is carried out for transcriptome sequencing. Condensed tannin is considered as an anti-nutrient. Effective and optimum reduction of CT concentration from the plants will promote the lesser-known underutilized legumes for further consumption where there is an ever increasing demand for food and nutrition security. So, in order to genetically-improve this plant, it is needed to identify and understand the metabolic pathways and the underlying factors or genes regulating the biosynthesis of CT.

Most tracheophytes synthesize condensed tannins in the vacuole as uniformly stained deposits and are termed tannin accretions, these appear as the lining in the inner face of the tonoplast and are formed in the thylakoid-derived organelles in the name “tannosomes”. These are packed in membrane-bound shuttles, it has been suggested that these shuttles agglomerate into tannin accretions. Further, the tannosomes travel from the plastid towards the vacuole in multiple membrane-bound shuttles. So, leaf is the centre of condensed tannin biosynthesis in plants. So, the leaf-tissues of P. tetragonolobus are targeted for transcriptome sequencing.

Recently, de novo assembled trancriptomes of four underutilized legumes namely: Lablab purpurea (Malwi origin), Lathyrus sativus (Indian origin), Psophocrapus tetragonolobus (Nigeria origin) and Vigna subterranean (Africa origin) were carried out and 1139 SSRs out of 33,042 genes and 49,280 contigs of P. tetragonolobus were identified19. Recently the report identifying SSRs and SNPs among the de novo assembled transcriptome data of two Sri Lankan accessions of P. tetragonolobus had been published20. However, most of their studies related to the detection of SSR and SNP molecular markers from transcriptome data of P. teragonolobus from Nigerian and Sri Lankan origin. It was carried out on Roche 454 sequencing platform. But, the present study focuses on the biochemical screening of P. tetragonolobus for condensed tannin and mining contigs or genes responsible for biosynthesis of condensed tannin in P. tetragonolobus. This is the first report of transcriptome study of phenylpropanoid biosynthesis pathway responsible for condensed tannin biosynthesis in P. tetragonolobus. The study may pave path for further manipulation of genes or transcription factors for reducing the content of condensed tannin in the plant to make it more acceptable.

Result

Quantitative and qualitative estimation of condensed tannin in P. teragonolobus

Quantitative estimation of CT through vanillin assay was carried out among 100 accessions of P. tetragonolobus collected from different parts of the world (Supplementary Table 1). The range of CT varies between (3.59 ± 0.31)mg/g DW and (0.27 ± 0.04) mg/g DW (Fig. 1C) and accordingly denoted as high condensed tannin-containing winged bean (HCTW) for accession (EC-38956-1 from Papua New Guinea) and low condensed tannin-containing winged bean (LCTW) for accession (EC-178268 from Malaysia). Histochemical staining of the plant-parts with dimethylamminocinnamaldehyde (DMACA) provided a clear picture of its deposition in different tissues as it developed a deep-blue colour when stained with DMACA (Fig. 1B). Flavan-3-ol (catechin) is inferred to be the basic monomeric unit as per the analysis of the plant-tissues on HPLC-platform and reported the presence of pelargonidin (0.62 ± 0.1 mg/g DW), delphinidin (0.83 ± 0.08 mg/g DW) and cyanidin (0.38 ± 0.06 mg/g DW) in HCTW and pelargonidin (0.32 ± 0.04 mg/g DW), delphinidin (0.13 ± 0.03 mg/g DW), cyanidin (0.08 ± 0.04 mg/g DW) in LCTW P. tetragonolobus (Fig. 1D).

Figure 1: Vanillin Assay and HPLC analysis of HCTW and LCTW plant.
figure 1

(A) low condensed tannin winged bean (LCTW) and high condensed tannin winged bean (HCTW) seeds (B) Localization of condensed tannin in leaf and pod contains seed of HCTW plant (C) Estimation of condensed tannin through vanillin assay (D) Estimation of major metabolites of condensed tannin through HPLC of both HCTW and LCTW plant. (E) Flavonoid pathway showing the major compound and gene responsible for condensed tannin biosynthesis.

Transcriptome sequencing, de novo assembly and functional annotation of contigs

The complementary DNA (cDNA) libraries prepared from the diverse CT leaf tissues of winged bean were sequenced using Illumina Nextseq500 platform. Paired-end sequencing-by-synthesis (SBS) yielded raw data 13901780 (13.9 million) for VS1 and 30155788 (30.15 million) for VS2 in HCTW line and 9090662 (9.09 million) for VS3 and 10394022 (10.3 million) reads for VS4 in LCTW line of P. tetragonolobus. VS1 and VS2 are the replicated leaf transcriptome data for high condensed tannin-containing (HCTW) line of winged bean and VS3 and VS4 represented the duplicate low condensed tannin (LCTW) lines of winged bean. After trimming, number of high quality reads in HCTW line were (VS1) 12604265 (84.01%) and (VS2) 27769919 (92.09%) and LCTW line were (VS3) 8098380 (89.08%) and (VS4) 9518980 (91.58%). The replicated data of HCTW (VS1 and VS2) and LCTW (VS3 and VS4) were clustered and de novo assembled together using Velvet assembler with a hash length of 45. This generated 102586 and 88433 contigs for HCTW and LCTW line of winged bean, respectively. Contig generation was carried out using Oases-0.2.08 with same hash length that resulted in 87925 and 69464 contigs for HCTW and LCTW lines of P. tetragonolobus, respectively. In both cases, average contig lengths were of 713.1 bp and 736.0 bp with N50 values of 1466 and 1567 for HCTW and LCTW lines, respectively (Table 1). Distribution of assembled contig length ranged from 151 to >87925 bases. Maximum number of contigs was of (150–300) bp size with 36646 contigs followed by 13732 contigs of (301–500) bp size in HCTW and LCTW lines of P. tetragonolobus. About (150–300) bp size contigs were highest in number (31631 contigs) followed by 11310 contigs of (301–500) bp size. In both cases, some of the contigs, 1120 in HCTW and 1096 in LCTW were more than 5000 bp size (Fig. 2A). The contigs from both P. tetragonolobus samples were clustered using CD-HIT-v4.5.4 at 95% identity and query coverage resulting in a total of 44972 contigs. A total of 44972 contigs were generated after clustering and they were further annotated with Arabidopsis thaliana, Glycine max and Lycopersicum esculentum (Supplementary Table 2). A total of 25755 contigs were annotated with Arabidopsis thaliana, Glycine max and Lycopersicum esculentum and 115, 1426 and 364 contigs were uniquely annotated (Fig. 2B). A total of 28833 contigs were annotated and 16139 contigs were unannotated.

Table 1 Summary of annotation in HCTW (high condensed tannin winged bean) and LCTW (low condensed tannin winged bean) of P. tetragonolobus.
Figure 2: Secondary assembled Transcript length distribution.
figure 2

(A) Count of Contigs vs Lengh (bp) of Contig (B) venn diagram representing number of transcript data sets unique and shared data set from G. max, S. lycopersicum and A. thaliana after assembling all data of both LCTW and HCTW.

Functional classification and differential gene expression of P. tetragonolobus

In order to assign functions, contigs from HCTW and LCTW were compared against the NR protein database of Arabidopsis thaliana, some members of solanaceae and fabaceae family available at uniport database using blastx algorithm. To investigate the expression level of contigs in HCTW and LCTW, the numbers of clean reads were compared between libraries of each of 44972 assembled contigs through FPKM value. A total of 1235 contigs were found to be differentially expressed between the HCTW and LCTW, among them 183 contigs were up regulated and 1052 contigs were down-regulated (Supplementary Table 3).

Based on the sequence homology, 10492 sequences from HCTW and 674 sequences from LCTW were characterized into 45 functional groups under three main categories i.e. biological processes (BP), cellular component (CC) and molecular function (MF) (Fig. 3). The highest gene expression in other biological and metabolic processes in all the three GO categories followed by cellular component related genes especially ‘cell organization and biogenesis’ related gene (28.1% in HCTW and 20% in LCTW), ‘kinase activity’ is quite higher in HCTW compared to LCTW, ‘protein binding’ (19.1% in HCTW and 8.7% in LCTW), transport related pathway genes (25% in LCTW and 21% in HCTW) and response to abiotic and biotic stimulus (32.6% in LCTW and 31% HCTW) (Supplementary Table 4).

Figure 3
figure 3

Gene ontology classification of unigenes of P. tetragonolobus leaf in HCTW and LCTW line.

KEGG result

To identify the biological pathways and their function in the leaf tissues of LCTW and HCTW, 44968 contigs were assembled from the contrasting lines of P. tetragonolobus and were mapped to the reference canonical pathways in KEGG. A total of 5210 contigs from both HCTW and LCTW lines were assigned to 229 KEGG pathways (Supplementary Table 5). 1861 contigs (35.7%) were involved in metabolic pathways, similarly, biosynthesis of secondary metabolites 995 contigs (19%), 271 contigs (5%) of plant hormone and signal transduction and 360 contigs (6.9%) of ribosome-related proteins and 154 contigs for phenylpropanoid pathway (Supplementary Table 6). As per the KEGG analysis mentioned in Supplementary Fig. 1 and Supplementary Fig. 2, the up regulated enzymes in the HCTW lines were chalcone synthase (EC2.3.1.74) and chalcone isomerase (EC5.5.1.6), peroxidases(EC1.11.1.7), cinnamyl-alcohol dehydrogeanse (EC1.1.1.195), coniferyl aldehyde dehydrogenase (EC1.2.1.68), Beta-glucosidase (EC3.2.1.21), phenylalanine ammonia lyase (EC4.3.1.24). All these enzymes and their corresponding genes were up-regulated in HCTW lines of P. tetragonolobus. Contrarily, S-adenosyl L-methionine: caffeic acid O-methyltransferase (EC2.1.1.68) is down-regulated in LCTW line of P. tetragonolobus.

PageMan analysis21 also showed the different expression profile of anthocyanidin-biosynthesis related pathway genes in HCTW and LCTW lines of P. tetragonolobus (Fig. 4). The Pageman analysis categorized the genes as: up- or down-regulated in HCTW and LCTW lines. LCTW genotype of P. tetragonolobus exhibited repressed or regulated flavonoid biosynthesis. Anthocyanidin biosynthesis genes were up regulated in HCTW line whereas, in LCTW line certain genes were down regulated e.g. the dihydroflavonol-4-reductase (DFR) gene was down regulated in LCTW line of winged bean. It had also been observed that, some genes related to unknown secondary metabolite biosynthesis like: sulphur containing glucosinolate genes were down regulated in LCTW line as compared to HCTW line. Similarly, certain isoprenoids non-mavalonate pathway related genes were up regulated in HCTW as compared to LCTW line. From these observations, it can be inferred that, there is a cross-talk of different pathways in the biosynthesis of condensed tannin. But, the present study relates its study to phenylpropanoid pathway and the expression of the genes corresponding to this pathway.

Figure 4
figure 4

Phenyl propanoid pathway gene up and down regulated through PageMan.

Detection of transcription factors in winged bean

Among all the de novo assembled contigs fifteen different types of transcription factor (TF) families were reported. Among these reported TFs, 33 contigs encode for HCTW and 5 contigs encode for LCTW line P. tetragonolobus. The maximum number of contigs were reported for C2H2 family and WRKY family TFs. The transcription factors were more frequently present in HCTW as compared to LCTW line (Fig. 5). However, bHLH group of transcription factors were only present in HCTW line of P. tetragonolobus.

Figure 5
figure 5

Number of Transcription factor (TF) families showed in HCTW and LCTW line of P. tetragonolobus.

Identification of simple sequence repeats (SSRs)

P. tetragonolobus contigs of HCTW and LCTW were analyzed and 2237 and 1618 SSRs were reported between LCTW and HCTW lines respectively. Trinucleotide repeats were maximum at 881 and 663 SSRs markers in HCTW and LCTW respectively. Whereas, pentanucleotide repeats were minimum i.e. 16 and 11 in HCTW and LCTW lines respectively (Table 2).

Table 2 SSRs detected of LCTW and HCTW through MISA.

Genes reported for condensed tannin biosynthesis in P. tetragonolobus

Precursor molecules for condensed tannin biosynthesis were reported to be derived from the phenylpropanoid pathway. Uniport annotations against members of solanaceae family were used to identify genes encoding enzymes involved in different steps of phenylpropanoid and flavonoid backbone biosynthesis in P. tetragonolobus. KEGG analysis showed that 10 CT-biosynthesis genes especially, anthocyanidin synthase (ANS), 4-coumarate-CoA ligase (4-CCL), chalcone synthase (CHS), chalcone—flavonone isomerase (CHFI), chalcone isomerase (CHI), cinnamyl alcohol dehydrogenase (CAD), dihydroflavonol 4-reductase (DFR), cinnamoyl CoA reductase (CCR), phenylalanine ammonia-lyase (PAL) and anthocyanidin 3-O-glucosyltransferase (A3GT) had lower expression in case of LCTW in leaf tissues as compared to HCTW gene expression of P. tetragonolobus. CHS and ANS genes were more expressed in HCTW plant. A3GT gene which is involved in glycosylation of anthocyanidin compounds are responsible for biosynthesis of cyanidin compounds in plant22, this was also reported to have higher expression in HCTW plant. The gene analyses showed that, HCTW line have higher level of polymerization properties of condensed tannin. Among the, 26 contigs of CCL in which 14 contigs were highly expressed and 2 contigs were in lower expression in HCTW and 10 contigs were highly expressed and 8 contigs in lower expression in LCTW line, 3 contigs of ANS and 5 contigs of A3GT genes were highly-expressed in HCTW line as compared to LCTW and showed that the genes responsible for condensed tannin biosynthesis were highly expressed. A3GT and ANS were the major genes for cyanide compounds and its glycosylation23 (Fig. 6). 2, 6 and 1 contigs were present in CHS, CHFI and CHI gene respectively which is the initial enzyme for synthesis of condensed tannin and its higher expression were reported in HCTW line. DFR gene had 3 contigs and highly expressed in HCTW as like ANS contigs as DFR is the major responsible gene for ANS biosynthesis (Fig. 7). Similarly in case of PAL and CAD gene, there were 7 and 9 contigs respectively. The expression of all contigs in PAL gene were highly expressed in HCTW whereas, CAD contigs showed mixed type of expression but 6 contigs were highly expressed in HCTW line.

Figure 6
figure 6

Contigs involved in flavonoid biosynthetic pathway in leaf tissue of P. tetragonolobus.

Figure 7
figure 7

Heat map showing expression of different contigs evolved in condensed tannin biosynthesis(A) PAL, (B) 4CAL, (C) CHS, (D) CHI, (E) CHFI, (F) DFR, (G) ANS and (H) A3GT:The values in red and green indicate log2 fold increase and decrease respectively.

Real time validation of the genes involved in the condensed tannin biosynthesis in P. tetragonolobus

For conformation and validation of differential expression of contrasting lines of P. tetragonolobus from FPKM analysis 12 contigs from phenylpropanoid pathway were chosen for qRT-PCR. These contigs were: contig_86640 (A-3GOT), contig_39313 (ANS), contig_37704 (CHI), contig_4538 (CHS), contig_50751 (CFI), contig_29334 (CCR), contig_11480 (CAD), contig_29894 (4CCL), contig_23916 (DFR), contig_34164 (PAL) which was more FPKM value in HCTW in comparative expression with LCTW contigs. In the HCTW line, mostly the genes of flavonoid biosynthesis were down regulated. The qRT-PCR data revealed the expression profile of these contigs by FPKM value (Fig. 8).

Figure 8
figure 8

(A) validation of FPKM analysis of the 10 contigs involved condensed tannin biosynthesis in LCTW compared to HCTW. (B) Showing that qPCR validation of both HCTW and LCTW with 10 contigs.

Discussion

In this study, we explored the genes and TFs responsible for condensed tannin biosynthesis in the underutilized P. tetragonolobus. Each contrasting-line expressed and accumulated varying concentration of condensed tannin. Tannins are defined as a group of phenolic metabolites of varying molecular weight (500–30000 Da). Proanthocyanidins are major group of tannin which are deposited in seed and leaves of plants. Presence of high content of tannins in the presently studied wild legume seeds might be due to the metabolism of monomeric polyphenolic compounds (catechin) or polymerization of existing phenolic compounds during development and maturation24.

According to Serrano25 the estimated amount of condensed tannin for daily intake for a person is 53.6 mg/person/day. There are a lot of epidemiological data suggesting, tannin intakes which may prevent the onset of many chronic diseases. But, in case of winged bean HCTW plant seeds contained 3.5 mg/g by dry weight. If people consume it as a source of protein it will cause indigestion. Optimum intake of condensed tannins or proanthocyanidins (PAs) are considered as functional ingredients in nutritional supplements and therefore, they are presently attracting more attention.

Genome sequences of legume species of L. japonicus26, soybean27 and pigeonpea28 are now available and the corresponding data may be exploited for interpretation of transcriptome resources from unknown or less revealed data such as winged bean, to support gene annotation and comparative genome analysis. RNA-Seq technique has a lots of application for analyzing a transcript, including exploration of different biological processes of different phenotype of plant, RNA-seq technology has previously been used to characterize the transcriptomes of a number of plant species, including pea and soybean29,30. Our experiment was designed in a manner to find the genes responsible for condensed tannin biosynthesis in leaf through Illumina sequencing. Illumina sequencing (de novo sequencing) platform has been mostly used for transcriptome analysis of plants which has no previous reference genome31,32,33. Result of winged bean data and the lengths of contigs generated using Illumina platform is similar to Vigna radiata and Pisum sativum transcriptomes. They have been reported with varied lengths of 874 bp and 809 bp respectively34,35 and were recently published. The winged bean genotypes CPP34 and CPP37 also showed the average contig size is 837 and 823 bp respectively20.

TRINITY assembler generated high number of primary assembled contigs from recently duplicated genes which lead to the generation of similar contigs36. The small sized sequences (~100–200 bp) may be tough to BLASTX analysis, and so the short sequences remained as unpredicted whose function was unknown37. Sequence similarity of contigs to the genomes and transcriptomes of other legume and tomato plant were determined using BLASTn analysis, exploring the levels of conservation up to 63%. Comparable results were observed for the winged bean transcriptome in similarity searches against other legumes38. Moreover, up to 63% of the annotated contigs were similar to legume genomes. BLAST analysis showed a highest level of similarity to sequences of Glycine max, followed by Ricinus communis, Vitis vinifera and Populous trichocarpa. The HCTW and LCTW lines also showed similarity with Sorghum bicolor (Fig. 9). Winged bean, Vigna radiata and soybean all belongs to the tropical warm-season legume, and may be mutually more closely related, within the Papilionoideae39.

Figure 9
figure 9

Organism sequence similarity in case of both transcript data (A) HCTW and (B) LCTW lines of P.tetragonolobus.

Gene ontology (GO) is a bioinformatics-based gene functional classification system offering an updated and a strictly defined concept to comprehensively describe the properties of genes and gene products of an organism40. Our transcriptome data showed similar result with winged bean recent transcriptome data i.e. molecular function (MF), biological process (BP) and cellular component (CC) categories which was 46%, 37% and 16.8% respectively20. Kinase activity and DNA binding had more number of contigs in winged bean transcriptome showing the similar result of HCTW line of winged bean.

In case of KEGG analysis, well-represented pathways in the winged bean transcriptome included those involved in metabolic pathway, biosynthesis of secondary metabolites, flavonoid biosynthesis, carbohydrate metabolism and energy metabolism. All the major genes involved in the flavonoid biosynthesis pathway were identified and majority of them were present in HCTW lines of winged bean as compared to LCTW plant. Vigna radiata and Pisum sativum transcriptome data showed similar KEGG profile and had similar result with our data34,35. Transcription factor families were varied in both type of plant-lines. Transcription factor bHLH was predominantly present in HCTW line. Other studies showed that this TF family is most regulating factor for flavonoid biosynthesis41,42. Recently published winged bean transcriptome data of African variety showed ten transcription factors family in which WRKY family was near to similar in HCTW plant20. Simple sequence repeats (SSRs) marker was frequently used for diverse genetic analysis by plant breeders due to co-dominant and highly polymorphic nature43. Our result showed 2237 SSRs in HCTW and 1618 SSRs in LCTW, which was near to transcript data showed by other legumes19. Trinucleotide repeats were more predominantly present in our data like other legumes medicago trancatula and Arachis hypogaea44,45.

Accumulation of proanthocyanidin polymers within plant vacuoles is synthesized in the chloroplast and then transported to vacuoles of cell46. Tian (2008) reported the chalcone synthase gene expression and localization in chloroplast47 and Wang (2010) also localized the anthocyanidin synthase in plastid and vacuoles48. The gold immunolabelling experiment confirmed the presence of three enzymes of phenylpropanoid and flavonoid pathways like: Cinnamate -4-hydroxylase (C4H)49, chalcone synthase (CHS)47 and anthocyanidin reductase48 in the chloroplast of developing grape berries. Saslowsky and Winkel-Shirley (2001) suggested that, flavonoids are synthesized from phenylpropanoids by a multienzyme complex which is closely bound to the cytosolic face of the endoplasmic reticulum50. Few ultrastructural data also supports the production of phenolic compounds by the chloroplast46. The flavonoid monomers generally accumulated in the vacuolar storage pool51 through different possible routes52. However, the mechanism of polymerization of condensed tannins is yet to be elucidated53 while the nature of precursor molecules remains hypothetical54.

Some reports coined that chalcone synthase and anthocyanidin synthase are the possible precursor of condensed tannin52,53. So, we selected leaf tissue of HCTW and LCTW lines of winged bean. The seed of the contrasting lines showed variable metabolite concentration. As we have validated certain genes from leaf-tissues of P. tetragonolobus so, it is proposed that, the condensed tannin biosynthesis takes place in leaf tissues and subsequently might be transported to seed (Fig. 10)46. As transcriptome PageMan and pathway analysis showed that the anthocyanidin synthase and chalcone synthase genes were more expressed in HCTW line. Similarly, these genes were less expressed in LCTW lines of P. tetragonolobus.

Figure 10: Expression of representative ANS gene showing different level of condensed tannin in different tissue.
figure 10

Schematic of possible mechanisms of increased level of ANS gene expression showed the condensed tannin deposition in HCTW line.

The genes and transcription factors as well as markers identified in HCTW and LCTW lines of winged bean transcriptome study may further be utilized in generating silencing line and may be utilized for MAS breeding programme for reducing the anti-nutrient condensed tannin and improvement of this underutilized legume for enhanced consumption and utilization.

Methods and Materials

Plant material

Plants of P. tetragonolobus were procured from ICAR-National Bureau of Plant Genetic Resources, Akola, Maharashtra, India and was maintained at the botanic garden of CSIR-National Botanical Research Institute, India for further screening and experiment.

Quantitative and qualitative estimation of condensed tannin in the plant

Vanillin assay

The total condensed tannin was determined by colorimetric method as well as modified vanillin-hydrochloric acid assay developed by Hagerman55. The deduced value was confirmed thrice. The highest and lowest condensed tannin-containing lines of P. tetragonolobus were selected for further study and anlaysis.

HPLC analysis of leaf extracts

The leaves of high and low-CT containing-lines (HCTW i.e. High condensed tannin winged bean and LCTW i.e low condensed tannin winged bean) were quantified by standard HPLC method56. The standards cyanidin, pelargonidin and delphinidin were used for further compositional quantification.

Histochemical staining with dimethylamminocinnamaldehyde (DMACA)

The histochemical staining of condensed tannin (CT) in leaves was carried out by staining tissues in a solution of ethanol: 6 M HCl (1:1) containing 0.1% (w/v) dimethylamminocinnamaldehyde (DMACA) (Sigma-Aldrich) for 3 to 6 min57.

RNA isolation, library preparation and sequencing

Leaf tissues of diverse CT- containing lines (VS1, VS2 biological replicate of HCTW and VS3, VS4 biological replicates of LCTW) of P. tetragonolobus were collected from two months old plants grown in the garden of CSIR-National Botanical Research Institute. RNA was isolated from leaf tissues through RNA isolation kit (Sigma). The isolated RNA was treated with DNase enzyme for removal of contaminant DNA. The quality and quantity of total RNA was calculated through Bioanalyzer (Agilent). Good quality RNA (RIN value > 7) was further subjected to analysis and preparation of cDNA. The cDNA were amplified according to Illumina True-Seq protocol and sequenced on Illumina Nextseq500 platform. The samples were run in duplicates. The transcriptome library for sequencing was carried out according to Illumina Trueseq RNA sample preparation guide (part # 1004898). The enriched poly A RNA (1000 ng) was prepared using RNA purification beads which were fragmented for 4 min at elevated temperature (94 °C) in presence of divalent cations and reverse transcribed with superscript III reverse transcriptase by priming with random hexamers. Second strand cDNA was synthesized in presence of DNA polymerase I and RNaseH. The cDNA was cleaned up using Agencourt Ampure XP SPRI beads (Beckman Coulter, USA) followed by ligation of “Illumina adapters” to the cDNA molecules after end repair and addition of “A”-base. Following SPRI cleanup after ligation, the library was amplified using 11 cycles of PCR for enrichment of adapter ligated fragments. This method of library-preparation had been followed as per the manufacturer’s protocol. The prepared library was further quantified using nanodrop and validated for quality by running an aliquot on high sensitivity bioanalyzer chip (Agilent). There were total four libraries prepared for further processing and analysis. Two libraries for high condensed tannin-containing winged bean HCTW (VS1, VS2) and the other two libraries for low condensed tannin-containing winged bean LCTW (VS3, VS4).

De novo assembly and annotation

Raw reads which were generated after sequencing were in fastq file format. The fastq files were trimmed to remove adapter sequences using fastq-mcf tool. The fastq file were filtered to remove reads having average quality score less than 20 in any of the paired end data. The trimmed reads were then de novo assembled using TRINITY58 with default options. The raw reads data is available at the NCBI public domain with accession number SRA293442 for HCTW variety (VS1 and VS2) and SRA293454 for LCTW variety (VS3-VS4). The assembled contigs were annotated with the NCBI-NR and TAIR10 protein database using BLASTX with e-value < 10−5. The data was also annotated with Lycopersicum esculentum and Glycine max genomic database with same parameters. Non-redundant sequence dataset was generated using CD-HIT (Cluster Database at High Identity with Tolerance) by removing homologous sequences with identity threshold of (0.95 or 95%).

Sequence annotation and functional characterization

Differential gene expression analysis was performed using DEGseq program and FDR value was retrieved from q-value program in R-Bioc. The contigs with p-value < 0.05, FDR < 0.05 and Fold change <−1 or >1 were selected. The GO annotation was performed using the TAIR10 IDs for the annotated contigs using agriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php). The GO annotation was retrieved using singular enrichment analysis (SEA) statistical test at p-value < 0.05.

Biological pathway annotation

The PageMan analysis was performed using the differential genes which were annotated with the TAIR10 protein database. The analysis was performed with Benjamini Hochberg multiple testing correction and average statistics method. All the TAIR10 IDs of single assemblies were used to extract the KO IDs from the KAAS database (http://www.genome.jp/tools/kaas/). These KO IDs were submitted in KEGG (http://www.genome.jp/kegg/) to extract all the metabolic pathways59.

Detection of SSR molecular markers

Total Contig of HCTW and LCTW were analyzed with using Micro-satellite identification tool (MISA) for identification of simple sequence repeats (SSRs) having mononucleotides to hexanucleotides repeats.

Validation of gene expression by RT-PCR

Total RNA from both plants was extracted and treated with RNase-free DNase I (Ambion). First-strand complementary DNA was synthesized using 5 μg of total RNA with oligo (dT) primer (Fermentas). The qRT-PCR was done in the Step OnePlus Real-Time PCR System (Applied Biosystems 7500). Primers used for qRT-PCR are listed in Supplementary Table 7. Contig abundances were calculated using Cufflinks60 ver.0.93 and the output normalized expression values in FPKM (Fragments per kilo-base of exon per million fragments mapped) analogous to single-read FPKM61 were used for further comparative analysis. To confirm the expression level using FPKM, 12 genes were chosen as cases for PCR validation. Each reaction contained cDNA (1 μl) and Fast SYBR Green master Mix (Applied Biosystem, USA) 10 μl in a final volume of 20 μl. The reactions were performed using the following cycle conditions, an initial 94 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s, and the final 5 min extension at 72 °C. The mRNA levels were normalized with 18 s rRNA gene62. All experiments were used in triplicates for each sample and relative gene expression levels were calculated using 2−ΔΔCT method.

Additional Information

Accession codes: Transcriptome data of both diverse genotype of winged bean supporting the conclusion of the article are available in NCBI SRA repository as an accession no. SRA293442 for HCTW and SRA293454 for LCTW plant. Dataset of these transcriptome are available as excel file.

How to cite this article: Singh, V. et al. De novo sequencing and comparative analysis of leaf transcriptomes of diverse condensed tannin-containing lines of underutilized Psophocarpus tetragonolobus (L.) DC. Sci. Rep. 7, 44733; doi: 10.1038/srep44733 (2017).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.