Introduction

Mungbean (Vigna radiata L.) is a grain legume crop widely cultivated in Asia as an important source of nutrients such as proteins, amino acids, folate, and iron1. Because mungbean plants are fast-growing and can fix atmospheric nitrogen through symbiosis with Rhizobium, cultivating mungbean in rotation with other grain crops increases the yield of the subsequent crop and decreases the incidence of pests2. Despite the agronomic and nutritional importance of mungbean, desirable agronomic traits such as determinate growth habit, simultaneous flowering, and synchronous pod maturity (SPM) have not been thoroughly achieved in modern breeding programs.

Mungbean exhibits nonsynchronous pod maturity3. If mungbean is harvested only once during the entire growing season (at the R6 growth stage, when most pods reach maturity), approximately 50% of the yield potential is lost because mungbean plants continue flowering and producing pods4. Delayed harvest also causes substantial yield losses because mungbean plants are more susceptible to pests and pathogens at later growth stages, and mature and dried pods are more likely to shatter. Therefore, a single mungbean crop must be harvested multiple times to reduce yield loss. Moreover, each harvest must be conducted carefully to avoid any damage to plants, which makes mechanical harvesting inefficient. Although the genome of mungbean has been sequenced5, the genetic or epigenetic basis of SPM remains unknown because SPM is a complex trait affected by multiple factors and few quantitative trait locus (QTL) mapping or gene cloning studies have been conducted in mungbean.

Phenotypic variation is determined by the plant genotype, environment, and genotype × environment interaction. DNA methylation, a heritable epigenetic modification, also contributes to phenotypic variation6. Epialleles are classified into three major groups, based on their dependence on the genotype: obligate epialleles, which are completely dependent on genetic variation; pure epialleles, which are independent of any genetic variation; and facilitated epialleles, which originate from genetic variants but do not necessarily depend on genetic variation for maintenance7. However, the effect of epigenetic variation on the phenotype remains unclear because only a few epigenetic marks have been characterized to date8. In soybean (Glycine max L.), the model species closest to mungbean, carbohydrate metabolism pathway genes are enriched in epigenetic variation9. In tomato (Solanum lycopersicum L.) fruits, vitamin E content is determined by epialleles10. Although genome-wide DNA methylation patterns have been investigated in two mungbean genotypes, VC1973A and V2984, epigenetic factors associated with any agronomic trait have not been studied in mungbean5,11.

In this study, we aimed to characterize the epigenetic variation associated with SPM in mungbean. Whole genome resequencing and bisulfite-sequencing of four high SPM recombinant inbred lines (RILs), four low SPM RILs, and two parental genotypes (VC1973A and V2984) were conducted to determine genome-wide DNA methylation patterns. The mungbean cultivar VC1973A is widely grown in Asia and exhibits high SPM, whereas V2984 is a Korean landrace with low SPM3. Comparison of these DNA methylation profiles between high vs. low SPM lines revealed pure differentially methylated regions (DMRs), independent of genetic variation. Genes in these DMRs showed differential expression levels between the parental genotypes VC1973A and V2984. These results advance our understanding of SPM in mungbean and of the contribution of epialleles to phenotypic variation.

Results

Variation in the SPM index of the RIL population

A total of 187 RILs were harvested weekly for 8 weeks (Figure S1). The SPM index (range: 0 to 1) was calculated for each RIL. A value of SPM close to 1 theoretically indicates that all pods mature at the same time. The number of pods in VC1973A (maternal parent) was the highest at week 2 (SPM index: 0.862), while the number of pods in V2984 (paternal parent) peaked again at week 7 (SPM index: 0.487) (Fig. 1, Table S1)3. Among the 187 RILs, we selected the four RILs (SK10, SK68, SK186, and SK49) with the highest SPM index (0.792, 0.783, 0.772, and 0.764, respectively) and the four RILs (SK94, SK152, SK176, and SK156) with the lowest SPM index (0.394, 0.374, 0.364, and 0.319, respectively) for bisulfite-sequencing and resequencing analyses with their parental lines. Groups of RILs with high and low SPM indices showed similar patterns of weekly harvests as VC1973A and V2984, respectively, indicating that the group with a high SPM index showed higher synchronicity in pod maturity (Fig. 1).

Figure 1
figure 1

Weekly harvest for SPM index calculation. Ten plants per line were harvested weekly. X and Y axes indicate weeks and pod numbers weekly harvested, respectively. A value of SPM close to 1 theoretically represents that all pods matured at the same time. (A) VC1973A and four lines that show the highest SPM. (B) V2984 with four lines that show the lowest SPM3.

Identification of DMRs associated with SPM via bisulfite-sequencing

To identify candidate epialleles involved in the regulation of SPM, the selected ten lines, including eight RILs and two parental genotypes, were subjected to Methyl-Seq after bisulfite conversion (Table S2). The bisulfite conversion efficiencies ranged from 98.5 to 99.0%, which were sufficiently high for subsequent biological applications such as Methyl-Seq analysis, as reported previously12. A total of 488, 336, and 406 DMRs in contexts of CHG, CHH, and CpG methylation were identified (Figure S2), of which 134 (36%), 160 (19%), and 65 (33%) were located at genic regions, respectively, including upstream sequence (− 2 kb from the transcription start site), 5′ untranslated region (5′-UTR), coding sequence (CDS), intron, 3′-UTR, and downstream sequence (+ 2 kb from the transcription termination site) (Table 1). After the removal of genes redundant in DMRs of all methylation contexts, 211 genes were identified as proximal to DMRs. Among the genes in genic DMRs associated with SPM, gene ontology (GO) analysis revealed the enrichment of reproductive structure development and transcription factor activity-related GO terms such as ‘nucleotide binding’ and ‘catalytic activity’ (Figure S3). Additionally, ‘metabolic pathways’, ‘biosynthesis of secondary metabolites’, and ‘plant hormone signal transduction’ were the top three enriched pathways in the KEGG database (Table S3)13.

Table 1 The numbers of DMRs.

Methylation levels at genic DMRs

The distribution of methylation at genic regions, including upstream and downstream sequences of genes in mungbean, was consistent with that in Arabidopsis and soybean (Fig. 2)14,15. A slight increase in methylation was detected in the middle of the gene body, and a steep increase was observed in upstream and downstream sequences in mungbean. On the basis of differences in methylation levels between high and low SPM groups, we classified genic DMRs into two sets: DMRs with significantly higher methylation in the high SPM group than in the low SPM group (hhDMRs), and DMRs with significantly higher methylation in the low SPM group than in the high SPM group (hlDMRs). In both DMR groups, differences in average methylation levels for CpG and CHG between high and low SPM lines were higher in gene bodies than in upstream and downstream regions. Cytosine residues methylated in the CpG context showed the largest difference in methylation levels in genic DMRs, whereas those methylated in the CHH context showed the lowest difference in methylation levels (Table S4). This result was consistent with the previous observation that methylation level of CHH had the least effect on gene expression levels in mungbean11.

Figure 2
figure 2

Average methylation levels of cytosines on genic DMRs (A) where methylation level is significantly higher in high SPM group than low SPM group (hhDMRs) and (B) where methylation level is significantly higher in low SPM group than high SPM group (hlDMRs). Solid and dotted lines indicate methylation levels of DMRs in high and low SPM group, respectively. Upstream region, gene body and downstream region consist of 20 bins and average methylation levels are calculated over three sliding windows.

Genetic diversity and pure genic DMRs

To identify pure DMRs, where no SNPs are significantly associated with methylation levels, approximately 5 Gb of Illumina raw data were generated for each RIL (Table S5). Raw sequence reads of eight RILs and V2984 (paternal genotype) were mapped against the mungbean reference genome (VC1973A; maternal genotype)5. The number of SNPs and Indels in each line ranged from 269 to 798 k and 26 to 134 k, respectively. Out of 282,517 SNPs identified among the ten genotypes, 40,178 SNPs co-segregated with high and low SPM groups. A total of 1081, 1343, and 397 SNPs were located within ± 1 kb of the DMRs in CpG, CHG, and CHH contexts, respectively; among these, 1059, 1248, and 247 SNPs, respectively, were significantly associated with methylation levels of the DMRs (p < 0.01) (Table 2, Table S6). Thus, three CpG, 18 CHG, and 28 CHH pure DMRs were revealed, of which one CpG, ten CHG, and ten CHH were pure genic DMRs located within ± 2 kb sequence of 20 genes. These genes included Vradi11g00660 (log2fold-change [log2FC] = 4.51; encodes an expressed protein), Vradi09g08580 (log2FC = 2.28; encodes a receptor-like protein), Vradi01g11940 (log2FC = − 2.14; encodes a peroxisomal membrane protein), Vradi03g00420 (log2FC = 1.86; encodes a gibberellin receptor GID1L2), Vradi08g18940 (log2FC = 0.94; encodes an F-box domain containing protein), and Vradi01g09470 (log2FC = 0.92; encodes a transmembrane protein) (Table 3, Fig. 3)11. The difference in methylation status possibly affected the difference in expression levels of these genes between VC1973A and V2984, independent of nucleotide sequence variation.

Table 2 The number of SNPs within DMRs.
Table 3 List of genes in pure genic DMRs.
Figure 3
figure 3

Comparison of average methylation levels of four specific genes between high and low SPM RIL groups. The average number of methylated and unmethylated cytosines of the genes proximal to pure genic DMRs are indicated as red and green, respectively. Left and right columns represent high and low SPM groups, respectively. Yellow, black and blue bars represent exon, intron and UTR. Negative log2FC indicates higher expression in V2984.

Discussion

There is an increased interest in utilizing epialleles as a potential breeding resource for harnessing previously unassessed heritable variation affecting plant phenotypes15. Because differential DNA methylation, together with genetic variation, can affect gene expression and thus phenotypic variation, DNA methylation should be considered as a critical factor in molecular breeding6,14.

To measure SPM in mungbean, we had tried three different approaches. The first approach was the length of productive days indicating each plant produces at least three pods when harvested each week. Synchronous plants should have shorter productive day compared to non-synchronous plants. The second approach was the ratio between the highest pod number among weekly harvests and total pod number. Synchronous plant should have a value close to one. These two approaches did not represent SPM well because of the presence of the second peak of weekly harvest (Fig. 1). The third approach was SPM index used in this study, which showed synchronicity of pod maturity the best among three approaches.

To elucidate the epigenetic effect on SPM in mungbean, DMRs were identified between high and low SPM groups, and genic DMRs were subsequently selected for bisulfite-sequencing and resequencing analyses (Table 1). Transcription factor activity-related GO terms were enriched among the 211 genes proximal to SPM-associated genic DMRs, and 17 of these genes encoded transcription factors (Figure S3, Table S7). Because ‘plant hormone signal transduction’ was one of the enriched KEGG pathways in our data (Table S3), three genes, including Vradi07g03990, Vradi08g06860, and Vradi0284s00060 encoding bZIP, AP2, and ARF transcription factors, respectively, were mapped to auxin, abscisic acid, and ethylene-related signal transduction pathways, respectively, suggesting that these plant hormones participate in the regulation of SPM in mungbean (Figure S4)13.

To investigate whether gene expression was affected by nucleotide sequence variation or not, we identified 544 SNPs within 2 kb upstream and downstream of 20 genes proximal to 21 pure DMRs (Table 3). These DMRs were located in upstream, CDS, intron, and downstream regions of the 20 genes. The number of SNPs in each gene varied from 1 to 100 (Figure S5). Most SNPs (> 80%) were located in non-coding regions of genes (Table S6). Only 24 SNPs were missense variants, and one SNP resulted in a premature stop codon, indicating that most nucleotide variants around the genes do not have a significant impact on the expression level of the 20 genes.

Flowering pathways have been well studied in model plant species: photoperiod, autonomous, vernalization and gibberellin pathways16. Gibberellins are plant hormones that regulate various developmental processes, including flower induction and development, seed development, and fruit senescence17. Plants flower in response to gibberellins, and DELLA proteins negatively regulate this gibberellin signaling (Fig. 4)18. Degradation of DELLAs is mediated by gibberellins. Out of the 20 genes, we were able to retrieve four genes, where pure epialleles possibly affected gene expression levels (Fig. 3). These four genes, including Vradi01g09470, Vradi01g11940, Vradi03g00420, and Vradi09g08580, showed the highest log2FC values (0.92, − 2.14, 1.86, and 2.28, respectively) between VC1973A and V2984, and their Arabidopsis orthologs have been well characterized. Vradi03g00420 encodes a gibberellin receptor, which is a positive regulator of the gibberellin-mediated signaling pathway in Arabidopsis (Table 3)19. At4G08850, an Arabidopsis ortholog of Vradi09g08580, which encodes a receptor-like protein, is regulated by DELLA proteins in flowering buds18. Vradi01g09470 and Vradi01g11940 are annotated as encoding a transmembrane protein and peroxisomal membrane protein, respectively, which may participate in communication among cells20,21. These results suggest that the expression levels of these genes are influenced by epigenetic factors and gibberellin-mediated signaling pathways involve in synchronicity of pod maturity in mungbean. About 1 kbp genomic regions around the DMRs for these four genes were confirmed by Sanger sequencing (Table S8).

Figure 4
figure 4

Gibberellin mediated flowering pathway. Plants flower in response to gibberellic acid and DELLA proteins negatively regulate this pathway. When DELLA proteins are degraded, the pathway is turned on. GA indicates gibberellic acids.

In this study, we identified pure genic DMRs between high and low SPM groups in a mungbean RIL population. Although the data was collected without environmental repeat, phenotypic variation was normally distributed (p-value < 0.01), widely ranged from 0.319 to 0.862 among 187 lines. Based on pure epialleles independent of genetic variation, epigenetic factors underlying SPM in mungbean were identified, which would not be detected by nucleotide-based analysis. The data consistently indicated that various transcription factors and receptor proteins regulate SPM in mungbean via plant hormone (especially gibberellin)-mediated signaling pathways. However, our study has a limitation for estimating environmental variance of SPM, as phenotypic data was collected in a single environment without replication. Functional validation for the candidate genes responsible for SPM and phenotypic data with environmental repeats are still needed to fully elucidate SPM in mungbean. Overall, our results will help to understand the potential of epialleles as a possible breeding resource for improving the SPM phenotype of mungbean elite cultivars.

Methods

Plant materials and phenotyping

187 F10 RIL population derived from a cross between VC1973A (maternal parent) and V2984 (paternal parent) was planted at Seoul National University Experimental Farm in Suwon, Korea (37°16′12.7″N, 126°59′19.2″E) in 15 July 201622. Natural photoperiod and average temperature from July to September 2016 in Suwon were 11.9–14.7 h per day and 22.7–27.7 °C, respectively. The mungbean cultivar VC1973A was developed at the World Vegetable Center (AVRDC) in 1982, and the Korean landrace V2984 is mostly grown in the Kyung-Ki province of South Korea. Genomic DNA was extracted from the first trifoliate leaf of ten mungbean genotypes, including eight RILs and two parental genotypes, using GeneAll Exgene Plant SV Kit (GeneAll Biotechnology, Co., Ltd., Korea). 10 plants for each genotype were harvested once a week for 8 weeks from August 26 to October 14, 20163. The SPM index was calculated as the highest sum of pod numbers of 2 consecutive weeks divided by total pod numbers. Genotypes with the highest and lowest SPM indices were selected for bisulfite-sequencing and resequencing analyses. Resequencing was performed on the Illumina HiSeq4000 platform.

Bisulfite-sequencing

Libraries for bisulfite-sequencing were constructed using TruSeq DNA Methylation Kit (Illumina, USA), after bisulfite conversion using Zymo EZ DNA Methylation Gold Kit (Zymo Research, USA), according to the manufacturer’s guidelines. Methyl-Seq was performed on the HiSeq X platform (Table S2). For each library construction, 0.1% lambda DNA (Invitrogen, USA) was added to plant genomic DNA. Methyl-Seq reads were mapped onto the in silico bisulfite-converted lambda DNA genome, and the efficiency of bisulfite conversion was calculated as the ratio of the number of converted cytosine residues to the total number of cytosine residues (including converted and unconverted).

Identification of DMRs

Raw Methyl-Seq reads were mapped onto the in silico bisulfite-converted mungbean reference genome using Bismark-0.20.023. To minimize potential clonal bias by PCR amplification, reads that mapped to the same position were removed. Cytosine residues with three or more reads were selected. DMRs were identified using Metilene with default parameters24. DMRs located within ± 2 kb of gene sequences were defined as genic DMRs. Arabidopsis thaliana orthologs of mungbean genes containing DMRs were mapped to the Kyoto Encyclopedia of Genes and Genomes database (KEGG; https://www.genome.jp/kegg/tool/map_pathway1.html)13 . GO enrichment was analyzed using BiNGO plugin in Cytoscape software25.

To calculate the methylation levels of genic DMRs, upstream (− 2 kb from the transcription start site), gene body, and downstream (+ 2 kb from the transcription termination site) sequences containing DMRs were divided into 20 bins. Average methylation levels of cytosine residues in three consecutive bins were calculated and plotted using a sliding window approach.

Identification of pure DMRs

Illumina short reads generated using the Hiseq4000 platform were mapped against the mungbean reference genome (VC1973A) using Burrows-Wheeler Aligner (BWA) (Table S5)5,26. Single nucleotide polymorphisms (SNPs) and insertion/deletion mutations (Indels) were identified using Samtools with the following criteria: mapping quality > 30, minimum mapping depth ≥ 3, and maximum mapping depth ≤ 20 for RILs and ≤ 100 for V298427. After the removal of duplicate variants, SNPs shared by all ten lines were used to identify pure DMRs. The association between methylation level and SNPs located within ± 1 kb of the DMRs was tested using Student’s t-test with p-value < 0.01. SNPs were annotated using SnpEff28. The DMRs that showed no significant association with SNPs were defined as pure DMRs.