Introduction

The plant kingdom comprises a large number of species and numerous cultivars, genotypes and accessions occurring in most parts of the world. These resources are regarded as among the most important plant genetic resources for biodiversity and support life on earth. They are also important for human nutrition and health1,2,3,4.

Environmental stress imposes physiological constraints on plants. In response to unfavorable environmental conditions, plants reprogram their cellular activities through gene regulatory systems such as post-transcriptional control of gene expression. Transcription factors (TFs) and non-coding RNAs (ncRNAs) are two vital elements in functional genomics5,6,7. Plant growth, development and productivity are adversely affected by various abiotic stresses such as drought, salt, cold and heat. To survive and grow under abiotic stress, plants have evolved complex response mechanisms involving many genes with diverse functions. As a major class of regulatory proteins, TFs occupy central positions in these regulatory systems and determine the pathways of plant development under abiotic stress. Among TFs, AP2/ERF is one of the largest plant TF superfamilies and is characterized by a conserved AP2/ERF DNA-binding domain of 50–60 amino acids8,9. Based on sequence similarity and the number of AP2/ERF DNA-binding domains, it can be divided into the AP2, ERF and RAV families10. AP2 family proteins contain two AP2/ERF domains and are usually divided into the monophyletic AP2 and AINTEGUMENTA (ANT) groups11,12. ERF family members, by contrast, contain a characteristic WLG motif and are divided into 10 groups10, of which Groups I to IV form the DREB subfamily and Groups V to X form the ERF subfamily. ERF subfamily proteins typically bind the GCC-box (AGCCGCC) cis-acting element in target gene promoters13, while DREB subfamily proteins usually bind the dehydration-responsive element (DRE) with the core motif CCGAC14. RAV family members contain a single AP2/ERF domain and an additional B3 DNA-binding domain15. In addition, genes with a divergent AP2-like domain and additional motifs are classified as Soloist10.

Extensive studies have revealed the essential roles of AP2/ERF genes in plant growth, development and stress responses11,16,17,18. In most cases, AP2 subfamily members are key regulators of developmental patterning and progression. Examples include the specification of leaf epidermal cells, spikelet meristem determinacy, floral organ patterning19 and grain yield20,21. RAV subfamily members, in turn, play significant roles in plant hormone signal transduction, including ethylene22 and brassinosteroid23 signaling, and in responses to biotic and abiotic stress24,25. In addition, DREB and other ERF family members mainly respond to biotic and abiotic stresses, for example water deficit26, low temperature27,28 and high salinity29. Free proline accumulation is a common stress response in higher plants30. Several reports describe positive correlations between proline accumulation and plant tolerance to drought and salinity stress31. Proline affects the solubility of various proteins and enzymes and protects them from denaturation. In plants such as beans and soybeans, proline content increases significantly as water potential decreases32.

As a major oilseed crop, sunflower (Helianthus annuus L.) tolerates various abiotic stresses owing to its diverse metabolic and physiological features and its mechanisms for regulating stress metabolism. This makes it of particular interest for adaptation to high temperatures, limited water availability, high salinity and heavy metal contamination in soil33. The DREB subfamily is a promising candidate for improving the environmental stress tolerance of crops. DREB subfamily members show distinct response patterns to environmental stresses. These include low temperature (AtCBF1)27, heat (ZmDREB2A, AtDREB1A)28,34, osmotic stress (CkDREB)35, drought (OsDREB1)26,36, and water deficit and high salinity (CaDREBLP1)29. DREB proteins activate a large number of dehydration- and cold-responsive genes by binding to the DRE/CRT elements (A/GCCGAC) in COR/RD gene promoters37. However, a few DREB subfamily genes have been reported to act as positive or negative mediators of ABA and sugar responses, especially during germination and early seedling development38.

Overexpression of DREB genes in plants increased salt tolerance, indicating a positive regulatory role39,40. Enhanced expression of OsDREB2A and OsDREB1F increased drought and salinity tolerance in rice and Arabidopsis40. In rice, OsDREB1A and OsDREB1F were induced by cold stress, and OsDREB1F was also induced by drought, salt and ABA treatments. Overexpression of OsDREB1A and OsDREB1F increased tolerance to drought and high salinity in Arabidopsis39,40,41. A reverse genetics approach identified an Arabidopsis cbf2 mutant in which the CBF2/DREB1C gene was disrupted. The cbf2 mutant showed increased tolerance to drought and salt stress. Expression analysis indicated that CBF2/DREB1C negatively regulates CBF1/DREB1B and CBF3/DREB1A42. Interestingly, studies of DREB1 and DREB2 genes suggest that there is cross-talk between them under drought and salt stress41,43,44,45,46. These results indicate that DREB1 and DREB2 genes, together with ABA signaling, are conserved in monocotyledonous and dicotyledonous plants and play important roles under drought and salt stress39.

Genome information facilitates the identification of gene function and provides the basic knowledge required to understand the molecular mechanisms of stress responses. This knowledge in turn helps to improve the abiotic stress tolerance of crops. The AP2/ERF family has been characterized in Arabidopsis8, bamboo47, grapevine48, maize49, peach50 and rice51. To the best of our knowledge, however, no systematic analysis of the AP2/ERF family has been carried out in sunflower, and research on this gene family remains limited.

In this study, a comprehensive bioinformatic analysis was performed to characterize the genomic organization, phylogenetic relationships and expression of AP2/ERF genes in Helianthus annuus. We also analyzed chromosomal localization, gene structure, gene ontology, homology models of HaAP2/ERF proteins, cis-elements in promoter regions, gene duplication and evolutionary patterns. Our research aims to lay the groundwork for functional studies of the AP2/ERF family in sunflower in response to abiotic and biotic stress. It provides supporting data on the evolution of this TF family in plants and contributes to the discovery of molecular mechanisms for improving stress responses in sunflower and other crops.

Results

Identification of the AP2/ERF family in sunflower

Overall, 288 genes were identified as putative AP2/ERF genes in sunflower. Gene names and locus tags are given in detail in Supplementary Table S. The predicted HaAP2/ERF genes were then named according to their chromosomal location and family classification (Table S1). Based on this classification, they were grouped into 21 ANT, 14 AP2, 4 RAV, 1 Soloist, 105 DREB and 143 ERF genes. Thirty-five genes were assigned to the AP2 family. These genes contained either two AP2 domains or a single AP2 domain similar to the AP2 domains of the double-domain genes. Four genes that also contained a B3 domain were classified into the RAV family. A total of 248 genes had only one AP2/ERF domain and belonged to the ERF family, which was further divided into the ERF and DREB subfamilies. In addition, one gene, HaAP2/ERF-288, resembled Soloist members and was assigned to the Soloist group (Table S2).
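
The family assignment described above is based on the domain architecture of each protein. The sketch below shows a simplified, hypothetical version of that rule, together with a check that the reported subgroup counts add up. The rule ignores the single-domain AP2 cases and the Soloist, which the text assigns by sequence similarity rather than by domain count. The function name is illustrative only.

```python
def classify_by_architecture(n_ap2_domains, has_b3_domain):
    """Assign a family from domain architecture (simplified, hypothetical rule)."""
    if n_ap2_domains >= 2:
        return "AP2"          # two (or more) AP2/ERF domains
    if n_ap2_domains == 1 and has_b3_domain:
        return "RAV"          # single AP2/ERF domain plus a B3 domain
    if n_ap2_domains == 1:
        return "ERF"          # single AP2/ERF domain (DREB + ERF subfamilies)
    return "unassigned"       # no AP2/ERF domain detected


# Subgroup counts reported above for the 288 HaAP2/ERF genes.
counts = {"ANT": 21, "AP2": 14, "RAV": 4, "Soloist": 1, "DREB": 105, "ERF": 143}
ap2_family = counts["ANT"] + counts["AP2"]     # AP2 family (ANT + AP2 groups)
erf_family = counts["DREB"] + counts["ERF"]    # ERF family (DREB + ERF subfamilies)
total = sum(counts.values())                   # all HaAP2/ERF genes
```

The reported figures are internally consistent: 21 + 14 = 35 AP2-family genes, 105 + 143 = 248 ERF-family genes, and 288 genes in total.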

HaAP2/ERF genes were unevenly distributed across the 17 sunflower chromosomes (Fig. 1). The largest number was located on chromosomes 2 and 16 (30 genes each) and the smallest on chromosome 5 (9 genes). The predicted HaAP2/ERF proteins ranged from 7 to 1347 amino acids in length. Their molecular weights (MW) ranged from 8.52 to 69.44 kDa and their theoretical isoelectric points (pI) from 4.48 to 10.27. Subcellular localization prediction indicated that most HaAP2/ERFs (234 of 288, 81.25%) were located in the nucleus, while the remaining 54 were predicted to be extracellular (Table S1).

Figure 1

Chromosome-wise distribution of 288 AP2/ERF genes on 17 chromosomes of Helianthus annuus.

Phylogenetic relationship, conserved motif and gene structure analysis

To assess the evolutionary relationships of the HaAP2/ERF genes, a phylogenetic analysis was performed on a multiple sequence alignment of all HaAP2/ERF proteins with Arabidopsis AP2/ERF proteins. The NJ tree grouped the genes into three main categories, ERF, AP2 and RAV, based on their domain composition (Fig. 2). An unrooted phylogenetic tree was also constructed with the HaAP2/ERF family proteins (Fig. 3). The ERF clade was divided into 10 groups. Following the Arabidopsis classification criteria6, the ERF family could be further subdivided into the DREB and ERF subfamilies. Four groups (I–IV) belonged to DREB and the remaining six groups (V–X) to ERF (Fig. 4). DREB subgroups contribute mainly to environmental stress responses, and several stress-inducible DREB genes have been identified in numerous plants to date28,29,52. The identification of DREB genes in Helianthus annuus therefore provides valuable resources for characterizing stress-responsive genes. The bootstrap values of the nodes in the NJ tree were not very high within each class, which is consistent with earlier studies10,53. The reliability of the NJ tree was confirmed by constructing a second phylogenetic tree with maximum parsimony (MP), in which almost all HaAP2/ERFs fell into the same clusters.

In addition, the conserved motifs of HaAP2/ERFs were analyzed. Altogether, 25 conserved motifs were identified (Fig. S1). Of these, nine (motifs 1, 2, 3, 4, 5, 7, 10, 11 and 12) were located within the AP2/ERF domain, while 16 were located outside the DNA-binding domain. These regions may contain functional elements related to nuclear localization and transcriptional regulation54.

Figure 2

Conserved motifs analysis of HaAP2/ERF genes according to the phylogenetic relationship. Each motif is represented by a number in a colored box. Box length corresponds to motif length.

Figure 3

An unrooted phylogenetic tree of AP2/ERF family proteins in Helianthus annuus. The complete sequences of the 288 AP2/ERF family proteins identified in this study were aligned with ClustalX 2.1, and the phylogenetic tree was constructed by the neighbor-joining method in MEGA 7.0.

Figure 4

Percentage of HaAP2/ERF genes belonging to the different groups.

Conserved amino acids in the HaAP2/ERF transcription factor family

The glycine at position 30 (G30) was completely conserved in all 288 sequences. In 99% of the DREB and ERF sequences, W27, L28, G29, G4, R8 and E16 were conserved. A37 and A38 were also highly conserved.
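
Conservation statements such as "G30 was conserved in all sequences" can be read directly from a multiple alignment of the AP2/ERF domains. The sketch below reports, for each alignment column, the most common residue and its frequency. It assumes the domains are already aligned to equal length, for example from the Clustal X output. The 99% threshold mirrors the text, and the helper names are hypothetical.

```python
from collections import Counter


def column_conservation(aligned):
    """Return (position, most common residue, fraction) for each alignment column.

    Positions are 1-based, matching the residue numbering used in the text.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    result = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        residue, count = Counter(column).most_common(1)[0]
        result.append((pos + 1, residue, count / len(aligned)))
    return result


def conserved_positions(aligned, threshold=0.99):
    """Positions whose most common residue (ignoring gap columns) reaches the threshold."""
    return [(pos, residue) for pos, residue, frac in column_conservation(aligned)
            if residue != "-" and frac >= threshold]
```

For example, the toy alignment `["AWLG", "AWLG", "SWLG", "AWIG"]` is fully conserved only at W2 and G4.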

The two AP2 domains of the AP2 genes contained the following conserved residues in most sequences. In the first AP2 domain these were R9, G12, Q34, G38, A46 and A47, and in the second domain R1, R8, G9, S11, G15 and W26. In this study, ERFs and DREBs were identified based on sequence alignment. The DREB and ERF subfamilies were distinguished by the conserved residues V14 and E19 in DREB and A14 and D19 in ERF. Domains with V14 were classified as DREBs irrespective of the residue at position 19. This is because V14 is more important than E19 in determining the DNA-binding specificity of DREB transcription factors to the DRE cis-element8 (Fig. S1).
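
The DREB/ERF rule above can be written as a small decision function. The sketch below is hypothetical. It assumes the domain sequence is numbered from the first residue of the AP2/ERF domain without gaps, so that positions 14 and 19 match the text. Domains that match neither signature are left for manual inspection rather than forced into a subfamily.

```python
def classify_erf_domain(domain):
    """Assign a single AP2/ERF domain to DREB or ERF (hypothetical helper).

    `domain` is numbered from the first residue of the AP2/ERF domain;
    positions 14 and 19 are 1-based, as in the text.
    """
    if len(domain) < 19:
        raise ValueError("domain too short to read positions 14 and 19")
    residue14, residue19 = domain[13], domain[18]
    if residue14 == "V":
        return "DREB"       # V14 decides, irrespective of position 19
    if residue14 == "A" and residue19 == "D":
        return "ERF"        # A14 + D19 signature
    return "unassigned"     # leave for manual inspection
```

Giving V14 priority over position 19 follows the reasoning in the text that V14 matters more than E19 for DRE-binding specificity.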

Interestingly, in the RAVs, glycine was found at position 14 instead of the valine or alanine conserved at this position in other AP2/ERF proteins.

The Soloist contained the HLG and LYD motifs, which have also been found in other plants such as Arabidopsis.

Gene structure of the AP2/ERF gene family

Gene structure analysis showed that members of each HaAP2/ERF subfamily had similar exon-intron structures. Overall, the number of exons ranged from 1 to 10. The Soloist had 5 introns, RAV genes 0–1 intron, AP2 and ANT genes 4–9 introns, and ERF and DREB genes 0–1 intron. The exceptions were HaAP2/ERF-182, with 2 introns, HaAP2/ERF-215 and HaAP2/ERF-217, with 3 introns, and HaAP2/ERF-214, with 4 introns.

Some 86.29% of the genes in the ERF subfamily were intronless, which agrees with a previously published study8. In contrast, members of the AP2 subfamily had more introns than the ERFs, with at least five exons (Fig. S2). This marked variation in gene structure suggests that considerable divergence may have occurred during the evolution of the sunflower genome.
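
Intron counts and the intronless percentage are usually derived from gene-model coordinates, for example from a GFF3 annotation. The sketch below assumes the exon coordinates of each gene have already been extracted into a dictionary. The helper names and the input layout are hypothetical.

```python
def intron_intervals(exons):
    """Return intron (start, end) intervals implied by exon coordinates.

    `exons` is a list of inclusive (start, end) genomic coordinates for one
    gene model, in any order; gaps between consecutive exons are introns.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def percent_intronless(gene_models):
    """Percentage of gene models (dict: gene id -> exon list) without introns."""
    intronless = sum(1 for exons in gene_models.values()
                     if not intron_intervals(exons))
    return 100.0 * intronless / len(gene_models)
```

Applied to all ERF-subfamily gene models, `percent_intronless` would give a figure comparable to the 86.29% reported above.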

Duplication and divergence rate of HaAP2/ERF genes

To analyze synteny and gene duplication, tandem and segmental duplication events of HaAP2/ERF genes were surveyed across the 17 sunflower chromosomes (Fig. 5). Among the 288 AP2/ERF genes, 40 tandemly duplicated gene pairs were located on chromosomes 2, 3, 4, 5, 8, 9, 12 and 13. Furthermore, 50 segmentally duplicated gene pairs were identified (Fig. 6). To infer the evolutionary origin of the AP2/ERF genes, comparative syntenic analysis was performed between sunflower and Arabidopsis, rice and soybean (Fig. 7a–c). Most HaAP2/ERF genes showed syntenic bias towards particular chromosomes of Arabidopsis, soybean and rice. This suggests that the distribution and organization of AP2/ERF genes in these genomes have been shaped predominantly by chromosomal rearrangements such as duplication and inversion.
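
This section does not state the criterion used to call tandem duplicates. A common approach treats neighbouring family members on the same chromosome within a fixed distance as tandem pairs. The sketch below follows that approach, but the 100 kb window and the function name are hypothetical choices, not the study's settings. Segmental duplications require a whole-genome collinearity analysis and are not covered here.

```python
from collections import defaultdict


def tandem_pairs(genes, max_distance=100_000):
    """Return pairs of neighbouring family members on the same chromosome.

    genes: iterable of (gene_id, chromosome, start_bp) for members of one
    gene family. Two members are reported as a tandem pair when they are
    adjacent in position order and their start positions differ by at most
    max_distance bp (hypothetical threshold).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))
    pairs = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        for (start1, gene1), (start2, gene2) in zip(members, members[1:]):
            if start2 - start1 <= max_distance:
                pairs.append((gene1, gene2))
    return pairs
```

Because only adjacent members are compared, a cluster of three closely spaced genes yields two overlapping pairs, which matches how tandem arrays are often counted.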

Figure 5

Distribution of 288 AP2/ERF genes on the 17 sunflower chromosomes. Tandemly duplicated gene pairs are indicated in yellow.

Figure 6

Distribution of segmentally duplicated HaAP2/ERF genes on Helianthus annuus chromosomes.

Figure 7

Comparative physical mapping showing the degree of orthologous relationships of HaAP2/ERF genes with (a) Arabidopsis, (b) rice, (c) soybean.

Ka/Ks is an effective criterion for detecting selection pressure after duplication. Ka/Ks = 1 indicates neutral selection, Ka/Ks < 1 indicates purifying selection, and Ka/Ks > 1 indicates accelerated evolution under positive selection9,55. The Ka/Ks ratios of tandemly and segmentally duplicated HaAP2/ERF gene pairs were calculated to assess the influence of selection (Tables S3 and S4). The Ka/Ks ratio of tandemly duplicated pairs ranged from 0.05 to 1.33 with a mean of 0.42, while that of segmentally duplicated pairs ranged from 0.03 to 1.81 with a mean of 0.53. These results suggest that duplicated HaAP2/ERF genes have evolved mainly under strong purifying selection and substantial selective constraint. The tandem and segmental duplication events appear to have occurred approximately 2 to 101 Mya. The mean Ka/Ks ratios of tandem (0.42) and segmental (0.53) duplicates differ, but duplication might have occurred simultaneously in both sets. In addition, the Ka/Ks ratios of orthologous gene pairs between sunflower and the other three species were calculated (Tables S5, S6 and S7). The mean Ka/Ks was 0.81 between sunflower and Arabidopsis, 0.81 between sunflower and rice, and 0.71 between sunflower and soybean. This indicates that orthologous gene pairs between sunflower and these species have been subject to purifying selection.
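
The interpretation rule above (Ka/Ks = 1 neutral, < 1 purifying, > 1 positive) and the range/mean summaries can be written out as follows. This is a minimal sketch that assumes Ka and Ks have already been estimated for each gene pair by a separate tool. The optional `tolerance` argument for calling a ratio "neutral" is a hypothetical addition, since real ratios are rarely exactly 1.

```python
from statistics import mean


def selection_mode(ka, ks, tolerance=0.0):
    """Interpret a Ka/Ks ratio using the thresholds given in the text."""
    if ks == 0:
        raise ValueError("Ka/Ks is undefined when Ks = 0")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "neutral"
    return ratio, ("purifying" if ratio < 1.0 else "positive")


def summarize(ratios):
    """Minimum, maximum and mean Ka/Ks for a set of duplicated gene pairs."""
    return min(ratios), max(ratios), mean(ratios)
```

A mean below 1 can coexist with individual pairs above 1, as with the tandem (maximum 1.33) and segmental (maximum 1.81) pairs reported above. The overall conclusion of purifying selection therefore rests on the distribution, not on every pair.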

The estimated divergence times between sunflower and Arabidopsis, rice and soybean were 52, 50 and 56 Mya, respectively. Together, these results suggest that tandem and segmental duplication events have contributed substantially to the evolution and functional divergence of the AP2/ERF family in sunflower and other species.

Analysis of putative promoter regions of DREB gene subfamily

Cis-regulatory elements play a key role in determining tissue-specific and stress-responsive expression. In addition, gene expression profiles have shown that the expression of many genes is closely correlated with the cis-regulatory elements in their promoter sequences56. The 2000 bp region upstream of a gene strongly influences the binding of regulators to their target genes. To better understand the transcriptional regulation and potential functions of DREB subfamily genes in Helianthus annuus, the 2000 bp upstream regions were analyzed for stress-responsive elements.

Multiple stress-responsive cis-elements were widely distributed in the promoter regions of sunflower DREBs (Table S8). These included S000176, S000408 and S000415 for drought stress, S000453 for salt stress, S000030 for heat stress, S000407 for cold stress and S000457 for wounding. This suggests that DREB subfamily transcription factors can respond to abiotic stress and may contribute to abiotic stress tolerance. For example, HaAP2/ERF-070 contained the largest number of drought-responsive elements (28 copies of S000415), and HaAP2/ERF-066 contained 34 cold-responsive elements (S000407). Further functional studies of HaDREB genes will provide a better understanding of the stress tolerance mechanisms in sunflower.
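
Counting cis-element copies in a 2000 bp upstream region amounts to scanning both strands of the promoter for each motif. The sketch below is a hypothetical helper, not the tool used in this study. It uses the DRE/CRT core A/GCCGAC quoted in the Introduction (RCCGAC in IUPAC notation) as an example and does not reproduce the definitions behind the S000… element identifiers. The promoter string is assumed to be the 2000 bp upstream sequence.

```python
import re

# IUPAC codes needed for the example motif; extend as required.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif (limited to the codes above)."""
    return motif.translate(COMPLEMENT)[::-1]


def count_motif(promoter, motif):
    """Count (possibly overlapping) matches of an IUPAC motif on both strands."""
    seq = promoter.upper()

    def hits(m):
        # Zero-width lookahead so that overlapping matches are all counted.
        pattern = "(?=" + "".join(IUPAC[base] for base in m) + ")"
        return len(re.findall(pattern, seq))

    rc = reverse_complement(motif)
    forward = hits(motif)
    # A palindromic motif would otherwise be counted twice per site.
    return forward if rc == motif else forward + hits(rc)
```

For instance, `count_motif("ACCGACGCCGAC", "RCCGAC")` finds two forward-strand DRE cores. A site on the reverse strand, such as `GTCGGC`, is found through the reverse complement.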

Gene ontology annotation

The GO analysis (Table S9) revealed the putative involvement of HaAP2/ERF proteins in diverse biological processes, cellular components and molecular functions. Annotation was performed on the 288 HaAP2/ERF proteins, and the results fell into 80 biological process categories. Most HaAP2/ERF proteins were involved in the regulation of transcription (DNA-templated). Consistently, their predominant molecular function terms were transcription factor activity and sequence-specific DNA binding. Cellular localization prediction showed that 92% of HaAP2/ERF proteins were localized in the nucleus. These results are consistent with previously reported experimental findings10,45,57,58.

Gene expression and network interaction analysis

Functional and physical protein interactions in sunflower and Arabidopsis were examined using the STRING database. Among the nine proteins, those showing sequence similarity to RAP2.4 (HaAP2/ERF-046 and HaAP2/ERF-047) and DREB2C (HaAP2/ERF-133) were involved in a denser interaction network. HaAP2/ERF-059, which showed high similarity to DEAR3, was not involved in any interactions. A large number of DREB- and ERF-type stress-related sequences were found. Nine HaDREB family genes were obtained based on the Arabidopsis protein interaction network. The expression of these CBF/DREB family genes under cold, salt, drought and heat stress was analyzed by real-time PCR using two biological replicates. Transcripts of all analyzed genes accumulated in response to abiotic stress (Fig. 8). Overall, the nine novel HaDREB genes showed different expression patterns under the stress treatments and control conditions (Table S10).

Figure 8

Interaction network of 9 HaDREB genes identified in sunflower and related genes in Arabidopsis.

Homology modeling of HaAP2/ERF proteins

To build 3D protein models, HaAP2/ERF protein sequences were searched against the PDB database using BLASTP. Four proteins were chosen because of their higher similarity to known protein sequences in PDB, and Phyre2 was used to predict their structures. Each of the four HaAP2/ERF structures was modeled with greater than 90% confidence, and potential active sites were identified (Fig. 9). The models showed that the conserved AP2/ERF domain of all HaAP2/ERF proteins contained about 50–60 amino acids. The domain adopted the typical fold of a three-stranded antiparallel β-sheet followed by an α-helix packed approximately parallel to it. Further examination of the AP2/ERF domain indicated the presence of the YRG element, a stretch of about 20 amino acids at the N-terminal end of the domain. Its basic, hydrophilic residues have been reported to play an important role in direct contact with DNA59. AP2 subfamily members have two AP2/ERF domains separated by a 25-amino-acid linker that may be responsible for positioning the DNA-binding domains. The predicted protein structures were highly consistent with one another and provide a basis for understanding the molecular function of HaAP2/ERF proteins.

Figure 9

Predicted structures of HaAP2/ERF proteins. The structures of four HaAP2/ERF proteins modeled with greater than 90% confidence are shown along with their potential active sites.

Expression profiles of HaDREBs under abiotic stress

To further verify the expression of the identified AP2/ERF genes, nine HaAP2/ERF genes were randomly selected. Their expression levels in two tissues under drought, cold, salt and heat treatments were measured by qPCR (Fig. 10).
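
The figures report relative quantification (RQ), but this section does not state how RQ was calculated. A widely used choice is the 2^−ΔΔCt method. This method assumes close to 100% amplification efficiency and a stably expressed reference gene. The sketch below follows that assumption. The function name and Ct values are illustrative only.

```python
from statistics import mean


def relative_quantity(ct_target_treated, ct_ref_treated,
                      ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumed here).

    Each argument is a list of replicate Ct values.
    """
    # Normalize the target to the reference gene within each condition.
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # Compare the treated condition with the control condition.
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)
```

If a target reaches threshold two cycles earlier (relative to the reference) under stress than in the control, ΔΔCt = −2 and RQ = 4. In other words, this method reports a four-fold up-regulation.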

Figure 10

Relative quantitative (RQ) expression levels of 9 HaDREB genes at a series of time points following the abiotic stress treatments: (a) cold stress, (b) salt stress, (c) drought stress, (d) heat stress.

To examine the expression of HaDREB transcription factors under abiotic stress, nine randomly selected genes were analyzed by qRT-PCR. Three genes belonged to group I, two to group II, three to group III and one to group IV. The highest expression was observed for HaAP2/ERF-047 and the lowest for HaAP2/ERF-114. HaAP2/ERF-047 and HaAP2/ERF-120 were up-regulated under all applied abiotic stresses, suggesting a role for these two genes in sunflower stress tolerance. With the exception of HaAP2/ERF-033, the genes were up-regulated during the first 24 hours of drought stress. However, they were down-regulated 48 hours after the onset of stress. Under cold stress, HaAP2/ERF-047 and HaAP2/ERF-039 were up-regulated, suggesting a role in cold tolerance, whereas HaAP2/ERF-059 showed a negative response to cold stress. HaAP2/ERF-047 and HaAP2/ERF-120 may be considered candidate factors for salinity tolerance because of their high expression under salt stress. HaAP2/ERF-067 expression increased markedly under heat stress but later declined in both leaves and roots (Fig. 11).

Figure 11

Heat map of the real-time quantitative PCR (qRT-PCR) analysis results of HaDREB genes in leaves and roots under drought, cold, heat and high salinity treatments, with three biological and technical replicates.

Determination of proline content

Proline content was measured under drought, salinity, cold and heat stress. Proline increased under abiotic stress, with the highest levels in leaves and roots observed after 48 hours of drought stress. The smallest change relative to the control was observed after 6 hours of heat treatment. Proline also increased under cold stress. Under salinity and heat stress, proline content in leaves and roots increased with increasing stress level. This indicates a plant response to stress that may help protect the plant under stress conditions (Fig. S3).

Relative water content (RWC) and Na+/K+

The effects of the stress treatments on plant physiological traits were assessed by measuring changes in RWC and in sodium and potassium contents in leaves and roots. The Na+/K+ ratio varied considerably more in roots than in leaves (Fig. S4). Mean RWC decreased slightly in the early stage of drought stress and declined sharply thereafter (Fig. S5). Overall, these results show that the plants were affected by the abiotic stress treatments.
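
The RWC formula is not given in this section. The sketch below assumes the standard definition RWC (%) = (FW − DW) / (TW − DW) × 100, where FW, TW and DW are the fresh, turgid and dry weights of the same tissue sample. The Na+/K+ ratio is simply the quotient of the two contents measured in the same units. The function names are illustrative.

```python
def relative_water_content(fresh_weight, dry_weight, turgid_weight):
    """RWC (%) = (FW - DW) / (TW - DW) * 100 (standard formulation, assumed)."""
    if turgid_weight <= dry_weight:
        raise ValueError("turgid weight must exceed dry weight")
    return (fresh_weight - dry_weight) / (turgid_weight - dry_weight) * 100.0


def na_k_ratio(na_content, k_content):
    """Na+/K+ ratio from contents measured in the same units."""
    if k_content == 0:
        raise ValueError("K+ content must be non-zero")
    return na_content / k_content
```

A leaf disc weighing 0.75 g fresh, 1.25 g after rehydration and 0.25 g after drying would have an RWC of 50%.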

Discussion

AP2/ERF is one of the major transcription factor families in plants. Its members play important roles in the transcriptional regulation of complex developmental processes, stress responses, seed germination, flower development, senescence and fruit ripening. They are also involved in responses to salt, drought, low temperature and pathogen attack39,60,61,62,63,64. The AP2/ERF family in sunflower (288 genes) is much larger than those of rice (174 genes) and Arabidopsis (148 genes)10. A conserved motif is an amino acid sequence maintained because of its biological function. Such motifs may be involved in transcriptional activity, protein interactions or nuclear localization10. Proteins classified in the same subgroup share similar motifs and functions, and various conserved motifs have been identified in the AP2/ERF families of Arabidopsis and rice10. In this study, each group of the HaAP2/ERF family was compared with the corresponding group of the AtAP2/ERF family. The size of the AP2/ERF gene family is known to depend largely on the number of ERF members65. In sunflower, the ERF family has 248 members, compared with 122 in Arabidopsis and 139 in rice. Similarly, 35 genes were assigned to the AP2 family, compared with 18 in Arabidopsis and 26 in rice10. In contrast, the number of RAV family members was similar in sunflower (4), Arabidopsis (6) and rice (7)10. Therefore, the larger size of the AP2/ERF family in sunflower may be due to the large number of ERF and DREB subfamily members. AP2/ERF transcription factors are widely recognized as components of signal transduction pathways that regulate plant growth, development and responses to various stresses66. However, the functions of HaAP2/ERFs are not yet well understood.
In the current study, gene expression patterns in different plant tissues under various stress conditions were examined to better understand the potential roles of these genes under environmental stress. ERF family genes had fewer introns than AP2 family genes in sunflower, which may allow faster induction and expression of most ERF genes65. AP2/ERF proteins can bind the GCC-box or DRE through their AP2/ERF domain and thereby regulate the expression of target genes under stress conditions67,68. Among them, HaAP2/ERF-047, a member of the DREB subfamily, was significantly regulated by both cold and drought stress. In addition, the HaAP2/ERF-047 promoter region contained 8 S000415 (dehydration-responsive), 14 S000407 (cold-responsive) and 11 S000453 (salt-responsive) cis-elements. Moreover, HaAP2/ERF-047, HaAP2/ERF-039 and HaAP2/ERF-120 were significantly up-regulated under the studied abiotic stresses. We propose that these cis-elements are important for controlling HaAP2/ERF expression. We further propose that HaAP2/ERF transcription factors interact with other functional proteins to form a complex regulatory network during development and under stress conditions. Previous studies showed that DREB and ERF proteins contain a conserved WLG motif in the AP2/ERF domain10,69. In this research, the WLG motif was highly conserved in the DREB and ERF subfamilies (Fig. S1). Ala-14 and Asp-19 are conserved in ERF proteins, whereas Val-14 and Glu-19 are conserved in DREB proteins10. These two conserved residues lie in the β-sheet region of the AP2/ERF domain, which is involved in DNA binding8.

Notably, Val-14 and Ala-14 were completely conserved in all DREB and ERF subgroups, respectively. These conserved residues probably play an important role in the physical interactions of DREB and ERF proteins with DNA8. Large-scale duplication events are defined as the simultaneous duplication of many genes. Ka and Ks are used to investigate the mechanisms of gene divergence after duplication. Under a molecular clock, Ks of duplicated genes is expected to accumulate at a similar rate over time; however, rates vary considerably among genes70. Estimates of evolutionary rates are therefore very useful for describing evolutionary patterns71.

The time (million years ago, Mya) of duplication and divergence was calculated as T = Ks/(2λ), where λ = 6.5 × 10⁻⁹ synonymous substitutions per synonymous site per year55,72. We estimated the divergence times between sunflower and Arabidopsis, rice and soybean, as well as Ka/Ks. Ka/Ks is the ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site. These values indicated strong selection pressure on these genes. If nonsynonymous substitutions accumulate at the same rate as synonymous substitutions (Ka/Ks = 1), the duplicated genes evolve neutrally, without selective constraint. If Ka/Ks < 1, nonsynonymous substitutions have been removed by natural selection, presumably because of their deleterious effects. The smaller the Ka/Ks, the stronger the selective constraint under which the two genes evolved73,74. The mean Ka/Ks values of orthologous gene pairs were 0.81 between sunflower and Arabidopsis, 0.81 between sunflower and rice, and 0.71 between sunflower and soybean. These results indicate that HaAP2/ERF genes provide valuable candidates for further applied studies of AP2/ERF genes in Helianthus annuus and other oilseed plants.
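
The dating formula T = Ks/(2λ) with λ = 6.5 × 10⁻⁹ can be applied directly to per-pair Ks values. The sketch below converts Ks to Mya and back. Under this formula, the 52 Mya reported for sunflower and Arabidopsis corresponds to a Ks of about 0.676. That Ks value is derived here from the formula and is not a value reported in the text.

```python
LAMBDA = 6.5e-9  # synonymous substitutions per synonymous site per year


def divergence_time_mya(ks, rate=LAMBDA):
    """T = Ks / (2 * lambda), converted from years to million years ago."""
    return ks / (2.0 * rate) / 1e6


def ks_for_time(t_mya, rate=LAMBDA):
    """Inverse of the formula: the Ks implied by a divergence time in Mya."""
    return t_mya * 1e6 * 2.0 * rate
```

The factor of 2 accounts for substitutions accumulating independently along both lineages after the split. Ks values near saturation (well above 1) make such estimates unreliable.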

Conclusions

The current study aimed to identify and characterize the AP2/ERF transcription factors in Helianthus annuus. An extensive genome search yielded 288 HaAP2/ERF genes, all of which were supported by EST or full-length cDNA sequences. The chromosomal locations, exon-intron structures, conserved motif composition and phylogenetic relationships of the HaAP2/ERFs were analyzed and compared. HaAP2/ERFs can be categorized into four groups according to the number of AP2 domains and their predicted functions. The expression of HaAP2/ERF genes was studied in leaves and roots under heat, cold, salt and drought stress. Several HaAP2/ERF genes were identified as candidates for further studies of their functions in plant growth and stress response. This study provides the first description of the organization, structure, evolution and expression of the HaAP2/ERF family. It facilitates functional analysis of HaAP2/ERF genes and establishes a basis for better understanding the molecular mechanisms of plant development and physiological stress responses in Helianthus annuus.

Methods

Sequence retrieval and identification of the AP2/ERF gene family in the sunflower genome

The complete genome data of Helianthus annuus were obtained from the sunflower database (https://www.heliagene.org), and the predicted protein sequences (v1.0.29) were downloaded as the dataset for downstream analysis. The AP2/ERF domain (PF00847) obtained from the Pfam database (http://pfam.xfam.org/) was used as the query for a Hidden Markov Model (HMM) search with the HMMER 3.0 program, using a predefined E-value cutoff of E < 1e−5. In addition, the AP2/ERF protein sequences of Arabidopsis were retrieved from the Plant Transcription Factor Database (http://plntfdb.bio.uni-potsdam.de/v3.0/) and used as queries to search the Helianthus annuus protein dataset with BLASTP, using an E-value of 1e−5 and a 50% threshold. The HMMER and BLAST hits were then parsed and analyzed. Next, a self-BLAST of these sequences was performed to remove redundant sequences, without considering alternative splice variants. After manual curation, the putative HaAP2/ERF proteins were obtained. Finally, the NCBI CDD web server (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and the SMART database (http://smart.embl-heidelberg.de/site) were used to confirm the predicted HaAP2/ERF genes.
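Parsing and merging the two search outputs can be sketched as follows. This is a hypothetical helper, not the authors' script. It assumes that HMMER hits come from `hmmsearch --tblout`, where column 5 is the full-sequence E-value, and that BLASTP hits come from tabular `-outfmt 6`. It also assumes that the "50% threshold" means percent identity (column 3). Self-BLAST redundancy removal and manual curation are not shown.

```python
def hmmer_hits(tblout_text, max_evalue=1e-5):
    """Target IDs from `hmmsearch --tblout` output with full-sequence E < max_evalue."""
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # tblout columns: target name, target accession, query name,
        # query accession, full-sequence E-value, score, ...
        if float(fields[4]) < max_evalue:
            hits.add(fields[0])
    return hits


def blast_hits(outfmt6_text, max_evalue=1e-5, min_identity=50.0):
    """Subject IDs from BLASTP `-outfmt 6` output passing both thresholds."""
    hits = set()
    for line in outfmt6_text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        if float(f[10]) < max_evalue and float(f[2]) >= min_identity:
            hits.add(f[1])  # assumption: 50% threshold = percent identity
    return hits


def candidate_set(tblout_text, outfmt6_text):
    """Union of HMM and BLAST candidates, before redundancy removal."""
    return hmmer_hits(tblout_text) | blast_hits(outfmt6_text)
```

Taking the union keeps any protein found by either method. The stricter CDD/SMART domain check then removes false positives.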

The amino acid composition and physicochemical properties (number of amino acids, molecular weight, and pI) of the sunflower AP2/ERFs were then analyzed using the Sequence Manipulation Suite (http://www.bio-soft.net/sms/). Finally, subcellular localization was predicted with Softberry (http://linux1.softberry.com/).
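The molecular weight and pI calculations can be approximated locally as a cross-check. This sketch is not the Sequence Manipulation Suite's code. Molecular weight is computed as the sum of average residue masses plus one water. The pI is found by bisection on the net charge, using an assumed EMBOSS-style pKa set; other tools use different pKa tables, so pI values may differ slightly.

```python
# Average residue masses (Da); a protein's mass = sum of residues + one water.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528

# Assumed EMBOSS-style pKa values; other tools use slightly different sets.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of the sequence at a given pH."""
    seq = seq.upper()
    pos = 1 / (1 + 10 ** (ph - PKA_POS["Nterm"]))
    pos += sum(seq.count(aa) / (1 + 10 ** (ph - pka))
               for aa, pka in PKA_POS.items() if aa != "Nterm")
    neg = 1 / (1 + 10 ** (PKA_NEG["Cterm"] - ph))
    neg += sum(seq.count(aa) / (1 + 10 ** (pka - ph))
               for aa, pka in PKA_NEG.items() if aa != "Cterm")
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisection for the pH at which the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection works here because net charge decreases monotonically with pH.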

Phylogenetic analysis

Multiple sequence alignment was performed with Clustal X (version 2.1)75. An unrooted neighbor-joining (NJ) tree with 1000 bootstrap replicates was then constructed using MEGA (version 7.0)76.
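The neighbor-joining step can be illustrated with a minimal implementation of the Saitou-Nei algorithm on a precomputed distance matrix. This is a teaching sketch, not MEGA's implementation. It computes no distances from the alignment and does no bootstrapping; MEGA handles both, with 1000 replicates as described above. The five-taxon matrix in the usage note is a textbook example, not sunflower data.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns (newick, joins); joins lists (node_i, node_j, len_i, len_j)
    in the order the pairs were merged.
    """
    dist = {}
    for x, y in combinations(range(len(names)), 2):
        dist[frozenset((names[x], names[y]))] = float(matrix[x][y])

    def d(a, b):
        return dist[frozenset((a, b))]

    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(i): summed distance from i to every other active node.
        r = {a: sum(d(a, b) for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(p[0], p[1]) - r[p[0]] - r[p[1]])
        li = d(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d(i, j) - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # new internal node, labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((u, k))] = (d(i, k) + d(j, k) - d(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, li, lj))
    a, b = nodes
    half = d(a, b) / 2  # Newick needs a root; place it mid-way on the last edge
    return f"({a}:{half:g},{b}:{half:g});", joins


M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), M)[0])
```

Placing the root on the last edge only affects how the Newick string is written. The unrooted topology and branch lengths are unchanged.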

Chromosome distribution, gene structure and conserved motif analysis

The chromosomal locations of these genes were obtained from the genome annotation and validated by BLASTN searches. The exon-intron organization of the predicted AP2/ERF genes was displayed with the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn/). Conserved motifs and domains were predicted using the MEME Suite web server (http://meme-suite.org/). The physical distribution of AP2/ERF genes on the chromosomes was drawn with MapChart based on gene positions in the genome77.
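Exon and intron counts and lengths can be derived from a genome annotation before they are drawn. This minimal sketch assumes the annotation is in GFF3, with 1-based inclusive coordinates and exons linked to transcripts through the `Parent` attribute. The transcript ID in the test is hypothetical. GSDS itself accepts such coordinates and only draws the diagram.

```python
def exons_from_gff3(gff_text, transcript_id):
    """(start, end) coordinates of the exons whose Parent is transcript_id."""
    exons = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon features
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if transcript_id in attrs.get("Parent", "").split(","):
            exons.append((int(cols[3]), int(cols[4])))
    return exons


def gene_structure(exons):
    """Exon/intron counts and lengths from 1-based inclusive exon coordinates."""
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # Each intron spans the gap between consecutive exons.
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]
    return {
        "exons": len(ordered),
        "introns": len(introns),
        "exon_lengths": exon_lengths,
        "intron_lengths": [e - s + 1 for s, e in introns],
    }
```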

Gene duplication, orthology analysis, evolutionary patterns, and divergence of the AP2/ERF gene family

To determine the contribution of segmental and tandem duplications to the genome-wide expansion of the AP2/ERF family in Helianthus annuus, genes located within a 5-Mb region that shared 80% or higher similarity at an E-value threshold of 1e−10 were considered tandemly duplicated, and those separated by more than 5 Mb were considered segmentally duplicated78.
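The duplication criteria can be written as a single classification rule. The function name and the gene coordinates in the test are hypothetical. Gene pairs on different chromosomes are assumed to count as segmental, which the text does not state explicitly.

```python
def classify_duplication(gene_a, gene_b, similarity, evalue,
                         window=5_000_000, min_similarity=80.0, max_evalue=1e-10):
    """gene_a, gene_b: (chromosome, position) tuples.

    Returns 'tandem', 'segmental', or None if the pair fails the
    similarity / E-value filter (hypothetical helper).
    """
    if similarity < min_similarity or evalue > max_evalue:
        return None
    chrom_a, pos_a = gene_a
    chrom_b, pos_b = gene_b
    if chrom_a == chrom_b and abs(pos_a - pos_b) <= window:
        return "tandem"      # within a 5-Mb region on the same chromosome
    return "segmental"       # > 5 Mb apart (assumed: also any inter-chromosomal pair)
```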

The occurrence of duplication events, the divergence of homologous genes, and the selective pressure on duplicated genes were estimated by calculating the synonymous (Ks) and non-synonymous (Ka) substitutions per site between duplicated gene pairs using DnaSP (version 5.10.1)79,80. The time (million years ago, Mya) of duplication and divergence was calculated using a synonymous mutation rate of λ substitutions per synonymous site per year as T = Ks/2λ (λ = 6.5 × 10−9)55,71.
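The dating formula T = Ks/2λ converts directly into a one-line function. The result is divided by 10^6 to express it in Mya. Apart from λ = 6.5 × 10−9, which comes from the text, the values are illustrative: a hypothetical pair with Ks = 0.13 would date to roughly 10 Mya.

```python
LAMBDA = 6.5e-9  # synonymous substitutions per synonymous site per year


def divergence_time_mya(ks, rate=LAMBDA):
    """T = Ks / (2 * lambda), converted from years to millions of years."""
    return ks / (2 * rate) / 1e6


print(divergence_time_mya(0.13))  # roughly 10 Mya for a hypothetical Ks of 0.13
```

The factor of 2 accounts for both lineages accumulating substitutions independently after the split.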

For the synteny analysis, duplications among Helianthus annuus AP2/ERF genes, as well as the syntenic blocks of this family between sunflower and three other species (Arabidopsis, rice and soybean), were obtained from PGDBj, and the diagrams were drawn with Circos (version 0.67)81.
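Circos draws the syntenic connections from a plain-text links file. This is a minimal sketch that writes duplicated or syntenic gene pairs in the single-line link format `chr1 start1 end1 chr2 start2 end2`. The chromosome IDs in the test (`ha1`, `at3`) are hypothetical and must match the IDs defined in the Circos karyotype file. The karyotype and configuration files are not shown.

```python
def circos_link_lines(pairs):
    """pairs: ((chrom, start, end), (chrom, start, end)) tuples, one per gene pair."""
    return [f"{c1} {s1} {e1} {c2} {s2} {e2}"
            for (c1, s1, e1), (c2, s2, e2) in pairs]


def write_links(path, pairs):
    """Write one link per line for a Circos <links> block (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write("\n".join(circos_link_lines(pairs)) + "\n")
```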

Analysis of putative promoter regions of the DREB gene subfamily in Helianthus annuus

The 2000-bp genomic sequences upstream of all identified AP2/ERF genes were retrieved from the Helianthus annuus genome and submitted to the PLACE database (http://www.dna.affrc.go.jp/PLACE/) to identify cis-regulatory elements in the promoter regions.
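The promoter extraction step can be sketched with strand-aware slicing, followed by a simple exact-match scan on both strands. This is not a replacement for PLACE, which uses a curated motif collection. The example motif is the DRE core CCGAC mentioned in the Introduction, and all sequences and coordinates in the test are invented. Gene coordinates are assumed to be 1-based and inclusive.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=2000):
    """`length` bp upstream of a gene, returned 5'->3' on the gene's strand."""
    if strand == "+":
        begin = max(0, gene_start - 1 - length)
        return chrom_seq[begin:gene_start - 1]
    # Minus strand: upstream lies after gene_end on the forward strand.
    end = min(len(chrom_seq), gene_end + length)
    return revcomp(chrom_seq[gene_end:end])


def find_motif(promoter, motif):
    """0-based (position, strand) hits of an exact motif on either strand."""
    promoter, motif = promoter.upper(), motif.upper()
    rc = revcomp(motif)
    hits = []
    for i in range(len(promoter) - len(motif) + 1):
        window = promoter[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:  # elif avoids double-counting palindromic motifs
            hits.append((i, "-"))
    return hits
```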

AP2/ERF protein annotations and interaction networks

Gene ontology (GO) analysis was used to predict gene functions and to calculate the frequency of functional categories among the identified sequences. GO annotations were assigned with Blast2GO (https://www.blast2go.com/)82, and the GO terms for each of the three main categories (biological process, molecular function and cellular component) were obtained from sequence similarity using the default parameters. Protein interaction network data for the nine sunflower DREB genes were obtained from the STRING database (https://string-db.org/)83.
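The frequency of functional categories can be tallied once GO annotations are exported. This sketch assumes the annotations are a mapping from gene ID to (GO ID, aspect) pairs, using the GAF aspect letters P (biological process), F (molecular function) and C (cellular component). Each gene is counted at most once per term. The gene IDs in the test are hypothetical.

```python
from collections import Counter


def go_category_counts(annotations):
    """annotations: gene_id -> iterable of (go_id, aspect), aspect in {'P','F','C'}.

    Returns, per aspect, a Counter of how many genes carry each GO term.
    """
    counts = {"P": Counter(), "F": Counter(), "C": Counter()}
    for terms in annotations.values():
        for go_id, aspect in set(terms):  # set(): count a gene once per term
            counts[aspect][go_id] += 1
    return counts


def term_frequencies(counter, n_genes):
    """Percentage of annotated genes carrying each term in one category."""
    return {go_id: 100.0 * c / n_genes for go_id, c in counter.items()}
```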

Homology modeling of HaAP2/ERF proteins

For homology modeling, all HaAP2/ERF proteins were searched against the Protein Data Bank (PDB)84 to identify the best template with a similar amino acid sequence and a known 3-D structure. The sequences were submitted to the Phyre2 server (Protein Homology/AnalogY Recognition Engine; http://www.sbg.bio.ic.ac.uk/phyre2) to predict the 3-D structures of the proteins by homology modeling in 'normal' mode85. The COACH server (http://zhanglab.ccmb.med.umich.edu/COACH/) was used to predict the active sites, and UCSF Chimera 1.8 was used to highlight them.

Plant materials, growth conditions and stress treatments

Seeds of the sunflower cultivar 'Fantasia' were obtained from the Agriculture Research Institute (ARI), Safie-Abad Dezful, Iran. The seeds were sown in a soil mixture (peat compost:vermiculite:sand, 2:2:1) in the glasshouse at Shahid Chamran University of Ahvaz, Iran, at 28 ± 1 °C day/23 ± 1 °C night temperature, 70 ± 5% relative humidity and natural sunlight during June–July 2017. Roots and leaves of 'Fantasia' plants were collected for RNA isolation and organ-specific analysis.

Drought stress

Four-week-old sunflower plantlets (six-leaf stage) were subjected to water stress by withholding water for 12, 24 and 48 h86. To determine plant water status, the relative water content (RWC) of the leaves was measured according to the method of Catsky87.
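Relative water content is commonly computed from fresh weight (FW), turgid weight (TW, after rehydration) and dry weight (DW) as RWC (%) = (FW − DW)/(TW − DW) × 100. This sketch implements that widely used expression. It does not reproduce the sampling and rehydration details of Catsky's procedure, and the weights in the test are invented.

```python
def relative_water_content(fresh_weight, turgid_weight, dry_weight):
    """RWC (%) = (FW - DW) / (TW - DW) * 100, all weights in the same unit."""
    if turgid_weight <= dry_weight:
        raise ValueError("turgid weight must exceed dry weight")
    return (fresh_weight - dry_weight) / (turgid_weight - dry_weight) * 100
```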

Salt stress

Plantlets were transferred to solutions containing either 75 or 150 mM NaCl87,88. Leaf and root samples were collected from the stressed plants 24 h later, alongside the control samples. The potassium-to-sodium (K+/Na+) ratio was measured as the criterion for evaluating the different salinity stress levels87.

Heat stress

Four-week-old plantlets in strength Hoagland's solution were transferred to a humidity-controlled growth chamber (Memmert, Germany) at 70% relative humidity and maintained at 42 ± 1 °C for 1.5, 3 and 6 h78,87.

Cold stress

For the low-temperature stress treatment, plants were transferred to an illuminated incubator at 4 °C with all other culture conditions unchanged. After 2 h, the leaves and roots were collected for analysis. The roots were rapidly rinsed with distilled water pre-chilled to 4 °C89.

The plants were supplied with water and Hoagland's solution on alternate days. Unstressed plants were maintained as the control group. After the stress treatments, whole seedlings were carefully harvested, immediately frozen in liquid nitrogen and stored at −80 °C until RNA isolation. To ensure precision and reproducibility, three independent experiments were conducted, and in each experiment 100 mg seedling samples were collected by random sampling.

Determination of proline content

The proline content was determined in the same fresh samples that were used for gene expression analysis. Each treatment had three pot replicates, and the plants from each pot were pooled to form one replicate. Free proline was determined by the ninhydrin assay (absorbance at 520 nm) following the method of Bates et al.90.
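Converting A520 readings to proline content requires a standard curve and the Bates et al. expression [(µg proline/ml × ml toluene)/115.5 µg/µmol]/[(g sample)/5], which gives µmol proline per g fresh weight. That expression assumes 2 ml of a 10 ml extract is assayed. This study does not report its volumes, so the standard concentrations, absorbances, toluene volume and sample mass below are all hypothetical inputs.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def proline_umol_per_g_fw(a520, std_conc, std_abs, toluene_ml, sample_g):
    """Proline (umol/g FW) from A520, using a standard curve and Bates' formula."""
    slope, intercept = linear_fit(std_conc, std_abs)  # A520 = slope * ug/ml + b
    ug_per_ml = (a520 - intercept) / slope            # read sample off the curve
    # Bates et al.: (ug/ml * ml toluene / 115.5 ug/umol) / (g sample / 5)
    return (ug_per_ml * toluene_ml / 115.5) / (sample_g / 5)
```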

RNA extraction and expression analysis using qRT-PCR

Total RNA was isolated from leaf and root samples (abiotic-stressed and control sunflower cv. 'Fantasia' seedlings) using the BioBasic RNA extraction kit (BS82314, BioBasic, Canada) following the manufacturer's instructions. DNA contamination was removed from the RNA samples using RNase-free DNase I (1 U µl−1, TaKaRa, Dalian, China). The quality and purity of the RNA preparations were determined from the OD260/OD280 absorbance ratio (1.9–2.0), and their integrity was checked by electrophoresis on a 1.2% agarose gel containing formaldehyde, as described in previous studies91,92. RNA concentrations were measured with a spectrophotometer (Eppendorf, USA). About 1 µg of total RNA was used to synthesize first-strand cDNA with an oligo(dT) primer in a 20 µl reaction using 200 U/µl PrimeScript M-MuLV reverse transcriptase (Takara Bio Inc., USA), following the manufacturer's instructions. Quantitative real-time PCR (qRT-PCR) was performed in triplicate using SYBR Premix Ex Taq II (Tli RNaseH Plus) (Takara Bio Inc., USA) on a Master cycler system (ABI, Biosystem, USA)93. The constitutively expressed sunflower genes Actin (GenBank ID: AF282624.1) and GAPDH (GenBank ID: DQ503718.1) were used as endogenous controls.
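The spectrophotometric checks can be sketched using the standard conversion that 1 A260 unit corresponds to about 40 µg/ml (equal to 40 ng/µl) of RNA. The 1.9–2.0 purity window comes from the text. The absorbance readings, dilution factor and target input mass in the test are hypothetical.

```python
def rna_qc(a260, a280, dilution=1.0, ratio_range=(1.9, 2.0)):
    """Return (concentration in ng/ul, A260/A280 ratio, passes purity check)."""
    ratio = a260 / a280
    # 1 A260 unit ~ 40 ug/ml RNA, i.e. 40 ng/ul; scale by the dilution factor.
    conc_ng_per_ul = a260 * 40.0 * dilution
    ok = ratio_range[0] <= ratio <= ratio_range[1]
    return conc_ng_per_ul, ratio, ok


def volume_for_mass_ul(conc_ng_per_ul, mass_ng):
    """Volume of RNA stock (ul) needed to load a given mass (ng) into a reaction."""
    return mass_ng / conc_ng_per_ul
```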

qPCR was used to determine the transcript levels of nine randomly selected HaAP2/ERF genes following Ebrahimi Khaksefidi et al.86 and Chen et al.94. The primers are listed in Table S11.
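This section does not state how relative expression was calculated. A common choice is the 2^−ΔΔCt method, which assumes roughly 100% amplification efficiency. The sketch below assumes that method. Technical triplicates are averaged, and the reference Ct is the mean over the two reference genes (Actin and GAPDH). All Ct values in the test are invented.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct.

    target_cts: technical-replicate Cts of the gene of interest.
    reference_cts: one list of replicate Cts per reference gene.
    """
    ref = mean(mean(reps) for reps in reference_cts)
    return mean(target_cts) - ref


def fold_change(treated_target, treated_refs, control_target, control_refs):
    """Relative expression 2^-ddCt, with ddCt = dCt(treated) - dCt(control)."""
    ddct = delta_ct(treated_target, treated_refs) - delta_ct(control_target, control_refs)
    return 2 ** (-ddct)
```

Averaging Ct values across reference genes is equivalent to using the geometric mean of their relative quantities.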

Data availability

All data and materials supporting the results of this article are included within the article and its supplementary files.