Introduction

The family Caulimoviridae comprises plant viruses with a double-stranded (ds) DNA genome that replicate through an RNA intermediate (Geering and Hull 2012). Based on their genome organization, host range, insect vector and nucleotide sequence identity, members of Caulimoviridae are classified into seven genera: Badnavirus, Caulimovirus, Cavemovirus, Petuvirus, Solendovirus, Soymovirus and Tungrovirus (Geering and Hull 2012). Viruses belonging to the genus Badnavirus have a genome consisting of a single circular dsDNA of about 7.0–7.6 kbp encapsidated in a non-enveloped bacilliform particle. Each DNA strand has discontinuities at specific sites. The genomes of all badnaviruses possess three open reading frames (ORFs) encoding three proteins, P1, P2 and P3. The function of P1 is unknown; P2 is the virion-associated capsid protein. Polyprotein P3 contains several functional domains: cysteine-rich (CYS), RNA-binding (RB), aspartate protease (PR) and viral replicase [reverse transcriptase (RT) and ribonuclease H (RNaseH)] (Medberry et al. 1990; Harper and Hull 1998; Geering and Hull 2012). The demarcation of badnavirus species is based on host range, vector specificities and the nucleotide sequence of the RT/RNaseH genomic region. The International Committee on Taxonomy of Viruses (ICTV) has established a species demarcation threshold of ≥80 % nucleotide identity for the RT/RNaseH region (Geering and Hull 2012).

Badnaviruses are transmitted in a semi-persistent manner by mealybugs (Geering and Hull 2012) to a wide range of economically important tropical crops, including yams (Dioscorea spp.) (Phillips et al. 1999). The first report of badnaviruses in yam occurred in Barbados in association with brown staining in tubers of Dioscorea alata L. (a disease that requires also the presence of a member of the genus Potyvirus) (Harrison and Roberts 1973). Yam-infecting badnaviruses are widespread in tropical regions, particularly in Africa and in the South Pacific (Seal and Muller 2007; Eni et al. 2008; Kenyon et al. 2008). Currently, two badnavirus species that infect yam are recognized by the ICTV: Dioscorea bacilliform AL virus (DBALV) and Dioscorea bacilliform SN virus (DBSNV), obtained from D. alata and D. sansibarensis Pax, respectively (Geering and Hull 2012). However, studies carried out in Africa and the South Pacific based on RT/RNaseH sequences revealed a high incidence and diversity of yam-infecting badnaviruses (Eni et al. 2008; Kenyon et al. 2008). This fact has been attributed to the many different species of Dioscorea found in these areas, to their vegetative mode of propagation, which contributes to virus accumulation, and to the unrestricted exchange of plant material (Eni et al. 2008; Kenyon et al. 2008; Bousalem et al. 2009).

Brazil is the second major producer of yam in the Americas after Colombia, with a production of 244,142 t in 2012 (FAO 2010). Badnaviruses are limiting factors to production, and have been described infecting the two most important commercially cultivated species of yam: São Tomé (D. alata) and da Costa (D. cayenensis Lam.). In contrast with what is found in Africa and the South Pacific, DBALV seems to be the only badnavirus infecting yam in Brazil, being widely distributed in the northeastern region (Lima et al. 2013).

Determining the genetic variability of a viral population is important to understand how these populations evolve, as well as the implications for durability of control strategies (Seal et al. 2006). Evidence has accumulated indicating that RNA viruses and retroviruses evolve quickly due to the high mutation rates and lack of proofreading activity of their RNA-dependent RNA polymerases and reverse transcriptases (Drake 1993; Drake and Holland 1999; Bebenek et al. 1999). Since dsDNA viruses encode less error-prone polymerases or rely on proofreading host-encoded DNA polymerases and repair systems, they are generally less variable in their sequences (Duffy et al. 2008; Garcia-Diaz and Bebenek 2007; Kunkel and Erie 2005; McCulloch and Kunkel 2008).

According to Firth et al. (2010), dsDNA viruses are frequently described as evolving through long-term co-divergent associations with their hosts, where a pattern of low rates of nucleotide substitution is expected. Rates of molecular evolution estimated for Herpes simplex virus (HSV; 3.5 × 10⁻⁸) and Human papilloma virus (HPV; 4.5 × 10⁻⁷) are low, reinforcing the hypothesis of co-divergence with human populations (Sakaoka et al. 1994; Ong et al. 1993). However, a high substitution rate of approximately 1.0 × 10⁻⁵ substitutions/site/year was observed for Variola virus (VARV) (Firth et al. 2010). Rates between 1.71 × 10⁻⁴ and 5.81 × 10⁻⁴ substitutions/site/year were observed for Cauliflower mosaic virus (CaMV, the type species of the genus Caulimovirus), approaching those of many RNA and ssDNA viruses (Yasaka et al. 2014). Little is known about the variability within dsDNA plant viruses, especially badnaviruses. Considering that these viruses replicate using reverse transcription, it is possible that their genetic variability is closer to that of retroviruses than to that of dsDNA viruses which replicate using DNA-dependent DNA polymerases.

The purpose of this study was to assess the genetic variability of yam badnaviruses in northeastern Brazil.

Material and methods

Sample collection and storage

Foliar samples with virus-like symptoms such as mosaic, leaf distortion, shoestring and stunting were collected in five yam fields throughout the states of Alagoas (AL), Paraíba (PB) and Pernambuco (PE) in 2011–2012 (Table 1). Leaves were placed in paper bags and stored at −80 °C.

Table 1 Isolates of Dioscorea bacilliform AL virus (DBALV) obtained from yam samples collected in Alagoas (AL), Paraíba (PB) and Pernambuco (PE) states, Brazil

DNA extraction and viral genome amplification

Total DNA extraction was carried out from fresh or frozen (−80 °C) leaves according to Doyle and Doyle (1987). The DNAs were used as templates for PCR amplification using the degenerate primer set Badna-FP (5′- ATG CCI TTY GGI ITI AAR AAY GCI CC-3′) and Badna-RP (5′-CCA YTT RCA IAC ISC ICC CCA ICC-3′), designed based on the RT/RNaseH coding sequences of members of the genus Badnavirus (Yang et al. 2003).

PCR reactions were performed in a volume of 60 μL containing 6 μL of 10X buffer (100 mM KCl, 100 mM Tris–HCl pH 9.0, 1 % Triton-X100), 4.8 μL of dNTPs (2.5 mM), 1.8 μL of MgCl2 (50 mM), 3 μL of each primer (10 μM), 1U Taq DNA polymerase (Life Technologies), 1 μL (10 to 100 ng) of template DNA and 40.2 μL ultrapure water. PCR cycling was performed in an Applied Biosystems 2720 thermal cycler programmed as follows: initial denaturation at 94 °C for 4 min, followed by 35 cycles at 94 °C for 30 s, 50 °C for 30 s and 72 °C for 1 min, and one final cycle at 72 °C for 10 min. Amplicons of the expected size (579 bp) were purified from agarose gels using the GFX PCR DNA and Gel Band Purification kit (GE Healthcare) according to the manufacturer’s instructions. Sequencing was performed by Macrogen Inc. (Seoul, South Korea) directly from the gel-purified PCR products using forward and reverse primers.
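As a quick arithmetic check of the reaction mix described above, the final concentration of each reagent follows from C_final = C_stock × V_added / V_total. The sketch below uses only the stock concentrations and volumes given in the text; the function and variable names are illustrative, and whether the 2.5 mM dNTP stock refers to each nucleotide or to the total is not stated in the text, so the result is reported as given.

```python
# Final reagent concentrations for the 60 µL PCR mix described in the text.
# C_final = C_stock * V_added / V_total
REACTION_VOLUME_UL = 60.0

# name -> (stock concentration, unit, volume added in µL); values from the text
REAGENTS = {
    "dNTPs": (2.5, "mM", 4.8),
    "MgCl2": (50.0, "mM", 1.8),
    "each primer": (10.0, "µM", 3.0),
}


def final_concentration(stock, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Dilution of a stock solution into the total reaction volume."""
    return stock * volume_ul / total_ul


def listed_volume_ul():
    """Sum of the volumes stated in the text (buffer, dNTPs, MgCl2,
    two primers, template, water); the enzyme volume is not stated."""
    return 6.0 + 4.8 + 1.8 + 3.0 + 3.0 + 1.0 + 40.2


if __name__ == "__main__":
    for name, (stock, unit, vol) in REAGENTS.items():
        print(f"{name}: {final_concentration(stock, vol):.2f} {unit}")
    print(f"listed volumes: {listed_volume_ul():.1f} µL of {REACTION_VOLUME_UL:.0f} µL")
```

The listed volumes add up to 59.8 µL, so the remaining 0.2 µL would correspond to the enzyme if it were supplied at 5 U/µL; this last point is an assumption, not a statement from the text.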

Sequence comparisons and phylogenetic analysis

Sequences of the RT/RNaseH region were initially analyzed with the BLASTn algorithm (Altschul et al. 1990) to determine the viral species with which they shared the greatest similarity.

To determine the taxonomic placement of viral isolates, additional pairwise nucleotide sequence comparisons between sequences obtained in this study, other species of badnavirus available in GenBank and a DBALV sequence dataset comprising isolates obtained from D. alata and D. cayenensis plants collected in the states of AL, PB and PE in 2011 (Table 2; Lima et al. 2013) were performed with the program Species Demarcation Tool (SDT) v. 1.0 (Muhire et al. 2013).
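The pairwise comparison and the ≥80 % identity criterion can be illustrated with a minimal sketch. It assumes that identity is computed over aligned columns in which neither sequence has a gap, which is our reading of how SDT scores a pairwise alignment; it does not reproduce the SDT program itself. The toy sequences and all function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_species(a, b, threshold=80.0):
    """Apply the >=80 % RT/RNaseH nucleotide identity criterion."""
    return pairwise_identity(a, b) >= threshold


# Hypothetical, already aligned toy sequences (not real isolates).
TOY = {
    "isolate_1": "ATGCCATTTGGA",
    "isolate_2": "ATGCCGTTTGGA",
    "other": "ATGAAGCTT-CA",
}

if __name__ == "__main__":
    for (n1, s1), (n2, s2) in combinations(TOY.items(), 2):
        ident = pairwise_identity(s1, s2)
        print(f"{n1} vs {n2}: {ident:.1f} % -> same species: {same_species(s1, s2)}")
```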

Table 2 Badnavirus and tungrovirus species used for pairwise comparisons, phylogenetic and recombination analysis

Multiple sequence alignments of the partial nucleotide sequences of the RT/RNaseH region were prepared using MUSCLE (Edgar 2004). Phylogenies were reconstructed using maximum likelihood (ML) with RAxML v.7.0.3 (bioinformatics.oxfordjournals.org/content/22/21/2688.long) and the General Time Reversible (GTR) nucleotide substitution model with gamma-distributed rate heterogeneity. The RT/RNaseH sequences were chosen for analysis because of their essential role in viral replication, and are therefore subject to stricter variability constraints (Geering and Hull 2012). However, no conclusion drawn for this region can be extended to other genomic regions. Robustness of each internal branch was estimated by bootstrapping (1000 replications). Trees were visualized using the TreeView program (Page 1996) and edited using CorelDraw X3.

Recombination analysis

Evidence of non-tree-like evolution was assessed for the RT/RNaseH dataset using the Neighbor-Net method implemented in the program SplitsTree4 (Huson and Bryant 2006). Analysis of potential recombination events was carried out using the recombination detection methods RDP, GENECONV, BootScan, Maximum Chi Square, Chimaera and Sister Scan implemented in the Recombination Detection Program (RDP) ver. 3.0 (Martin et al. 2010) using default parameters, except that sequences were considered linear.

General descriptors of the genetic structure of viral populations

The RT/RNaseH dataset was used to assess the partition of genetic variability and population structure based on Wright’s fixation index (Fst) (Weir 1996).
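The text does not state which Fst estimator was used. As an illustration only, the sketch below uses the sequence-based estimator of Hudson, Slatkin and Maddison, Fst = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within subpopulations and Hb the mean number between them. The choice of estimator, the toy sequences and the function names are assumptions; in practice significance would be assessed by permutation, which is not shown here.

```python
from itertools import combinations, product


def n_diff(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences among sequences of one subpopulation."""
    pairs = list(combinations(pop, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise differences between sequences of two subpopulations."""
    pairs = list(product(pop1, pop2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Fst = 1 - Hw/Hb for two subpopulations (each with >= 2 sequences)."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2.0
    hb = mean_between(pop1, pop2)
    return 1.0 - hw / hb


# Hypothetical toy subpopulations (not real data).
POP_A = ["AAAA", "AAAT"]
POP_B = ["TTTT", "TTTA"]

if __name__ == "__main__":
    print(f"Fst = {hudson_fst(POP_A, POP_B):.3f}")
```

Values near 0 indicate little differentiation between subpopulations, and values near 1 indicate strong structure.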

The main descriptors of molecular variability were estimated, including the total number of segregating sites (s), total number of mutations (Eta), average number of nucleotide differences between sequences (k), nucleotide diversity (π), mutation frequencies, number of haplotypes (h) and haplotype diversity (Hd). These analyses were performed using the program DnaSP v. 5 (Rozas et al. 2003).
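The descriptors listed above have simple definitions that can be sketched directly from an alignment. The example below is not DnaSP; it assumes an alignment without gaps or ambiguous bases and uses hypothetical toy sequences. Function and key names are illustrative.

```python
from collections import Counter
from itertools import combinations


def diversity_descriptors(seqs):
    """Basic variability descriptors for aligned sequences of equal length.

    Assumes no gaps or ambiguity codes in the alignment.
    """
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")

    # Number of distinct nucleotides observed in each alignment column.
    variants = [len(set(col)) for col in zip(*seqs)]
    s = sum(v > 1 for v in variants)        # segregating sites
    eta = sum(v - 1 for v in variants)      # total number of mutations

    # Average number of nucleotide differences over all sequence pairs.
    n_pairs = n * (n - 1) / 2
    total_diff = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    k = total_diff / n_pairs
    pi = k / length                         # nucleotide diversity per site

    # Haplotype number and haplotype diversity (Nei's unbiased estimator).
    counts = Counter(seqs)
    h = len(counts)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    return {"s": s, "eta": eta, "k": k, "pi": pi, "h": h, "hd": hd}


# Hypothetical toy alignment (not real isolates).
TOY_ALIGNMENT = ["ATGCA", "ATGCA", "ATGTA", "CTGTA"]

if __name__ == "__main__":
    for name, value in diversity_descriptors(TOY_ALIGNMENT).items():
        print(f"{name}: {value}")
```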

Parameterization of evolutionary mechanisms

Four types of neutrality tests were used to verify the occurrence of selection in populations: Tajima’s D, Fu and Li’s D* and F* and the test based on the number of synonymous (dS) and non-synonymous (dN) substitutions with the Pamilo-Bianchi-Li (PBL) model. These analyses were performed using the program DnaSP v. 5, with different sets of data considering unique populations or subpopulations separated on the basis of the year of collection, geographical location and host.
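To make the logic of the first of these tests concrete, Tajima's D compares two estimates of diversity: the average number of pairwise differences (k) and the number of segregating sites (S) scaled by a1 = Σ 1/i for i = 1 … n−1. The sketch below follows the standard formula of Tajima (1989); it is not the DnaSP implementation, the function names are illustrative, and the significance of D is not assessed here.

```python
import math


def harmonic_sums(n):
    """Return (a1, a2): sums of 1/i and 1/i^2 for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k."""
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    if s == 0:
        raise ValueError("no segregating sites: D is undefined")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimates, divided by its standard deviation.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical values: 10 sequences, 5 segregating sites, k = 1.0.
    print(f"D = {tajimas_d(10, 5, 1.0):.3f}")
```

A negative D (k lower than expected from S) is compatible with purifying selection or a recent population expansion, while a positive D suggests balancing selection or population contraction; values not significantly different from zero are compatible with neutrality.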

To detect amino acid sites under positive and negative selection, the coding sequence data sets were analyzed using the single-likelihood ancestor counting (SLAC) method implemented in the DataMonkey server (www.datamonkey.org).

Results

Sequence data set

A total of 150 samples were collected, of which 108 were positive for the presence of a badnavirus based on the detection of a PCR fragment of 579 bp corresponding to the RT/RNaseH domains, located in the C-terminal part of ORF3. All positive samples were sequenced, and good-quality contigs of 435 bp were assembled for 48 sequences.

Sequence comparisons and phylogenetic analysis

Using pairwise comparisons of the RT/RNaseH sequences and the ≥80 % nucleotide identity criterion established by the ICTV (Geering and Hull 2012), the 48 isolates described here were assigned to only one badnavirus species: DBALV (Fig. 1). These results were confirmed by phylogenetic analysis, in which all isolates from this study and the 48 isolates from Lima et al. (2013) clustered in a single monophyletic branch with isolates of DBALV, supported by a bootstrap value of 90 % (Fig. 2). These sequences were considered representative of the isolates present in the samples, and are therefore collectively referred to as populations.

Fig. 1 Two-dimensional graphic representing the percentage of pairwise sequence identity for the nucleotide sequences of the RT/RNaseH region between the badnavirus isolates obtained in this work, the isolates reported by Lima et al. (2013) and other badnavirus species with sequences available in GenBank

Fig. 2 Maximum likelihood phylogenetic tree based on the sequence of the RT/RNaseH region of DBALV isolates obtained in this work (indicated in blue), from Lima et al. (2013) (red) and other badnavirus species with sequences available in GenBank (black). Rice tungro bacilliform virus (RTBV, genus Tungrovirus) was used as the outgroup

Recombination analysis

Occurrence of recombination events among DBALV isolates and other badnavirus species was initially tested by Neighbor-Net analysis. The results did not indicate any significant evidence of intra- or interspecies recombination (data not shown).

To further investigate recombination within these sequences, a data set including badnavirus species available in GenBank, the 48 sequences from this study and the 48 sequences from Lima et al. (2013) was analyzed using the RDP3 package. Again, no evidence of recombination was observed, in agreement with the Neighbor-Net result.

General descriptors of the genetic structure of viral populations

The DBALV isolates were initially divided into putative subpopulations according to the year of collection (2011 or 2012), geographic region (Alhandra, Arapiraca, Bonito, Chã Preta or Viçosa) and host species (D. alata or D. cayenensis). The value obtained for the Fst test was statistically significant only for the year of collection, indicating that these two subpopulations were actually structured.

The DBALV population obtained in this work has a high degree of genetic variability, with a nucleotide diversity (π) for the RT/RNaseH region of 0.05342 and a mutation frequency on the order of 10⁻⁴. The 2011 subpopulation is more diverse than the 2012 subpopulation, with higher values for every descriptor (Table 3).

Table 3 Genetic structure of the Dioscorea bacilliform AL virus (DBALV) population obtained from Dioscorea spp. from northeastern Brazil

Parameterization of evolutionary mechanisms

Neutrality tests were used to assess what kind of selection or demographic forces are acting upon the DBALV population. The values obtained for Tajima’s D, Fu and Li’s D* and F* tests were not statistically significant (Table 4). The SLAC method detected dN/dS values <1, with 58 sites found to be evolving under negative selection versus 0 sites evolving under positive selection. These results are indicative of purifying selection acting on this population.

Table 4 Results of the different tests of neutrality performed for the reverse transcriptase domain of ORF3 of the population of Dioscorea bacilliform AL virus (DBALV) obtained from Dioscorea spp. in northeastern Brazil

Discussion

Diseases caused by viruses of the genus Badnavirus are responsible for major damage to yam crops in many tropical regions, including Brazil (Seal and Muller 2007; Eni et al. 2008; Kenyon et al. 2008; Bousalem et al. 2009; Lima et al. 2013). A recent study of badnaviruses infecting yam in northeastern Brazil suggested that DBALV may be the only species occurring in this region, and that its sequence variability is high (Lima et al. 2013). Here, we have studied the genetic structure and variability of yam badnavirus populations in detail. Our results showed that DBALV was the only yam-infecting badnavirus detected, corroborating the data obtained by Lima et al. (2013) and reinforcing the view that DBALV is the only badnavirus infecting this crop in northeastern Brazil. Factors such as the constant exchange of infected plant material between areas may be contributing to the predominance of DBALV in this region. In contrast, studies performed in Africa and the South Pacific suggest the presence of up to twelve species of badnaviruses in yam (Eni et al. 2008; Kenyon et al. 2008; Bousalem et al. 2009).

Recombination events can result in changes in the biological properties of a virus, such as the expansion of its host range (Gibbs and Weiller 1999), virulence (Zhou et al. 1997; Pita et al. 2001) and fitness (Monci et al. 2002). Recombination among members of the family Caulimoviridae is of particular interest, as these viruses are able to integrate into the host plant genome as endogenous pararetroviral sequences (EPRVs) (Squires et al. 2011). In some cases, EPRVs can be activated by homologous recombination, resulting in the episomal form of the virus which causes systemic infection of the host. Homologous recombination events have been reported for the activation of the episomal genome of CaMV, Banana streak virus (BSV) and Tobacco vein clearing virus (TVCV) (Gal et al. 1991; Ndowora et al. 1999; Froissart et al. 2005; Staginnus and Richert-Poggeler 2006). The recombination rate of a CaMV population was experimentally estimated to be on the order of 10⁻⁵, indicating that it is a very frequent event for this virus (Froissart et al. 2005). Nevertheless, we did not find evidence of recombination among the DBALV isolates. A recombination event was detected between two yam-infecting badnaviruses, Dioscorea esculenta bacilliform virus A (DeBV-A) and Dioscorea esculenta bacilliform virus B (DeBV-B), and was located in the RT region (Bousalem et al. 2009). The relatively short length of the fragment analyzed in our study (435 bp, in a genome of over 7.0 kbp) could be one of the reasons for the failure to detect any recombination event.

Analysis of the DBALV population demonstrated that its genetic variability is indeed very high (Table 3), at a level similar to that of RNA viruses. It was also slightly higher than that found for a population of genotype F of Hepatitis B virus (a human dsDNA, reverse-transcribing virus in the family Hepadnaviridae) obtained in the northern region of Brazil (Mello et al. 2013; Table 3). Interestingly, the variability of the DBALV 2011 subpopulation was higher than that of the 2012 subpopulation. This could reflect local variations in cultivated and non-cultivated host distribution at sampling sites or the presence of distinct vector populations.

Mutation and substitution rates commonly observed for retroviruses are considerably higher than those observed for non-reverse transcribing dsDNA viruses, and most likely reflect differences between the fidelity of reverse transcriptases and DNA-dependent DNA polymerases (Ong et al. 1993; Sakaoka et al. 1994; McGeoch et al. 2000; Zhou and Holmes 2007; Mello et al. 2013). Recently, substitution rates between 1.71 × 10⁻⁴ and 5.81 × 10⁻⁴ subs/site/year were observed for CaMV (Yasaka et al. 2014). We estimated a mutation frequency for the DBALV population similar to the value estimated for the hepadnavirus HBV (Mello et al. 2013). Although the mutation frequencies calculated for DBALV and HBV cannot be directly compared with the substitution rates determined by Yasaka et al. (2014), it is noteworthy that the mutation frequency determined for DBALV is very high. Although there is no precise measure of the fidelity of the hepadnavirus RT, reverse transcriptases are known to be highly error prone in all retroviruses and other retroelements for which a rate has been estimated (Drake et al. 1998; Svarovskaia et al. 2003).

Neutrality tests and the dN/dS ratio suggested that the DBALV population may be under purifying selection or undergoing a recent expansion, although the values of Tajima’s D and Fu and Li’s D* and F* were not statistically significant (Table 4). Purifying selection acting upon the RT/RNaseH region is consistent with the critical function performed by this protein and its requirement for DBALV survival. Purifying selection was also the major evolutionary force acting on the reverse transcriptase domain of HBV in humans (Osiowy et al. 2006). However, the occurrence of mutations is not sufficient to fully explain the genetic variability of the DBALV population, reinforcing a possible influence of other evolutionary forces such as migration and recombination.

We conclude that the high genetic variability observed for DBALV populations may reflect its ability to evolve rapidly under the yam cultivation system implemented in Brazil. This should be taken into account in cases of implementation of control measures based on genetic resistance, as well as in the use of detection methods based on genome sequence, since both could fail due to the rapid evolution of this virus.