Introduction

The family Geminiviridae consists of viruses with either monopartite or bipartite circular single-stranded (ss)DNA genomes, with each component individually encapsidated in twinned quasi-isometric (geminate) 22-38 nm virions [3, 17, 55]. The family contains nine genera [46, 48, 54], of which two are relevant to this communication: Begomovirus and Mastrevirus. Begomoviruses and mastreviruses are transmitted in a circulative manner by the whitefly Bemisia tabaci and by leafhoppers, respectively. The genomes/genomic components of geminiviruses are typically 2600-3600 nucleotides in length [5, 54].

The family Nanoviridae encompasses viruses with multicomponent circular ssDNA genomes that are individually encapsidated within isometric 17-20 nm virions. The family is divided into two genera, Nanovirus and Babuvirus, containing viruses transmitted by aphids in a circulative, non-propagative manner [50, 51]. Members of the genus Nanovirus infect dicotyledonous host plants whereas those of the genus Babuvirus infect monocotyledonous plants. Typically, the genomes of nanoviruses have eight components and the genomes of babuviruses consist of six components. The genomic components of viruses in both genera typically range in size from 970 to 1116 nucleotides [51]. The bona fide genome components of a nanovirus genome share two regions of sequence similarity known as the common region stem-loop (CR-SL) and the common region major (CRM; Figure 1). The DNA-R component of nanoviruses encodes a replication-associated protein (Rep) [7, 12, 13, 15, 16, 25, 38, 42] which is involved in replicating all bona fide genome components.

Fig. 1

Illustration of the Rep-encoding DNA molecules of nanovirus- and geminivirus-associated alphasatellites in comparison to the DNA-R components of nanoviruses. The diagrams include the position of the replication-associated protein (Rep), the TATA box of the presumed Rep promoter, an adenine rich sequence (A-rich) in geminivirus-associated alphasatellites and the common region major and common region stem-loop of nanovirus DNA-R components. All components have a predicted stem-loop structure (at position zero) with, in most cases, the nonanucleotide sequence TAGTATTAC forming part of the loop. This likely forms the origin of virion-strand DNA replication that is nicked by Rep to initiate rolling-circle replication

Viruses of the family Nanoviridae, and some begomoviruses and mastreviruses in the family Geminiviridae, may be associated with additional circular ssDNA components that resemble the DNA-R components of nanoviruses; these will hereinafter be referred to as alphasatellites. In common with the DNA-R component of nanoviruses, the alphasatellites encode a Rep protein and have a predicted stem-loop structure with, in most cases, the nonanucleotide sequence TAGTATTAC forming part of the loop (Figure 1). Alphasatellites associated with nanoviruses have a similar size to that of typical nanovirus components (~ 1100 nucleotides), lack the CR-SL and CRM of their helper viruses and are unable to trans-replicate the bona fide genome components of these helper viruses [18, 43, 44, 45]. Alphasatellites have also been identified in association with begomoviruses and, in one reported case, a mastrevirus [4, 24]. Typically comprising ~ 1400 nucleotides, the geminivirus-associated alphasatellites are significantly larger than both nanovirus components and nanovirus alphasatellites [4]. This size difference, primarily due to the presence in geminivirus alphasatellites of an adenine-rich region of sequence and a larger rep gene, is thought to be necessary for the encapsidation of geminivirus alphasatellites by their helper viruses. Geminiviruses have strict restrictions on the size of the circular ssDNA molecules that they can encapsidate. Being half the geminivirus genome length (~ 1400 nucleotides), geminivirus alphasatellites could possibly be encapsidated in isometric (half geminate) virions [8]. For both nanovirus- and geminivirus-associated alphasatellites, the sequence relatedness to, and similarity in structure with, nanovirus DNA-R components has led to the hypothesis that the satellites evolved from DNA-R components that were “captured” following co-infections [26, 40].

It is interesting to note that most geminivirus-associated alphasatellites described from the Old World have been associated with monopartite begomovirus infections that frequently include betasatellites [39, 52, 56]. In the New World, where monopartite begomoviruses are very rare, betasatellites have not been identified and alphasatellites have only ever been reported to occur in association with bipartite begomovirus infections [10, 32, 37].

The precise impact of alphasatellites on begomovirus and nanovirus infections remains unclear. The presence of nanoalphasatellites in some nanovirus infections has been associated with reduced nanovirus infectivity [45] while geminialphasatellites have been associated with reduced or exacerbated symptoms [27, 32] and/or reduced titer of genomic or betasatellite DNA [19, 21]. Whereas the encoded Rep proteins of alphasatellites associated with the begomoviruses Gossypium darwinii symptomless virus and Gossypium mustelinium symptomless virus suppress post-transcriptional gene silencing [30], those encoded by various other begomovirus alphasatellites are suppressors of transcriptional gene silencing [1]. Due to the rising number of alphasatellites being described, owing mainly to both increasing interest in begomoviruses and nanoviruses, and the ease with which these molecules can be isolated and characterized, there is an urgent need for a robust and workable system of nomenclature and classification for these molecules. Since alphasatellites rely on a helper virus for their spread and are therefore not independent entities, there is very little biological data that can be unequivocally attributed to any of these molecules. It is therefore necessary to base an alphasatellite classification system primarily on information extracted from the nucleotide sequences of these molecules. Such a system could then be adjusted once biological evidence becomes available.

We communicate the creation of the family Alphasatellitidae, to which alphasatellites have been assigned, and the establishment of two subfamilies based on clear differences between geminivirus- and nanovirus-associated alphasatellites. These differences include genome length and a distinctive A-rich region downstream of the Rep coding region of the alphasatellites that are associated with geminiviruses. The two subfamilies are:

  1) Geminialphasatellitinae: geminivirus-associated alphasatellites

  2) Nanoalphasatellitinae: nanovirus-associated alphasatellites

Geminialphasatellitinae

Sequences of geminivirus-associated alphasatellites were downloaded from GenBank (1st May 2017). These were checked to ensure that they were complete molecules, had intact Rep open reading frames and were not sub-genomic begomovirus genome components. The 628 geminivirus-associated alphasatellite sequences that remained after filtering were analysed with SDT v1.2 [29]. These sequences were aligned using MUSCLE [9] and the alignment was used to infer a maximum likelihood phylogenetic tree using IQ-TREE [31] with the nucleotide substitution model GTR+I+G4 (adjudged to be the best fitting by ModelFinder [20]). This tree was rooted with representative DNA-R sequences of nanoviruses (AB027511, AF102780, AJ290434, AJ749894, EF546807, GU553134, HE654123, JX867550, KC978949, KC979035, KC979054, KX534388, KX534389, KX534390, KX534391, MF535450).

The distribution of the 197506 pairwise identities between every possible pair of these 628 sequences had troughs at ~ 70% and ~ 88% pairwise identity (Figure 2), indicating that these values could be used as genus and species demarcation thresholds, respectively, thus yielding a classification of these sequences with a minimal degree of conflict. These demarcation thresholds are broadly consistent with the alphasatellites in the different species and genera having distinctive helper viruses, geographical distributions and phylogenetic placements (Figure 3; Figure 4).
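
As an illustration of how such troughs can be located, the following minimal sketch bins pairwise identities into 1% bins and reports local minima of the resulting histogram. It is not the procedure implemented in SDT; the function names, the bin width and the `window` parameter are assumptions made for this example, and the toy data are synthetic.

```python
def histogram(identities):
    """Count pairwise identities (percent, 0-100) in 1% bins."""
    counts = [0] * 101
    for value in identities:
        counts[min(int(value), 100)] += 1
    return counts


def troughs(counts, window=3):
    """Return bins that are local minima flanked by higher bins on both sides.

    `window` (an assumed parameter) is the number of bins inspected on
    each side of a candidate bin.
    """
    found = []
    for i in range(window, len(counts) - window):
        neighbourhood = counts[i - window:i + window + 1]
        left_peak = max(counts[i - window:i])
        right_peak = max(counts[i + 1:i + window + 1])
        if (counts[i] == min(neighbourhood)
                and counts[i] < left_peak
                and counts[i] < right_peak):
            found.append(i)
    return found


# Synthetic toy distribution with dips placed at 70% and 88% (not real data).
toy = [p + 0.5
       for p in range(60, 99)
       for _ in range(min(abs(p - 70), abs(p - 88)) + 1)]
print(troughs(histogram(toy)))
```

With real data, plateaus and sparsely populated bins can produce several neighbouring candidates, so the troughs would still need to be inspected visually against the distribution plot.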

Fig. 2

Distribution of pairwise identities of geminivirus-associated alphasatellites determined using SDT v1.2 [29]

Fig. 3

A ‘three color’ pairwise identity matrix inferred using SDT v1.2 [29] showing that both the genus demarcation threshold of 70% and the species demarcation threshold of 88% are supported

Fig. 4

Maximum likelihood phylogenetic tree of representative geminivirus-associated alphasatellite sequences from each species inferred using IQ-TREE [31] with GTR+I+G4 chosen as the best-fit model. Branches with less than 60% bootstrap support have been collapsed

The name Alphasatellitidae derives from the name alphasatellite, which is now in common usage for this group of satellites. Upon first identification, the begomovirus-associated alphasatellites were called DNA 1 in recognition of their relatedness to the Rep-encoding components of nanoviruses (DNA-R), which at the time were called DNA 1 [26]. The alphasatellite genus names that we have chosen are derived from the names of the type species; for example, the genus name Colecusatellite derives from the name of the first begomovirus alphasatellite identified, Cotton leaf curl Multan alphasatellite. The alphasatellite species names are, whenever possible, derived from the name of the helper virus with which they were first identified; for example, cotton leaf curl Multan alphasatellite was first identified in a cotton plant infected with cotton leaf curl Multan virus.

Based on the above criteria, four genera and 43 species are established in the subfamily Geminialphasatellitinae. The classification of the genera and species in the subfamily Geminialphasatellitinae is summarized in Table 1.

Table 1 Summary of the genera and species in subfamilies Geminialphasatellitinae and Nanoalphasatellitinae

The 43 Geminialphasatellitinae species

  1. All have a distinctive organization consisting of a single conserved Rep-encoding gene in the virion-sense, and a predicted hairpin structure at their presumed origin of virion-strand replication that contains, in most cases, a TAGTATTAC loop sequence.

  2. With the exception of two alphasatellites, all have been shown to be associated with a geminivirus.

For two alphasatellite sequences that were included in the analysis no helper begomovirus is known. Sequence JX458742 (species Dragonfly associated alphasatellite) was isolated from a dragonfly caught in Puerto Rico and no virus could conclusively be shown to be associated with the molecule. Sequence KT099172 (species Whitefly associated Guatemala alphasatellite 1) was isolated from a B. tabaci whitefly originating from Guatemala and no virus could conclusively be shown to be associated with the molecule.

Guidelines for classification of new geminialphasatellite species

Pairs of complete geminialphasatellite sequences with ≥ 88% pairwise identity, calculated using pairwise MUSCLE alignments with similarities calculated ignoring sites with gaps (as implemented in SDT v1.2 [29]), should be considered as members of the same species. Those that have < 88% pairwise identity can be considered as new species when this is coupled with good phylogenetic support.
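
The identity calculation described above can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been aligned pairwise (for instance with MUSCLE) and that gaps are written as "-"; the function names are hypothetical and the code is not taken from SDT.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # sites with gaps are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return 100.0 * matches / compared


def same_species(aligned_a, aligned_b, threshold=88.0):
    """Apply a species demarcation threshold (88% for geminialphasatellites)."""
    return pairwise_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments (not real alphasatellite sequences).
print(pairwise_identity("AC-TAG", "ACGTAC"))
```

Passing `threshold=80.0` would apply the nanoalphasatellite species threshold instead.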

We recommend that the following steps, similar to those outlined for geminiviruses [6, 28, 46], be taken to resolve conflicts in cases where 1) a complete geminialphasatellite sequence shares ≥ 88% pairwise identity with sequences that have been assigned to two different species; or 2) a complete geminialphasatellite sequence shares ≥ 88% genome-wide pairwise identity with one or a few sequences assigned to a particular species, even though it shares < 88% identity with the majority of sequences in that species:

  1) The new geminialphasatellite sequence should be considered as belonging to the species containing sequences with which it shares the highest percentage pairwise identity.

  2) The geminialphasatellite should be classified as belonging to any species with which it shares ≥ 88% pairwise identity to any one sequence formerly classified as belonging to that species even if it has < 88% pairwise identity to all other sequences classified as belonging to that species.
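
The two rules above can be expressed as a simple decision procedure. The sketch below is one possible reading of them, assuming that the pairwise identities between the new sequence and all previously classified sequences are already available; the function name, the input layout and the species labels are hypothetical.

```python
def assign_species(identities_to_known, threshold=88.0):
    """Suggest a species for a new sequence.

    `identities_to_known` is an iterable of (species_name, percent_identity)
    pairs, one per previously classified sequence. The species holding the
    single most similar sequence is returned if that identity reaches the
    threshold; otherwise None is returned, meaning the sequence is a
    candidate for a new species (subject to phylogenetic support).
    """
    best_species = None
    best_identity = -1.0
    for species, identity in identities_to_known:
        if identity > best_identity:
            best_species, best_identity = species, identity
    if best_species is not None and best_identity >= threshold:
        return best_species
    return None


# Case 1: hits at or above the threshold in two species; the highest identity wins.
print(assign_species([("species A", 89.1), ("species B", 91.4)]))

# Case 2: only one member of a species reaches the threshold.
print(assign_species([("species A", 88.6), ("species A", 84.0), ("species A", 85.2)]))
```

For nanoalphasatellites the same logic would apply with `threshold=80.0`.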

Nanoalphasatellitinae

Sequences of nanovirus-associated alphasatellites were downloaded from GenBank (1st May 2017). These sequences were checked to make sure they were complete and had intact Rep open reading frames. The 54 filtered nanovirus-associated alphasatellites were analysed with SDT v1.2 [29]. The sequences were aligned using MUSCLE [9] and the resulting alignment was used to infer a maximum likelihood phylogenetic tree using IQ-TREE [31] with the nucleotide substitution model TIM2+F+G4 (adjudged the best fitting model for these sequences by ModelFinder [20]). The tree was rooted with representative DNA-R sequences of nanoviruses (AB027511, AF102780, AJ290434, AJ749894, EF546807, GU553134, HE654123, JX867550, KC978949, KC979035, KC979054, KX534388, KX534389, KX534390, KX534391, MF535450).

The distribution of 1326 pairwise identities obtained from every possible pair of the 54 nanovirus-associated alphasatellites had troughs at ~ 67% and ~ 80% (Figure 5), and these were chosen respectively as genus and species demarcation cut offs that would yield the minimum number of classification conflicts (Figure 5). The species demarcation threshold implies the existence of at least 19 nanovirus-associated alphasatellite species and the genus demarcation threshold implies the existence of at least seven genera (Figures 6 and 7). The seven genera and 19 species are supported by the phylogenetic clustering of the nanovirus-associated alphasatellite sequences (Figure 7). A summary of the genera and species in the subfamily Nanoalphasatellitinae is provided in Table 1.
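
One way to see how a demarcation threshold translates into a number of groups is to link every pair of sequences whose identity reaches the threshold and count the resulting connected groups. The sketch below does this with a small union-find structure. It is an assumption made for illustration (the grouping method is not specified here beyond the thresholds themselves), and the sequence names and identity values are invented.

```python
def cluster_by_threshold(names, identity, threshold):
    """Group sequences connected by at least one pair with identity >= threshold.

    `identity` maps (name_a, name_b) tuples to percent pairwise identity.
    Returns a sorted list of sorted groups.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), value in identity.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Invented identities for four hypothetical sequences.
names = ["s1", "s2", "s3", "s4"]
identity = {
    ("s1", "s2"): 90.0, ("s1", "s3"): 72.0, ("s2", "s3"): 70.0,
    ("s1", "s4"): 50.0, ("s2", "s4"): 52.0, ("s3", "s4"): 55.0,
}
print(len(cluster_by_threshold(names, identity, 80.0)), "species-level groups")
print(len(cluster_by_threshold(names, identity, 67.0)), "genus-level groups")
```

Linking on any single pair above the threshold mirrors the rule that a sequence joins a species if it reaches the threshold with any one member; groups formed this way should still be checked against the phylogeny.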

Fig. 5

Distribution of pairwise identities of nanovirus-associated alphasatellites determined using SDT v1.2 [29]

Fig. 6

A ‘three color’ pairwise identity matrix inferred using SDT v1.2 [29] showing that both the genus demarcation threshold of ~ 67% and the species demarcation threshold of 80% are supported

Fig. 7

Maximum likelihood phylogenetic trees of the nanovirus-associated alphasatellite sequences inferred using IQ-TREE [31] with TIM2+F+I+G4 chosen as the best-fit model. Branches with less than 60% bootstrap support have been collapsed. The maximum likelihood phylogenetic tree on the left is a guide for genus demarcation, with the cyan line showing the approximate 67% threshold

Guidelines for classification of new nanoalphasatellite species

Pairs of nanoalphasatellite sequences with ≥ 80% pairwise identity, calculated using pairwise MUSCLE alignments with similarities calculated ignoring sites with gaps (as implemented in SDT v1.2 [29]), should be considered as members of the same species. Those that have < 80% pairwise identity can be considered as new species when this is coupled with good phylogenetic support.

We recommend that the following steps, similar to those outlined for geminiviruses [6, 28, 46], be taken to resolve conflicts in cases where 1) a complete nanoalphasatellite sequence shares ≥ 80% pairwise identity with sequences that have been assigned to two different species; or 2) a complete nanoalphasatellite sequence shares ≥ 80% genome-wide pairwise identity with one or a few sequences assigned to a particular species, even though it shares < 80% identity with the majority of sequences in that species:

  1) The new nanoalphasatellite sequence should be considered as belonging to the species containing sequences with which it shares the highest percentage pairwise identity.

  2) The nanoalphasatellite should be classified as belonging to any species with which it shares ≥ 80% pairwise identity to any one sequence formerly classified as belonging to that species even if it has < 80% pairwise identity to all other sequences classified as belonging to that species.

Unassigned species in the family Alphasatellitidae

Sequence M29963 (Table 1; species Coconut foliar decay alphasatellite) was isolated from a coconut palm affected by coconut foliar decay disease [36], a disease earlier shown to be associated with a circular ssDNA virus [14, 33, 34, 35]. However, no virus infecting coconut has so far been identified or characterized, even though virions have been described [35]. M29963 does not have all the hallmarks of either Geminialphasatellitinae or Nanoalphasatellitinae alphasatellites in that it has a size (1291 nt) intermediate between those of the alphasatellites assigned to these subfamilies and lacks an A-rich sequence. Furthermore, it was isolated from a monocotyledonous host. Two geminialphasatellites have been isolated from monocotyledonous plants and both of these have features typical of the Geminialphasatellitinae [24]; and twelve nanoalphasatellites have been isolated from bananas and cardamom, all of which have similarities to other nanoalphasatellites [2, 11, 18, 25, 53]. We therefore propose that this molecule represents an unassigned species in the family Alphasatellitidae. It is likely that it is a member of a third subfamily that could be formally proposed once more member sequences have been identified.

Concluding remarks

The last decade has seen a flurry of activity in the identification of circular ssDNA molecules and viruses, driven primarily by multiple displacement amplification techniques coupled with affordable sequencing (both Sanger and high throughput). This has resulted in the identification of a large number of sequences corresponding to diverse viral and virus-like elements from symptomatic and asymptomatic hosts, and from environmental sampling using viral metagenomics approaches. The communication by Simmonds et al. [41] addressed the classification of diverse groups of viral sequences derived from metagenomics approaches and this has resulted in significant changes in viral classification. For example, new families have been established to accommodate virophages and satellite viruses [23] and various novel groups of circular Rep-encoding ssDNA viruses derived from environmental sources (with unknown hosts or pathology) [22, 47, 49]. In line with these significant changes in the classification of viruses and virus-like elements, we communicate the establishment of a new family, Alphasatellitidae, with two subfamilies, Geminialphasatellitinae and Nanoalphasatellitinae, for geminivirus- and nanovirus-associated alphasatellite molecules.