Article

Presence, Location and Conservation of Putative G-Quadruplex Forming Sequences in Arboviruses Infecting Humans

Department of Molecular Medicine, University of Padua, Via A. Gabelli 63, 35121 Padua, Italy
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(11), 9523; https://doi.org/10.3390/ijms24119523
Submission received: 21 March 2023 / Revised: 12 May 2023 / Accepted: 12 May 2023 / Published: 30 May 2023
(This article belongs to the Special Issue Bioinformatics of Unusual DNA and RNA Structures)

Abstract

Guanine quadruplexes (G4s) are non-canonical nucleic acid structures formed by guanine (G)-rich tracts that assemble into a core of stacked planar tetrads. G4s are found in the human genome and in the genomes of human pathogens, where they are involved in the regulation of gene expression and genome replication. G4s have been proposed as novel pharmacological targets in humans and their exploitation for antiviral therapy is an emerging research topic. Here, we report on the presence, conservation and localization of putative G4-forming sequences (PQSs) in human arboviruses. The prediction of PQSs was performed on more than twelve thousand viral genomes, belonging to forty different arboviruses that infect humans, and revealed that the abundance of PQSs in arboviruses is not related to the genomic GC content, but depends on the type of nucleic acid that constitutes the viral genome. Positive-strand ssRNA arboviruses, especially Flaviviruses, are significantly enriched in highly conserved PQSs, located in coding sequences (CDSs) or untranslated regions (UTRs). In contrast, negative-strand ssRNA and dsRNA arboviruses contain few conserved PQSs. Our analyses also revealed the presence of bulged PQSs, accounting for 17–26% of the total predicted PQSs. The data presented highlight the presence of highly conserved PQSs in human arboviruses and present non-canonical nucleic acid structures as promising therapeutic targets in arbovirus infections.

1. Introduction

Vector-borne diseases are bacterial, viral, or parasitic infections transmitted to the human host through the bite of infected arthropod species, such as mosquitoes, ticks, midges, and flies [1]. In 2020, the World Health Organization (WHO) declared that vector-borne diseases accounted for approximately 20% of all infectious diseases [2]. In the case of viral infections, arthropod-borne virus (arbovirus) infections are a global threat, as travel and trade contribute to the spread of vectors and viruses over large geographical areas, and climate change favours disease transmission [3,4,5]. Arboviruses are a large group of RNA viruses belonging to different families and genera, of which about fifty members are known to infect humans. A few members of arboviruses cause mild flu-like symptoms and joint pain. The vast majority of Arboviruses cause severe and life-threatening disease, with mortality rates as high as 50% [2,6,7]. Specific anti-arbovirus treatments are not available and vaccines have been developed against less than 10% of the arboviruses. As a result, arbovirus infections are controlled solely by prevention strategies to hinder the spread of viruses in the environment and among humans. Efforts to prevent and treat vector-borne viral diseases must be intensified. In fact, in 2022, WHO launched the Global Arbovirus Initiative to promote all initiatives to control arboviruses with epidemic and pandemic potential [8].
Approved antiviral drugs target viral proteins involved in key viral steps, from viral entry to viral gene expression and genome replication. Direct targeting of nucleic acid is very sporadic, as achieving selective targeting has always been extremely challenging. Nucleic acids have been shown to fold into structures alternative to the classical double helix, which do not obey the Watson-Crick hybridization canon and are therefore defined as non-canonical nucleic acid structures. Among these non-canonical structures, G-quadruplexes (G4s) have been shown to play key biological roles both at the human and viral level [9,10,11,12,13,14,15]. G4s can form in G-rich sequences of DNA or RNA, in which four guanines (Gs) are linked by Hoogsteen-type hydrogen bonds to form planar square structures called G-quartets. The stacking of successive G-quartets leads to the formation of the G4 structure, which is supported and stabilized by physiological cations, such as potassium or sodium [16]. G4s have been identified primarily in mammalian cells, but more recently their presence in viruses, bacteria and parasites has also been investigated [17]. At the viral level, G4s are involved in the control of key viral processes, such as transcription, genome replication and the induction or maintenance of the viral latency [12].
Several algorithms have been validated to predict the presence and distribution of putative quadruplex (G4)-forming sequences (PQSs) in genomes [18]. The different algorithms calculate the presence of PQSs or the G4 folding propensity, taking into account the number of Gs and G islands as well as the loop length. G4 prediction algorithms have been trained on the human genome [18,19]. To date, few bioinformatic analyses have predicted PQSs in microorganisms using tools such as the well-established QGRS and G4Hunter [18,19,20,21,22,23,24,25]. Viral genomes have been shown to contain G4s that do not strictly follow the rules of canonical G4s, but include bulges, mismatches and stem loops [26,27]. Therefore, PQS prediction algorithms that take into account the possibility of G4s folding from imperfect G-runs should be used to better estimate the presence of PQSs on viral genomes. Recently, Bioconductor’s pqsfinder tool was released as a flexible tool for analyzing putative PQSs that also contain bulges or mismatches [28].
This work shows that arboviruses embed both canonical and bulged PQSs and that the different viral families show different patterns of PQSs enrichment or depletion. The conservation of each predicted PQS among virus isolates was also analysed to correlate the presence of highly conserved putative G4 sequences with their possible biological role. Our data provide new information on the evolutionary conserved PQSs among human arboviruses, provide insights into unexplored aspects of arbovirus biology and reveal innovative anti-arbovirus targets.

2. Results

2.1. Prediction of PQSs in Human Arboviruses

Arboviruses were grouped according to the Expasy ViralZone and NCBI taxonomy classifications [29,30]. A total of 40 different arboviruses infecting humans was retrieved, which were further divided into three groups on the basis of the type of nucleic acid constituting their genomes: 1 dsRNA, 16 negative-strand ssRNA and 23 positive-strand ssRNA (Table 1). For each virus, the complete set of sequenced genomes was downloaded from the NCBI database. Partially sequenced and unverified genomes, as well as genomes containing nucleotide strings longer than five nucleotides without base assignment (i.e., NNNNN) were not considered for further analysis. For each virus, the nucleotide sequence to be considered as reference genome was retrieved from the NCBI Reference repository. The accession numbers are listed in Table 1.
First, for each virus, the GC content of the reference genome alone and of all the sequenced virus isolates was calculated and expressed as an average value (Table 1). Taken together, the arboviruses have an average GC content of 44%. Positive-strand ssRNA virus reference genomes have an average GC content around 50% (48–55%), whereas negative-strand ssRNA virus reference genomes display GC contents that span from 33% to 50%. The Banna virus, which is the only arbovirus with a dsRNA segmented genome, has a GC content of 37% to 42%, depending on the segment. Analysis of the mean GC content of all sequenced genomes per virus provided data on the conservation of G and C residues, possibly involved in G4 formation. For the majority of positive-strand ssRNA viruses, the GC content was conserved, with the exception of Dengue strains 1 and 2. Notably, Dengue 1 and 2 are also the viruses with the highest number of sequenced genomes among all analyzed viruses, 2095 and 1764, respectively. The analysis of the negative-strand ssRNA viruses showed that the family members with segmented genomes showed lower GC content conservation (e.g., Crimean Congo hemorrhagic fever virus), whereas the members with a genome composed of a single linear molecule of RNA (e.g., Chandipura virus) showed very high GC content conservation. Among segmented negative-strand ssRNA viruses, the S segment, coding for non-structural proteins, was the least conserved. Once again, the viruses with the highest numbers of sequenced genomes showed the highest variability in GC content values. The conservation of the GC content in the segments of the Banna virus, the only dsRNA virus in this analysis, was very segment-dependent. Segment 9, which codes for the outer-capsid protein VP9, was the least conserved. In this case too, we could analyze more sequences from this segment than from the other eleven. This may be because VP9 has been studied and recognized as the protein involved in host attachment and viral internalization [31].
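The GC content comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used Biostrings in R); the function names are hypothetical, and the choice to ignore unassigned bases (N) is an assumption made here for illustration.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of one genome, counting only assigned bases."""
    seq = sequence.upper()
    # Assumption: ambiguous or unassigned symbols such as N are skipped.
    assigned = [base for base in seq if base in "ACGTU"]
    if not assigned:
        raise ValueError("sequence contains no assigned bases")
    gc = sum(base in "GC" for base in assigned)
    return 100.0 * gc / len(assigned)


def mean_gc_content(sequences: list[str]) -> float:
    """Average the per-genome GC percentages across all isolates of a virus."""
    return sum(gc_content(s) for s in sequences) / len(sequences)
```

Comparing `gc_content(reference)` with `mean_gc_content(isolates)` gives a rough view of how well the GC content of a reference genome is preserved across isolates.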
Next, the pqsfinder algorithm was run on all reference genomes. The algorithm was set up to recognize sequences with G-runs containing at least two G residues. Each PQS could have loops with a maximum length of 12 nucleotides and a maximum of one loop with a length of zero nucleotides. The pqsfinder algorithm was used to identify canonical PQSs and PQSs harboring a single bulge (Table 2). PQSs containing mismatches (non-G bases in the G-quartet) were excluded from the prediction. The minimum acceptable score was set at 12, in order to exclude PQSs that were characterized by short G-runs, together with long loops and a bulge, and were therefore unlikely to form. Both the positive and the negative RNA strands were analyzed for the presence of PQSs, as they represent two different stages of viral infection and are both essential in the viral replication cycle (Table 2) [32].
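To make the search pattern concrete, the following Python sketch scans both strands for canonical PQSs with a regular expression. It is a deliberately simplified stand-in, not pqsfinder: it assumes four G-runs of at least two guanines separated by loops of 1 to 12 nucleotides, and it omits bulges, zero-length loops and scoring. All names are hypothetical.

```python
import re

# Four G-runs (>= 2 G each) separated by three loops of 1-12 nt.
# Loops are matched lazily, so the shortest loop that works is preferred.
CANONICAL_PQS = re.compile(r"G{2,}(?:[ACGT]{1,12}?G{2,}){3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence written with A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_canonical_pqs(sequence: str) -> list[tuple[str, int, int, str]]:
    """Return (strand, start, end, motif) for non-overlapping canonical PQSs.

    Coordinates are 0-based, half-open, and always refer to the input strand.
    """
    seq = sequence.upper().replace("U", "T")  # treat RNA and DNA letters alike
    hits = []
    for match in CANONICAL_PQS.finditer(seq):
        hits.append(("+", match.start(), match.end(), match.group()))
    for match in CANONICAL_PQS.finditer(reverse_complement(seq)):
        # Map coordinates on the complementary strand back to the input strand.
        hits.append(("-", len(seq) - match.end(), len(seq) - match.start(), match.group()))
    return hits
```

Because this pattern is stricter than the settings described above, it would be expected to report fewer hits than an imperfection-tolerant search.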
The analysis that considered canonical and bulged PQSs showed that the reference genomes of positive-strand ssRNA viruses, Flavivirus and Alphavirus, are particularly enriched in PQSs. In particular, the Japanese encephalitis, the Langat, the Louping ill, the Tick-borne encephalitis and Zika viruses were predicted to embed more than one hundred PQSs in their genomes. In general, the members of the Flavivirus family embed an average of 94 PQSs per genome. The second group of positive-strand RNA viruses, the Alphaviruses have an average of 67 PQSs per genome, with the Semliki Forest virus topping the list with 95 PQSs. Negative-strand ssRNA viruses, with the genome consisting of a single linear RNA strand, showed an average of 37 PQSs, with the Chandipura virus and the Australian bat lyssavirus showing the highest number of PQSs (i.e., 46 and 45, respectively) and the non-Indiana Vesicular stomatitis virus strains showing the fewest (i.e., 29). Segmented negative-strand ssRNA viruses and the dsRNA Banna virus, although not so different in GC content from the other viruses, were predicted to have very few PQSs (average of 10 and 2, respectively). A closer look at the PQSs strand location (Table S1) showed that among the positive-strand RNA viruses, Flaviviruses embed more PQSs in the positive strand (i.e., the viral genome, but also the viral mRNA) [33], whereas Alphaviruses have members with equal strand distribution (Chikungunya, Mayaro, O’nyong-nyong, Ross River, Sagiyama, Semliki Forest, Venezuelan and Western equine encephalitis) and members with PQSs mainly located in the antigenome strand (negative-strand) [34]. Single linear negative-strand RNA viruses have more PQSs in the positive strand, i.e., in the antigenome which corresponds also to the viral mRNA. Segmented negative-strand viruses showed the highest variability, with PQS distributed on both strands, depending on the virus.
The number of canonical PQSs was then calculated, excluding the bulged sequences from the first prediction (Table 2). The maximum loop length and minimum sequence score remained the same as in the previous analysis. This analysis showed that approximately 22% of the calculated PQSs are non-canonical, when arboviruses are considered as a single group. Looking at single classes of viruses, 21% of PQSs in Alphaviruses (positive-strand ssRNA viruses), 26% in negative-strand ssRNA viruses with a single linear RNA and 23% in segmented RNA viruses contain a bulge. Flaviviruses, the viruses with the highest number of PQSs, are less likely to have bulged PQSs (17%). The Banna virus has 22% of non-canonical PQSs (Table 2).
The significance of the predicted PQSs was then calculated. To assess whether the predicted PQSs in arboviruses were statistically relevant or random, the results from viral genomes were compared with those obtained by viral genome simulation. Shuffled genomes (one hundred per virus), with the same nucleotide composition but different order with respect to the references, were generated. The presence of PQSs was predicted using the same parameters as in the first analyses (Table 2). To estimate the statistical significance of PQS prediction, data on viruses (Reference genomes) and on shuffled genomes were subjected to a one-sample t-test and p-values were calculated [35]. The one-sample t-test was used to determine whether the average number of PQSs in the shuffled genomes (one hundred per virus) was significantly different from the number of PQSs predicted on the corresponding viral reference genome. p-values lower than 0.001 were considered significant. Significance analysis and generation of corresponding p-values indicated that 52% of the considered viral genomes/segments were significantly enriched in PQSs, while 37% showed significant depletion in PQSs compared to the presence of G-runs on shuffled genomes/segments. The remaining 11% of viral genomes/segments showed no significant enrichment or depletion in PQSs. The PQS prediction was highly significant for positive-strand RNA viruses, with the exception of the Eastern equine encephalitis virus and the Ross River virus. Flaviviruses are all enriched in PQSs, whereas Alphaviruses, with the exception of Barmah Forest and Semliki Forest viruses, are depleted in PQSs. When considering negative-strand RNA genomes, viruses with single linear RNA genomes are all statistically enriched in PQSs, whereas segmented viruses display segments with PQS enrichment, segments with PQS depletion and segments that are not significantly enriched in PQSs (i.e., Rift Valley fever virus). In the Banna virus (dsRNA), segments 10 and 11 have non-significant p-values, whereas the other 10 segments have a statistically significant PQS prediction, despite the low number of predicted PQSs.
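The comparison between shuffled and reference genomes can be sketched as a one-sample t statistic. The Python example below is an illustration under stated assumptions, not the authors' script: the function name is hypothetical, the counts in the usage note are invented, and the conversion of t into a p-value (which requires the Student's t distribution) is left out.

```python
import math
import statistics


def one_sample_t(shuffled_counts: list[int], reference_count: int) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for shuffled counts vs. one reference value.

    A negative t means the shuffled genomes hold fewer PQSs on average than the
    reference genome (enrichment); a positive t points to depletion.
    """
    n = len(shuffled_counts)
    if n < 2:
        raise ValueError("at least two shuffled genomes are required")
    mean = statistics.mean(shuffled_counts)
    sd = statistics.stdev(shuffled_counts)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("shuffled counts have zero variance")
    t = (mean - reference_count) / (sd / math.sqrt(n))
    return t, n - 1
```

For example, `one_sample_t([8, 10, 12, 10], 14)` yields a negative t with 3 degrees of freedom, the pattern expected for a genome that is enriched in PQSs relative to its shuffled counterparts.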

2.2. Conservation of Predicted PQSs and Genomic Location of Highly Conserved PQSs

Once the presence of PQSs in arboviruses had been predicted and their statistical relevance had been assessed, their distribution within the genomes was examined (Figure 1 and Figures S1–S5, density panels). PQS density distribution was calculated using the pqsfinder density function: it indicates if and where PQSs are clustered within the genome of interest; high-scoring PQSs clustered in high-density regions are considered to have a higher folding potential. We observed that, in general, PQSs were widely distributed across the length of the genome and that PQSs with high scores, and therefore more likely to form, tended to cluster together.
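One way to picture a density track is as a per-position count of overlapping predictions. The Python sketch below assumes that definition (the number of predicted PQSs covering each nucleotide); it is a simplified illustration with a hypothetical function name, not the pqsfinder implementation.

```python
def pqs_density(genome_length: int, pqs_intervals: list[tuple[int, int]]) -> list[int]:
    """Count, for every position, how many PQS intervals overlap it.

    Intervals are 0-based and half-open: (start, end) covers start .. end - 1.
    """
    # Difference array: +1 where an interval opens, -1 just past where it closes.
    diff = [0] * (genome_length + 1)
    for start, end in pqs_intervals:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval {(start, end)} lies outside the genome")
        diff[start] += 1
        diff[end] -= 1
    density = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        density.append(running)
    return density
```

Positions where the returned value exceeds 1 mark regions in which alternative or neighbouring PQSs overlap, which is the kind of clustering the density panels are meant to show.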
RNA viruses are prone to genomic mutations to enhance their environment/host adaptability [36,37], so the conservation of PQSs across all sequenced isolates of each virus species was assessed, hypothesizing that the presence of a conserved PQS in a poorly conserved genomic environment would strengthen the hypothesis of a significant biological function. The conservation rate of each predicted PQSs in all genome/segment sequences we retrieved from the NCBI database was calculated (Figure 1 and Figures S1–S5). We considered the conservation analysis to be significant when at least 5 isolates per virus were available.
The different RNA virus populations have different mutation rates [38]. Positive-strand ssRNA viruses have high mutation rates, followed by negative-strand ssRNA viruses. DsRNA viruses have the lowest mutation rate of the three viral classes analyzed. Notably, PQSs do not seem to follow this rule, as many members of the Flavivirus and Alphavirus groups (e.g., West Nile virus and Semliki Forest virus) have highly conserved PQSs throughout the genome. The Eastern equine encephalitis and the Ross River viruses, which would be excluded by the previously calculated p-value on shuffled genomes, have highly conserved PQSs. On the contrary, few negative-strand ssRNA viruses have highly conserved PQSs. We found few conserved PQSs in the segments of the Uukuniemi and the Sandfly viruses and in the genomes of the Chandipura and Vesicular stomatitis viruses. In the case of segmented negative-stranded RNA viruses, we found that segments with significant p-values (Table 2) did not have non-significantly conserved PQSs. The Banna virus (dsRNA) was not only poor in terms of predicted PQSs, although they were statistically significant, but also showed a very low rate of PQS conservation among the virus isolates. Taken together, the conservation analysis revealed that the conservation of a particular PQS among isolates belonging to the same virus was not related to the initial prediction score (i.e., the folding propensity and associated stability), nor did it depend on the presence of other PQSs in the vicinity (density).
The conservation of canonical and bulged PQSs was calculated and their percentage of conservation was reported if they were conserved in more than 80% of the analysed viral genomes. Notably, in positive polarity viruses (Flaviviruses and Alphaviruses) bulged and canonical PQSs were conserved to the same extent, with viruses such as the Sagiyama and the Semliki Forest viruses having more than 87% of fully conserved canonical and bulged PQSs. The pattern of conservation among negative polarity RNA viruses is much more diverse. Segmented RNA viruses do not appear to conserve bulged PQSs, nor do they have a high conservation rate of canonical PQSs. Single linear negative polarity RNA viruses tend to conserve both canonical and bulged PQSs, although they range from viruses such as the Australian bat lyssavirus, with no conservation potential, to viruses such as the Isfahan virus and the Vesicular stomatitis virus (non-Indiana strains), with high conservation rates of both canonical and bulged PQSs. The Banna virus (dsRNA) showed no strong conservation of either type of PQSs (Table 2).
The genomic location of all PQSs with more than 85% conservation was then determined (Figure 1 and Figures S1–S5, conservation panel, Figure 2 and Figure S6). Genome coordinates were obtained for 5′- and 3′-untranslated regions (UTRs) and coding sequences (CDSs). For the majority of arboviruses, the CDSs are well defined in the Reference genomes, whereas the UTRs are more inconsistently annotated. When missing in the annotation file, the UTRs were manually defined as the regions preceding the first CDS and closing the genome after the last nucleotide of the last CDS. Flaviviruses showed a strong tendency to have and preserve PQSs in coding regions but also in 3′ UTRs. The vast majority of conserved PQSs of Alphaviruses are embedded in coding sequences, with the exception of the Semliki Forest and Ross River viruses, which also have conserved PQSs in the UTRs. Negative-strand viruses preserve PQSs in coding sequences, with exceptions such as Bunyamwera La Crosse virus, Dugbe virus, Sandfly Sicilian virus and Rift Valley fever virus that also embed conserved PQSs in the UTRs of L and M segments (Figure 2 and Figure S6).

3. Discussion

The presence and possible key roles played by nucleic acid secondary structures, in particular G4s, during the viral cycle of major human pathogens, such as HIV-1, HSV-1 and many others [12], has begun to be demonstrated. Understanding the regulation of G4 folding in viruses has attracted much attention due to the potential use of G4s as targets for innovative antiviral therapies [14]. The most comprehensive viral genome analyses have been performed using algorithms that predict canonical putative G4s [18,22,39], or using algorithms that penalize sequences with cytosine runs [20,21,23]. These pattern-based algorithms do not consider non-canonical G4 forming sequences, such as those containing bulges or stem loops. Notably, both G4s folding with bulges or forming stem loops have been reported at the viral level [26,27,40].
Arboviruses are a major threat to humans with no specific pharmacological treatment and few prevention strategies [4]. Here we challenged the pqsfinder algorithm, which was designed to be imperfection-tolerant and validated on human sequence data [28], with the genomes of arboviruses that infect humans. We had previously performed an extensive analysis on human viral pathogens using a traditional approach [39]. In this study we extended and included seven novel members of the arboviruses (Bunyavirus snowshoe hare, Chandipura, Dhori, Isfahan, Punta Toro, Sandfly fever Sicilian, Tick-borne encephalitis, Powassan encephalitis, and Usutu viruses) and examined the presence of PQSs and their conservation in all sequenced virus isolates (up to February 2023). More than twelve thousand genomes/segments were analysed, belonging to the forty different arboviruses that infect humans. The present work provides new data showing that: a. arboviruses harbor bulged PQSs in addition to canonical ones; b. the vast majority of the predicted PQSs are statistically significant when compared with shuffled sequences; c. not only the canonical but also the bulged PQSs are conserved; d. the conserved PQSs are mainly located in coding sequences and 3′ UTRs.
Our data show that the frequency of PQSs is not related to the GC content of viral genomes, confirm that the clustering of G-runs is not random, and suggest a specific biological role for G4 structures at the arboviral level. Viruses, especially those with an RNA genome, such as that of arboviruses, mutate with high frequency [36,38,41]. The comparison with randomly generated shuffled genomes showed that members of the arbovirus family are statistically enriched or depleted in PQSs, revealing that certain members of this RNA virus family prevent the generation of novel regions that could fold into G4s, despite their high mutation rate.
The conservation of PQSs in viruses is one of the strongest indications of the biological relevance of G4s. The utmost conservation of G-tracts in a G4-forming pattern indicates that they are required for infection/replication/transmission of the virus. Our data show that, depending on the virus class, both canonical and bulged PQSs are conserved among virus isolates, suggesting that also bulged G4s play a role in the biology of arboviruses. In addition, our analysis suggests that PQSs in the coding sequences and 3’UTRs of positive-strand ssRNA viruses, especially Flaviviruses, could play an essential role during viral infection.
The role of G4s in regulating transcription and translation when embedded in coding regions has begun to be elucidated at the human level [42]: our data emphasize their regulatory role also in arboviruses, especially Flaviviruses, where PQSs are mainly located on the positive RNA strand, which acts as both viral genome and viral mRNA. Furthermore, since the 3′ UTR of positive-strand ssRNA viruses regulates numerous aspects of the viral life cycle, such as replication/translation and the complex network of host-cell interactions, the highlighted presence of several G4s at this genomic level paves the way for a deeper understanding of G4s as regulators of novel aspects of arbovirus infection.
Our data also show that G4s are not particularly abundant or conserved in negative-strand ssRNA or dsRNA arboviruses. Negative-strand and dsRNA viruses have more complex viral cycles than positive-polarity ssRNA viruses: under these conditions, G4-mediated slowing of viral transcription and replication may be more likely to be avoided [43].

4. Materials and Methods

4.1. Viral Genomes Selection

Accessible viral genomic sequences (12240 in total) belonging to the 40 arboviruses infecting humans were downloaded from the genome database of the NCBI in January 2023. Genomes were grouped in FASTA-format files. Dengue viruses were clustered according to the four reported serotypes (Dengue virus 1–4) [44]. Vesicular stomatitis viruses (VSVs) were divided into two separate groups, the first including the Indiana serotype and the second containing all the other serotypes [45]. For each virus, the NCBI entry code indicated in the NCBI reference repository was considered as reference genome [46]. For the West Nile virus, two different reference genomes (lineage 1 and Kunjin subtype) are available; the NCBI Reference Sequence NC_009942.1 (lineage 1) was set as reference genome. Unlike Dengue virus isolates, West Nile sequences are not registered with an indication of the lineage or the subtype, so all genomes corresponding to the West Nile virus were aligned together. FASTA files were purged of unverified or partially sequenced genomes, as well as of genome sequences containing multiple stretches of nucleotides lacking base assignments (i.e., NNN). NCBI accession codes and FASTA files containing all viral complete genomic sequences are shown in Table 1 and contained in Supplementary Material (aligned_genomes.zip), respectively. Genomes were aligned using the Jalview platform [47].

4.2. Bioinformatic Prediction of Putative G4-Forming Sequences and Conservation Analysis

4.2.1. Prediction of PQS

All the analyses were performed using R (version 4.2.2). The FASTA files containing the reference genomes and the FASTA files containing the multiple alignments were loaded onto the R platform and GC content was calculated using Biostrings (2.66) [48,49]. PQS prediction was performed on the reference genome using pqsfinder [28] (version 2.14.1) with the following parameters: deep = TRUE, min_score = 12, max_bulges = 1, max_mismatches = 0, loop_max_len = 12. The deep parameter has been set to TRUE to allow detection of PQS clusters. Canonical PQS prediction was performed retrieving the sequences displaying no bulges from the initial PQS prediction. PQS density was predicted using the pqsfinder density function. The PQS score was automatically assigned by the pqsfinder algorithm. All R scripts created to generate predictions, density, score, GC content, and conservation rates are available in the Supplementary Material.

4.2.2. Shuffling and Statistical Analyses

The R universalmotif (version 1.16.0) was used to shuffle the reference genome sequences [50]. Each reference was shuffled one hundred times using the linear method with k-let = 1. For each shuffled sequence, a PQS prediction was performed with the pqsfinder parameters indicated in Section 4.2.1. To estimate the significance of the analysis, the one-sample t-test was performed comparing the PQS predictions of the shuffled genomes with the PQS prediction of the reference genome. Significance was expressed as a p-value.
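With k-let = 1, shuffling preserves only the mononucleotide composition of the genome. The Python sketch below illustrates that idea with the standard library; it is not the universalmotif implementation, and the function names and the fixed seed are assumptions added for reproducibility.

```python
import random


def shuffle_genome(sequence: str, rng: random.Random) -> str:
    """Return a permutation of the sequence with identical base composition."""
    bases = list(sequence)
    rng.shuffle(bases)  # in-place Fisher-Yates shuffle of single nucleotides
    return "".join(bases)


def shuffled_replicates(sequence: str, n: int = 100, seed: int = 0) -> list[str]:
    """Generate n shuffled versions of one reference genome."""
    rng = random.Random(seed)  # assumption: seeded so that runs are repeatable
    return [shuffle_genome(sequence, rng) for _ in range(n)]
```

Each replicate can then be scanned with the same PQS settings as the reference, so that any difference in PQS counts reflects nucleotide order rather than nucleotide composition.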

4.2.3. Conservation of PQS

To calculate the percentage of conservation of each PQS the vmatchPattern function of Biostrings (version 2.66) was used, setting the parameter with.indels = TRUE, to count PQS with longer loops as conserved. The multiple aligned genomes were loaded onto the R platform and the number of times each predicted PQS was present in the aligned genomes was counted. The percentage of conservation was also calculated, taking into account the number of aligned genomes analysed. The conservation values were then plotted together with the density pattern and the PQS scores using Gviz (version 1.42.0) [51].
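The conservation percentage can be sketched as the share of isolates in which a predicted PQS is found. The Python example below is a simplified illustration with hypothetical names: it uses exact substring matching after removing alignment gap characters, which is stricter than the indel-tolerant matching described above.

```python
def conservation_percentage(pqs: str, isolate_genomes: list[str]) -> float:
    """Return the percentage of isolates that contain the PQS sequence exactly."""
    if not isolate_genomes:
        raise ValueError("at least one isolate genome is required")

    def normalise(seq: str) -> str:
        # Assumption: aligned sequences use '-' for gaps; RNA and DNA letters are unified.
        return seq.upper().replace("U", "T").replace("-", "")

    motif = normalise(pqs)
    present = sum(motif in normalise(genome) for genome in isolate_genomes)
    return 100.0 * present / len(isolate_genomes)
```

Under this stricter rule a PQS whose loop gains or loses a nucleotide in some isolates is counted as absent, so the values would be expected to be equal to or lower than those obtained with indel-tolerant matching.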

4.2.4. Annotation of Conserved PQS

Only PQSs with more than 80% conservation were annotated on viral genomes. GTF files were uploaded using rtracklayer (version 1.58.0) [52]. Annotation was performed using annotatr (version 1.24.0), with the length of each conserved PQS set as the minimum overlap [53]. The region preceding the first CDS was considered the “5’UTR”, while the region following the last CDS was considered the “3’UTR”. Bar graphs were generated using ggplot2 (version 3.4.0).
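The annotation rule can be sketched as a coordinate check in which a PQS is assigned to a region only when it lies entirely inside it, mirroring a minimum overlap equal to the PQS length. The Python example below is a hypothetical illustration, not the annotatr workflow; coordinates are assumed to be 1-based and inclusive, as in GTF files.

```python
def annotate_pqs(
    pqs_start: int,
    pqs_end: int,
    cds_regions: list[tuple[int, int]],
    genome_length: int,
) -> list[str]:
    """Return the region labels that fully contain the PQS (possibly none)."""
    first_cds_start = min(start for start, _ in cds_regions)
    last_cds_end = max(end for _, end in cds_regions)
    # UTRs are defined as whatever precedes the first CDS and follows the last CDS.
    regions = [("5'UTR", 1, first_cds_start - 1)]
    regions += [("CDS", start, end) for start, end in cds_regions]
    regions.append(("3'UTR", last_cds_end + 1, genome_length))
    labels = []
    for name, start, end in regions:
        fully_inside = start <= pqs_start and pqs_end <= end
        if fully_inside and name not in labels:
            labels.append(name)
    return labels
```

A PQS that straddles a CDS boundary receives no label under this rule, which is one consequence of requiring the overlap to span the whole PQS.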

5. Conclusions

Arboviruses are a heterogeneous family of viruses that are transmitted to humans by arthropod vectors. This work has shown that many members of the family, mainly belonging to the Flavivirus subgroup, embed highly conserved PQSs in their genomes. The conserved PQSs are located in coding regions and at genome ends, reinforcing the critical role of G4s in the regulation of viral cycles. These findings pave the way for a broader understanding of the mechanisms regulating arbovirus infections and suggest that highly conserved PQSs may be novel and innovative antiviral targets against arbovirus infections.

Supplementary Materials

The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24119523/s1.

Author Contributions

Conceptualization, S.N.R. and I.F.; Methodology and Investigation, G.N. and I.F.; Data Curation, G.N.; Original Draft Preparation, I.F.; Writing—Review & Editing, I.F. and S.N.R.; Supervision, S.N.R.; Project Administration, S.N.R.; Funding Acquisition, I.F. and S.N.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by EU funding within the Next Generation EU-MUR PNRR Extended Partnership initiative on Emerging Infectious Diseases (Project no. PE00000007, INF-ACT).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in Supporting Materials.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vector-Borne Diseases. Available online: https://www.ecdc.europa.eu/en/climate-change/climate-change-europe/vector-borne-diseases (accessed on 1 February 2023).
  2. Vector-Borne Diseases. Available online: https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases (accessed on 1 February 2023).
  3. Chala, B.; Hamde, F. Emerging and Re-Emerging Vector-Borne Infectious Diseases and the Challenges for Control: A Review. Front. Public Health 2021, 9, 715759.
  4. Sigfrid, L.; Reusken, C.; Eckerle, I.; Nussenblatt, V.; Lipworth, S.; Messina, J.; Kraemer, M.; Ergonul, O.; Papa, A.; Koopmans, M.; et al. Preparing Clinicians for (Re-)Emerging Arbovirus Infectious Diseases in Europe. Clin. Microbiol. Infect. 2018, 24, 229–239.
  5. Rocklöv, J.; Dubrow, R. Climate Change: An Enduring Challenge for Vector-Borne Disease Prevention and Control. Nat. Immunol. 2020, 21, 479–483.
  6. LaBeaud, A.D.; Bashir, F.; King, C.H. Measuring the Burden of Arboviral Diseases: The Spectrum of Morbidity and Mortality from Four Prevalent Infections. Popul. Health Metr. 2011, 9, 1.
  7. Mangat, R.; Louie, T. Arbovirus Encephalitides. In StatPearls; Internet, Updated 2023 Feb 19; StatPearls Publishing: Treasure Island, FL, USA, 2023.
  8. Launch of the Global Arbovirus Initiative. Available online: https://www.who.int/news-room/events/detail/2022/03/31/default-calendar/global-arbovirus-initiative (accessed on 1 February 2023).
  9. Varshney, D.; Spiegel, J.; Zyner, K.; Tannahill, D.; Balasubramanian, S. The Regulation and Functions of DNA and RNA G-Quadruplexes. Nat. Rev. Mol. Cell Biol. 2020, 21, 459–474.
  10. Maizels, N. G4-Associated Human Diseases. EMBO Rep. 2015, 16, 910–922.
  11. Frasson, I.; Pirota, V.; Richter, S.N.; Doria, F. Multimeric G-Quadruplexes: A Review on Their Biological Roles and Targeting. Int. J. Biol. Macromol. 2022, 204, 89–102.
  12. Ruggiero, E.; Richter, S.N. Targeting G-Quadruplexes to Achieve Antiviral Activity. Bioorganic Med. Chem. Lett. 2023, 79, 129085.
  13. Ruggiero, E.; Zanin, I.; Terreri, M.; Richter, S.N. G-Quadruplex Targeting in the Fight against Viruses: An Update. Int. J. Mol. Sci. 2021, 22, 10984.
  14. Ruggiero, E.; Richter, S.N. Viral G-Quadruplexes: New Frontiers in Virus Pathogenesis and Antiviral Therapy. Annu. Rep. Med. Chem. 2020, 54, 101–131.
  15. Métifiot, M.; Amrane, S.; Litvak, S.; Andreola, M.-L. G-Quadruplexes in Viruses: Function and Potential Therapeutic Applications. Nucleic Acids Res. 2014, 42, 12352–12366.
  16. Bochman, M.L.; Paeschke, K.; Zakian, V.A. DNA Secondary Structures: Stability and Function of G-Quadruplex Structures. Nat. Rev. Genet. 2012, 13, 770–780.
  17. Saranathan, N.; Vivekanandan, P. G-Quadruplexes: More Than Just a Kink in Microbial Genomes. Trends Microbiol. 2019, 27, 148–163.
  18. Puig Lombardi, E.; Londoño-Vallejo, A. A Guide to Computational Methods for G-Quadruplex Prediction. Nucleic Acids Res. 2020, 48, 1603.
  19. Bedrat, A.; Lacroix, L.; Mergny, J.-L. Re-Evaluation of G-Quadruplex Propensity with G4Hunter. Nucleic Acids Res. 2016, 44, 1746–1759.
  20. Bartas, M.; Brázda, V.; Bohálová, N.; Cantara, A.; Volná, A.; Stachurová, T.; Malachová, K.; Jagelská, E.B.; Porubiaková, O.; Červeň, J.; et al. In-Depth Bioinformatic Analyses of Nidovirales Including Human SARS-CoV-2, SARS-CoV, MERS-CoV Viruses Suggest Important Roles of Non-Canonical Nucleic Acid Structures in Their Lifecycles. Front. Microbiol. 2020, 11, 1583.
  21. Bohálová, N.; Cantara, A.; Bartas, M.; Kaura, P.; Šťastný, J.; Pečinka, P.; Fojta, M.; Mergny, J.-L.; Brázda, V. Analyses of Viral Genomes for G-Quadruplex Forming Sequences Reveal Their Correlation with the Type of Infection. Biochimie 2021, 186, 13–27.
  22. Kabbara, A.; Vialet, B.; Marquevielle, J.; Bonnafous, P.; Mackereth, C.D.; Amrane, S. RNA G-Quadruplex Forming Regions from SARS-2, SARS-1 and MERS Coronaviruses. Front. Chem. 2022, 10, 1014663.
  23. Brázda, V.; Porubiaková, O.; Cantara, A.; Bohálová, N.; Coufal, J.; Bartas, M.; Fojta, M.; Mergny, J.-L. G-Quadruplexes in H1N1 Influenza Genomes. BMC Genom. 2021, 22, 77.
  24. Brázda, V.; Kolomazník, J.; Lýsek, J.; Bartas, M.; Fojta, M.; Šťastný, J.; Mergny, J.-L. G4Hunter Web Application: A Web Server for G-Quadruplex Prediction. Bioinformatics 2019, 35, 3493–3495.
  25. Kikin, O.; D’Antonio, L.; Bagga, P.S. QGRS Mapper: A Web-Based Server for Predicting G-Quadruplexes in Nucleotide Sequences. Nucleic Acids Res. 2006, 34, W676–W682.
  26. Butovskaya, E.; Heddi, B.; Bakalar, B.; Richter, S.N.; Phan, A.T. Major G-Quadruplex Form of HIV-1 LTR Reveals a (3 + 1) Folding Topology Containing a Stem-Loop. J. Am. Chem. Soc. 2018, 140, 13654–13662.
  27. Frasson, I.; Nadai, M.; Richter, S.N. Conserved G-Quadruplexes Regulate the Immediate Early Promoters of Human Alphaherpesviruses. Molecules 2019, 24, 2375.
  28. Hon, J.; Martínek, T.; Zendulka, J.; Lexa, M. Pqsfinder: An Exhaustive and Imperfection-Tolerant Search Tool for Potential Quadruplex-Forming Sequences in R. Bioinformatics 2017, 33, 3373–3379.
  29. ViralZone. Available online: https://viralzone.expasy.org/ (accessed on 28 February 2023).
  30. Home—Taxonomy—NCBI. Available online: https://www.ncbi.nlm.nih.gov/taxonomy (accessed on 1 February 2023).
  31. Jaafar, F.M.; Attoui, H.; Bahar, M.W.; Siebold, C.; Sutton, G.; Mertens, P.P.C.; De Micco, P.; Stuart, D.I.; Grimes, J.M.; De Lamballerie, X. The Structure and Function of the Outer Coat Protein VP9 of Banna Virus. Structure 2005, 13, 17–28.
  32. Rampersad, S.; Tennant, P. Replication and Expression Strategies of Viruses. Viruses 2018, 55–82.
  33. Woolhouse, M.E.J.; Adair, K.; Brierley, L. RNA Viruses: A Case Study of the Biology of Emerging Infectious Diseases. Microbiol. Spectr. 2013, 1.
  34. Te Velthuis, A.J.W.; Grimes, J.M.; Fodor, E. Structural Insights into RNA Polymerases of Negative-Sense RNA Viruses. Nat. Rev. Microbiol. 2021, 19, 303–318.
  35. General Linear Model-an Overview | ScienceDirect Topics. Available online: https://www.sciencedirect.com/topics/mathematics/general-linear-model (accessed on 15 February 2023).
  36. Duffy, S. Why Are RNA Virus Mutation Rates so Damn High? PLoS Biol. 2018, 16, e3000003.
  37. Mattenberger, F.; Vila-Nistal, M.; Geller, R. Increased RNA Virus Population Diversity Improves Adaptability. Sci. Rep. 2021, 11, 6824.
  38. Peck, K.M.; Lauring, A.S. Complexities of Viral Mutation Rates. J. Virol. 2018, 92, e01031-17.
  39. Lavezzo, E.; Berselli, M.; Frasson, I.; Perrone, R.; Palù, G.; Brazzale, A.R.; Richter, S.N.; Toppo, S. G-Quadruplex Forming Sequences in the Genome of All Known Human Viruses: A Comprehensive Guide. PLoS Comput. Biol. 2018, 14, e1006675.
  40. Bidula, S.; Brázda, V. Genomic Analysis of Non-B Nucleic Acids Structures in SARS-CoV-2: Potential Key Roles for These Structures in Mutability, Translation, and Replication? Genes 2023, 14, 157.
  41. Selisko, B.; Papageorgiou, N.; Ferron, F.; Canard, B. Structural and Functional Basis of the Fidelity of Nucleotide Selection by Flavivirus RNA-Dependent RNA Polymerases. Viruses 2018, 10, 59.
  42. Vannutelli, A.; Perreault, J.-P.; Ouangraoua, A. G-Quadruplex Occurrence and Conservation: More than Just a Question of Guanine-Cytosine Content. NAR Genom. Bioinform. 2022, 4, lqac010.
  43. Payne, S. Introduction to RNA Viruses. Viruses 2017, 97–105.
  44. Kaptein, S.J.F.; Goethals, O.; Kiemel, D.; Marchand, A.; Kesteleyn, B.; Bonfanti, J.-F.; Bardiot, D.; Stoops, B.; Jonckers, T.H.M.; Dallmeier, K.; et al. A Pan-Serotype Dengue Virus Inhibitor Targeting the NS3–NS4B Interaction. Nature 2021, 598, 504–509.
  45. Martinez, I.; Wertz, G.W. Biological Differences between Vesicular Stomatitis Virus Indiana and New Jersey Serotype Glycoproteins: Identification of Amino Acid Residues Modulating pH-Dependent Infectivity. J. Virol. 2005, 79, 3578–3585.
  46. RefSeq: NCBI Reference Sequence Database. Available online: https://www.ncbi.nlm.nih.gov/refseq/ (accessed on 20 April 2023).
  47. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.A.; Clamp, M.; Barton, G.J. Jalview Version 2—A Multiple Sequence Alignment Editor and Analysis Workbench. Bioinformatics 2009, 25, 1189–1191.
  48. Lawrence, M.; Huber, W.; Pagès, H.; Aboyoun, P.; Carlson, M.; Gentleman, R.; Morgan, M.T.; Carey, V.J. Software for Computing and Annotating Genomic Ranges. PLoS Comput. Biol. 2013, 9, e1003118.
  49. Amezquita, R.A.; Lun, A.T.L.; Becht, E.; Carey, V.J.; Carpp, L.N.; Geistlinger, L.; Marini, F.; Rue-Albrecht, K.; Risso, D.; Soneson, C.; et al. Orchestrating Single-Cell Analysis with Bioconductor. Nat. Methods 2020, 17, 137–145.
  50. Tremblay, B.J.-M.; Nystrom, S. Universalmotif: Import, Modify, and Export Motifs with R. Available online: https://bioconductor.org/packages/universalmotif/ (accessed on 22 March 2023).
  51. Hahne, F.; Ivanek, R. Visualizing Genomic Data Using Gviz and Bioconductor. Methods Mol. Biol. 2016, 1418, 335–351.
  52. Lawrence, M.; Gentleman, R.; Carey, V. Rtracklayer: An R Package for Interfacing with Genome Browsers. Bioinformatics 2009, 25, 1841–1842.
  53. Cavalcante, R.G.; Sartor, M.A. Annotatr: Genomic Regions in Context. Bioinformatics 2017, 33, 2381–2383.
Figure 1. Presence, density, score and conservation of PQSs in arboviruses. Plots representing the PQS density (red bars), the score (blue bars) and the conservation percentage (black dots) of each predicted PQS. The viral genome length is reported above the density plot. The Vesicular stomatitis virus non-Indiana strains have been abbreviated to Vesicular stomatitis virus non-Ind.
Figure 2. Genomic localization of conserved PQSs in arboviruses. Plots reporting the annotation of highly conserved PQSs. Each viral genome was divided into three regions: untranslated regions (5′ and 3′ UTRs) and coding sequences (CDS). PQSs were annotated on the basis of the official NCBI annotation of each viral Reference genome. In panels with long virus names, the word “segment” has been abbreviated to “seg”. The Vesicular stomatitis virus non-Indiana strains have been abbreviated to Vesicular stomatitis virus non-Ind.
Table 1. Analysed arboviruses data. The table reports the analysed viruses in alphabetical order. The columns indicate the virus name (Virus), the viral Genus/Family each virus belongs to (Genus, Family), the viral genomic nucleic acid type (Genome), the viral genomic structure (Genome structure), the name of each segment in case of segmented genomes (Segments), the analysed reference genome NCBI entry (Reference genome), the total number of analysed genomes/segments (Total analysed genomes and segments), the number of analysed sequences of each segment (Analysed segments), the average GC content of the Reference genomes and of the entire group of analysed genomes per virus (% GC Reference genomes and % GC all analysed genomes, respectively). Colors correspond to dsRNA (grey), negative-strand RNA ((-)ssRNA, yellow), positive-strand RNA ((+)ssRNA, blue) viruses. In segmented RNA viruses, the viral segments S, M and L are ordered by length.
| Virus | Genus, Family | Genome | Genome Structure | Segments | Reference Genome | Total Analysed Genomes and Segments | Analysed Segments | % GC Reference Genomes | % GC All Analysed Genomes |
|---|---|---|---|---|---|---|---|---|---|
| Australian bat lyssavirus | Lyssavirus, Rhabdoviridae | (-)ssRNA | Single linear RNA | | NC_003243.1 | 34 | | 44 | 43 |
| Banna virus | Seadornavirus, Reoviridae | dsRNA | 12 Segmented RNAs | Segment 1 | KC954611.1 | 128 | 7 | 38 | 39 |
| | | | | Segment 2 | KC954612.1 | | 7 | 40 | 40 |
| | | | | Segment 3 | KC954613.1 | | 9 | 40 | 37 |
| | | | | Segment 4 | KC954614.1 | | 8 | 40 | 39 |
| | | | | Segment 5 | KC954615.1 | | 7 | 40 | 39 |
| | | | | Segment 6 | KC954616.1 | | 10 | 42 | 40 |
| | | | | Segment 7 | KC954617.1 | | 12 | 37 | 35 |
| | | | | Segment 8 | KC954618.1 | | 8 | 43 | 42 |
| | | | | Segment 9 | KC954619.1 | | 37 | 38 | 32 |
| | | | | Segment 10 | KC954621 | | 8 | 38 | 37 |
| | | | | Segment 11 | KC954621.1 | | 7 | 39 | 39 |
| | | | | Segment 12 | KC954622.1 | | 8 | 38 | 38 |
| Barmah Forest virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_001786.1 | 39 | | 48 | 48 |
| Bunyamwera virus | Orthobunyavirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_001927.1 | 21 | 8 | 42 | 40 |
| | | | | Segment M | NC_001926.1 | | 7 | 37 | 36 |
| | | | | Segment L | NC_001925.1 | | 6 | 33 | 33 |
| Bunyavirus La Crosse | Orthobunyavirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_004111 | 100 | 39 | 41 | 40 |
| | | | | Segment M | NC_004109.1 | | 34 | 38 | 38 |
| | | | | Segment L | NC_004108.1 | | 27 | 35 | 35 |
| Bunyavirus snowshoe hare | Orthobunyavirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_055198.1 | 12 | 5 | 45 | 40 |
| | | | | Segment M | NC_055197.1 | | 4 | 39 | 38 |
| | | | | Segment L | NC_055196.1 | | 3 | 35 | 35 |
| Chandipura virus | Vesiculovirus, Rhabdoviridae | (-)ssRNA | Single linear RNA | | NC_020805.1 | 7 | | 42 | 42 |
| Chikungunya virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_004162.2 | 899 | | 36 | 36 |
| Crimean-Congo hemorrhagic fever virus | Nairovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_005302.1 | 642 | 211 | 46 | 40 |
| | | | | Segment M | NC_005302 | | 196 | 43 | 34 |
| | | | | Segment L | NC_005301.3 | | 235 | 41 | 38 |
| Dengue virus 1 | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_001477.1 | 2095 | | 47 | 44 |
| Dengue virus 2 | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_001474.2 | 1764 | | 46 | 43 |
| Dengue virus 3 | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_001475.2 | 992 | | 47 | 46 |
| Dengue virus 4 | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_002641 | 257 | | 47 | 46 |
| Dhori virus | Thogotovirus, Orthomyxoviridae | (-)ssRNA | 6 Segmented RNAs | Segment 1 | NC_034261.1 | 39 | 6 | 45 | 45 |
| | | | | Segment 2 | NC_034263.1 | | 7 | 45 | 45 |
| | | | | Segment 3 | NC_034254.1 | | 6 | 44 | 44 |
| | | | | Segment 4 | NC_034255.1 | | 7 | 48 | 47 |
| | | | | Segment 5 | NC_034262.1 | | 6 | 48 | 48 |
| | | | | Segment 6 | NC_034256.1 | | 7 | 49 | 49 |
| Dugbe virus | Nairovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_004157.1 | 14 | 7 | 43 | 18 |
| | | | | Segment M | NC_004158.1 | | 3 | 42 | 41 |
| | | | | Segment L | NC_004159.1 | | 4 | 39 | 39 |
| Eastern equine encephalitis virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_003899.1 | 455 | | 49 | 49 |
| Isfahan virus | Vesiculovirus, Rhabdoviridae | (-)ssRNA | Single linear RNA | | NC_020806.1 | 2 | | 42 | 42 |
| Japanese encephalitis virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_001437 | 328 | | 51 | 51 |
| Langat virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_003690 | 3 | | 54 | 54 |
| Louping ill virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_001809 | 28 | | 55 | 55 |
| Mayaro virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_003417.1 | 41 | | 50 | 49 |
| Murray Valley encephalitis virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_000943 | 17 | | 49 | 49 |
| O’nyong-nyong virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_001512.1 | 7 | | 48 | 48 |
| Oropouche virus | Orthobunyavirus | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_005777.1 | 174 | 59 | 47 | 41 |
| | | | | Segment M | NC_005775.1 | | 57 | 35 | 35 |
| | | | | Segment L | NC_005776.1 | | 58 | 34 | 35 |
| Punta Toro phlebovirus | Phlebovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | DQ363406.1 | 45 | 16 | 41 | 40 |
| | | | | Segment M | DQ363407.1 | | 15 | 40 | 39 |
| | | | | Segment L | MK896483.1 | | 14 | 39 | 39 |
| Rift Valley fever virus | Phlebovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_014395.1 | 453 | 297 | 49 | 48 |
| | | | | Segment M | NC_014396.1 | | 77 | 45 | 45 |
| | | | | Segment L | NC_014397.1 | | 79 | 44 | 43 |
| Ross River virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_001544.1 | 23 | | 51 | 51 |
| Sagiyama virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | AB032553.1 | 2 | | 52 | 52 |
| Sandfly fever Sicilian virus | Phlebovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_015413.1 | 16 | 10 | 47 | 46 |
| | | | | Segment M | NC_015411.1 | | 3 | 44 | 43 |
| | | | | Segment L | NC_015412.1 | | 3 | 43 | 43 |
| Sandfly fever Toscana virus | Phlebovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_006318.1 | 95 | 50 | 47 | 45 |
| | | | | Segment M | NC_006321 | | 28 | 45 | 44 |
| | | | | Segment L | NC_006319.1 | | 17 | 44 | 44 |
| Semliki Forest virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_003215.1 | 10 | | 53 | 52 |
| Sindbis virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_001547.1 | 194 | | 51 | 50 |
| St. Louis encephalitis virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_007580 | 14 | | 50 | 49 |
| Tick-borne encephalitis virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_001672.1 | 190 | | 54 | 53 |
| Tick-borne powassan virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_003687 | 2 | | 53 | 53 |
| Usutu virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_006551.1 | 159 | | 51 | 50 |
| Uukuniemi virus | Phlebovirus, Bunyaviridae | (-)ssRNA | 3 Segmented RNAs | Segment S | NC_005221.1 | 24 | 8 | 50 | 49 |
| | | | | Segment M | NC_005221 | | 10 | 48 | 47 |
| | | | | Segment L | NC_005214.1 | | 6 | 47 | 46 |
| Venezuelan equine encephalitis virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_001449.1 | 127 | | 50 | 49 |
| Vesicular stomatitis virus strain Indiana | Vesiculovirus, Rhabdoviridae | (-)ssRNA | Single linear RNA | | NC_001561 | 39 | | 42 | 41 |
| Vesicular stomatitis virus non-Indiana strains | Vesiculovirus, Rhabdoviridae | (-)ssRNA | Single linear RNA | | MT094111.1 | 72 | | 40 | 39 |
| West Nile virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_009942.1/1 | 1840 | | 51 | 48 |
| Western equine encephalitis virus | Alphavirus, Togaviridae | (+)ssRNA | Single linear RNA | | NC_003908.1 | 38 | | 49 | 49 |
| Yellow fever virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_002031 | 246 | | 50 | 50 |
| Zika virus | Flavivirus, Flaviviridae | (+)ssRNA | Single linear RNA | | NC_012532 | 556 | | 51 | 48 |
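The two GC columns of Table 1 (% GC of the reference genome and mean % GC over all analysed genomes) can be illustrated with a minimal sketch. The function names `gc_percent` and `mean_gc_percent` are hypothetical and are not taken from the analysis pipeline used in this work. Ambiguous bases such as N are ignored here, which is an assumption of the sketch.

```python
from typing import Iterable


def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T, U)."""
    # Ignoring ambiguity codes such as N is an assumption of this sketch.
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def mean_gc_percent(seqs: Iterable[str]) -> float:
    """Unweighted mean of the per-genome GC percentages."""
    values = [gc_percent(s) for s in seqs]
    return sum(values) / len(values) if values else 0.0
```

Applied to one reference sequence, `gc_percent` gives a per-genome value. Applied to every analysed sequence of a virus, `mean_gc_percent` gives a group average, here weighting each genome equally regardless of its length.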
Table 2. Arboviruses PQSs frequency. The table reports the analysed viruses in alphabetical order. Columns indicate the virus name (Virus), the viral Genus/Family each virus belongs to (Genus, Family), the viral genomic nucleic acid type (Genome), the viral genomic structure (Genome structure), the name of each segment in case of segmented genomes (Segments), the total number of PQSs, the number of canonical (no bulges) and the number of bulged PQSs predicted in viral genomes (PQSs in viral genomes, Canonical PQSs and Bulged PQSs, respectively), the percentage of bulged PQSs on the total number of predicted PQSs (% bulged PQSs), the percentage of bulged and canonical PQSs conserved in more than 80% of analysed viral genomes (% conserved bulged PQSs and % conserved canonical PQSs, respectively), the rounded average total number of predicted PQSs on shuffled genomes and the statistical significance of the difference between the number of viral vs. shuffled PQSs (PQSs in shuffled genomes and p-values PQSs viral vs. shuffled genomes, respectively). The symbols (), () and (=) indicate that the number of PQSs predicted in the viral genomes is higher, lower or equal to the number of PQSs predicted in the shuffled genomes. Colors correspond to dsRNA (grey), negative-strand RNA ((-)ssRNA, yellow), positive-strand RNA ((+)ssRNA, blue) viruses. In segmented RNA viruses, the viral segments (S, M and L) are ordered by length.
VirusGenus, FamilyGenomeGenome
Structure
SegmentsPQSs in Viral GenomesCanonical PQSs in Viral GenomesBulged PQS% Bulged PQSs% Conserved Bulged PQSs % Conserved Canonical PQSs PQSs in Shuffled Genomesp-Values PQSs Viral vs. Shuffled Genomes
Australian bat lyssavirusLyssavirus, Rhabdoviridae(-)ssRNASingle linear RNA 45 ()34112402.94401.150 × 10−30
Banna virusSeadornavirus, ReoviridaedsRNA12 Segmented RNAsSegment 12 ()200 061.15 × 10−33
Segment 24 ()31250061.66 × 10−13
Segment 36 ()42330055.23 × 10−45
Segment 41 ()100 043.12 × 10−37
Segment 52 ()0210050042.54 × 10−26
Segment 63 ()300 051.90 × 10−37
Segment 72 (=)200 021.13 × 10−4
Segment 81 ()100 033.68 × 10−23
Segment 90 ()000 016.77 × 10−24
Segment 102 ()11500011.74 × 10−3
Segment 113 ()300 012.73 × 10−3
Segment 122 ()11500016.55 × 10−18
Barmah Forest virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 74 ()6113186256591.24 × 10−43
Bunyamwera virusOrthobunyavirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S4 ()31250024.16 × 10−32
Segment M5 (=)32400050.65
Segment L5 ()41200042.92 × 10−4
Bunyavirus La CrosseOrthobunyavirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S4 ()312506724.95 × 10−22
Segment M6 ()335003373.01 × 10−5
Segment L8 ()71130061.83 × 10−8
Bunyavirus snowshoe hareOrthobunyavirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S3 ()213305043.54 × 10−4
Segment M9 ()722250073.72 × 10−10
Segment L5 ()23600043.54 × 10−4
Chandipura virusVesiculovirus, Rhabdoviridae(-)ssRNASingle linear RNA 46 ()3511241851306.10 × 10−54
Chikungunya virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 65 ()5411172728694.81 × 10−9
Crimean-Congo hemorrhagic fever virusNairovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S6 (=)42330060.96
Segment M23 ()1852200161.53 × 10−41
Segment L18 () 16 2 11 00261.70 × 10−32
Dengue virus 1Flavivirus, Flaviviridae(+)ssRNASingle linear RNA 61 ()52915015497.80 × 10−38
Dengue virus 2(+)ssRNASingle linear RNA 64 ()531117915446.40 × 10−55
Dengue virus 3(+)ssRNASingle linear RNA 69 ()541522720496.05 × 10−59
Dengue virus 4(+)ssRNASingle linear RNA 77 ()6413173119524.29 × 10−70
Dhori virusThogotovirus, Orthomyxoviridae(-)ssRNA6 Segmented RNAsSegment 110 ()82200085.81 × 10−14
Segment 25 ()41200071.18 × 10−12
Segment 37 ()700 063.73 × 10−6
Segment 411 ()10190076.27 × 10−37
Segment 54 ()400 071.10 × 10−20
Segment 65 (=)23600050.22
Dugbe virusNairovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S5 (=)41200050.45
Segment M12 (=)1200 25120.09
Segment L18 ()1531704050.45
Eastern equine encephalitis virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 59 ()4712205868611.93 × 10−3
Isfahan virusVesiculovirus, Rhabdoviridae(-)ssRNASingle linear RNA 33 ()221133100100274.34 × 10−22
Japanese encephalitis virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 101 ()821919011738.61 × 10−53
Langat virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 125 ()98272233431132.22 × 10−34
Louping ill virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 130 ()106241833371144.51 × 10−42
Mayaro virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 66 ()5214217.0702.49 × 10−12
Murray Valley encephalitis virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 87 ()7989014664.61 × 10−55
O’nyong-nyong virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 53 ()4211213645594.84 × 10−17
Oropouche virusOrthobunyavirus(-)ssRNA3 Segmented RNAsSegment S4 ()40001634.99 × 10−10
Segment M2 ()200 5041.40 × 10−15
Segment L2 ()2000052.23 × 10−29
Punta Toro phlebovirusPhlebovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S4 (=)22500647.81 × 10−2
Segment M6 ()33500083.06 × 10−10
Segment L12 ()11186050130.11
Rift Valley fever virusPhlebovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S8 ()7113865092.84 × 10−7
16 (=)11531017160.64
Segment M
Segment L24 ()1772900214.57 × 10−12
Ross River virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 76 ()6115203346771.61 × 10−2
Sagiyama virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 71 ()55162310098862.46 × 10−43
Sandfly fever Sicilian virusPhlebovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S7 ()61140092.03 × 10−8
Segment M17 ()125296050146.75 × 10−13
Segment L23 ()167308650205.13 × 10−13
Sandfly fever Toscana virusPhlebovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S7 ()611401781.04 × 10−6
Segment M17 ()1252900157.39 × 10−10
Segment L21 ()1652406.25220.13
Semliki Forest virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 95 ()7916178887926.37 × 10−4
Sindbis virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 68 ()5216247569761.89 × 10−21
St. Louis encephalitis virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 85 ()7213152310701.27 × 10−37
Tick-borne encephalitis virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 120 ()992118011112.26 × 10−23
Tick-borne powassan virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 123 ()10122181001001024.25 × 10−49
Usutu virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 92 ()7220225072797.96 × 10−42
Uukuniemi virusPhlebovirus, Bunyaviridae(-)ssRNA3 Segmented RNAsSegment S8 ()622500104.64 × 10−14
Segment M16 (=)10638010160.22
Segment L30 ()291310066281.85 × 10−6
Venezuelan equine encephalitis virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 63 ()49142202691.17 × 10−18
Vesicular stomatitis virus strain IndianaVesiculovirus, Rhabdoviridae(-)ssRNASingle linear RNA 34 ()259263348261.98 × 10−35
Vesicular stomatitis virus non-Indiana strainsSingle linear RNA 29 ()227248695205.57 × 10−40
West Nile virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 88 ()7513153840815.40 × 10−17
Western equine encephalitis virusAlphavirus, Togaviridae(+)ssRNASingle linear RNA 55 ()4213246971644.96 × 10−25
Yellow fever virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 94 ()78161705.731.77 × 10−52
Zika virusFlavivirus, Flaviviridae(+)ssRNASingle linear RNA 101 ()8417171812792.12 × 10−56
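The comparison between viral and shuffled genomes summarised in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the prediction procedure used in this work. It assumes a simple regular-expression definition of a canonical PQS (four runs of at least two guanines separated by loops of 1 to 7 nucleotides) in place of the scoring tools cited in the references, and it builds shuffled genomes by permuting single nucleotides. The names `PQS_PATTERN`, `count_pqs`, `shuffled_pqs_counts` and `percent_bulged` are hypothetical. Bulged PQSs are not detected by this pattern, and the statistical test behind the reported p-values is not reproduced.

```python
import random
import re

# Illustrative pattern (an assumption): four G-runs of >= 2 guanines,
# separated by loops of 1-7 nucleotides. Bulged PQSs are not matched.
PQS_PATTERN = re.compile(r"G{2,}(?:[ACGU]{1,7}?G{2,}){3}")


def count_pqs(seq: str) -> int:
    """Count non-overlapping matches of the illustrative PQS pattern."""
    rna = seq.upper().replace("T", "U")
    return len(PQS_PATTERN.findall(rna))


def shuffled_pqs_counts(seq: str, n_shuffles: int = 100, seed: int = 0) -> list:
    """PQS counts in mononucleotide-shuffled copies of seq.

    Shuffling keeps the base composition (and so the GC content) fixed
    while destroying the original nucleotide order.
    """
    rng = random.Random(seed)
    bases = list(seq)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        counts.append(count_pqs("".join(bases)))
    return counts


def percent_bulged(total_pqs: int, bulged_pqs: int) -> float:
    """Percentage of bulged PQSs over all predicted PQSs."""
    return 100.0 * bulged_pqs / total_pqs if total_pqs else 0.0
```

The count returned by `count_pqs` for a viral sequence can be set against the mean of `shuffled_pqs_counts` for the same sequence. A viral count above the shuffled mean suggests enrichment beyond what base composition alone would produce.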
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nicoletto, G.; Richter, S.N.; Frasson, I. Presence, Location and Conservation of Putative G-Quadruplex Forming Sequences in Arboviruses Infecting Humans. Int. J. Mol. Sci. 2023, 24, 9523. https://doi.org/10.3390/ijms24119523

AMA Style

Nicoletto G, Richter SN, Frasson I. Presence, Location and Conservation of Putative G-Quadruplex Forming Sequences in Arboviruses Infecting Humans. International Journal of Molecular Sciences. 2023; 24(11):9523. https://doi.org/10.3390/ijms24119523

Chicago/Turabian Style

Nicoletto, Giulia, Sara N. Richter, and Ilaria Frasson. 2023. "Presence, Location and Conservation of Putative G-Quadruplex Forming Sequences in Arboviruses Infecting Humans" International Journal of Molecular Sciences 24, no. 11: 9523. https://doi.org/10.3390/ijms24119523
