ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Research Article
Revised

Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins

[version 2; peer review: 2 approved]
Previously titled: Prediction of the Effects of Nonsynonymous Variants on SARS-CoV-2 Proteins
PUBLISHED 18 May 2022
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS

This article is included in the Research Synergy Foundation gateway.

This article is included in the Cell & Molecular Biology gateway.

Abstract

Background: SARS-CoV-2 virus is a highly transmissible pathogen that causes COVID-19. The outbreak originated in Wuhan, China in December 2019. A number of nonsynonymous mutations located at different SARS-CoV-2 proteins have been reported by multiple studies. However, there are limited computational studies on the biological impacts of these mutations on the structure and function of the proteins.  
Methods: In our study nonsynonymous mutations of the SARS-CoV-2 genome and their frequencies were identified from 30,229 sequences. Subsequently, the effects of the top 10 highest frequency nonsynonymous mutations of different SARS-CoV-2 proteins were analyzed using bioinformatics tools including co-mutation analysis, prediction of the protein structure stability and flexibility analysis, and prediction of the protein functions.  
Results: A total of 231 nonsynonymous mutations were identified from 30,229 SARS-CoV-2 genome sequences. The top 10 nonsynonymous mutations affecting nine amino acid residues were ORF1a nsp5 P108S, ORF1b nsp12 P323L and A423V, S protein N501Y and D614G, ORF3a Q57H, N protein P151L, R203K and G204R. Many nonsynonymous mutations showed a high concurrence ratio, suggesting these mutations may evolve together and interact functionally. Our result showed that ORF1a nsp5 P108S, ORF3a Q57H and N protein P151L mutations may be deleterious to the function of SARS-CoV-2 proteins. In addition, ORF1a nsp5 P108S and S protein D614G may destabilize the protein structures while S protein D614G may have a more open conformation compared to the wild type.  
Conclusion: The biological consequences of these nonsynonymous mutations of SARS-CoV-2 proteins should be further validated by in vivo and in vitro experimental studies in the future.

Keywords

SARS-CoV-2, nonsynonymous mutation, co-mutation, COVID-19

Revised Amendments from Version 1

The title of the paper and the abstract have been revised to improve the clarity of the writing. The introduction section with the examples of the effect of nonsynonymous mutations affecting the function of the viral protein has been included in the last paragraph. The information about the geographical distribution of the SARS-CoV-2 dataset with the date range has been added as extended data. The detail of the high coverage parameter has been included in the methods section. Additional information on GTF and VCF files are added in the methods. Both references for the scoring method of SIFT4G and PROVEAN have been included. The GTF and VCF files for SIFT4G analysis are deposited in Figshare. The frequency percentage of the top 10 nonsynonymous mutations in the primary lineages associated with the past and present variant of concern has been included in Table 3. Figure 2 has been revised with the additional information on amino acid changes. Co-mutation analysis in the discussion section has been expanded. Additional paragraph in the discussion section has been added to discuss the impact of our study and to explain why the study of some of the identified mutations remain relevant for the newer variants. Some citations have been deleted, added or updated accordingly.

See the authors' detailed response to the review by Ujwal Ranjit Bagal and Vishal Nayak
See the authors' detailed response to the review by In-Hee Lee

Introduction

A new coronavirus disease known as COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in Wuhan, China in December 2019.1 SARS-CoV-2 is a positive-sense single stranded RNA virus with a helical nucleocapsid. The genome size of SARS-CoV-2 is about 30 kilobases. There are 11 protein-coding genes from the SARS-CoV-2 genome including four structural genes (spike (S), envelope (E), membrane (M), and nucleocapsid (N) genes) and seven nonstructural genes (ORF1ab, ORF3a, ORF6, ORF7a, ORF7b, ORF8 and ORF10).2

The SARS-CoV-2 virus can rapidly mutate to bypass the immune response of the host.3 These mutations can be synonymous, nonsynonymous, deletions, insertions, or others. Nonsynonymous mutations are expected to have a greater impact than synonymous mutations since nonsynonymous mutations affect the amino acid sequences of a protein, subsequently they may change their structures and functions. According to Kim et al. (2020), a total of 767 synonymous and 1352 nonsynonymous mutations have been identified from SARS-CoV-2 genomes.4 In another study, a total of 119 SNPs were identified using 11,183 SARS-CoV-2 genomes, in which there were 74 nonsynonymous mutations and 43 synonymous mutations.5 From a study on the analysis of nonsynonymous mutations in structural proteins of SARS-CoV-2, it has been shown that S and N proteins have higher mutation rate per gene compared to that of E and M proteins.6 The mutations of SARS-CoV-2 proteins may affect viral transmission, host immune evasion, and disease severity. Many SARS-CoV-2 S protein mutations, for example, D614G, N501Y have been heavily studied in detail because S protein facilitates viral entry into human host through the binding of ACE2 receptor.7 S protein D614G mutation increases viral transmission in animal model and human cell line study8 while S protein N501Y mutation results in an increase in binding to ACE2 receptor9 and an increase in viral replication in human upper-airway cells.10 However, the biological consequences of some of these mutations on the functions and structures of SARS-CoV-2 proteins remain unclear. In our study, computational analysis of the nonsynonymous mutations of SARS-CoV-2 proteins were performed using different bioinformatics tools including co-mutation analysis, protein structure stability and flexibility analysis, and protein function analysis to predict the effects of the mutations on the structures and functions of proteins.

Methods

Sequences and structures retrieval

The SARS-CoV-2 genomes data were downloaded from GISAID database (Global Initiative on Sharing All Influenza Data, RRID:SCR_018251).11 In this study, a total number of 30,229 SARS-CoV-2 virus genomes data with collection dates ranging from 2020-01-01 to 2021-03-21 were retrieved. The information on the geographical distribution of the SARS-CoV-2 dataset with the date range were summarized in a table as extended data.12 To make sure that only high-quality sequences were used, the filters including complete genome, high coverage (<1% Ns and <0.05% unique amino acid mutations (not seen in other sequences in database) and no insertion/deletion unless verified by submitter) and patient status, excluding low coverage were applied. The reference strain NC_045512.2 with a total number of 29903 bases was retrieved from NCBI database (NCBI, RRID:SCR_006472). The wild type protein structures obtained from RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank, RRID:SCR_012820) are listed in Table 1.13 Since N protein R203 and G204 are located at a disordered region which does not have a well-defined three-dimensional structure, no experimental structural data was available for the prediction analysis. A predicted model of N protein model (QHD43423, estimate TM-score = 0.97) generated with D-I-TASSER/C-I-TASSER pipeline was used.14

Table 1. SARS-CoV-2 protein structures used in this study.

ProteinNucleotide changesAmino acid changesTemplate structure (PDB ID)
ORF1a nsp5C10376TP108S7KPH
ORF1b nsp12C14408TP323L6YYT
C14708TA423V6YYT
SA23063TN501Y7A92
A23403GD614G7A92
ORF3aG25563TQ57H6XDC
NC28725TP151L6VYO
G28881AR203KQHD43423
G28882AR203KQHD43423
G28883CG204RQHD43423

Multiple sequence alignment of SARS-CoV-2 genomes

Multiple sequence alignment (MSA) was performed using rapid calculation in MAFFT (MAFFT, version 7.467, RRID:SCR_011811) which supports alignment for more than 20,000 sequences.15 After all SARS-CoV-2 sequences were aligned to the reference genome, the multiple sequence alignment file was visualized under MEGA X software, version 10.2.5 build 10210330 (MEGA Software, RRID:SCR_000667).

Identification of nonsynonymous mutations and the statistics of the mutation in the SARS-CoV-2 proteins

The 11 different coding sequences were extracted from these 30,229 strains according to their genomic positions in the reference strain (fasta file format) in NCBI, which is NC_045512.2. Inappropriate sequences of base calling errors, “N” unresolved nucleotides, and undefinable gaps were omitted. Then, the frequency and number of nonsynonymous mutations in these 30,229 strains were identified using a Python script. The frequency percentage of the top 10 nonsynonymous mutations in the primary lineages associated with the past and present variant of concern (VOC) were obtained from COVID CG (COVID CG, RRID:SCR_022266).16

Co-mutation analysis of SARS-CoV-2 proteins

The concurrence ratio of each nonsynonymous mutation in the SARS-CoV-2 genome was determined using GESS database (The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences, RRID:SCR_021847)17 derived from GISAID web server. The concurrence search used for the analysis of the concurrence ratio in the top 10 nonsynonymous mutations is listed in Table 2. The frequency for each SNV in the concurrence search is greater than 0.1%. The chord diagram for co-mutations of nonsynonymous mutations in the SARS-CoV-2 genome was generated using Circos table viewer (Circos, RRID:SCR_011798).18

Table 2. Concurrence ratio of top 10 nonsynonymous mutations in SARS-CoV-2 proteins.

Coding region and amino acid changeORF1a nsp5 P108SORF1b nsp12 P323LORF1b nsp12 A423VS protein N501YS protein D614GORF3a Q57HN protein P151LN protein R203KN protein R203KN protein G204R
Coding region and amino acid changeNucleotide changeC10376TC14408TC14708TA23063TA23403GG25563TC28725TG28881AG28882AG28883C
ORF1a nsp5 P108SC10376T099.998.30.299.94.697.291.691.691.7
ORF1b nsp12 P323LC14408T99.9099.998.899.899.799.999.799.899.8
ORF1b nsp12 A423VC14708T98.399.900.199.90.898.398.698.698.6
S protein N501YA23063T0.298.80.1099.225.70.267.365.666.7
S protein D614GA23403G99.999.899.999.2099.910099.699.799.7
ORF3a Q57HG25563T4.699.70.825.799.901.10.40.30.2
N protein P151LC28725T97.299.998.30.21001.1098.198.198.1
N protein R203KG28881A91.699.798.667.399.60.498.1099.999.9
N protein R203KG28882A91.699.898.665.699.70.398.199.9099.9
N protein G204RG28883C91.799.898.666.799.70.298.199.999.90

Prediction of mutation effect on protein stability and flexibility

To predict the effects of the mutations on the stability and flexibility of the protein structure, the protein structures were analyzed with DynaMut server (DynaMut, RRID:SCR_021849).19 The free energy change between the wild type and mutant protein structure (ΔΔG) predicts the status of protein stability, in which the values of ΔΔG above zero indicate a good stabilization while any values below zero or negative indicate a destabilizing outcome. The difference in entropic energy between the wild type and mutant structures (ΔΔSVib ENCoM) predicts the status of protein flexibility, in which the values of ΔΔSVib ENCoM above zero indicate an increase in flexibility while any values below zero or negative indicate a decrease in flexibility.

Prediction of mutation effect on protein function

SIFT 4G (Sorting Tolerant From Intolerant For Genomes, RRID:SCR_021850)20 and PROVEAN (Protein Variation Effect Analyzer, RRID:SCR_002182)21 were used to predict the deleteriousness of the nonsynonymous single nucleotide polymorphisms (nsSNPs) on SARS-CoV-2 protein structure. SIFT 4G predicts the effects of the mutations based on the sequence conservation and amino acid properties. For SIFT 4G analysis, gene annotation file (GTF), fasta file containing the SARS-CoV-2 genome sequences was obtained from Ensembl (Ensembl, RRID:SCR_002344).22 Subsequently a variant call format file (VCF) comprising all the SNP of SARS-CoV-2 was obtained using SNP-sites tool (RRID:SCR_02226).12 After that, the SARS-CoV-2 genome database, built with the SIFT 4G algorithm, was created. Lastly, SIFT 4G annotator was applied to annotate the VCF file with SARS-CoV-2 genome database. Mutations with a SIFT 4G score of less than 0.05 were considered deleterious.20 PROVEAN predicts the effects of the mutations based on the principle of alignment-based score. For PROVEAN analysis, the amino acid sequence along with the amino acid variation were processed in the PROVEAN server to get the prediction result. Mutations with a value less than −2.5 were considered as deleterious.21

Results

The statistics of nonsynonymous mutations in SARS-CoV-2 proteins

From the multiple alignment analysis, we identified 231 nonsynonymous mutations from 30,229 SARS-CoV-2 genome sequences. Figure 1 shows the numbers of the nonsynonymous mutations found in 11 coding sequences of SARS-CoV-2 proteins. ORF1a has the highest numbers of nonsynonymous mutations, followed by S protein and N protein. The top 10 highest frequency nonsynonymous mutations affecting 9 amino acids residues including ORF1a nsp5 P108S, ORF1b nsp12 P323L and A423V, S protein N501Y and D614G, ORF3a Q57H, N protein P151L, R203K and G204R and their frequency percentage in the primary lineages associated with the past and present VOCs are shown in Table 3.

86af4060-7439-4df9-bf80-66d493d2e0cc_figure1.gif

Figure 1. The numbers of nonsynonymous mutations in 11 coding sequences of SARS-CoV-2 proteins.

Table 3. Top 10 nonsynonymous mutations of SARS-CoV-2 proteins and their frequency percentage in the primary lineages of VOCs.

ProteinNucleotide changesAmino acid changesFrequencyLineage (VOC)
B.1.1.7
(α)
B.1.351
(β)
P.1
(γ)
B.1.617.2 (δ)BA.1 (ο)
ORF1a nsp5C10376TP108S4024-----
ORF1b nsp12C14408TP323L27953100%90%99%100%100%
C14708TA423V3988-----
SA23063TN501Y421899%91%97%0%94%
A23403GD614G28022100%100%100%100%100%
ORF3aG25563TQ57H52740%98%0%0%0%
NC28725TP151L4007-----
G28881AR203K1811699%0%97%0%100%
G28882AR203K1809299%0%97%0%100%
G28883CG204R1809091%0%97%0%99%

Co-mutation analysis of SARS-CoV-2 proteins

Some nonsynonymous mutations may be random and have no or little biological impact on viral transmission and pathogenesis. If a single nonsynonymous mutation co-mutates with other mutations, they may evolve together and interact functionally. To study co-mutation between different nonsynonymous mutations, the concurrence ratio of co-mutations in the top 10 nonsynonymous mutations was retrieved from GESS database website as shown in Table 2. The visualization of co-mutations in the top 10 nonsynonymous mutations generated with Circos table view is shown in Figure 2. In this chord diagram, connection ribbons represent co-mutations and each ribbon between row and column segments represents the value of concurrence ratio in each top 10 nonsynonymous mutations. Single colours encoded in circular arranged segments represent its own specific mutation whereas rainbow colours represent co-mutation in each mutation. The size of circular arrangement segments is proportional to the total value of concurrence ratio in a row or column. The circular size segment of ORF3a Q57H (G25563T) with the smallest segment size means the total value of concurrence ratio in row or column of ORF3a Q57H (G25563T) having the lowest concurrence ratio. A high concurrence ratio shows high co-mutation between each mutation with thicker ribbon size. S protein D614G (A23403G) with all other nine nonsynonymous mutations had concurrence ratios greater than 99%. On the other hand, low concurrence ratio shows low co-mutation with thinner ribbon size, for example, mutation ORF3a Q57H (G25563T) had the lowest concurrence ratio, only having a high concurrence ratio with S protein D614G (A23403G) and ORF1b nsp12 (P323L) C14408T, the top 2 nonsynonymous mutations which were present in more than 90% of the reported sequences.

86af4060-7439-4df9-bf80-66d493d2e0cc_figure2.gif

Figure 2. Visualization of co-mutation in top 10 nonsynonymous mutations in SARS-CoV-2 proteins.

Prediction of mutation effect on protein stability and flexibility

Table 4 summarizes the results of predicted effects of mutations on protein stability and flexibility obtained from DynaMut. Only two mutations, namely ORF1a nsp5 P108S and S protein D614G were predicted to be destabilizing with ΔΔG values of −0.288 and −0.072, respectively. For the prediction of protein flexibility, only S protein D614G was predicted to have an increase in flexibility with an ΔΔSVib ENCoM value of 0.523.

Table 4. Prediction of nonsynonymous mutation effect on SARS-CoV-2 proteins stability.

ProteinMutationΔΔG (kcal/mol)Prediction outcomeΔΔSVib ENCoM (kcal.mol−1.K−1)Molecule flexibility
ORF1a nsp5P108S−0.288Destabilizing−0.208Decrease
ORF1b nsp12P323L1.784Stabilizing−0.432Decrease
A423V0.776Stabilizing−0.348Decrease
SN501Y0.013Stabilizing−0.088Decrease
D614G−0.072Destabilizing0.523Increase
ORF3aQ57H0.275Stabilizing−0.160Decrease
NP151L1.111Stabilizing−0.325Decrease
R203K0.749Stabilizing−0.107Decrease
G204R1.064Stabilizing−2.522Decrease

Prediction of mutation effect on protein function

The prediction results of nonsynonymous mutations in the SARS-CoV-2 proteins using SIFT 4G and PROVEAN are shown in Table 5. SIFT 4G functional missense mutation score predicted that the P108S mutation in ORF1a nsp5 was deleterious (score 0.00) while four mutations S protein D614G, ORF3a Q57H, N protein R203K and G204R were tolerated (>0.05). However, the SIFT 4G results of ORF1b nsp12 P323L and A423V, S protein N501Y and N protein P151L mutations cannot be obtained due to missing data in the Ensembl database. For the PROVEAN score, three nonsynonymous mutations, namely ORF1a nsp5 P108S, ORF3a Q57H and N protein P151L were predicted to be deleterious (score < −2.5). However, six nonsynonymous mutations, namely ORF1b nsp12 P323L and A423V, S protein N501Y and D614G, N protein R203K and G204R were predicted to be neutral (score > −2.5).

Table 5. Prediction of nonsynonymous mutation effect on SARS-CoV-2 proteins function.

ProteinMutationSIFT 4GProvean
ORF1a nsp5P108S0.00 (deleterious)−3.71 (deleterious)
ORF1b nsp12P323L-−0.91 (neutral)
A423V-1.21 (neutral)
SN501Y-−0.09 (neutral)
D614G1.00 (tolerated)0.60 (neutral)
ORF3aQ57H0.61 (tolerated)−3.29 (deleterious)
NP151L-−4.93 (deleterious)
R203K0.11 (tolerated)−1.60 (neutral)
G204R0.08 (tolerated)−1.66 (neutral)

Discussion

The top 10 highest frequency nonsynonymous mutations of SARS-CoV-2 identified from 30,229 SARS-CoV-2 genome sequences were further analyzed with co-mutation analysis, prediction of the protein structure stability and flexibility analysis, and prediction of the protein function analysis. To determine if two nonsynonymous mutations of SARS-CoV-2 proteins co-mutate, concurrence ratio was calculated. Many nonsynonymous mutations showed a high concurrence ratio, suggesting these mutations may evolve together and interact functionally. The top 2 nonsynonymous mutations, S protein D614G and ORF1b nsp12 P323L (as known as RNA-dependent RNA polymerase) showed very high concurrence ratio with other mutations since they emerged in the early phase of the pandemic.23 Previously it has been shown that S protein D614G co-evolved with ORF1b nsp12 P323L.23 The combination of both mutations may enhance viral fitness based on epidemiological data, although the molecular mechanisms of this evolutionary advantage remain elusive.23 Interestingly other mutations with high or medium concurrence ratio are found in more than 90% of the lineage B.1.1.7 in the alpha variant, in which its frequency peaked between March-May 2021. However, the ORF3a Q57H mutation with the lowest concurrence ratio is absent in the lineage B.1.1.7. Unfortunately, the information on the frequency percentage of ORF1a nsp5 P108S, ORF1b nsp12 A423V and N protein P151L mutations in the primary lineages of different VOCs are not available. In another study, it has been predicted that multiple SARS-CoV-2 genes may have epistatic interactions linked to viral fitness.24 The effects of a mutation can be neutral, harmful, or beneficial to the virus. It is expected that most single mutations have a small effect on viral fitness. It remains an arduous task to associate a specific phenotype with a single viral mutation since it is possible that a specific phenotype is contributed to by the effects of multiple mutations.

There are huge numbers of single nucleotide polymorphisms (SNPs) present in the SARS-CoV-2 genome, hence evaluating the biological functions of all SNPs using experimental approaches is not feasible. Therefore, prediction of the effects of SNPs allows us to prioritize variants which may have some significant biological functions. Our study used the meta-prediction approach to perform functional predictions of nonsynonymous mutations to minimize the false positive rate. When two or three tools are combined, the prediction accuracy increases and reaches greater performance, however, the sensitivity is subsequently decreased as more tools are combined.25

Of all these nine protein mutations, only two mutations namely ORF1a nsp5 P108S and S protein D614G were predicted to reduce their stability whereas only S protein D614G may have more a flexible conformation compared to the wild type. S protein binds to human ACE2 receptors to gain access to the host cell.7 D614G mutation is found at S1 domain which is involved in receptor binding.26 Two independent studies of S protein D614G mutant structures derived from cryo-electron microscopy analysis has demonstrated that the G614 mutant adopts a more open conformation compared to D614 wild type.27,28 Interestingly, an in vitro study has shown that S protein D614G mutation may enhance virus infectivity by promoting the packing of S protein into the virion, not by enhancing the binding of S protein to the ACE2 receptor.29 On the other hand, ORF1a nsp5, also known as 3C-like protease is responsible for cleaving viral polypeptides during replication.2 A study by Abe et al., (2021) has showed that ORF1a nsp5 protein P108S mutation associated with the clade, 20B-T (lineage B.1.1.284), diminished its activity, possibly leading to a reduction in disease severity.30

Since the protein function depends directly on the three-dimensional structure of the protein, we wanted to see if these mutations may affect the function of the protein using SIFT 4G and PROVEAN prediction tools. The PROVEAN tool is applicable for all organisms. SIFT4G, instead of SIFT was used since it allows us to build a SARS-CoV-2 genome database with variant annotation. Interestingly ORF1a nsp5 protein P108S mutation was the only mutation found to be deleterious from both SIFT4G and PROVEAN functional analysis. Together with the DynaMut stability result, it has been demonstrated that this mutation may be harmful to the virus itself, and can be less damaging to the human host as reported by Abe et al. (2022).30 On the other hand, ORF3a Q57H and N protein P151L mutations are predicted to be deleterious by the PROVEAN tool only. ORF 3a is an ion channel (viroporin) which is involved in viral egress steps through lysosomal trafficking.31,32 ORF3a Q57H mutation not only causes a change in amino acid in ORF3a, but also produces a truncated ORF3b due to the overlapping protein-coding sequences shared by ORF3a and ORF3b.33 However, there are conflicting results about the effect of the ORF3a Q57H mutation on the human host immune response.33,34 N protein is involved in the liquid-liquid phase separation for the viral genome packaging.35 N protein P151L mutation is located at the RNA binding domain. It has been proposed that this mutation may disrupt the protein-drug interaction.36 Although another two N protein mutations, R203K and G204R were not predicted to be deleterious in our study, they have been identified in the alpha variant, B.1.1.7, gamma variant, P.1, lambda variant, C.37 and omicron variant, BA.1.37 While N protein, T205I mutation has been reported in the beta variant, B.1.351 and Mu variant, B.1.621.37 More recently, another N protein mutation, R203M has been reported in the delta variant, B.1.617.2.37 Interestingly mutants with N protein S202R or R203M mutations can pack more RNA material compared to the wild type based on in vitro studies.38 These observations and experimental results suggest that N protein residues, S202, R203, G204 and T205 may play some role on viral RNA replication.

The SARS-CoV-2 virus genome data from GISAID database ranging from 1st January 20 to 22 March 21 were analyzed in this study. The frequency of alpha variant peaked around March-May 21 in most countries.37 Hence, it is not surprised that five nonsynonymous mutations, including ORF1b nsp12 P323L, S protein N501Y, S protein D614G, N protein R203K and N protein G204R identified in this study were also part of the defining mutations in the alpha variant.37 Note that ORF1b nsp12 P323L is the same mutation as ORF1b nsp12 P314L, nsp12 is located between ORF1a and ORF1b, and the last 9 overlapping amino acid residues from ORF1a, SADAQSFLN were included in ORF1b nsp12 P323L. The mutational profile of SARS-CoV-2 genome is changing very rapidly. However, it is out of the scope of this paper to monitor the mutational changes of SARS-CoV-2 genome since it is impossible to keep up with the exponential growth of these data. Interestingly two preprints on genomic surveillance analysis using specimens collected in late 2021 have showed that a small number of wild deer in North America carry the alpha or alpha-like variant.39,40 Although the alpha variant circulating in human population has been replaced by other variants, it remains to be seen if the alpha variant would jump back from animal to human. Furthermore, these five nonsynonymous mutations found in the alpha variant were also found in the current dominant variant, the omicron variant as shown in Table 3. Therefore, it is still relevant to study the consequences of these mutations.

Conclusion

In this study, ORF1a nsp5 P108S, S protein D614G, ORF3a Q57H and N protein P151L mutations have been predicted to alter their structures and/or functions. Since all the reported variants of concern contain multiple mutations present in multiple SARS-CoV-2 proteins, it is necessary to evaluate the impact of these mutations in combination on viral transmission and pathogenicity. The biological consequences of these nonsynonymous mutations of SARS-CoV-2 proteins should be further validated with in vivo and in vitro experimental studies in the future.

Ethics and dissemination

No ethical approval is required for data analysis in this study (EA0802021).

Author contribution

CHN contributes to the concept, design, supervision of the project. SBZ, WXB, YYY and SK contribute to the design, methodology, and data collection. SBZ, WXB, YYY and SK contributed to the analysis, and interpretation of data.

All authors were involved in drafting and revising the manuscript and approved the final version.

Data availability

Underlying data

SARS-CoV-2 virus genome sequence data were downloaded from the GISAID Database.

The geographical distribution of the SARS-CoV-2 genome dataset with the date range from 2020-01-01 to 2021-03-21 was summarized in a table and deposited in Figshare.

Figshare: Geographical Distribution (SARS-CoV-2). https://doi.org/10.6084/m9.figshare.19721716.v141

The additional multiple alignment data can be obtained from Figshare

Figshare: MSA (SARS-CoV-2). https://doi.org/10.6084/m9.figshare.16681900.v442

This project contains the following underlying data.

  • MSA_0 (31-12-2019 to 31-05-2020).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 31-12-2019 and 31-05-2020.

  • MSA_1 (01-06-2020 to 15-10-2020).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 01-06-2020 and 15-10-2020.

  • MSA_2 (16-10-2020 to 31-01-2021).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 16-10-2020 and 31-01-2021.

  • MSA_3 (01-02-2021 to 22-03-2021).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 01-02-2021 to 22-03-2021.

In SIFT4G analysis, we used GTF file containing the SARS-CoV-2 genome sequences, and VCF file comprising all the SNP of SARS-CoV-2.

Figshare: SIFT4G (SARS-CoV-2). https://doi.org/10.6084/m9.figshare.1969736543

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

The python script used for the identification of SARS-CoV-2 genome mutations can be obtained through GitHub (https://github.com/wxboon98/Mutations-Identification).

Comments on this article Comments (0)

Version 2
VERSION 2 PUBLISHED 06 Jan 2022
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
Sia BZ, Boon WX, Yap YY et al. Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins [version 2; peer review: 2 approved] F1000Research 2022, 11:9 (https://doi.org/10.12688/f1000research.72904.2)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Version 2
VERSION 2
PUBLISHED 18 May 2022
Revised
Views
9
Cite
Reviewer Report 09 Jun 2022
Ujwal Ranjit Bagal, Centers for Disease Control and Prevention, Atlanta, GA, USA 
Vishal Nayak, Centers for Disease Control and Prevention, Atlanta, USA;  Frederick National Laboratory for Cancer Research, Frederick, MD, USA 
Approved
VIEWS 9
The authors have addressed all our concerns in the revised version of the manuscript. The current version looks almost ready for indexing. 

Below are a few grammatical errors we want them to correct or rephrase the sentence. ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Ranjit Bagal U and Nayak V. Reviewer Report For: Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins [version 2; peer review: 2 approved]. F1000Research 2022, 11:9 (https://doi.org/10.5256/f1000research.133846.r138228)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Views
11
Cite
Reviewer Report 27 May 2022
In-Hee Lee, Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA 
Approved
VIEWS 11
The revised article properly addresses ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Lee IH. Reviewer Report For: Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins [version 2; peer review: 2 approved]. F1000Research 2022, 11:9 (https://doi.org/10.5256/f1000research.133846.r138227)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Version 1
VERSION 1
PUBLISHED 06 Jan 2022
Views
12
Cite
Reviewer Report 10 Mar 2022
In-Hee Lee, Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA 
Approved with Reservations
VIEWS 12
The authors examined 10 selected nonsynonymous mutations in SARS-CoV-2 genome for their predicted effect on protein stability as well as their co-mutations. Overall the paper is written well to understand the experiments and analysis results.

Methods:
... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Lee IH. Reviewer Report For: Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins [version 2; peer review: 2 approved]. F1000Research 2022, 11:9 (https://doi.org/10.5256/f1000research.76515.r125647)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 09 May 2022
    Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia
    09 May 2022
    Author Response
    The authors examined 10 selected nonsynonymous mutations in SARS-CoV-2 genome for their predicted effect on protein stability as well as their co-mutations. Overall the paper is written well to understand ... Continue reading
COMMENTS ON THIS REPORT
  • Author Response 09 May 2022
    Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia
    09 May 2022
    Author Response
    The authors examined 10 selected nonsynonymous mutations in SARS-CoV-2 genome for their predicted effect on protein stability as well as their co-mutations. Overall the paper is written well to understand ... Continue reading
Views
23
Cite
Reviewer Report 28 Feb 2022
Ujwal Ranjit Bagal, Centers for Disease Control and Prevention, Atlanta, GA, USA 
Vishal Nayak, Centers for Disease Control and Prevention, Atlanta, USA;  Frederick National Laboratory for Cancer Research, Frederick, MD, USA 
Approved with Reservations
VIEWS 23
The authors in this paper have tried to use an extensive set of SARS-COV-2 genomes to identify and select the top 10 non-synonymous mutations for analyzing the co-mutation effect, as well as its effect on the stability and flexibility of ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Ranjit Bagal U and Nayak V. Reviewer Report For: Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins [version 2; peer review: 2 approved]. F1000Research 2022, 11:9 (https://doi.org/10.5256/f1000research.76515.r123168)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 09 May 2022
    Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia
    09 May 2022
    Author Response
    "The authors in this paper have tried to use an extensive set of SARS-COV-2 genomes to identify and select the top 10 non-synonymous mutations for analyzing the co-mutation effect, as ... Continue reading
COMMENTS ON THIS REPORT
  • Author Response 09 May 2022
    Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia
    09 May 2022
    Author Response
    "The authors in this paper have tried to use an extensive set of SARS-COV-2 genomes to identify and select the top 10 non-synonymous mutations for analyzing the co-mutation effect, as ... Continue reading

Comments on this article Comments (0)

Version 2
VERSION 2 PUBLISHED 06 Jan 2022
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.