Abstract
The SARS-CoV-2 coronavirus has caused a worldwide crisis with profound effects on both healthcare and the economy. To combat the COVID-19 pandemic, research groups have shared viral genome sequence data through the GISAID initiative. We collected ∼223,000 full SARS-CoV-2 proteome sequences deposited in GISAID over one year and computationally profiled them for emergent nonsynonymous mutations. Our analysis shows that SARS-CoV-2 proteins are mutating at substantially different rates, with most viral proteins exhibiting little mutational variability. As anticipated, our calculations capture previously reported mutations that occurred in the first period of the pandemic, such as D614G (Spike), P323L (NSP12), and R203K/G204R (Nucleocapsid), but also identify more recent mutations such as A222V and L18F (Spike) and A220V (Nucleocapsid). Our comprehensive temporal and geographical analyses show two periods with different mutations in the SARS-CoV-2 proteome: December 2019 to June 2020 and July to November 2020. Some mutation rates also differ by geography; the main mutations in the second period occurred in Europe. Furthermore, our structure-based molecular analysis provides an exhaustive assessment of mutations in the context of 3D protein structure. Emerging sequence-to-structure data is beginning to reveal the site-specific mutational tolerance of SARS-CoV-2 proteins as the virus continues to spread around the globe.
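As an illustration of the kind of profiling summarized above, the following minimal sketch calls amino-acid substitutions against a reference protein and reports how often each one appears in a set of samples. It is not the pipeline used in this study: the function names `call_substitutions` and `mutation_frequencies`, the assumption that sequences are already aligned to equal length, the treatment of `X`, `-`, and `*` as uninformative, and the definition of frequency as the fraction of samples carrying a substitution are all assumptions made for the example.

```python
from collections import Counter

# Symbols treated as uninformative in this sketch (assumption):
# unknown residue, alignment gap, stop codon.
AMBIGUOUS = {"X", "-", "*"}


def call_substitutions(reference, sample):
    """Return labels such as 'D614G' (reference residue, 1-based position,
    observed residue) for every substitution in a pre-aligned sample."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt and ref not in AMBIGUOUS and alt not in AMBIGUOUS
    ]


def mutation_frequencies(reference, samples):
    """Map each substitution label to the fraction of samples carrying it."""
    counts = Counter()
    for sample in samples:
        counts.update(call_substitutions(reference, sample))
    total = len(samples)
    return {label: count / total for label, count in counts.items()}


# Toy data, not real SARS-CoV-2 sequences.
toy_reference = "MDGK"
toy_samples = ["MDGK", "MGGK", "MGGR", "MXGK"]
toy_frequencies = mutation_frequencies(toy_reference, toy_samples)
```

On the toy data, `D2G` is found in two of four samples and `K4R` in one, while the `X` in the last sample is ignored. Grouping samples by collection date or region before calling `mutation_frequencies` would give the temporal and geographical breakdowns in the same way.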
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- MR: Mutation Rate
- S: Spike
- N: Nucleocapsid
- E: Envelope
- M: Membrane
- RdRp: RNA-dependent RNA polymerase
- PLpro: Papain-like protease
- Mpro: Main protease