Fragment-based inhibitor design for SARS-CoV2 main protease

Andola, Priyanka; Pagag, Jishu; Laxman, Durgam; Guruprasad, Lalitha

doi:10.1007/s11224-022-01995-z

Fragment-based inhibitor design for SARS-CoV2 main protease

Original Research
Published: 04 July 2022

Volume 33, pages 1467–1487, (2022)
Cite this article

Download PDF

Structural Chemistry Aims and scope Submit manuscript

Fragment-based inhibitor design for SARS-CoV2 main protease

Download PDF

Priyanka Andola¹^na1,
Jishu Pagag¹^na1,
Durgam Laxman¹ &
…
Lalitha Guruprasad ORCID: orcid.org/0000-0003-1878-6446¹

1984 Accesses
2 Citations
1 Altmetric
Explore all metrics

Abstract

COVID-19 disease caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV2) has resulted in tremendous loss of lives across the world and is continuing to do so. Extensive work is under progress to develop inhibitors which can prevent the disease by arresting the virus in its life cycle. One such way is by targeting the main protease of the virus which is crucial for the cleavage and conversion of polyproteins into functional units of polypeptides. In this endeavor, our effort was to identify hit molecule inhibitors for SARS-CoV2 main protease using fragment-based drug discovery (FBDD), based on the available crystal structure of chromene-based inhibitor (PDB_ID: 6M2N). The designed molecules were validated by molecular docking and molecular dynamics simulations. The stability of the complexes was further assessed by calculating their binding free energies, normal mode analysis, mechanical stiffness, and principal component analysis.

An Overview of SARS-CoV-2 Potential Targets, Inhibitors, and Computational Insights to Enrich the Promising Treatment Strategies

Article 11 May 2024

Screening of Potential Inhibitors Targeting the Main Protease Structure of SARS-CoV-2 via Molecular Docking, and Approach with Molecular Dynamics, RMSD, RMSF, H-Bond, SASA and MMGBSA

Article 25 July 2023

COVID-19 outbreak: history, mechanism, transmission, structural studies and therapeutics

Article 04 September 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Coronaviruses comprise a large family of zoonotic viruses that have potential to infect humans via animal intermediaries [1]. Coronaviruses pose a continuous threat to the mankind because of their ability to emerge unpredictably and periodically, thereby spreading rapidly and inducing serious infectious pandemic disease. The world previously witnessed outbreaks of coronavirus disease caused by other similar species, namely SARS-CoV in 2002 and 2003 in China [2]; MERS-CoV in 2012 in the Middle East [3]; HCoV-OC43 [4, 5] in 1967 in Salisbury, UK; HCoV-229E [6] in 1966 in African bats; HCoV-HKU1 [7] in 2005 in Hong Kong; and HCoV-NL63 [8, 9] in 2002–2003 across Asia, Europe, and North America. During the last 2 years, the novel human coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) that causes coronavirus disease 2019 (COVID-19) resulting in an acute infection in the respiratory system [1], has become a global pandemic. All these viruses belong to the subfamily Coronavirinae in a family Coronaviridae. These are nonsegmented, enveloped, positive‐sense single‐stranded RNA virus genomes [10], and the human coronaviruses have the genomic size ranging from 29 to 31 kilobases. The RNA genome of the human SARS-CoV2 is 82% identical to that of human SARS-CoV [11] and shares considerable similarity in the genome; hence, it has been named as SARS-CoV2 [12]. The SARS-CoV2 genome encodes open-reading frames (ORFs) ORF1a and ORF1ab that encode for polyproteins. These polyproteins are further processed into 16 nonstructural proteins (NSPs), NSP1 to NSP16, which constitute the replicase-transcriptase complex. Various enzymes including the main protease also called as 3-chymotrypsin-like protease (NSP5), papain-like protease (NSP3), and RNA-dependent RNA polymerase (NSP12) are present in this complex among other proteins [13]. The virion contains a nucleocapsid made up of genomic RNA and phosphorylated nucleocapsid protein, buried inside the phospholipid bilayers and surrounded by spike protein on the membrane of virus. A type-3 transmembrane glycoprotein also known as membrane protein and the envelope protein are situated among the spike proteins in the virus envelope [10].

The most prominent symptom reported in SARS-CoV2 is respiratory distress with some patients requiring intensive care, high-flow oxygen therapy, and invasive and noninvasive ventilation. The condition could worsen due to respiratory failure leading to the death of the patient [14]. It was also observed that patients with comorbidity pose a greater risk of developing severe infection which may also result in death of an individual [15]. More than 530 million COVID-19 cases and close to 6.29 million deaths have been reported during the last 2 years and 5 months. Despite the availability of vaccines and large populations in various countries being already vaccinated, the re-emergence of SARS-CoV2 in the mutated forms as witnessed in the case of beta [16] and omicron [17] variants poses a continuous threat of COVID-19 disease. There have been continuous research efforts to develop antiviral agents against members of Coronaviridae family which suggested that the human receptor, angiotensin-converting enzyme 2 that facilitates entry of virus into the host, RNA-dependent RNA polymerase and main protease, helicase, and papain-like protease, may be represented as suitable drug targets [11, 18,19,20,21,22,23].

Considering the functional significance of the main protease, i.e., to carry out processing of polyproteins into NSPs, in the viral life cycle, and combined with the absence of closely related homologues in humans, main protease serves as an attractive target for the design of antiviral drugs. Inhibiting the viral activity of this enzyme would block the replicase-transcriptase complex and therefore disrupt the viral life cycle. These targets are a choice for inhibition purpose also in SARS-CoV and MERS-CoV, and hence, constant work has been carried out either by computational methods or experimental investigations to find novel molecules which can interfere with virus life cycle by inhibiting the main protease. For example, glycosylated flavonoids display a strong inhibitory activity on SARS-CoV2 main protease [24]. Some natural polyphenolic compounds, i.e., glucogallin, mangiferin, and phlorizin, may have the ability to be used as safe protease inhibitors, which may act by inhibiting type-2 transmembrane serine protease and main protease [25]. In silico approaches have identified three FDA-approved drugs, namely remdesivir, saquinavir, darunavir, and two small molecules such as flavone and coumarin derivatives that act as potential inhibitors of human SARS-CoV2 main protease [26]. Few drugs were identified and tested for treatment of COVID-19 disease such as chloroquine, lopinavir, nafamostat, hydroxychloroquine, ritonavir, camostat, corticosteroids, and sarilumab [27]. Also a combination of drugs such as noscapine and hydroxychloroquine demonstrated a strong binding affinity towards main protease [28]. Numerous drug candidates such as ebselen, disulfiram, carmofur, α-ketoamides, and peptidomimetic aldehydes 11a/11b have been discovered which inhibit the main protease activity by providing interference with the maturation process of NSPs [11, 22, 29, 30]. Computational studies have identified some of the NCI natural products as SARS-CoV2 main protease inhibitors [31].

The present work focuses on finding suitable inhibitors for the main protease in SARS-CoV2 using the computational methodologies in fragment-based drug discovery (FBDD). The FBDD methodology initially employs virtual screening of chemical fragments with low molecular weight and less chemical complexity and targets the sub-pockets within the receptor binding site [32]. The selection of fragments in a library generally follows “rule of three” criterion, i.e., molecular weight $\le$ 300, hydrogen bond donor and acceptor $\le$ 3, and lipophilicity index (CLogP) $\le$ 3 [33, 34]. Furthermore, it is desirable that the number of rotatable bonds is ≤ 3, and the polar surface area (PSA) is ≤ 60 [35]. Such fragment hits are considered to be more suitable before the “hit-to-lead optimization” since they have reduced complexity, thereby allowing more freedom for optimization of various properties of the fragment hits in a multidimensional manner [32]. Some attractive features like favorable physical, pharmacokinetics, and toxicity properties can be integrated in the development of fragments [36]. These fragments function as building blocks, combined into compounds that typically recognize the protein binding site. The process of optimization of fragments can be done by linking the two independent fragments together or by adding functional groups in each step to make improvements to the molecules [32]. These compounds are further elaborated such that they bind the receptor via several nonbonding interactions. Methodologies in FBDD have rapidly led to the identification of several FDA-approved drugs such as INQOVI, erdafitinib [37], ribociclib, vemurafenib [38], pexidartinib [39], and venetoclax [40,41,42].

The inhibitor 3WL complexed with SARS-CoV2 main protease was used as the reference structure on which FDA-approved drug fragments were employed in order to obtain new lead molecules that could bind the active site of the receptor. Molecular docking, molecular dynamics (MD) simulations, and trajectory analyses were employed in order to propose new hit molecules to bind the SARS-CoV2 main protease.

Materials and methods

Protein preparation

The crystal structure of SARS-CoV2 main protease was solved at 2.2 Å resolution, and its three-dimensional structure complexed with inhibitor is deposited in the protein structure databank (PDB) with the PDB_ID: 6M2N [43]. The crystal structure of the main protease is a homodimer; chain A was selected for the computational studies as it did not have any missing residues. Crystal waters were deleted, hydrogen atoms were added to complete the valency, and geometry optimization of the structure was done using minimize structure tool in the UCSF Chimera [44] version 1.12 for further studies.

Inhibitor design

We used Auto Core Fragment in silico Screening (ACFIS) 2.0 web server available at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/ [45] that comprises FDA drug fragment database with 2883 fragment molecules for FBDD [45, 46]. This server works on the principle of pharmacophore-linked fragment virtual screening approach. ACFIS has three computational modules. PARA_GEN is a tool to generate parameters for ACFIS using the AMBER tools; CORE_GEN tool considers the protein–ligand complex and proceeds to deconstruct the ligand structure into fragments by cleavage at single bond based on retrosynthetic analysis by using decomposition and identification of molecules program [47] to derive core fragment structure from a bioactive ligand. CAND_GEN is a tool to link fragments to the core fragment structure and generate candidate molecules. The fragments in the library will be linked to the junction of the core fragment structure at the position of cleavage. ACFIS server creates new molecules based on growing algorithm by determining the core fragment structure followed by automatic linking of each of the fragments for improving the binding affinity. Finally, the molecules are ranked based on their binding affinities. Output structure files and related data were downloaded. These molecules thus identified were used for further molecular docking studies.

Molecular docking

Molecular docking is a tool to predict molecular recognition between a small molecule and a target macromolecule in terms of binding modes and binding affinity. Molecular docking involves generating the most favorable binding mode of a ligand to its target macromolecule. Virtual screening of molecules by molecular docking was performed by using PyRx [48] and AutoDock 4.2 [49] software. Both the docking methods were validated by docking 3WL into the active site of main protease 6M2N as seen in Supplementary Fig. 1A, B. The interacting residues within a radius of 5 Å with respect to reference molecule 3WL were considered for generation of grid box. The protein macromolecule and ligands were imported using input molecule tools in PyRx and were converted from .pdb into .pdbqt format. Grid box was generated and adjusted so that it includes the active site residues at the binding site, with the exhaustiveness value of 10. The results were analyzed for determining the best conformers with lowest binding energy (kcal/mol) and hydrogen bond interactions with the active site residues. The result output was exported to a file for visualization and analysis purpose using discovery studio visualizer.

The best ranked molecules were subjected to another round of docking studies using AutoDock 4.2 tools in order to dock them inside a 5 Å cavity defined around the reference molecule 3WL. The protein was cleaned by deletion of heteroatoms and water molecules, torsions were set, polar hydrogens were added, and structure was saved in .pdbqt format. Similarly, the designed molecules were loaded, and torsions were set and saved in .pdbqt format. The calculations for protein–ligand flexible docking were performed using the Lamarckian genetic algorithm. The maximum number of generations simulated during each GA run was set to 27,000, and the rates of the gene mutation and crossover were set to 0.02 and 0.8, respectively. The threshold RMSD value was set to 2 Å, and the number of docking poses generated for each molecule was set to 10 for AutoDock as well as PyRx. A grid box with the dimensions of X: −32.464 Å, Y: −62.314 Å, and Z: 43.532 Å and a grid spacing of 0.375 Å was used with exhaustiveness value of 10. The conformers obtained were ranked based on the nature of interactions, root-mean-square deviation (RMSD) values, and binding affinity scores.

Drug likeness

The absorption, distribution, metabolism, and excretion (ADME) parameters describe the pharmacokinetic properties associated with the ligand molecules and have a crucial role to play in the process of drug discovery and development. The ADME properties of the docked molecules were measured using the SwissADME server [50]. SwissADME server is a free webtool to predict the drug-likeness properties of small molecules; it displays the pharmacokinetic properties in the form of topological polar surface area (TPSA), consensus LogP_O/W, Log S, and Log K_P, and synthetic accessibility values.

Molecular dynamics simulations

Proteins are inherently flexible biomolecules which combine with a ligand to form the protein–ligand complex; this complex attains stability mediated by various nonbonding interactions such as van der Waals (vdW), electrostatic (el), hydrogen bonding, and hydrophobic. These kinds of interactions can be studied using MD simulations. Proteins identified by X-ray diffraction consider static conditions, while the MD simulations can be used to find the consistency in non-covalent interactions and the stability of protein–ligand complexes. The MD simulation studies were run for the apo form of main protease as well as with the reference molecule bound in the complex form using GROMACS 5.1.4 [51] software for 250 ns. The force fields for protein were generated using AMBER ff99SB [52], and force fields for hit molecules and reference ligand were generated in antechamber [53] using ACPYPE [54] scripts. The complexes were immersed in a cubic box filled with water molecules as simple point charge (SPC) [55] model. The neutralization of the system was done by adding Na⁺ and Cl⁻ ions. This was followed by energy minimization using steepest descent method for 50,000 steps to relieve the system from short-range bad contacts. The equilibration of the system is performed in two steps by allowing the system to get equilibrated until it reaches 300-K temperature in the constant number, volume and temperature (NVT) step and also until it attains the suitable density in constant number, pressure and temperature (NPT) (1 atm, 300 K) step. The NVT and NPT equilibration were performed for 100 ps timescale to energy minimize the system. For temperature and pressure couplings, V-rescale thermostat [56] and Parrinello and Rahman methods [57] were used respectively. The long-range electrostatics were handled using the particle mesh Ewald method [58]. The Lennard–Jones interactions, real space interactions, were truncated at 9 Å, and hydrogen bonds were constrained using LINCS algorithm [59]. The final production run was performed for 250 ns timescale involving 125 million steps. The final models for all molecular systems were obtained by averaging the snapshots generated from the trajectories of MD simulations. The RMSD and root-mean-square fluctuations (RMSF) were calculated using gmx rms and gmx rmsf commands, respectively, of GROMACS to study the conformational variations in the protein bound to reference and hit molecule complexes.

Binding free energies

The interaction energy between the protein and ligand is vital to study about the overall stability of the complex and to assess how well the ligand fits into the protein binding site. Also, the contribution from amino acid residues which form the ligand binding site in complex formation can be estimated from binding free energy. The Poisson-Boltzmann or Generalized Born and Surface Area Continuum Solvation (MM-PBSA and MM-GBSA) [60] are methods that can estimate the free energy of binding of small molecules to the protein. The output trajectories from GROMACS with the g_mmpbsa tool were used to calculate binding free energies for all the complexes. The linear interaction energy (LIE) method [61], which is based upon estimations of force fields in the receptor-ligand interactions and thermal conformational sampling, was used for calculation of ligand binding free energies. The gmx lie tool was utilized to compute free energy estimate based on analysis from nonbonded energies.

Generally, the binding free energy of a protein in complex with a ligand is expressed as [62, 63]:

$$\Delta \mathrm{G_{binding}} = \mathrm{G_{complex}} - \mathrm{(G_{protein}}+ {G_{ligand})}$$

Here, G_complex is the total free energy of protein–ligand complex, and G_protein and G_ligand are the total free energies of isolated forms of protein and ligand in the solvent, respectively.

In the LIE approach, the binding free energy is estimated according to the following equation and is divided into polar and nonpolar contributions [64].

$$\Delta \mathrm {G_{bind}} = \alpha\Delta\langle\mathrm{ {V_{l-s}}^{vdW}\rangle} + \beta\Delta\langle \mathrm{V_{l-s}}^{\mathrm {el}}\rangle + \gamma$$

where ⟨⟩ depicts the MD averages for the nonbonded vdW and el interactions of the ligand with its surrounding environment (l-s). The Δ's depict change in the MD averages when ligand is transferred from solution (free state) to the binding site of the solvated receptor (bound state). The coefficients α and β corresponding to parameters for nonpolar and polar interactions, respectively, are scaling factors for these energies, and γ is a constant term which could also be implemented as surface-area dependent [65, 66]. The values taken for α, β, and γ were 0.181, 0.5, and 0, respectively.

Normal mode analysis and mechanical stiffness

Protein dynamics is very essential to understand molecular events in a cell such as ligand recognition, binding, and transport [67]. The ensembles of structures can provide information about the structural variations important for biological activity [68, 69]. We used software suite of programs in ProDy [67] to develop the elastic network model-based normal mode analysis (NMA), and dihedral angles were taken as independent variables for all molecular systems. The mechanical stiffness plots were developed for all molecular systems using anisotropic network model [70].

Principal component analysis

Principal component analysis (PCA) is a covariance-matrix-based mathematical technique to find global and correlated motions in atomic simulations of protein [71]. The overall motion of main protease was assessed through PCA technique by generating a 3N × 3N covariance matrix followed by its diagonalization to construct eigenvectors. The technique was employed for all the simulated molecular systems using MODE-TASK [72], and calculations were carried out from 0 to 250 ns MD simulations trajectories.

Results and discussions

Overall fold and active site of SARS-CoV2 main protease

Several crystal structures of SARS-CoV2 main protease in complex with peptide-based inhibitors are available in the Protein Data Bank (PDB) with IDs as follows: 6XMK, 6XBH, 7BRP, 7CB7, and 7CBT. We have selected the crystal structure of SARS-CoV2 main protease in complex with a small molecule inhibitor (3WL), having PDB_ID: 6M2N [43] as the reference system. The three-dimensional structure of SARS-CoV2 main protease contains three domains. The domain 1 (8–101 amino acid residues) and domain 2 (102–184) fold into β-barrels and bear close resemblance to the chymotrypsin, whereas domain 3 (201–306) is mainly composed of α-helices. The substrate binding region, which is located at the cleft of domains 1 and 2, comprises of catalytic dyad His41 and Cys145, in which Cys serves as a nucleophile, while His acts as a proton acceptor. There are two deeply buried subsites (S1 and S2) and three shallow subsites (S3–S5) in addition to the catalytic site that also contribute to the active site of main protease. The S1 subsite is composed of Phe140, Gly143, Cys145, His163, Glu166, and His172, and S2 subsite is composed of Thr25, His41, and Cys145 amino acid residues. The shallow subsites S3–S5 are formed by His41, Met49, Met165, Glu166, and Gln189 amino acid residues [26]. This molecule makes multiple hydrogen bonds with side chain and main chain atoms in the protein active site. The small molecule 3WL chromene moiety forms hydrogen bonding interactions with residues Leu141, Gly143, Ser144, and Glu166. The phenyl ring moiety is embedded in a hydrophobic pocket formed by His41, Cys44, Met49, and Tyr54 as shown in Fig. 1.

Inhibitor design and molecular docking

Here, we used SARS-CoV2 main protease complexed with 3WL molecule. The molecule 3WL was divided into two fragments (as shown in Fig. 2): fragment 1 (chromene ring) and fragment 2 (phenyl ring) that have binding affinity values −15.73 kcal/mol and −4.95 kcal/mol and ligand efficiency values (LE) 1.12 (fragment 1) and 0.82 (fragment 2) as shown in Supplementary Table 1 using ACFIS server. The molecular size can be eliminated by using LE rather than potency. The LE of fragment 1 (1.12) is relatively higher than the fragment 2 (0.82). However, comprehensively considered with Gbind, LE, and new fragment growing spaces, fragment 2 was selected as a core fragment to perform CAND_GEN calculation. In this molecule, also, since the 5,6,7-trihydroxy-2-phenyl-4H-chromen-4-one chromene ring makes several hydrogen bonding interactions with the protein active site, only the phenyl ring was searched for FBDD. The screening and linking was performed at fragment 2 (phenyl) ring position. The screening of FDA-approved fragment library with 2883 fragment molecules with integrated energy calculations identified 29 fragments. These fragments were successfully joined with the chromene ring in fragment 1 to form 29 new molecules. The output coordinates were downloaded and used for docking studies. The 29 hit molecules obtained by FBDD approach using ACFIS server were docked into the active site of main protease using PyRx and AutoDock to study their binding interactions that are shown in Table 1 and Supplementary Table 2. Also, the cluster RMSD mostly ranged from 0.00 to 1.76 Å, wherein the lowest energy docking pose from the largest cluster was selected. In other terms, the best conformer from the possible generated conformers was selected based on RMSD value and the binding affinity score. The ten best molecules selected from both the docking methods show binding affinity in the range −9.0 to −6.8 kcal/mol (AutoDock) and −8.5 to −6.7 kcal/mol (PyRx), while the reference molecule 3WL had docking score of −7.7 and −7.2 kcal/mol using AutoDock and PyRx, respectively, as shown in Table 1. The intermolecular interactions such as hydrogen bonding, vdW, el, and hydrophobic observed due to the docked hit molecules within the active site residues such as His41, Leu141, Asn142, Gly143, Ser144, Cys145, and Glu166 also favored the selection of best conformers. The best ten molecules from the docking studies were used for drug-likeness studies.

Table 1 Binding affinity (kcal/mol) and interacting residues in the docking of 3WL and ten best hit molecules into SARS-CoV2 main protease. The residues that make hydrogen bonds are highlighted in bold

Full size table

Drug-likeness properties

The ADME properties of the selected 10 molecules and reference molecule are tabulated in Table 2. The synthetic accessibility values for hit molecules lie within 2.92–4.03 which is indicative of their ease of synthesis. The topological surface area lies between 90.90 and 120 Å², and the lipophilicity parameter consensus LogP is less than 3. Log S values for the hit molecules are comparable with the reference molecule which indicates that their water solubility is similar with the reference molecule. The skin permeation of hit molecules expressed as Log K_pvalue is also in good agreement with respect to reference molecule. All the qualified molecules were used for further stability studies by MD simulations.

Table 2 Pharmacokinetic properties of reference molecule, 3WL and four hit molecules

Full size table