Optimized SQE atomic charges for peptides accessible via a web application

Schindler, Ondřej; Raček, Tomáš; Maršavelski, Aleksandra; Koča, Jaroslav; Berka, Karel; Svobodová, Radka

doi:10.1186/s13321-021-00528-w

Research article
Open access
Published: 30 June 2021

Ondřej Schindler^1,2^na1,
Tomáš Raček^1,2,3^na1,
Aleksandra Maršavelski⁴,
Jaroslav Koča^1,2,
Karel Berka⁵ &
…
Radka Svobodová ORCID: orcid.org/0000-0002-3840-8760^1,2

Journal of Cheminformatics volume 13, Article number: 45 (2021)

A Correction to this article was published on 14 July 2021

This article has been updated

Abstract

Background

Partial atomic charges find many applications in computational chemistry, chemoinformatics, bioinformatics, and nanoscience. Currently, frequently used methods for charge calculation are the Electronegativity Equalization Method (EEM), Charge Equilibration method (QEq), and Extended QEq (EQeq). All of them are fast, even for large molecules, but require empirical parameters. However, even these advanced methods have limitations; for example, their application to peptides, proteins, and other macromolecules is problematic. An empirical charge calculation method that is promising for peptides and other macromolecular systems is the Split-charge Equilibration method (SQE) and its extension SQE+q0. Unfortunately, only one parameter set is available for these methods, and their implementation is not easily accessible.

Results

In this article, we present for the first time an optimized guided minimization method (optGM) for the fast parameterization of empirical charge calculation methods and compare it with the currently available guided minimization (GDMIN) method. Then, we introduce a further extension to SQE, SQE+qp, adapted for peptide datasets, and compare it with the common approaches EEM, QEq, EQeq, SQE, and SQE+q0. Finally, we integrate SQE and SQE+qp into the web application Atomic Charge Calculator II (ACC II), including several parameter sets.

Conclusion

The main contribution of the article is that it makes SQE methods with their parameters accessible to the users via the ACC II web application (https://acc2.ncbr.muni.cz) and also via a command-line application. Furthermore, our improvement, SQE+qp, provides an excellent solution for peptide datasets. Additionally, optGM provides comparable parameters to GDMIN in a markedly shorter time. Therefore, optGM allows us to perform parameterizations for charge calculation methods with more parameters (e.g., SQE and its extensions) using large datasets.

Introduction

Partial atomic charges are real numbers assigned to individual atoms of a molecule that approximate the distribution of electron density among these atoms. Partial atomic charges find many applications in computational chemistry [1,2,3], chemoinformatics [4,5,6], bioinformatics [7, 8], and nanoscience [9, 10]. Because the charges are not physicochemical observables but a theoretical concept, many methods for their calculation have been developed. The most reliable are quantum mechanical (QM) methods, because they calculate the charges according to the standard definition of partial atomic charges. Specifically, they compute the distribution of electrons in orbitals (the so-called electron population of the orbitals) and divide this electron population among individual atoms via a population analysis (e.g., MPA [11, 12], NPA [13, 14]) or charge calculation scheme (e.g., ESP [15], RESP [16]). A substantial disadvantage of QM approaches is their high computational complexity, and therefore a long computational time.

Empirical charge calculation methods are faster alternatives to QM methods. They calculate charges based on common physicochemical laws (e.g., Coulomb's law), but they include empirical parameters derived from values of QM charges or other tabular values or constants. Currently, frequently used empirical methods are the Electronegativity Equalization Method (EEM) [17], Charge Equilibration method (QEq) [18], and Extended QEq (EQeq) [19]. However, even these advanced and popular methods have their limitations; for example, their application to peptides, proteins, and other homogeneous macromolecular systems (i.e., systems composed of just several types of residues) is problematic. The reason for this is that in these macromolecules, individual types of atoms (e.g., single-bonded O) have charge values that are spread over a small range (or a few small ranges), and such a disproportionate charge distribution is a challenge for parameterization approaches. The charge ranges are especially tiny when charge differences in the whole molecule are small (no highly positive or negative atoms or ions are present). However, there are promising empirical charge calculation methods: the Split-charge Equilibration method (SQE) [20] and its extension to peptides, SQE+q0 [21].

Unfortunately, implementations of these methods and their parameters are not easily accessible to the public, so their potential usage is limited.

Recently, machine learning approaches have also been applied to the computation of partial atomic charges [22,23,24,25]. However, they are primarily targeted at small heterogeneous molecules with a firm conformation. Moreover, recent approaches [24, 25] impose limits on the size of the molecule (at most 65 atoms), a limitation that empirical methods do not have.

In this publication, we have reimplemented the SQE and SQE+q0 methods and compared them with other currently popular empirical approaches. Furthermore, we introduce another SQE extension, SQE+qp, adapted for peptides. An essential goal of our article is also to make SQE and SQE+qp implementation accessible for the research community via the web application Atomic Charge Calculator II (ACC II) [26], including several parameter sets. Finally, this article also presents an optimized guided minimization method (optGM) for the fast parameterization of empirical charge calculation methods.

Description of SQE and SQE+q0 methods

SQE

SQE is based on the electronegativity equalization principle. However, unlike EEM or QEq, it does not perform equalization at the level of individual atoms, but switches the problem to a bond domain by defining split-charges, i.e., charges located on the bonds. Formally, the atomic charge on atom i is expressed as the sum of those split-charges on bonds that a particular atom is a part of:

$$\begin{aligned} q_i = \sum _{j \in \text {BA}(i)} p_{i, j} \end{aligned}$$

where $\text {BA}(i)$ is the set of atoms bonded to atom i, and $p_{i, j}$ is the split-charge on the bond $i - j$.

The SQE method written in the form of a system of linear equations is described by the equation:

$$\begin{aligned} \left( THT^T + \text {diag}(\kappa )\right) {q_{sp}} = T\chi \end{aligned}$$

where $q_{sp}$ is the vector of split-charges, T is the incidence matrix describing the molecular topology, $\text {diag}(\kappa )$ is the diagonal matrix with bond hardnesses, $\chi$ is the vector of atomic electronegativities, and H is the hardness matrix that describes the interactions between the atoms.

To reconstruct the atomic charges q from the split-charges, the following transformation is made:

$$\begin{aligned} q = T^Tq_{sp} \end{aligned}$$
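The two equations above can be sketched numerically as follows. This is a minimal illustration rather than the ACC II implementation: the function name `sqe_charges`, the three-atom chain, and every number in it are hypothetical, and T is assumed to have one row per bond and one column per atom (with +1 and -1 on the two bonded atoms), which is the shape implied by $q = T^Tq_{sp}$.

```python
import numpy as np

def sqe_charges(T, H, kappa, chi):
    """Solve (T H T^T + diag(kappa)) q_sp = T chi, then return (q_sp, q)."""
    A = T @ H @ T.T + np.diag(kappa)    # bond-by-bond system matrix
    q_sp = np.linalg.solve(A, T @ chi)  # split-charges, one per bond
    q = T.T @ q_sp                      # atomic charges, one per atom
    return q_sp, q

# Hypothetical chain of atoms 0-1-2 with two bonds. All values are
# arbitrary illustrative numbers, not fitted SQE parameters.
T = np.array([[1.0, -1.0, 0.0],    # bond 0-1
              [0.0, 1.0, -1.0]])   # bond 1-2
H = np.array([[2.0, 1.0, 0.6],     # assumed symmetric hardness matrix
              [1.0, 1.5, 1.0],
              [0.6, 1.0, 2.0]])
kappa = np.array([0.5, 0.5])       # bond hardnesses
chi = np.array([2.2, 3.4, 2.2])    # atomic electronegativities

q_sp, q = sqe_charges(T, H, kappa, chi)
print("split-charges:", q_sp)
print("atomic charges:", q, "sum:", q.sum())
```

Because every row of such a T sums to zero, the atomic charges obtained this way always sum to zero, which is the limitation that motivates the extension described next.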

SQE+q0

Since the formalism of SQE has no way of setting the total charge of a molecule or the formal charge of a particular atom, it might not be very well suited to accounting for the charged functional groups found, for example, in peptides. This shortcoming was addressed in SQE+q0 [21], an extension to SQE. SQE+q0 adds formal charges to work as initial seeds for the computation of partial atomic charges. This change is expressed in:

$$\begin{aligned} \left( THT^T + \text {diag}(\kappa )\right) q_{sp} = T(\chi - Hq_0 + \eta * q_0) \end{aligned}$$

where $q_0$ is the vector of initial formal charges, $\eta$ is the vector of atomic hardnesses, and $*$ is the element-wise product. The calculation of atomic charges is then trivially modified to:

$$\begin{aligned} {q} = T^Tq_{sp} + q_0 \end{aligned}$$
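A small numerical sketch of the SQE+q0 equations is given below, under the same assumptions as before: T has one row per bond, the function name `sqe_q0_charges` is hypothetical, and all parameter values are arbitrary. The atomic hardnesses are set equal to the diagonal of H purely for the sake of the example.

```python
import numpy as np

def sqe_q0_charges(T, H, kappa, chi, eta, q0):
    """Atomic charges from the SQE+q0 equations quoted in the text."""
    A = T @ H @ T.T + np.diag(kappa)
    rhs = T @ (chi - H @ q0 + eta * q0)  # '*' is the element-wise product
    q_sp = np.linalg.solve(A, rhs)
    return T.T @ q_sp + q0               # formal charges are added back

# Hypothetical chain of atoms 0-1-2; illustrative numbers only.
T = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
H = np.array([[2.0, 1.0, 0.6],
              [1.0, 1.5, 1.0],
              [0.6, 1.0, 2.0]])
kappa = np.array([0.5, 0.5])
chi = np.array([2.2, 3.4, 2.2])
eta = np.array([2.0, 1.5, 2.0])   # assumption: equal to diag(H) here
q0 = np.array([0.0, 0.0, -1.0])   # formal charge of -1 on atom 2

q = sqe_q0_charges(T, H, kappa, chi, eta, q0)
q_neutral = sqe_q0_charges(T, H, kappa, chi, eta, np.zeros(3))
print("charges:", q, "total:", q.sum())
```

Since $T^Tq_{sp}$ sums to zero, the total charge of the result equals the sum of the formal charges, and setting $q_0 = 0$ recovers plain SQE.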

Methods

Description of SQE+qp method

Our new method SQE+qp replaces the formal charge $q_0$ of the SQE+q0 method with the term $q_p$, representing the initial charge of the relevant atomic type. Since the sum of the initial charges can differ from the total molecular charge, a simple normalization must be performed before the actual computation. The following equation describes this normalization:

$$\begin{aligned} q_p^{norm} = q_p - \frac{1}{N}\left( 1^Tq_p - Q\right) \end{aligned},$$

where Q is the total molecular charge, and N is the number of atoms in the molecule. The values of initial charges are obtained in the process of parameterization of the SQE+qp method.
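The normalization amounts to shifting every initial charge by the same constant so that the total matches Q. A minimal sketch, with a hypothetical function name and made-up initial charges:

```python
import numpy as np

def normalize_initial_charges(q_p, total_charge):
    """Shift q_p uniformly so that its sum equals the total molecular charge."""
    q_p = np.asarray(q_p, dtype=float)
    n_atoms = q_p.size
    return q_p - (q_p.sum() - total_charge) / n_atoms

# Made-up initial charges for a four-atom neutral molecule (sum is 0.10).
q_p = np.array([0.30, -0.55, 0.20, 0.15])
q_norm = normalize_initial_charges(q_p, total_charge=0.0)
print(q_norm, q_norm.sum())
```

Here the surplus of 0.10 is spread evenly, so each atom's initial charge is lowered by 0.025.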

Implementation of empirical methods for partial charge calculation

All the methods which are used in this paper are implemented as modules of ACC II. Specifically, EEM, QEq, and EQeq were already present in ACC II, and their implementations were based on the descriptions in articles [17, 18], and [19], respectively. SQE, SQE+q0, and SQE+qp were recently added to ACC II as a result of this work. SQE and SQE+q0 were implemented according to [20, 27]. The implementation of SQE+qp is based on the previous works and this article.

ACC II is freely available under the MIT license on GitHub [28]. Furthermore, all ACC II charge calculation methods can be used via a standalone command-line application [29] that enables users to integrate charge calculation methods (including SQE-like methods) into their own workflows. While the application and all the methods are implemented in C++ to achieve the best performance, we also provide Python bindings to these methods for convenience. A short description of the methods can be found on the ACC II webpage [30].

Parameterization of empirical methods

Several key aspects largely influence the parameterization process, namely, the differentiation of atoms (and bonds) into atomic (and bond) types, the global optimization scheme, and the design of the objective function that evaluates the parameters’ quality using several standard metrics. Note that the implementations of all the parameterization schemes mentioned in this section are a part of our internal package MACH, available freely at GitHub [31].

Atomic and bond types

During the parameterization, each atom is assigned a type that shares the same values of individual parameters. Multiple schemes for assigning types can be employed, from the simplest, in which an atom’s element represents the type, to more complex ones. One of the widely used approaches is to differentiate the atoms based on the element and the highest bond order of the bond they are part of [32,33,34]. In this text, we use the acronym HBO (highest bond order) to denote such classification (e.g., a carbon with a double bond would be C/2, an oxygen with only single bonds is O/1). The second scheme we used describes an atom’s bonded environment, i.e., all the bonded atoms (BA). Examples might be C/CCCH for a carbon connected to three other carbon atoms and one hydrogen, or O/CH for an oxygen connected to a carbon and a hydrogen.
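The two classification schemes can be illustrated with a short sketch. The function names and the input format (a list of element symbols plus a list of `(i, j, order)` bonds) are assumptions made for this example, as is the alphabetical ordering of neighbours in the BA label, which is chosen to match the examples C/CCCH and O/CH.

```python
def hbo_types(elements, bonds):
    """Element plus the highest bond order among the atom's bonds, e.g. 'C/2'."""
    highest = [0] * len(elements)  # stays 0 for an atom with no bonds
    for i, j, order in bonds:
        highest[i] = max(highest[i], order)
        highest[j] = max(highest[j], order)
    return [f"{el}/{hbo}" for el, hbo in zip(elements, highest)]

def ba_types(elements, bonds):
    """Element plus the sorted symbols of all bonded atoms, e.g. 'O/CH'."""
    neighbours = [[] for _ in elements]
    for i, j, _order in bonds:
        neighbours[i].append(elements[j])
        neighbours[j].append(elements[i])
    return [f"{el}/{''.join(sorted(nb))}" for el, nb in zip(elements, neighbours)]

# Methanol, CH3-OH: atom indices 0=C, 1=O, 2..4=H on carbon, 5=H on oxygen.
methanol_elements = ["C", "O", "H", "H", "H", "H"]
methanol_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (1, 5, 1)]

# Formaldehyde, H2C=O: 0=C, 1=O, 2..3=H.
formaldehyde_elements = ["C", "O", "H", "H"]
formaldehyde_bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]

print(hbo_types(formaldehyde_elements, formaldehyde_bonds))
print(ba_types(methanol_elements, methanol_bonds))
```

In this example HBO cannot distinguish the hydroxyl hydrogen of methanol from the methyl hydrogens, whereas BA separates them (H/O versus H/C), which shows how BA yields a finer division.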

Since SQE includes bond parameters, we must also categorize each bond. The bond type is based on the atomic types of the constituent atoms and the order of the bond.

Optimization scheme

We used the guided minimization (GDMIN) [35] method to parameterize all the above-mentioned empirical methods. Unfortunately, we found that GDMIN is very time-consuming for SQE-like methods, because they require parameters for bonds, which are not present in EEM, QEq, and EQeq. Moreover, this problem is amplified by the usage of BA atomic types, which allows for a greater number of potential combinations of bonded atoms. Therefore, the number of parameters increases significantly. For this reason, we developed the method optGM, an improvement of GDMIN designed to reach the same or better results in a markedly shorter time. The main differences between GDMIN and optGM are:

optGM only uses a suitable subset (i.e., a subset of molecules containing at least N atoms of each atomic type present in the original training set) of molecules in several steps of the parameterization process. Evaluation of the objective function in these steps is therefore significantly faster.
The number of initial samples can be substantially higher (since they are only evaluated on a subset) than in the original approach developed for EEM, which has only two parameters per atomic type. A large number of initial samples is necessary to sufficiently cover the parameter space in methods with multiple atom and bond parameters.
The number of local optimizations, which are the most time-demanding part of the parameterization, is limited to just the best candidate samples.

Further details about optGM are described in Additional file 1: Section 1.
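To make the notion of a "suitable subset" concrete, the sketch below greedily picks molecules until every atomic type of the training set is represented by at least a given number of atoms. The greedy strategy, the function name, and the toy data are assumptions for illustration only; the selection procedure actually used by optGM is the one described in Additional file 1.

```python
from collections import Counter

def select_subset(molecules, min_atoms_per_type):
    """Greedily pick molecules until each atomic type has enough atoms.

    molecules: dict mapping a molecule name to the list of its atomic types.
    Returns the chosen names and the per-type atom counts in the subset.
    """
    all_types = {t for types in molecules.values() for t in types}
    counts = Counter()
    subset = []
    for name, types in molecules.items():
        # Take the molecule only if it adds atoms of an under-represented type.
        if any(counts[t] < min_atoms_per_type for t in set(types)):
            subset.append(name)
            counts.update(types)
        # Stop as soon as every type in the training set is covered.
        if all(counts[t] >= min_atoms_per_type for t in all_types):
            break
    return subset, counts

# Toy training set with HBO-style atomic types (hypothetical molecules).
training_set = {
    "mol_a": ["C/1", "O/1", "H/1", "H/1"],
    "mol_b": ["C/1", "C/1", "H/1", "H/1"],
    "mol_c": ["C/2", "O/2", "H/1"],
    "mol_d": ["C/1", "H/1", "H/1"],
    "mol_e": ["C/2", "O/2", "O/1", "H/1"],
}

subset, counts = select_subset(training_set, min_atoms_per_type=1)
print(subset, dict(counts))
```

Evaluating the objective function on such a subset touches far fewer molecules than the full training set, which is what makes a large number of initial samples affordable. If the training set cannot satisfy the threshold for some type, this sketch simply returns every molecule that contributed.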

Quality metrics

To be able to evaluate the quality of the parameters, quality criteria must be defined. All of them describe the correspondence between the reference QM charges $X = (x_1, \ldots , x_N)$ and the empirical charges $Y = (y_1, \ldots , y_N)$ produced as a result of the parameterization process. In this work, we use the most common quality metrics, specifically:

$\text {R}^2$: Squared value of Pearson’s correlation coefficient. This metric describes the linear correlation between two sets of values. Values close to 1 indicate a strong linear correlation, whereas values near zero indicate a low correlation.

$$\begin{aligned} \text {R}^2(X, Y) = \frac{\left( \sum _{i=1}^{N}(x_i-{{\overline{x}}})(y_i-{{\overline{y}}})\right) ^2}{ \sum _{i=1}^{N}\left( x_i-{\overline{x}}\right) ^2 \sum _{i=1}^{N}(y_i-{\overline{y}})^2} \end{aligned}$$

where ${\overline{x}}$ and ${\overline{y}}$ represent the mean values of the sets X and Y, respectively.

$\text {RMSD}$: Root mean square deviation. The lower the value of RMSD, the more similar the two sets of values are. A zero value indicates that the sets are identical.

$$\begin{aligned} \text {RMSD}(X, Y) = \displaystyle \sqrt{\frac{1}{N} \sum _{i=1}^N \left( x_i - y_i\right) ^2} \end{aligned}$$

${\text {RMSD}_{at}}$: RMSD for atomic type. This quantity represents the worst (i.e., the largest) value of the RMSD values computed for individual atomic types.

In this work, the values of $\text {R}^2$ and $\text {RMSD}$ are computed for each molecule and then averaged over the whole set.
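The three metrics translate directly into a few lines of NumPy. The sketch below is an illustration with made-up charges, not the code of the MACH package; the function names are chosen for this example.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation coefficient of two charge vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) ** 2 / ((dx @ dx) * (dy @ dy))

def rmsd(x, y):
    """Root mean square deviation between two charge vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.mean((x - y) ** 2))

def rmsd_at(x, y, types):
    """Largest RMSD among the RMSD values computed per atomic type."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    types = np.asarray(types)
    return max(rmsd(x[types == t], y[types == t]) for t in np.unique(types))

# Made-up reference (QM) and empirical charges for four atoms.
qm = np.array([0.1, -0.2, 0.3, -0.4])
emp = qm + np.array([0.1, 0.1, 0.3, 0.3])
atom_types = ["C/1", "C/1", "O/1", "O/1"]

print(r_squared(qm, emp), rmsd(qm, emp), rmsd_at(qm, emp, atom_types))
```

In this toy case the overall RMSD is about 0.22, while $\text{RMSD}_{at}$ is 0.3 because the O/1 atoms are described worst. A constant offset between the two sets leaves $\text{R}^2$ at 1 but still shows up in the RMSD, which is why both metrics are reported.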

Objective function

The evaluation of the objective function guides the steps of the global optimization method. In this paper, we used the function defined as the sum of averaged RMSD values calculated for each molecule and the average of $\text {RMSD}$ values for each atomic type.
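A possible reading of this objective function is sketched below. It assumes that the per-type RMSD is computed over all atoms of a given type pooled across the dataset; the function names and the toy data are hypothetical. In an actual parameterization, the empirical charges would be recomputed from the candidate parameters before each evaluation.

```python
import numpy as np

def rmsd(x, y):
    """Root mean square deviation between two charge vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.mean((x - y) ** 2))

def objective(molecules):
    """Mean per-molecule RMSD plus mean per-atomic-type RMSD.

    molecules: list of (qm_charges, empirical_charges, atomic_types) tuples.
    """
    per_molecule = [rmsd(qm, emp) for qm, emp, _ in molecules]

    # Pool all atoms of the dataset to evaluate each atomic type (assumption).
    qm_all = np.concatenate([np.asarray(qm, dtype=float) for qm, _, _ in molecules])
    emp_all = np.concatenate([np.asarray(emp, dtype=float) for _, emp, _ in molecules])
    types_all = np.concatenate([np.asarray(t) for _, _, t in molecules])
    per_type = [rmsd(qm_all[types_all == t], emp_all[types_all == t])
                for t in np.unique(types_all)]

    return np.mean(per_molecule) + np.mean(per_type)

# Two made-up diatomic "molecules" with constant errors of 0.1 and 0.3.
toy_set = [
    ([0.0, 0.0], [0.1, 0.1], ["C/1", "H/1"]),
    ([0.0, 0.0], [0.3, 0.3], ["C/1", "H/1"]),
]
value = objective(toy_set)
print(value)
```

Including the per-type term penalizes parameter sets that fit most atoms well but describe one atomic type poorly, something the per-molecule average alone could hide.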

Correlation graphs

In parallel with quality metrics, a correlation between reference (QM) and empirical charges can also be evaluated using a correlation graph. The X-axis of the graph contains QM charges and the Y-axis empirical charges. Each point of the graph represents one atom and pairs its QM and empirical charges. Moreover, individual points are colored according to their atomic type. Therefore it can be directly seen which type of atoms correlates weakly. An example of a correlation graph can be found in Fig. 1.

Results and discussions

To assess the empirical methods and the parameterization schemes described in the previous section, we devised a series of experiments. First, the choice of datasets and reference charges had to be made.

Datasets

In this paper, we utilized three datasets of molecules, described in Table 1. The first two datasets are composed of organic molecules and were also used for the comparison and parameterization of empirical charge calculation methods in previous publications [33, 34]. DTP_small is a simple set (a low number of small-sized molecules with low variability) while CCD_gen is more complex. DTP_small contains organic molecules used as drugs; CCD_gen includes organic molecules acting as protein ligands. The last dataset, PUB_pept, was created directly for this publication. It contains small peptides obtained from the PubChem database [36]. It represents a dataset of molecules with homogeneous atomic types. The methodology of how this dataset was prepared is described in Additional file 1: Section 2.

Each dataset was divided into two subsets: a training set and a test set containing 80% and 20% of the molecules, respectively. The division was done randomly, and the stratification was included during the separation. The list of molecules that comprised the training and test set can be found in Additional file 2. For all the datasets, molecules in SDF format are provided in Additional file 3.

Table 1 Summary information about datasets used in this work

Reference charges

The QM charge calculation approach B3LYP/6-311G/NPA was selected for calculating the QM reference charges (i.e., charges used for the parameterization and evaluation of all the compared empirical methods) on datasets DTP_small and CCD_gen. These charges were used because the combination of the B3LYP theory level, the 6-311G basis set, and NPA proved to be very suitable for parameterizing empirical charge calculation methods [4, 5, 33, 38]. For the dataset PUB_pept, the QM charge calculation approach B3LYP/6-31G*/NPA was selected. The method and the population analysis are the same as for the first two datasets, but the basis set 6-31G* was used. The reason for this is that 6-311G is too complex and not applicable for peptide molecules. The basis set 6-31G* represents a robust enough and feasible replacement, and was also often used to parameterize empirical charge calculation methods [32, 39, 40]. The QM charges for all the datasets were calculated with Gaussian 09 [41]. The files with QM partial atomic charges for molecules from all the datasets are available in Additional file 4.

Comparison of parameterization approaches GDMIN and optGM

As the first step of our study, we proved the applicability of the optGM method. For this purpose, a parameterization of the SQE method was performed via GDMIN and optGM for training subsets of all three datasets (with HBO atomic types). The parameterization times are summarized in Table 2. Further details about the parameterization process (setup, convergence criteria) are in Additional file 5: Section 2.

Table 2 Comparison of GDMIN and optGM parameterization of SQE with HBO atomic types

This parameterization was only done for SQE, because other empirical methods have a low number of parameters; thus their parameterization is considerably less time demanding, making GDMIN sufficient for them. The HBO atomic type was chosen because it is frequently used and only creates a small number of atomic classes. Thus the calculation of parameters is markedly less time demanding than for BA atomic types, and can even be done by GDMIN in a reasonable time (a few days). Afterward, the parameters computed for each dataset were used to calculate empirical charges for this dataset (i.e., using its training subset and also using its test subset). The values of obtained empirical and reference QM charges were compared via standard metrics (i.e., ${R^2}$, $\text {RMSD}$, and ${\text {RMSD}_{at}}$). The values of these metrics for the training subsets are summarized in Table 2. Other values of quality metrics are provided in Additional file 5: Section 2. Fig. 1 also shows correlation graphs for the whole CCD_gen dataset. Other correlation graphs are in Additional file 5: Section 3.

Table 2 shows that the parameters obtained by optGM provide charges that correlate with the QM charges comparably to, or slightly better than, the charges calculated using the parameters obtained by GDMIN. The metrics for the test set show the same trend. This conclusion is also confirmed by the correlation graphs (see Fig. 1).

Moreover, Table 2 shows that optGM provides results significantly faster than GDMIN. Therefore, optGM proved to be a more appropriate parameterization approach and was used for the subsequent examinations presented in this work.

Comparison of empirical charge calculation methods

As the second step of our study, we compared SQE, SQE+q0, and the newly developed SQE+qp method with the common approaches (i.e., EEM, QEq, and EQeq). For this comparison, a parameterization of all the methods was performed via optGM on the training subsets of all three datasets. HBO atomic types were used for all the datasets. Additionally, BA atomic types were also used for the dataset PUB_pept. The reason for this is that the PUB_pept dataset is homogeneous, since its atoms are parts of amino acids. Therefore, they have only several combinations of neighboring atoms (e.g., S can only have the following atom pairs as neighbors: C and C, C and H, C and S). Because of this, BA atomic types do not divide atoms into too many groups (which could have only a small number of atoms), which would negatively affect the parameterization process. Conversely, DTP_small and CCD_gen are heterogeneous datasets, and BA is not appropriate for them due to the small number of samples for the individual atomic types.

In summary, four combinations of datasets and atomic types were used (see Table 3). Further details about the parameterization process are in Additional file 6: Section 1. All the obtained parameter sets are in Additional file 7.

Table 3 Comparison of empirical methods on training subsets

Afterwards, the parameters computed for each dataset and atomic types were used to calculate empirical charges for this dataset (i.e., using its training subset and its test subset).

The values of obtained empirical and reference QM charges were compared via standard metrics. The values of these metrics for the training subsets are summarized in Table 3, and the remaining values of quality metrics are in Additional file 6: Section 2. Figure 2 shows selected correlation graphs for the heterogeneous dataset CCD_gen, and Fig. 3 presents selected correlation graphs for the homogeneous dataset PUB_pept. The remaining correlation graphs are in Additional file 6: Section 3.

Comparison of methods for heterogeneous datasets

All methods perform well for datasets of drug-like organic molecules (see Table 3 and the high values of ${R^2}$). However, even though the quality metrics are reasonable for non-SQE approaches, the correlation graphs in Fig. 2 show examples proving that SQE describes individual atomic types better than EQeq, which proved to be the best of the traditional methods. Moreover, SQE+qp is comparable to or slightly better than SQE and SQE+q0.

Comparison of methods for a homogeneous dataset

When considering peptides, we included both HBO and BA atomic types. Whereas the HBO types are usable for every method, the BA atomic types are not suited for EEM, QEq, and EQeq. For example, see Fig. 3, where EQeq, combined with BA atomic types, gives constant empirical charges for almost every atomic type (see X-axis parallel lines of points for most atomic types). EEM and QEq exhibit the same behavior (see correlation graphs in Supplementary information).

SQE-like methods, on the other hand, can utilize the more fine-grained division of BA atomic types and generate high-quality empirical charges. However, even among these methods, we can find differences. See the example comparison of SQE+q0 and SQE+qp in Fig. 4. Our method SQE+qp outperforms the two earlier SQE models for peptides and seems to be promising for other homogeneous datasets.

The complete results for all the methods and datasets are presented in Table 3.

Conclusions

First, we developed and tested the optGM parameterization scheme. This scheme produces parameters comparable to the GDMIN method, but in a significantly shorter time. Therefore, optGM is also applicable for large datasets and charge calculation approaches with more parameters (i.e., SQE, SQE+q0, and SQE+qp). An implementation of optGM is available on GitHub.

Then, we developed the SQE+qp empirical charge calculation method and compared this method with the empirical methods EEM, QEq, EQeq, SQE, and SQE+q0. We found that for heterogeneous datasets with drug-like organic molecules, SQE-like methods performed comparably and improved upon the traditional electronegativity equalization approaches. For a homogeneous dataset with peptides, SQE+qp provided the best results and outperformed all other approaches, including SQE+q0. We also introduced a new atom classification type, BA, tailored to peptides and likely other homogeneous datasets. The combination of SQE+qp with BA atomic types proved to be an excellent solution for peptides.

The main contribution of the article is that it makes SQE, SQE+q0, and its extension SQE+qp, together with their parameter sets, accessible to users via the ACC II web application and also via a command-line application. Therefore, all these methods are now available to the broad research community for quick and precise empirical atomic charge calculation.

Availability of data and materials

The datasets supporting the conclusions of this article are included within the article (and its additional files).

Change history

14 July 2021
A Correction to this paper has been published: https://doi.org/10.1186/s13321-021-00531-1

References

Vainio MJ, Johnson MS (2007) Generating conformer ensembles using a multiobjective genetic algorithm. J Chem Inform Model 47(6):2462–2474
Muniz HS, Nascimento AS (2017) Towards a critical evaluation of an empirical and volume-based solvation function for ligand docking. PLoS ONE 12(3):0174336
Kritikos E, Giusti A (2020) Reactive molecular dynamics investigation of toluene oxidation under electrostatic fields: effect of the modeling of local charge distribution. J Phys Chem A 124:51
Svobodová Vařeková R, Geidl S, Ionescu C-M, Skřehota O, Kudera M, Sehnal D, Bouchal T, Abagyan R, Huber HJ, Koča J (2011) Predicting pKa values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. J Chem Inform Model 51(8):1795–1806
Geidl S, Svobodová Vařeková R, Bendová V, Petrusek L, Ionescu C-M, Jurka Z, Abagyan R, Koča J (2015) How does the methodology of 3D structure preparation influence the quality of pKa prediction? J Chem Inform Model 55(6):1088–1097
Kumar SP, Jha PC, Jasrai YT, Pandya HA (2015) The effect of various atomic partial charge schemes to elucidate consensus activity-correlating molecular regions: a test case of diverse QSAR models. J Biomol Struct Dyn 34(3):540–559
Holliday JD, Jelfs SP, Willett P, Gedeck P (2003) Calculation of intersubstituent similarity using R-group descriptors. J Chem Inform Comp Sci 43(2):406–411
Cleves AE, Johnson SR, Jain AN (2019) Electrostatic-field and surface-shape similarity for virtual screening and pose prediction. J Comput Aided Mol Design 33(10):865–886
Chuang C-H, Porel M, Choudhury R, Burda C, Ramamurthy V (2018) Ultrafast electron transfer across a nanocapsular wall: coumarins as donors, viologen as acceptor, and octa acid capsule as the mediator. J Phys Chem B 122(1):328–337
Luo D, Wang F, Chen J, Zhang F, Yu L, Wang D, Willson RC, Yang Z, Ren Z (2018) Poly(sodium 4-styrenesulfonate) Stabilized Janus Nanosheets in Brine with Retained Amphiphilicity. Langmuir 34(12):3694–3700
Mulliken RS (1955) Electronic population analysis on LCAO-MO molecular wave functions. J Chem Phys 23(10):1833–1840
Mulliken RS (1955) Electronic population analysis on LCAO-MO molecular wave functions. II. Overlap populations, bond orders, and covalent bond energies. J Chem Phys 23(10):1841–1846
Reed AE, Weinhold F (1983) Natural bond orbital analysis of near-Hartree-Fock water dimer. J Chem Phys 78(6):4066–4073
Reed AE, Weinstock RB, Weinhold F (1985) Natural population analysis. J Chem Phys 83(2):735–746
Singh UC, Kollman PA (1984) An approach to computing electrostatic charges for molecules. J Comput Chem 5(2):129–145
Bayly CI, Cieplak P, Cornell W, Kollman PA (1993) A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J Phys Chem 97(40):10269–10280
Mortier WJ, Ghosh SK, Shankar S (1986) Electronegativity equalization method for the calculation of atomic charges in molecules. J Am Chem Soc 108(15):4315–4320
Rappé AK, Goddard WA III (1991) Charge equilibration for molecular dynamics simulations. J Phys Chem 95(8):3358–3363
Wilmer CE, Kim KC, Snurr RQ (2012) An extended charge equilibration method. J Phys Chem Lett 3(17):2506–2511
Nistor RA, Polihronov JG, Müser MH, Mosey NJ (2006) A generalization of the charge equilibration method for nonmetallic materials. J Chem Phys 125:9
Verstraelen T, Pauwels E, De Proft F, Van Speybroeck V, Geerlings P, Waroquier M (2012) Assessment of atomic charge models for gas-phase computations on polypeptides. J Chem Theory Comput 8(2):661–676
Bleiziffer P, Schaller K, Riniker S (2018) Machine learning of partial charges derived from high-quality quantum-mechanical calculations. J Chem Inform Model 58(3):579–590
Martin R, Heider D (2019) ContraDRG: automatic partial charge prediction by machine learning. Front Genet 10:990
Wang J, Cao D, Tang C, Chen X, Sun H, Hou T (2020) Fast and accurate prediction of partial charges using Atom-Path-Descriptor-based machine learning. Bioinformatics 36(18):4721–4728
Wang J, Cao D, Tang C, Xu L, He Q, Yang B, Chen X, Sun H, Hou T (2021) DeepAtomicCharge: a new graph convolutional network-based architecture for accurate prediction of atomic charges. Brief Bioinform 22(3):183
Raček T, Schindler O, Toušek D, Horský V, Berka K, Koča J, Svobodová R (2020) Atomic Charge Calculator II: web-based tool for the calculation of partial atomic charges. Nucleic Acids Res 48(W1):591–596
Verstraelen T, Ayers PW, van Speybroeck V, Waroquier M (2013) ACKS2: Atom-condensed Kohn-Sham DFT approximated to second order. J Chem Phys 138:7
Raček T (2021) krab1k/AtomicChargeCalculator2. https://github.com/krab1k/AtomicChargeCalculator2 Accessed 8 Mar 2021
Raček T (2021) krab1k/ChargeFW2. https://github.com/krab1k/ChargeFW2 Accessed 8 Mar 2021
Raček T (2021) Short description of the methods. https://acc2.ncbr.muni.cz/static/methods.pdf Accessed 8 Mar 2021
Schindler O (2021) dargen3/MACH. https://github.com/dargen3/MACH Accessed 8 Mar 2021
Ouyang Y, Ye F, Liang Y (2009) A modified electronegativity equalization method for fast and accurate calculation of atomic charges in large biological molecules. Phys Chem Chem Phys 11(29):6082–6089
Geidl S, Bouchal T, Raček T, Svobodová Vařeková R, Hejret V, Křenek A, Abagyan R, Koča J (2015) High-quality and universal empirical atomic charges for chemoinformatics applications. J Cheminform 7:59
Raček T, Pazúriková J, Svobodová Vařeková R, Geidl S, Křenek A, Falginella FL, Horský V, Hejret V, Koča J (2016) NEEMP: software for validation, accurate calculation and fast parameterization of EEM charges. J Cheminform 8:57
Pazúriková J, Křenek A, Matyska L (2016) Guided optimization method for fast and accurate atomic charges computation. In: Proceedings of the 2016 European simulation and modelling conference, EUROSIS - ETI, Ghent, Belgium, pp 267–274
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47(D1):1102–1109
Svobodová Vařeková R, Jiroušková Z, Vaněk J, Suchomel Š, Koča J (2007) Electronegativity equalization method: parameterization and validation for large sets of organic, organohalogene and organometal molecule. Int J Mol Sci 8(7):572–582
Ionescu C-M, Geidl S, Svobodová Vařeková R, Koča J (2013) Rapid calculation of accurate atomic charges for proteins via the electronegativity equalization method. J Chem Inf Model 53(10):2548–2558
Bultinck P, Langenaeker W, Lahorte P, De Proft F, Geerlings P, Van Alsenoy C, Tollenaere JP (2002) The electronegativity equalization method II: Applicability of different atomic charge schemes. J Phys Chem A 106(34):7895–7901
Bultinck P, Vanholme R, Popelier PLA, De Proft F, Geerlings P (2004) High-speed calculation of AIM charges through the electronegativity equalization method. J Phys Chem A 108(46):10359–10366
Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Scalmani G, Barone V, Petersson GA, Nakatsuji H, Li X, Caricato M, Marenich AV, Bloino J, Janesko BG, Gomperts R, Mennucci B, Hratchian HP, Ortiz JV, Izmaylov AF, Sonnenberg JL, Williams-Young D, Ding F, Lipparini F, Egidi F, Goings J, Peng B, Petrone A, Henderson T, Ranasinghe D, Zakrzewski VG, Gao J, Rega N, Zheng G, Liang W, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Vreven T, Throssell K, Montgomery JA Jr, Peralta JE, Ogliaro F, Bearpark MJ, Heyd JJ, Brothers EN, Kudin KN, Staroverov VN, Keith TA, Kobayashi R, Normand J, Raghavachari K, Rendell AP, Burant JC, Iyengar SS, Tomasi J, Cossi M, Millam JM, Klene M, Adamo C, Cammi R, Ochterski JW, Martin RL, Morokuma K, Farkas O, Foresman JB, Fox DJ (2016) Gaussian 16 Revision B.01

Acknowledgements

Computational resources were supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.

Funding

This work was funded by the ELIXIR-CZ research infrastructure project (MEYS Grant No: LM2018131).

Author information

Ondřej Schindler and Tomáš Raček share first authorship.

Authors and Affiliations

CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, 602 00, Brno, Czech Republic
Ondřej Schindler, Tomáš Raček, Jaroslav Koča & Radka Svobodová
National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
Ondřej Schindler, Tomáš Raček, Jaroslav Koča & Radka Svobodová
Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Tomáš Raček
Division of Biochemistry, Department of Chemistry, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000, Zagreb, Croatia
Aleksandra Maršavelski
Department of Physical Chemistry, Faculty of Science, Palacký University Olomouc, 17. listopadu 1192/12, 771 46, Olomouc, Czech Republic
Karel Berka

Contributions

OS and TR designed and implemented the optGM and SQE+qp methods. RS designed the experiments. AM prepared the peptide dataset. RS and TR wrote the manuscript. KB and JK revised and improved the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Radka Svobodová.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised because Table 2 was misaligned.

Supplementary Information

Additional file 1.

Detailed description of parameterization process. Description of optGM parameterization scheme and preparation of PUB pept dataset.

Additional file 2.

Division of datasets into training and test sets. Lists of IDs of molecules comprising training and test sets for each dataset.

Additional file 3.

Datasets. Molecules for all datasets.

Additional file 4.

QM charges. Files with QM partial atomic charges for molecules from all datasets.

Additional file 5.

Details of GDMIN and optGM comparison. Description of procedure, values of quality metrics and correlation graphs.

Additional file 6.

Details of empirical charge method comparison. Description of procedure, values of quality metrics and correlation graphs.

Additional file 7.

Parameter sets. All parameter sets obtained during the comparison of empirical methods.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Schindler, O., Raček, T., Maršavelski, A. et al. Optimized SQE atomic charges for peptides accessible via a web application. J Cheminform 13, 45 (2021). https://doi.org/10.1186/s13321-021-00528-w

Received: 11 March 2021
Accepted: 18 June 2021
Published: 30 June 2021
DOI: https://doi.org/10.1186/s13321-021-00528-w
