Elsevier

Chemical Engineering Science

Volume 183, 29 June 2018, Pages 95-105

Prediction of acid dissociation constants of organic compounds using group contribution methods

https://doi.org/10.1016/j.ces.2018.03.005

Highlights

  • Prediction of acid dissociation constants (Ka) for a large set of organic compounds.

  • The Marrero and Gani Group Contribution (MG-GC) method is used to develop the property models.

  • Linear and nonlinear GC models for amino acids and other classes of compounds.

  • An Artificial Neural Network (ANN) based GC model for organic compounds.

  • Modeling details and model parameters provided.

  • Accuracy of the models demonstrated through application examples.

Abstract

In this paper, group contribution (GC) property models for the estimation of acid dissociation constants (Ka) of organic compounds are presented. Three GC models are developed to predict the negative logarithm of the acid dissociation constant, pKa: (a) a linear GC model for amino acids using 180 data-points with an average absolute error of 0.23; (b) a non-linear GC model for organic compounds using 1622 data-points with an average absolute error of 1.18; (c) an artificial neural network (ANN) based GC model for the organic compounds with an average absolute error of 0.17. For each of the developed models, uncertainty estimates for the predicted pKa values are also provided. The model details, regressed parameters and application examples are highlighted.

Introduction

The acid dissociation constant (Ka) of a compound, which expresses the extent to which the compound in its aqueous solution is dissociated into its ionic form, is sought after by many chemists, biochemists and product formulators. Although experimental measurements would yield the most satisfactory results, it is not always convenient to set up and conduct experiments for Ka determination. This is because organic compounds that dissociate weakly lack adequate spectral differences between the dissociated and undissociated forms. Besides, in cases where a compound is unstable or insufficiently soluble in water, experimental Ka determination is impossible (Tong and Wen, 2008).

The currently available pKa (negative logarithm of Ka) compilations provide values for only a small fraction of known or possible acids and bases (Perrin et al., 1981). This motivates the development of advanced pKa prediction models.

This paper is organized as follows. First, we give a definition of pKa and highlight its significance in several research areas (Section 1.1). After a brief introduction of the main existing methods for pKa prediction (Section 1.2), we focus on the powerful group contribution (GC) methods and present more details about these methods in Section 2. Three different GC models are then developed to predict pKa for amino acids and other classes of organic compounds. The performances of these models are evaluated and compared in Section 3.1. Finally, in Section 3.2, several examples are shown to help the reader understand how to apply the developed models for predicting pKa.

In aqueous solution, acids (generically represented by HA) undergo a protolytic reaction with water. This equilibrium reaction is given as:

$$\mathrm{HA + H_2O \rightleftharpoons H_3O^+ + A^-} \tag{1}$$

The equilibrium constant (in this case, the acid dissociation constant Ka) for the reaction given in Eq. (1) is expressed as Eq. (2), which relates the activity of the dissociated form of the acid (aA-) to the activity of its undissociated form (aHA):

$$K_a = \frac{[a_{\mathrm{H_3O^+}}]\,[a_{\mathrm{A^-}}]}{[a_{\mathrm{H_2O}}]\,[a_{\mathrm{HA}}]} \tag{2}$$

As the Ka measurements are generally made in dilute aqueous solutions, the concentration of water remains nearly constant and, therefore, its activity can be taken as unity. The general expression of Ka is then derived from Eq. (2) as:

$$K_a = \frac{[a_{\mathrm{H_3O^+}}]\,[a_{\mathrm{A^-}}]}{[a_{\mathrm{HA}}]} \tag{3}$$

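By taking the negative logarithm of both sides of Eq. (3) and rearranging the terms, the relation between the pH of the solution and the pKa of HA can be obtained, given as Eq. (5):

$$-\log(K_a) = -\log\left([a_{\mathrm{H_3O^+}}]\right) - \log\frac{[a_{\mathrm{A^-}}]}{[a_{\mathrm{HA}}]} \tag{4}$$

$$pK_a = pH + \log\frac{[a_{\mathrm{HA}}]}{[a_{\mathrm{A^-}}]} \tag{5}$$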

In the special case where the activity of HA equals that of A-, the logarithmic term in Eq. (5) vanishes and pKa is identical to pH.
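As a minimal sketch of how Eq. (5) can be used, the Python snippet below computes the fraction of an acid present in its dissociated form at a given pH. It assumes that activities can be approximated by concentrations (dilute solution). The function name `fraction_dissociated` and the example pKa of 4.76 (a commonly quoted value for acetic acid at 25 °C, not taken from this paper) are illustrative choices.

```python
def fraction_dissociated(pH: float, pKa: float) -> float:
    """Fraction of the acid present as A-, from Eq. (5).

    Eq. (5): pKa = pH + log10([HA]/[A-]), so [HA]/[A-] = 10**(pKa - pH).
    Activities are approximated by concentrations (assumption).
    """
    ratio_ha_to_a = 10.0 ** (pKa - pH)  # [HA]/[A-]
    return 1.0 / (1.0 + ratio_ha_to_a)


if __name__ == "__main__":
    pKa_example = 4.76  # illustrative value, not from the paper
    for pH in (2.76, 4.76, 6.76):
        frac = fraction_dissociated(pH, pKa_example)
        print(f"pH {pH:.2f}: fraction dissociated = {frac:.3f}")
```

When pH equals pKa the function returns one half, which is the special case noted above; two pH units above the pKa, the acid is almost fully dissociated.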

pKa is very significant in many different areas. For example, during liquid-liquid extraction, when an organic compound is to be separated from an aqueous solution, the undissociated form of the compound usually is more soluble in the organic phase. Hence, the pH of the aqueous phase can be adjusted to its optimum value if the pKa of the organic compound is known (Green and Perry, 2008). In preparative chemistry, considering the effects of pH on the properties of reactants as well as the possible intermediates and products, conditions for synthesis are selected by making use of pKa (Perrin et al., 1981).

Nowadays a large number of experimental pKa data are available, so one can predict the pKa of new compounds by extrapolating or interpolating the pKa of database compounds of the same type. Besides this, theoretical calculations and semi-empirical correlations based on thermodynamics and quantum chemical foundations have also been used for pKa prediction in various works. For example, Jensen et al. (2017) used isodesmic reactions, in which the pKa is estimated relative to a chemically related reference compound, to make COSMO-based and SMD-based predictions of the pKa values of 53 amine groups in 48 druglike compounds.

The Hammett-Taft equation quantifies the electronic effect of organic functional groups (or substituents) on other groups to which they are attached. This equation is a linear free energy relationship (LFER). It is widely used for pKa prediction (Metzler, 2012) and is shown in Eq. (6):

$$pK_a = pK_a^0 - \rho\,\sigma_i \tag{6}$$

where pKa0 indicates the pKa value for the unsubstituted reference compound; σi is the substituent constant for the substituent i; and ρ is the proportionality constant for the particular equilibrium dissociation reaction, i.e., a measure of the sensitivity of the reaction to the presence of electron-withdrawing or electron-donating substituents. For example, the ρ for phenylacetic acids is 0.49, while that for phenols is 2.23. It should be noted that only a limited number of substituent constants are currently available, which limits the applicability of the LFER method for pKa prediction.
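The role of ρ in Eq. (6) can be illustrated with a short Python sketch. The two ρ values are the ones quoted in the text; the substituent constant `SIGMA_EXAMPLE` and the function name `hammett_pka` are hypothetical placeholders chosen only to show how the same substituent shifts the pKa of two compound classes by different amounts.

```python
def hammett_pka(pka_ref: float, rho: float, sigma: float) -> float:
    """Eq. (6): pKa = pKa0 - rho * sigma for a single substituent."""
    return pka_ref - rho * sigma


# Reaction constants quoted in the text.
RHO_PHENYLACETIC_ACIDS = 0.49
RHO_PHENOLS = 2.23

# Hypothetical substituent constant (electron-withdrawing if positive).
SIGMA_EXAMPLE = 0.5


def pka_shift(rho: float, sigma: float) -> float:
    """Change in pKa relative to the unsubstituted reference compound."""
    return hammett_pka(0.0, rho, sigma)


if __name__ == "__main__":
    for name, rho in (("phenylacetic acids", RHO_PHENYLACETIC_ACIDS),
                      ("phenols", RHO_PHENOLS)):
        shift = pka_shift(rho, SIGMA_EXAMPLE)
        print(f"{name}: pKa shift = {shift:+.3f}")
```

Because ρ is larger for phenols, the same substituent lowers the pKa of a phenol several times more than that of a phenylacetic acid.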

There are several first-principles, theory-based methods for pKa prediction. The Kirkwood-Westheimer equation (Kirkwood and Westheimer, 1938) quantifies ΔpKa for a charged or a dipolar substituent as follows:

$$\Delta pK_a = \frac{e\,\mu\cos\phi}{2.3\,k\,T\,R^2\,D_{\mathrm{eff}}} \tag{7}$$

In Eq. (7), ϕ is the angle between the line joining the centre of the ionizing group to the centre of the dipole and the axis of the dipole, e is the electronic charge, k is the Boltzmann constant, T is the temperature in K, μ is the dipole moment, R is the distance between two charges, and Deff is the effective dielectric constant. The largest limitation of the Kirkwood-Westheimer method is that it is applicable only to ellipsoidal molecules with point charges at their foci.
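A rough numerical sketch of Eq. (7) is given below. It assumes Gaussian (CGS) units, since the equation as written carries no 4πε0 factor, and it uses ln 10 in place of the rounded factor 2.3. The function name and the input values in the demo (a 2 D dipole at 5 Å with Deff = 5) are hypothetical and serve only to show the order of magnitude.

```python
import math

# Physical constants in Gaussian (CGS) units.
E_CHARGE_ESU = 4.80320425e-10   # elementary charge, esu
K_BOLTZMANN_ERG = 1.380649e-16  # Boltzmann constant, erg/K
DEBYE_ESU_CM = 1.0e-18          # 1 debye expressed in esu*cm
ANGSTROM_CM = 1.0e-8            # 1 angstrom expressed in cm


def kirkwood_westheimer_delta_pka(mu_debye: float, phi_deg: float,
                                  r_angstrom: float, d_eff: float,
                                  temp_k: float = 298.15) -> float:
    """Evaluate Eq. (7) for a dipolar substituent (Gaussian units assumed)."""
    mu = mu_debye * DEBYE_ESU_CM
    r = r_angstrom * ANGSTROM_CM
    numerator = E_CHARGE_ESU * mu * math.cos(math.radians(phi_deg))
    # ln(10) is the exact value of the factor written as 2.3 in Eq. (7).
    denominator = math.log(10.0) * K_BOLTZMANN_ERG * temp_k * r ** 2 * d_eff
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    delta = kirkwood_westheimer_delta_pka(mu_debye=2.0, phi_deg=0.0,
                                          r_angstrom=5.0, d_eff=5.0)
    print(f"Delta pKa = {delta:.2f}")
```

The result scales with cos ϕ and with 1/R², so a dipole oriented perpendicular to the line joining it to the ionizing group contributes nothing in this model.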

pKa can also be estimated based on thermodynamic cycles that relate the gas phase to the solution phase, where state-of-the-art quantum chemical techniques coupled with an appropriate solvation model are used (Shields and Seybold, 2013). Jang et al. (2001) predicted the pKa values for a series of 5-substituted uracil derivatives using density functional theory (DFT) calculations in combination with the Poisson-Boltzmann continuum-solvation model (Im et al., 1998).

Even though theoretical calculations can yield good results in predicting pKa, these methods are not very attractive for some applications due to their high computational cost. For instance, in drug formulation design, the pKa of active ingredients (AIs) is a very important property for selecting AIs because the pKa value indicates the aqueous solubility of the AI and the ability of the AI to permeate through the gastro-intestinal membrane. In order to perform a fast AI pre-screening, a quick and reliable pKa prediction method is preferable to an accurate but computationally very expensive one.

The compounds of the same class usually have small differences in their pKa values. For example, the pKa of 1-aminoheptane is 10.67 at 25 °C, which is just slightly lower than the pKa value of 10.70 for ethylamine. In general, the pKa of primary amines falls into the range of 10.6 ± 0.2. Also, if the alkyl-chain-substituted amines are compared with cyclic amines, the pKa is raised by 0.2 units for one ring and 0.3 units for two rings (Perrin et al., 1981). By employing analogical methods, one can perform pKa estimations. However, in order to accurately predict pKa for a certain compound, one needs quite a lot of information about other compounds with similar molecular structures.

As indicated, the three types of prediction methods (see Section 1.2.1, Linear free energy relationships (LFER), through Section 1.2.3, Group contribution based methods) all have certain limitations, which motivates the development of new methods for fast and reliable pKa predictions. It is also clear that the pKa value, or the degree to which a compound dissociates in its aqueous solution, depends mostly on the molecular structure of the compound. This inspires us to develop group contribution (GC) based models that are applicable to all different classes of organic compounds.
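To illustrate the general idea behind a linear group contribution estimate, the sketch below sums the contributions of the groups that occur in a molecule. It shows a generic first-order additive form, pKa ≈ c0 + Σ Ni·Ci, and is not the regressed MG-GC model of this work. The constant, the group names and all contribution values are made-up placeholders, and `linear_gc_estimate` is a hypothetical helper.

```python
from typing import Mapping


def linear_gc_estimate(group_counts: Mapping[str, int],
                       contributions: Mapping[str, float],
                       constant: float = 0.0) -> float:
    """Generic additive estimate: constant + sum(N_i * C_i) over groups."""
    missing = set(group_counts) - set(contributions)
    if missing:
        # A group without a regressed contribution cannot be estimated.
        raise KeyError(f"no contribution available for groups: {sorted(missing)}")
    return constant + sum(n * contributions[g] for g, n in group_counts.items())


if __name__ == "__main__":
    # Placeholder contributions (NOT the regressed parameters of the paper).
    placeholder_contributions = {"CH3": 0.10, "CH2": 0.05, "COOH": 4.00}
    # Hypothetical group decomposition of a molecule.
    molecule_groups = {"CH3": 1, "CH2": 2, "COOH": 1}
    estimate = linear_gc_estimate(molecule_groups, placeholder_contributions,
                                  constant=0.5)
    print(f"Estimated pKa (placeholder parameters): {estimate:.2f}")
```

The explicit check for missing groups reflects a practical limit of any GC method: a compound can only be estimated if every group it contains has a regressed contribution.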

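Amino acid molecules have at least one acidic group and one basic group. This allows an intramolecular acid-base equilibrium reaction, resulting in the formation of a dipolar tautomeric ion known as the zwitterion or internal salt (Cheung, 1995). The dissociation of amino acids in aqueous solutions is represented as follows:

$$\mathrm{H_3N^+{\cdot}R{\cdot}COOH\ (R^+) + H_2O \rightleftharpoons H_3N^+{\cdot}R{\cdot}COO^-\ (R^\pm) + H_3O^+}$$

where

$$K_{a1} = \frac{[a_{\mathrm{R^\pm}}]\,[a_{\mathrm{H_3O^+}}]}{[a_{\mathrm{R^+}}]}$$

and

$$\mathrm{H_3N^+{\cdot}R{\cdot}COO^-\ (R^\pm) + H_2O \rightleftharpoons H_2N{\cdot}R{\cdot}COO^-\ (R^-) + H_3O^+}$$

where

$$K_{a2} = \frac{[a_{\mathrm{R^-}}]\,[a_{\mathrm{H_3O^+}}]}{[a_{\mathrm{R^\pm}}]}$$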

From above, we know that an amino acid typically has at least two dissociation constants with the first one corresponding to the case when the COOH group is deprotonated and the second one corresponding to the case when the H3N+ group gets deprotonated in aqueous solution. Considering this unique behaviour, amino acids have been considered separately from other organic compounds in this work in the same way as in our previous GC-based property estimation models for amino acids (Jhamb et al., 2018).
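The two dissociation steps can be turned into a small speciation calculation. The Python sketch below computes the fractions of the cationic (R+), zwitterionic (R±) and anionic (R-) forms at a given pH from pKa1 and pKa2, assuming activities can be replaced by concentrations and that the side chain does not ionize. The function name is hypothetical, and the glycine values (pKa1 ≈ 2.34, pKa2 ≈ 9.60) are commonly quoted literature values, not data from this paper.

```python
from typing import Tuple


def amino_acid_species_fractions(pH: float, pKa1: float,
                                 pKa2: float) -> Tuple[float, float, float]:
    """Return fractions (R+, R+-, R-) for a two-step dissociation.

    Follows from Ka1 = [R+-][H3O+]/[R+] and Ka2 = [R-][H3O+]/[R+-],
    with activities approximated by concentrations (assumption).
    """
    h = 10.0 ** (-pH)
    ka1 = 10.0 ** (-pKa1)
    ka2 = 10.0 ** (-pKa2)
    denom = h * h + ka1 * h + ka1 * ka2
    cation = h * h / denom
    zwitterion = ka1 * h / denom
    anion = ka1 * ka2 / denom
    return cation, zwitterion, anion


if __name__ == "__main__":
    pKa1, pKa2 = 2.34, 9.60  # approximate glycine values (illustrative)
    for pH in (1.0, (pKa1 + pKa2) / 2.0, 12.0):
        cat, zwit, an = amino_acid_species_fractions(pH, pKa1, pKa2)
        print(f"pH {pH:5.2f}: R+ = {cat:.4f}, R+- = {zwit:.4f}, R- = {an:.4f}")
```

At the midpoint pH = (pKa1 + pKa2)/2 the cationic and anionic fractions are equal and the zwitterion dominates, which is why the two constants have to be treated as distinct properties of the same molecule.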

Section snippets

Experimental dataset

In the present study, the first dataset (dataset – 1) comprises experimental pKa values of 180 amino acids while the second dataset (dataset – 2) contains pKa values of 1622 organic compounds that are not amino acids. The experimentally measured pKa values in both datasets are collected from the KT-Consortium database and handbooks containing the dissociation constants of organic compounds (Kortüm et al., 1961, Perrin, 1965). Table 1 provides an overview of the datasets used for developing

Results and discussions

Three GC (linear, nonlinear, and ANN-based) models are developed to predict pKa for amino acids and other classes of organic compounds. The performances of these models in predicting pKa are evaluated in Section 3.1. Several examples are shown in Section 3.2 to help the reader in understanding how to apply the developed models for pKa prediction.
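For readers who want to reproduce performance statistics of this kind, the sketch below computes the average absolute error (AAE) and a coefficient of determination (R2) from paired experimental and predicted values. The R2 definition used here, 1 − SSres/SStot, is an assumption, since the exact definition used in the paper is not stated in this excerpt; the numbers in the demo are invented.

```python
from typing import Sequence


def average_absolute_error(experimental: Sequence[float],
                           predicted: Sequence[float]) -> float:
    """Mean of |predicted - experimental| over all data points."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - e) for e, p in zip(experimental, predicted)) / len(experimental)


def r_squared(experimental: Sequence[float],
              predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented pKa values, for illustration only.
    pka_exp = [1.0, 2.0, 3.0, 4.0]
    pka_pred = [1.1, 1.9, 3.2, 3.8]
    print(f"AAE = {average_absolute_error(pka_exp, pka_pred):.3f}")
    print(f"R2  = {r_squared(pka_exp, pka_pred):.3f}")
```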

Conclusion

The prediction of acid dissociation constant (Ka) is very significant in many areas. In this work, three GC property models have been developed and tested for the estimation of the pKa of organic compounds including amino acids. The linear GC model has a good performance (R2 = 0.96, AAE = 0.23) only for amino acids. For the other classes of compounds, a nonlinear GC model and an ANN-GC model have been developed. The nonlinear GC model has a moderate prediction quality (R2 = 0.81, AAE = 1.18)

References (27)

  • F. Gharagheizi et al., A new neural network – group contribution method for estimation of flash point temperature of pure components, Energy Fuels (2008)

  • F. Gharagheizi et al., Use of artificial neural network-group contribution method to determine surface tension of pure compounds, J. Chem. Eng. Data (2011)

  • D. Green et al., Perry's Chemical Engineers' Handbook (2008)