Using the Gini coefficient to characterize the shape of computational chemistry error distributions

  • Regular Article
Theoretical Chemistry Accounts

Abstract

The distribution of errors is a central object in the assessment and benchmarking of computational chemistry methods. The popular and often blind use of the mean unsigned error as a benchmarking statistic leads one to ignore features of the distribution that affect the reliability of the tested methods. We explore how the Gini coefficient offers a global representation of the error distribution but, except for extreme values, does not enable an unambiguous diagnosis. We propose to resolve the ambiguity by applying the Gini coefficient to mode-centered error distributions. This version can usefully complement benchmarking statistics and flag error sets with potentially problematic shapes.
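
To make the two statistics in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the Gini coefficient uses the standard sorted-rank formula for non-negative values, and the mode is estimated with a crude shortest-half midpoint that stands in for whatever mode estimator the article actually uses. The function names `gini`, `shortest_half_mode`, `gini_abs` and `gini_mode_centered` are hypothetical.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values (0 means all values are equal)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank formula: G = sum_i (2i - n - 1) x_(i) / (n * sum_i x_i)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))


def shortest_half_mode(errors):
    """Crude mode estimate (assumption): midpoint of the shortest interval
    that contains about half of the sample. Requires a non-empty sample."""
    x = np.sort(np.asarray(errors, dtype=float))
    n = x.size
    h = n // 2 + 1  # number of points in each candidate window
    widths = x[h - 1:] - x[: n - h + 1]
    i = int(np.argmin(widths))
    return 0.5 * (x[i] + x[i + h - 1])


def gini_abs(errors):
    """Gini coefficient of the absolute errors."""
    return gini(np.abs(np.asarray(errors, dtype=float)))


def gini_mode_centered(errors):
    """Gini coefficient of absolute errors after centering on the estimated mode."""
    e = np.asarray(errors, dtype=float)
    return gini(np.abs(e - shortest_half_mode(e)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.normal(size=200)        # synthetic, roughly zero-centered errors
    shifted = e + 5.0               # same shape, large systematic bias
    print(gini_abs(e), gini_abs(shifted))
    print(gini_mode_centered(e), gini_mode_centered(shifted))
```

In this sketch, the Gini coefficient of the raw absolute errors drops sharply when a constant bias is added, even though the shape of the distribution is unchanged, while the mode-centered version is unaffected by the shift. For reference, the Gini coefficient of the absolute values of a zero-centered normal variable is √2 − 1 ≈ 0.41.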

Data availability statement

The data and codes that support the findings of this study are openly available at the following URL: https://doi.org/10.5281/zenodo.4333217

Author information

Corresponding author

Correspondence to Pascal Pernot.

Additional information

This work is dedicated to Ramon Carbó-Dorca for his 80th birthday. It reflects his interest in cross-disciplinary aspects of science, especially the role of mathematics in chemistry.

Published as part of the special collection of articles “Festschrift in honour of Prof. Ramon Carbó-Dorca”.

Supplementary Information

Below is the link to the electronic supplementary material. Statistics, ECDFs and Lorenz curves for the literature datasets are also openly available at the following URL: https://doi.org/10.5281/zenodo.4333217

Supplementary material 1 (pdf 3173 KB)

Supplementary material 2 (csv 3173 KB)

Cite this article

Pernot, P., Savin, A. Using the Gini coefficient to characterize the shape of computational chemistry error distributions. Theor Chem Acc 140, 24 (2021). https://doi.org/10.1007/s00214-021-02725-0
