When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?

  • Original Article
  • Molecular Diversity

Abstract

In this report, we introduce a set of aggregation operators (AOs) for calculating global and local (group and atom-type) molecular descriptors (MDs), generalizing the classical approach to molecular encoding, which sums the atomic (or fragment) contributions. These AOs are implemented in a new, free software package named MD-LOVIs (http://tomocomd.com/md-lovis), which calculates MDs from atomic weight vectors and LOVIs (local vertex invariants). The software was developed in the Java programming language and uses the Chemistry Development Kit (CDK) library to handle chemical structures and calculate atomic weights. A complexity analysis of the algorithms presented herein indicates that they were implemented efficiently. Calculation speed experiments show that MD-LOVIs performs satisfactorily when compared to software such as PaDEL, CDKDescriptor, DRAGON and Bluecal. Shannon’s entropy (SE)-based variability studies demonstrate that MD-LOVIs yields indices with greater information content than those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, PaDEL and Mold2 software, as a result of several generalizations in MD-LOVIs that are not used in other programs. Lastly, three QSAR models were built using multiple linear regression with genetic algorithms, and their statistical parameters demonstrate that the MD-LOVIs indices obtained with AOs perform better than those obtained when the summation operator is used exclusively. Moreover, despite their simplicity, the MD-LOVIs indices yield models with performance comparable or superior to that of other QSAR methodologies reported in the literature.
Collectively, the studies performed herein demonstrate that the MD-LOVIs software generates indices that are as simple as possible, but not simpler, and that the use of AOs enhances the diversity of the chemical information codified, which consequently improves the performance of traditional MDs.
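
The idea of replacing the classical summation with other aggregation operators can be illustrated with a minimal Python sketch. It is not the MD-LOVIs implementation (which is written in Java on top of CDK). The operator set, the function names `global_descriptors` and `local_descriptors`, and the use of vertex degrees as LOVIs are all assumptions chosen for illustration, since the abstract does not enumerate the operators the software provides.

```python
import math
from typing import Callable, Dict, Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def geometric_mean(values: Sequence[float]) -> float:
    # Only defined here for strictly positive contributions.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def euclidean_norm(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


# Hypothetical operator set; "sum" is the classical encoding, the rest
# are illustrative alternatives, not the list shipped with MD-LOVIs.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "mean": arithmetic_mean,
    "gmean": geometric_mean,
    "norm2": euclidean_norm,
    "max": max,
}


def global_descriptors(lovis: Sequence[float]) -> Dict[str, float]:
    """Aggregate every atomic contribution into one value per operator."""
    if not lovis:
        raise ValueError("at least one atomic contribution is required")
    return {name: float(op(lovis)) for name, op in AGGREGATORS.items()}


def local_descriptors(
    lovis: Sequence[float], atom_types: Sequence[str], target: str
) -> Dict[str, float]:
    """Aggregate only the contributions of atoms of one type (e.g. "C")."""
    subset = [w for w, t in zip(lovis, atom_types) if t == target]
    return global_descriptors(subset)


# Ethanol heavy atoms (C, C, O) with vertex degrees used as toy LOVIs.
ethanol_lovis = [1.0, 2.0, 1.0]
ethanol_types = ["C", "C", "O"]
whole_molecule = global_descriptors(ethanol_lovis)
carbons_only = local_descriptors(ethanol_lovis, ethanol_types, "C")
```

In this toy case the summation operator gives a single value for the whole molecule, while each additional operator yields a different view of the same contribution vector, which is the sense in which AOs can enlarge the information captured by a descriptor family.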

Availability of Software and Material

The MD-LOVIs software and the respective user manual are freely available online at http://tomocomd.com/md-lovis.

Acknowledgements

Yoan Martínez-López thanks the “International Investigator Invited” program for a postdoctoral fellowship to work at USFQ in 2019. Yovani Marrero-Ponce acknowledges support from the USFQ “Chancellor Grant 2018” (Project ID13525).

Funding

This work was partially supported by USFQ (“Chancellor Grant 2018”, Project ID13525).

Author information

Corresponding author

Correspondence to Yovani Marrero-Ponce.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 7742 kb)

Supplementary material 2 (DOCX 652 kb)

About this article

Cite this article

Martínez-López, Y., Marrero-Ponce, Y., Barigye, S.J. et al. When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?. Mol Divers 24, 913–932 (2020). https://doi.org/10.1007/s11030-019-10002-3

  • DOI: https://doi.org/10.1007/s11030-019-10002-3
