Abstract
In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom-type) molecular descriptors (MDs) as a generalization of the classical approach to molecular encoding, which uses the sum of the atomic (or fragment) contributions. These AOs are implemented in a new, free software package named MD-LOVIs (http://tomocomd.com/md-lovis), which calculates MDs from atomic weight vectors and LOVIs (local vertex invariants). The software was developed in the Java programming language and uses the Chemistry Development Kit (CDK) library to handle chemical structures and to calculate atomic weights. An analysis of the complexity of the algorithms presented herein indicates that they were implemented efficiently. Calculation speed experiments show that MD-LOVIs performs satisfactorily when compared to PaDEL, CDKDescriptor, DRAGON and Bluecal. Shannon’s entropy (SE)-based variability studies demonstrate that MD-LOVIs yields indices with greater information content than those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, PaDEL and Mold2 software, as a result of the several generalizations in MD-LOVIs that are not used in other programs. Lastly, three QSAR models were built using multiple linear regression with genetic algorithms, and the statistical parameters of these models demonstrate that the MD-LOVIs indices obtained with AOs perform better than those obtained when the summation operator is used exclusively. Moreover, despite their simplicity, the MD-LOVIs indices yield models with comparable or superior performance relative to other QSAR methodologies reported in the literature.
Collectively, the studies performed herein demonstrate that the MD-LOVIs software generates indices that are as simple as possible, but not simpler, and that the use of AOs enhances the diversity of the chemical information codified, which consequently improves the performance of traditional MDs.
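The idea of replacing the plain sum with other aggregation operators can be illustrated with a minimal Python sketch. It is not the MD-LOVIs implementation (which is written in Java on top of CDK): the operator table, the `describe` helper and the use of heavy-atom vertex degrees as LOVIs are assumptions chosen for illustration only, and the set of operators actually offered by the software may differ.

```python
import math
from statistics import mean, pstdev


def minkowski_norm(values, p):
    """p-norm of a LOVI vector; p = 1 is the sum of absolute values."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# Hypothetical operator table: the classical sum plus a few alternatives.
AGGREGATORS = {
    "sum": sum,
    "n2": lambda v: minkowski_norm(v, 2),
    "mean": mean,
    "sd": pstdev,
    "range": lambda v: max(v) - min(v),
}


def describe(lovis, symbols=None, atom_type=None):
    """Aggregate LOVIs into descriptors.

    With atom_type=None every atom contributes (global descriptor);
    otherwise only atoms whose symbol matches contribute (local descriptor).
    """
    if atom_type is not None:
        lovis = [v for v, s in zip(lovis, symbols) if s == atom_type]
    return {name: op(lovis) for name, op in AGGREGATORS.items()}


# Heavy-atom vertex degrees used as LOVIs (an assumed, simple choice).
n_butane = [1.0, 2.0, 2.0, 1.0]    # C-C-C-C chain
isobutane = [1.0, 1.0, 1.0, 3.0]   # three methyls on a central carbon

g_butane = describe(n_butane)
g_isobutane = describe(isobutane)

# Local (atom-type) descriptor: carbons only in ethanol (C, C, O).
ethanol_c = describe([1.0, 2.0, 1.0], symbols=["C", "C", "O"], atom_type="C")
```

In this toy case the two isomers have the same sum of vertex degrees (6), so the sum alone cannot separate them, whereas the 2-norm, the standard deviation and the range all differ. This is the sense in which operators other than the sum can add information to the encoding.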
Availability of Software and Material
The MD-LOVIs software and the respective user manual are freely available online at http://tomocomd.com/md-lovis.
Acknowledgements
Yoan Martínez-López thanks the program International Investigator Invited for a postdoctoral fellowship to work at USFQ in 2019. Yovani Marrero-Ponce acknowledges the support from USFQ “Chancellor Grant 2018 (Project ID13525).”
Funding
This work was partially supported by USFQ (Project ID13525, “Chancellor Grant 2018”).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Martínez-López, Y., Marrero-Ponce, Y., Barigye, S.J. et al. When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?. Mol Divers 24, 913–932 (2020). https://doi.org/10.1007/s11030-019-10002-3
DOI: https://doi.org/10.1007/s11030-019-10002-3