Original article
Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds
Introduction
In the World Health Organization (WHO) 2012 report on Global Tuberculosis Control, the relevant numbers for 2011 were 8.7 million new cases worldwide and 1.4 million deaths [1]. About 3.7% of all new tuberculosis (TB) infections are resistant to at least one known anti-TB drug, and cases of multidrug-resistant (MDR-TB) and extensively drug-resistant tuberculosis (XDR-TB) are believed to be far higher than reported [2], [3]. WHO estimated that in 2011 there was a prevalence of 310,000 cases of MDR-TB among notified TB patients, of which 9% were XDR-TB [3]. The increasing mobility of populations, the long, demanding and sometimes inappropriate treatments, and poor patient compliance have given rise to an alarming emergence of MDR- and XDR-TB strains. Together with the drug–drug interactions observed in immunocompromised individuals, such as those who also have HIV infection or diabetes [4], these factors make the fight against TB a very serious global health issue. To make the scenario even worse, researchers in India have recently identified what they call a totally drug-resistant form of TB (TDR-TB), resistant to all current first- and second-line drugs [5], after previously documented cases in Italy [6] and Iran [7]. Some authors suggest, however, that TDR-TB is just a deadlier development of the highly resistant forms of TB and not itself a new entity.
Moreover, the funding needed for TB care and control in low- and middle-income countries is around US$ 8 billion per year, with a funding gap of approximately US$ 3 billion per year [3]. This makes it very clear that, in addition to an undeniable social burden, the economic load associated with this disease is very substantial.
These facts have led to an increasing awareness in the scientific community of the need to develop new, structurally modified, active antitubercular drugs. To succeed in the design of these new compounds, it would be valuable to establish a clear-cut relationship between structure and activity. Several efforts, in general not very successful, have been made towards understanding the mechanisms of action of known active compounds against Mycobacterium tuberculosis (M. tuberculosis). Although a number of studies have appeared in the literature in the last few years [8], [9], [10], the level of knowledge about these drug-M. tuberculosis interactions is still rather limited. The establishment of quantitative relationships between structural features and measured biological activity, the so-called QSARs, extensively used in the field of Medicinal Chemistry, is one of the most useful and elegant strategies in the quest for new bioactive molecules. Various methodologies and mathematical tools have been developed over the years with the purpose of modeling and predicting all sorts of biological behaviors, and they have been applied with considerable success in the rational design and synthesis of different new drugs [11], [12], [13].
Comprehensive QSAR models can be achieved through the application of various supervised and/or unsupervised statistical and machine learning techniques (MLT). Among these, Multiple Linear Regressions (MLR) [14], [15], Neural Networks (NN) [16], [17], [18], Decision Trees [19], Random Forests (RF) [20], Partial Least Squares (PLS) [21], [22], Principal Components Analysis [23] and Support Vector Machines (SVM) [24] are among the most successful methodologies. The extra-thermodynamic approach using MLR analyses, which was launched more than 40 years ago by Hansch et al. [14], [15], has indeed been very successful in the fields of Medicinal Chemistry and Pharmaceutical Sciences [11], [25]. When applied to the study of pharmacological action, this approach is based on the premise that the biological activity of a series of similar drugs, taken as having the same mechanism of action, can be described as a linear combination of molecular properties or descriptors identified as relevant for the process under study.
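The linear-combination premise can be made concrete with a minimal sketch, assuming an ordinary least-squares fit of activity against a few descriptors. The function names (`fit_mlr`, `predict_mlr`) and the toy data are hypothetical illustrations and are not taken from the original work; real QSAR studies would also involve descriptor selection and validation, which are omitted here.

```python
def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    X is a list of descriptor rows, y a list of activities.
    Returns the coefficients [b0, b1, ..., bp].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("descriptors are collinear; coefficients are not unique")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(k):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [M[i][k] for i in range(k)]


def predict_mlr(coef, x):
    """Predicted activity: intercept plus weighted sum of descriptors."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


# Hypothetical toy data (not from the paper): two descriptors per compound,
# with activities constructed to be exactly linear in the descriptors.
X_toy = [[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 3.0], [5.0, 2.0]]
y_toy = [0.5 + 1.2 * a - 0.7 * b for a, b in X_toy]
coef = fit_mlr(X_toy, y_toy)
```

Because the toy activities are generated from a linear rule, the fit recovers the generating coefficients up to rounding. With measured activities, the residuals would indicate how well the linear premise holds for the series.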
Another especially successful methodology, introduced some years ago, is that of the so-called Artificial Neural Networks (ANNs) [16], [17], [18], which simulate the processing of information by biological neurons in the human nervous system. ANNs overcame some of the frailties of the MLR approach in the design of new drugs, especially because of their ability to deal with non-linear relationships of high complexity when performing input–output transformations. However, to our knowledge, only a few applications of ANNs to the study of antitubercular activity have been reported in the literature [26], [27], [28], [29]. Yet, more important than using each of these (or other) methodologies per se is the possibility of combining them to improve our ability to interpret and predict biological behaviors.
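To illustrate the non-linear input–output transformation mentioned above, the following is a minimal sketch of the forward pass of a feed-forward network with one hidden layer of tanh units. The architecture, the function name `ffnn_predict` and all weights are hypothetical and hand-picked; they are not the networks or parameters used in the original work, and no training step is shown.

```python
import math


def ffnn_predict(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: descriptors -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical, hand-picked parameters for two descriptors and two hidden units.
hidden_weights = [[2.0, 0.0], [0.0, 2.0]]
hidden_biases = [-1.0, -1.0]
output_weights = [1.5, -0.5]
output_bias = 0.2

y_a = ffnn_predict([0.5, 0.5], hidden_weights, hidden_biases, output_weights, output_bias)
y_b = ffnn_predict([1.0, 1.0], hidden_weights, hidden_biases, output_weights, output_bias)
```

Doubling the descriptor values from `[0.5, 0.5]` to `[1.0, 1.0]` does not double the output, because the tanh units saturate. This is the kind of behavior a purely linear model cannot represent.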
Within our research group, we have started a systematic analysis of a large data set of potentially active compounds against M. tuberculosis, with the purpose of establishing robust and predictive QSAR models [30], [31]. The first aim of this work is not to explore the potentialities of various MLT in the modeling and prediction of biological activity but rather to compare the performance of two particular QSAR methodologies, MLRs and NNs, in the modeling of the antitubercular activity of a series of compounds with a hydrazide functionality. A second important goal is the rational design of new potentially active compounds based on the information provided by the best found models.
Materials and methods
It is widely accepted that the development of any classical QSAR model is based on the following postulates: (i) the molecular structure is responsible for the observed activity (or property); and (ii) the established model can only be applied to compounds belonging to the same physicochemical–structural–biological space, i.e., a model should only be used to make predictions within its applicability domain [32], [33], [34], [35], [36].
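One simple way to picture an applicability domain is a range check in descriptor space: a query compound is considered inside the domain only if each of its descriptors lies within the interval spanned by the training set. The sketch below assumes this range-based definition purely for illustration; it is not necessarily the approach used by the authors, and the function names and data are hypothetical.

```python
def descriptor_ranges(X_train):
    """Per-descriptor (min, max) intervals spanned by the training compounds."""
    return [(min(col), max(col)) for col in zip(*X_train)]


def in_domain(x, ranges):
    """True if every descriptor of x lies within the training interval."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, ranges))


# Hypothetical training descriptors (two per compound), not from the paper.
X_train = [[1.0, 0.2], [2.5, 0.9], [1.8, 0.4]]
ranges = descriptor_ranges(X_train)

inside = in_domain([2.0, 0.5], ranges)   # both descriptors within range
outside = in_domain([3.0, 0.5], ranges)  # first descriptor exceeds the training maximum
```

A prediction for a compound flagged as outside the domain is an extrapolation and would be treated with caution. More refined definitions, for example distance- or leverage-based ones, follow the same idea of comparing a query with the training set.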
On the other hand, the success of a QSAR model in terms of
Lipophilicity
In the early stages of drug development, it is very important to be able to model the behavior of certain physicochemical properties such as lipophilicity, which is usually considered to determine the ability of a compound to penetrate a biological membrane. In addition, this property affects drug absorption, bioavailability, metabolism and toxicity, as well as drug-receptor interactions, and may hence significantly influence the activity of a given compound or family of compounds. Its prediction is
Conclusions
The antitubercular activity of a large set of hydrazide derivatives was modeled by two different QSAR methodologies: MLRs and NNs.
A comparison of the statistical performance of the four MLT used (MLRs, FFNNs, EnsFFNNs and AsNNs) revealed that NNs, in particular AsNNs, had consistently better predictive abilities than the corresponding MLRs for independent test sets.
From all studied models, those with seven descriptors were analyzed and discussed in more detail. The relative importance of each
Acknowledgments
The authors are grateful to F. Rodrigues and M.J. Sousa from INSRJ (Porto) for the preliminary MIC determinations of the two new compounds and to S. Santos, M. Reis and S. Vitorino (FCUL) for their synthesis. The authors also wish to thank Prof. João Aires-de-Sousa for his helpful comments and suggestions. Financial support from Fundação para a Ciência e a Tecnologia, Portugal, under project FCT/PTDC/QUI/67933/2006 and grants BPD/20743/2004 (C. Ventura) and BPD/63192/2009 (D. Latino), as
References (87)
- et al. PLS-Regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. (2001)
- et al. Principal components analysis. Chemom. Intell. Lab. Syst. (1987)
- et al. Synthesis of novel 5-aryl-2-thio-1,3,4-oxadiazoles and the study of their structure-anti-mycobacterial activities. Bioorg. Med. Chem. (2005)
- et al. Quantitative-structure–activity relationship studies of nitrofuranyl antitubercular agents. Bioorg. Med. Chem. (2008)
- et al. QSAR modeling of antitubercular activity of diverse organic compounds. Chemom. Intell. Lab. Syst. (2011)
- et al. Computer-aided study of the relationship between structure and antituberculosis activity of a series of isoniazid derivatives. Chem. Phys. (1996)
- et al. Susceptibility of Mycobacterium tuberculosis to isoniazid and its derivative, 1-isonicotinyl-2-nonanoyl hydrazine: investigation at cellular level. Tuberculosis (2004)
- et al. Synthesis and in vivo antimycobacterial activity of isonicotinoyl hydrazones. Bioorg. Med. Chem. Lett. (2005)
- et al. Application of hydrogen bonding calculations in property based drug design. DDT (2002)
- JATOON: Java tools for neural networks. Chemom. Intell. Lab. Syst. (2002)
- Beware of q2! J. Mol. Graph. Model.
- Cook's distance in local polynomial regression. Stat. Prob. Lett.
- Note on the Cook's distance. J. Stat. Plan. Inf.
- Further exploring rm2 metrics for validation of QSPR models. Chemom. Intell. Lab. Syst.
- Global Tuberculosis Control 2012
- Worldwide emergence of extensively drug-resistant tuberculosis. Emerg. Inf. Dis.
- Tuberculosis Global Facts 2012
- The challenge of new drug discovery for tuberculosis. Nature
- Totally drug-resistant tuberculosis in India. Clin. Inf. Dis.
- First tuberculosis cases in Italy resistant to all tested drugs. Eurosurveillance
- Emergence of new forms of totally drug-resistant tuberculosis bacilli: super extensively drug-resistant tuberculosis or totally drug-resistant strains in Iran. Chest
- New small-molecule synthetic antimycobacterial. Antimicrob. Agents Chemother.
- Molecular mechanisms of drug resistance in Mycobacterium tuberculosis. Annu. Rev. Biochem.
- Search for new drugs – main directions in the search for new antituberculous drugs. Pharm. Chem. J.
- The Practice of Medicinal Chemistry
- C-QSAR: a database of 18,000 QSARs and associated biological and physical data. Comput. Aided Mol. Des.
- The role of quantitative structure–activity relationships (QSAR) in biomolecular discovery. Brief Inf.
- A quantitative approach to biochemical structure–activity relationships. Acc. Chem. Res.
- Correlation analysis. Its application to the structure–activity relation of triazines inhibiting dihydrofolate reductase. J. Am. Chem. Soc.
- Neural networks applied to structure–activity relationships. J. Med. Chem.
- Introduction to artificial neural networks (ANN) methods: what they are and how to use them. Acta Chim. Slov.
- Neural Networks in Chemistry and Drug Design
- Classification and Regression Trees
- Random forests. Machine Learn.
- Partial least squares (PLS) in cheminformatics
- An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
- Exploring QSAR. Fundamentals and Applications in Chemistry and Biology
- Substituted xanthones as antimycobacterial agents. Arch. Pharm. Pharm. Med. Chem.
- 1,3,4-Thiadiazole derivatives. Synthesis, structure elucidation and structure-antituberculosis activity relationship investigation. J. Med. Chem.
- Application of quantitative structure–activity relationships to the modeling of antitubercular compounds. 1. The hydrazide family. J. Med. Chem.
- Interpretation of quantitative structure–property and –activity relationships. J. Chem. Inf. Comput. Sci.
- QSAR applicability domain estimation by projection of the training set in descriptor space: a review. ATLA
- Principles of QSAR models validation: internal and external. QSAR Comb. Sci.