Building a Quantitative Structure-Property Relationship (QSPR) Model

Bioinformatics and Drug Discovery

Part of the book series: Methods in Molecular Biology (MIMB, volume 1939)

Abstract

Knowing the physicochemical and general biochemical properties of a compound is critical to understanding how it behaves in different biological environments and to anticipating what is likely to happen in situations where that behavior cannot be measured directly. Quantitative structure-property relationship (QSPR) models provide a way to predict those properties even before a compound has been synthesized simply by knowing what its structure would be. This chapter describes a general workflow for compiling the data upon which a useful QSPR model is built, curating it, evaluating that model’s performance, and then analyzing the predictive errors with an eye toward identifying systematic errors in the input data. The focus here is on models for the absorption, distribution, metabolism, and excretion (ADME) properties of drugs and toxins, but the considerations explored are general and applicable to any QSPR.
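The final step of the workflow, analyzing predictive errors to find suspect input data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the chapter's method: the single descriptor, the straight-line fit, the invented records, the function names, and the use of a median-absolute-deviation (MAD) cutoff to decide which residuals are large enough to merit manual review.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares for one descriptor: y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_suspect_records(records, k=3.0):
    """Return IDs of records whose residual lies more than k robust
    standard deviations from the median residual.

    records: list of (compound_id, descriptor_value, observed_property).
    The cutoff k is a hypothetical default, not a recommended value.
    """
    xs = [r[1] for r in records]
    ys = [r[2] for r in records]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    sigma = 1.4826 * mad  # MAD rescaled to estimate a normal standard deviation
    if sigma == 0:
        return []  # perfect fit: nothing stands out
    return [rec[0] for rec, res in zip(records, residuals)
            if abs(res - med) > k * sigma]


# Invented data: the property falls roughly linearly with the descriptor,
# except that C5 was entered 3 log units too high.
records = [
    ("C0", 0.0, 0.10), ("C1", 1.0, -1.10), ("C2", 2.0, -1.95),
    ("C3", 3.0, -3.05), ("C4", 4.0, -3.90), ("C5", 5.0, -2.10),
    ("C6", 6.0, -5.95), ("C7", 7.0, -7.05), ("C8", 8.0, -7.90),
    ("C9", 9.0, -9.10),
]
flagged = flag_suspect_records(records)
print(flagged)  # prints ['C5']
```

In this toy set only C5 is flagged, the kind of shift a unit mix-up might produce. A flag is a prompt to recheck the source record, not proof of an error.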

Notes

  1. Chemical structures can also be extracted from text documents by applying optical structure recognition (OSR). The OSR tools that the authors have worked with are not yet reliable enough to be applied without careful manual review.

  2. Bridging hydrogen atoms that take part in three-center bonds are treated as part of the core structure.

  3. Where necessary, promising groups of descriptors are generated using a genetic algorithm (GA) [28].

References

  1. Pragyan P, Kesharwani SS, Nandekar PP, Rathod V, Sangamwar AT (2014) Predicting drug metabolism by CYP1A1, CYP1A2, and CYP1B1: insights from MetaSite, molecular docking and quantum chemical calculations. Mol Divers 18(4):865–878

  2. Houston JB, Kenworthy KE (2000) In vitro-in vivo scaling of CYP kinetic data not consistent with the classical Michaelis-Menten Model. Drug Metab Dispos 28(3):246–254

  3. ADMET Predictor™. Simulations Plus Inc., Lancaster, CA, USA

  4. RCSB Protein Data Bank. Research Collaboratory for Structural Bioinformatics. https://www.rcsb.org/pdb/home/home.do

  5. Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31–36. https://doi.org/10.1021/ci00057a005

  6. SMILES–A simplified chemical language. Daylight Chemical Information Systems, Inc. http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

  7. The IUPAC international chemical identifier (InChI). International union of pure and applied chemistry. https://iupac.org/who-we-are/divisions/division-details/inchi/

  8. Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC international chemical identifier. J Cheminform 7:23

  9. MedChem Designer™: Chemical structure drawing and property prediction. Simulations Plus, Inc. http://www.simulations-plus.com/software/medchem-designer/

  10. ChEMBL. EMBL-EBI. https://www.ebi.ac.uk/chembl/

  11. Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Krüger FA, Light Y, Mak L, McGlinchey S, Nowotka M, Papadatos G, Santos R, Overington J (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res 42:1083–1090. https://doi.org/10.1093/nar/gkt1031

  12. Wang YL, Bryant SH, Cheng TJ, Wang JY, Gindulyte A, Shoemaker BA, Thiessen PA, He SQ, Zhang J (2017) PubChem BioAssay: 2017 update. Nucleic Acids Res 45(D1):D955–D963. https://doi.org/10.1093/nar/gkw1118

  13. Waldman M, Fraczkiewicz R, Clark RD (2015) Tales from the war on error: the art and science of curating QSAR data. J Comput Aided Mol Des 29:897

  14. AID 1996: Aqueous Solubility from MLSMR Stock Solutions (2009) Available via National Center for Biotechnology Information. https://pubchem.ncbi.nlm.nih.gov/bioassay/1996. Accessed Nov 2017

  15. ChemSpider: Search and share chemistry. Royal Society of Chemistry. http://www.chemspider.com/

  16. What is R? The R Foundation. https://www.r-project.org/about.html

  17. Willighagen EL, Mayfield JW, Alvarsson J, Arvid Berg LC, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chertó M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C (2017) The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform 9:33

  18. About KNIME. KNIME. https://www.knime.com/. Accessed 17 Nov 2017

  19. BIOVIA Pipeline Pilot. Dassault Systèmes. http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/

  20. Tosco P, Stiefl N, Landrum G (2014) Bringing the MMFF force field to the RDKit: implementation and validation. J Cheminform 6:37

  21. Todeschini R, Consonni V (2000) Handbook of molecular descriptors. Wiley-VCH, Weinheim; New York

  22. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742–752. https://doi.org/10.1021/ci100050t

  23. Hall LH, Kier LB (1995) Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J Chem Inf Comput Sci 35:1039–1045

  24. Yan A, Gasteiger J (2003) Prediction of aqueous solubility of organic compounds by topological descriptors. QSAR Comb Sci 22:821–829. https://doi.org/10.1002/qsar.200330822

  25. Tiikkainen P, Bellis L, Light Y, Franke L (2013) Estimating error rates in bioactivity databases. J Chem Inf Model 53(10):2499–2505. https://doi.org/10.1021/ci400099q

  26. May RJ, Maier HR, Dandy GC (2010) Data splitting for artificial neural networks using SOM-based stratified sampling. Neural Netw 23:283–294

  27. Clark RD (2003) Boosted leave-many-out cross-validation: the effect of training set and test set diversity on PLS statistics. J Comput Aided Mol Des 17:265–275

  28. Žuvela P, Liu JJ, Macur K, Bączek T (2015) Molecular descriptor subset selection in theoretical peptide quantitative structure–retention relationship model development using nature-inspired optimization algorithms. Anal Chem 87(19):9876–9883. https://doi.org/10.1021/acs.analchem.5b02349

  29. Clark RD, Liang W, Lee AC, Lawless MS, Fraczkiewicz R, Waldman M (2014) Using beta binomials to estimate classification uncertainty for ensemble models. J Cheminform 6(1):34

  30. Hibi S, Ueno K, Nagato S, Kawano K, Ito K, Norimine Y, Takenaka O, Hanada T, Yonaga M (2012) Discovery of 2-(2-Oxo-1-phenyl-5-pyridin-2-yl-1,2-dihydropyridin-3-yl)benzonitrile (Perampanel): a novel, noncompetitive α-amino-3-hydroxy-5-methyl-4-isoxazolepropanoic Acid (AMPA) receptor antagonist. J Med Chem 55(23):10584–10600. https://doi.org/10.1021/jm301268u

  31. Röver S, Andjelkovic M, Bénardeau A, Chaput E, Guba W, Hebeisen P, Mohr S, Nettekoven M, Obst U, Richter WF, Ullmer C, Waldmeier P, Wright MB (2013) 6-Alkoxy-5-aryl-3-pyridinecarboxamides, a new series of bioavailable cannabinoid receptor type 1 (CB1) antagonists including peripherally selective compounds. J Med Chem 56(24):9874–9896. https://doi.org/10.1021/jm4010708

  32. Ran Y, Jain N, Yalkowsky SH (2001) Prediction of aqueous solubility of organic compounds by the general solubility equation (GSE). J Chem Inf Comput Sci 41(5):1208–1217. https://doi.org/10.1021/ci010287z

  33. Beaulieu PL, Marte JD, Garneau M, Luo L, Stammers T, Telang C, Wernic D, Kukolj G, Duan J (2015) A prodrug strategy for the oral delivery of a poorly soluble HCV NS5B thumb pocket 1 polymerase inhibitor using self-emulsifying drug delivery systems (SEDDS). Bioorg Med Chem Lett 25:210–215

  34. Yamashita F, Fujiwara S, Hashida M (2002) The “Latent Membrane Permeability” Concept: QSPR Analysis of Inter/Intralaboratorically Variable Caco-2 Permeability. J Chem Inf Comput Sci 42(2):408–413. https://doi.org/10.1021/ci010317y

  35. Sambuy Y, Angelis ID, Ranaldi G, Scarino ML, Stammati A, Zucco F (2005) The Caco-2 cell line as a model of the intestinal barrier: influence of cell and culture-related factors on Caco-2 cell functional characteristics. Cell Biol Toxicol 21(1):1–26. https://doi.org/10.1007/s10565-005-0085-6

  36. Pevarello P, Brasca MG, Orsini P, Traquandi G, Longo A, Nesi M, Orzi F, Piutti C, Sansonna P, Varasi M, Cameron A, Vulpetti A, Roletto F, Alzani R, Ciomei M, Albanese C, Pastori W, Marsiglio A, Pesenti E, Fiorentini F, Bischoff JR, Mercurio C (2005) 3-Aminopyrazole Inhibitors of CDK2/Cyclin A as Antitumor Agents. 2. Lead Optimization. J Med Chem 48:2944–2956

  37. Borgstrom L, Nyberg L, Jonsson S, Lindberg C, Paulson J (1989) Pharmacokinetic evaluation in man of terbutaline given as separate enantiomers and as the racemate. Br J Clin Pharmacol 27(1):49–56. https://doi.org/10.1111/j.1365-2125.1989.tb05334.x

Acknowledgments

The authors would like to acknowledge Michael S. Lawless, Marvin Waldman, and Walter S. Woltosz of Simulations Plus, Inc., for their careful reading of the manuscript and their insightful suggestions.

Author information

Correspondence to Robert D. Clark.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

Cite this protocol

Clark, R.D., Daga, P.R. (2019). Building a Quantitative Structure-Property Relationship (QSPR) Model. In: Larson, R., Oprea, T. (eds) Bioinformatics and Drug Discovery. Methods in Molecular Biology, vol 1939. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9089-4_8

  • DOI: https://doi.org/10.1007/978-1-4939-9089-4_8

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-9088-7

  • Online ISBN: 978-1-4939-9089-4

  • eBook Packages: Springer Protocols
