
Turning Failures into Applications: The Problem of Protein ΔΔG Prediction

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology (MIMB, volume 2449)

Abstract

After nearly two decades of research on computational methods for predicting ΔG and ΔΔG upon variation, based either on machine learning or on knowledge-based potentials, we now realize that all approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption wrong, namely that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid and well understood. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be treated with caution. In what follows, we show how to address the problem of identifying which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we suggest possible strategies for interpreting the results.
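The abstract mentions a consensus procedure for deciding which positions deserve attention. The chapter's actual procedure is not described on this page, so the following is only a minimal, hypothetical sketch of one way such a consensus could work: several predictors each return a ΔΔG value per variant, a variant is counted as destabilizing by a predictor when its value falls at or below a threshold, and a position is flagged when a strict majority of predictors agree on at least one variant there. The sign convention (ΔΔG = ΔG(variant) − ΔG(wild type) for unfolding, so negative means destabilizing), the −1.0 kcal/mol threshold, the predictor names and all numbers are assumptions chosen for illustration, not values or results from the chapter.

```python
from collections import defaultdict
from statistics import median


def consensus_calls(predictions, threshold=-1.0):
    """Return {(position, variant): (n_destabilizing_votes, n_predictors, median_ddg)}.

    predictions maps a predictor name to {(position, variant): ddg}.
    ASSUMED sign convention: ddg = dG_unfold(variant) - dG_unfold(wild type),
    so ddg <= threshold counts as a destabilizing vote.
    The default threshold of -1.0 kcal/mol is a hypothetical choice.
    """
    scores = defaultdict(list)
    for per_variant in predictions.values():
        for key, ddg in per_variant.items():
            scores[key].append(ddg)

    calls = {}
    for key, values in scores.items():
        votes = sum(1 for v in values if v <= threshold)
        # Only predictors that actually scored this variant are counted.
        calls[key] = (votes, len(values), median(values))
    return calls


def flag_positions(predictions, threshold=-1.0):
    """Flag positions where a strict majority of predictors agree on at least one variant."""
    flagged = set()
    for (position, _variant), (votes, total, _med) in consensus_calls(
        predictions, threshold
    ).items():
        if votes * 2 > total:  # strict majority of the predictors that scored it
            flagged.add(position)
    return sorted(flagged)


# Toy input: hypothetical predictors and made-up ddG values (kcal/mol).
predictions = {
    "predictor_a": {(10, "A"): -1.8, (25, "G"): -0.2, (40, "P"): -2.5},
    "predictor_b": {(10, "A"): -1.2, (25, "G"): -1.4, (40, "P"): -0.6},
    "predictor_c": {(10, "A"): -0.4, (25, "G"): 0.3, (40, "P"): -1.9},
}

if __name__ == "__main__":
    print(flag_positions(predictions))
```

With the toy input, positions 10 and 40 are flagged (two of three predictors fall below the threshold), while position 25 is not. Majority voting is used here only because it is simple and robust to a single outlying predictor; averaging scores or weighting predictors by their benchmark reliability would be equally plausible alternatives.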



Author information

Corresponding author

Correspondence to Rita Casadio.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Casadio, R., Savojardo, C., Fariselli, P., Capriotti, E., Martelli, P.L. (2022). Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 2449. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2095-3_6


  • DOI: https://doi.org/10.1007/978-1-0716-2095-3_6


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2094-6

  • Online ISBN: 978-1-0716-2095-3

  • eBook Packages: Springer Protocols
