Abstract
After nearly two decades of research on computational methods for predicting ΔG and ΔΔG upon variation, based on machine learning and knowledge-based potentials, we now realize that all approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption wrong, namely that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be treated with care. In the following, we show how to cope with the problem of identifying which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Casadio, R., Savojardo, C., Fariselli, P., Capriotti, E., Martelli, P.L. (2022). Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 2449. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2095-3_6
DOI: https://doi.org/10.1007/978-1-0716-2095-3_6
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2094-6
Online ISBN: 978-1-0716-2095-3
eBook Packages: Springer Protocols