Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information

  • Protocol
Prediction of Protein Secondary Structure

Part of the book series: Methods in Molecular Biology (MIMB, volume 1484)

Abstract

Predicting the secondary structure of a protein from its sequence remains a challenging problem. Prediction accuracies remain around 80 %, even across very diverse methods. The use of evolutionary information and machine learning algorithms in particular has had the greatest impact. In this chapter, we first define secondary structures and then review the Consensus Data Mining (CDM) technique, which is based on the robust GOR algorithm and the Fragment Database Mining (FDM) approach. GOR V is an empirical method that uses a sliding window to model the secondary structural elements of a protein, making use of generalized evolutionary information. FDM mines experimentally determined structure fragments and predicts the secondary structure of a protein by combining those fragments based on their sequence similarities. The CDM method combines predictions from GOR V and FDM in a hierarchical manner to produce consensus predictions for secondary structure: if sequence fragments are not available, it uses GOR V to make the secondary structure prediction. The online server of CDM is available at http://gor.bb.iastate.edu/cdm/.
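
The hierarchical combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `consensus_prediction`, the use of `None` to mark residues for which FDM has no fragment, and the three-letter state alphabet (H, E, C) are all assumptions made for illustration. The sketch only shows the fallback idea, namely that the FDM state is used where a fragment-based prediction exists and the GOR V state is used elsewhere.

```python
from typing import Optional, Sequence


def consensus_prediction(fdm: Sequence[Optional[str]], gor: str) -> str:
    """Combine two per-residue predictions hierarchically.

    fdm: hypothetical FDM output, one state per residue, or None where
         no structural fragment was available (an assumed encoding).
    gor: hypothetical GOR V output, one state character per residue.
    """
    if len(fdm) != len(gor):
        raise ValueError("predictions must cover the same residues")
    # Prefer the fragment-based state; fall back to GOR V where it is missing.
    return "".join(f if f is not None else g for f, g in zip(fdm, gor))


# Toy input: FDM covers residues 0-1 and 4-5 only.
fdm_states = ["H", "H", None, None, "E", "E", None]
gor_states = "HHCCCEC"
print(consensus_prediction(fdm_states, gor_states))
```

In this toy input, the fifth residue is predicted as E by the fragment-based source and C by GOR V, and the combined string takes the fragment-based state there, while the uncovered residues inherit the GOR V states.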

Author information

Corresponding author

Correspondence to Taner Z. Sen.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Kandoi, G., Leelananda, S.P., Jernigan, R.L., Sen, T.Z. (2017). Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information. In: Zhou, Y., Kloczkowski, A., Faraggi, E., Yang, Y. (eds) Prediction of Protein Secondary Structure. Methods in Molecular Biology, vol 1484. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6406-2_4

  • DOI: https://doi.org/10.1007/978-1-4939-6406-2_4

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6404-8

  • Online ISBN: 978-1-4939-6406-2

  • eBook Packages: Springer Protocols
