Assigning secondary structure in proteins using AI

  • Original Paper
Journal of Molecular Modeling

Abstract

Knowledge about protein structure assignment enriches the structural and functional understanding of proteins. Accurate and reliable structure assignment data are crucial for secondary structure prediction systems. Since the 1980s, various methods based on hydrogen bond analysis and atomic coordinate geometry, followed by machine learning, have been employed in protein structure assignment. However, the assignment process becomes challenging when atoms are missing from the protein files. We propose DLFSA, a multi-class classifier program that assigns protein secondary structure elements (SSEs) using convolutional neural networks (CNNs). A fast and efficient GPU-based parallel procedure extracts fragments from protein files. The model is trained on a subset of the protein fragments and achieves 88.1% training accuracy and 82.5% test accuracy. The model uses only Cα coordinates for secondary structure assignment. It has also been tested successfully on a few full-length proteins. Results from the fragment-based studies demonstrate the feasibility of applying deep learning solutions to structure assignment problems.
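To make the fragment idea concrete, the following is a minimal, CPU-only sketch of how Cα coordinates could be read from PDB-format ATOM records and cut into fixed-length fragments. It is not the DLFSA implementation: the function names, the fragment length of 5, and the use of a pairwise distance matrix as a per-fragment feature are all assumptions made for illustration, and the abstract does not state which of these the authors use. Alternate locations and chain breaks are ignored for brevity.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def make_atom_line(serial: int, name: str, resname: str, chain: str,
                   resseq: int, x: float, y: float, z: float) -> str:
    """Hypothetical helper: build a PDB ATOM record with fixed-column layout."""
    return "ATOM  {:5d} {:<4s}{:1s}{:>3s} {:1s}{:4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, " ", resname, chain, resseq, " ", x, y, z)


def parse_ca_coords(pdb_lines: List[str]) -> List[Coord]:
    """Collect C-alpha coordinates from ATOM records (columns per the PDB format)."""
    coords = []
    for line in pdb_lines:
        # Atom name is in columns 13-16; x, y, z are in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def extract_fragments(coords: List[Coord], length: int = 5) -> List[List[Coord]]:
    """Slide a window of `length` residues (assumed value) along the C-alpha trace."""
    return [coords[i:i + length] for i in range(len(coords) - length + 1)]


def pairwise_distances(fragment: List[Coord]) -> List[List[float]]:
    """Assumed feature: the symmetric matrix of C-alpha to C-alpha distances."""
    return [[math.dist(a, b) for b in fragment] for a in fragment]
```

A distance matrix is chosen here only because it is unchanged by rotating or translating the protein; each fragment's matrix could then be passed to a classifier as a small 2D input.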

Data availability

DLFSA is publicly available through the Web portal at www.proteinallinfo.in. The datasets generated and/or analyzed during the current study are not publicly available due to their large size but are available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to thank the Central Computing Centre, National Institute of Technology Calicut (NITC), for providing GPU servers for this work. The authors would like to acknowledge the valuable suggestions from Jinto Antony for developing the Web portal.

Code availability

The source code for library generation and for the model is openly available at https://github.com/jisnava/DLFSA/.

Funding

This work is part of the PhD research of VA Jisna at the National Institute of Technology Calicut, India. The research is funded by the Ministry of Human Resource Development, India.

Author information

Contributions

Jisna Vellara Antony (JVA) conceptualized the work. Hemant Yadav developed the fragment library. Prayagh Madhu coded the model. JVA developed the Web portal and wrote the manuscript. Jayaraj Pottekkattuvalappil Balakrishnan supervised the project. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jisna Vellara Antony.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(DOCX 335 kb)

Cite this article

Antony, J.V., Madhu, P., Balakrishnan, J.P. et al. Assigning secondary structure in proteins using AI. J Mol Model 27, 252 (2021). https://doi.org/10.1007/s00894-021-04825-x
