
Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins

  • Protocol
Computational Methods for Predicting Post-Translational Modification Sites

Part of the book series: Methods in Molecular Biology (MIMB, volume 2499)

Abstract

Protein glycosylation is one of the most complex posttranslational modifications (PTMs) and plays a fundamental role in protein function. Identifying and annotating glycosylation sites experimentally is challenging and time-consuming, so there is a need for fast and efficient computational methods. Here, we present the SPRINT-Gly framework, which comprises the largest dataset and a prediction model of glycosylation sites for a given protein sequence. In this framework, we construct a large dataset of N- and O-linked glycosylation sites in human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of separately trained predictive models for human and for mouse glycosylation sites. The method incorporates sequence-based, predicted structural, and physicochemical information from the residues neighboring each N- and O-linked glycosylation site, and it uses a deep neural network and a support vector machine as classifiers. SPRINT-Gly outperformed existing methods, achieving 18% and 50% higher Matthews correlation coefficients (MCC) for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
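The abstract describes two technical ingredients: encoding the residues that neighbor each candidate site, and reporting performance as the Matthews correlation coefficient. The Python sketch below illustrates both under stated assumptions. Candidate N-linked sites are located with the canonical N-X-S/T sequon (X any residue except proline), a well-known motif that the abstract itself does not mention; O-linked sites have no comparable consensus motif. The names `HALF_WINDOW`, `nlinked_candidates`, `residue_window`, and `mcc`, and the window size, are hypothetical and not part of SPRINT-Gly. This is not the authors' feature pipeline, which additionally attaches predicted structural and physicochemical features to each window position.

```python
import math

# Hypothetical half-window size; the abstract does not state the window SPRINT-Gly uses.
HALF_WINDOW = 5
PAD = "-"  # placeholder for positions beyond the sequence termini


def nlinked_candidates(seq: str) -> list[int]:
    """Return 0-based positions of Asn residues in an N-X-[ST] sequon with X != Pro."""
    hits = []
    for i in range(len(seq) - 2):
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST":
            hits.append(i)
    return hits


def residue_window(seq: str, pos: int, half: int = HALF_WINDOW) -> str:
    """Return the 2*half+1 residues centred on pos, padded with PAD at the termini."""
    padded = PAD * half + seq + PAD * half
    # Residue seq[pos] sits at index pos + half in the padded string,
    # so the window starts at pos and spans 2*half+1 characters.
    return padded[pos : pos + 2 * half + 1]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, `nlinked_candidates("MKNVSAANPTLNGT")` returns `[2, 11]` (the Asn at position 7 is skipped because it is followed by proline), and `residue_window("MKNVSAANPTLNGT", 2, half=3)` returns `"-MKNVSA"`. In a full predictor, each window would be converted into a numeric feature vector before being passed to a classifier, and MCC would be computed from the classifier's confusion matrix on held-out data.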



Acknowledgments

We acknowledge the use of the High-Performance Computing Cluster "Gowonda," resources provided by the Queensland Cyber Infrastructure Foundation, and the Australian Research Data Commons. We also acknowledge the authors of previous methods for providing their web servers. This work was supported in part by the Institute for Glycomics to M.P.C. and by the Australian Government Research Training Program Scholarship to G.T.

Author information

Corresponding author: Ghazaleh Taherzadeh.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Taherzadeh, G., Campbell, M., Zhou, Y. (2022). Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. In: KC, D.B. (eds) Computational Methods for Predicting Post-Translational Modification Sites. Methods in Molecular Biology, vol 2499. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2317-6_9


  • DOI: https://doi.org/10.1007/978-1-0716-2317-6_9


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2316-9

  • Online ISBN: 978-1-0716-2317-6

  • eBook Packages: Springer Protocols
