Abstract
Protein glycosylation is one of the most complex posttranslational modifications (PTMs) and plays a fundamental role in protein function. Identifying and annotating glycosylation sites experimentally is challenging and time consuming, so there is demand for fast and efficient computational methods. Here, we present the SPRINT-Gly framework, which comprises the largest dataset of glycosylation sites and a model that predicts such sites for a given protein sequence. In this framework, we construct a large dataset of N- and O-linked glycosylation sites in human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of separately trained predictive models for glycosylation sites in human and in mouse proteins. The method incorporates sequence-based, predicted structural, and physicochemical information from the residues neighboring each N- and O-linked glycosylation site, and uses a deep neural network and a support vector machine as classifiers. SPRINT-Gly outperformed existing methods, achieving 18% and 50% higher Matthews correlation coefficient for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
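The two ideas the abstract relies on, describing each candidate site by its neighboring residues and scoring predictions with the Matthews correlation coefficient, can be illustrated with a minimal Python sketch. This is not the SPRINT-Gly implementation: the function names, the half-width of 7 residues, and the "X" padding character are assumptions made only for illustration, and the real method adds predicted structural and physicochemical features that are not shown here.

```python
import math


def candidate_windows(sequence, targets, half_width=7, pad="X"):
    """Return (1-based position, window) for every residue found in `targets`.

    `half_width` and `pad` are illustrative assumptions, not values taken
    from the chapter. Windows near the termini are padded so that every
    window has the same length, 2 * half_width + 1.
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in `sequence` sits at i + half_width in `padded`,
            # so this slice is centered on the candidate residue.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Candidate N-linked sites are asparagines (N); candidate O-linked sites
# are serines and threonines (S, T).
n_sites = candidate_windows("MKNGSAT", "N", half_width=2)
o_sites = candidate_windows("MKNGSAT", "ST", half_width=2)
```

Each window would then be encoded as a feature vector and passed to a classifier. The coefficient ranges from -1 to 1 and remains informative when non-glycosylated residues greatly outnumber glycosylated ones, which is why it is a common headline metric for site prediction.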
Acknowledgments
We acknowledge the use of the High-Performance Computing Cluster “Gowonda,” resources provided by the Queensland Cyber Infrastructure Foundation, and Australian Research Data Commons. We also acknowledge the authors of previous methods for providing the web servers. This work was supported in part by the Institute for Glycomics to M.P.C., and by the Australian Government Research Training Program Scholarship for G.T.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Taherzadeh, G., Campbell, M., Zhou, Y. (2022). Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. In: KC, D.B. (eds) Computational Methods for Predicting Post-Translational Modification Sites. Methods in Molecular Biology, vol 2499. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2317-6_9
DOI: https://doi.org/10.1007/978-1-0716-2317-6_9
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2316-9
Online ISBN: 978-1-0716-2317-6
eBook Packages: Springer Protocols