Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins

Taherzadeh, Ghazaleh; Campbell, Matthew; Zhou, Yaoqi

doi:10.1007/978-1-0716-2317-6_9


Part of the book series: Methods in Molecular Biology (MIMB, volume 2499)


Abstract

Protein glycosylation is one of the most complex posttranslational modifications (PTMs) and plays a fundamental role in protein function. Identifying and annotating glycosylation sites experimentally is challenging and time consuming, so there is demand for fast and efficient computational methods. Here, we present the SPRINT-Gly framework, which comprises the largest dataset of glycosylation sites and a model that predicts these sites from a given protein sequence. We first construct a large dataset of N- and O-linked glycosylation sites in human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of separately trained predictive models for glycosylation sites in human and in mouse proteins. The method incorporates sequence-based, predicted structural, and physicochemical information from the residues neighboring each N- and O-linked glycosylation site, and uses deep neural networks and support vector machines as classifiers. SPRINT-Gly outperformed existing methods, achieving 18% and 50% higher Matthews correlation coefficients for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
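
As a rough illustration of two ideas mentioned in the abstract, the sketch below extracts a window of neighboring residues around each candidate site and computes the Matthews correlation coefficient from confusion-matrix counts. It is not the SPRINT-Gly implementation: the function names, the window half-width, and the padding character are assumptions chosen for the example, and the choice of asparagine (N) as the N-linked candidate and serine/threonine (S/T) as the O-linked candidates reflects general glycobiology rather than anything stated in the abstract.

```python
import math


def candidate_windows(sequence, targets, half_width=7, pad="X"):
    """Return (position, window) pairs for every residue found in `targets`.

    `position` is 1-based. Each window has length 2 * half_width + 1 and is
    centered on the candidate residue; termini are padded with `pad`.
    The half-width of 7 is an illustrative default, not a published setting.
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in `sequence` sits at i + half_width in `padded`,
            # so the slice starting at i is centered on the candidate.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    toy_sequence = "MNGTSA"  # made-up sequence, for illustration only
    print(candidate_windows(toy_sequence, "N", half_width=2))   # N-linked candidates
    print(candidate_windows(toy_sequence, "ST", half_width=2))  # O-linked candidates
    print(round(matthews_corrcoef(tp=8, tn=9, fp=1, fn=2), 3))
```

In a real predictor, each window would then be converted into numerical features (for example sequence profiles and predicted structural properties) before being passed to a classifier; that step is omitted here.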



Acknowledgments

We acknowledge the use of the High-Performance Computing Cluster “Gowonda,” resources provided by the Queensland Cyber Infrastructure Foundation, and the Australian Research Data Commons. We also acknowledge the authors of previous methods for providing their web servers. This work was supported in part by the Institute for Glycomics (to M.P.C.) and by an Australian Government Research Training Program Scholarship (to G.T.).

Author information

Ghazaleh Taherzadeh
Present address: Department of Mathematics and Computer Science, Wilkes University, Wilkes-Barre, PA, USA

Authors and Affiliations

Institute for Glycomics, Griffith University, Southport, QLD, Australia
Matthew Campbell
Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
Yaoqi Zhou

Corresponding author

Correspondence to Ghazaleh Taherzadeh.

Editor information

Editors and Affiliations

Michigan Technological University, Houghton, MI, USA
Dukka B. KC


About this protocol

Cite this protocol

Taherzadeh, G., Campbell, M., Zhou, Y. (2022). Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. In: KC, D.B. (eds) Computational Methods for Predicting Post-Translational Modification Sites. Methods in Molecular Biology, vol 2499. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2317-6_9


DOI: https://doi.org/10.1007/978-1-0716-2317-6_9
Published: 14 June 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2316-9
Online ISBN: 978-1-0716-2317-6
eBook Packages: Springer Protocols
