Use of tetrapeptide signals for protein secondary-structure prediction

Feng, Yonge; Luo, Liaofu

doi:10.1007/s00726-008-0089-7

Original Article
Published: 23 April 2008

Amino Acids, Volume 35, pages 607–614 (2008)


Abstract

This paper develops a novel sequence-based method for protein secondary-structure prediction: tetra-peptide-based increment of diversity with quadratic discriminant analysis (TPIDQD for short). The TPIDQD method is based on tetra-peptide signals and predicts the structure of the central residue of a sequence fragment. The three-state overall per-residue accuracy (Q₃) is about 80% in a threefold cross-validated test on 21-residue fragments from the CB513 dataset. The accuracy can be improved further by taking long-range sequence information (fragments of more than 21 residues) into account. The results show that tetra-peptide signals do reflect a relationship between the amino acid sequence and its secondary structure, which indicates the importance of tetra-peptide signals as a protein folding code in protein structure prediction.
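As an illustration of the quantities named in the abstract, the following minimal Python sketch extracts the 21-residue fragments around each central residue, lists the overlapping tetra-peptides in a fragment, and computes Q₃ as the fraction of residues whose three-state label is predicted correctly. The function names, the H/E/C label alphabet, and the decision to skip residues too close to the chain ends are assumptions made here; the abstract does not say how chain termini are handled or how the tetra-peptide signals are scored.

```python
from typing import Iterator, List, Tuple


def central_windows(sequence: str, width: int = 21) -> Iterator[Tuple[int, str]]:
    """Yield (index of central residue, fragment) for every full-width window.

    Assumption: residues closer than width // 2 to either end are skipped.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that a central residue exists")
    half = width // 2
    for center in range(half, len(sequence) - half):
        yield center, sequence[center - half:center + half + 1]


def tetrapeptides(fragment: str) -> List[str]:
    """Return all overlapping 4-residue words in a fragment."""
    return [fragment[i:i + 4] for i in range(len(fragment) - 3)]


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    seq = "ACDEFGHIKLMNPQRSTVWYACDEF"  # hypothetical 25-residue chain
    for center, fragment in central_windows(seq):
        print(center, fragment, len(tetrapeptides(fragment)))
    print(q3("HHHEC", "HHHCC"))
```

A 21-residue fragment contains 18 overlapping tetra-peptides, so each prediction for a central residue can draw on 18 such words.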


Acknowledgments

The work was supported by the National Science Foundation of China (No. 90403010). The authors are grateful to Drs. Jun Lu, Ying Zhang, and Hao Lin for their helpful discussions.

Author information

Authors and Affiliations

Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot, 010021, China
Yonge Feng & Liaofu Luo

Corresponding author

Correspondence to Liaofu Luo.

Electronic supplementary material

The online version of this article includes seven supplementary PDF files (3213, 225, 511, 411, 404, 390, and 439 kb).

Appendices

Appendix 1

Table 4 Probability parameters defining tetra-peptide structural words in the CB513 dataset

Appendix 2: derivation of quadratic discriminant analysis in the two-group case using Bayes' theorem

For a sequence with feature vector x to be classified into group ω₁ or group ω₂, where ω₁ is the positive set and ω₂ is the negative set, the discriminant function is defined by

$$ \xi = \ln p(\omega _{1} |x) - \ln p(\omega _{2} |x). $$

(10)

According to Bayes’ theorem,

$$ p(\omega _{l} |x) = p(\omega _{l} )p(x|\omega _{l} )/p(x)\quad (l = 1,2) $$

(11)

where p(ω_l) is the prior probability of set l (l = 1, 2). Inserting Eq. 11 into Eq. 10, we obtain

$$ \xi = \ln \frac{{p(\omega _{1} )}} {{p(\omega _{2} )}} + \ln \frac{{p(x|\omega _{1} )}} {{p(x|\omega _{2} )}}. $$

(12)

Assume that the feature variables (an M-dimensional vector) are normally distributed in each of the two sets:

$$ p(x|\omega _{l} ) = \frac{1} {{Z_{l} }}\exp \left( - \frac{1} {2}(x - \mu _{l} )^{T} \Sigma _{l}^{ - 1} (x - \mu _{l} )\right) $$

(13)

$$ Z_{l} = (2\pi )^{{M/2}} {\left| {\Sigma _{l} } \right|}^{{1/2}} \quad (l = 1,2) $$

where $ \mu _{l} $ (an M-dimensional vector) and $ \Sigma _{l} $ (an M × M matrix) are the mean and covariance of the feature variables over set l (l = 1 for the positive set, l = 2 for the negative set). Inserting Eq. 13 into Eq. 12, we obtain

$$ \xi = \ln \frac{{p(\omega _{1} )}} {{p(\omega _{2} )}} - \frac{1} {2}\left( (x - \mu _{1} )^{T} \Sigma _{1}^{ - 1} (x - \mu _{1} ) - (x - \mu _{2} )^{T} \Sigma _{2}^{ - 1} (x - \mu _{2} ) \right) - \frac{1} {2}\ln \frac{{{\left| {\Sigma _{1} } \right|}}} {{{\left| {\Sigma _{2} } \right|}}} $$

(14)
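As a worked intermediate step, using only Eqs. 12 and 13, the log-likelihood ratio in Eq. 12 expands as follows. The factor $ (2\pi )^{M/2} $ is common to $ Z_{1} $ and $ Z_{2} $ and cancels, which leaves the determinant term of Eq. 14:

$$ \ln \frac{p(x|\omega _{1} )}{p(x|\omega _{2} )} = \ln \frac{Z_{2}}{Z_{1}} - \frac{1}{2}\left( (x - \mu _{1} )^{T} \Sigma _{1}^{ - 1} (x - \mu _{1} ) - (x - \mu _{2} )^{T} \Sigma _{2}^{ - 1} (x - \mu _{2} ) \right), \qquad \ln \frac{Z_{2}}{Z_{1}} = \frac{1}{2}\ln \frac{\left| \Sigma _{2} \right|}{\left| \Sigma _{1} \right|} = - \frac{1}{2}\ln \frac{\left| \Sigma _{1} \right|}{\left| \Sigma _{2} \right|}. $$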

$$ {\text{Set}}\quad \delta _{l} = (x - \mu _{l} )^{T} \;\Sigma _{l}^{ - 1} \;(x - \mu _{l} )\quad (l = 1,\;2) $$

(15)

$$ {\text{So}}\quad \xi _{{ij}} = \ln \frac{{p_{i} }} {{p_{j} }} - \frac{{\delta _{i} - \delta _{j} }} {2} - \frac{1} {2}\ln \frac{{{\left| {\Sigma _{i} } \right|}}} {{{\left| {\Sigma _{j} } \right|}}}. $$

(16)

Here $ p_{i} = p(\omega _{i} ) $ denotes the prior probability of group i. This result is easily generalized to more than two groups, as shown in the main text.
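The discriminant of Eq. 16 can be sketched in a few lines of Python. This is a minimal standard-library illustration, not the authors' implementation: the function names and the `(prior, mean, covariance)` tuple layout are assumptions, and the multi-group rule shown (choose the group with the largest per-group score) is the usual quadratic discriminant generalization, assumed here because the main text is not reproduced. Each group gets a score $ g_{i} = \ln p_{i} - \delta _{i}/2 - \tfrac{1}{2}\ln \left| \Sigma _{i} \right| $, so that $ \xi _{ij} = g_{i} - g_{j} $.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
GaussianClass = Tuple[float, Vector, Matrix]  # (prior, mean, covariance)


def mahalanobis_and_logdet(x: Vector, mu: Vector, sigma: Matrix) -> Tuple[float, float]:
    """Return (delta, ln|Sigma|), with delta as defined in Eq. 15.

    Solves Sigma y = (x - mu) by Gaussian elimination with partial pivoting,
    so the inverse covariance is never formed explicitly.
    """
    m = len(x)
    d = [xi - mi for xi, mi in zip(x, mu)]
    a = [list(row) + [d[i]] for i, row in enumerate(sigma)]  # augmented matrix
    logdet = 0.0
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        logdet += math.log(abs(a[col][col]))  # |det| is the product of |pivots|
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= f * a[col][c]
    y = [0.0] * m  # back substitution
    for r in range(m - 1, -1, -1):
        s = a[r][m] - sum(a[r][c] * y[c] for c in range(r + 1, m))
        y[r] = s / a[r][r]
    delta = sum(di * yi for di, yi in zip(d, y))
    return delta, logdet


def class_score(x: Vector, group: GaussianClass) -> float:
    """Per-group score g = ln p - delta/2 - (1/2) ln|Sigma|."""
    prior, mu, sigma = group
    delta, logdet = mahalanobis_and_logdet(x, mu, sigma)
    return math.log(prior) - 0.5 * delta - 0.5 * logdet


def xi(x: Vector, group_i: GaussianClass, group_j: GaussianClass) -> float:
    """Pairwise discriminant of Eq. 16; positive values favour group i."""
    return class_score(x, group_i) - class_score(x, group_j)


def predict(x: Vector, groups: List[GaussianClass]) -> int:
    """Assumed multi-group rule: index of the group with the largest score."""
    scores = [class_score(x, g) for g in groups]
    return max(range(len(groups)), key=scores.__getitem__)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    positive = (0.5, [0.0, 0.0], identity)  # hypothetical parameters
    negative = (0.5, [2.0, 0.0], identity)
    print(xi([0.0, 0.0], positive, negative))
    print(predict([1.9, 0.1], [positive, negative]))
```

In practice the priors, means, and covariances would be estimated from the training set, and a numerical library would normally replace the hand-written elimination.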

About this article

Cite this article

Feng, Y., Luo, L. Use of tetrapeptide signals for protein secondary-structure prediction. Amino Acids 35, 607–614 (2008). https://doi.org/10.1007/s00726-008-0089-7

Received: 15 December 2007
Accepted: 04 March 2008
Published: 23 April 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s00726-008-0089-7
