
Use of tetrapeptide signals for protein secondary-structure prediction

  • Original Article
  • Amino Acids

Abstract

This paper develops a novel sequence-based method for protein secondary-structure prediction: tetra-peptide-based increment of diversity with quadratic discriminant analysis (TPIDQD). Built on tetra-peptide signals, TPIDQD predicts the secondary structure of the central residue of a sequence fragment. In a threefold cross-validation test on 21-residue fragments from the CB513 dataset, the three-state overall per-residue accuracy (Q3) is about 80%. Accuracy improves further when long-range sequence information (fragments longer than 21 residues) is taken into account. The results show that tetra-peptide signals do reflect a relationship between amino acid sequence and secondary structure, indicating the importance of tetra-peptide signals as the protein folding code in protein structure prediction.
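The abstract names the ingredients of TPIDQD without defining them. As a hedged illustration only, the sketch below computes the standard increment-of-diversity measure, ID(X, Y) = D(X + Y) - D(X) - D(Y) with D(X) = N ln N - sum_i n_i ln n_i, between the tetrapeptide counts of a 21-residue fragment and per-class reference sources for helix (H), strand (E) and coil (C). The helper names (`fragment`, `kmer_counts`, `id_features`), the truncation of the window at chain ends, and the toy reference sequences are all assumptions made for this example; the paper's own tetra-peptide structural words are defined through the probability parameters of Table 4, which this sketch does not reproduce.

```python
from collections import Counter
from math import log


def fragment(seq: str, center: int, width: int = 21) -> str:
    """Window of `width` residues centred on position `center` (0-based).

    Near the chain ends the window is simply truncated; this edge handling
    is an assumption of the sketch, not taken from the paper.
    """
    half = width // 2
    return seq[max(0, center - half): center + half + 1]


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Counts of overlapping k-peptides; k = 4 gives tetrapeptides."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts: Counter) -> float:
    """Diversity measure D = N ln N - sum_i n_i ln n_i (n_i = 0 terms contribute 0)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * log(total) - sum(n * log(n) for n in counts.values() if n > 0)


def increment_of_diversity(x: Counter, y: Counter) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); non-negative, zero when X and Y are proportional."""
    return diversity(x + y) - diversity(x) - diversity(y)


def id_features(frag: str, sources: dict) -> dict:
    """ID between the fragment's tetrapeptide counts and each class's reference source."""
    counts = kmer_counts(frag)
    return {label: increment_of_diversity(counts, ref) for label, ref in sources.items()}


# Toy usage with invented reference sources (hypothetical data, for illustration only).
sources = {
    "H": kmer_counts("AEELLKKAEELLKKAEELLKK"),
    "E": kmer_counts("VTVTVKVYVEVTVTVKVYVEV"),
    "C": kmer_counts("GPNGSGDPGNGSPDGGPNGSG"),
}
features = id_features(fragment("MAEELLKKAEELLKRVTVTVKV", center=10), sources)
```

The three ID values form a small feature vector for the central residue. The method's name suggests that features of this kind are passed to a quadratic discriminant such as the one derived in Appendix 2, but the exact feature set used in the paper is not shown here.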



Acknowledgments

The work was supported by the National Science Foundation of China (No. 90403010). The authors are grateful to Drs. Jun Lu, Ying Zhang, and Hao Lin for their helpful discussions.

Author information

Corresponding author

Correspondence to Liaofu Luo.

Electronic supplementary material

Seven supplementary PDF files accompany the online article (3213, 225, 511, 411, 404, 390, and 439 kb).

Appendices

Appendix 1

Table 4 Probability parameters used to define the tetra-peptide structural words in the CB513 dataset

Appendix 2: Derivation of quadratic discriminant analysis for the two-group case using Bayes' theorem

For a sequence \( x \) to be classified into group \( \omega_{1} \) or group \( \omega_{2} \), where \( \omega_{1} \) is the positive set and \( \omega_{2} \) is the negative set, the discriminant function is defined by

$$ \xi = \ln p(\omega_{1} \mid x) - \ln p(\omega_{2} \mid x). \tag{10} $$

According to Bayes’ theorem,

$$ p(\omega_{l} \mid x) = p(\omega_{l})\, p(x \mid \omega_{l}) / p(x) \quad (l = 1, 2) \tag{11} $$

where \( p(\omega_{l}) \) is the prior probability of set \( l \) (\( l = 1, 2 \)). Inserting Eq. 11 into Eq. 10, the evidence \( p(x) \) cancels and we obtain

$$ \xi = \ln \frac{p(\omega_{1})}{p(\omega_{2})} + \ln \frac{p(x \mid \omega_{1})}{p(x \mid \omega_{2})}. \tag{12} $$

Assume that the feature variables, collected in an M-dimensional vector \( x \), are normally distributed in each of the two sets:

$$ p(x \mid \omega_{l}) = \frac{1}{Z_{l}} \exp\left( -\frac{1}{2} (x - \mu_{l})^{T} \Sigma_{l}^{-1} (x - \mu_{l}) \right), \qquad Z_{l} = (2\pi)^{M/2} \left| \Sigma_{l} \right|^{1/2} \quad (l = 1, 2) \tag{13} $$

where \( \mu_{l} \) (an M-dimensional vector) and \( \Sigma_{l} \) (an M × M matrix) are the mean and covariance matrix of the feature variables over the positive (\( l = 1 \)) and negative (\( l = 2 \)) sets, respectively. Inserting Eq. 13 into Eq. 12, we obtain

$$ \xi = \ln \frac{p(\omega_{1})}{p(\omega_{2})} - \frac{1}{2}\left( (x - \mu_{1})^{T} \Sigma_{1}^{-1} (x - \mu_{1}) - (x - \mu_{2})^{T} \Sigma_{2}^{-1} (x - \mu_{2}) \right) - \frac{1}{2} \ln \frac{\left| \Sigma_{1} \right|}{\left| \Sigma_{2} \right|}. \tag{14} $$
Set

$$ \delta_{l} = (x - \mu_{l})^{T} \Sigma_{l}^{-1} (x - \mu_{l}) \quad (l = 1, 2). \tag{15} $$
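Then, writing \( p_{i} = p(\omega_{i}) \) and \( \xi_{ij} \) for the discriminant between groups \( \omega_{i} \) and \( \omega_{j} \) (Eq. 14 is the case \( i = 1 \), \( j = 2 \)),

$$ \xi_{ij} = \ln \frac{p_{i}}{p_{j}} - \frac{\delta_{i} - \delta_{j}}{2} - \frac{1}{2} \ln \frac{\left| \Sigma_{i} \right|}{\left| \Sigma_{j} \right|}. \tag{16} $$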

This result generalizes directly to more than two groups, as shown in the main text.
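The derivation above can be checked numerically. The following minimal NumPy sketch, with hypothetical helper names, estimates \( \mu_{l} \) and \( \Sigma_{l} \) from feature vectors, evaluates the two-group discriminant \( \xi_{ij} \) and confirms on synthetic Gaussian data that it equals the log prior ratio plus the log likelihood ratio of the normal densities. For more than two groups it assigns \( x \) to the group \( i \) with the largest score \( g_{i} = \ln p_{i} - \delta_{i}/2 - \frac{1}{2}\ln|\Sigma_{i}| \), which is equivalent to \( \xi_{ij} > 0 \) for every \( j \neq i \) (ignoring ties). That multi-group rule is the standard QDA generalization and is only assumed to match the scheme described in the main text; the synthetic data are invented.

```python
import numpy as np


def fit_group(features: np.ndarray):
    """Mean vector mu_l and covariance matrix Sigma_l of an (n_samples, M) array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def mahalanobis_sq(x, mu, sigma) -> float:
    """delta_l = (x - mu_l)^T Sigma_l^{-1} (x - mu_l); solve() avoids an explicit inverse."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))


def log_density(x, mu, sigma) -> float:
    """ln p(x | omega_l) for the multivariate normal density with mean mu and covariance sigma."""
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (mahalanobis_sq(x, mu, sigma) + len(mu) * np.log(2 * np.pi) + logdet)


def group_score(x, prior, mu, sigma) -> float:
    """g_i = ln p_i - delta_i / 2 - ln|Sigma_i| / 2, so that xi_ij = g_i - g_j."""
    _, logdet = np.linalg.slogdet(sigma)
    return float(np.log(prior) - mahalanobis_sq(x, mu, sigma) / 2 - logdet / 2)


def xi(x, group_i, group_j) -> float:
    """Two-group discriminant xi_ij; each group is (prior, mu, sigma); xi > 0 favours group i."""
    return group_score(x, *group_i) - group_score(x, *group_j)


def classify(x, groups: dict):
    """Multi-group rule (assumed): the label whose score g_i is largest."""
    return max(groups, key=lambda label: group_score(x, *groups[label]))


# Synthetic check: xi agrees with ln(p1/p2) + ln p(x|w1) - ln p(x|w2).
rng = np.random.default_rng(0)
mu1, s1 = fit_group(rng.normal(0.0, 1.0, size=(200, 3)))
mu2, s2 = fit_group(rng.normal(1.5, 2.0, size=(300, 3)))
g1, g2 = (0.4, mu1, s1), (0.6, mu2, s2)
x = np.array([0.2, -0.1, 0.3])
direct = np.log(0.4 / 0.6) + log_density(x, mu1, s1) - log_density(x, mu2, s2)
assert np.isclose(xi(x, g1, g2), direct)
```

Using `np.linalg.solve` and `np.linalg.slogdet` instead of an explicit inverse and determinant keeps the computation numerically stable when a covariance matrix is nearly singular, for example when feature dimensions are strongly correlated.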


About this article

Cite this article

Feng, Y., Luo, L. Use of tetrapeptide signals for protein secondary-structure prediction. Amino Acids 35, 607–614 (2008). https://doi.org/10.1007/s00726-008-0089-7


  • DOI: https://doi.org/10.1007/s00726-008-0089-7
