Abstract
In developing methods for the statistical prediction of protein folding types, how to test the predicted results is a crucial problem. In addition to the resubstitution test, in which the folding type of each protein in a training set is predicted using rules derived from that same set, cross-validation tests are needed. Among these, the single-test-set method seems the least reliable because the selection of the test set is arbitrary. The leaving-one-out (or jackknife) test is more objective and hence more reliable, but when the training set is not large enough, leaving each protein out in turn may cause a severe loss of information. To overcome this drawback, a seed-propagated sampling approach is proposed that can generate any number of simulated proteins of a desired type from a given training database. No assumption need be made in advance about the statistical distribution of the amino acid frequencies. Combined with the existing cross-validation methods, the new technique may provide a more objective assessment of various methods for predicting protein folding types.
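The difference between the resubstitution test and the jackknife test can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nearest-centroid rule is a stand-in classifier and not the prediction method analysed in the article, the function names are hypothetical, and the one-component vectors in `toy` are invented placeholders for 20-component amino acid compositions. The seed-propagated sampling procedure is not sketched, because the abstract does not describe its steps.

```python
from collections import defaultdict

def centroid_rules(training):
    """Derive one mean vector per folding type from (vector, label) pairs."""
    sums, counts = {}, defaultdict(int)
    for vec, label in training:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(rules, vec):
    """Assign the folding type whose centroid is nearest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(rules, key=lambda label: sq_dist(rules[label], vec))

def resubstitution_accuracy(data):
    """Predict every protein with rules derived from the full set, itself included."""
    rules = centroid_rules(data)
    hits = sum(predict(rules, vec) == label for vec, label in data)
    return hits / len(data)

def jackknife_accuracy(data):
    """Leave each protein out in turn and predict it from the remaining ones."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        rules = centroid_rules(data[:i] + data[i + 1:])
        hits += predict(rules, vec) == label
    return hits / len(data)

# Invented toy data: two proteins per folding type, one feature each.
toy = [([0.0], "alpha"), ([0.5], "alpha"), ([0.6], "beta"), ([1.0], "beta")]

print(resubstitution_accuracy(toy))  # 1.0
print(jackknife_accuracy(toy))       # 0.5
```

On this deliberately tiny set the resubstitution rate is perfect while the jackknife rate drops to one half. With only two proteins per type, removing one shifts the class centroid substantially, which is the information loss the abstract attributes to the jackknife test on small training sets.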
Cite this article
Zhang, CT., Chou, KC. An analysis of protein folding type prediction by seed-propagated sampling and jackknife test. J Protein Chem 14, 583–593 (1995). https://doi.org/10.1007/BF01886884