Abstract
Pattern recognition and classification are among the most popular problems in machine learning, and they appear to be enjoying a second golden age through bioinformatics. However, bioinformatics datasets have several distinctive characteristics compared with the datasets studied in classical pattern recognition and classification research. One of the main difficulties in applying this theory to bioinformatics is that raw DNA or protein sequences cannot be used directly as input to machine learning algorithms, because each sequence has its own length. This paper therefore introduces one method for overcoming this difficulty and, through simple experiments, argues that its generalization capability is very poor. Finally, the paper proposes a different approach that selects a fixed number of effective features using a Support Vector Machine and a noise-whitening method. The paper also defines the criteria for the proposed method and, in an experiment classifying an ovarian cancer dataset, shows that it improves the precision of diagnosing abnormal protein sequences.
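The abstract does not spell out the vectorization or the SVM-based selection criteria, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It maps protein sequences of different lengths to fixed-length amino-acid k-mer composition vectors (a common vectorization, assumed here). It then trains a linear SVM without a bias term by batch subgradient descent with a Pegasos-style step size, and keeps the m features with the largest absolute weights. The function names `kmer_composition`, `train_linear_svm` and `select_top_features`, and the parameters `k`, `lam`, `epochs` and `m`, are hypothetical. The noise-whitening step is not modeled.

```python
from itertools import product

# Assumption: only the 20 standard one-letter amino-acid codes are counted;
# any k-mer window containing another symbol (e.g. X, B, Z) is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=1):
    """Hypothetical vectorizer: map a sequence of any length to a
    fixed-length vector (size 20**k) of normalized k-mer frequencies."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is not None:  # skip windows with non-standard residues
            counts[j] += 1.0
            total += 1
    # An empty (or all non-standard) sequence yields the zero vector.
    return [c / total for c in counts] if total else counts


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM (no bias term, an assumption) trained by full-batch
    subgradient descent on
        lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>),
    with labels y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # Pegasos-style step size
        grad = [lam * wj for wj in w]  # gradient of the L2 term
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # hinge loss is active for this sample
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def select_top_features(w, m):
    """Return the indices of the m features with the largest |w_j|."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:m]
```

For example, `kmer_composition("MKV")` and `kmer_composition("MKV" * 100)` both return 20-element vectors, so sequences of any length can be stacked into one training matrix. Ranking by weight magnitude is just one simple SVM-based criterion; the paper defines its own criteria, which this sketch does not reproduce.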
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Kim, EM., Jeong, JC., Pae, HY., Lee, BH. (2007). A New Feature Selection Method for Improving the Precision of Diagnosing Abnormal Protein Sequences by Support Vector Machine and Vectorization Method. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2007. Lecture Notes in Computer Science, vol 4432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71629-7_41
DOI: https://doi.org/10.1007/978-3-540-71629-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71590-0
Online ISBN: 978-3-540-71629-7
eBook Packages: Computer Science, Computer Science (R0)