Abstract
Recent advances in DNA microarray analysis are widely applied to disease classification, especially cancer classification. Most recently proposed gene expression classifiers can successfully classify testing samples obtained from the same microarray experiment as the training samples, under the assumption that the symmetric errors are constant across training and testing samples. However, classification performance degrades when heterogeneous testing samples come from different microarray experiments. In this paper, we propose "impact factors" (IFs) to measure the variation between individual classes in the training samples and the heterogeneous testing samples, and we integrate the IFs into classifiers for the classification of heterogeneous samples. Two publicly available lung adenocarcinoma gene expression data sets are used in our experiments to demonstrate the effectiveness of the IFs. The results show that integrating the IFs into the Golub and Slonim (GS) and k-nearest neighbors (kNN) classifiers improves their classification accuracy on heterogeneous samples. Moreover, the classification accuracy of the integrated GS classifier is around 90%.
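For orientation, the GS baseline named in the abstract is commonly described as a weighted-voting classifier: each gene votes with a signal-to-noise weight times the sample's distance from the midpoint of the two class means. The sketch below illustrates only that baseline under simplifying assumptions (two classes, no informative-gene selection step). It does not implement the impact factors, which the abstract does not define, and the names `train_gs` and `predict_gs` and the toy data are hypothetical.

```python
from statistics import mean, stdev


def train_gs(samples, labels):
    """Fit per-gene (weight, boundary) pairs for a two-class weighted-voting classifier.

    samples: list of expression vectors (one list of floats per sample).
    labels:  list of 0/1 class labels; each class needs at least two samples.
    """
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    model = []
    for g in range(len(samples[0])):
        x0 = [s[g] for s in class0]
        x1 = [s[g] for s in class1]
        mu0, mu1 = mean(x0), mean(x1)
        spread = stdev(x0) + stdev(x1)
        # Signal-to-noise weight; a gene with zero spread gets no vote (assumption).
        weight = (mu1 - mu0) / spread if spread > 0 else 0.0
        boundary = (mu0 + mu1) / 2  # midpoint between the two class means
        model.append((weight, boundary))
    return model


def predict_gs(model, sample):
    """Sum the per-gene votes; a positive total means class 1, otherwise class 0."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return 1 if total > 0 else 0


# Toy data (hypothetical): two genes, two samples per class.
train_samples = [[1.0, 5.0], [1.2, 5.5], [3.0, 2.0], [3.3, 2.4]]
train_labels = [0, 0, 1, 1]
toy_model = train_gs(train_samples, train_labels)
print(predict_gs(toy_model, [1.1, 5.2]), predict_gs(toy_model, [3.1, 2.1]))
```

Because the weights and boundaries are estimated from the training experiment alone, a shift in expression scale between experiments moves test samples relative to the fixed boundaries, which is the kind of degradation on heterogeneous samples that the abstract describes.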
References
- Aliferis, C. F., Hardin, D. and Massion, P. P. (2002): Machine learning models for lung cancer classification using array comparative genomic hybridization. Proc. of the American Medical Informatics Association 2002 Symposium, San Antonio, USA, pp. 7--11, AMIA.
- Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M. and Yakhini, Z. (2000): Tissue classification with gene expression profiles. Journal of Computational Biology, 7, pp. 559--584.
- Ben-Dor, A., Shamir, R. and Yakhini, Z. (1999): Clustering gene expression patterns. Journal of Computational Biology, 6, pp. 281--297.
- Bhattacharjee, A., Richards, W., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E., Lander, E., Wong, W., Johnson, B., Golub, T., Sugarbaker, D. and Meyerson, M. (2001): Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. of the Natl. Acad. of Sci. USA, 98(24), pp. 13790--13795.
- Bilban, M., Buehler, L. K., Head, S., Desoye, G. and Quaranta, V. (2000): Normalizing DNA microarray data. Current Issues in Molecular Biology, 4(2), pp. 57--64.
- Cho, S. B. and Won, H. H. (2003): Machine learning in DNA microarray analysis for cancer classification. Proc. of the First Asia Pacific Bioinformatics Conference, Adelaide, Australia, 19, pp. 189--198, Australian Computer Society.
- Chu, G., Narasimhan, B., Tibshirani, R. and Tusher, V. (2001): Significance analysis of microarrays. Users Guide and Technical Document, Technical Report, Department of Biological Science, University of Tulsa, USA.
- Dudoit, S., Fridlyand, J. and Speed, T. P. (2002): Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, pp. 77--87.
- Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999): Molecular classification of cancer: Class discovery and class prediction by gene-expression monitoring. Science, 286, pp. 531--537.
- Jain, A. K., Duin, R. P. W. and Mao, J. (2000): Statistical pattern recognition: A review. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(1), pp. 4--37.
- Lu, Y. and Han, J. (2003): Cancer classification using gene expression data. Information Systems, 28(4), pp. 243--268.
- Morrison, N. and Hoyle, D. C. (2003): Normalization: Concepts and methods for normalizing microarray data. In A Practical Approach to MicroArray Data Analysis. Berrar, D. P., Dubitzky, W. and Granzow, M. (eds.). Boston, Kluwer Academic Publishers.
- Ramaswamy, S., Ross, K. N., Lander, E. S. and Golub, T. R. (2003): Evidence for a molecular signature of metastasis in primary solid tumors. Nature Genetics, 33, pp. 49--54.
- Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C. H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J. P., Poggio, T., Gerald, W., Loda, M., Lander, E. S. and Golub, T. R. (2001): Multi-class cancer diagnosis using tumor gene expression signatures. Proc. of the Natl. Acad. of Sci. USA, 98(26), pp. 15149--15154.
- Schuchhardt, J., Beule, D., Malik, A., Wolski, E., Eickhoff, H., Lehrach, H. and Herzel, H. (2000): Normalization strategies for cDNA microarrays. Nucleic Acids Research, 28(10), E47.
- Slonim, D., Tamayo, P., Mesirov, J., Golub, T. and Lander, E. (2000): Class prediction and discovery using gene expression data. Proc. of the 4th Annual International Conference on Computational Molecular Biology, Tokyo, Japan, pp. 263--272, Universal Academy Press.
- Tan, A. C. and Gilbert, D. (2003): An empirical comparison of supervised machine learning techniques in bioinformatics. Proc. of the First Asia Pacific Bioinformatics Conference, Adelaide, Australia, 19, pp. 219--222, Australian Computer Society.
- Tsodikov, A., Szabo, A. and Jones, D. (2002): Adjustments and measures of differential expression for microarray data. Bioinformatics, 18(2), pp. 251--260.
- Tusher, V. G., Tibshirani, R. and Chu, G. (2001): Significance analysis of microarrays applied to the ionizing radiation response. Proc. of the Natl. Acad. of Sci. USA, 98(9), pp. 5116--5121.
- Virtanen, C., Ishikawa, Y., Honjoh, D., Kimura, M., Shimane, M., Miyoshi, T., Nomura, H. and Jones, M. H. (2002): Integrated classification of lung tumors and cell lines by expression profiling. Proc. of the Natl. Acad. of Sci. USA, 99(19), pp. 12357--12362.
- Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J. and Speed, T. P. (2002): Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30(4), e15.
- Yao, X. and Liu, Y. (1999): Neural networks for breast cancer diagnosis. Proc. of the 1999 Congress on Evolutionary Computation, New York, USA, pp. 1760--1767, IEEE Press.
Index Terms
- Classification of heterogeneous gene expression data