
Classification of heterogeneous gene expression data

Published: 01 December 2003

Abstract

Recent advances in DNA microarray technology are applied intensively to disease classification, especially cancer classification. Most recently proposed gene expression classifiers can successfully classify testing samples obtained from the same microarray experiment as the training samples, under the assumption that the symmetric errors are constant across training and testing samples. However, classification performance degrades when the testing samples are heterogeneous, that is, obtained from different microarray experiments. In this paper, we propose "impact factors" (IFs) to measure the variations between individual classes in the training samples and the heterogeneous testing samples, and we integrate the IFs into classifiers for the classification of heterogeneous samples. Two publicly available lung adenocarcinoma gene expression data sets are used in our experiments to demonstrate the effectiveness of the IFs. The results show that integrating the IFs into the Golub and Slonim (GS) and k-nearest neighbors (kNN) classifiers further improves their classification accuracy on heterogeneous samples. Moreover, the classification accuracy of the integrated GS classifier is around 90%.
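For readers unfamiliar with the GS classifier named above, the following is a minimal sketch of a two-class weighted-voting classifier in the style of Golub et al. (1999), assuming each sample is a plain list of expression values and the classes are labeled 0 and 1. The function names `train_gs` and `predict_gs` are hypothetical, gene selection is omitted, and the impact factors are not implemented because the abstract does not define them.

```python
from statistics import mean, stdev


def train_gs(samples, labels):
    """Fit one (weight, boundary) pair per gene for classes 0 and 1.

    Assumes at least two training samples per class.
    """
    n_genes = len(samples[0])
    model = []
    for g in range(n_genes):
        class0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        class1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        mu0, mu1 = mean(class0), mean(class1)
        spread = stdev(class0) + stdev(class1)
        # Signal-to-noise weight: positive when the gene is higher in class 0.
        weight = (mu0 - mu1) / spread if spread > 0 else 0.0
        # Decision boundary halfway between the two class means.
        boundary = (mu0 + mu1) / 2
        model.append((weight, boundary))
    return model


def predict_gs(model, sample):
    """Return (predicted label, prediction strength in [0, 1])."""
    votes = [w * (x - b) for (w, b), x in zip(model, sample)]
    votes0 = sum(v for v in votes if v > 0)   # total vote for class 0
    votes1 = -sum(v for v in votes if v < 0)  # total vote for class 1
    label = 0 if votes0 >= votes1 else 1
    total = votes0 + votes1
    strength = abs(votes0 - votes1) / total if total > 0 else 0.0
    return label, strength


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    train = [[5.0, 1.0], [6.0, 2.0], [1.0, 5.0], [2.0, 6.0]]
    labels = [0, 0, 1, 1]
    model = train_gs(train, labels)
    print(predict_gs(model, [5.5, 1.5]))
```

Each gene votes for the class whose mean its expression value is closer to, weighted by how well that gene separates the classes in the training data. A heterogeneous testing sample shifts the values seen at prediction time relative to the fitted boundaries, which is the situation the abstract describes.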

References

  1. Aliferis, C. F., Hardin, D. and Massion, P. P. (2002): Machine learning models for lung cancer classification using array comparative genomic hybridization. Proc. of the American Medical Informatics Association 2002 Symposium, San Antonio, USA, pp. 7--11, AMIA.
  2. Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M. and Yakhini, Z. (2000): Tissue classification with gene expression profiles. Journal of Computational Biology, 7, pp. 559--584.
  3. Ben-Dor, A., Shamir, R. and Yakhini, Z. (1999): Clustering gene expression patterns. Journal of Computational Biology, 6, pp. 281--297.
  4. Bhattacharjee, A., Richards, W., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E., Lander, E., Wong, W., Johnson, B., Golub, T., Sugarbaker, D., and Meyerson, M. (2001): Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. of the Natl. Acad. of Sci. USA, 98(24), pp. 13790--13795.
  5. Bilban, M., Buehler, L. K., Head, S., Desoye, G. and Quaranta, V. (2000): Normalizing DNA microarray data. Current Issues in Molecular Biology, 4(2), pp. 57--64.
  6. Cho, S. B. and Won, H. H. (2003): Machine learning in DNA microarray analysis for cancer classification. Proc. of the First Asia Pacific Bioinformatics Conference. Adelaide, Australia, 19, pp. 189--198, Australian Computer Society.
  7. Chu, G., Narasimhan, B., Tibshirani, R. and Tusher, V. (2001): Significant analysis of microarrays. Users Guide and Technical Document, Technical Report, Department of Biological Science, University of Tulsa, USA.
  8. Dudoit, S., Fridlyand, J., and Speed, T. P. (2002): Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, pp. 77--87.
  9. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999): Molecular classification of cancer: Class discovery and class prediction by gene-expression monitoring. Science, 286, pp. 531--537.
  10. Jain, A. K., Duin, R. P. W. and Mao, J. (2000): Statistical pattern recognition: A review. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(1), pp. 4--37.
  11. Lu, Y. and Han, J. (2003): Cancer classification using gene expression data. Information Systems, 28(4), pp. 243--268.
  12. Morrison, N. and Hoyle, D. C. (2003): Normalization: Concepts and methods for normalizing microarray data. In A Practical Approach to MicroArray Data Analysis. Berrar, D. P., Dubitzky, W. and Granzow, M. (eds.). Boston, Kluwer Academic Publishers.
  13. Ramaswamy, S., Ross, K. N., Lander, E. S. and Golub, T. R. (2003): Evidence for a molecular signature of metastasis in primary solid tumors. Nature Genetics, 33, pp. 49--54.
  14. Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C. H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J. P., Poggio, T., Gerald, W., Loda, M., Lander, E. S. and Golub, T. R. (2001): Multi-class cancer diagnosis using tumor gene expression signatures. Proc. of the Natl. Acad. of Sci. USA, 98(26), pp. 15149--15154.
  15. Schuchhardt, J., Beule, D., Malik, A., Wolski, E., Eickhoff, H., Lehrach, H. and Herzel, H. (2000): Normalization Strategies for cDNA Microarrays. Nucleic Acids Research, 28(10), E47.
  16. Slonim, D., Tamayo, P., Mesirov, J., Golub, T. and Lander, E. (2000): Class prediction and discovery using gene expression data. Proc. of the 4th Annual International Conference on Computational Molecular Biology, Tokyo, Japan, pp. 263--272, Universal Academy Press.
  17. Tan, A. C. and Gilbert, D. (2003): An Empirical Comparison of Supervised Machine Learning Techniques in Bioinformatics. Proc. of the First Asia Pacific Bioinformatics Conference, Adelaide, Australia, 19, pp. 219--222, Australian Computer Society.
  18. Tsodikov, A., Szabo, A. and Jones, D. (2002): Adjustments and measures of differential expression for microarray data. Bioinformatics, 18(2), pp. 251--260.
  19. Tusher, V. G., Tibshirani, R., and Chu, G. (2001): Significance analysis of microarrays applied to the ionizing radiation response. Proc. of the Natl. Acad. of Sci. USA, 98(9), pp. 5116--5121.
  20. Virtanen, C., Ishikawa, Y., Honjoh, D., Kimura, M., Shimane, M., Miyoshi, T., Nomura, H. and Jones, M. H. (2002): Integrated classification of lung tumors and cell lines by expression profiling. Proc. of the Natl. Acad. of Sci. USA, 99(19), pp. 12357--12362.
  21. Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J. and Speed, T. P. (2002): Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30(4), e15.
  22. Yao, X. and Liu, Y. (1999): Neural networks for breast cancer diagnosis. Proc. of the 1999 Congress on Evolutionary Computation, New York, USA, pp. 1760--1767, IEEE Press.
