
Null space based feature selection method for gene expression data

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Feature selection is an important step in gene expression data analysis. Feature selection methods discard unimportant genes from the several thousands measured in order to identify the genes or pathways relevant to a target biological phenomenon such as cancer. The selected gene subset is then used both for statistical prediction, for example of survival, and for functional analysis of biological characteristics. In this paper we propose a null space based feature selection method for gene expression data in the setting of supervised classification. The proposed method discards redundant genes by exploiting information from the null space of the scatter matrices. We derive the method theoretically and demonstrate its effectiveness on several DNA gene expression datasets. The method is easy to implement and computationally efficient.
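The central object named in the abstract, the null space of the within-class scatter matrix \( \mathbf{S}_{W} \), is straightforward to construct. The sketch below is a minimal illustration under our own assumptions; the function names and the eigenvalue-based null-space extraction are ours, not the authors' published algorithm, which is not reproduced in this preview.

```python
import numpy as np

def within_class_scatter(X, y):
    """Within-class scatter S_W = sum_i sum_j (x_j^i - mu_i)(x_j^i - mu_i)^T.

    X: (n_samples, d) data matrix; y: (n_samples,) class labels.
    """
    d = X.shape[1]
    S_W = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]                   # samples of class c
        centered = Xc - Xc.mean(axis=0)  # subtract the class mean mu_c
        S_W += centered.T @ centered     # accumulate the per-class scatter S_c
    return S_W

def null_space_basis(S_W, tol=1e-10):
    """Orthonormal basis of the null space of the symmetric PSD matrix S_W.

    Eigenvectors whose eigenvalues are numerically zero span the null space;
    np.linalg.eigh returns eigenvalues in ascending order.
    """
    eigvals, eigvecs = np.linalg.eigh(S_W)
    return eigvecs[:, eigvals < tol * max(eigvals.max(), 1.0)]
```

In the small-sample setting typical of microarray data (n ≪ d), rank(\( \mathbf{S}_{W} \)) ≤ n − c, so this null space is guaranteed to be non-trivial; Theorem 1 in the Appendix characterizes how samples behave under projection onto it.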


Notes

  1. A finer categorization of feature selection methods distinguishes filter, wrapper, and embedded approaches [19].

  2. Most of the datasets were downloaded from the Kent Ridge Bio-medical Dataset (KRBD) repository (http://datam.i2r.a-star.edu.sg/datasets/krbd/). The datasets are transformed or reformatted by the KRBD repository, and we have used them without further preprocessing. Datasets not available in the KRBD repository were downloaded directly from the respective authors' supplementary links. The URLs for all datasets are given in the References section.

  3. IPA, http://www.ingenuity.com.

References

  1. Arif M, Akram MU, Minhas FAA (2010) Pruned fuzzy k-nearest neighbor classifier for beat classification. J Biomed Sci Eng 3:380–389

  2. Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30:41–47 (Data Source 1: http://datam.i2r.a-star.edu.sg/datasets/krbd/) (Data Source 2: http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=63)

  3. Banerjee M, Mitra S, Banka H (2007) Evolutionary-rough feature selection in gene expression data. IEEE Trans Syst Man Cybern Part C Appl Rev 37:622–632

  4. Chen L-F, Liao H-YM, Ko M-T, Lin J-C, Yu G-J (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit 33:1713–1726

  5. Boehm O, Hardoon DR, Manevitz LM (2011) Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int J Mach Learn Cybern 2(3):125–134

  6. Caballero JCF, Martinez FJ, Hervas C, Gutierrez PA (2010) Sensitivity versus accuracy in multiclass problems using memetic Pareto evolutionary neural networks. IEEE Trans Neural Netw 21(5):750–770

  7. Cong G, Tan K-L, Tung AKH, Xu X (2005) Mining top-k covering rule groups for gene expression data. In: The ACM SIGMOD International Conference on Management of Data, pp 670–681

  8. Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York

  9. Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discriminant methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77–87

  10. Fukunaga K (1990) Introduction to statistical pattern recognition. Academic Press, Harcourt Brace Jovanovich, Boston

  11. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914

  12. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537 (Data Source: http://datam.i2r.a-star.edu.sg/datasets/krbd/)

  13. Gordon GJ, Jensen RV, Hsiao L-L, Gullans SR, Blumenstock JE, Ramaswamy S, Richards WG, Sugarbaker DJ, Bueno R (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62:4963–4967 (Data Source 1: http://datam.i2r.a-star.edu.sg/datasets/krbd/) (Data Source 2: http://www.chestsurg.org)

  14. Huang R, Liu Q, Lu H, Ma S (2002) Solving the small sample size problem of LDA. Proc ICPR 3:29–32

  15. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural network. Nat Med 7:673–679 (Data Source: http://research.nhgri.nih.gov/microarray/Supplement/)

  16. Li J, Wong L (2003) Using rules to analyse bio-medical data: a comparison between C4.5 and PCL. In: Advances in Web-Age Information Management. Springer, Berlin/Heidelberg, pp 254–265

  17. Pan W (2002) A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18:546–554

  18. Pavlidis P, Weston J, Cai J, Grundy WN (2001) Gene functional classification from heterogeneous data. In: International Conference on Computational Biology, pp 249–255

  19. Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  20. Sharma A, Paliwal KK (2010) Improved nearest centroid classifier with shrunken distance measure for null LDA method on cancer classification problem. Electron Lett 46(18):1251–1252

  21. Sharma A, Koh CH, Imoto S, Miyano S (2011) Strategy of finding optimal number of features on gene expression data. Electron Lett 47(8):480–482

  22. Sharma A, Imoto S, Miyano S (2012) A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans Comput Biol Bioinform (accepted). http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.151

  23. Tan AC, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classification. Appl Bioinformatics 2(3 Suppl):S75–S83

  24. Tao L, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(14):2429–2437

  25. Thomas J, Olson JM, Tapscott SJ, Zhao LP (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 11:1227–1236

  26. Tong DL, Mintram R (2010) Genetic Algorithm-Neural Network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int J Mach Learn Cybern 1(1–4):75–87

  27. Wang X-Z, Dong C-R (2009) Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy. IEEE Trans Fuzzy Syst 17(3):556–567

  28. Wang X-Z, Zhai J-H, Lu S-X (2008) Induction of multiple fuzzy decision trees based on rough set technique. Inf Sci 178(16):3188–3202

  29. Ye J (2005) Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. J Mach Learn Res 6:483–502

  30. Zhao H-X, Xing H-J, Wang X-Z (2011) Two-stage dimensionality reduction approach based on 2DLDA and fuzzy rough sets technique. Neurocomputing 74:3722–3727

Acknowledgments

We thank the reviewers and the editor for their constructive comments, which appreciably improved the presentation quality of the paper.

Author information

Correspondence to Alok Sharma.

Appendix

Theorem 1

Let the column vectors of an orthogonal matrix \( \mathbf{W} \) span the null space of the within-class scatter matrix \( \mathbf{S}_{W} \), and let \( \mathbf{w} \in \mathbb{R}^{d} \) be any column of \( \mathbf{W} \). Denote the j-th sample in the i-th class by \( \mathbf{x}_{j}^{i} \in \mathbb{R}^{d} \). Then the projection of sample \( \mathbf{x}_{j}^{i} \) onto the null space of \( \mathbf{S}_{W} \) is independent of the choice of sample within the class.

Proof Since \( \mathbf{w} \in \mathbb{R}^{d} \) lies in the null space of \( \mathbf{S}_{W} \), by definition \( \mathbf{S}_{W} \mathbf{w} = \mathbf{0} \), and hence \( \mathbf{w}^{\mathrm{T}} \mathbf{S}_{W} \mathbf{w} = 0 \). The within-class scatter matrix is a sum of per-class scatter matrices, \( \mathbf{S}_{W} = \sum_{i=1}^{c} \mathbf{S}_{i} \), where c denotes the number of classes and the scatter matrix \( \mathbf{S}_{i} \) is given by [8]:

$$ \mathbf{S}_{i} = \sum_{j=1}^{n_{i}} \left( \mathbf{x}_{j}^{i} - \boldsymbol{\mu}_{i} \right) \left( \mathbf{x}_{j}^{i} - \boldsymbol{\mu}_{i} \right)^{\mathrm{T}} $$
(A1)

where \( \boldsymbol{\mu}_{i} \) denotes the mean of class i and \( n_{i} \) denotes the number of samples in class i.

Since \( \mathbf{w}^{\mathrm{T}} \mathbf{S}_{W} \mathbf{w} = 0 \) (i.e., \( \mathbf{w}^{\mathrm{T}} \sum_{i=1}^{c} \mathbf{S}_{i} \mathbf{w} = 0 \)) and each \( \mathbf{S}_{i} \) is a positive semi-definite matrix, it follows that \( \mathbf{w}^{\mathrm{T}} \mathbf{S}_{i} \mathbf{w} = 0 \) for every i. Substituting Eq. A1 gives

$$ \begin{aligned} & \mathbf{w}^{\mathrm{T}} \sum_{j=1}^{n_{i}} \left( \mathbf{x}_{j}^{i} - \boldsymbol{\mu}_{i} \right) \left( \mathbf{x}_{j}^{i} - \boldsymbol{\mu}_{i} \right)^{\mathrm{T}} \mathbf{w} = 0 \\ \text{or} \quad & \sum_{j=1}^{n_{i}} \mathbf{w}^{\mathrm{T}} \mathbf{x}_{j}^{i} \mathbf{x}_{j}^{i\,\mathrm{T}} \mathbf{w} - \sum_{j=1}^{n_{i}} \mathbf{w}^{\mathrm{T}} \boldsymbol{\mu}_{i} \boldsymbol{\mu}_{i}^{\mathrm{T}} \mathbf{w} = 0 \\ \text{or} \quad & \sum_{j=1}^{n_{i}} \left( \left\| \mathbf{w}^{\mathrm{T}} \mathbf{x}_{j}^{i} \right\|^{2} - \left\| \mathbf{w}^{\mathrm{T}} \boldsymbol{\mu}_{i} \right\|^{2} \right) = 0 \end{aligned} $$
(A2)

where \( \left\| \cdot \right\| \) is the Euclidean norm. Since each summand in the unexpanded form \( \mathbf{w}^{\mathrm{T}} \mathbf{S}_{i} \mathbf{w} = \sum_{j=1}^{n_{i}} \left\| \mathbf{w}^{\mathrm{T}} \left( \mathbf{x}_{j}^{i} - \boldsymbol{\mu}_{i} \right) \right\|^{2} = 0 \) is non-negative, every summand must vanish. Hence \( \mathbf{w}^{\mathrm{T}} \mathbf{x}_{j}^{i} = \mathbf{w}^{\mathrm{T}} \boldsymbol{\mu}_{i} \) for all j; i.e., the projection of sample \( \mathbf{x}_{j}^{i} \) onto the null space of \( \mathbf{S}_{W} \) is independent of j, in other words independent of the sample selection. This concludes the proof of the theorem.
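Theorem 1 is easy to verify numerically. The following toy check (synthetic data and our own variable names, not taken from the paper) builds \( \mathbf{S}_{W} \) for a small-sample problem, extracts a null-space basis \( \mathbf{W} \), and confirms that every sample of a class projects onto the same point \( \mathbf{W}^{\mathrm{T}} \boldsymbol{\mu}_{i} \):

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, d, c = 5, 30, 2                 # n = 10 samples << d = 30 features
X = rng.normal(size=(c * n_per_class, d))
y = np.repeat(np.arange(c), n_per_class)

# Within-class scatter S_W = sum_i sum_j (x_j^i - mu_i)(x_j^i - mu_i)^T
S_W = np.zeros((d, d))
for i in range(c):
    Xi = X[y == i]
    Ci = Xi - Xi.mean(axis=0)
    S_W += Ci.T @ Ci

# Orthonormal basis W of the null space of S_W (numerically zero eigenvalues)
eigvals, eigvecs = np.linalg.eigh(S_W)
W = eigvecs[:, eigvals < 1e-8 * eigvals.max()]

# Theorem 1: W^T x_j^i = W^T mu_i for every sample j of class i
for i in range(c):
    proj = X[y == i] @ W                      # rows are (W^T x_j^i)^T
    mean_proj = X[y == i].mean(axis=0) @ W    # (W^T mu_i)^T
    assert np.allclose(proj, mean_proj, atol=1e-6)

print("Every sample collapses onto its class-mean projection in null(S_W).")
```

This collapse of within-class variation is presumably what makes the null space attractive for identifying and discarding redundant genes, although the selection criterion itself is described in the full text rather than in this preview.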

Cite this article

Sharma, A., Imoto, S., Miyano, S. et al. Null space based feature selection method for gene expression data. Int. J. Mach. Learn. & Cyber. 3, 269–276 (2012). https://doi.org/10.1007/s13042-011-0061-9
