Original paper
The effect of feature selection on multivariate pattern analysis of structural brain MR images
Introduction
Structural MR images provide a high-quality view of the brain that can be used to describe its shape, size, and structures quantitatively. Improving image quality and developing new clinical diagnosis methods are active areas of brain MR imaging research [1], [2], [3]. Predicting neurodegenerative diseases from structural brain MR images is one of the fundamental goals of neuroimaging studies, and multivariate pattern analysis (MVPA) is a powerful tool for this purpose. MVPA is beneficial where disease-related changes in the brain are so subtle and spatially distributed that it is difficult to discriminate healthy from diseased images using conventional mass-univariate methods such as voxel-based morphometry. MVPA provides correction for multiple comparisons and statistical power for the prediction, which improves its diagnostic value [4], [5]. MVPA methods that use brain MR images have been applied successfully in previous clinical decision-making studies as predictive tools to determine the clinical condition of subjects [3], [4], [5], [6], [7], [8], [9], [10], [11], [12].
Machine learning algorithms are frequently employed to evaluate multivariate patterns in structural brain MR images in order to classify images as healthy or diseased for a number of neurodegenerative diseases [4], [5], [9], [13]. Sabuncu and Konukoglu used the support vector machine (SVM), neighborhood approximation forest (NAF), and relevance voxel machine (RVoxM) algorithms with common types of structural measurements from brain MR scans to predict an array of clinically relevant variables. Their results showed that neurodegenerative diseases can be predicted from brain MR images to a degree and that MVPA produces better prediction accuracies than univariate models [4]. Ecker et al. investigated the predictive value of structural MR images in adults with autism using a whole-brain classification approach based on an SVM. They classified autism correctly at a specificity of 86.0% and a sensitivity of 88.0% [5]. Liu et al. used an MVPA approach that combined a searchlight algorithm with principal component analysis (PCA) to classify major depressive disorder (MDD) patients with different therapeutic responses and healthy controls. Based on their results, they suggested that structural MR images with MVPA might be a useful and reliable method for studying the neuroanatomical changes that differentiate patients with MDD from healthy controls [9]. Salvatore et al. analyzed T1-weighted MR images of 137 Alzheimer’s disease (AD) patients, 210 mild cognitive impairment (MCI) patients, and 162 healthy controls selected from the Alzheimer’s disease neuroimaging initiative (ADNI) cohort to classify AD, MCI converters to AD, and MCI non-converters. They selected the most discriminative features by PCA and used an SVM for classification. Their classification accuracies were 76% for AD, 72% for MCI converters and 66% for MCI non-converters [13].
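Sensitivity and specificity figures such as those quoted above are computed from a binary confusion matrix. The following minimal Python sketch shows the standard definitions; the function name `sensitivity_specificity` and the toy labels are assumptions made for illustration and are not taken from any of the cited studies.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): fraction of patients correctly detected.
    Specificity = TN / (TN + FP): fraction of controls correctly rejected.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 1 = patient, 0 = healthy control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this toy case three of the four patients and four of the five controls are classified correctly.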
Feature selection is an essential step that determines the effective subset of input variables for a successful MVPA [7], [14], [15]. The input that is useful for classification has to be determined before MVPA to ensure that it is meaningful for the disease condition and comparable across subjects. There are two types of feature selection methods: feature ranking and feature subset selection. Feature ranking methods assign each feature a score according to its degree of relevance, which corresponds to the discriminative power of the feature for classification. The top-ranked features are then used for classification. Feature ranking methods are successful on high-dimensional feature sets because of their good generalization ability. Their advantages are independence from the classifier, lower computational cost, and speed. Their disadvantage is that they do not interact with the classifier. Information gain, ReliefF, and minimum redundancy maximum relevance (mRMR) are examples of feature ranking methods. Subset selection methods use a search strategy to determine a subset of features that jointly have discriminative power. Feature subset selection methods are not preferred for high-dimensional problems since they are computationally expensive and carry a risk of overfitting. Capturing feature dependencies is an advantage of these methods, but the dependence of the selected features on the classifier can be counted as a disadvantage. Correlation-based feature selection, consistency-based subset evaluation, and wrapper subset evaluation are some of the feature subset selection methods [14], [15], [16], [17].
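Feature ranking as described above can be illustrated with a small sketch. The Python example below implements a simplified two-class Relief scorer (one nearest hit and one nearest miss per sample), which is the precursor of ReliefF rather than ReliefF itself, and it is not the implementation used in the cited studies. The function names `relief_scores` and `top_k`, the Manhattan distance, and the toy data are assumptions made for illustration.

```python
def relief_scores(X, y):
    """Simplified Relief for two classes (one nearest hit and miss per sample).

    X is a list of feature vectors already scaled to [0, 1]; y holds the labels.
    Returns one relevance weight per feature (higher = more discriminative).
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance over all features (an assumption of this sketch)
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot score a sample without both a hit and a miss
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward features that differ across classes, penalize those that
            # differ within a class.
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


def top_k(weights, k):
    """Indices of the k highest-weighted features, best first."""
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return order[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.9], [0.15, 0.1], [0.9, 0.4], [0.8, 0.8], [0.85, 0.2]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_scores(X, y)
ranking = top_k(weights, 1)
```

Because the scores are computed without training any classifier, the same ranking can be reused with different classifiers, which is the classifier independence noted above.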
Two of the most frequently used feature ranking methods, ReliefF and mRMR, are used in this study for feature reduction because of their lower computational cost and independence from the classifier: the same feature reduction method is applied to three different machine learning algorithms, which are used to analyze images of 1390 people belonging to four different neurodegenerative disease groups.
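The greedy idea behind mRMR (minimum redundancy maximum relevance) can be sketched as follows. This is a minimal illustration that assumes already discretized features and uses the difference form of the criterion (relevance minus mean redundancy); the function names `mutual_information` and `mrmr` and the toy data are hypothetical, and the sketch does not claim to reproduce the implementation used in this study.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((pa[u] / n) * (pb[v] / n)))
        for (u, v), c in pab.items()
    )


def mrmr(columns, y, k):
    """Greedy mRMR, difference form.

    columns is a list of discretized feature columns. At each step the feature
    with the highest (relevance to y) minus (mean redundancy with the features
    already selected) is added. Returns k feature indices in selection order.
    """
    relevance = [mutual_information(col, y) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(columns[f], columns[s]) for s in selected
            )
            return relevance[f] - redundancy / len(selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f1 duplicates f0; f2 is less relevant but not redundant.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 0, 1, 0, 1, 1]
selected = mrmr([f0, f1, f2], y, 2)
```

In the toy data the duplicated feature is skipped in favor of the less relevant but non-redundant one, which is the behavior that distinguishes mRMR from ranking by relevance alone.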
Previous neuroimaging studies that identify neurodegenerative diseases have shown that reducing the dimension of the input boosts classification accuracy and decreases computation time by excluding highly correlated features and features that are not valuable for discriminating between classes [18], [19], [20], [21], [22]. Demirhan et al. improved the accuracy of classifying AD and MCI using an SVM by up to 15% by selecting the most relevant features with the ReliefF algorithm [18]. Cui et al. identified the conversion from MCI to AD by using the mRMR method to choose optimal subsets of features from each data modality, then applied an SVM while incrementally adding features in ranking order until the highest area under the curve (AUC) was obtained. They showed that the selected features are closely related to AD progression and verified the effectiveness of feature selection [19]. Wee et al. combined ranking and wrapper-based feature selection methods to identify the most relevant features for autism spectrum disorder classification. T-test and mRMR ranking methods were used to reduce the number of features based on general characteristics of the data. SVM-based recursive feature elimination (SVM-RFE) was then used to determine the subset of features. They obtained classification accuracies of up to 96% [20]. Castro et al. proposed a recursive feature elimination method that uses a machine learning algorithm based on composite kernels for the classification of healthy controls and patients with schizophrenia. They showed that feature selection improved classification accuracy and allowed better identification of the brain regions that characterize schizophrenia [21]. Dai et al. integrated multimodal image features using multi-kernel learning and compared the effects of using different features for the classification of attention deficit hyperactivity disorder (ADHD) patients. They selected the optimal feature subset by combining feature ranking and feature subset selection methods. Their experiments showed that multi-kernel learning using selected multimodal features can yield better classification results for ADHD prediction [22].
In this study, MVPA is performed to discriminate AD, schizophrenia, autism, and ADHD patients from healthy controls using morphometric features, such as volumes and thicknesses of anatomical structures, obtained from T1-weighted structural brain MR images. The effect of feature selection on classification performance is investigated using the ReliefF and mRMR feature ranking methods with an unbiased brain-wide approach. Three state-of-the-art machine learning algorithms, SVM, k-nearest neighbors (kNN), and backpropagation neural network (BP-NN), are employed for the MVPA. 5-fold cross-validation (CV) is used for all feature selection and classification tasks to assess how well the performance generalizes.
Materials and methods
A brief description of the data used, the feature selection methods, and the MVPA algorithms is given in this section. Details of the model hyperparameters are also stated for reproducibility of the study. A flow diagram of the system's workflow is given in Fig. 1. Feature selection is performed inside the CV loop, before classification, to prevent leakage of label information from the test data.
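Performing feature selection inside the CV loop means that the ranking is recomputed on each training fold and never sees the held-out labels. The sketch below illustrates that structure in plain Python under several assumptions: the folds are contiguous and not stratified, `rank_features` is a simple class-mean-difference score standing in for ReliefF or mRMR, and a 1-nearest-neighbor rule stands in for the classifiers. All function names and the toy data are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds; returns (train, test) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def rank_features(X, y):
    """Stand-in ranking: absolute difference between the class means (labels 0/1)."""
    d = len(X[0])
    scores = []
    for f in range(d):
        a = [row[f] for row, label in zip(X, y) if label == 0]
        b = [row[f] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(d), key=lambda f: scores[f], reverse=True)


def predict_1nn(X_train, y_train, x):
    """Label of the nearest training sample (squared Euclidean distance)."""
    nearest = min(
        range(len(X_train)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(X_train[i], x)),
    )
    return y_train[nearest]


def cross_validate(X, y, n_folds, n_features):
    """Overall accuracy, with the feature ranking refit on each training fold only."""
    correct = 0
    for train, test in k_fold_indices(len(X), n_folds):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        keep = rank_features(X_tr, y_tr)[:n_features]  # test labels are never used
        X_tr_sel = [[row[f] for f in keep] for row in X_tr]
        for i in test:
            x_sel = [X[i][f] for f in keep]
            correct += predict_1nn(X_tr_sel, y_tr, x_sel) == y[i]
    return correct / len(X)


# Hypothetical toy data: feature 0 is informative, feature 1 is noise.
X = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.2], [0.8, 0.8], [0.15, 0.5],
     [0.85, 0.5], [0.05, 0.3], [0.95, 0.7], [0.25, 0.6], [0.75, 0.4]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
accuracy = cross_validate(X, y, 5, 1)
```

Ranking the features once on the full dataset before splitting would let the test labels influence which features are kept, which tends to give optimistically biased accuracy estimates.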
Results
MVPA is performed using the SVM, kNN, and BP-NN algorithms on four different feature sets constructed from morphometric brain measures obtained from the T1-weighted structural brain images of 1390 subjects that have AD, schizophrenia, autism, or ADHD. Dimension reduction, intended to improve classification accuracy, is first performed to select a subset of the feature set without using a priori information related to the cases. Different numbers of features are selected using
Discussion
In the present study, the effect of feature selection on the MVPA of the OASIS, ABIDE, COBRE, ADHD and MCIC datasets is evaluated using the ReliefF and mRMR methods.
All the MVPA algorithms were more successful on the OASIS dataset when classifying the case 2 subjects, who have AD, than the case 1 subjects, who have mild AD. BP-NN was not practical for feature sets with too many features because of its high computational cost. SVM using FS-1 was the most successful method for
Conclusions
In this section, conclusions that are drawn from the results of the study and are generally valid for all datasets used are given. Feature selection improved the performance of the MVPA for all feature sets, regardless of the number of features they include. The effect of feature selection was prominent for all datasets except ADHD, which suggests that the structural brain measures might be only weakly related to the disease, so that feature selection did not help it to achieve
Conflict of interest
The authors have no conflicts of interest to disclose.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References (40)
- et al. A review of technical aspects of T1-weighted dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) in human brain. Phys Med (2014)
- et al. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. Neuroimage (2010)
- et al. Diagnostic neuroimaging across diseases. Neuroimage (2012)
- et al. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn Sci (2006)
- et al. Automated voxel-by-voxel tissue classification for hippocampal segmentation: methods and validation. Phys Med (2014)
- et al. A fuzzy-based system reveals Alzheimer’s Disease onset in subjects with Mild Cognitive Impairment. Phys Med (2017)
- et al. Characterization of groups using composite kernels and multi-source fMRI analysis data: application to schizophrenia. Neuroimage (2011)
- FreeSurfer. NeuroImage (2012)
- et al. A practical approach to feature selection
- et al. An improved K-nearest-neighbor algorithm for text categorization. Expert Syst Appl (2012)
- Medical image analysis with artificial neural networks. Comput Med Imaging Graph
- Removal of impulse noise in digital images with naïve Bayes classifier method. Turk J Elec Eng Comp Sci
- Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine (GEPSVM). Entropy
- Clinical prediction from structural brain MRI scans: a large-scale empirical study. Neuroinformatics
- Computer-aided diagnosis of Parkinson’s disease using complex-valued neural networks and mRMR feature selection algorithm. J Healthc Eng
- Classification of different therapeutic responses of major depressive disorder with multivariate pattern analysis method based on structural MR scans. PLoS One
- Neural systems predicting long-term outcome in dyslexia. Proc Natl Acad Sci USA
- Predictive classification of individual magnetic resonance imaging scans from children and adolescents. Eur Child Adoles Psy
- Magnetic resonance imaging biomarkers for the early diagnosis of Alzheimer's disease: a machine learning approach. Front Neurosci
- A review of feature selection methods on synthetic data. Knowl Inf Syst