In this part, 42 ensemble learning methods used for cancer detection are classified into three distinct fusion categories and discussed: data and feature integration methods, decision integration methods, and, on a smaller scale, model integration methods. For each method, the input data, the number of samples, and the statistical tools used to evaluate performance are introduced; for decision integration methods, the decision-making strategies are identified as well. In this systematic review, ensemble systems from the 45 most relevant articles on cancer prognosis and diagnosis were studied. Studies that used a valid statistical tool to evaluate their performance were included, whereas studies that neither compared their accuracy with that of other methods nor clearly assessed the different aspects of their performance were excluded. In this way, 45 studies published from 2002 onwards were examined in depth, selected from online databases such as PubMed and Scopus.
2.1 Data and Feature Integration
When two or more heterogeneous biological input data sources, such as clinical data, mutation data, expression data, proteomics data, or gene ontology (GO) database data, are combined, the resulting fusion system is called data integration. The higher level of this category is called multi-omics data integration, in which data at various levels and scales, including genomics, epigenomics, transcriptomics, proteomics, and metagenomics data, are integrated. It has been reported that combining multi-omics data can improve predictive performance [11]. In addition, some studies select or extract various features from homogeneous or heterogeneous data; when the combination of these features is used to implement an algorithm, another type of integration, called feature fusion, is created. In most cases, data integration and feature integration are used simultaneously as an ensemble system.
2.1.1 Bayesian Network Classifiers (1,2)
Bayesian network classifiers are methods that can integrate heterogeneous data from multiple sources to reveal the mechanisms of complex diseases such as cancers. In a study on hepatocellular carcinoma (HCC), also called malignant hepatoma, microarray and clinical data were integrated from biological databases and the literature. Several liver cancer protein biomarkers were predicted, functional modules reflecting the progression mechanism of liver cancer were identified, and performance was evaluated with 10-fold cross-validation. For this, the training data were split into ten approximately equal-sized sets, one of which was held out for testing; the process was repeated ten times so that each set served once as the test set. The results showed that, compared with Bayesian network (BN), naive Bayes (NB), full Bayesian network (FBN), and support vector machine (SVM) classifiers, the proposed method achieved the highest area under the curve (AUC) [12].
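The splitting scheme described above can be sketched as follows; the fold count and the synthetic sample size are illustrative, not details taken from the study.

```python
import numpy as np

def k_fold_splits(n_samples, k=10, seed=0):
    """Split indices into k approximately equal parts; each part serves
    once as the test set while the remaining parts form the training set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    splits = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, test))
    return splits

splits = k_fold_splits(100, k=10)
```

Each sample appears in exactly one test fold, so averaging the ten test scores gives an estimate of generalization performance that uses every sample for both training and testing.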
In other studies, a Bayesian network was used to integrate four types of data, consisting of GO database data, microarray (MA) co-expression data, orthologous protein-protein interaction (PPI) human data, and true positive (TP) data, as two networks: a (GO+MA+PPI) network and a (GO+MA+PPI+TP) network. This approach uses dissimilar data sets and can deal with missing data. The Bayesian network was applied to prioritizing candidate genes related to breast cancer, such as PIK3CA, CHEK2, BARD1, and TP53, which were predicted with the (GO+MA+PPI) network. This data integration method was called Prioritizer. The performance of the various gene networks was evaluated by cross-validating on all data sets ten times. Prioritizer performs well when genes are ranked on the basis of their functional interactions, and it can aid the diagnosis of disorders by introducing driver genes. The accuracy of the (GO+MA+PPI) network is significant (AUC = 90%). The results also showed that the proposed method performs far better at prioritizing genes in Mendelian diseases than in complex disorders [13].
2.1.2 stSVM by Data Integration
In an investigation carried out in 2013, a new fusion method called smoothed t-statistic SVM (stSVM) was introduced. It integrates features obtained from experimental data, such as mRNA and miRNA expression data, into one SVM classifier. It has been applied to the prognosis and diagnosis of breast, prostate, and ovarian cancers and to gene prioritization. Four datasets were used [14]: one breast cancer dataset (GSE4922) [15], two prostate cancer datasets (GSE25136 [16] and GSE21032 [17]), and one ovarian cancer dataset (TCGA) [18], drawn from various data repositories. The stSVM was evaluated via ten times repeated 10-fold cross-validation [14]. Finally, stSVM was compared with saliency-guided SVM (sgSVM) as a meta-classifier, namely an SVM trained on significantly differentially expressed genes (FDR cutoff 5%) selected by Significance Analysis of Microarrays (SAM) [19]. The stSVM approach was shown to have high predictive power for introducing novel gene lists for the mentioned cancers [14].
2.1.3 FSCOX-SVM
Feature selection with the Cox proportional hazards regression model (FSCOX) integrates data from different datasets into one SVM classifier; the resulting fusion model is called FSCOX-SVM. This model performs data integration between miRNA and mRNA features and has been applied to improve the prediction of survival time in various cancers, especially ovarian cancer and glioblastoma multiforme (GBM). In this study, two computational methods were used for the prediction of target genes: TargetScan and miRanda. TargetScan identifies miRNA targets by computing optimal sequence complementarity between a mature miRNA and an mRNA, whereas miRanda computes a weighted sum of match and mismatch scores for base pairs and gap penalties. The proposed approach predicts the class of each test sample via the leave-one-out cross-validation (LOOCV) procedure. Finally, the approach was compared with three classifiers: RF, SVM, and FSCOX-median. The findings demonstrated that FSCOX-SVM achieved the highest performance and accuracy among them; in fact, the data integration between miRNA and mRNA features led to the better results [20].
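The LOOCV evaluation loop of such a pipeline can be sketched roughly as below. Since survival modelling requires a dedicated library, a simple mean-difference score stands in for the Cox scoring step here, so this is only a structural sketch on synthetic data, not the published method.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# synthetic stand-in for concatenated miRNA + mRNA features
X = rng.normal(size=(40, 200))
y = (X[:, :20].sum(axis=1) > 0).astype(int)

correct = 0
for i in range(len(X)):                        # leave-one-out cross-validation
    mask = np.arange(len(X)) != i
    X_tr, y_tr = X[mask], y[mask]
    # score features on the training fold only (stand-in for the Cox step)
    scores = np.abs(X_tr[y_tr == 1].mean(0) - X_tr[y_tr == 0].mean(0))
    top = np.argsort(scores)[-30:]             # keep the 30 best-scoring features
    clf = SVC(kernel="linear").fit(X_tr[:, top], y_tr)
    correct += int(clf.predict(X[i:i + 1, top])[0] == y[i])

loocv_accuracy = correct / len(X)
```

The key point the loop illustrates is that feature selection is re-run inside every fold; selecting features on the full data before LOOCV would leak information from the held-out sample.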
2.1.4 Multiple RFE Selection Methods (1,2)
Multiple recursive feature elimination (RFE) is an ensemble feature selection method. This strategy has been applied to identifying core-module biomarkers of metastatic breast cancer. First, 100 candidate features, including gene expression features based on DNA microarray technology and activity vector features, were divided into 500 random splits (with possible overlap). Then, 500 classifiers were constructed, and their AUCs and weight vectors were recorded. Third, the features were ranked by the average squared weight of each feature across the 500 splits, and the lowest-ranked feature was eliminated recursively until the maximum average AUC was obtained. This procedure was repeated 100 times to select a final marker gene set [21]. The consistency of the proposed approach was evaluated with a multi-level reproducibility validation framework [22], a kind of level-by-level validation method [23]. The algorithm identifies highly reproducible markers, meaning that it generates highly reproducible results across multiple experiments. The results show that this method improved accuracy and biomarker reproducibility by as much as 15% and 30%, respectively. The method computes an average weight from the 500 classifiers and uses algebraic combiners for decision making. Multiple RFE was applied in the feature selection step of a classification tool called COre Module Biomarker Identification with Network ExploRation (COMBINER) [21]. COMBINER was run on three independent breast cancer datasets, from the Netherlands [24], the USA [25], and Belgium [26], and identified 13 driver genes as reproducible discriminative biomarkers; a robust regulatory network was also constructed [21].
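A single pass of the weight-based elimination loop can be sketched with scikit-learn's RFE, which likewise drops the feature with the smallest squared weight at each round; the 500-split averaging and the AUC-based stopping rule of the published method are omitted, and the data here are synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
# only features 0 and 1 carry signal in this toy example
y = (2 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)

# recursively eliminate the lowest-weighted feature, one per round
selector = RFE(LinearSVC(dual=False, max_iter=5000),
               n_features_to_select=5, step=1)
selector.fit(X, y)
selected = np.where(selector.support_)[0]
```

Because elimination is recursive, the weights are re-estimated after every removal, which is what distinguishes RFE from a one-shot ranking of the initial weight vector.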
In other studies, the RFE selection method has also been applied to biomarker discovery in colon cancer, leukemia, lymphoma, and prostate cancer. The outputs of all selectors are aggregated, and the ensemble result is computed; in general, this method generates a diverse set of feature selections [27]. The approach was assessed on four microarray datasets: a leukemia dataset [28], a colon dataset [29], a lymphoma dataset [30], and a prostate dataset [31]. Training sets were selected by subsampling, and each time 10% of the data was held out as an independent validation set to evaluate classifier performance. The results showed that the robustness of the selected driver genes increased by up to almost 30% and classification performance improved by up to ∼15% [27].
2.1.5 Feature Subsets Method
In a study aimed at predicting survival in breast cancer patients, the researchers designed an ensemble method that learns models on feature subsets and then combines their predictions [32]. Two breast cancer datasets were used, Dataset 1 [24, 33] and Dataset 2 [25]. The data were obtained through microarray experiments, and the results were compared with clinical criteria. Feature subsets were obtained with three different methods: splitting feature selection, sliding-window feature selection, and random-subsets feature selection. These three feature-subset-selection methods were integrated to construct the proposed ensemble model. Its performance was evaluated on 100 different training/test splits from Dataset 2 and showed high performance and a correspondingly high confidence interval. Compared with the Amsterdam signature and clinical criteria, the proposed method achieved high sensitivity and negative predictive value (NPV), and using the splitting feature subsets further improved sensitivity and accuracy [32].
2.1.6 Multimodal Data Fusion of Separate Datasets
Multimodal data fusion is a fusion model that integrates clinical and biomolecular data, such as image and microarray data; in this study, the data were heterogeneous. The approach has been applied to the diagnosis of melanoma. There are two different types of multimodal data fusion for fusing separate datasets, and both were exploited for feature integration: combination of data (COD) and combination of interpretations (COI). COD is applied before classification and aggregates the features from each source into a single feature vector, whereas in COI independent classifications are performed on the individual feature subsets and combined with a suitable voting mechanism [34]; since COI aggregates outputs, it uses algebraic combiners as its decision-making strategy. Another study, on prostate cancer, reported that COD methods are more optimal [35]. It should be noted that in the feature selection step of this method, sequential backward elimination (SBE) with the random forest (RF) algorithm, which itself integrates decision trees, was used. This kind of feature selection technique leads to dimensionality reduction, and feature selection and dimensionality reduction methods together extract better biomarkers. The performance of the classification method was evaluated using a 10-fold cross-validation procedure with 50 repetitions on different datasets. The results demonstrated that the random forest approach used to classify the bootstrapped samples achieved a high AUC score, whereas the performance obtained with linear methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) was not as high [34].
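The COD/COI distinction can be sketched on synthetic two-modality data as follows; the modality names, sizes, and the mean-rule combiner are illustrative assumptions, not details from the cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 120
X_img = rng.normal(size=(n, 10))    # stand-in for image-derived features
X_chip = rng.normal(size=(n, 30))   # stand-in for microarray features
y = (X_img[:, 0] + X_chip[:, 0] > 0).astype(int)

# COD: concatenate the features from each source into one vector, then classify
cod = LogisticRegression(max_iter=1000).fit(np.hstack([X_img, X_chip]), y)
cod_pred = cod.predict(np.hstack([X_img, X_chip]))

# COI: classify each modality independently, then combine the interpretations
clf_img = LogisticRegression(max_iter=1000).fit(X_img, y)
clf_chip = LogisticRegression(max_iter=1000).fit(X_chip, y)
# algebraic combiner: average the two posterior probabilities
p = (clf_img.predict_proba(X_img)[:, 1]
     + clf_chip.predict_proba(X_chip)[:, 1]) / 2
coi_pred = (p > 0.5).astype(int)
```

COD lets the classifier learn cross-modality interactions but inherits the combined dimensionality; COI keeps each model small at the cost of fusing only at the output level.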
2.1.7 Meta-classifier Ensemble Learning based on Genetic Programming Technique
In other studies, an ensemble meta-classifier combining five classifiers was used for feature integration. The features were produced by a genetic programming (GP) technique, a kind of evolutionary algorithm that generates thousands of classifiers to serve as features; the input data for the GP system are gene expression data. The top five classifiers were then chosen as the individual classifiers. GP classifiers often include five or fewer genes as biomarkers and successfully predict the cancer class. The proposed ensemble method integrates these features to achieve better results. It has been applied to the diagnosis of some cancer types, such as prostate and lung cancers, and can also classify their subtypes, such as metastatic prostate cancer (MPC) and primary prostate cancer (PPC). The results demonstrated that GP is a robust feature selection method that accurately suggests genes as prognostic and diagnostic targets, with very low misclassification error rates. The performance of the GP system was evaluated using five-fold cross-validation on the training set. The proposed method obtained the maximal accuracy, and the average prediction rate was very high when the GP-based meta-classifier ensemble was compared with other classification methods such as 3-nearest neighbors, nearest centroid, covariate predictor, SVM, and diagonal linear discriminant analysis (DLDA) [36].
2.1.8 Ensembles of BioHEL Rule Sets
Bioinformatics-Oriented Hierarchical Learning (BioHEL) is an evolutionary machine learning approach that integrates microarray data from different datasets. It uses random forest based feature selection (RFS), correlation-based feature selection (CFS), and partial-least-squares based feature selection (PLSS) in the feature selection phase. The method has been applied to gene prioritization for diagnostic markers of prostate, lymphoma, and breast cancers, and it uses algebraic combiners for feature ranking. For each training set, BioHEL was run 100 times separately. The main procedure used in this classification method is a cross-validation scheme called two-level external cross-validation. The results were compared with those of other machine learning tools, among them the genetic algorithm based classifier system (GAssist), SVM, RF, and Prediction Analysis of Microarrays (PAM). BioHEL obtained the highest accuracy and showed better performance on large datasets than its nearest rival, GAssist [37].
2.1.9 Kernel-based Data Fusion Method for Gene Prioritization
This fusion method combines multiple kernel matrices defined on human genes. A kernel matrix is used in kernel machines such as SVM: all instances are represented by an n × n positive semidefinite matrix whose element a_{ij} gives the similarity between the ith and jth instances via a pairwise kernel function k(x_i, x_j). This approach enhances learning methods without explicitly constructing a feature space; the kernel matrix implicitly represents the inner products between all pairs of instances in an embedded feature space induced by a feature mapping. Since the resulting feature space may be high-dimensional or even infinite-dimensional, the kernel matrix allows tractable and efficient computation in the original space without explicit mapping [38]. The kernel matrices are integrated through the log-Euclidean mean (LogE), the arithmetic mean (AM), and a weighted version of LogE (W-LogE). The input data were 12,000 human genes, and with this approach, 24 novel driver genes were proposed as candidates for 13 diseases, including breast and ovarian cancers. The method uses GO, Swiss-Prot (SW) annotation, a PPI network based on the STRING database, and the literature as annotated data sources. For these cancers, kernel performance was evaluated using leave-one-out cross-validation on the training genes. The average true-positive rate (TPR) obtained with the proposed kernel-based data fusion tools was compared with that of ENDEAVOUR. The results showed that the kernel-based data fusion approaches, LogE, W-LogE, and AM, all performed better than ENDEAVOUR, with W-LogE performing best [39].
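The arithmetic and log-Euclidean kernel means can be sketched in a few lines; the RBF kernels and the small eigenvalue floor used to keep the matrix logarithm defined are illustrative choices, not details from the cited work.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """n x n positive semidefinite similarity matrix K_ij = exp(-g||x_i-x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def log_euclidean_mean(kernels, eps=1e-8):
    """LogE mean: exponentiate the average of the matrix logarithms."""
    logs = []
    for K in kernels:
        w, V = np.linalg.eigh(K)
        w = np.maximum(w, eps)              # floor eigenvalues so log is defined
        logs.append((V * np.log(w)) @ V.T)  # V diag(log w) V^T
    w, V = np.linalg.eigh(np.mean(logs, axis=0))
    return (V * np.exp(w)) @ V.T

X = np.random.default_rng(0).normal(size=(12, 4))
K1, K2 = rbf_kernel(X, 0.1), rbf_kernel(X, 1.0)
K_am = (K1 + K2) / 2                        # arithmetic mean (AM)
K_loge = log_euclidean_mean([K1, K2])       # log-Euclidean mean (LogE)
```

Both means return a symmetric positive (semi)definite matrix, so the fused result can be fed directly into any kernel machine; a weighted version (W-LogE) simply replaces the uniform average of the logarithms with a weighted one.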
2.2 Decision Integration
These methods are constructed from several base classifiers. The critical component of any ensemble system is the strategy employed to combine the classifiers, and the module that combines their outputs to make a decision is another major issue in this kind of ensemble method. There is no unique naming convention for the same decision-making strategy across different articles and books.
The terminology we use for output combination and decision making in final decision integration follows the pattern below (Fig. 1):
The pattern is divided into two main categories: combining class labels (CCL) and combining continuous outputs (CCO). CCL is divided into four sub-types: majority voting (MV), weighted majority voting (WMV), behavior knowledge space (BKS), and Borda count. CCO is divided into three sub-types: algebraic combiners, decision templates, and Dempster-Shafer based combination [3].
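The two branches of the pattern can be illustrated with a toy example: majority voting operates on hard class labels (CCL), while an algebraic combiner such as the mean rule operates on continuous outputs (CCO). The vote and probability matrices below are made up for illustration.

```python
import numpy as np

# hard labels from three classifiers for four samples (rows: samples)
labels = np.array([[1, 1, 0],
                   [0, 0, 1],
                   [1, 0, 1],
                   [0, 0, 0]])
# CCL: simple majority voting over the class labels
majority = (labels.sum(axis=1) >= 2).astype(int)

# continuous outputs (e.g., posterior probabilities) from the same classifiers
probs = np.array([[0.9, 0.6, 0.4],
                  [0.2, 0.3, 0.8],
                  [0.7, 0.4, 0.9],
                  [0.1, 0.2, 0.3]])
# CCO: algebraic combiner -- the mean rule averages the continuous outputs
mean_rule = (probs.mean(axis=1) > 0.5).astype(int)
```

On these inputs both rules agree, but in general CCO combiners retain confidence information that hard voting discards, which is why the two families are kept distinct.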
2.2.1 Homogeneous Ensemble Methods
In homogeneous ensemble methods, all of the base classifiers are of a single type, but they differ in the data used for training, in the model parameters (e.g., in a linear-combination fusion function model), or in both. It has also been reported that heterogeneous classifier fusions perform slightly better than homogeneous classifier ensembles [40].
2.2.1.1 SVM Classifiers Fusion (three SVM)
One kind of homogeneous ensemble method is SVM classifiers fusion, a multi-classification system (MCS) that combines three SVM classifiers. This computational method has been used for breast cancer detection, where combining the three classifiers minimized the classification error in the training phase. For every base SVM, the training and testing data were obtained from the Digital Database for Screening Mammography (DDSM) mammographic image database [41, 42]; 300 images were used in the training phase and 100 images in the testing phase. The system uses simple majority voting for decision making, and the cross-validation technique was used for its evaluation. The results showed that the fusion of SVM classifiers improves the performance of the system over applying all features in one feature vector. Moreover, compared with each single SVM classifier, the MCS with voting increased accuracy because the quality of the decision was improved [43].
2.2.1.2 enSVM (200 SVM)
In one study, a fusion approach called ensemble SVM (enSVM), consisting of three steps, was used. Step 1 subsamples genes to generate gene subsets and then constructs 200 diverse classifiers; the input data for this step were gene microarray data from 97 patient samples. In step 2, SVMs are trained and 25 candidate classifiers are generated; SVM is suitable in this phase because it handles the variability and high dimensionality of the training data. In step 3, the final decision is made with a majority voting strategy. The proposed method has been applied to microarray data classification and the accurate diagnosis of breast cancer, cancers of the central nervous system, colon tumor, leukemia, and prostate cancer. LOOCV was used to evaluate the performance of the SVM base classifiers. The results showed that the proposed gene-subsampling-based ensemble, enSVM, outperforms a single SVM as well as resampling ensemble learning methods such as bagging and boosting, and achieved the best classification accuracy overall [44, 45].
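The three steps can be sketched on synthetic expression data as below; the subset size, the dense synthetic signal, and the reduction of step 2 to simply keeping 25 trained models are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(97, 500))                  # 97 samples x 500 genes
y = (X[:, :100].sum(axis=1) > 0).astype(int)    # synthetic label

# step 1: subsample genes to build diverse base learners
members = []
for _ in range(25):
    genes = rng.choice(500, size=50, replace=False)
    # step 2: train one SVM per gene subset
    members.append((genes, SVC().fit(X[:, genes], y)))

# step 3: combine the base predictions by majority voting
def ensvm_predict(X_new):
    votes = np.stack([m.predict(X_new[:, g]) for g, m in members])
    return (votes.mean(axis=0) > 0.5).astype(int)

pred = ensvm_predict(X)
```

Diversity comes from the feature axis (each SVM sees a different gene subset) rather than the sample axis, which is what distinguishes this design from bagging and boosting.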
2.2.1.3 Three neural networks fusion
In previous studies, the researchers constructed a combinational feature selection method based on an ensemble of three neural networks (NNs). The method operates at both levels (feature selection integration and decision integration). It has been applied to the diagnosis and treatment of several cancers by discovering marker genes of the diseases from gene expression data: adult acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), malignant pleural mesothelioma (MPM), adenocarcinoma (ADCA) of the lung, and prostate cancer. In the first step, bagging generates 100 individual classifiers by resampling the microarray data 100 times; each resampled set is then given as input to three neural networks. The three-neural-network ensemble uses algebraic combiners for decision making, and since there are 100 such ensemble networks with 100 different outputs, majority voting was used both to combine their results and to make the final decision. Compared with other methods, the proposed method effectively improved the results: it can extract more information from microarray data to increase accuracy and to introduce driver genes for diagnosis and treatment. The accuracy of this method for ALL/AML, lung cancer, and prostate cancer was 100%, 100%, and 97.06%, respectively. Classification performance and accuracy were evaluated through 10-fold cross-validation and LOOCV. For comparison, the accuracy of bagged decision trees for ALL/AML, lung cancer, and prostate cancer was 91.18%, 93.29%, and 73.53%, respectively, and the accuracy of the best competing methods was 97.06%, 97.99%, and 73.53%, respectively [46].
2.2.1.4 NED method (five artificial neural networks fusion)
An ensemble method called Neural Ensemble based Detection (NED) combines five artificial neural networks (ANNs) [48]. The learning algorithm of each network is the Fast Adaptive Neural Network Classifier (FANNC) [47], which offers both high performance and speed and, being automatic, requires no manual set-up. The proposed ensemble method has been applied to lung cancer diagnosis. In this study, images of needle-biopsy specimens were used as input data, comprising 552 cell images from biopsies of subjects. The ensemble system has a two-level structure. At the first level, the output of each individual neural network falls into one of two classes, normal cell or cancer cell, and NED uses full voting for decision making: a cell is considered normal only when all of the individual networks vote that it is normal. At the second level, each network has five outputs, adenocarcinoma, squamous cell carcinoma, small cell carcinoma, large cell carcinoma, and normal, and plurality voting is used for decision making. In this way, the identification rate of NED is high and its false-negative rate is low, which helps to miss fewer positive cancer patients. The accuracy of the method was evaluated by 5-fold cross-validation on the dataset, and the confidence of the first-level ensemble, in particular, was shown to be high. The proposed approach of five FANNC networks was compared with a single artificial neural network, and the results showed that NED outperforms the single FANNC [48].
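NED's two voting rules can be written down directly; the vote matrices below are made-up examples, not data from the study.

```python
import numpy as np

# first level: five networks each vote 0 (normal) or 1 (cancer) per cell
binary_votes = np.array([[0, 0, 0, 0, 0],
                         [0, 0, 0, 0, 1],
                         [1, 1, 0, 1, 1]])
# full voting: a cell is normal only if ALL networks call it normal,
# so a single "cancer" vote flags the cell
full_vote = (binary_votes.sum(axis=1) > 0).astype(int)   # 1 = cancer

# second level: five-class labels (0=adeno, 1=squamous, 2=small cell,
# 3=large cell, 4=normal); plurality voting takes the most frequent label
class_votes = np.array([[2, 2, 3, 2, 4],
                        [0, 1, 1, 1, 4]])
plurality = np.array([np.bincount(row, minlength=5).argmax()
                      for row in class_votes])
```

Full voting deliberately biases the first level toward flagging cancer, which is how NED keeps its false-negative rate low; the second cell above is flagged even though four of five networks called it normal.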
2.2.1.5 Clinical decision support system
The clinical decision support system (CDSS) is an ensemble method comprising four different weighted random forests (WRFs), each constructed with 80 trees. This ensemble method combines the results of clinical techniques, both classic and ancillary. The crucial clinical data included visit dates, patient age, human papillomavirus (HPV) genetic examinations, cytological diagnoses, and histological examination of biopsies; 740 cases were studied under the project. With this method, more accurate results were produced. It has been applied to cervical cancer (CxCa) diagnosis and uses majority voting as its decision-making strategy. The performance of the proposed system was estimated using 10-fold cross-validation. The results showed that the proposed method (a CDSS consisting of four different WRFs) performs better than single-classifier approaches, including k-nearest neighbors (KNN), NB, classification and regression tree (CART), multi-layer perceptron (MLP) network, radial basis function (RBF) network, and probabilistic neural network (PNN), but slightly worse than an integrated CDSS consisting of two ANNs [49].
2.2.1.6 Bagging subgroup identification trees
Bagging subgroup identification trees is a tree-based ensemble method that combines binary trees. Bootstrap samples are generated by resampling the training data with replacement, and several trees are then constructed as diverse classifiers. In the next step, each tree is converted into a binary classifier; finally, the binary trees are combined, and the final prediction is made with a simple majority-vote strategy. In this study, clinical data such as gender, age, surg, etc., were used. For colon cancer, 929 cases participated [9]; the associated dataset can be downloaded from the R package survival [50]. In addition, the GSE14814 dataset [51], related to lung cancer and including 133 patients, was used. These two datasets were thus extracted from the R package survival and the GEO database, for colon cancer and lung cancer, respectively. As mentioned, the proposed method uses majority voting as its decision-making strategy. The selection of biomarkers and the building of classifiers were done with leave-one-out cross-validation. The sensitivity, specificity, and accuracy of the proposed method were compared with those of a multivariate Cox model. The results showed that the proposed ensemble bagging (a novel tree-based) method is better, especially when the data are imbalanced; for balanced data, the Cox model was slightly better [9].
2.2.1.7 CAD system
The computer-aided diagnostic (CAD) system is an ensemble method that combines Bayes classifiers. It was applied to tissue classification and the diagnosis of focal liver lesions. The input data were contrast-enhanced computed tomography (CT) images of 20 cases of liver cancer. The classification process has two phases: in the first, the CT images are classified using the Bayes classifiers; in the second, the classifier outputs are combined and the decision is made using a majority voting strategy. Classification success rates were evaluated by the leave-one-out technique. This approach to classifier combination produced better performance: the best results were obtained with majority voting, and the CAD-based Bayes system achieved relatively high accuracy [52].
2.2.2 Heterogeneous Ensemble Methods
Heterogeneous ensemble methods incorporate different base classifiers, although they usually use the same training dataset and input data to run the different learning algorithms [53].
2.2.2.1 BNCE method
BNCE is an ensemble approach for training neural network fusions that combines boosting with negative correlation (NC) learning. It has been used for breast cancer detection, classifying tumors as either benign or malignant [54]. The data were well-known breast cancer benchmarks downloaded from the UCI machine learning repository [55, 56]. The approach uses majority voting for decision making. For performance evaluation, the percentage classification error of BNCE was estimated, and the results were compared with those of other methods, including the evolutionary programming network (EPNet), a single NN, a simple NN ensemble, bagging, AdaBoost, and arc boosting, all applied to the same breast cancer benchmarks. Overall, a comparison of classification error rates on the benchmark datasets showed that the proposed method has the best performance [54].
2.2.2.2 The meta-learning method
In another study, a meta-classification tool was used for prostate cancer detection. The data for this ensemble strategy are mass spectrometry (MS) data, and it combines the results of several machine learning approaches [57]; the individual classifiers are ANN, KNN, SVM, logistic regression, and CART [58]. It uses weighted majority voting for decision making, combining multiple error-independent base classifiers into a meta-classifier. The meta-classifier was validated with k-fold (leave-one-out) cross-validation experiments on the training set. This ensemble method improves prediction accuracy over the individual classifiers: compared with ANN, KNN, SVM, logistic regression, and CART alone, the proposed method was more accurate, and its sensitivity and specificity were high (91.30% and 98.81%, respectively). In addition, 11 biomarkers associated with prostate cancer were identified [57].
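Weighted majority voting can be sketched as follows; the prediction matrix and the weights, which would typically be derived from each base classifier's held-out accuracy, are made up for illustration.

```python
import numpy as np

# predictions of five base classifiers (columns) for four samples (rows)
preds = np.array([[1, 0, 1, 1, 0],
                  [0, 0, 1, 0, 0],
                  [1, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1]])
# weight each classifier, e.g., by its accuracy on held-out data
weights = np.array([0.9, 0.6, 0.8, 0.7, 0.5])

# weighted majority vote: compare the weighted vote mass for class 1
# against half of the total weight
score = preds @ weights
decision = (score > weights.sum() / 2).astype(int)
```

With uniform weights this reduces to simple majority voting; unequal weights let reliable base classifiers outvote weaker ones, which is the point of building the meta-classifier from error-independent members.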
2.2.2.3 Heterogeneous ensemble (KNN-SVM- DT-LDA)
Generally, heterogeneous ensemble methods combine the outputs of several base classifiers, training several learners with different learning strategies on a single common training dataset; this is in contrast with methods that use different datasets to train a single learner. In one heterogeneous ensemble method proposed in previous studies, five base classification algorithms were used: KNN (K=3), KNN (K=5), SVM, decision trees (DT), and linear discriminant analysis (LDA). The method was designed to increase the chance of early prostate cancer diagnosis [59]. The data were proteomic prostate cancer data obtained by protein mass spectrometry, available in JNCI Data 7-3-02 [60]. The statistical population comprised 322 patients whose sera provided the data: 63 people with a normal prostate, 190 with benign prostate tumors, 26 with prostate cancer and a prostate-specific antigen (PSA) level in the range 4-10, and 43 with prostate cancer and PSA levels above 10. The approach uses simple majority voting for decision making, and 10-fold cross-validation was used for performance validation. The results showed that accuracy and sensitivity increased, while specificity decreased slightly, after using the ensemble method. This simple fusion strategy improved prostate cancer mass spectrometry methods, which face the high-dimensionality small-sample (HDSS) problem, and boosted overall performance. Diagnosis using the protein mass spectrometry technique is a relatively new solution, and many learning algorithms use it to increase the chances of early cancer prognosis; however, the small-sample, high-dimension nature of proteomic cancer data requires more sophisticated solutions to improve classification accuracy.
Even so, applying this simple strategy to the final decision yields promising performance on mass spectrometry data related to prostate cancer [59].
2.2.2.4 MRS method
The mixture of rough set and SVM (MRS) is a mixture classification model based on clinical markers, built by combining rough set and SVM classification tools in serial form. The model is a serial multi-sensor system that integrates several methods with different sources and characteristics for breast cancer prognosis. In this fusion method, the rough set classifier acts as the first layer, identifying singular samples in the data, and the SVM classifier operates as the second layer, classifying the remaining samples; the upper layer is also called the shrinking classifier. For each sample, the rough set tries to assign a class type; if the class remains unknown, the second layer assigns one. This two-layer construction, which operates without voting, is a suitable way to obtain a better clinical prognosis. MRS used two open breast cancer datasets for prediction [61]: one, hereafter called BRC-1, includes both clinical data and gene expression data from 97 breast cancer tumors of lymph-node-negative patients [62]; the other, hereafter called BRC-2, uses baseline clinical data on human primary breast tumors from the Lawrence Berkeley Laboratory (LBL) breast cancer cell collection, containing 174 samples [63]. The approach gives higher accuracy, specificity, sensitivity, and Matthews correlation coefficient (MCC) than previous prognostic methods such as NB, SVM, J48, random forest, and the attribute-selected classifier, and its higher accuracy was validated by 5-fold cross-validation [61].
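The serial two-layer idea can be sketched with a confidence-thresholded first layer standing in for the rough-set classifier, which abstains on samples it cannot assign; the threshold, the logistic-regression stand-in, and the synthetic data are all assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

first = LogisticRegression(max_iter=1000).fit(X, y)   # stand-in for the rough set layer
second = SVC().fit(X, y)                              # second-layer SVM

def serial_predict(X_new, threshold=0.9):
    proba = first.predict_proba(X_new)
    out = np.empty(len(X_new), dtype=int)
    confident = proba.max(axis=1) >= threshold   # layer 1 assigns a class
    out[confident] = proba[confident].argmax(axis=1)
    if (~confident).any():                       # remaining samples go to layer 2
        out[~confident] = second.predict(X_new[~confident])
    return out

pred = serial_predict(X)
```

Unlike voting-based fusion, each sample here is decided by exactly one layer: the first layer shrinks the problem, and the second layer only ever sees the cases the first could not resolve.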
2.2.2.5 Bagging (bootstrap aggregating) method
In a study conducted in 2016, two different meta-learning algorithms were used. The authors applied Bagging-RF, Bagging-NB, and Bagging-K* instance (K*) and compared their results with the individual classifiers (RF, NB, and K*) as well as a vote ensemble classifier (RF-NB-K*). All of the methods were used for melanoma skin cancer detection, with clinical images of skin lesions as data. These ensemble methods use simple majority voting for decision making. Because Bagging reduces variance and helps to avoid overfitting, Bagging aggregation improves the accuracy and stability of the selected base tools. A 10-fold cross-validation test was used to estimate accuracy. In comparison with the other methods, the results show that when the number of positive cases is insufficient, using Bagging with Random Forest is suitable: sensitivity and AUC were meaningfully improved with this approach [64].
2.2.2.6 Artificial intelligence based hybrid ensemble technique
Researchers have also designed a novel artificial intelligence based hybrid ensemble technique for the screening of cervical cancer, using pap-smear images as clinical data for diagnosis. The hybrid ensemble system combines fifteen different classifiers [65], including Bagging, DECORATE, decision table, Ensemble of Nested Dichotomies (END) [66], filtered classifier, J48 graft [67], Projective Adaptive Resonance Theory (PART) [68], multiple backpropagation ANN, multiclass classifier, NB, random subspace, radial basis function network [69], rotation forest, random forest, and random committee. The method uses voting for decision making. Validation was done on multiple training and testing datasets, and 10-fold cross-validation was applied to evaluate the algorithm. This approach provides high performance for the classification of complex datasets; the hybrid ensemble technique is a promising method for classifying pap-smear images and can be used to detect cervical cancer. The receiver operating characteristic (ROC) area of the proposed hybrid ensemble system was increased, and the overall performance of the ensemble approach was improved. In comparison with individual classifiers, the results were better for both multi-class and two-class problems [65].
2.2.2.7 Boosting-TWSVM method
In other studies, researchers used boosting together with SVM for detecting MicroCalcification (MC) clusters in digital mammograms; MC clusters are an important sign in breast cancer diagnosis. This ensemble method uses algebraic combiners for decision making, because the aggregation is computed by weighted averaging [70]. The authors showed that their proposed method outperformed other methods such as the twin SVM (TWSVM) [71]. In this study, there were 650 positive and 3567 negative samples, split into two subsets: the first part was used for training and validation, while the second part served as the testing set. The TWSVM classifier was trained using the 10-fold cross-validation technique to evaluate performance. Since TWSVM is sensitive to the training samples, it is inconsistent; but when Bagging was integrated into TWSVM, this inconsistency problem on the training set was resolved. The results for Boosting-TWSVM showed that sensitivity and specificity were increased, and demonstrated that the boosted TWSVM is a promising approach for MC detection [70].
2.2.2.8 Bagging and boosting-based TWSVM
The Bagging and boosting based twin support vector machine (BBTWSVM) is yet another ensemble method. Its algorithm consists of three modules: image preprocessing, feature extraction, and the BBTWSVM module itself. The BBTWSVM module is composed of two algorithms, Bagging-TWSVM and Boosting-TWSVM; combining them results in a more efficient solution composed of several classifiers. This method has been applied to clustered MC detection, and is therefore also called an MC detection approach, by which breast cancer can be diagnosed. This fusion method uses algebraic combiners for decision making, because it either takes the maximum score over all the base classifiers or computes a weighted scoring scheme among them. Data for validation were chosen from the training set and, as in the Boosting-TWSVM method, the 10-fold cross-validation technique was used in the training phase to evaluate performance. BBTWSVM outperforms TWSVM: the sensitivity of the BBTWSVM classifier was increased, and ROC curves showed that, in comparison with TWSVM, the performance of the proposed approach is improved [72,73].
2.2.2.9 Ensemble multi-class learning
The ensemble multi-class learning algorithm is an ensemble approach that combines the error-correcting output coding (ECOC) scheme and the one-against-one pairwise coupling (PWC) scheme. This method has been used for finding biomarkers in liver cancer and employs an algorithm called the extended Markov blanket (EMB). A liver cancer matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) dataset was used for training. Redundancy and relevance were the two aspects of biomarkers considered for feature selection. This ensemble method made the identification of proteomic biomarkers for liver cancer possible, and it uses voting for decision making [74]. The liver cancer samples were 201 MALDI-TOF MS spectra belonging to HCC patients, cirrhosis patients, and healthy participants [75]. All samples were divided randomly into 10 exclusive folds; the error rate was estimated, and 10-fold cross-validation was selected to evaluate experimental results such as the accuracy value. The proposed method was compared with the random forest, NB, classical ECOC, and J48 approaches, and the results showed that accuracy increased to 88.71% [74].
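The ECOC ingredient of this method can be sketched with scikit-learn's output-code classifier; the three-class synthetic data below merely stands in for the 201 HCC/cirrhosis/healthy spectra, and the base learner and code size are illustrative assumptions.

```python
# Hedged sketch of error-correcting output codes (ECOC) for a three-class
# problem, one ingredient of the ensemble multi-class learner in [74].
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=201, n_features=30, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=2)

# Each binary sub-problem is one column of a random code matrix; a test sample
# is assigned the class whose codeword is nearest to the predicted bit vector.
ecoc = OutputCodeClassifier(LinearSVC(), code_size=2.0, random_state=2)
scores = cross_val_score(ecoc, X, y, cv=10)  # 10-fold CV as in the study
print(round(scores.mean(), 3))
```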
2.2.2.10 RSS-SCS method
This approach combines the Random Subspace (RSS) and Static Classifiers Selection (SCS) paradigms. The proposed ensemble method has been used for breast cancer diagnosis by CAD, working on a real database of 300 mammograms as clinical data, collected from the DDSM. The research showed that CAD is an effective approach for breast cancer detection in the initial stages. First, RSS constructs diverse classifiers by using different subsets of features in the training phase; second, SCS selects among these diverse classifiers; then the outputs of the selected classifiers are combined using majority voting for decision making. Cross-validation was used to estimate the final accuracy of the feature subsets. The results demonstrated that, in comparison with three strong ensemble methods (Bagging, AdaBoost, and Random Subspace), the proposed approach achieved higher rates on three metrics: sensitivity, specificity, and accuracy [76].
2.2.2.11 REIS-based ensemble method
The Resonance-frequency Electrical Impedance Spectroscopy (REIS) based ensemble method fuses five classifiers and is a kind of heterogeneous ensemble method. These classifiers are ANN, SVM, Gaussian mixture models (GMM), CART, and LDA. This fusion method has been applied to the detection of suspicious breast lesions, which signal the risk of having or developing breast cancer. In this investigation, 174 cases were examined, with imaging-based examinations such as mammography, additional views, ultrasound, and magnetic resonance imaging used as clinical data. A genetic algorithm was applied for the feature selection stage. The REIS-based method uses algebraic combiners for decision making: it combines the results of the classifiers via three rules, namely the sum rule, the Weighted Sum Fusion Rule (WSFR), and the Weighted Median Fusion Rule (WMFR). Performance was evaluated using a leave-one-case-out cross-validation technique. ROC curves were compared among the ANN, SVM, GMM, CART, and LDA individual classifiers; without fusion, ANN had the highest rate. Comparison of ROC curves between the single best classifier (ANN) and the proposed fusion model under the three rules showed that WSFR and WMFR are better than both ANN and the sum rule, so the weighted median fusion rule proved the best fusion approach in this study [77].
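The two weighted fusion rules can be sketched as follows; the per-classifier scores and weights below are invented illustrative values, not outputs from [77].

```python
# Hedged sketch of the WSFR and WMFR fusion rules named in [77].
import numpy as np

def weighted_sum_fusion(scores, weights):
    """WSFR: weighted average of the per-classifier scores for one sample."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(scores, w) / w.sum())

def weighted_median_fusion(scores, weights):
    """WMFR: the score at which the cumulative weight first reaches half."""
    order = np.argsort(scores)
    s = np.asarray(scores)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    return float(s[np.searchsorted(cum, 0.5 * cum[-1])])

# Five base classifiers (e.g. ANN, SVM, GMM, CART, LDA) scoring one lesion;
# weights might, for instance, be proportional to validation performance.
scores = [0.9, 0.7, 0.4, 0.8, 0.6]
weights = [0.3, 0.2, 0.1, 0.25, 0.15]
print(weighted_sum_fusion(scores, weights))     # 0.74
print(weighted_median_fusion(scores, weights))  # 0.8
```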
2.2.2.12 MV-ACE method
The multi-view based AdaBoost classifier ensemble (MV-ACE) framework is an ensemble method that integrates multiple views in a straightforward manner, through a linear combination of different views and the AdaBoost algorithm; AdaBoost produces the base classifiers and optimizes them. In this study, gene expression datasets were used. This ensemble method has been applied to class prediction from the gene expression profiles of several cancers, including blood, bladder, liver, prostate, brain, endometrium, and bone marrow, and it works well for cancer classification by gene expression profiles. It uses algebraic combiners for decision making. The algorithm was run 20 times separately and the average value calculated, and prediction accuracies were evaluated using 3-fold cross-validation. In this investigation, the accuracy of the proposed method (MV-ACE) was compared with other strong classifier ensemble methods, such as Bagging, MultiBoosting (MB), RF, RSS, and AdaBoost. The results showed that this approach achieved relatively better performance on most of the datasets [78].
2.2.2.13 DECORATE method
Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (DECORATE) is another ensemble classifier method that can be categorized as a decision integration level method. DECORATE combines four base classifiers: NB, the Sequential Minimal Optimization (SMO) algorithm for training a support vector classifier, C4.5 DT, and a forest of random trees. It has been applied to introduce prognostic biomarkers, i.e. cancer driver genes, for breast cancer and ovarian cancer. For the feature selection step, 10 different methods were used as individual classifiers, and 10 feature vectors were then constructed, respectively [79]. The input data for these classifiers are somatic mutation data available in the TCGA dataset [18, 80]. Five of the individual classifiers rank candidate genes based on p-values: OncodriveFM, OncodriveCLUST, MutSig, ActiveDriver, and Simon. The five remaining ones, FLN, NetBox, MEMo, Dendrix, and FLNP, choose driver genes based on linkage weights. This method therefore also has a layer of feature integration. The input test data are 20,624 genes annotated as protein-coding, downloaded from the NCBI database. Finally, supervised classification is performed, and DECORATE uses the average posterior probabilities of the four above-mentioned base classifiers. Notably, this ensemble method is effective when data are limited, because DECORATE creates diverse artificial data. Although DECORATE is grouped among decision integration methods with an algebraic combination rule, in a distinct layer it employs a kind of feature fusion mechanism. During training, the method was run 50 times, and performance was estimated using 10-fold cross-validation.
Results showed that when the training set is small, DECORATE achieved higher accuracy than other strong ensemble approaches such as Bagging or boosting [79].
2.2.2.14 HyDRA method
Hybrid Distance-score Rank Aggregation (HyDRA) is an ensemble approach that combines the advantages of score-based and distance-based methods [81]; both are aggregation techniques [82], and the predictive potency of this aggregation approach is evaluated as very high. HyDRA aggregates genomic data based on mutation data and has been applied to several gene sets related to diseases such as autism, breast cancer, colorectal cancer, endometriosis, glioblastoma, meningioma, ischaemic stroke, leukemia, lymphoma, and osteoarthritis; by this method, driver genes for these diseases were prioritized. The proposed approach uses decision templates as its decision-making strategy, because it ranks driver genes based on different similarity criteria that are combined with statistical tools. For each disease, and for disease gene discovery in general, the performance of the HyDRA method was evaluated by cross-validation. Results showed that HyDRA's performance was higher than that of other methods such as Endeavor [83] and ToppGene [84] on the majority of quality criteria, although the analyses also show that each method has specialized advantages in prioritization for some diseases [81].
2.2.2.15 Stacking IB3-NBS-RF-SVM method
This ensemble approach combines four well-known individual classifier types: Instance Based 3 (IB3), Naïve Bayes Simple (NBS), RF, and SVM. Grouped as a decision integration level tool, it classifies DNA microarray data using biological gene sets such as the KEGG gene sets and has been applied to breast cancer and leukemia diagnosis. It uses weighted majority voting as its decision-making strategy. In this study, the kappa value was calculated instead of accuracy, because it is a better criterion for the classification of unbalanced data; the individual classifiers were then evaluated using a 10-fold cross-validation schema. The proposed approach gives better performance than various integration methods, including the AdaBoost hybrid, Bagging hybrid, Stacking-IB3, Stacking-NBS, Stacking-RF, and Stacking-SVM. It is able to generate a ranked list of genes that can be effective for cancer diagnosis and shows meaningful improvement in cancer classification results, such as the accuracy and kappa values [85].
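A generic stacking sketch over the four base learner types named in [85] follows; IB3 is approximated here by 3-nearest-neighbours, the logistic meta-learner and data are illustrative assumptions, and the study's own weighted-majority-voting combiner is not reproduced exactly.

```python
# Hedged sketch of stacking IB3-like, NBS-like, RF, and SVM base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=250, n_features=30, n_informative=8,
                           random_state=4)

stack = StackingClassifier(
    estimators=[
        ("ib3", KNeighborsClassifier(n_neighbors=3)),   # stand-in for IB3
        ("nbs", GaussianNB()),                          # stand-in for NBS
        ("rf", RandomForestClassifier(n_estimators=50, random_state=4)),
        ("svm", SVC(probability=True, random_state=4)),
    ],
    final_estimator=LogisticRegression(),  # learns how to combine the bases
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
scores = cross_val_score(stack, X, y, cv=10)
print(round(scores.mean(), 3))
```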
2.2.2.16 GenEnsemble method (NBS-IB3-SVM-C4.5 DT)
The GenEnsemble method is similar to the previous one. This ensemble method incorporates biological knowledge, in the form of gene sets, into the microarray data classification process. Its four base classifiers are NBS, IB3, SVM, and C4.5 DT. Clinically, the GenEnsemble model has been applied to cancer diagnosis: breast cancer as a bi-class classification problem and leukemia as a multi-class classification problem. In the training phase, each gene set was used as an informed feature selection subset to train the base classifiers and determine their accuracy. As in the previous method, this approach uses weighted majority voting as its decision-making strategy. An internal k-fold cross-validation strategy was used for each dataset, and GenEnsemble was evaluated over the training data. Although the Naïve Bayes algorithm as the base classifier of Bagging or AdaBoost ensembles gave the best results for the three breast cancer datasets, other evidence showed that the proposed approach achieved better performance compared with other popular ensemble algorithms, such as Bagging, Boosting, IB3, SVM, J48, AdaBoost-IB3, AdaBoost-J48, Stacking-IB3, and Stacking-SVM [86].
2.2.2.17 ADASVM method
ADASVM is an ensemble method that incorporates AdaBoost with a linear SVM classifier; it classifies cancers based on microarray gene expression data using an ensemble of support vector machines [87]. In this study, the benchmark cancer dataset was the leukemia dataset [28], which has two sub-types, AML and ALL, making ADASVM a suitable algorithm for this two-class problem. The algorithm resolves defects and dilemmas of AdaBoost and SVM: the fusion addresses the diversity requirement of the AdaBoost algorithm, and the boosting mechanism reduces the misclassification rate to improve accuracy. It uses weighted majority voting as its decision-making strategy. The main measure for evaluating AdaBoost is the weighted error of each component; if it rises above 0.5, the process is stopped. The researchers showed that the proposed method outperforms the SVM and KNN classifiers: ADASVM's accuracy was 100%, while the SVM and KNN accuracies were lower, respectively [87].
2.2.2.18 NB (Naïve Bayes) combiner method
Previous investigations showed that the NB combiner can serve as a fusion strategy at the decision integration level; it combines 100 decision tree classifiers [88]. This integration model has been used with 73 benchmark datasets, such as breast cancer [55, 56], arrhythmia [89, 90], and hypothyroid [91, 92], all belonging to the UCI Machine Learning Repository, an open-access database of machine learning problems. The NB combiner was compared with three other combination methods: the Majority Vote combiner, the Weighted Majority Vote (WMV) combiner, and the Recall Combiner (REC). In this study, 10-fold cross-validation was applied; for each cross-validation fold, the training set was divided into two equal parts, a "proper" training part and a validation part. All base classifiers were validated on the proper training part, while the combiners were evaluated on the validation part; estimation of the prior probabilities, however, was carried out on the whole training data. The results showed that, among the mentioned methods, the NB combiner was the best, with a high accuracy value. This approach solved problems with a large number of fairly balanced classes, while the WMV combiner is successful on problems with a small number of imbalanced classes. Previous simulation studies had suggested that NB combiner estimates are inaccurate, but these results showed no such anomalies when the data were real and sufficient [88].
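An NB combiner can be sketched by fitting a categorical naive Bayes model on the base classifiers' label outputs over a held-out validation part, mirroring the proper-training/validation split described for [88]; the base learners and data below are illustrative assumptions.

```python
# Hedged sketch of a Naive Bayes combiner over base classifier votes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=15, random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)
# split the training set into a "proper" training half and a validation half
X_prop, X_val, y_prop, y_val = train_test_split(X_tr, y_tr, test_size=0.5,
                                                random_state=6)

bases = [DecisionTreeClassifier(max_depth=3, random_state=6),
         GaussianNB(), KNeighborsClassifier()]
for clf in bases:
    clf.fit(X_prop, y_prop)

def votes(X):
    # one column of predicted labels per base classifier
    return np.column_stack([clf.predict(X) for clf in bases])

# the combiner is a naive Bayes model over the pattern of votes
combiner = CategoricalNB(min_categories=2).fit(votes(X_val), y_val)
acc = combiner.score(votes(X_te), y_te)
print(round(acc, 3))
```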
2.2.2.19 Collective approach (correlation, color palette, color proportion, and SVM)
The collective ensemble method operates at the decision integration level, combining several methods: a correlation method, a color palette approach, a color proportion method, and an SVM classifier. This fusion method determines the status of the CEN17 and HER2 biomarkers, which are important in breast cancer detection, from Fluorescence In Situ Hybridization (FISH) images used as clinical data. It uses weighted majority voting as its decision-making strategy. The performance of the ensemble approach was confirmed through statistical evaluation of the spot recognition system. It was demonstrated that the main advantage of this method is the absolute repeatability of its scores over several independent runs, in contrast to human expert decisions, which depend on mental and physical condition. The average sensitivity, specificity, and mean of summed sensitivity and specificity of the proposed fusion approach were compared with those of different individual methods, and the results showed that the fusion method is more efficient [93].
2.2.2.20 Rankboost_W
Rankboost with a weighting function (Rankboost_W) is the Rankboost algorithm with a heuristic weighting function added [94]. It is an ensemble method that uses boosting learning techniques to combine different computational approaches, treated as a set of weak features, in order to improve overall performance [95]. Similar to the DECORATE method, this approach also has a layer of feature integration. It has been applied to gene prioritization related to prostate cancer. In this study, carried out in 2013, the training and test data were mutation-based genomic data for prostate cancer detection; driver gene and protein-coding gene data were downloaded from the Online Mendelian Inheritance in Man (OMIM) and HUGO Gene Nomenclature Committee (HGNC) databases, respectively. It uses algebraic combiners, with a novel weighting function, for the final decision-making strategy. The authors used the LOOCV method for confidence interval estimation. In comparison with other approaches, including ToppGene and ToppNet [87], the performance of the proposed model (Rankboost_W) was better: AUC and mean average precision (MAP), as two performance indicators, showed better results for Rankboost_W than for the ToppGene method [94].
2.2.2.21 RVM-based ensemble learning
The Relevance Vector Machine (RVM) based ensemble is an approach that combines AdaBoost with a reduced-feature model. It has been applied to classify and diagnose different cancers via the construction of a human genetic network, with heterogeneous genomics data such as microarray data as input. Three major problems arise with heterogeneous genomics data in constructing a human genetic network: the lack of a gold-standard negative set, large-scale learning, and massive missing data values. This ensemble method addresses two of these problems using kernel-based techniques: AdaBoost helps to solve the problem of large-scale learning, and the reduced-feature model resolves the problem of massive missing data values, both of which led to meaningful improvement in performance. 10-fold cross-validation testing was used to evaluate the models. Generally, RVM is an effective approach for handling high-dimensional feature spaces and massive missing values. The proposed ensemble method uses algebraic combiners as its decision-making strategy. In comparison with a robust ensemble approach such as the Naïve Bayes baseline, the proposed model is preferred: its performance remains high even with massive missing data values, and the method can be used for classification tasks on biological datasets [96].
2.2.2.22 PSO–ANN ensemble
The particle swarm optimization (PSO)–ANN ensemble is an ensemble method used in microarray data classification. The critical point of microarray data analysis is that only a few of the thousands of genes affect the classification results. This fusion approach has been applied to cancer diagnosis, including leukemia, colon cancer, ovarian cancer, and lung cancer, through microarray data classification. The PSO–ANN approach has four steps. In step 1, the Fisher ratio is used for gene selection, and correlation analysis is employed for feature selection and dimension reduction. In step 2, feature subsets are re-sampled with the PSO algorithm, and several base classifiers are trained. In step 3, appropriate base classifiers are chosen. In step 4, the selected base classifiers are combined using Estimation of Distribution Algorithms (EDAs). In this study, ANNs were used as the base classifiers and were trained with the PSO algorithm. This intelligent ensemble method uses algebraic combiners for decision making. For each dataset, leave-one-out cross-validation was used to evaluate classification. The proposed method was compared with single PSO–ANN, SVM, C4.5, Neuro-fuzzy, and KNN; on the basis of this comparison, classification accuracy was improved, and the results showed that the PSO–ANN ensemble model offers the best overall classification accuracy [97].
2.2.2.23 MF-GE system
The multi-filter enhanced genetic ensemble (MF-GE), a hybrid ensemble model, includes two sequential phases: a filtering process followed by a wrapper process. In phase 1, the genes in the microarray dataset are scored using a multiple filtering (MF) algorithm and the obtained scores are integrated; in the wrapper phase, genes are selected with a genetic ensemble (GE) algorithm [98]. This approach has been applied to four benchmark microarray datasets for gene selection, related to leukemia [31], colon cancer [32], liver cancer [99], and mixed-lineage leukemia (MLL) [100]. This fusion method can be effective for both binary-class and multi-class classification problems, and the hybrid system overcomes the overfitting problem of the GE algorithm. Both majority voting and algebraic combiners were used for decision making, but majority voting generated better classification results. In this study, the MF-GE system was compared with the original GE system and the GA/KNN hybrid. A double cross-validation process was applied: internal cross-validation in the gene selection phase, and external cross-validation for evaluating the selection results. The results showed that the proposed MF-GE system achieved a higher classification accuracy value, generated a more compact gene subset, and arrived at the selection results more quickly [98].
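The multiple-filtering phase can be sketched as follows: several filter scores are converted to ranks and summed, and the top-ranked genes are kept as candidates. The GE wrapper phase is not shown, the filter choices are illustrative assumptions, and the data are synthetic placeholders.

```python
# Hedged sketch of the multiple-filtering (MF) score-integration phase.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=60, n_features=100, n_informative=10,
                           random_state=7)
X = MinMaxScaler().fit_transform(X)   # chi2 requires non-negative values

filters = [f_classif(X, y)[0], chi2(X, y)[0],
           mutual_info_classif(X, y, random_state=7)]

# integrate the filter scores: convert each score vector to ranks, then sum
ranks = [np.argsort(np.argsort(-np.nan_to_num(s))) for s in filters]
combined = np.sum(ranks, axis=0)       # lower total rank = stronger gene
selected = np.argsort(combined)[:20]   # candidate subset for the GE wrapper
print(len(selected))
```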
2.2.2.24 Evolutionary Ensemble Model
In one study, an ensemble model was designed that integrates the results of three modules of evolutionary Multilayer Perceptron Neural Networks (MLPNNs); this is a parallel ensemble method. Four techniques, namely polling, maximum, minimum, and weighted average, were used separately for integration. The evolutionary ensemble model is suitable for correct breast cancer diagnosis [101]. The data were taken from the Wisconsin Diagnostic Breast Cancer dataset [102, 103] in the UCI Machine Learning Repository, which contains data vectors from 569 patients. About 70% of the total dataset was used as training data, with a genetic algorithm, and 30% was used as testing data. Each module uses an algebraic combiner for its decision-making strategy, and voting then takes place between the modules. The results demonstrated that the maximum fusion operator gives the best performance when compared with the other fusion techniques. The authors proposed that, given the type of their training method, validation data are not necessary. The accuracy value obtained using the maximum integration operator showed the best performance; sensitivity, specificity, False Positive Rate (FPR), and False Negative Rate (FNR) values were also reported [101].
2.2.2.25 Optimized naïve-Bayes model
This classification fusion system is a heuristic algorithm that improves the performance of the naïve-Bayes classifier. It integrates different heterogeneous data, including clinical, laboratory, and flow cytometry data. The dataset comprised 112 cases of B-Cell Chronic Lymphocytic Leukemia (B-CLL) patients, obtained from clinical examinations, general laboratory (hematological) examinations, and flow cytometry analysis. The proposed method uses an algebraic combiner as its decision-making strategy. Data classification was done using naïve Bayes, and performance was evaluated by 10-fold cross-validation. The proposed optimized naïve-Bayes model showed high classification accuracy values, and the results demonstrated that including the flow cytometry parameters can improve performance [104].
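A minimal sketch of the evaluation setup, naive Bayes over concatenated heterogeneous feature blocks with 10-fold cross-validation, follows; the feature blocks are random placeholders for the clinical, laboratory, and flow-cytometry data, and the study's optimization heuristic itself is not reproduced.

```python
# Hedged sketch: naive Bayes over concatenated heterogeneous feature blocks,
# evaluated with 10-fold cross-validation as in [104]. Placeholder data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(8)
n = 112                               # cohort size reported in the study
clinical = rng.normal(size=(n, 5))    # placeholder clinical features
laboratory = rng.normal(size=(n, 8))  # placeholder haematology features
cytometry, y = make_classification(n_samples=n, n_features=6, n_informative=4,
                                   random_state=8)

X = np.hstack([clinical, laboratory, cytometry])  # simple data-level fusion
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(round(scores.mean(), 3))
```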
2.3 Model Integration
Model integration means that when we construct the model, the integration is done at the model level. In this approach, each model transforms the input data into the required format, and then the models are combined; by linking the models, a single model emerges on which decisions are based. This method can be developed using different tools [105], one of which, based on Bayesian networks, is discussed in the literature.
2.3.1 Bayesian networks-based model integration (1,2)
In a study carried out in 2006, Bayesian networks were used in connection with breast cancer. The researchers used three models for integration. In the first, named full integration, two data sources, the clinical and microarray data, were combined, and a Bayesian network was then built on the integrated data; at this step, only data integration was performed. In the second model, named decision integration, an independent model was built for each data source, and the decisions from these models were then combined based on a weighting policy. In the last model, named partial integration, an independent model was likewise developed for each data source, but the models were then linked and integrated to build a single combined model for the final decision making; this variant uses model integration for decision making. The models described above were used for predicting the metastatic state in breast cancer. The training set was selected randomly 100 times, and the performance of the proposed methods was evaluated using ROC curve analysis. The results revealed that partial integration achieved the highest performance and proved to be the best method for data integration [106].
In another study, a novel Bayesian hierarchical model-based method was proposed. This approach uses single-nucleotide variants (SNVs) and insertions and deletions (InDels) in whole genome sequence data as mutation data [107], obtained from sequencing of the breast cancer cell lines dataset available in TCGA [108]; the data can be downloaded from https://gdc.cancer.gov/files/public/file/TCGA_mutation_calling_benchmark_files.zip. The method first generates two models, a tumor model and an error model, by setting partition rules on the paired-end reads and datasets, and then integrates these models for mutation calling associated with breast cancer through input data partitioning. It was confirmed that the proposed method can improve performance by incorporating heterozygous single nucleotide polymorphism (SNP) and strand bias information, in comparison with other Bayesian network classifiers [107].