1 Introduction

The significance of disease classification and prediction has become evident over recent years. To identify the exact cause of a disease along with its symptoms, the important properties and features of a dataset must be well understood. Artificial Intelligence (AI) has shown promising results in classification and in assisting decision making. Machine Learning (ML), a subset of AI, has accelerated much research in the medical field, whereas Deep Learning (DL), a subset of ML, uses layers of neural networks to learn the exact features required for disease detection [34, 71, 94]. Existing studies from 2014 to the present discuss many applications and algorithms developed to enhance the medical field by providing accurate results for patients. Driven by data, ML has advanced many areas, including natural language processing, automatic speech recognition and computer vision, delivering robust systems such as driverless cars and automated translation. Despite these advances, the application of ML in medical care remains fraught with risks. Many of these issues arise from the goal of making accurate predictions from data that is collected and managed by the medical system.

AI examines a given dataset using various techniques to obtain the required features from a huge amount of data, which makes it difficult to find an optimal subset of significant features while excluding redundant ones. Including irrelevant features is computationally costly and degrades the accuracy metrics. Hence, choosing a small subset from a wide range of features improves the efficiency of the model. Excluding irrelevant and redundant features also reduces the dimensionality of the data and speeds up the learned model, similar to boosting [37]. Significant features can also be derived from the existing ones using practical approaches such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In particular, feature selection has two essential, conflicting objectives: first, maximizing classification performance, and second, minimizing the number of features to overcome the curse of dimensionality. Hence, feature selection is considered an essential task for achieving these objectives. Later, research on feature selection was enhanced by using multi-objective selection strategies. Thus, this review focuses on strategies for choosing efficient features.
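As a rough illustration of the two conflicting objectives above (keeping informative features while discarding redundant ones), the following sketch implements a simple filter-based selector in Python with NumPy. It is a different technique from PCA and LDA, which build new features rather than selecting existing ones, and it is not the method of any cited study: the helper `select_features` and its thresholds `min_var` and `max_corr` are hypothetical, illustrative choices.

```python
import numpy as np

def select_features(X, min_var=1e-3, max_corr=0.95):
    """Hypothetical filter-based selector: return indices of the columns of X
    that are kept. Near-constant columns are dropped, then any column highly
    correlated with an already-kept column is treated as redundant."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)
    # Consider candidate columns in order of decreasing variance.
    candidates = [j for j in np.argsort(-variances) if variances[j] > min_var]
    kept = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > max_corr for k in kept
        )
        if not redundant:
            kept.append(int(j))
    return sorted(kept)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Columns: a, a near-duplicate of a, an independent feature, a constant feature.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b, np.full(200, 3.0)])
print(select_features(X))
```

Because the ranking starts from the highest-variance column, the near-duplicate column 1 is kept and column 0 is discarded as redundant, while the constant column 3 is removed by the variance filter, leaving columns 1 and 2.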

Cancer was identified using multiple techniques, including image segmentation, feature selection, and regression evaluated with the Root Mean Square Error (RMSE), for tasks such as pattern recognition, object detection and image classification [7]. Brain tumors were detected in brain Magnetic Resonance Imaging (MRI) scans using six classifiers and Transfer Learning (TL) techniques for image segmentation [28]. Also, a TL approach was implemented to identify lung cancer and brain disease in [55]. It analyzed MRI and Computed Tomography (CT) scan images using supervised Support Vector Machine (SVM) classifiers. The image analysis process is well understood in existing studies. However, ML and DL techniques are continuously being updated. Therefore, it is a complex task for researchers to identify an accurate method for analyzing images, as feature selection techniques vary with every method. The key contributions of this study are:

  (i) Classification of diseases after reviewing primary studies,

  (ii) Recognition of various image modalities provided by existing articles,

  (iii) Description of tools along with reliable ML and DL techniques for disease prediction,

  (iv) Dataset description to provide awareness of available sources,

  (v) Experimental results using an MRI dataset to compare different ML and DL methods,

  (vi) Selection of suitable features and classifiers to obtain better accuracy, and

  (vii) Insights on classification as well as a review of the techniques to inform future research.

The significance of this review is to enable physicians and clinicians to use ML or DL techniques for precise and reliable detection, classification and diagnosis of diseases. It will also assist clinicians and researchers in avoiding misinterpretation of datasets and in deriving efficient algorithms for disease diagnosis, and it provides information on the multiple modern medical imaging modalities used with ML and DL.

This study consists of 11 sections, organized as follows: Section 2 discusses the background of the study. Section 3 discusses the review technique, search criteria, source material and quality assessment. Section 4 summarizes the current techniques and the important parameters for achieving good accuracy. Section 5 gives an insight into medical image modalities. Section 6 summarizes the tools and techniques used in ML and DL models. Section 7 discusses the datasets used in previous studies and gives an insight into the data. Section 8 presents the experiments using ML classifiers and DL models on a brain MRI dataset. Section 9 recaps the analytical discussion of the techniques, datasets, ML and DL tools, and journals studied for this article. The discussion, and the conclusion with future scope, are presented in Sections 10 and 11, respectively.

2 Background

This section discusses the preliminary terms required to understand this review. It also presents a statistical analysis of the ML and DL techniques used for medical image diagnosis.

2.1 Machine learning

ML is a branch of AI in which a machine learns from data by identifying patterns and automates decision-making with minimal human intervention [96, 24, 12]. The most important characteristic of an ML model is its ability to adapt independently, learn from previous computations and produce reliable results when repeatedly exposed to new data. Two main aspects are that (i) ML techniques help physicians to interpret medical images quickly using Computer-Aided Diagnosis (CAD), and (ii) ML algorithms are used for challenging tasks such as segmentation of CT scans [81], breast cancer detection in mammography, and brain tumor segmentation in MRI. Traditional ML models worked on structured datasets in which the technique for every step was predefined, and the applied technique failed if any step was missed. In contrast, newer algorithms adapt to missing data as required for robustness. Evaluating the quality of the data used by ML and DL algorithms is essential [1622, 61]. Figure 1 illustrates the process used by ML algorithms for the prediction and diagnosis of disease.

Fig. 1

ML process [10]

2.2 Deep learning

DL models enable machines to achieve high accuracy through advanced techniques for analyzing medical images. In [58], heart disease was diagnosed using labelled chest X-rays; a cardiologist reviewed and relabelled all the data, discarding images other than heart failure and normal cases. To extract the exact features from the images, data augmentation and TL were used, achieving 82% accuracy, 74% specificity and 95% sensitivity for heart failure. In [14], an automatic feature selection method was developed with minimal manual work, using histopathology images labelled as cancer-positive or cancer-negative. Two networks, named Deep Neural Network (DNN) DNN2-F and DNN1-F, were used with PCA to reduce the features in the DNN, whereas a single-layer network of K-means centroids was used for unsupervised feature learning. The results of unsupervised (93.56%) and supervised (94.52%) learning were then compared. DL models automate the feature extraction procedure and handle data efficiently [14, 74]. Figure 2 depicts the process used by DL algorithms for the prediction and diagnosis of various diseases.
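Data augmentation, as used in [58], enlarges a labelled image set with label-preserving transformations. The sketch below is a generic NumPy illustration using flips and 90-degree rotations; the choice of transformations, the square toy images and the helpers `augment` and `augment_dataset` are assumptions and are not taken from [58].

```python
import numpy as np

def augment(image):
    """Return variants of a square 2-D image: the original, its horizontal and
    vertical flips, and its 90/180/270-degree rotations (assumed label-preserving)."""
    image = np.asarray(image)
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return variants

def augment_dataset(images, labels):
    """Expand a dataset; each variant inherits the label of its source image."""
    out_images, out_labels = [], []
    for img, lab in zip(images, labels):
        for v in augment(img):
            out_images.append(v)
            out_labels.append(lab)
    # np.stack needs equal shapes, hence the square-image assumption.
    return np.stack(out_images), np.array(out_labels)

images = np.random.default_rng(0).random((4, 8, 8))  # 4 toy 8x8 images
labels = np.array([0, 1, 1, 0])                      # placeholder class labels
X_aug, y_aug = augment_dataset(images, labels)
print(X_aug.shape, y_aug.shape)  # (24, 8, 8) (24,)
```

Whether a transformation really preserves the label depends on the modality: a horizontal flip of a chest X-ray, for example, moves the heart to the wrong side, so in practice the transformation set is chosen per dataset.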

Fig. 2

DL process

ML and DL techniques are used to process medical images for better prediction and accuracy, as shown in Figs. 1 and 2, respectively. Medical images from various modalities are taken as input, and algorithms are then applied to these images. The input image is segmented based on various factors, and the segments are used to extract the most informative features using feature extraction techniques. The extracted features are then refined to obtain the final features used to identify diseases [60]. ML approaches have also been used to denoise medical images for better prediction and accuracy [46]. Once feature selection and noise removal are complete, the images are classified according to the disease using classifiers such as SVM and Decision Tree (DT).

ML is the process in which computers learn from data and use algorithms to carry out a task without being explicitly programmed. It uses pattern recognition to make predictions on new data. In contrast, DL is loosely modeled on the human brain, using a complex structure of algorithms that enables machines to process images, text and documents. It uses layered algorithms such as Convolutional Neural Networks (CNN) and Artificial Neural Networks (ANN) to analyze the data. DL is generally better suited than conventional ML models to processing large amounts of data.

3 Review technique

This section gives an overview of the technique used to conduct this review systematically. It details the electronic databases used to search for and retrieve information, and discusses the research questions framed to carry out the review. The systematic review guidelines of [49, 50] were followed for this literature review.

3.1 Research questions

In this review, the following research questions are discussed:

  1. What are the present techniques of ML and DL used for medical imaging?

    1.1 What parameters are considered while selecting the classifiers?

    1.2 What evaluation metrics are used to evaluate classification models?

  2. What are the various medical image modalities for classifying diseases?

  3. What tools and techniques are used for medical imaging?

  4. What datasets have been used by researchers in the healthcare domain?

  5. What are the results of a comparative analysis of ML classifiers and DL models based on experiments using an MRI dataset?

    What are the results of comparative analysis of ML classifiers and DL models based on experiments using MRI dataset?

3.2 Source material

The guidelines given in [49, 50] were followed when searching the existing literature on ML and DL in medical imaging. The following electronic database sources were used for searching:

3.3 Search criteria

This review covers articles written in English between 2014 and 2022. The review process can be considered a filtering process for obtaining quality research articles by applying inclusion and exclusion criteria at various stages. The search used the keywords shown in Table 1 to retrieve research articles from various journals, conferences, book chapters and other sources.

Table 1 Keywords used

The journals and conferences included were taken from IEEE, Science Direct, Springer, Oxford Publication, etc. The article selection method is depicted in Fig. 3. The initial search returned 16,900 articles, which were refined to 250 using the specific keywords shown in Table 1. Screening by title retained 100 articles, which were reduced to 75 after reading their abstracts and introductions. Finally, 40 articles were selected as primary studies based on the inclusion and exclusion criteria.

Fig. 3

Article selection method

3.4 Quality assessment

The quality of this review was ensured by applying the inclusion and exclusion criteria discussed in subsection 3.3. The primary studies came from various journals, conferences, workshops and other sources (manuscripts, online records and society publications). To retrieve quality articles, each article was analyzed to maintain fairness and the (external and internal) validity of the results, following the CRD guidelines [50].

Table 2 presents the 20 most influential and highly cited articles related to the classification of diseases, identification of tools and techniques, explanation of the causes of disease, and solutions for the diagnosed disease (source: https://scholar.google.co.in).

Table 2 Top 20 cited articles

3.5 Data extraction

Initially, many challenges were faced in extracting the relevant data for this review; therefore, some researchers were approached to acquire the necessary information. The method for extracting the required data in this review is as follows:

  • One of the authors extracted the data after a thorough review of 40 articles.

  • The acquired results of the review were cross checked by another author to maintain consistency.

  • Any conflicts arising during cross-checking were resolved in meetings between the authors.

4 ML and DL techniques for medical imaging

This section answers research question 1 by providing an overview of the current ML and DL techniques for medical imaging, followed by the parameters considered when selecting classifiers and the metrics used to evaluate classification models. The existing literature is organized by disease, such as breast cancer, brain tumor, lung disease, diabetes and multiple disease detection.

4.1 Breast disease

In this subsection, articles related to breast disease symptoms, detection, classification, prediction and diagnosis using ML and DL methods are discussed. In [33], significant features were identified using BI-RADS (Breast Imaging Reporting and Data System) to develop a CAD system for breast ultrasound. The 10-fold cross-validation technique was applied to the benign and malignant lesions, and 77% accuracy was achieved using the SVM classifier. However, methods that rely on a few algorithms to handle a wide variety of data need to be understood and analyzed precisely [84]. CNN was used to train the system with the available clinical data and to capture the complex structure of the data. Moreover, it was suggested to study radiomics and to extend CADx to capture tumor signs using a CAD system. Breast cancer classification was evaluated using metrics such as Area Under the Curve (AUC), sensitivity and specificity [100]. A CNN-based CAD system was developed using multiview features, since a large number of features was required. These features allow the maximum detail to be extracted from the image data, improving the accuracy of detection and classification.
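The 10-fold cross-validation used in [33] partitions the data into ten folds, trains on nine of them, tests on the remaining one, and averages the ten test scores. A minimal NumPy sketch of the index splitting follows; the shuffling seed and the `k_fold_indices` helper are illustrative assumptions, and the classifier trained on each fold (an SVM in [33]) is left out.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    Every sample appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)  # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(103, k=10))
print(len(splits), [len(test) for _, test in splits])
```

To score a model, one would train on the samples in `train_idx`, evaluate on those in `test_idx` for each pair, and report the mean of the ten scores.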

In [86], DL was used to analyze medical images, and the successes and limitations of DL techniques for medical imaging were discussed. Recent ML and DL technologies for classification and detection across medical imaging modalities were reviewed in [39], providing an insight into the progress of the technology used in the medical field. Various ML techniques used for image processing, and DL techniques together with their architectures, were discussed. To study these technologies, CAD-based evaluation of various image types, such as histological images, thermography images, mammography, ultrasound and MRI, was explored. The system included ML techniques such as SVM, ANN, DT, Naive Bayes and K-Nearest Neighbor (KNN).

4.2 Brain disease

The concept of TL was used for image segmentation, in which MRI scans of the brain were segmented using voxel-wise classification [7]. ML classifiers were applied to classify multiple diseases, and the results were compared with existing results to detect the disease.

A brief introduction to DNNs in medical image analysis for diagnosing brain tumors from brain tissue is provided in [56]. It outlined ways of applying DL to the entire MRI pipeline: scanning, image retrieval, segmentation and disease prediction, covering the path from image acquisition to image retrieval, and from feature segmentation to disease prediction. The process was divided into two parts: (i) MRI signal processing, including image restoration and image registration, and (ii) the use of DL for disease detection and for prediction-based reports in the form of text and images. The influence of DL in medical imaging was also discussed in [82]. Image segmentation approaches using DL included segmentation of tumors, of brain and lung structures, and of bone tissues or cells. Image patches were taken as input, and a 2-Dimensional Convolutional Neural Network (2D-CNN) was used to process them at a later stage.

4.3 Lung disease

The pros and cons of DL techniques were reviewed in [51]; DL can automate image interpretation, which enhances clinical decision making, disease identification and prediction of the best treatment for the patient. These techniques were applied to cardiovascular medicine using the following steps for implementing a DL model: (i) problem identification, (ii) data selection, (iii) hardware and software selection, (iv) data preparation, (v) feature selection, and (vi) splitting the data into training and validation sets. In [13], a disease was analyzed automatically from labeled data, and accurate results were obtained by processing medical images with DL models. Automatic disease prediction using ML techniques and big data to detect patterns was summarized in [23], together with the advantages and disadvantages of each algorithm.

4.4 Diabetes

A comparative analysis of classification algorithms based on iris images, using an iridology chart, was carried out for the diagnosis of diabetes [76]. Type-2 diabetes was detected at an early stage by identifying the center of the pupil using the I-Scan-2. A filter-based feature selection method was used in combination with five classifiers, namely binary tree, SVM, neural network, Random Forest (RF) and adaptive boosting. Later, in [77], textural, statistical and other features (62 iris features) were used to detect the same disease, but without an iridology chart. ML and DL techniques were used to diagnose errors in existing diagnostic systems [81]. These techniques analyzed medical images and extracted the features required to diagnose such errors. Both supervised and unsupervised algorithms were used for disease prediction on specific datasets.

DL techniques were observed to be far more powerful for investigating medical images [65]. Various techniques such as image classification, object detection and pattern recognition were used to support proper decision making, improving medical treatment by predicting the early symptoms of a disease. Moreover, an overview of the ML and DL techniques used in the medical field was given to inform future researchers. In [78], techniques such as rubber sheet normalization, ML classifiers and PCA were applied to self-created data, and six parameters were computed for accurate prediction of Type-2 diabetes: (i) accuracy, (ii) sensitivity, (iii) specificity, (iv) AUC, (v) precision, and (vi) F-score.
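Of the six parameters listed for [78], AUC is the least obvious to compute by hand. One standard way, sketched below as an assumption (the cited study does not state its implementation), uses the fact that the ROC AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The `roc_auc` helper and the example scores are hypothetical.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as 0.5. Uses O(P*N) memory, fine for small evaluation sets."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]  # score difference for every positive-negative pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))  # 0.75
```

Here three of the four positive-negative pairs are ordered correctly (only 0.35 versus 0.4 is not), giving an AUC of 0.75.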

4.5 Multiple disease detection

Multiple diseases were identified with different radiology techniques, such as MRI for breast cancer and brain tumors, CAD for breast cancer and skin lesions, and X-rays for chest analysis [46]. ML techniques were also used to attain better accuracy with denoising techniques, including homomorphic wavelet, soft thresholding, non-homomorphic and wavelet thresholding. A CNN-based CAD system was proposed to classify breast lesions as benign or malignant to assist radiologists [100]. It used the Inception-v3 architecture to extract multiview features from Automated Breast Ultrasound (ABUS) images. Data from 316 breast lesions were used to train and evaluate the model. Compared with an ML feature extraction scheme, the proposed method yielded a 10% increase in AUC.
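Soft thresholding, one of the denoising techniques listed for [46], shrinks small wavelet detail coefficients (mostly noise) towards zero and keeps large ones, reduced by the threshold. The sketch below applies it with a single-level Haar transform on a 1-D intensity profile to keep the code short; the Haar wavelet, the single decomposition level, the threshold value and the synthetic signal are simplifying assumptions, not details from [46]. A 2-D image would be processed along rows and columns in the same way.

```python
import numpy as np

def haar_forward(x):
    """Single-level orthonormal Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    """Shrink coefficients towards zero by t; those with |c| <= t become 0."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(signal, t):
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, t))

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 4.0, 1.0, 3.0], 16)  # piecewise-constant intensity profile
noisy = clean + 0.3 * rng.normal(size=clean.size)
restored = denoise(noisy, t=0.5)
print(np.mean((noisy - clean) ** 2), np.mean((restored - clean) ** 2))
```

Because the transform is orthonormal and the clean profile has zero detail coefficients here, shrinking the details can only reduce the error, so the second mean squared error printed is lower than the first.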

A review of image fusion, which reduces randomness and improves the quality of available images, was presented in [42], summarizing various methods and challenges. In [44], ML and DL techniques focusing on small labeled datasets were discussed, since dataset size was considered one of the important factors in decision making. Noisy data in medical images was also analyzed, along with the pros and cons of various ML algorithms.

In [4], data augmentation techniques were used to evaluate dermatological diseases such as acne, atopic dermatitis, impetigo, psoriasis and rosacea. To diagnose these diseases, a TensorFlow Inception V3 model was retrained in two settings: (i) with data augmentation, and (ii) without data augmentation. For statistical analysis, the two models were compared using six parameters, namely (i) Positive Predictive Value (PPV), (ii) Negative Predictive Value (NPV), (iii) Matthews Correlation Coefficient (MCC), (iv) sensitivity, (v) specificity, and (vi) F1 score, and the comparison showed a 7.7% increase in the average correlation coefficient.
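All six parameters used in [4] can be derived from the four confusion-matrix counts of a binary classifier. The sketch below gives their standard definitions; the `binary_metrics` helper and the example counts are hypothetical, and no results from [4] are reproduced.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts.
    Assumes every denominator except the MCC one is non-zero."""
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1, "mcc": mcc}

print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, MCC uses all four counts symmetrically, which is why it is often reported alongside sensitivity and specificity when the classes are imbalanced.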

Multiple diseases such as diabetes, heart disease, liver disease, dengue and hepatitis were identified by recognizing patterns in the available data and classifying them using ML classifiers [29, 27, 47]. These studies used high-dimensional, multimodal datasets to predict the diseases accurately. The deterioration of a patient's condition was predicted using ML techniques such as ML pipelines and SVM classifiers with 5-fold cross-validation, based on baseline variables from MRI [79]. AI applications in medical imaging and DL tools for prediction and pattern recognition were described in [87]. In addition, ANN and CNN were found useful for predicting disease by analyzing image patterns, with the disease then classified using classifiers [62, 63].

Various algorithms were reviewed to detect errors in diagnosis systems, highlighting the importance of ML and DL for early diagnosis of disease [81]. In contrast, [104] discussed three main challenges in diagnosing cancer from medical images: (i) coping with image variations, (ii) learning from weak labels, and (iii) interpreting the results accurately. TL was used to cope with image variations, Multiple Instance Learning (MIL) was used to handle weakly labeled data, and weighted TL was used to improve the accuracy of disease classification. The study suggested learning the relation between an image label and a collection of image instances rather than learning from individual instances. The main advantage of this approach is that it does not require local manual annotations.
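Under the standard MIL assumption, a bag of instances (for example, the patches of one image) is positive if at least one instance is positive, so only the image-level label is needed. The sketch below shows only this bag-level aggregation with max pooling. The instance scorer is a hypothetical placeholder (mean patch intensity standing in for a trained patch classifier), and the threshold of 0.5 is an assumption; neither reflects the model of [104].

```python
import numpy as np

def bag_score(instance_scores):
    """MIL max pooling: a bag is as suspicious as its most suspicious instance."""
    return float(np.max(instance_scores))

def predict_bags(bags, instance_model, threshold=0.5):
    """Label each bag 1 if its highest instance score reaches the threshold."""
    return [int(bag_score(instance_model(bag)) >= threshold) for bag in bags]

def toy_instance_model(patches):
    """Hypothetical patch scorer: mean intensity of each patch stands in for a
    trained classifier's tumour probability."""
    return patches.reshape(len(patches), -1).mean(axis=1)

rng = np.random.default_rng(0)
normal_image = rng.uniform(0.0, 0.3, size=(9, 4, 4))  # 9 patches, all low scores
lesion_image = normal_image.copy()
lesion_image[4] = 0.9                                 # one bright "lesion" patch
print(predict_bags([normal_image, lesion_image], toy_instance_model))  # [0, 1]
```

No patch-level annotation was needed to label the second image as positive, which mirrors the advantage described above.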

Table 3 summarizes the current ML and DL techniques for medical imaging, the parameters considered when selecting the classifiers, the identified diseases and the evaluation metrics. Early tumor detection can also help clinicians to treat patients in a timely manner.

Table 3 Summary of existing works related to ML and DL techniques for medical imaging

5 Modalities for medical image

This section addresses research question 2 (refer subsection 3.1). The medical image modalities (I-Scan-2, CT-Scan, MRI, X-Ray, Mammogram and Electrocardiogram (ECG)) used for classifying diseases in the primary studies are shown in Table 4. As observed, the following modalities were used for evaluating medical data with ML and DL techniques.

  • MRI: It uses magnetic resonance to obtain electromagnetic signals. These signals are generated by human organs and are used to reconstruct information about organ structure [91]. High-resolution MRIs contain more structural detail, which is required to locate lesions and diagnose disease.

  • CT-Scan: It is a technology which generates 3-D images from 2-D X-Ray images using digital geometry [88].

  • Mammogram: Mammograms are used for effective breast cancer screening and early detection of abnormalities in the breast. Calcifications and masses are considered the most common abnormalities associated with breast cancer [5].

  • ECG: It records the electrical activity of the heart and is used to detect cardiac problems in humans [8, 9, 105].

Table 4 Modalities for medical imaging and digital signal

6 Tools and techniques

This section addresses research question 3 (refer subsection 3.1). After a thorough analysis of the primary studies, various techniques (refer Table 6) and tools (refer Fig. 4) related to ML and DL for healthcare were identified [67, 89]. These techniques were applied to images acquired with modalities such as MRI, CT-Scan and X-Rays. To automate image segmentation and classification, programming languages such as R, MATLAB and Python were used to obtain accurate results. Subsections 6.1 and 6.2 explain the tools and techniques used for medical images in the primary studies, respectively.

Fig. 4

Tools used for medical image analysis

6.1 Tools used for medical images

Figure 4 depicts the percentage share of the tools (Table 5) used in the primary studies to implement ML and DL models. MATLAB (38%) and NumPy (37%) are the most used, signifying their popularity among researchers, followed by R (13%) and TensorFlow (12%).

Table 5 Tool description

6.2 Techniques used for medical images

This subsection includes the description and identification of the most common ML and DL techniques (i) used for disease classification, detection and diagnosis, (ii) based on type of disease, and (iii) used for EEG and MEG data processing.

6.2.1 Description of techniques

  • CNN: It is a type of DNN, used to analyze images, that comprises three components (Refer Fig. 5). The components of CNN are as follows:

    1. a.

      Convolutional Layer: It applies filters systematically across the input image to create feature maps that summarize the features present in the image.

    2. b.

      Pooling Layer: It is typically placed between successive convolutional layers in a model. It operates on each feature map received from the convolutional layer to produce a new set of pooled feature maps. Pooling reduces the size of each feature map while keeping the most relevant values, which reduces computation and helps to control overfitting. Its two main functions are average pooling and maximum pooling.

    3. c.

      Fully-Connected Layer: It is simply the feed-forward neural network where input is received from the final pooling layer. Based on the extracted features, a fully connected layer predicts the image class.

  • ANN: The flowchart in Fig. 6 depicts the working of the ANN architecture. The model extracts the required data from the input image and normalizes it. Once the images are normalized, random weights are assigned to all connections in the network. The dataset is then divided in the ratio 80:20. Next, a training algorithm is selected to minimize the error, and if errors are identified, the weights are recalibrated. Finally, the model is tested and validated for further evaluation.

  • TL: It transfers selected features from a pretrained model to another model to solve a new problem. It selects features, or learns, from the previous model and applies those features or that learning to the new model to address different issues. Conventional ML models address only one specific task, whereas TL can be applied to more than one problem, making it more reliable and efficient (Refer Fig. 7).

  • RF: Fig. 8 illustrates the working of the RF algorithm, in which the algorithm randomly selects data from the given dataset. The GINI index given in Eq. 1 is then applied to select the best possible split of the dataset. Splitting is repeated until the subsets become too small to split further.

    $$\mathrm{GINI}(T) = 1 - \sum_{j=1}^{n} p_j^{2}$$
    (1)

    where $T$ is the dataset, $n$ is the number of classes, and $p_j$ is the relative frequency of class $j$ in $T$.

  • DT: It is a supervised ML algorithm that divides a problem into small subproblems. It consists of a root node, internal nodes and leaf nodes. As shown in Fig. 9, the root node represents the best selected feature, the internal nodes represent new subsets or features, and the leaf nodes represent the outcomes of the internal nodes.

  • SVM: It is a supervised ML algorithm used for classification and regression problems. It is well known for predicting the class of unseen data: in the basic binary setting, it assigns unseen data to one of two categories using a separating hyperplane learned from the labeled dataset (Refer Fig. 10).

Fig. 5

CNN architecture
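To make the convolutional and pooling layers of the CNN described above concrete, the sketch below implements a single-channel 2-D convolution (strictly, the cross-correlation used by most DL libraries), a ReLU activation and 2 × 2 max pooling in NumPy. The edge-detecting kernel and the toy input are illustrative assumptions; a real CNN learns many kernels per layer and is normally built with a framework such as TensorFlow.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    fm = feature_map[:h, :w]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3:] = 1.0                    # dark left half, bright right half
kernel = np.array([[-1.0, 1.0]] * 2)  # responds to dark-to-bright vertical edges
fmap = relu(conv2d(image, kernel))    # feature map, shape (5, 5)
pooled = max_pool(fmap)               # pooled map, shape (2, 2)
print(fmap.shape, pooled.shape)
print(pooled)
```

The feature map is non-zero only along the vertical edge, and pooling keeps that response while shrinking the map from 5 × 5 to 2 × 2; a fully connected layer would then take the flattened pooled maps as input.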

Fig. 6

ANN architecture
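The ANN workflow described above (normalize the inputs, assign random initial weights, split 80:20, then repeatedly adjust the weights to reduce the error) can be sketched as a one-hidden-layer network trained by gradient descent. Everything in the sketch, including the layer size, learning rate, number of epochs, synthetic data and the `train_ann` helper, is an illustrative assumption rather than a configuration from the primary studies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ann(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """One-hidden-layer network for 0/1 labels, trained by full-batch gradient
    descent on the mean binary cross-entropy. Returns the weights and loss history."""
    rng = np.random.default_rng(seed)
    # Random initial weights for every connection, zero biases.
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)   # hidden activations
        p = sigmoid(h @ W2 + b2)   # predicted probability of class 1
        losses.append(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
        # Backpropagate the error and recalibrate the weights.
        g_out = (p - y) / len(y)                  # gradient w.r.t. the output logit
        g_h = np.outer(g_out, W2) * (1 - h ** 2)  # gradient w.r.t. hidden pre-activations
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(int)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.7, (100, 2)), rng.normal(1, 0.7, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
X = (X - X.mean(axis=0)) / X.std(axis=0)       # normalize the inputs
idx = rng.permutation(len(y))
train, test = idx[:160], idx[160:]             # 80:20 split
params, losses = train_ann(X[train], y[train])
print(losses[0], losses[-1], np.mean(predict(params, X[test]) == y[test]))
```

The printed values show the training loss before and after training and the accuracy on the held-out 20%.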

Fig. 7

TL architecture

Fig. 8

RF architecture
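The GINI index of Eq. 1, and the split search that RF and DT perform with it, can be sketched as follows. The sketch evaluates every threshold of one numeric feature and keeps the split with the lowest size-weighted Gini impurity of the two child nodes. The helper names are hypothetical, and a full RF would additionally draw bootstrap samples and random feature subsets for each tree.

```python
import numpy as np

def gini(labels):
    """Gini impurity, Eq. 1: 1 - sum_j p_j^2 over the classes present in labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def best_split(feature, labels):
    """Return (threshold, weighted_gini) for the rule feature <= threshold
    that minimizes the size-weighted impurity of the two children."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    best = (None, np.inf)
    for t in np.unique(feature)[:-1]:  # the largest value would leave the right child empty
        left, right = labels[feature <= t], labels[feature > t]
        w = (left.size * gini(left) + right.size * gini(right)) / labels.size
        if w < best[1]:
            best = (float(t), w)
    return best

y = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
print(gini(y))           # 0.5
print(best_split(x, y))  # (3.0, 0.0)
```

A node with two equally frequent classes has impurity 0.5, and the threshold 3.0 produces two pure children, so it is chosen as the best split.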

Fig. 9

DT architecture

Fig. 10

SVM architecture
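For the binary setting described in the SVM item above, a linear SVM finds the hyperplane w·x + b = 0 with the largest margin. The soft-margin version is equivalent to minimizing the L2-regularized hinge loss. The sketch below trains one with plain sub-gradient descent in NumPy. The regularization strength, learning rate and synthetic data are assumptions, kernels are omitted, and practical work would normally use an optimized library solver.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Soft-margin linear SVM: minimize lam*||w||^2 + mean(max(0, 1 - y*(Xw + b)))
    by full-batch sub-gradient descent. Labels y must be -1 or +1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        grad_w = 2 * lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, X):
    """Assign each row of X to class +1 or -1 by its side of the hyperplane."""
    return np.where(X @ w + b >= 0, 1, -1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w, b = train_linear_svm(X, y)
print(w, b, np.mean(svm_predict(w, b, X) == y))
```

Only the points with margin below 1 contribute to the hinge-loss gradient. These are the points that end up determining the hyperplane, which is the intuition behind the name "support vectors".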

6.2.2 ML and DL techniques

Table 6 summarizes the ML and DL techniques used for medical imaging in the primary studies, such as Naïve Bayes [43, 69], KNN [6], DTs [36, 48], neural networks and SVM [59, 73, 90]. Column 1 lists the articles and row 1 lists the techniques; “+” and “-” indicate that an article does or does not use the corresponding technique, respectively. The most reliable ML and DL techniques for each type of disease are shown in Table 7, and the most significant ML and DL techniques for EEG and MEG data processing are shown in Table 8.

Table 6 ML and DL techniques used for medical imaging
Table 7 ML and DL techniques based on the type of disease
Table 8 ML and DL techniques for EEG and MEG data processing

7 Dataset description

This section addresses research question 4 (refer subsection 3.1) by providing details of the datasets used in the primary studies for implementing ML and DL algorithms. Table 9 describes datasets such as MRI, X-Rays, lesion data, infra-red images and CT-Scan. Dataset accessibility is categorized as (i) public (available in online repositories) or (ii) self-created (created by the authors).

Table 9 Dataset description

8 Experimental description

This section addresses research question 5 (refer subsection 3.1). An MRI dataset is used in the experiments to compare ML classifiers and DL models. The Dataset¹ description and the experimental setup are discussed in subsections 8.1 and 8.2, respectively, and the methodology and results in subsections 8.3 and 8.4, respectively.

8.1 Dataset description

The brain tumor classification experiments use a publicly available brain tumor MRI dataset (https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset). The MRI data used contain 711 images belonging to two classes: meningioma tumor and no tumor. The dataset is divided into training and testing parts, and its images have different resolutions.

8.2 Experimental setup

The whole series of experiments was performed on a 64-bit computer with an Intel(R) Core(TM) i3-10110U CPU @ 2.10 GHz 2.59 GHz and 8 GB RAM. To train and validate the models, the code was implemented in Python on the Google Colab platform.

8.3 Methodology

Figure 11 depicts the methodology used in the experiments for disease classification. It is described as follows:

  1. Import dataset: The dataset¹ is retrieved from a public website and contains two categories: no tumor and meningioma tumor. Because the images in the dataset have different dimensions, they were all resized to 200 × 200 pixels.

  2. Label dataset: The images are labeled 0 (no tumor) and 1 (meningioma tumor).

  3. Split dataset: The dataset is then split in the ratio 80:20 into training (80%) and testing (20%) sets.

  4. Feature scaling and feature selection: ML algorithms operate on numeric values without knowing what each value represents, so features with large ranges can dominate features with small ranges. Feature scaling resolves this problem by mapping all features into a common, predefined range. In this experiment, PCA is used to reduce the number of features and retain the required ones.

  5. Apply ML classifiers and DL models: ML classifiers (SVM, RF, DT and LR) and DL models (CNN and ResNet50V2) are used to classify the images into two categories, 0 and 1.

  6. Prediction and testing the model: Each trained model was evaluated on the testing data (20% of the dataset) by predicting the disease class of each image.

  7. Metrics calculation: The predictions of the classifiers are summarized with confusion matrices, shown in Fig. 12. Each confusion matrix gives the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), from which the accuracy is calculated.
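The labeling, 80:20 split and feature-scaling steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`split_dataset`, `fit_min_max`, `apply_min_max`) are hypothetical, the fixed seed and min-max scaling to [0, 1] are assumed choices, the data are a toy stand-in for 711 flattened images, and image loading, resizing and PCA are omitted to keep the sketch dependency-free.

```python
import random

# Class labels as described in the methodology: 0 = no tumor, 1 = meningioma tumor.
LABELS = {"no tumor": 0, "meningioma tumor": 1}


def split_dataset(samples, train_ratio=0.8, seed=42):
    """Hypothetical helper: shuffle (features, label) pairs, split into train/test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for reproducibility
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Rescale each feature with the training range; training values land in [0, 1]."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Toy stand-in for 711 images, each reduced to two numeric features (assumption).
data = [([float(i), (i % 7) * 10.0],
         LABELS["meningioma tumor"] if i % 2 else LABELS["no tumor"])
        for i in range(711)]
train, test = split_dataset(data)
mins, maxs = fit_min_max([x for x, _ in train])
train_x = apply_min_max([x for x, _ in train], mins, maxs)
test_x = apply_min_max([x for x, _ in test], mins, maxs)
print(len(train), len(test))  # 568 143
```

Fitting the scaling range on the training set only, then reusing it for the test set, keeps information from the test images out of training; for this reason, test values may fall slightly outside [0, 1].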

Fig. 11 Methodology used for disease prediction

8.4 Results

This subsection discusses the results obtained by the ML classifiers and DL models, as shown in Fig. 12 and Table 10. Figure 12a, b, c and d illustrate the confusion matrices obtained from SVM, LR, RF and DT, respectively. Table 10 shows the accuracy obtained by each ML classifier and DL model on the MRI dataset. CNN and RF achieve the highest accuracy, 97.6% and 96.93%, respectively.
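For binary labels (0 = no tumor, 1 = meningioma tumor), the confusion-matrix counts and the accuracy reported in Table 10 are related by Accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below computes both from true and predicted labels. It assumes class 1 (meningioma tumor) is the positive class, and the label lists are toy values, not predictions from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary task with the given positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted tumor, actually tumor
            else:
                fp += 1  # predicted tumor, actually no tumor
        else:
            if t == positive:
                fn += 1  # predicted no tumor, actually tumor
            else:
                tn += 1  # predicted no tumor, actually no tumor
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Toy labels only (not results from this study).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```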

Fig. 12 Confusion matrices for a SVM, b LR, c RF, and d DT

Table 10 Accuracy results for MRI images using ML classifier/DL model

9 Analytical discussion

The primary studies were analyzed by publisher citation count, year-wise publications, keywords, diseases, techniques, imaging modalities and type of publication.

9.1 Publisher by citations

The influence of publishers in this domain is assessed by the citations of the articles they published. Figure 13 shows the publishers considered for this review between 2014 and 2022, and Fig. 13a and b depict the number of citations of ML and DL articles, respectively, for each publisher. Citation counts in Google Scholar vary over time and across indexing procedures. Most of the ML and DL articles were published by ScienceDirect and IEEE, respectively, with maximum citation counts of 2,425 and 42,866.

Fig. 13 Publisher by number of citations for a ML articles, b DL articles

Fig. 14 Year-wise publication of ML and DL in healthcare

9.2 Scholarly articles published between 2014 and 2022

Figure 14 shows that, of the 40 primary studies, most ML articles were published in 2020 (10 articles, 25% of the total), followed by 2021 (8, 20%), 2019 (6, 15%) and 2022 (4, 10%). The years 2017 and 2018 each account for 3 articles (7.5%), 2014 and 2016 for 2 articles each (5%), and 2015 for 1 article (2.5%). Thus, most of the primary studies are from 2020, and comparatively few are from 2014 to 2017.

9.3 Most commonly used keywords in the primary studies

A word cloud is a simple way to identify the relevant terms and themes in the referenced research articles. Figure 15 shows the word cloud, in which the most frequently used keywords appear in a larger font and less frequent keywords in a smaller font.
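The frequency-to-font-size mapping behind a word cloud can be sketched as below. The linear scaling, the 10 to 48 pt size range, the case-insensitive counting and the sample keywords are illustrative assumptions, not the settings used to produce Fig. 15.

```python
from collections import Counter


def font_sizes(keywords, min_size=10, max_size=48):
    """Map each keyword's frequency linearly onto [min_size, max_size] points."""
    counts = Counter(k.lower() for k in keywords)  # case-insensitive counting
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, n in counts.items():
        if hi == lo:
            sizes[word] = max_size  # all keywords equally frequent
        else:
            sizes[word] = min_size + (n - lo) * (max_size - min_size) / (hi - lo)
    return sizes


# Hypothetical keywords pooled from several articles.
keywords = ["Deep learning", "CNN", "deep learning", "SVM", "MRI",
            "deep learning", "CNN", "SVM", "deep learning"]
sizes = font_sizes(keywords)
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.1f} pt")
```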

Fig. 15 Word cloud for frequently used keywords

9.4 Disease types

Figure 16 depicts the percentage of each disease diagnosed in the primary studies. Breast disease is the most frequently studied (21%), followed by brain tumor (18%), diabetes (16%) and lung disease (16%). Other diseases, such as eye, liver and skin diseases, hepatitis and cancer, were also diagnosed using various techniques.

Fig. 16 Percentage of diseases

9.5 Publication by techniques and statistical analysis of techniques

Researchers have used multiple techniques to attain better results, as shown in Table 5. For classification, ML classifiers such as SVM, RF and Naïve Bayes were used, often in combination. Detection was performed using neural networks such as ANN or CNN, and TL was applied frequently because it allows models pretrained on large datasets to be reused. Figure 17 depicts the percentage of each technique used in the primary studies; SVM (20%) is the most widely used technique for medical image classification.

Fig. 17 Percentage of ML and DL techniques in healthcare

The statistical analysis of ML and DL techniques for medical diagnosis is represented in Fig. 18.

Fig. 18 Statistical analysis of ML and DL techniques for medical diagnosis

9.6 Imaging modalities

Figure 19 shows the imaging modalities used for the evaluation of medical images. MRI/X-Ray dominates the subject area with 45%, followed by CT-Scan (30%), mammogram (10%) and I-Scan-2 (10%). Moreover, to automate the retrieval and analysis of features, computer-aided diagnosis (CAD) was used for the detection of hepatitis and cancer [55, 60].

Fig. 19 Percentage of modalities used in medical imaging

9.7 Type of publication

Figure 20 illustrates the distribution of the reviewed articles by type of publication. The majority of the articles were taken from journals (70%), followed by book chapters (8%), conference proceedings (7%), workshop articles (2%) and others (13%), including society articles, online database articles, and articles from publishers such as Bentham Science, Springer archives and transcript.

Fig. 20 Percentage of type of publication

10 Discussion

This study observed that the variability in the literature arises from uncertainty in the evaluated data and models (refer Fig. 21). Data uncertainty arises from multiple sources, such as transmission noise, missing values and measurement noise, whereas model uncertainty arises from limited understanding of the architecture and from predicting future data with the learned parameters. This uncertainty helps explain why different methods yield different results. Recently, many advanced technologies have been introduced to acquire enormous amounts of raw data in different scenarios.

Fig. 21 Data and model uncertainties

Further, the literature shows that every aspect of the data (noisy or clean) must be considered, as it affects the results. Using an appropriate algorithm to analyze images can increase the success rate. Deviations from the expected results can therefore stem from raw data that contains a certain amount of noise (refer Fig. 22). CNN is relatively insensitive to noise and can therefore extract information from noisy data [44]. Moreover, Hermitian basis functions were used to extract information from ECG data, which reduces the effects of Gaussian noise.
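The effect of noise described above can be illustrated with a synthetic signal: adding Gaussian noise increases the error with respect to the clean signal, and even a simple smoothing filter reduces it. The sketch uses a centered moving average chosen only for brevity; it is not the Hermitian-basis approach or a DL denoiser, and the sine signal, noise level (sigma = 0.3), window size and seed are assumptions.

```python
import math
import random


def add_gaussian_noise(signal, sigma, seed=0):
    """Return a copy of the signal with zero-mean Gaussian noise of std sigma."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def moving_average(signal, window=5):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


clean = [math.sin(2 * math.pi * t / 100) for t in range(400)]
noisy = add_gaussian_noise(clean, sigma=0.3)
denoised = moving_average(noisy)
print(f"MSE noisy:    {mse(clean, noisy):.4f}")
print(f"MSE denoised: {mse(clean, denoised):.4f}")
```

Averaging neighboring samples cancels much of the independent noise while barely changing a slowly varying signal; on real images or ECG data, stronger filters can also blur genuine detail, which is one reason more specialized denoising methods are used.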

Fig. 22 Noisy data [11]

Therefore, handling the uncertainty of data and models with ML and DL techniques is the most important issue for researchers to address. These techniques help obtain accurate results for decision making in each domain [2, 345, 64, 75, 93]. Hence, the variance in ML and DL techniques such as RF, Rubber Sheet Normalization, DT, bagging-boosting, ANN, CNN, SVM, TL, Bayes Net and GLCM needs to be addressed. Such strategies can then be used to deal with ambiguity in medical data and achieve high performance. This review also indicates that medical professionals may be able to treat tumors promptly if they are identified early.

11 Conclusions and future work

This study provides an overview of various ML and DL approaches for disease diagnosis, along with classification, imaging modalities, tools, techniques, datasets and challenges in the medical domain. MRI and X-Ray scans are the most commonly used modalities for disease diagnosis. Among the tools and techniques studied, MATLAB and SVM, respectively, were the most used. MRI datasets are widely used by researchers. A series of experiments on an MRI dataset provided a comparative analysis of ML classifiers and DL models, in which CNN (97.6%) and RF (96.93%) outperformed the other algorithms. This study indicates a need to include denoising techniques with DL models in the healthcare domain. It also concludes that various classical ML and DL techniques are extensively applied to deal with data uncertainty. Owing to their superior performance, DL approaches have recently become quite popular among researchers. This review will assist the healthcare community, physicians, clinicians and medical practitioners in choosing an appropriate ML or DL technique to diagnose disease in less time and with high accuracy.

Future work will apply DL approaches to the diagnosis of various diseases, including noise removal from the given dataset. Additional aspects and properties of DL models for medical images can also be explored. Because higher accuracy requires large amounts of data, the models should be improved to handle large datasets. Different data augmentation techniques, together with the required features of the dataset, can also be explored to attain better accuracy.
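As an illustration of the data augmentation mentioned above, a few common geometric transforms (horizontal flip, vertical flip and 90° rotation) can be sketched on an image stored as a 2-D list of pixel values. These particular transforms and the helper names are illustrative assumptions; for MRI data, whether a given flip or rotation is anatomically plausible should be checked before it is used.

```python
def flip_horizontal(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the row order (top-bottom flip), copying rows to avoid aliasing."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus its augmented variants."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90_clockwise(img)]


image = [[1, 2, 3],
         [4, 5, 6]]  # toy 2 x 3 "image"
for variant in augment(image):
    print(variant)
```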