Article

A Novel Blunge Calibration Intelligent Feature Classification Model for the Prediction of Hypothyroid Disease

by Munisamy Shyamala Devi 1, Venkatesan Dhilip Kumar 1, Adrian Brezulianu 2,3,*, Oana Geman 4 and Muhammad Arif 5
1 Department of CSE, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai 600062, India
2 Faculty of Electronics, Telecommunications and Information Technology, “Gheorghe Asachi” Technical University, 700506 Iasi, Romania
3 Greensoft Ltd., 700137 Iasi, Romania
4 The Computers, Electronics and Automation Department, Faculty of Electrical Engineering and Computer Science, “Stefan cel Mare” University of Suceava, 720229 Suceava, Romania
5 Department of Computer Science, Superior University, Lahore 54000, Pakistan
* Author to whom correspondence should be addressed.
Sensors 2023, 23(3), 1128; https://doi.org/10.3390/s23031128
Submission received: 22 November 2022 / Revised: 13 January 2023 / Accepted: 13 January 2023 / Published: 18 January 2023
(This article belongs to the Section Biomedical Sensors)

Abstract:
According to the Indian health line report, 12% of the population suffers from abnormal thyroid functioning. The major challenge with this disease is that hypothyroidism may not produce any noticeable symptoms in its early stages. However, delayed treatment may lead to several other health problems, such as fertility issues and obesity; early treatment is therefore essential for patient survival. The proposed technology could be used to predict hypothyroid disease and its severity during its early stages. Although several classification and regression algorithms are available for predicting hypothyroidism from clinical information, there exists a gap in knowledge as to whether the predicted outcomes can reach a higher accuracy. Therefore, the objective of this research is to predict the existence of hypothyroidism with higher accuracy by optimizing the estimator list of the PyCaret classifier model. With this overview, a blunge calibration intelligent feature classification model that supports the assessment of the presence of hypothyroidism with high accuracy is proposed. A hypothyroidism dataset containing 3163 patient details with 23 independent features and one dependent feature from the University of California Irvine (UCI) machine-learning repository was used for this work. We undertook dataset preprocessing and determined its incomplete values. Exploratory data analysis was performed to analyze all the clinical parameters and the extent to which each feature supports the prediction of hypothyroidism. ANOVA was used to verify the F-statistic values of all attributes that might highly influence the target. Then, hypothyroidism was predicted using various classifier algorithms, and the performance metrics were analyzed. The original dataset was subjected to dimensionality reduction by using regressor and classifier feature-selection algorithms to determine the best subset of components for predicting hypothyroidism. The feature-selected subset of the clinical parameters was subjected to various classifier algorithms, and its performance was analyzed. The system was implemented with Python in the Spyder editor of the Anaconda Navigator IDE. Investigational results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% for the regressor feature-selection methods. The blunge calibration regression model (BCRM) was designed with naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization and with soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCRM showed 99.5% accuracy in predicting hypothyroidism. The implementation results show that the Kernel SVM, KNeighbors, and Ridge classifiers maintained an accuracy of 87.5% for the classifier feature-selection methods. The blunge calibration classifier model (BCCM) was developed with Kernel SVM, KNeighbors, and Ridge as the estimators, with accuracy optimization and with soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCCM showed 99.7% accuracy in predicting hypothyroidism. The main contribution of this research is the design of the BCCM and BCRM models, which were built with accuracy optimization and soft blending based on the sum of the predicted probabilities of the classifiers. The uniqueness of the BCRM and BCCM models is achieved by updating the estimator list at runtime with the classifiers and regressors that best suit the application.

1. Introduction

For the accurate diagnosis of thyroid illness, functional data from the thyroid gland must be interpreted. Hypothyroidism is a condition in which the thyroid gland is unable to secrete enough thyroid hormone. Women are eight times as likely as men to suffer from a thyroid condition. Thyroid conditions tend to worsen and persist with ageing and may affect adults differently than children. The thyroid gland mainly helps in controlling the body’s metabolism. Globally, thyroid disorders have become more prevalent. For instance, one in eight women in Romania suffers from thyroid cancer, and approximately 30% of Romanians have endemic goiters. A limited diet, the use of drugs, anxiety, sickness, trauma, and pollutants, among other factors, can all affect thyroid function. Classification, a crucial supervised learning data-mining approach, can be used to categorize predetermined data sets.

2. Literature Review

The thyroid data included in UC Irvine’s knowledge discovery archive have been examined using machine learning [1]. Thyroid disease has been classified as a common thyroid dysfunction in the general population. The findings demonstrate the high accuracy of each of the examined classification models, among which the decision tree model achieved the highest classification rate. The infrastructure for creating and evaluating the models was provided by the KNIME analytics platform and Weka, two data-mining applications [2]. Classification is commonly used in the healthcare sector to inform business choices, diagnose patients, and provide them with exceptional care [3].
The precise estimation of thyroid gland operational information is critical for thyroid diagnosis. The thyroid gland mainly aids in the control of an individual’s metabolism. The types of thyroid disease are determined by the production of either too little or too much thyroid hormone. Various neural networks have been used to aid in the analysis of thyroid disease [4]. These networks aimed to diagnose thyroid disease by using a new hybrid machine-learning method that includes a classification system. A method for solving this diagnosis problem via classification was obtained by hybridizing AIRS with advanced fuzzy weighted pre-processing. A cross-validation analysis was used to determine the technique’s soundness for sampling variability [5]. A novel hybrid machine-learning approach that incorporates this classification system was used to identify thyroid illness. AIRS and sophisticated fuzzy weighted pre-processing were combined to create a strategy for categorizing this diagnostic issue. By using cross-validation analysis, the technique’s robustness to sampling variability was evaluated [6]. The expansion of scientific knowledge and the massive production of data have resulted in exponential growth in databases and repositories. One of these rich data domains is the biomedical domain. A large amount of biomedical data is available, ranging from clinical symptom details to various types of biochemical data and imaging device outputs. Mechanically retrieving biological information from images and reshaping it into machine-readable knowledge is a challenging task, because the biomedical domain is vast, dynamic, and complicated. Data mining can improve the quality of biomedical pattern extraction [7].
The backpropagation algorithm is an early method for the detection of thyroid disease. An artificial neural network (ANN) was created using backpropagation of error for prior disease diagnosis. This ANN was then trained using empirical values, and testing was performed using information that had not been used during the training process [8]. Data collection is an important methodological approach in the medical disciplines, because efficient techniques for analyzing and identifying disorders are required. Data-mining applications are used in clinical governance, health information technology, and patient care systems. Data mining is also important in determining disease resilience. The popular data-mining techniques used to recognize the complex parameters of the nutrition data set are classification and clustering [9].
A novel approach was used for the detection of three types of anomalous red blood cells, known as poikilocytes, found in iron-deficient blood smears. The classification and counting of poikilocytes are critical steps in the rapid recognition of iron-deficiency anemia (IDA). The three basic poikilocytes in IDA are the dacrocyte, elliptocyte, and schistocyte [10]. High-dimensional biomedical datasets contain thousands of features that are used to diagnose genetic diseases, but their predictive accuracy is affected by numerous irrelevant or weakly connected features. While minimizing computational complexity in data mining, feature-selection techniques enable classification models to precisely discover patterns in features and determine a feature vector from an initial set of features. An enhanced shuffled frog-leaping algorithm (ISFLA) explores the space of potential subsets to choose the set of attributes that maximizes prognostication while minimizing irrelevant attributes in high-dimensional biological data [11]. The ANN-based finite impulse response extreme learning machine (FIR-ELM) was used to further analyze the categorization of two binary bioinformatics datasets, for leukemia and colon tumors. To investigate the smoothing capabilities of the hidden layer of the FIR-ELM neural classifier for feature identification, a time-series analysis of the microarray samples was performed, after which the degree of linear divergence of the data patterns in the microarray datasets was determined [12].
For the optimal feature-selection problem, the authors describe a coherent analytical foundation that can retrofit successful heuristic criteria, indicating the approximate solutions made by each method [13]. The outcome of a microstructure heart-arrhythmia detection system based on electrocardiogram (ECG) signal features was analyzed. These signals came from the MIT/BIH arrhythmia directory. Initially, Hermitian basis functions (HBF) were used to model the ECG beats. The width parameter (sigma) of the HBF was optimized in this step to minimize the model error. The extracted features, which contain the model’s boundary conditions, were used as input for the k-nearest neighbor (KNN) classifier to evaluate the model’s effectiveness [14]. Approximately 90% of patients with Parkinson’s disease are predicted to have vocal and speech issues. The vocal folds are often weakened by this condition, causing patients to have an unnatural voice. Different samples of the auditory signal of patients with Parkinson’s disease and of healthy individuals were gathered. The data classification was then completed using the KNN classification approach based on varied numbers of optimized features, after the optimized features that influenced the data classification process were determined using a genetic algorithm [15]. Although thrombolysis reduces impairment and increases survival rates in patients with ischemic stroke, some people continue to suffer detrimental effects. Consequently, when making health decisions, it is beneficial to predict how patients with myocardial infarction might react to regional rehabilitation [16].
Straightforward mathematical assessment criteria need to be established to generate and quantify pragmatic forecasts in cerebral ischemia using data that are readily available post-surgery in the emergency unit. Regression was used to investigate the causes of inferior outcomes in the originating sample of formerly independent people with information systems. The covariate correlations from the computed holistic framework were used to build an integer-based scoring model for each correlation coefficient, and the average of the scores for the criterion was used to obtain the total result [17]. This process aims to offer a self-contained method for improving learning-argumentation frameworks that employ deformation key frames of MR images to aid in the rational frameworks of ischemic stroke diagnosis. Anthropological, physiological, and statistical features were gathered from the fragmented tumors to form a feature set that was then further defined using classification techniques. The results of the recommended approach, which designates electromagnetic fields as vascular tumors with 93.4% accuracy, are significantly superior to those of the classification model [18]. Among many other clinical and imaging parameters, ageing and the severity of a hemorrhage are immediate, precise indicators of the likelihood of SICH and of the results of treatment following intravenous infusion therapy [19]. The use of aided technology for stroke could reduce the evaluation period, improve prediction performance, make it simpler to discriminate between different types of ICH, and reduce the chance of human error [20].
One study presents the improvements in learning methods and developments that are in line with the different varieties and manifestations of dyslexia. This study opens with a discussion of cosmic mythology and examines how learning environments that consider students’ skills and requirements can be combined with the appropriate assistive technology to deliver effective e-learning experiences and reliable instructional resources. The Ontology Web Language, a data-handling framework that enables programmers to handle both the substance and the presentation of the data available on the web, was used to generate the ontology used in this evaluation [21]. The methodology was designed and implemented to help identify the fundamental problems that may affect students learning to read or write, problems that may then lead to further problems with memory cognition. This strategy was used to assist activists and parents in understanding the issue of dyslexia and to put children on the right path to academic success [22]. Participants with and without dyslexia used an online game with language-independent melodic and visual components to communicate in different languages. A total of 178 participants were involved. The analysis revealed nine game measures for Spanish children with and without dyslexia that showed significant differences and could be used in current projects as a justification for speech-independent exploration [23]. Quantitative and artificial intelligence-based methods are recommended to automatically seek innovative and complicated features that provide reliable discrimination between dyslexic and control listeners and to support the hypothesis that the majority of differences between dyslexic and skilled readers are located on the left side of the brain. Unexpectedly, these devices have also demonstrated how high-pass signals carry vital information [24]. The analysis revealed certain remarkable EEG patterns associated with autism, a learning disability with a neurological basis. Although EEG signals contain important information about mental processes, understanding these processes is typically indirect because of their intricate nature. This approach identifies the optimal EEG channels and brain regions for classification and the distinctive EEG signals produced during writing and composition in adults with dyslexia [25]. The central idea is to begin creating code language for scripting matrices by using the Boolean algebra features of the codes and to present two decryption techniques that enable the identification and retrieval of potential faults or rejections [26]. Dynamic subsamples of ocean climate predictions of anomalous sea-surface temperature outliers in the Tasman Sea were enhanced by the use of reports of extreme sea-surface temperatures derived from the space station’s geographical positioning system. The parameters of an extreme value distribution were predicted using regression analysis on the important marine meteorological data in a probabilistic conceptual structure [27]. Additional or standardized nuclear approaches can be employed to overcome the constraints of current investigations into the original sources of seafood. Cross luminescence and carbon isotope analysis have been used to pinpoint the production method and geographic origin of Asian freshwater fish [28].
Security concerns that develop during earthquake activity, and during periods when the threat of earthquake activity is at its peak, should always be handled probabilistically [29]. In this study, two quantifiable methods for estimating the likelihood of seismic behavior affecting important, relatively low- and mid-rise structures are presented. The non-linear and linear systems separately and simultaneously assess the injury concerns of an inclined plane exposed to uncontrollable shaking and atmospheric threats, respectively. These systems are divided into three parts: danger showcase, underpinning delicacy examination, and destruction-likelihood processing [30].
Numerous well-known classification methods, such as decision tree, ANN, logistic regression, and naive Bayes, were examined in one study. Bagging and boosting procedures were then created to increase the durability of these frameworks. Additionally, random forest was considered when the investigation was evaluated. According to the outcomes, the best-performing sickness-risk random-forest strategy was employed for classification. Subsequently, a web application for predicting future occurrences was created using this approach. People with a higher chance of getting diabetes were included in the diabetes risk class [31]. Heart-rate variation information derived from ECG signal data was used for a further investigation. When the CNN-LSTM was originally tested with the HRV data, the prediction accuracy was 90.9%. By using CNN-LSTM integration, the accuracy was improved to 95.1%, and by using five-fold cross-validation on the same data, the efficiency was enhanced to 93.6%. This cross-validation efficiency is the highest currently available for the automatic identification of hypertension [32]. The information was subjected to several machine-learning approaches, and categorization was carried out using a range of strategies, among which regression analysis resulted in the highest accuracy of 96%. With a 98.8% accuracy rate, the AdaBoost classifier was the pipeline’s most appropriate predictor. Two independent datasets were used to compare the accuracy of the machine-learning methods. The algorithm clearly enhanced diabetes prediction accuracy and precision when utilizing this information compared to previous resources [33].
Additionally, the diabetes mellitus dataset was used to evaluate the effectiveness of various suggested deep neural network and machine-learning classification techniques. The other methods achieved accuracies higher than 90%; the XGBoost classifier, for instance, achieved a performance of approximately 100.0% [34]. Both cutting-edge methodologies and some well-known machine-learning strategies were contrasted with the DNN algorithm. The suggested technique, which depends on the DNN technique, delivered impressive outcomes, with an accuracy of 99.75% and an F1-score of 99.66% [35]. Several papers have reported the application of SVM, KNN, and other machine-learning tools in biomedical applications [36,37,38,39,40]. Automated medical diagnostic systems can be easily accessed by the general public, especially by those who cannot afford quality medical care. This methodology essentially combines soft and hard inputs. A wide range of typical symptoms, including fever, headaches, and cough, were considered soft inputs. Each chosen illness was associated with a range of universal symptoms. Images of the tongue were considered hard inputs because doctors frequently utilize them to identify a variety of illnesses. Hard-input analysis was split into two stages: chromatic color analysis and statistical analysis based on texture. After being decoded from the hard and soft inputs, the feature vectors were supplied to a neural network to create a classification model [41].

3. Research Methodology

A hypothyroidism dataset from the UCI machine-learning repository containing 3163 patient details with 23 independent features and one dependent feature (https://archive.ics.uci.edu/ml/datasets/thyroid+disease, accessed on 12 January 2023) was used, as shown in Equation (1).
$HY = \{ [H_1, H_2, H_3, \ldots, H_{23}],\ [D] \}$  (1)
where $HY$ represents the hypothyroid dataset.
We undertook dataset preprocessing and determined the incomplete values. The incomplete data in the hypothyroidism dataset were imputed by computing the mean of the input values of each attribute, as in Equation (2).
$HY_{ij} = \frac{1}{23} \sum_{i=1}^{23} \sum_{j=1}^{3163} \sum_{v=1}^{23} (HY_{ij})_v$  (2)
Equation (2) expresses the estimation of the null data information; the attribute scaling of the hypothyroidism dataset is given by Equation (3).
$HY' = \frac{1}{23} \sum_{V=1}^{23} (HY - HY')_V$  (3)
where $HY'$ is the complete processed dataset without null values. The imputation deviation of the features was measured using the average of the estimated variance within the hypothyroidism dataset, as shown in Equation (4).
$HY' = \frac{1}{23} \sum_{V=1}^{23} (HY - HY')_V = \frac{1}{23} \sum_{v=1}^{23} \mathrm{variance}\big((HY - HY')_v\big)$  (4)
The imputed dataset was estimated with the interval value “Interval” of each feature by finding its variance, as in Equation (5).
$\mathrm{Interval} = HY' \, \frac{1}{v-1} \sum_{v=1}^{7} HY' \sum_{v=1}^{23} \mathrm{variance}\big((HY - HY')_v\big)$  (5)
The overall architecture of the work is shown in Figure 1. The following contributions are provided in this work.
The final processed data, in which the imputed values contain the complete variance, were estimated using Equation (6) as follows:
$\mathrm{Final}_{HY} = HY' + \left(\frac{v+1}{v}\right) \times \mathrm{Interval}$  (6)
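As a concrete illustration of Equations (2)–(6), the sketch below performs the mean imputation in scikit-learn; the file name, the “?” missing-value marker, and the column handling are assumptions, not details taken from the paper.

```python
# A minimal sketch of the mean-imputation step in Equations (2)-(6), assuming
# the raw CSV encodes missing entries as "?" (file name and column layout are
# illustrative).
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("hypothyroid.csv", na_values="?")      # 3163 rows, 24 columns
numeric_cols = df.select_dtypes(include="number").columns

# Replace every null value with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
```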
An exploratory data analysis was performed to analyze all the clinical parameters and the extent to which each feature supports the prediction of hypothyroidism. The number of parameters, the data types of the features, and the correlation of all variables, as given in Equation (7), were evaluated by subjecting the dataset to exploratory prescriptive data analysis.
$\mathrm{corr} = \dfrac{\sum_{h=1}^{23} (HY_h - \overline{HY}) \, \sum_{d=1}^{1} (D_d - \overline{D})}{\sqrt{\sum_{h=1}^{23} (HY_h - \overline{HY})^2} \, \sqrt{\sum_{d=1}^{1} (D_d - \overline{D})^2}}$  (7)
As stated in Equations (8)–(10), the dataset was divided into training and testing data in an 80:20 ratio. A Python script was used for the implementation in the Spyder editor of the Anaconda Navigator IDE.
$\mathrm{Train}(\overline{HY}) = 80\ \mathrm{percent\ of}\ (\mathrm{Rand})^2 \sum_{hy} (HY_{hy}) \in HY$  (8)
$\mathrm{Test}(\overline{HY}) = 20\ \mathrm{percent\ of}\ (\mathrm{Rand})^2 \sum_{hy} (HY_{hy}) \in HY$  (9)
$(\mathrm{Rand})^2 = \left[ \sum_{h=1}^{23} \frac{(HY_h - HY'_h)^2}{HY_{h-1}} \right]$  (10)
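A hedged sketch of the 80:20 split of Equations (8)–(10) follows; “binaryClass” is an assumed name for the dependent feature D, and the stratified, seeded split is our choice rather than the paper’s.

```python
# Stratified 80:20 hold-out split; df comes from the preprocessing sketch above.
from sklearn.model_selection import train_test_split

X = df.drop(columns=["binaryClass"])    # the 23 independent features H1..H23
y = df["binaryClass"]                   # the dependent feature D

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)
```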
An ANOVA test was carried out to verify the F-statistic values of all features with PR(>F) < 0.05 that highly influence the target. Then, hypothyroidism was predicted using various classifier algorithms, and the performance was analyzed. The original dataset was normalized to prepare it for the ANOVA test. This was achieved by using the Box–Cox method from the SciPy statistical package, together with NumPy and pandas. The Box–Cox approach transforms and normalizes the data to handle non-normally distributed data. The results obtained from the Box–Cox method are shown in Figure 2.
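The sketch below shows one way to chain the Box–Cox normalization and the ANOVA filter; Box–Cox requires strictly positive inputs, so each column is shifted first, and the 0.05 threshold mirrors the text while everything else is an assumption.

```python
# Box-Cox normalization followed by an ANOVA F-test screen (PR(>F) < 0.05).
from scipy.stats import boxcox
from sklearn.feature_selection import f_classif

for col in numeric_cols:                         # numeric_cols excludes the target
    shifted = df[col] - df[col].min() + 1e-6     # Box-Cox needs values > 0
    df[col], _ = boxcox(shifted)

f_stat, p_val = f_classif(df[numeric_cols], y)   # per-feature F and PR(>F)
significant = [c for c, p in zip(numeric_cols, p_val) if p < 0.05]
```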
The original dataset was subjected to dimensionality reduction using the regressor and classifier feature-selection algorithms to determine the best subset of components for predicting hypothyroidism. The feature-selected subset of the clinical parameters was subjected to various classifier algorithms, and the performance was analyzed using the specified metrics. The implementation was carried out with Python in the Spyder editor of the Anaconda Navigator IDE. Investigational results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% for the regressor feature-selection methods. The blunge calibration regression model, as shown in Figure 3, was created with naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers, as shown in Equations (11)–(15).
$\mathrm{GaussianNB} = \frac{1}{\sqrt{2\pi\sigma_h^2}} \exp\left(-\frac{(HY_h - \mathrm{Mean}_s)^2}{2\sigma_s^2}\right)$  (11)
$\mathrm{AdaBoost} = \sum_{h=1}^{23} \frac{(D_d - HY_h^{23}\hat{\beta})^2}{2} + \lambda \left( \frac{1-\alpha}{2} \sum_{h=1}^{23} \hat{\beta}_h^2 + \alpha \sum_{h=1}^{23} \big|\hat{\beta}_h^2\big| \right)$  (12)
$\hat{\beta} = \arg\min \left[ \sum_{s=1}^{9} \left| D_d - \sum_{h=1}^{23} HY_h \right| \right]$  (13)
$\mathrm{Ridge} = \lambda \left( \frac{1-\alpha}{2} \sum_{h=1}^{23} \hat{\beta}_h^2 + \alpha \sum_{h=1}^{23} \big|\hat{\beta}_h^2\big| \right)$  (14)
$\mathrm{BCRM} = \mathrm{Estimator}\{(\mathrm{GaussianNB},\ \mathrm{AdaBoost},\ \mathrm{Ridge})\}$  (15)
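A minimal sketch of the Equation (15) soft blend is given below: scikit-learn’s VotingClassifier with voting="soft" averages (equivalently, sums) the estimators’ predicted class probabilities and takes the argmax. RidgeClassifier exposes no predict_proba, so wrapping it in CalibratedClassifierCV is our assumption for obtaining probabilities to blend.

```python
# BCRM-style soft blend of GaussianNB, AdaBoost, and a calibrated Ridge model.
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.calibration import CalibratedClassifierCV

bcrm = VotingClassifier(
    estimators=[
        ("gnb", GaussianNB()),
        ("ada", AdaBoostClassifier(n_estimators=100, random_state=42)),
        ("ridge", CalibratedClassifierCV(RidgeClassifier(), cv=5)),
    ],
    voting="soft",      # decision = argmax of the summed class probabilities
)
bcrm.fit(X_train, y_train)
print("BCRM accuracy:", bcrm.score(X_test, y_test))
```

The estimator list can be swapped at runtime, which matches the stated uniqueness of the model: any classifier that provides (or can be calibrated to provide) predict_proba can be blended in the same way.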
The implementation results show that the Kernel SVM, KNeighbors, and Ridge classifiers maintained an accuracy of 87.5% for the classifier feature-selection methods. The blunge calibration classifier model, as shown in Figure 4, was created with Kernel SVM, KNeighbors, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers, as shown in Equations (16)–(19).
$\mathrm{KNN}(HY, D) = \sqrt{\sum_{v=1}^{23} (HY_h - D_h)^2}$  (16)
$\mathrm{kernel}(HY, HY') = \exp\left(-\frac{[HY_V - HY']^2}{2\sigma^2}\right)$  (17)
$\mathrm{kernelSVM}(HY) = \sum_{v=1}^{7} B \times \mathrm{kernel}(HY, HY') + \mathrm{vector}$  (18)
$\mathrm{BCCM} = \mathrm{Estimator}\{(\mathrm{KNN},\ \mathrm{kernelSVM},\ \mathrm{Ridge})\}$  (19)
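The Equation (19) blend follows the same pattern; probability=True makes the RBF-kernel SVC expose predict_proba, and the hyperparameter values are illustrative assumptions.

```python
# BCCM-style soft blend of kernel SVM, KNN, and a calibrated Ridge model.
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.calibration import CalibratedClassifierCV

bccm = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="rbf", probability=True)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("ridge", CalibratedClassifierCV(RidgeClassifier(), cv=5)),
    ],
    voting="soft",
)
bccm.fit(X_train, y_train)
print("BCCM accuracy:", bccm.score(X_test, y_test))
```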

4. Implementation Setup

The hypothyroid dataset with 3163 rows and 24 feature components from UCI was used for data preprocessing. The dataset information is shown in Figure 5.
Implementation was undertaken with Python on an NVIDIA Tesla V100 GPU server with 30 training epochs and a batch size of 64. All clinical parameters were analyzed by determining the relationship between each feature and its correlation, as shown in Figure 6.

4.1. ANOVA Test Analysis

ANOVA was carried out to identify the attributes of the dataset with PR(>F) < 0.05 that highly influence the target. ANOVA was applied to the dataset features, and the results show that the features thyroid surgery, pregnant, tumor, and lithium have values of PR(>F) > 0.05 and do not contribute to the target; the results are shown in Table 1.
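For reference, the sum_sq, df, F-Statistic, and PR(>F) columns of Table 1 can be produced per feature with statsmodels; the numeric "target" column name and the one-way, per-feature model are our assumptions.

```python
# Hedged sketch for reproducing the Table 1 ANOVA columns with statsmodels;
# assumes df carries the normalized features plus a numeric 0/1 "target".
import statsmodels.api as sm
from statsmodels.formula.api import ols

for feature in ["Age", "Sex", "TSH_measured"]:     # repeat for all 23 features
    fit = ols(f"target ~ {feature}", data=df).fit()
    table = sm.stats.anova_lm(fit, typ=2)          # rows: feature, Residual
    print(table.loc[feature, ["sum_sq", "df", "F", "PR(>F)"]])
```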

4.2. Results and Discussion

Hypothyroidism was predicted using various classifier algorithms before and after feature scaling, the performances were analyzed, and the results are shown in Table 2 and Table 3.
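A compressed sketch of this evaluation loop is shown below; the model list is abbreviated, and the use of StandardScaler for the "after feature scaling" variant is an assumption about the scaling step.

```python
# Evaluate several classifiers on scaled features, reporting the metrics of
# Tables 2 and 3 (weighted precision, recall, F-score, and accuracy).
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

for name, model in [("Logistic regression", LogisticRegression(max_iter=1000)),
                    ("Decision tree", DecisionTreeClassifier(random_state=42))]:
    y_pred = model.fit(X_train_s, y_train).predict(X_test_s)
    p, r, f, _ = precision_recall_fscore_support(y_test, y_pred,
                                                 average="weighted")
    print(name, round(p, 6), round(r, 6), round(f, 6),
          round(accuracy_score(y_test, y_pred), 6))
```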
The raw dataset was subjected to dimensionality reduction by using the AdaBoost, gradient boosting, extra trees, and random forest regressor feature-selection methods, and the feature-importance values of each attribute of the hypothyroidism dataset before and after feature scaling are shown in Table 4 and Table 5. The raw dataset was also subjected to dimensionality reduction using the AdaBoost, gradient boosting, extra trees, and random forest classifier feature-selection methods, and the feature-importance values of each attribute of the hypothyroid dataset before and after scaling are shown in Table 6 and Table 7.
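One way to realize this importance-based reduction is sketched below; SelectFromModel with threshold="mean" is our assumed selection rule, whereas the paper reports the raw importance values (Tables 4–7) and the chosen indices (Table 8).

```python
# Rank features with an ensemble regressor and keep those above the mean
# importance; regressors need a numeric target, hence the label encoding.
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import LabelEncoder

y_num = LabelEncoder().fit_transform(y_train)
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=42),
    threshold="mean",
)
selector.fit(X_train, y_num)
best_subset = X_train.columns[selector.get_support()]
print("Selected features:", list(best_subset))
```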
A feature importance index of all the regressor and classifier feature-selection methods of the hypothyroid dataset, before and after feature scaling, was also compared, and the results are shown in Table 8.
The feature-selected subset of the AdaBoost regressor was applied to the classifiers, and the performance was analyzed. The results are shown in Table 9 and Table 10.
The feature-selected subset of the gradient boosting regressor was applied to the classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Table 11 and Table 12.
The feature-selected subset of the extra trees regressor was applied to the classifiers, the performances before and after scaling were analyzed, and the results are shown in Table 13 and Table 14.
The feature-selected subset of the random forest regressor was applied to the classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Table 15 and Table 16.
The performances of all classifiers after reduction with the feature importance of the AdaBoost, gradient boost, extra tree, and random forest regressors before and after feature scaling are shown in Figure 7 and Figure 8.
The feature-selected subset of the AdaBoost classifier was applied to the other classifiers, the performances were analyzed, and the results are shown in Table 17 and Table 18.
The feature-selected subset of the gradient boosting classifier was applied to the classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Table 19 and Table 20.
The feature-selected subset of the extra trees classifier was applied to the other classifiers, the performances were analyzed, and the results are shown in Table 21 and Table 22.
The feature-selected subset of the random forest classifier was applied to the other classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Table 23 and Table 24.
The performances of all classifiers after reduction with the feature importance of the AdaBoost, gradient boost, extra tree, and random forest classifiers before and after feature scaling are shown in Figure 9 and Figure 10.
The overall dataset was analyzed with the OLS indicators, such as the p-value, R-squared, adjusted R-squared, parameter coefficient, significance, AIC, BIC, standard error, F-statistic, log-likelihood, residual MSE, model MSE, omnibus probability, and Jarque–Bera probability, for all 255 subset combinations of the features. The following subset includes highly significant features based on the p-values, and the parameters are listed in Table 25, Table 26, Table 27 and Table 28.
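The 255 combinations correspond to all non-empty subsets of 8 candidate features (2^8 minus 1 = 255). A statsmodels sketch of this screening is given below; the candidate list is illustrative, not the paper's.

```python
# Fit an OLS model per feature subset and report R-squared, AIC, and p-values.
import itertools
import statsmodels.api as sm
from sklearn.preprocessing import LabelEncoder

y_num = LabelEncoder().fit_transform(y)            # target as 0/1
candidates = ["TSH_measured", "T3_measured", "TT4_measured", "T4U_measured"]
for r in range(1, len(candidates) + 1):
    for subset in itertools.combinations(candidates, r):
        fit = sm.OLS(y_num, sm.add_constant(df[list(subset)])).fit()
        print(subset, "R2=%.3f" % fit.rsquared, "AIC=%.1f" % fit.aic,
              "p =", [round(p, 4) for p in fit.pvalues])
```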
Experimental results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% before and after feature scaling for the regressor feature-selection methods. The proposed BCRM was designed with Gaussian naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCRM showed 99.5% accuracy in predicting hypothyroidism. The implementation results show that the Kernel SVM, KNeighbors, and Ridge classifiers maintained an accuracy of 87.5% before and after feature scaling for the classifier feature-selection methods. The BCCM was created with Kernel SVM, KNeighbors, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCCM showed 99.7% accuracy in predicting hypothyroidism. The performance of the proposed BCRM and BCCM was compared with that of the existing classifiers, and the results are shown in Table 29 and Figure 11.

5. Conclusions

This paper aimed to predict the existence of hypothyroidism based on an analysis of the features required for classification. The ANOVA test was utilized to identify the significant features that predict the target variable. This paper also applied the regressor and classifier feature-selection algorithms to reduce the dataset to its significant features. The dataset was also examined with OLS performance indicators to identify the best subset of features based on p-values. The subset feature [‘TSH_measured’, ‘T4U_measured’] has an R-squared value of 0.938, which is close to the ideal value. The implementation was carried out with Python in the Spyder editor of the Anaconda Navigator IDE. Experimental results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% before and after feature scaling for the regressor feature-selection methods. The BCRM was developed with Gaussian naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCRM showed 99.5% accuracy in predicting hypothyroidism. The implementation results show that the Kernel SVM, KNeighbors, and Ridge classifiers maintained an accuracy of 87.5% before and after feature scaling for the classifier feature-selection methods. The blunge calibration classifier model was developed with Kernel SVM, KNeighbors, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed blunge calibration classifier model showed 99.7% accuracy in predicting hypothyroidism. In terms of novelty, the BCCM and BCRM models were built to optimize accuracy with soft blending based on the sum of the predicted probabilities of the classifiers. The uniqueness of the BCRM and BCCM models is achieved by updating the estimator list at runtime with the effective classifiers and regressors that suit the application. Despite the outstanding performance of the BCRM and BCCM models, it remains difficult for researchers to adjust the model hyper-parameters by combining them with other optimizers and statistical loss functions.

Author Contributions

Conceptualization, M.S.D. and V.D.K.; methodology, M.S.D., V.D.K. and O.G.; formal analysis, O.G. and A.B.; investigation, O.G. and M.A.; resources, A.B., M.A. and O.G.; data curation, M.A. and A.B.; writing—original draft, M.S.D.; writing—review & editing, M.S.D., V.D.K. and O.G.; supervision, V.D.K., O.G. and M.A.; project administration, O.G.; funding acquisition, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Contract No. 2665/07.02.2022 of Greensoft/University of Medicine and Pharmacy “Grigore T. Popa”, Iasi, Romania, project name: Living Lab.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Marimuthu, M.; Hariesh, K.; Madhankumar, K. Heart Disease Prediction using Machine Learning and Data Analytics Approach. Int. J. Comput. Appl. 2018, 181, 20–25. [Google Scholar] [CrossRef]
  2. Huang, Q.-A.; Dong, L.; Wang, L.-F. Cardiotocography Analysis for Fetal State Classification Using Machine Learning Algorithms. J. Micro Electromech. Syst. 2016, 25. [Google Scholar]
  3. Maknouninejad, A.; Woronowicz, K.; Safaee, A. Enhanced Algorithm for Real Time Temperature Rise Prediction of A Traction Linear Induction Motor. In Proceedings of the 2018 IEEE Transportation Electrification Conference and Expo (ITEC), Long Beach, CA, USA, 13–15 June 2018. [Google Scholar]
  4. Lakshmanaprabu, S.K.; Shankar, K.; Khanna, A.; Gupta, D.; Rodrigues, J.J.P.C.; Pinheiro, P.R.; De Albuquerque, V.H.C. Effective Features to Classify Big Data Using Social Internet of Things. IEEE Access 2018, 6, 24196–24204. [Google Scholar] [CrossRef]
  5. Jancovic, P.; Kokuer, M. Bird Species Recognition Using Unsupervised Modeling of Individual Vocalization Elements. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 932–947. [Google Scholar] [CrossRef]
  6. Sethi, P.; Jain, M. Comparative Feature Selection Approach for the Prediction of Healthcare Coverage. Commun. Comput. Inf. Sci. 2010, 54, 392–403. [Google Scholar]
  7. Piri, J.; Mohapatra, P.; Dey, R. Fetal Health Status Classification Using MOGA—CD Based Feature Selection Approach. In Proceedings of the IEEE International Conference on Electronics, Computing and Communication Technologies, Bangalore, India, 2–4 July 2020. [Google Scholar]
  8. Keenan, E.; Udhayakumar, R.; Karmakar, C.; Brownfoot, F.; Palaniswami, M. Entropy Profiling for Detection of Fetal Arrhythmias in Short Length Fetal Heart Rate Recordings. In Proceedings of the International Conference of the IEEE Engineering in Medicine & Biology Society, Montreal, QC, Canada, 20–24 July 2020. [Google Scholar]
  9. Li, J.; Huang, L.; Shen, Z.; Zhang, Y.; Fang, M.; Li, B.; Fu, X.; Zhao, Q.; Wang, H. Automatic Classification of Fetal Heart Rate Based on Convolutional Neural Network. IEEE Internet Things J. 2019, 6, 1394–1401. [Google Scholar] [CrossRef]
  10. Chen, M.; Hao, Y.; Hwang, K.; Wang, L. Disease prediction by machine learning over big data from health care communities. IEEE Access 2017, 5, 8869–8879. [Google Scholar] [CrossRef]
  11. Dahiwade, D.; Patle, G.; Meshram, E. Designing disease prediction model using machine learning approach. In Proceedings of the 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 27–29 March 2019; pp. 1211–1215. [Google Scholar]
  12. Razia, S.; Rao, M. Machine Learning Techniques for Thyroid Disease Diagnosis—A Review. Indian J. Sci. Technol. 2016, 9, 28. [Google Scholar] [CrossRef]
  13. Tyagi, A.; Mehra, R.; Saxena, A. Interactive Thyroid Disease Prediction System Using Machine Learning Technique. In Proceedings of the IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC-2018), Solan, India, 20–22 December 2018; pp. 20–22. [Google Scholar]
  14. Liu, D.Y.; Chen, H.L.; Yang, B.L.; Lv, X.-E.; Li, L.-N.; Liu, J. Design of an Enhanced Fuzzy k-nearest Neighbor Classifier Based Computer Aided Diagnostic System for Thyroid Disease. J. Med. Syst. 2011, 36, 3243–3254. [Google Scholar] [CrossRef]
  15. Shankar, K.; Lakshmanaprabu, S.K.; Gupta, D. Optimal feature-based multi-kernel SVM approach for thyroid disease classification. J. Supercomput. 2020, 76, 1128–1143. [Google Scholar] [CrossRef]
  16. Cheng, C.A.; Lin, Y.; Chiu, H.W. Prediction of the prognosis of ischemic stroke patients after intravenous thrombolysis using artificial neural networks. Stud. Health Technol. Inform. 2014, 202, 115–118. [Google Scholar]
  17. Ntaios, G.; Faouzi, M.; Ferrari, W.; Lang, J.; Vemmos, K.; Michel, P. An integer-based score to predict functional outcome in acute ischemic stroke: The ASTRAL score. Neurology 2012, 78, 1916–1922. [Google Scholar] [CrossRef]
  18. Subudhi, A.; Dash, M.; Sabut, S. Automated segmentation and classification of brain stroke using expectation-maximization and random forest classifier. Biocybern. Biomed. Eng. 2020, 40, 277–289. [Google Scholar]
  19. Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510. [Google Scholar] [CrossRef]
  20. Lee, H.J.; Lee, J.; Choi, J.; Cho, Y.; Kim, B.; Bae, H.; Kim, D.; Ryu, W.; Cha, J.; Kim, D. Simple estimates of symptomatic intracranial hemorrhage risk and outcome after intravenous thrombolysis using age and stroke severity. J. Stroke 2017, 19, 229–231. [Google Scholar] [CrossRef]
  21. Dogan, S.; Barua, P.D.; Baygin, M.; Chakraborty, S.; Ciaccio, E.; Tuncer, T.; Kadir, K.A.; Shah, M.N.M.; Azman, R.R.; Lee, C.; et al. Novel multiple pooling and local phase quantization stable feature extraction techniques for automated classification of brain infarcts. Biocybern. Biomed. Eng. 2022, 42, 888–901. [Google Scholar] [CrossRef]
  22. Alsobhi, A.Y.; Khan, N.; Rahanu, H. Personalised learning materials based on dyslexia types: Ontological approach. Proc. Comput. Sci. 2015, 60, 113–121. [Google Scholar] [CrossRef]
  23. Al-Barhamtoshy, H.M.; Motaweh, D.M. Diagnosis of dyslexia using computation analysis. In Proceedings of the 2017 International Conference on Informatics, Health & Technology (ICIHT), Riyadh, Saudi Arabia, 21–23 February 2017; pp. 1–7. [Google Scholar]
  24. Rauschenberger, M.; Rello, L.; Baeza-Yates, R.; Bigham, J.P. Towards language independent detection of dyslexia with a web-based game. In Proceedings of the 15th International Web for All Conference, Lyon, France, 23–25 April 2018; pp. 1–10. [Google Scholar]
  25. Frid, A.; Manevitz, L.M. Features and machine learning for correlating and classifying between brain areas and dyslexia. arXiv 2018, arXiv:1812.10622. [Google Scholar]
  26. Perera, H.; Shiratuddin, M.F.; Wong, K.W.; Fullarton, K. Eeg signal analysis of writing and typing between adults with dyslexia and normal controls. Int. J. Interact. Multimed. Artif. Intell. 2018, 5, 62. [Google Scholar] [CrossRef]
  27. Oliver, E.C.J.; Wotherspoon, S.J.; Chamberlain, M.A.; Holbrook, N.J. Projected Tasman Sea extremes in sea surface temperature through the twenty-first century. J. Clim. 2014, 27, 1980–1998. [Google Scholar] [CrossRef]
  28. Gopi, K.; Mazumder, D.; Sammut, J.; Saintilan, N.; Crawford, J.; Gadd, P. Isotopic and elemental profiling to trace the geographic origins of farmed and wild-caught Asian seabass. Aquaculture 2019, 502, 56–62. [Google Scholar] [CrossRef]
  29. Yucemen, M.S.; Askan, A. Estimation of Earthquake Damage Probabilities for Reinforced Concrete Buildings. In Seismic Assessment and Rehabilitation of Existing Buildings; NATO Science Series; Springer: Berlin/Heidelberg, Germany, 2003; Volume 29, pp. 149–164. [Google Scholar]
  30. Zheng, X.-W.; Li, H.-N.; Yang, Y.-B.; Li, G.; Huo, L.-S.; Liu, Y. Damage risk assessment of a high-rise building against multihazard of earthquake and strong wind with recorded data. Eng. Struct. 2019, 200, 1096971. [Google Scholar] [CrossRef]
  31. Nai-arun, N.; Moungmai, R. Comparison of Classifiers for the Risk of Diabetes Prediction. Procedia Comput. Sci. 2015, 69, 132–142. [Google Scholar] [CrossRef]
  32. Swapna, G.; Soman, K.P.; Vinayakumar, R. Automated detection of diabetes using CNN and CNN-LSTM network and heart rate signals. Procedia Comput. Sci. 2018, 132, 1253–1262. [Google Scholar]
  33. Mujumdara, A.; Vaidehi, V. Diabetes Prediction using Machine Learning Algorithm. Procedia Comput. Sci. 2019, 165, 292–299. [Google Scholar] [CrossRef]
  34. Refat, M.A.; Al Amin, M.; Kaushal, C.; Yeasmin, M.N.; Islam, M.K. A Comparative Analysis of Early-Stage Diabetes Prediction using Machine Learning and Deep Learning Approach. In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9 October 2021. [Google Scholar] [CrossRef]
  35. Beghriche, T.; Djerioui, M.; Brik, Y.; Attallah, B.; Belhaouari, S.B. An Efficient Prediction System for Diabetes Disease Based on Deep Neural Network. Complexity 2021, 2021, 6053824. [Google Scholar] [CrossRef]
  36. Mahesh, T.R.; Kumar, V.D.; Kumar, V.V.; Asghar, J.; Geman, O.; Arulkumaran, G.; Arun, N. AdaBoost Ensemble Methods Using K-Fold Cross Validation for Survivability with the Early Detection of Heart Disease. Comput. Intell. Neurosci. 2022, 2022, 9005278. [Google Scholar] [CrossRef]
  37. Geman, O.; Chiuchisan, I.; Ungurean, I.; Hagan, M.; Arif, M. Ubiquitous healthcare system based on the sensors network and android internet of things gateway. In Proceedings of the 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Guangzhou, China, 8–12 October 2018; pp. 1390–13958. [Google Scholar]
  38. Arif, M.; Philip, F.; Ajesh, F.; Izdrui, D.; Craciun, M.D.; Geman, O. Automated Detection of Nonmelanoma Skin Cancer Based on Deep Convolutional Neural Network. J. Healthc. Eng. 2022, 2022, 6952304. [Google Scholar] [CrossRef]
  39. Munishamaiaha, K.; Rajagopal, G.; Venkatesan, D.K.; Arif, M.; Vicoveanu, D.; Chiuchisan, I.; Izdrui, D.; Geman, O. Robust Spatial–Spectral Squeeze–Excitation AdaBound Dense Network (SE-AB-Densenet) for Hyperspectral Image Classification. Sensors 2022, 22, 3229. [Google Scholar] [CrossRef]
  40. Dai, Y.; Wang, G.; Dai, J.; Geman, O. A multimodal deep architecture for traditional Chinese medicine diagnosis. Concurr. Comput. Pract. Exp. 2020, 32, e5781. [Google Scholar] [CrossRef]
  41. Ramamurthy, K.; Menaka, R.; Kulkarni, S.; Deshpande, R. Virtual doctor: An artificial medical diagnostic system based on hard and soft inputs. Int. J. Biomed. Eng. Technol. 2014, 16, 329. [Google Scholar] [CrossRef]
Figure 1. Proposed system workflow.
Figure 2. Normalization of the hypothyroidism dataset.
Figure 3. Blunge calibration classifier model workflow.
Figure 4. Blunge calibration regressor model workflow.
Figure 5. Statistical information and correlation matrix of the dataset.
Figure 6. Density plot and target distribution of the hypothyroidism dataset.
Figure 7. Regressor feature importance performance of all classifiers before scaling.
Figure 8. Regressor feature importance performance of all classifiers after scaling.
Figure 9. Classifier feature importance performance of all classifiers before scaling.
Figure 10. Classifier feature importance performance of all classifiers after scaling.
Figure 11. Performance of proposed BCRM and BCCM with existing classifiers.
Table 1. Attribute analysis with the ANOVA test.
Features | sum_sq | df | F-Statistic | PR(>F)
Age | 4.105 | 1 | 55.1339 | 1.44 × 10−13
Sex | 2.127 | 1 | 28.3421 | 1.08 × 10−7
on_thyroxine | 0.920 | 1 | 12.1986 | 0.000485
query_on_thyroxine | 0.238 | 1 | 3.1494 | 0.076051
on_antithyroid_medication | 0.467 | 1 | 6.1798 | 0.012973
thyroid_surgery | 0.024 | 1 | 0.3283 | 0.566654
query_hypothyroid | 0.439 | 1 | 5.8059 | 0.016029
query_hyperthyroid | 2.556 | 1 | 34.1124 | 5.72 × 10−9
pregnant | 0.006 | 1 | 0.0084 | 0.926861
sick | 0.278 | 1 | 3.6821 | 0.052087
tumor | 0.047 | 1 | 0.6241 | 0.429556
lithium | 0.013 | 1 | 0.1798 | 0.6715
goitre | 2.246 | 1 | 29.9307 | 4.82 × 10−8
TSH_measured | 117.418 | 1 | 3041.1975 | 0.000045
TSH | 0.033 | 1 | 0.4446 | 0.050492
T3_measured | 72.038 | 1 | 136.1074 | 5.89 × 10−248
T3 | 0.008 | 1 | 0.0111 | 0.005934
TT4_measured | 223.546 | 1 | 44,395.61 | 0.00043
TT4 | 0.00036 | 1 | 0.0047 | 0.00450
T4U_measured | 224.534 | 1 | 47,542.78 | 0.00053
T4U | 0.01087 | 1 | 0.14349 | 0.00485
FTI_measured | 225.5303 | 1 | 51,167.19 | 0.00034
FTI | 0.0036 | 1 | 0.0482 | 0.00049
Table 2. Classification metrics before feature scaling.
Classifiers | Precision | Recall | FScore | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.840521 | 0.840521 | 0.840521 | 0.840521
Kernel SVM classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Gaussian naive Bayes | 0.834285 | 0.834261 | 0.834192 | 0.834261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Random forest classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gradient boosting classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
AdaBoost classifier | 0.84363 | 0.843681 | 0.84362 | 0.843681
Ridge classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Ridge classifierCV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 3. Classification metrics after feature scaling.
Classifiers | Precision | Recall | FScore | Accuracy
Logistic regression | 0.847285 | 0.847261 | 0.847192 | 0.847261
KNeighbors classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Kernel SVM classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Gaussian naive Bayes | 0.847285 | 0.847261 | 0.847192 | 0.847261
Decision tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Extra tree classifier | 0.817655 | 0.817362 | 0.817477 | 0.817362
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.840521 | 0.840521 | 0.840521 | 0.840521
Ridge classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Ridge classifierCV | 0.847285 | 0.847261 | 0.847192 | 0.847261
SGD classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Table 4. Regressor feature-importance values of each feature before feature scaling.
Index | Features | AdaBoost Regressor | Gradient Boosting Regressor | Extra Trees Regressor | Random Forest Regressor
0 | Age | 0.000428341 | 0.003081406 | 0.010040952 | 0.007403365
1 | Sex | 0.008622032 | 4.39 × 10−5 | 0.002254928 | 0.000698556
2 | on_thyroxine | 0 | 8.62 × 10−6 | 0.000804381 | 0.000134897
3 | query_on_thyroxine | 0 | 2.19 × 10−5 | 0.000607023 | 0.00022945
4 | on_antithyroid_medication | 0 | 0 | 0 | 0
5 | thyroid_surgery | 0 | 0 | 1.29 × 10−5 | 0
6 | query_hypothyroid | 0 | 0 | 0.000137327 | 0
7 | query_hyperthyroid | 0 | 0.000180298 | 0.001258513 | 0.00083488
8 | pregnant | 0 | 1.03 × 10−20 | 0 | 0
9 | sick | 0 | 0 | 5.18 × 10−5 | 0
10 | tumor | 0 | 0 | 6.24 × 10−7 | 0
11 | lithium | 0 | 0 | 0 | 0
12 | goitre | 0 | 1.87 × 10−5 | 0.00336961 | 0.000819246
13 | TSH_measured | 0.197501922 | 0.006246666 | 0.001243391 | 0.002119903
14 | TSH | 0.077619094 | 0.001650685 | 0.002426862 | 0.002049851
15 | T3_measured | 0.017306778 | 0.000365421 | 0.001073602 | 0.00051231
16 | T3 | 0 | 0.000614873 | 0.002119724 | 0.001266411
17 | TT4_measured | 0.004126458 | 0 | 5.15 × 10−5 | 0.056711358
18 | TT4 | 0.053174222 | 0.013884897 | 0.010358689 | 0.01558167
19 | T4U_measured | 0 | 0 | 0 | 0.113416878
20 | T4U | 0.080253371 | 0.007295723 | 0.01125785 | 0.011729611
21 | FTI_measured | 0.505044774 | 0.957930603 | 0.943635215 | 0.774966334
22 | FTI | 0.055923007 | 0.008656368 | 0.009295157 | 0.011525281
Table 5. Regressor feature-importance values of each feature after feature scaling.
Features | AdaBoost | Gradient Boosting | Extra Trees | Random Forest
Age | 0.018396825 | 0.003350051 | 0.010465358 | 0.007502654
Sex | 0.006847598 | 4.39 × 10−5 | 0.0022696 | 0.000651238
on_thyroxine | 0 | 2.43 × 10−5 | 0.000670722 | 0
query_on_thyroxine | 0 | 2.19 × 10−5 | 0.000453489 | 0.000176286
on_antithyroid_medication | 0 | 0 | 0 | 0
thyroid_surgery | 0 | 0 | 5.32 × 10−5 | 0
query_hypothyroid | 0 | 1.25 × 10−6 | 0.000134226 | 0
query_hyperthyroid | 0 | 0.000180298 | 0.001134248 | 0.00070627
pregnant | 0 | 0 | 0 | 0
sick | 0 | 0 | 7.78 × 10−5 | 0
tumor | 0 | 0 | 3.07 × 10−5 | 0
lithium | 0 | 0 | 0 | 0
goitre | 0 | 1.87 × 10−5 | 0.003884817 | 0.000984925
TSH_measured | 0.130848738 | 0.005119183 | 0.00141153 | 0.002186828
TSH | 0.140185147 | 0.000508619 | 0.002227775 | 0.002164946
T3_measured | 0.016544939 | 0.001884564 | 0.001113277 | 0.000384264
T3 | 0.019546664 | 1.93 × 10−5 | 0.001694519 | 0.000764557
TT4_measured | 0.002640564 | 0 | 0 | 0.0375457
TT4 | 0.079190546 | 0.015118764 | 0.01054195 | 0.014953169
T4U_measured | 0 | 0 | 0 | 0.170098636
T4U | 0.131409566 | 0.007021985 | 0.010756814 | 0.012046173
FTI_measured | 0.437162659 | 0.957930603 | 0.943635215 | 0.735679166
FTI | 0.017226754 | 0.008756629 | 0.009444797 | 0.014155188
Table 6. Classifier feature-importance values of each feature before feature scaling.
Features | AdaBoost | Gradient Boosting | Extra Trees | Random Forest
Age | 0.3 | 0.00276237 | 0.010403875 | 0.009334997
Sex | 0.04 | 6.31 × 10−5 | 0.003167778 | 0.001804309
on_thyroxine | 0.02 | 8.98 × 10−5 | 0.002313404 | 0.001197477
query_on_thyroxine | 0.02 | 9.65 × 10−6 | 0.001441407 | 0.000280689
on_antithyroid_medication | 0 | −3.89 × 10−21 | 0.000135951 | 3.81 × 10−5
thyroid_surgery | 0 | 0 | 0.000158409 | 9.99 × 10−5
query_hypothyroid | 0.02 | 4.13 × 10−20 | 0.000274698 | 0.000158845
query_hyperthyroid | 0.02 | 0.000142974 | 0.002732408 | 0.001816944
pregnant | 0 | 1.67 × 10−21 | 4.77 × 10−5 | 0.000171466
sick | 0 | 0 | 8.79 × 10−5 | 4.88 × 10−5
tumor | 0 | 0 | 0.000788175 | 0.000713225
lithium | 0 | 0 | 0 | 0
goitre | 0 | −1.70 × 10−18 | 0.003266947 | 0.001503002
TSH_measured | 0 | 0.002869076 | 0.077458279 | 0.048169136
TSH | 0.1 | 0.001060964 | 0.003082501 | 0.046241881
T3_measured | 0.04 | 0.000382842 | 0.032909932 | 0.05790805
T3 | 0.02 | 5.52 × 10−6 | 0.003573637 | 0.004400236
TT4_measured | 0 | 2.45 × 10−6 | 0.224022987 | 0.16221615
TT4 | 0.16 | 0.011738893 | 0.013281768 | 0.050282253
T4U_measured | 0 | 3.85 × 10−6 | 0.25610996 | 0.301724198
T4U | 0.12 | 0.008872811 | 0.013795367 | 0.045971388
FTI_measured | 0.02 | 0.962267301 | 0.337492546 | 0.219637329
FTI | 0.12 | 0.009728344 | 0.013454301 | 0.046281594
Table 7. Classifier feature-importance values of each feature after feature scaling.
Features | AdaBoost | Gradient Boosting | Extra Trees | Random Forest
Age | 0.3 | 0.002453486 | 0.009509937 | 0.008279074
Sex | 0.04 | 4.82 × 10−5 | 0.00324165 | 0.001846886
on_thyroxine | 0.02 | 3.51 × 10−5 | 0.00295889 | 0.000712379
query_on_thyroxine | 0.02 | 7.33 × 10−6 | 0.000800167 | 0.000653312
on_antithyroid_medication | 0 | 0 | 0.000285502 | 0.000107717
thyroid_surgery | 0 | 0 | 0.000100357 | 4.84 × 10−5
query_hypothyroid | 0.02 | −3.89 × 10−21 | 0.000245459 | 9.70 × 10−5
query_hyperthyroid | 0.02 | 0.000138031 | 0.002982702 | 0.001717612
pregnant | 0 | 1.07 × 10−20 | 4.60 × 10−5 | 7.20 × 10−5
sick | 0 | 0 | 6.70 × 10−5 | 4.23 × 10−6
tumor | 0 | 1.54 × 10−18 | 0.00071482 | 0.000571423
lithium | 0 | 0 | 0 | 0
goitre | 0 | −1.70 × 10−18 | 0.002505062 | 0.000921635
TSH_measured | 0 | 0.003078428 | 0.089211886 | 0.078112108
TSH | 0.1 | 0.000978036 | 0.003674423 | 0.031291888
T3_measured | 0.04 | 0.000575078 | 0.043737281 | 0.037562216
T3 | 0.02 | 9.39 × 10−6 | 0.003354808 | 0.009181534
TT4_measured | 0 | 7.58 × 10−6 | 0.1820123 | 0.205768548
TT4 | 0.16 | 0.009939212 | 0.014136987 | 0.054038719
T4U_measured | 0 | 0 | 0.355190855 | 0.242828144
T4U | 0.12 | 0.008035637 | 0.013861521 | 0.035851905
FTI_measured | 0.02 | 0.962709323 | 0.259146396 | 0.21946496
FTI | 0.12 | 0.011985167 | 0.012216028 | 0.070868256
Table 8. Feature-importance index of regressor and classifier methods.
Classifiers | Before Feature Scaling | After Feature Scaling
AdaBoost regressor | 13, 14, 18, 20, 21, 22 | 0, 13, 14, 18, 20, 21
Gradient boosting regressor | 0, 13, 18, 20, 21, 22 | 0, 13, 18, 20, 21, 22
Extra trees regressor | 0, 12, 18, 20, 21, 22 | 0, 12, 18, 20, 21, 22
Random forest regressor | 17, 18, 19, 20, 21, 22 | 17, 18, 19, 20, 21, 22
AdaBoost classifier | 0, 1, 14, 18, 20, 22 | 0, 1, 14, 18, 20, 22
Gradient boosting classifier | 0, 13, 18, 20, 21, 22 | 0, 13, 18, 20, 21, 22
Extra trees classifier | 13, 15, 17, 18, 19, 21 | 13, 15, 17, 18, 19, 21
Random forest classifier | 13, 15, 17, 18, 19, 21 | 13, 15, 17, 18, 19, 21
Table 9. AdaBoost regressor metrics before feature scaling.
Classifiers | Precision | Recall | FScore | Accuracy
Logistic regression | 0.846043 | 0.846101 | 0.846064 | 0.846101
KNeighbors classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Kernel SVM classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifierCV | 0.846043 | 0.846101 | 0.846064 | 0.846101
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Passive aggressive classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 10. AdaBoost regressor metrics after feature scaling.
Classifiers | Precision | Recall | FScore | Accuracy
Logistic regression | 0.846043 | 0.846101 | 0.846064 | 0.846101
KNeighbors classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Kernel SVM classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.851174 | 0.922591 | 0.885445 | 0.882591
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifierCV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 11. Gradient boosting regressor metrics before feature scaling.
Classifiers | Precision | Recall | FScore | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Kernel SVM classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifierCV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 12. Gradient boosting regressor metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.847285 | 0.847261 | 0.847192 | 0.847261
KNeighbors classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Kernel SVM classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier CV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Passive aggressive classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 13. Extra trees regressor metrics before feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Kernel SVM classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Extra tree classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Random forest classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gradient boosting classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier CV | 0.847285 | 0.847261 | 0.847192 | 0.847261
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 14. Extra trees regressor metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.846043 | 0.846101 | 0.846064 | 0.846101
KNeighbors classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Kernel SVM classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier CV | 0.846043 | 0.846101 | 0.846064 | 0.846101
SGD classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 15. Random forest regressor metrics before feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.847285 | 0.847261 | 0.847192 | 0.847261
KNeighbors classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Kernel SVM classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Extra tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier CV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 16. Random forest regressor metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.846043 | 0.846101 | 0.846064 | 0.846101
KNeighbors classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Kernel SVM classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gaussian naive Bayes | 0.895285 | 0.895261 | 0.895192 | 0.895261
Decision tree classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Extra tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier | 0.895285 | 0.895261 | 0.895192 | 0.895261
Ridge classifier CV | 0.847285 | 0.847261 | 0.847192 | 0.847261
SGD classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 17. AdaBoost classifier metrics before feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.851174 | 0.822591 | 0.885445 | 0.822591
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.847285 | 0.847261 | 0.847192 | 0.847261
Decision tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Extra tree classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Random forest classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gradient boosting classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
AdaBoost classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Passive aggressive classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Bagging classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Table 18. AdaBoost classifier metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.834285 | 0.834261 | 0.834192 | 0.834261
Decision tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Extra tree classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Random forest classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gradient boosting classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
AdaBoost classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.851174 | 0.822591 | 0.885445 | 0.822591
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 19. Gradient boosting classifier metrics before feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.851174 | 0.822591 | 0.885445 | 0.822591
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.847285 | 0.847261 | 0.847192 | 0.847261
Decision tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Extra tree classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Random forest classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gradient boosting classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
AdaBoost classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.851174 | 0.922591 | 0.885445 | 0.922591
SGD classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Passive aggressive classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Bagging classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Table 20. Gradient boosting classifier metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.834285 | 0.834261 | 0.834192 | 0.834261
Decision tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Extra tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Random forest classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.851174 | 0.922591 | 0.885445 | 0.922591
SGD classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Passive aggressive classifier | 0.854541 | 0.895735 | 0.87404 | 0.895735
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 21. Extra trees classifier metrics before feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.834285 | 0.834261 | 0.834192 | 0.834261
Decision tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Extra tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Random forest classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Gradient boosting classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
AdaBoost classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Passive aggressive classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Bagging classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Table 22. Extra trees classifier metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.834285 | 0.834261 | 0.834192 | 0.834261
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.897285 | 0.897261 | 0.897192 | 0.897261
Decision tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Extra tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.851174 | 0.822591 | 0.885445 | 0.822591
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Passive aggressive classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Bagging classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Table 23. Random forest classifier metrics before feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.851174 | 0.822591 | 0.885445 | 0.822591
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.897285 | 0.897261 | 0.897192 | 0.897261
Decision tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Extra tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.831174 | 0.832591 | 0.835445 | 0.832591
AdaBoost classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.834285 | 0.834261 | 0.834192 | 0.834261
SGD classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Passive aggressive classifier | 0.831174 | 0.832591 | 0.835445 | 0.832591
Bagging classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Table 24. Random forest classifier metrics after feature scaling.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.847285 | 0.847261 | 0.847192 | 0.847261
KNeighbors classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Kernel SVM classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Gaussian naive Bayes | 0.834285 | 0.834261 | 0.834192 | 0.834261
Decision tree classifier | 0.851174 | 0.822591 | 0.885445 | 0.822591
Extra tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.831174 | 0.832591 | 0.835445 | 0.832591
AdaBoost classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Ridge classifier | 0.875285 | 0.877261 | 0.877192 | 0.875261
Ridge classifier CV | 0.851174 | 0.822591 | 0.885445 | 0.822591
SGD classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Passive aggressive classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Bagging classifier | 0.831174 | 0.832591 | 0.835445 | 0.832591
Table 25. OLS R-squared values and parameter coefficients of the significant subset attributes of the hypothyroidism dataset.

S.No | Attributes | R-Squared | Adjusted R-Squared | Parameter Coefficient
1 | [‘TSH_measured’] | 0.490342 | 0.490181 | [−0.1926723]
2 | [‘TSH_measured’ ‘TSH’ ‘TT4_measured’] | 0.934966 | 0.934904 | [−0.01378328 0.00334404 −0.25622659]
3 | [‘TSH_measured’ ‘TSH’ ‘T4U_measured’] | 0.939086 | 0.939028 | [−0.01372514 0.00334439 −0.25687507]
4 | [‘TSH_measured’ ‘T3_measured’] | 0.508019 | 0.507707 | [−0.16245081 −0.04745117]
5 | [‘TSH_measured’ ‘T3_measured’ ‘T3’] | 0.508773 | 0.508306 | [−0.16300138 −0.04710052 −0.00756629]
6 | [‘TSH_measured’ ‘T3’] | 0.491378 | 0.491056 | [−0.19305575 −0.00886611]
7 | [‘TSH_measured’ ‘T3’ ‘T4U’] | 0.492164 | 0.491682 | [−0.19327524 −0.01212442 0.00837232]
8 | [‘TSH_measured’ ‘TT4_measured’] | 0.934818 | 0.934777 | [−0.01378472 −0.25622453]
9 | [‘TSH_measured’ ‘T4U_measured’] | 0.938938 | 0.9389 | [−0.01372657 −0.25687302]
10 | [‘TSH’ ‘T3_measured’ ‘TT4_measured’] | 0.934199 | 0.934136 | [0.00338423 −0.00749316 −0.26174326]
11 | [‘TSH’ ‘T3_measured’ ‘T4U_measured’] | 0.938323 | 0.938265 | [0.0033845 −0.00747897 −0.26234687]
12 | [‘TSH’ ‘TT4_measured’] | 0.93368 | 0.933638 | [0.00334707 −0.26584963]
13 | [‘TSH’ ‘T4U_measured’] | 0.937806 | 0.937766 | [0.00334741 −0.26643643]
14 | [‘T3_measured’] | 0.300835 | 0.300614 | [−0.15091554]
15 | [‘T3_measured’ ‘TT4_measured’] | 0.934047 | 0.934006 | [−0.00746918 −0.26175534]
16 | [‘T3_measured’ ‘TT4’] | 0.302135 | 0.301694 | [−0.15159006 −0.00994458]
17 | [‘T3_measured’ ‘T4U_measured’] | 0.938172 | 0.938133 | [−0.00745503 −0.26235889]
18 | [‘TT4_measured’] | 0.933532 | 0.933511 | [−0.26584857]
19 | [‘T4U_measured’] | 0.937658 | 0.937638 | [−0.26643537]
Table 26. OLS p-values, AIC, and BIC of the significant subset attributes of the hypothyroid dataset.

S.No | Attributes | p-Values | AIC | BIC
1 | [‘TSH_measured’] | [0.] | −1315.02 | −1302.9
2 | [‘TSH_measured’ ‘TSH’ ‘TT4_measured’] | [3.69199042 × 10⁻¹⁵ 7.43156772 × 10⁻³ 0.00000000] | −7823.1 | −7798.86
3 | [‘TSH_measured’ ‘TSH’ ‘T4U_measured’] | [5.19971275 × 10⁻¹⁶ 5.67318032 × 10⁻³ 0.00000000] | −8030.12 | −8005.88
4 | [‘TSH_measured’ ‘T3_measured’] | [1.81373646 × 10⁻²⁴³ 4.53106996 × 10⁻²⁶] | −1424.67 | −1406.49
5 | [‘TSH_measured’ ‘T3_measured’ ‘T3’] | [1.93109194 × 10⁻²⁴⁴ 1.02662783 × 10⁻²⁵ 2.77571221 × 10⁻²] | −1427.52 | −1403.28
6 | [‘TSH_measured’ ‘T3’] | [0. 0.01121299] | −1319.46 | −1301.28
7 | [‘TSH_measured’ ‘T3’ ‘T4U’] | [0. 0.00139307 0.02711176] | −1322.35 | −1298.11
8 | [‘TSH_measured’ ‘TT4_measured’] | [3.89722887 × 10⁻¹⁵ 0.00000000] | −7817.92 | −7799.74
9 | [‘TSH_measured’ ‘T4U_measured’] | [5.53480256 × 10⁻¹⁶ 0.00000000] | −8024.46 | −8006.28
10 | [‘TSH’ ‘T3_measured’ ‘TT4_measured’] | [7.07863933 × 10⁻³ 6.32943158 × 10⁻⁷ 0.00000000] | −7786 | −7761.76
11 | [‘TSH’ ‘T3_measured’ ‘T4U_measured’] | [5.40529071 × 10⁻³ 2.75925083 × 10⁻⁷ 0.00000000] | −7990.75 | −7966.52
12 | [‘TSH’ ‘TT4_measured’] | [0.00796318 0.] | −7763.15 | −7744.97
13 | [‘TSH’ ‘T4U_measured’] | [0.00613635 0.] | −7966.31 | −7948.13
14 | [‘T3_measured’] | [5.89765862 × 10⁻²⁴⁸] | −315.046 | −302.927
15 | [‘T3_measured’ ‘TT4_measured’] | [7.04200951 × 10⁻⁷ 0.00000000] | −7780.73 | −7762.56
16 | [‘T3_measured’ ‘TT4’] | [3.72603592 × 10⁻²⁴⁹ 1.53025228 × 10⁻²] | −318.934 | −300.756
17 | [‘T3_measured’ ‘T4U_measured’] | [3.09674931 × 10⁻⁷ 0.00000000] | −7985 | −7966.83
18 | [‘TT4_measured’] | [0.] | −7758.1 | −7745.98
19 | [‘T4U_measured’] | [0.] | −7960.79 | −7948.67
Table 27. OLS standard errors, F-statistics, and log-likelihoods of the significant subset attributes of the hypothyroid dataset.

S.No | Attributes | Standard Error | F-Statistic | Log-Likelihood
1 | [‘TSH_measured’] | [0.00349379] | 3041.198 | 659.5089
2 | [‘TSH_measured’ ‘TSH’ ‘TT4_measured’] | [0.00174378 0.00124843 0.00174378] | 15,138.54 | 3915.549
3 | [‘TSH_measured’ ‘TSH’ ‘T4U_measured’] | [0.00168412 0.00120824 0.00168412] | 16,233.73 | 4019.059
4 | [‘TSH_measured’ ‘T3_measured’] | [0.00445323 0.00445323] | 1631.505 | 715.3352
5 | [‘TSH_measured’ ‘T3_measured’ ‘T3’] | [0.00445754 0.00445337 0.00343653] | 1090.61 | 717.7602
6 | [‘TSH_measured’ ‘T3’] | [0.00349406 0.00349406] | 1526.435 | 662.7281
7 | [‘TSH_measured’ ‘T3’ ‘T4U’] | [0.00349332 0.00379016 0.00378678] | 1020.505 | 665.1734
8 | [‘TSH_measured’ ‘TT4_measured’] | [0.00174548 0.00174548] | 22,659.94 | 3911.961
9 | [‘TSH_measured’ ‘T4U_measured’] | [0.0016859 0.0016859] | 24,295.55 | 4015.228
10 | [‘TSH’ ‘T3_measured’ ‘TT4_measured’] | [0.0012558 0.00150131 0.00150129] | 14,949.73 | 3896.998
11 | [‘TSH’ ‘T3_measured’ ‘T4U_measured’] | [0.0012158 0.00145213 0.00145211] | 16,019.93 | 3999.377
12 | [‘TSH’ ‘TT4_measured’] | [0.00126052 0.00126052] | 22,243.82 | 3884.576
13 | [‘TSH’ ‘T4U_measured’] | [0.00122068 0.00122068] | 23,824.18 | 3986.153
14 | [‘T3_measured’] | [0.00409211] | 1360.107 | 159.5229
15 | [‘T3_measured’ ‘TT4_measured’] | [0.00150277 0.00150277] | 22,376.61 | 3893.367
16 | [‘T3_measured’ ‘TT4’] | [0.00409839 0.00409839] | 684.0491 | 162.4668
17 | [‘T3_measured’ ‘T4U_measured’] | [0.00145365 0.00145365] | 23,974.81 | 3995.502
18 | [‘TT4_measured’] | [0.00126172] | 44,395.61 | 3881.051
19 | [‘T4U_measured’] | [0.00122194] | 47,542.78 | 3982.394
Table 28. OLS residual/model MSE and normality-test probabilities of the significant subset attributes of the hypothyroid dataset.

S.No | Attributes | Residual MSE | Model MSE | Omnibus Probability | Jarque–Bera Probability
1 | [‘TSH_measured’] | 0.038609 | 0.075732 | 2.61 × 10⁻⁷⁶ | 0
2 | [‘TSH_measured’ ‘TSH’ ‘TT4_measured’] | 0.00493 | 0.075732 | 0 | 0
3 | [‘TSH_measured’ ‘TSH’ ‘T4U_measured’] | 0.004617 | 0.075732 | 0 | 0
4 | [‘TSH_measured’ ‘T3_measured’] | 0.037282 | 0.075732 | 2.49 × 10⁻⁷⁵ | 0
5 | [‘TSH_measured’ ‘T3_measured’ ‘T3’] | 0.037237 | 0.075732 | 1.90 × 10⁻⁷⁵ | 0
6 | [‘TSH_measured’ ‘T3’] | 0.038543 | 0.075732 | 4.64 × 10⁻⁷⁶ | 0
7 | [‘TSH_measured’ ‘T3’ ‘T4U’] | 0.038496 | 0.075732 | 7.40 × 10⁻⁷⁶ | 0
8 | [‘TSH_measured’ ‘TT4_measured’] | 0.004939 | 0.075732 | 0 | 0
9 | [‘TSH_measured’ ‘T4U_measured’] | 0.004627 | 0.075732 | 0 | 0
10 | [‘TSH’ ‘T3_measured’ ‘TT4_measured’] | 0.004988 | 0.075732 | 0 | 0
11 | [‘TSH’ ‘T3_measured’ ‘T4U_measured’] | 0.004675 | 0.075732 | 0 | 0
12 | [‘TSH’ ‘TT4_measured’] | 0.005026 | 0.075732 | 0 | 0
13 | [‘TSH’ ‘T4U_measured’] | 0.004713 | 0.075732 | 0 | 0
14 | [‘T3_measured’] | 0.052966 | 0.075732 | 0 | 0
15 | [‘T3_measured’ ‘TT4_measured’] | 0.004998 | 0.075732 | 0 | 0
16 | [‘T3_measured’ ‘TT4’] | 0.052884 | 0.075732 | 0 | 0
17 | [‘T3_measured’ ‘T4U_measured’] | 0.004685 | 0.075732 | 0 | 0
18 | [‘TT4_measured’] | 0.005035 | 0.075732 | 0 | 0
19 | [‘T4U_measured’] | 0.004723 | 0.075732 | 0 | 0
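Tables 25–28 collect standard ordinary-least-squares diagnostics for each candidate attribute subset. The statsmodels sketch below shows where each tabulated quantity comes from; the random DataFrame is only a placeholder for the actual subset columns, not the UCI data:

```python
# Sketch: one OLS fit and the diagnostics reported in Tables 25-28.
# The DataFrame below is random placeholder data, not the hypothyroid dataset.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera, omni_normtest

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3163, 3)),
                  columns=["TSH_measured", "TT4_measured", "target"])

subset = ["TSH_measured", "TT4_measured"]
res = sm.OLS(df["target"], sm.add_constant(df[subset])).fit()

print(res.rsquared, res.rsquared_adj, res.params[subset].values)  # Table 25
print(res.pvalues[subset].values, res.aic, res.bic)               # Table 26
print(res.bse[subset].values, res.fvalue, res.llf)                # Table 27
print(res.mse_resid, res.mse_model,
      omni_normtest(res.resid)[1], jarque_bera(res.resid)[1])     # Table 28
```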
Table 29. Performance analysis of proposed BCRM and BCCM with existing classifiers.

Classifiers | Precision | Recall | F-Score | Accuracy
Logistic regression | 0.847285 | 0.847261 | 0.847192 | 0.847261
KNeighbors classifier | 0.846043 | 0.846101 | 0.846064 | 0.846101
Kernel SVM classifier | 0.851174 | 0.922591 | 0.885445 | 0.922591
Gaussian naive Bayes | 0.847285 | 0.847261 | 0.847192 | 0.847261
Decision tree classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
Extra tree classifier | 0.817655 | 0.817362 | 0.817477 | 0.817362
Random forest classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Gradient boosting classifier | 0.834285 | 0.834261 | 0.834192 | 0.834261
AdaBoost classifier | 0.840521 | 0.840521 | 0.840521 | 0.840521
Ridge classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Ridge classifier CV | 0.847285 | 0.847261 | 0.847192 | 0.847261
SGD classifier | 0.847285 | 0.847261 | 0.847192 | 0.847261
Proposed BCRM | 0.995234 | 0.995224 | 0.995334 | 0.995334
Proposed BCCM | 0.997432 | 0.997422 | 0.997432 | 0.997454
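The BCRM and BCCM rows follow from the soft-blending rule stated earlier: the estimators' predicted class probabilities are summed and the class with the largest sum is returned. The sketch below is our reading of that rule for the BCCM estimator list (Kernel SVM, KNeighbors, Ridge); wrapping RidgeClassifier in CalibratedClassifierCV to obtain a predict_proba interface is our assumption, not the authors' code.

```python
# Sketch: soft blending by summing predicted class probabilities over a
# BCCM-style estimator list. The calibration wrapper around RidgeClassifier
# is an assumption (RidgeClassifier itself has no predict_proba).
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3163, n_features=6, random_state=0)  # stand-in
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

estimators = [
    SVC(kernel="rbf", probability=True).fit(X_tr, y_tr),        # Kernel SVM
    KNeighborsClassifier().fit(X_tr, y_tr),                     # KNeighbors
    CalibratedClassifierCV(RidgeClassifier()).fit(X_tr, y_tr),  # Ridge, calibrated
]
proba_sum = sum(est.predict_proba(X_te) for est in estimators)
y_pred = np.argmax(proba_sum, axis=1)  # class with the largest summed probability
print("blended accuracy:", (y_pred == y_te).mean())
```

Because the estimator list is a plain Python list, it can be rebuilt at runtime with whichever classifiers or regressors performed best on the feature-selected subset, which is the behavior the paper attributes to BCRM and BCCM.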