Combining neural network predictions for medical diagnosis

doi:10.1016/S0010-4825(02)00006-9

Computers in Biology and Medicine

Volume 32, Issue 4, July 2002, Pages 237-246

https://doi.org/10.1016/S0010-4825(02)00006-9

Abstract

We present our results from combining the predictions of an ensemble of neural networks for the diagnosis of hepatobiliary disorders. To improve the accuracy of the diagnosis, we train the second level networks using the outputs of the first level networks as input data. The second level networks achieve an accuracy that is higher than that of the individual networks in the first level. Compared to the simple method which averages the outputs of the first level networks, the second level networks are also more accurate. We discuss how the overall predictive accuracy can be improved by introducing bias during the training of the level one networks.

Introduction

Many authors have shown that combining the predictions of several models often results in a prediction accuracy that is higher than that of the individual models. The general framework for predicting using an ensemble of models consists of two levels and is often referred to as stacked generalization [1]. In the first level, various learning methods are used to learn different models from the original data set. The predictions of the models from the first level along with the corresponding target class of the original input data are then used as inputs to learn a second level model.
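The two-level framework described above can be sketched in a few lines. This is a minimal illustration, not the setup used in the paper: the three threshold rules standing in for first level models, the toy data, and the lookup-table second level model (`fit_lookup_combiner`) are all hypothetical names and choices made for the example.

```python
from collections import Counter, defaultdict


def build_level_two_data(models, inputs, targets):
    """Pair each sample's first level predictions with its target class."""
    return [(tuple(m(x) for m in models), t) for x, t in zip(inputs, targets)]


def fit_lookup_combiner(level_two_data):
    """Toy second level model: majority target class per prediction pattern."""
    votes = defaultdict(Counter)
    for pattern, target in level_two_data:
        votes[pattern][target] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    # Fall back to the overall majority class for unseen patterns.
    default = Counter(t for _, t in level_two_data).most_common(1)[0][0]
    return lambda pattern: table.get(pattern, default)


# Hypothetical first level models: three threshold rules on a scalar input.
models = [lambda x: int(x > 2), lambda x: int(x > 4), lambda x: int(x > 6)]
inputs = [1, 3, 5, 7, 2, 6]
targets = [0, 0, 1, 1, 0, 1]

level_two_data = build_level_two_data(models, inputs, targets)
combiner = fit_lookup_combiner(level_two_data)


def stacked_predict(x):
    """Run the first level models, then let the second level model decide."""
    return combiner(tuple(m(x) for m in models))
```

In practice the second level training data is usually built from predictions on samples the first level models did not see during training, so that the second level model learns how the first level models behave on new data.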

As neural networks are among the most popular models for pattern classification, numerous papers reporting theoretical and experimental results on combining neural network predictions can be found in the literature. The second level models proposed for combining the network predictions include the simple averaging method and the generalized ensemble method [2], and the weighted least-squares method [3]. In these methods, the second level model is a simple weighted combination of the predictions of the component networks in the first level. The three methods differ in how they compute the ensemble weights given to the component networks. While the ensemble weights for the simple averaging method are equal for all component networks, the generalized ensemble weights depend on the correlation matrix of the errors of the component networks. In the weighted least-squares method, the weights are computed as the product of the component networks’ outputs and the target vector of the training samples.
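For reference, the weighted combination shared by these methods can be written compactly. The notation is introduced here for illustration only: $f_i$ is the output of the $i$-th of $N$ component networks, $\alpha_i$ its ensemble weight, $y$ the target, and $C$ the error correlation matrix. The generalized ensemble weights are given in the form commonly attributed to Perrone and Cooper; this is a sketch and may differ in detail from the formulas in [2].

```latex
\begin{align*}
f_{\mathrm{ens}}(x) &= \sum_{i=1}^{N} \alpha_i f_i(x),\\
\text{simple averaging:}\quad \alpha_i &= \frac{1}{N},\\
\text{generalized ensemble:}\quad \alpha_i &=
  \frac{\sum_{j=1}^{N} (C^{-1})_{ij}}{\sum_{k=1}^{N}\sum_{j=1}^{N} (C^{-1})_{kj}},
\qquad C_{ij} = \mathbb{E}\big[(y - f_i(x))\,(y - f_j(x))\big].
\end{align*}
```

Both weightings shown sum to one. When the errors of the component networks are uncorrelated and have equal variance, $C$ is a multiple of the identity matrix and the generalized ensemble weights reduce to $1/N$, that is, to simple averaging.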

The accuracy of the different methods for combining regularized neural networks has been compared on a breast cancer database [4]. The regularized neural networks investigated are networks that have been trained to minimize a cost function consisting of the sum of squared errors and a quadratic penalty term on the network weights. The first level models are neural networks that have been initialized with different random initial weights and neural networks that have been trained using different subsets of the data. The different data subsets are obtained by randomly drawing samples from the original data set with replacement. The second level models used include the simple averaging method and a variance-based weighting method of the first level neural networks.
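The two ingredients mentioned here, sampling with replacement and a penalized squared-error cost, can be sketched as follows. The function names, the `penalty` coefficient, and the toy numbers are assumptions made for illustration; the exact form of the cost function used in [4] is not given in this text.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) samples from data at random, with replacement."""
    return rng.choices(data, k=len(data))


def regularized_cost(outputs, targets, weights, penalty):
    """Sum of squared errors plus a quadratic penalty on the network weights."""
    sse = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    return sse + penalty * sum(w * w for w in weights)


rng = random.Random(0)           # fixed seed so the sketch is repeatable
data = list(range(10))           # stand-in for the original data set
subsets = [bootstrap_sample(data, rng) for _ in range(3)]

# Toy values: two network outputs, their targets, and two network weights.
cost = regularized_cost([0.9, 0.2], [1.0, 0.0], [0.5, -1.0], penalty=0.01)
```

Each bootstrap subset has the same size as the original data set but typically contains repeated samples and omits others, which is what makes the networks trained on them differ.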

Another application of the network ensemble approach for the diagnosis of breast cancer has also been reported recently [5]. The network ensemble is adapted so that it is less likely to make a false positive diagnosis (a malignant diagnosis for benign data). The adaptation is achieved by training neural networks using different proportions of malignant to benign data. The first level models are two groups of neural networks. The networks in the first group have been trained with a greater proportion of benign samples, while those in the second group have been trained with a greater proportion of malignant samples. The second level model is a threshold decision mechanism which, based on an empirically determined threshold, decides whether the output of the first group or the second group is to be taken as the final output.

In this paper, we present our experimental results on combining neural network predictions for the diagnosis of hepatobiliary disorders. The data have been collected from a total of 536 patients who were admitted to a university-affiliated hospital in Japan. Nine real-valued measurements from biomedical tests were obtained from these patients. The hepatobiliary disorders alcoholic liver damage (ALD), primary hepatoma (PH), liver cirrhosis (LC), and cholelithiasis (C) constitute the four output classes. Because there are four possible outcomes of a diagnosis, for the first level models we have used four sets of neural networks. Networks in each set have been trained so that they are likely to be more accurate for one type of disorder than the other three disorders. The predictions of the networks in the first level are combined by a second level neural network. We have been able to achieve significant improvement in accuracy by applying neural networks as the second level model compared to the simple averaging method.

The outline of this paper is as follows. In Section 2, we describe in more detail the data that have been collected, together with the neural network topology used. In Section 3, we present the results of our experiments using neural networks to combine the predictions of the first level networks. Finally, in Section 4 we conclude the paper.

The data set

The hepatobiliary disorder data set contains 536 samples with nine input attributes. The attributes correspond to measurements from biomedical tests. They are glutamic oxaloacetic transaminase (GOT), glutamic pyruvic transaminase (GPT), lactate dehydrogenase (LDH), gamma glutamyl transpeptidase (GGT), blood urea nitrogen (BUN), mean corpuscular volume of red blood cells (MCV), mean corpuscular hemoglobin

Simple averaging

Simple averaging of the predictions has been known to improve on the performance of the individual predictors.
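A minimal sketch of simple averaging for a single patient follows. The three output vectors are made-up numbers, and taking the class with the largest averaged output as the diagnosis is an assumption of this example, not a detail confirmed by the text; only the four class labels come from the paper.

```python
def average_outputs(network_outputs):
    """Average the class-score vectors that several networks give one sample."""
    n = len(network_outputs)
    return [sum(scores) / n for scores in zip(*network_outputs)]


def predict_class(scores, labels):
    """Return the label whose averaged score is largest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


labels = ["ALD", "PH", "LC", "C"]
outputs = [
    [0.6, 0.3, 0.1, 0.0],  # network 1 favours ALD
    [0.2, 0.5, 0.2, 0.1],  # network 2 favours PH
    [0.1, 0.6, 0.2, 0.1],  # network 3 favours PH
]

averaged = average_outputs(outputs)
diagnosis = predict_class(averaged, labels)
```

Here the averaged scores favour PH even though one network on its own would have predicted ALD, which is the kind of correction averaging is expected to provide.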

Table 4 shows the accuracy obtained by averaging the predictions from N networks, where N is 5, 10, or 15. The accuracy rates are averaged over five groups of randomly selected N networks from the 30 networks that we have trained. From the figures in this table, we see that there is a 1% increase in predictive accuracy over the average accuracy of the individual networks. When we also

Summary

We proposed the use of neural networks to combine the predictions of a neural network ensemble that has been trained for diagnosing hepatobiliary disorders. In order to generate networks with differing error patterns we generated biased networks by training the networks in four separate groups. Networks in each group were trained with different targets. The learning targets were modified so that the trained networks would predict one particular disorder with higher accuracy than the other three

References (10)

D.H. Wolpert, Stacked generalization, Neural Networks (1992)
Y. Hayashi, Neural expert system using fuzzy teaching input and its application to medical diagnosis, Inf. Sci. Appl. (1994)
Y. Hayashi et al., A comparison between two neural network rule extraction techniques for the diagnosis of hepatobiliary disorders, Artif. Intell. Med. (2000)
S. Mitra, Fuzzy MLP based expert systems for medical diagnosis, Fuzzy Sets Syst. (1994)
M.P. Perrone et al., When networks disagree: ensemble methods for hybrid neural networks

There are more references available in the full text version of this article.

Yoichi Hayashi received the B.E. degree in Management Science, and the M.E. and Dr. Eng. degrees in Systems Engineering, all from the Science University of Tokyo, Japan, in 1979, 1981, and 1984, respectively. He joined Ibaraki University, Japan, as an Assistant Professor in 1986 and was a Visiting Professor at the University of Alabama at Birmingham and University of Canterbury for 10 months. Currently, he is a Professor of Computer Science at Meiji University, Japan. He has published 140 papers in academic journals and international conference proceedings in the fields of computer and information sciences. His current research interests include artificial neural networks, fuzzy logic, soft computing, expert systems, computational intelligence, data mining and medical informatics. Dr. Hayashi is an Associate Editor of IEEE Transactions on Fuzzy Systems. He is a member of the IEEE, ACM, AAAI, IFSA, INNS, NAFIPS, IPSJ and EICE.

Rudy Setiono received the B.S. degree in Computer Science from Eastern Michigan University in 1984, and the M.S. and Ph.D. degrees in Computer Science from the University of Wisconsin-Madison in 1986 and 1990, respectively. Since August 1990, he has been with the National University of Singapore, where he is currently an Associate Professor at the Information Systems Department, School of Computing. He is an IEEE Senior Member and an Associate Editor of IEEE Transactions on Neural Networks.
