Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality

Choi, YoungJin; Boo, YooKyung

doi:10.3390/ijerph17030897


by YoungJin Choi ¹ and YooKyung Boo ²,*

¹ Department of Healthcare Administration, Eulji University, Seongnam 13135, Korea

² Department of Health Administration, Dankook University, Cheonan 31116, Korea

* Author to whom correspondence should be addressed.

Int. J. Environ. Res. Public Health 2020, 17(3), 897; https://doi.org/10.3390/ijerph17030897

Submission received: 10 December 2019 / Revised: 23 January 2020 / Accepted: 27 January 2020 / Published: 31 January 2020

(This article belongs to the Section Public Health Statistics and Risk Assessment)


Abstract

(1) Medical research has shown an increasing interest in machine learning, which permits massive multivariate data analysis. We therefore developed drug intoxication mortality prediction models and compared machine learning models with traditional logistic regression. (2) A total of 8,937 samples categorized as drug intoxication were extracted from Korea Centers for Disease Control and Prevention data (2008–2017). We trained, validated, and tested each model on these data and compared their performance using three measures: Brier score, area under the receiver operating characteristic curve (AUC), and calibration-in-the-large. (3) A chi-square test demonstrated that mortality risk differed statistically significantly according to severity, intent, toxic substance, age, and sex. The multilayer perceptron model (MLP) had the highest AUC and lowest Brier score in the training and validation phases, while the logistic regression model (LR) showed the highest AUC (0.827) and lowest Brier score (0.0307) in the testing phase. MLP had the second-highest AUC (0.816) and second-lowest Brier score (0.03258) in the testing phase, demonstrating better performance than the decision tree (DT) model. (4) Given the complexity of choosing tuning parameters, LR proved competitive when using medical datasets, which require strict accuracy.

Keywords: drug intoxication; influencing factor; logistic regression; machine learning; mortality prediction

1. Introduction

The motive for consuming toxic substances may vary according to social or regional background and is influenced by generational changes and cultural development. In Korea in particular, which has undergone rapid social, economic, and cultural development, drug-induced suicide attempts and cases of drug intoxication from drug abuse are rising [1]. Intoxication can be defined as a state of functional or structural impairment of the body resulting from exposure to a natural or synthetic substance [2]. Although the rate of intoxication mortality may be lower than that of other injuries, the socioeconomic damage incurred by intoxication is presumed to be substantial once a patient’s rehabilitation is considered in addition to the cost of treatment [3,4].

Drug intoxication refers to an impairment or adverse reaction caused by an intentional or accidental drug overdose over a short period of time [5]. The clinical symptoms vary widely depending on the cause of poisoning and the type of substance, and some cases may be fatal and untreatable [6]. Drug intoxication also poses a significant medical and societal problem, as it induces serious complications and inflicts substantial economic and mental suffering on patients and caregivers alike, even after treatment [7,8]. To mitigate this worsening problem, information on the current state of intoxication is needed, together with further preventive efforts involving prompt and accurate treatment, to reduce the sequelae and complications anticipated after treatment.

Today, healthcare institutions worldwide have developed a variety of mortality prediction models, such as the Acute Physiology and Chronic Health Evaluation, the Simplified Acute Physiology Score, and the Mortality Probability Model, to enhance the quality of medicine [9,10] as well as the validity and fit of these models. In particular, artificial intelligence-based prediction models for inpatient mortality, 30-day unplanned readmission, long lengths of stay, and discharge diagnoses have also been developed [11,12]. Regarding acute trauma mortality prediction, one study developed a model using a trauma and injury severity score based on the severe trauma database of regional emergency medical centers [13].

Most past studies of drug intoxication prediction models have focused on severity and incidence. Developing risk prediction models is challenging because many factors, including the aforementioned considerations, must be examined [14,15,16]. Thus, development usually involves a combination of numerous diagnostics and performance statistics for model calibration and discrimination [17].

Hippisley-Cox et al. developed a model to predict mortality risk for acutely intoxicated inpatients who visited hospital as unplanned or emergency admissions [18]. The variables employed in this model were death during admission, the identified substances, and the patient’s age and gender. A contested issue in risk prediction is how to find and apply the best-performing model for a given situation. There are two broad approaches to choosing a prediction model: traditional statistical models and the machine learning models recently adopted in the field, including the decision tree (DT) and the multilayer perceptron (MLP).

Logistic regression (LR) is a traditional model commonly employed in medical applications to interpret clinical data in depth. On the other hand, machine learning models, including the DT, support vector machine, and MLP, have been widely used of late [19,20,21,22,23,24,25,26,27]. Herein, LR and the intensive approaches (DT and MLP) were compared to identify the best-performing model for predicting mortality [28,29]. For the comparison, an external validation was used to investigate the predictions made by these models in terms of calibration and discrimination.

2. Materials and Methods

2.1. Data Set

Since 2005, the Korea Centers for Disease Control and Prevention has conducted the Korean National Hospital In-depth Injury Survey on patients discharged from 170 hospitals nationwide with 100 beds or more. In this study, of the 2,149,572 samples in the Korea National Hospital Discharge Survey (KNHDS) from 2008 to 2017 (10 years), the 8,937 samples categorized as drug intoxication (ICD-10 codes T36.0–T65.9) were used to predict the mortality risk among inpatients with injuries, as shown in Figure 1.

To identify the factors that affect mortality due to drug intoxication, the dependent variable was set to mortality, while the independent variables were primarily those identified in previous studies, such as sex, age, toxic substance, intent of intoxication, severity, area of residence, risk factor, method of payment, method of hospital admission, and mental disorder. An importance analysis was conducted on these independent variables to identify which should be used in the model comparisons. Based on the results, the following five variables, each with an importance of 10% or higher, were selected: age, toxic substance, intent of intoxication, severity, and risk factor.

Age, which is a continuous variable, was dichotomized into 65 years or older and below 65 years. Severity was assessed using the Charlson comorbidity index score, a widely used index for comorbidity adjustment that is calculated as the sum of weighted scores based on the presence or absence of 19 different medical conditions. Toxic substance was grouped into toxic drugs, alcohol, environmentally harmful substances, and other substances. Intent of intoxication was divided into intentional and unintentional, and risk factors were divided into conflict with relatives, physical illness, mental illness, financial problem, and other factors.

2.2. Methods

Calibration is one of the most important factors in prediction, as it is closely related to the model’s reliability [22]. Validation was performed both internally and externally. Internal validation involves splitting the surveyed data into training, validation, and testing sets. External validation, which is a key step before clinical use, assesses calibration and discrimination by investigating the performance of the predictions in a different but possibly related setting [23].

In this study, three performance measures were used to assess model performance: Brier score, area under the Receiver Operating Characteristic (ROC) curve (AUC), and calibration-in-the-large [22]. First, overall performance compares the predicted outcome with the actual outcome. Nagelkerke’s R² and the Brier score are frequently used. The Brier score is the mean squared error between the predicted probabilities (p) and the observed binary outcomes (y), adapted for use with logistic regression. The equation for the Brier score is shown below. The score ranges from 0 for a perfect model to 0.25 for a non-informative model with a 50% outcome incidence, and a score closer to 0 indicates better model performance.

$$\mathrm{Brier} = \mathrm{mean}\left((y - p)^{2}\right) = \mathrm{mean}\left(y \times (1 - p)^{2} + (1 - y) \times p^{2}\right) \tag{1}$$
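As a minimal sketch of Equation (1), and not the SPSS procedure used in the study, the Brier score can be computed directly from observed outcomes and predicted probabilities. The function names and the toy data below are illustrative assumptions.

```python
def brier_score(y, p):
    """Mean squared difference between binary outcomes y and predicted probabilities p."""
    if len(y) != len(p) or len(y) == 0:
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def brier_score_split(y, p):
    """Second form of Equation (1): y*(1-p)^2 + (1-y)*p^2, averaged over samples."""
    return sum(yi * (1 - pi) ** 2 + (1 - yi) * pi ** 2 for yi, pi in zip(y, p)) / len(y)


# Hypothetical toy data: 1 = event occurred, 0 = event did not occur.
y = [1, 0, 0, 1, 0]
p = [0.9, 0.2, 0.1, 0.6, 0.3]

print(brier_score(y, p))        # squared errors 0.01, 0.04, 0.01, 0.16, 0.09 averaged
print(brier_score_split(y, p))  # same value, because y is either 0 or 1
```

Because y takes only the values 0 and 1, both forms agree; predicting 0.5 for every sample gives 0.25 regardless of the outcomes.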

Second, discrimination represents whether the model is capable of distinguishing people with and without an illness. The discrimination slope, concordance index (C index), and AUC are often used. In the present study, the AUC was used. Third, calibration represents the concordance between the predicted and actual values. The Hosmer-Lemeshow goodness-of-fit test and calibration-in-the-large are generally used, and we opted for the latter in this study. Calibration-in-the-large is the difference between the mean observed value and the mean predicted value in linear regression, adapted to LR through the logit function. It computes the difference between the log odds of the mean observed outcome and the log odds of the mean predicted probability. A value closer to 0 indicates better performance.

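$$\text{Calibration-in-the-large}\ (a) = \operatorname{logit}(y = 1) - \operatorname{logit}(p) = \ln\left(\mathrm{odds}(\mathrm{mean}(y))\right) - \ln\left(\mathrm{odds}(\mathrm{mean}(p))\right) = \ln\left(\frac{\mathrm{mean}(y)}{1 - \mathrm{mean}(y)}\right) - \ln\left(\frac{\mathrm{mean}(p)}{1 - \mathrm{mean}(p)}\right) \tag{2}$$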
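Equation (2) can be illustrated with a short sketch under the same assumptions as the text: y holds the observed binary outcomes and p the predicted probabilities. The function names and toy values are hypothetical, and the computation is undefined when either mean is exactly 0 or 1.

```python
import math


def logit(x):
    """Log odds of a proportion x, with 0 < x < 1."""
    return math.log(x / (1 - x))


def calibration_in_the_large(y, p):
    """Log odds of the mean observed outcome minus log odds of the mean prediction."""
    mean_y = sum(y) / len(y)
    mean_p = sum(p) / len(p)
    return logit(mean_y) - logit(mean_p)


# Hypothetical toy data with an observed event rate of 25%.
y = [1, 0, 0, 0]
well_calibrated = [0.4, 0.2, 0.2, 0.2]   # mean prediction 0.25
over_predicting = [0.5, 0.5, 0.5, 0.5]   # mean prediction 0.50

print(calibration_in_the_large(y, well_calibrated))   # close to 0
print(calibration_in_the_large(y, over_predicting))   # negative: predictions too high on average
```

A negative value indicates that the model predicts a higher average risk than was observed, and a positive value the reverse.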

3. Results

3.1. Statistical Analysis

This study analyzed the factors that affect mortality from drug intoxication using chi-square tests and compared the performance of the prediction models using IBM SPSS Statistics 25 (IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY: IBM Corp.). To test the performance and validity of the prediction models, we divided the samples into training, validation, and testing sets. The sample distribution per phase is shown in Table 1.

Of a total of 4,442 samples over five years, from 2008 to 2012, 2,575 (96.4%) survived drug intoxication. A chi-square test was performed on the drug intoxication mortality predictors, namely age, toxic substance, severity, cause of risk, mental disorder, and intent, and the results showed that mortality differed significantly across most of these variables. Notably, the difference according to cause of risk was significant at the 0.05 level in all three phases (Table 1).
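To make the chi-square comparison concrete, the following sketch computes Pearson's chi-square statistic for a contingency table of predictor category by outcome. It is not the SPSS routine used in the study, the counts are invented for illustration, and the p-value helper applies only to 2 x 2 tables (one degree of freedom).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: rows = age group (<65, >=65), columns = (survived, died).
table = [[90, 10],
         [60, 40]]

stat = chi_square_statistic(table)
print(stat, p_value_df1(stat))
```

For tables with more than two rows or columns, the statistic is computed the same way, but the p-value requires the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom.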

To avoid overfitting and evaluate prediction performance accurately, the dataset was split into training, validation, and separate test sets [30]. The prediction performance and ROC plots for the three datasets are shown in Table 2.

For comparison, we created box plots, shown in Figure 2, for the training, validation, and testing phases of the three prediction models, with the predicted mortality on the y-axis and the actual treatment outcome (0: death; 1: survival) on the x-axis. In terms of predicting death (0 on the x-axis), all three models had many outliers (O) that deviated from the quartiles. However, the DT model had relatively fewer outliers than the LR and MLP models. In terms of predicting survival (1 on the x-axis), fewer outliers appeared beyond the quartiles than for death prediction. In the validation and testing phases of the DT model, the variance was large, but the values fell within the central 50%. In other words, the DT is the superior model in predicting mortality, while the LR model has a relatively smaller variance than the others in predicting survival.

3.2. Modeling Results

In the study, seven years of drug intoxication data were divided into training and validation data, and the models’ performances were tested using three years of data. We predicted the risk of mortality using the following modeling approaches: LR, DT, and MLP.
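The temporal split described above can be sketched as follows. The exact year boundaries are an assumption made for illustration (training 2008 to 2012, validation 2013 to 2014, testing 2015 to 2017); the text states only that seven years were used for training and validation and three years for testing. The record layout with a "year" key is likewise hypothetical.

```python
def split_by_year(records, train_years, validation_years, test_years):
    """Assign each record to a phase according to its discharge year."""
    phases = {"training": [], "validation": [], "testing": []}
    for record in records:
        year = record["year"]
        if year in train_years:
            phases["training"].append(record)
        elif year in validation_years:
            phases["validation"].append(record)
        elif year in test_years:
            phases["testing"].append(record)
        # Records outside all three periods are left out.
    return phases


# Assumed boundaries, not confirmed by the source.
TRAIN_YEARS = range(2008, 2013)
VALIDATION_YEARS = range(2013, 2015)
TEST_YEARS = range(2015, 2018)

# One hypothetical record per survey year.
records = [{"year": year, "died": 0} for year in range(2008, 2018)]
phases = split_by_year(records, TRAIN_YEARS, VALIDATION_YEARS, TEST_YEARS)
print({name: len(rows) for name, rows in phases.items()})
```

Splitting by calendar period rather than at random means the testing set comes from later years than the data used for fitting, which is what allows it to serve as a form of external validation.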

The MLP was trained by error back-propagation and had one hidden layer with nine neurons. The hyperbolic tangent was used as the activation function, and softmax was used as the output layer function for the prediction of diseases [31]. A sensitivity analysis was performed on the number of neurons. Prediction accuracy increased as neurons were added up to nine and did not improve considerably afterwards. Nine neurons were therefore used to avoid overfitting [32]. The MLP was trained with the scaled conjugate gradient algorithm, which minimizes the error between the output of the MLP and the desired output [33].
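The architecture described above can be illustrated with a forward pass through one hidden layer of nine tanh units followed by a softmax output. This is only a sketch: the weights are random rather than trained, scaled conjugate gradient training is not shown, and the input size of five is an assumption that ignores how the categorical predictors were actually encoded.

```python
import math
import random


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(values)                      # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with tanh activation, then a softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
              for weights, bias in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(weights, hidden)) + bias
              for weights, bias in zip(w_out, b_out)]
    return softmax(scores)


rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_CLASSES = 5, 9, 2     # nine hidden neurons as in the text; other sizes assumed

w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_CLASSES)]
b_out = [0.0] * N_CLASSES

# Hypothetical encoded patient record.
probs = mlp_forward([1.0, 0.0, 1.0, 0.0, 1.0], w_hidden, b_hidden, w_out, b_out)
print(probs)  # two class probabilities, for example survival and death
```

With a two-class softmax output, one of the two probabilities can be read as the predicted mortality risk that enters the Brier score and calibration measures.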

The goal of the DT was to produce subsets of the data that were as homogeneous as possible with respect to the target variable. In this study, we used Gini impurity, a measure used for categorical variables [34]. In addition, the minimum parent size was set to 100, the minimum child size to 50, and the minimum improvement to 0.0001.
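As a simplified sketch of these settings, Gini impurity and the impurity reduction of a candidate binary split can be computed as below. The stopping rule mirrors the thresholds named in the text, but the way SPSS defines "improvement" internally may differ, so the function `split_allowed` and the toy node are assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_improvement(parent, left, right):
    """Decrease in Gini impurity from splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def split_allowed(parent, left, right, min_parent=100, min_child=50, min_improvement=0.0001):
    """Apply the size and improvement thresholds before accepting a split."""
    return (len(parent) >= min_parent
            and min(len(left), len(right)) >= min_child
            and split_improvement(parent, left, right) >= min_improvement)


# Hypothetical node with 120 cases (1 = died, 0 = survived) and one candidate split.
left = [1] * 50 + [0] * 10
right = [1] * 10 + [0] * 50
parent = left + right

print(gini_impurity(parent))                   # 0.5 for a balanced two-class node
print(split_improvement(parent, left, right))  # about 0.222
print(split_allowed(parent, left, right))      # True
```

The minimum parent and child sizes keep the tree from creating very small leaves, which is one way of limiting overfitting.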

For the performance testing, the Brier score was used to compare overall performance, and the AUC to analyze discrimination. Further, calibration-in-the-large was used to analyze calibration. The model performance statistics are shown in Table 2 and plotted in Figure 3.
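For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen case with the event receives a higher predicted risk than a randomly chosen case without it. The sketch below computes it by direct pairwise comparison; the function name and toy data are illustrative, and the quadratic loop is meant for clarity rather than for large datasets.

```python
def auc(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes must be present")
    correct = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                correct += 1.0
            elif e == n:
                correct += 0.5
    return correct / (len(events) * len(non_events))


# Hypothetical toy data: one of the four event/non-event pairs is ranked incorrectly.
y = [1, 0, 1, 0]
p = [0.8, 0.6, 0.4, 0.2]
print(auc(y, p))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Because the AUC depends only on the ordering of the predictions, it says nothing about calibration, which is why it is reported alongside the Brier score and calibration-in-the-large.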

In the training phase, the AUC ranged from 0.779 to 0.848. MLP had the highest AUC (0.848), while LR had the lowest (0.779). The Brier score ranged from 0.0599 to 0.0604. MLP had the lowest Brier score (0.0599), and DT had the highest (0.0604). The absolute values of calibration-in-the-large ranged from 0.0034 to 0.3186. LR had the lowest score (0.0034) and DT had the highest (0.3186). In the validation phase, the AUC of the three methods ranged from 0.788 to 0.853, and the Brier scores ranged from 0.0422 to 0.043. The absolute values of calibration-in-the-large ranged from 0.117 to 0.394.

In the testing phase, all methods had reductions in their AUC (range: 0.764 to 0.827) and Brier scores (range: 0.0307 to 0.0336). In the training and validation phases, MLP had the best performance, with the highest AUC and lowest Brier score, but in the testing phase, LR was found to have the highest AUC (0.827) and lowest Brier score (0.0307). The absolute values of the external validation calibration-in-the-large ranged from 0.149 to 0.5017, with LR having the lowest value (0.149) and MLP having the highest value (0.5017), which indicates considerable overfitting. Therefore, the LR model performed reasonably well in the external testing phase, with an AUC of 0.827, a Brier score of 0.0307, and a calibration-in-the-large of 0.149. MLP followed, showing better performance than the DT model, with an AUC of 0.816 and a Brier score of 0.03258.

4. Discussion

The chi-square test on the drug intoxication mortality predictors showed that mortality differed statistically significantly across most of these variables. In particular, it differed significantly for the intent predictor in all three phases of the models. This result is consistent with previous studies from several countries. In the United Kingdom, more than 80,000 people are admitted to an emergency department due to drug intoxication every year, and more than 1,000 die [35]. In the United States, more than 7.3% of the population aged 12 years or older either abuse or are dependent on alcohol or illegal drugs [36]. According to an analysis of National Emergency Department Information System (NEDIS) data, the number of suicide attempters admitted to emergency departments registered with the NEDIS rose consistently from 2007 to 2010, and drug or pesticide intoxication was the most frequent method [37]. As shown here, intentional drug intoxication accounts for most suicide attempts in adults, highlighting the need for management.

The results also showed a statistically significant difference in the risk of mortality from drug intoxication among the elderly population (p < 0.01). Ageing diminishes individuals’ physiological abilities, and the incidence of trauma among the elderly is rising every year due to medical advances, prolonged life expectancy, and increased participation in social activities. For these reasons, an injury in this population requires more health care services than in other age groups. Prognoses are even poorer for patients with chronic diseases with high disease burdens, such as diabetes, coronary artery disease, kidney disease, and lung disease [38]. According to the nationwide survey on the Living Conditions and Welfare Needs of Older Persons, 88.5% of the elderly, aged 65 years or older, have at least one chronic disease [39] and frequently take prescribed and over-the-counter medications. Moreover, various social, psychological, and health problems serve as causes of drug intoxication among the elderly. In our study, the percentage of non-fatal drug intoxication was also high. Hence, as the incidence of drug intoxication may increase further with the growing elderly population, it is important to raise awareness of the issue and develop preventive measures for drug management.

Differences in mental disorders were significant in the training phase (p < 0.05), while risk factors were significant in all three phases (p < 0.05). In particular, psychiatric history, past suicide attempts, family history of suicide, and living alone have been identified as risk factors for suicide attempts among intentionally intoxicated patients [40,41]. Therefore, a national-level program is needed to evaluate and treat patients with intentional drug intoxication and to provide systematic post-treatment management, such as assessment of psychiatric risk factors, for these patients and their caregivers. In this study, we developed and validated prediction models for drug intoxication-induced mortality using LR, DT, and MLP. To compare the three models, we processed the samples through training and validation and tested their performance using the testing dataset. This approach to model testing is infrequent, but methodologically more rigorous than considering internal validation alone.

In the validation of the developed prediction models, the LR model had superior overall performance, discrimination, and external validity in the testing phase compared to the other two models. All three models had satisfactory overall performance and discrimination, but the DT and MLP showed miscalibration in the external validity assessment. Because of this miscalibration, applying the DT and MLP models to new patients could lead to systematically incorrect decisions. Thus, the models should be updated appropriately to suit the environment at the application stage.

Despite the better results exhibited by the MLP in the training and validation phases, the overall winner between the conventional statistical model and the modern machine learning models is not clear, for two reasons. First, the MLP and DT demonstrated underwhelming performance in the external validation phase. Second, the complexity of the tuning parameters involved in modern machine learning methods allows LR, which is less complex, to be regarded as the better prediction model. The key finding is that the non-linear and non-additive signals are too weak to make the modern methods beneficial. We suggest that the validation of sophisticated models, such as modern machine learning methods, should involve a comparison with the simple logistic model as a benchmark. Using more than one performance measure is also essential when comparing the performance of prediction models. For this purpose, the AUC and the Brier score were used in this study. Although the AUC is a general performance measure, it is limited as a discriminator between models; the Brier score and calibration-in-the-large are also useful performance indexes when comparing models.

5. Conclusions

In this study, models were derived to estimate the mortality risk of drug-intoxicated patients. In particular, predictions were made using a traditional analysis method (LR) and newer analysis methods (DT, MLP). In addition, we compared the usefulness and validity of the models across the techniques to identify one suitable for the medical field, where accuracy is emphasized. First, we identified the factors affecting the mortality of drug-intoxicated patients through the estimation models and found that similar items were selected as influencing factors in the three models.

Next, the assessment of the validity of the estimation models using LR, DT, and MLP yielded satisfactory model performance and discriminatory power. The MLP was found to be superior to the other methods in the training and validation phases, but the LR method was found to be more useful in the testing phase. Moreover, in the external validity evaluation, the newer techniques (DT, MLP) showed an overfitting problem, whereas LR was relatively superior to the other two techniques.

That is, although the newer techniques may improve model performance and discrimination, they cannot be said to be superior to the traditional ones in the medical field, where overfitting is an important problem. The medical field is directly related to patients’ lives and therefore demands a high standard of accuracy. When an estimation model with an overfitting problem is applied to a new group of patients, the model must be modified and continuously updated to reduce the possibility of wrong decisions due to overfitting.

This study has limitations due to the limited clinical indicators, the small sample size drawn from the panel data, and the use of MLP instead of Random Forests (RF) and Support Vector Machines (SVM), which are advanced models. In addition, dividing age into only two groups at the age of 65 limits how well the characteristics of different age groups are reflected. Nevertheless, this study is meaningful in that it predicts the factors of death from drug intoxication using data mining techniques and presents the advantages and disadvantages of each technique, which should be considered in future field applications.

Author Contributions

All authors have read and agree to the published version of the manuscript. Conceptualization, Y.B. and Y.C.; methodology, Y.B. and Y.C.; software, Y.C.; validation, Y.B. and Y.C.; formal analysis, Y.C.; investigation, Y.B.; resources, Y.B.; data curation, Y.B.; writing—original draft preparation, Y.C.; writing—review and editing, Y.B.; visualization, Y.C.; supervision, Y.B.; project administration, Y.B.; funding acquisition, Y.B.

Funding

The present research was supported by the research fund of Dankook University in 2019.

Acknowledgments

The research data needed to conduct this study were provided and used in accordance with the guidelines for using the original data on Korean National Hospital In-depth Injury Survey from the Korea Centers for Disease Control and Prevention.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Han, S.T.; Lee, J.H. Comparative analysis of acute drug intoxication between 1980s and 1990s. J. Korean Soc. Emerg. Med. 1999, 10, 441–446.
2. Moon, J.M.; Chun, B.J.; Cho, Y.S. The characteristics of emergency department presentations related to acute herbicide or insecticide poisoning in South Korea between 2011 and 2014. J. Toxicol. Environ. Health 2016, 79, 466–476.
3. Rockett, I.R.; Smith, G.S.; Caine, E.D.; Kapusta, N.D.; Hanzlick, R.L.; Larkin, G.L.; Naylor, C.P.; Nolte, K.B.; Miller, T.R.; Putnam, S.L.; et al. Confronting death from drug self-intoxication (DDSI): Prevention through a better definition. Am. J. Public Health 2014, 104, 49–55.
4. Descamps, A.K.; Vandijck, D.M.; Buylaert, W.A.; Mostin, M.A.; Paepe, P. Characteristics and costs in adults with acute poisoning admitted to the emergency department of a university hospital in Belgium. PLoS ONE 2019, 14, e0223479.
5. Vermes, A.; Roelofsen, E.E.; Sabadi, G.; van den Berg, B.; de Quelerij, M.; Vulto, A.G. Intoxication with therapeutic and illicit drug substances and hospital admission to a Dutch university hospital. Neth. J. Med. 2003, 61, 168–172.
6. Haoka, T.; Sakata, N.; Okamoto, H.; Oshiro, A.; Shimizu, T.; Naito, Y.; Onishi, S.; Morishita, Y.; Nara, S. Intentional or unintentional drug poisoning in elderly people: Retrospective observational study in a tertiary care hospital in Japan. Acute Med. Surgery 2019, 6, 252–258.
7. Conner, K.R.; Wiegand, T.J.; Gorodetsky, R.; Schult, R.; Pizzarello, E.; Kaukeinen, K. Validation of the Poisoning Severity Score (PSS) in suicidal behavior by self-poisoning. Behav. Sci. Law 2019, 37, 240–246.
8. Persson, H.E.; Sjöberg, G.K.; Haines, J.A.; de Garbino, J.P. Poisoning severity score. Grading of acute poisoning. J. Toxicol. Clin. Toxicol. 1998, 36, 205–213.
9. Vasilevskis, E.E.; Kuzniewicz, M.W.; Cason, B.A.; Lane, R.K.; Dean, M.L.; Clay, T.; Rennie, D.J.; Vittinghoff, E.; Dudley, R.A. Mortality probability model III and simplified acute physiology score II: Assessing their value in predicting length of stay and comparison to APACHE IV. Chest 2009, 136, 89–101.
10. Engerstrom, L.; Nolin, T.; Mardh, C.; Sjoberg, F.; Karlstrom, G.; Fredrikson, M.; Walther, S.M. Impact of Missing Physiologic Data on Performance of the Simplified Acute Physiology Score 3 Risk-Prediction Model. Crit. Care Med. 2017, 45, 2006–2013.
11. Rosenthal, J.; Jacolbia, R.; Rajkomar, A.; Lee, H.; Auerbach, A. Using Tablet Computers to Increase Patient Engagement With Electronic Personal Health Records: Protocol For a Prospective, Randomized Interventional Study. J. Med. Internet Res. 2019, 18, 25.
12. Rajkomar, A.; Oren, E.; Chen, K.M.; Dai, A.M.; Hajaj, N.; Liu, J.P.; Liu, X.; Sun, M.; Sundberg, P.; Yee, H.; et al. Scalable and accurate deep learning for electronic health records. Dig. Med. 2018, 1, 18.
13. Kang, I.H.; Lee, K.H.; Youk, H.; Lee, J.I.; Lee, H.Y.; Bae, K.S. Trauma and Injury Severity Score modification for predicting survival of trauma in one regional emergency medical center in Korea: Construction of Trauma and Injury Severity Score coefficient model. Hong Kong J. Emerg. Med. 2019, 26, 225–232.
14. Debray, T.P.; Vergouwe, Y.; Koffijberg, H.; Nieboer, D.; Steyerberg, E.W.; Moons, K.G. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J. Clin. Epidemiol. 2015, 68, 279–289.
15. Harrell, F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis; Springer: New York, NY, USA, 2001.
16. Helmreich, J.E. Regression Modeling Strategies with Applications to Linear Models, Logistic and Ordinal Regression and Survival Analysis (2nd Edition). J. Stat. Soft. 2016, 70.
17. Steyerberg, E.W.; Vickers, A.J.; Cook, N.R.; Gerds, T.; Gonen, M.; Obuchowski, N.; Pencina, M.J.; Kattan, M.W. Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology 2010, 21, 128–138.
18. Hippisley-Cox, J.; Coupland, C. Predicting risk of emergency admission to hospital using primary care data: Derivation and validation of QAdmissions score. BMJ Open 2013, 3, e003482.
19. Kim, S.; Kim, W.; Park, R.W. A comparison of intensive care unit mortality prediction models through the use of data mining techniques. Healthc Inform. Res. 2011, 17, 232–243.
20. Scott, H.F.; Colborn, K. Machine learning for predicting sepsis in-hospital mortality: An important start. Acad Emerg. Med. 2016, 23, 1307.
21. Churpek, M.M.; Yuen, T.C.; Winslow, C.; Meltzer, D.O.; Kattan, M.W.; Edelson, D.P. Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Crit. Care Med. 2016, 44, 368–374.
22. Badriyah, T.; Briggs, J.S.; Prytherch, D.R. Decision trees for predicting risk of mortality using routinely collected data. Int. J. Soc. Hum. Sci. 2012, 6, 660–663.
23. Wang, G.; Lam, K.M.; Deng, Z.; Choi, K.S. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques. Comput. Biol. Med. 2015, 63, 124–132.
24. Motwani, M.; Dey, D.; Berman, D.S.; Germano, G.; Achenbach, S.; Al-Mallah H., M.; Andreini, D.; Budoff, J.M.; Cademartiri, F.; Callister, Q.T.; et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: A 5-year multicentre prospective registry analysis. Eur. Heart. J. 2016, 52, 468–476.
25. Stylianou, N.; Akbarov, A.; Kontopantelis, E.; Buchan, I.; Dunn, K.W. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches. Burns 2015, 41, 925–934.
26. Colombet, I.; Ruelland, A.; Chatellier, G.; Gueyffier, F.; Degoulet, P.; Jaulent, M.C. Models to predict cardiovascular risk: Comparison of CART, multilayer perceptron and logistic regression. Proc. AMIA Symp. 2000, 156–160, PMC2244093.
27. Ross, E.G.; Shah, N.H.; Dalman, R.L.; Nead, K.T.; Cooke, J.P.; Leeper, N.J. The use of machine learning for the identification of peripheral artery disease and future mortality risk. J. Vasc. Surg. 2016, 64, 1515–1522.
28. Lee, J.; Maslove, D.M.; Dubin, J.A. Personalized Mortality Prediction Driven by Electronic Medical Data and a Patient Similarity Metric. PLoS ONE 2015, 10, e0127428.
29. Ho, L.V.; Ledbetter, D.; Aczon, M.; Wetzel, R. The Dependence of Machine Learning on Electronic Medical Record Quality. In Proceedings of the AMIA Annual Symposium, Washington, DC, USA, 6–8 November 2017; American Medical Informatics Association: Washington, DC, USA, 2017; Volume 2017, p. 883.
30. Tsao, H.Y.; Chan, P.Y.; Su, E.C.Y. Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms. BMC Bioinforma. 2018, 19, 195.
31. Karlık, B.; Olgaç, A.V. Performance analysis of various activation functions in generalized MLP architectures of neural networks. Inter. J. Art. Int. Exp. Sys. 2011, 1, 111–122.
32. Isa, I.; Saad, Z.; Omar, S.; Osman, M.; Ahmad, K.; Sakim, H.M. Suitable MLP network activation functions for breast cancer and thyroid disease detection. In Proceedings of the IEEE, Tuban, Indonesia, 28–30 September 2010; pp. 39–44.
33. Moller, A. Scaled Conjugate Gradient Algorithm for Fast Supervised Learning. Neural Netw. 1993, 6, 525–533.
34. Khemphila, A.; Boonjing, V. Comparing performances of logistic regression, decision trees, and neural networks for classifying heart disease patients. In Proceedings of the International conference on Computer Information Systems and Industrial Management Applications (CISIM), Krakow, Poland, 8–10 October 2010; pp. 193–198.
35. Greene, S.L.; Dargan, P.I.; Jones, A.L. Acute poisoning: Understanding 90% of cases in a nutshell. Postgrad. Med. J. 2005, 81, 204–216.
36. Brust, J.C. Neurologic complications of substance abuse. CONTINUUM 2014, 20, 642–656.
37. Choi, Y.; Kim, Y.; Ko, Y.; Cha, E.S.; Kim, J.; Lee, W.J. Economic burden of acute pesticide poisoning in South Korea. Trop. Med. Int. Health 2012, 17, 12.
38. Ahn, H.C.; Seo, J.Y.; Chung, J.B.; Choi, Y.M.; Choi, J.T.; You, K.C.; Ahn, M.E.; Choi, M.C.; Kim, H.K.; Kim, S.W.; et al. Clinical Review In Geriatric Trauma Patients. J. Korean Soc. Emerg. Med. 2002, 13, 1.
39. Jang, S.N.; Kim, D.H. Trends in the health status of older Koreans. J. American Ger. Soc. 2010, 58, 592–598.
40. Kim, H.J.; Kim, H.M.; Kim, H.J.; Cho, Y.S.; Lee, M.G.; Jun, D.H.; Go, C.Y. Association of prescribed drug intoxication and neuropsychiatric history. J. Korean. Soc. Clin. Toxicol. 2011, 9, 77–80.
41. Min, H.G.; Choi, H.S.; Kwon, O.Y.; Lee, J.S.; Hong, H.P.; Ko, Y.G. Is a psychiatric consultation necessary for the non-suicidal intentional drug ingestion patient in an emergency department. J. Korean Soc. Emerg. Med. 2010, 21, 878–886.

Figure 1. Sample structure.

Figure 2. Box Graph.

Figure 3. ROC Graph.

Table 1. Results of chi-square.

Variable	Category	Training, n (%)	χ²	Validation, n (%)	χ²	Testing, n (%)	χ²
Age	Under 65	3,333 (75.0)	62.1 ***	1,360 (74.5)	68.5 ***	1,878 (70.3)	67.1 ***
	Over 65	1,109 (25.0)		465 (25.5)		792 (29.7)
Toxic Substance	Toxic Drug	1,768 (39.8)	440.1 ***	739 (40.5)	104.2 ***	1,176 (44.0)	64.4 ***
	Alcohol	28 (0.6)		17 (0.9)		34 (1.3)
	Hazardous Substance	1,590 (35.8)		614 (33.6)		809 (30.3)
	Other	1,056 (23.8)		455 (24.9)		651 (24.4)
Severity (CCI)	0	3,844 (86.5)	17.5 ***	1,577 (86.4)	14.5 ***	2,295 (86.0)	17.9 ***
	1	403 (9.1)		181 (9.9)		249 (9.3)
	2	114 (2.6)		44 (2.4)		73 (2.7)
	3	81 (1.8)		23 (1.3)		53 (2.0)
Cause of Risk	Conflict with Relatives	666 (15.0)	11.3 **	192 (10.5)	13.3 **	317 (11.9)	13.3 **
	Physical Illness	116 (2.6)		69 (3.8)		94 (3.5)
	Mental Problem	678 (15.3)		285 (15.6)		327 (12.2)
	Financial Problem	106 (2.4)		67 (3.7)		114 (4.3)
	Other	2,876 (64.7)		1,212 (66.4)		1,818 (68.1)
Intent	Unintentional	1,657 (37.3)	123.8 ***	699 (38.3)	25.6 ***	1,021 (38.2)	37.3 ***
	Intentional	2,562 (57.7)		1,033 (56.6)		1,500 (56.2)
	Missing	223 (5.0)		93 (5.1)		149 (5.6)
Total		4,442		1,825		2,670

*** p < 0.01, ** p < 0.05, * p < 0.1.

Table 2. Model performance test.

Model	Phase	Brier Score	AUC	Calibration
Logistic Regression	Training	0.06032	0.779	−0.00342
	Validation	0.04266	0.788	0.207416
	Testing	0.030796	0.827	0.149374
Decision Tree	Training	0.060441	0.845	0.244034
	Validation	0.042295	0.845	−0.11715
	Testing	0.033615	0.764	−0.49888
Multilayer Perceptron	Training	0.059971	0.848	−0.31857
	Validation	0.043033	0.853	−0.3938
	Testing	0.032589	0.816	−0.50177
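
For readers who want to see how the Brier score and AUC columns of Table 2 are defined, the following is a minimal Python sketch rather than the authors' implementation. The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome, and the AUC is the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The function names and the toy values are assumptions made for illustration only and are not taken from the study data.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (NOT from the study).
outcomes = [0, 0, 0, 1, 0, 1]                # 1 = death, 0 = survival
predicted = [0.1, 0.2, 0.3, 0.8, 0.6, 0.4]   # hypothetical model probabilities

toy_brier = brier_score(outcomes, predicted)
toy_auc = auc(outcomes, predicted)
print(f"Brier score: {toy_brier:.4f}, AUC: {toy_auc:.3f}")
```

A lower Brier score and a higher AUC both indicate better predictions. The pairwise AUC above is quadratic in the sample size, which is acceptable for a small illustration but would usually be replaced by a rank-based computation on a data set of several thousand cases.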

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
