Abstract

This study aimed to assess the risk factors for COVID-19 mortality among hospitalized patients in Jordan. All COVID-19 patients admitted to a tertiary hospital in Jordan from September 20, 2020, to August 8, 2021, were included in this study. Demographics, clinical characteristics, comorbidities, and laboratory results were extracted from the patients’ electronic records. Multivariable logistic regression and machine learning (ML) methods were used to study variable importance. Out of 1,613 COVID-19 patients, 1,004 (62.2%) were discharged from the hospital (survived), while 609 (37.8%) died. Patients who were elderly (>65 years) (OR, 2.01; 95% CI, 1.28–3.16), were current smokers (OR, 1.61; 95% CI, 1.17–2.23), or had severe (OR, 1.56; 95% CI, 1.05–2.32) or critical (OR, 2.94; 95% CI, 2.02–4.27) illness at admission were at higher risk of mortality. Comorbidities including chronic kidney disease (OR, 2.90; 95% CI, 1.90–4.43), deep venous thrombosis (OR, 2.62; 95% CI, 1.08–6.35), malignancy (OR, 2.22; 95% CI, 1.46–3.38), diabetes (OR, 1.31; 95% CI, 1.04–1.65), and heart failure (OR, 1.51; 95% CI, 1.02–2.23) were significantly associated with increased risk of mortality. Laboratory abnormalities associated with mortality included hypernatremia (OR, 11.37; 95% CI, 4.33–29.81), elevated aspartate aminotransferase (OR, 1.81; 95% CI, 1.42–2.31), hypoalbuminemia (OR, 1.75; 95% CI, 1.37–2.25), and low platelet level (OR, 1.43; 95% CI, 1.05–1.95). Several demographic, clinical, and laboratory risk factors for COVID-19 mortality were identified. This study is the first to examine the risk factors associated with mortality using ML methods in the Middle East. These findings may contribute to a better understanding of the impact of the disease and help improve outcomes of the pandemic worldwide.

1. Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was provisionally named the 2019 novel coronavirus (2019-nCoV), is a highly contagious viral illness [1]. It has imposed catastrophic global morbidity and mortality burdens with more than 340 million cases and 5 million deaths [2, 3]. In particular, excess mortality of over one million was observed in more than 100 countries in 2020 [4]. Furthermore, a recent study has estimated an excess of 375,235 deaths from direct or indirect effects of COVID-19 [5]. Moreover, even survivors of COVID-19 may develop long-term effects and clinical consequences such as heart and lung damage leading to possible delayed death [6].

The evolution of this pandemic has required the urgent expansion of public health efforts to better understand its epidemiology and identify its impact [7]. Identifying the impact of COVID-19 requires elucidating the spectrum of its clinical severity, including the recognition of potential risk factors for severe illness or death [7]. Consequently, since its emergence, several epidemiological studies have shown significant associations of COVID-19 severity and death with demographic characteristics, preexisting comorbidities, and other factors [8, 9].

Several demographic risk factors were observed for COVID-19 severity and mortality, including older age and male gender [8, 9]. Ethnicity was also found to be associated with COVID-19 mortality, as observed in a study that used OpenSAFELY in which Black and South Asian patients had increased risk of mortality when compared with patients of White ethnicity (adjusted hazard ratio [aHR] 1.48; 95% CI, 1.29–1.69; and 1.45; 95% CI, 1.32–1.58, respectively) [10]. In addition, educational attainment was significantly associated with a lower risk of COVID-19 severity (odds ratio [OR] per one standard deviation increase in years of schooling, 0.540; 95% CI, 0.376–0.777) in a European study that used a Mendelian randomization (MR) approach [11].

Besides, higher scores of the Charlson Comorbidity Index (CCI) were positively associated with mortality in COVID-19 patients, with each point increase in the CCI score increasing the risk of death by 2.5% [12]. In fact, comorbidities in COVID-19 patients were found to hasten the progression of the infection that usually ends with patient death [13]. These findings suggest targeting this group of patients in vaccination plans and enhancing earlier identification and treatment to obtain better outcomes [12, 13]. In addition, a multicenter study has used laboratory biomarkers to establish cutoff points and build a risk score ranging from 0 to 30 points. The study found that moderate-risk (12–18 points) and high-risk (≥19 points) scores were positively associated with COVID-19 mortality (OR, 4.75; 95% CI, 2.60–8.68 and OR, 23.86; 95% CI, 13.61–41.84, respectively) [14].

These studies have provided a better understanding of COVID-19 impact and have raised several questions that require further investigation. For instance, a recent population-based study has found that heart failure and ischemic heart disease increased the risk of mortality, while atrial fibrillation and hypertension did not, suggesting potential protective effects of comedications [15]. In addition, other studies have investigated COVID-19 impact on patients with particular diseases or conditions in addition to healthy patients [16–20].

Most studies have been conducted in China, Italy, the USA, and other developed high-income countries, while studies from low- and middle-income developing countries in the Middle East are scarce [21, 22]. Therefore, the complete impact of COVID-19 is not thoroughly evaluated, and further studies are required to fill the gap [22]. In this study, we aimed to describe the characteristics of hospitalized patients with COVID-19 and identify the risk factors associated with in-hospital mortality for patients with COVID-19 in Jordan.

2. Materials and Methods

2.1. Data Source and Study Design

This study was conducted using data from King Abdullah University Hospital (KAUH) in Jordan, which has a 750-bed capacity that can be increased to 900 in emergency situations. It is a tertiary hospital and is considered the largest medical facility in the north of the country. All positive COVID-19 patients in the north of Jordan were referred to this hospital, including asymptomatic and mild cases at the beginning of the pandemic. However, hospitalization criteria have changed since August 2020, with only moderate, severe, or critical cases being hospitalized.

A cohort of COVID-19 patients was identified from inpatients’ electronic records based on polymerase chain reaction (PCR) positive test results for each referred patient at the hospital. The study period was between September 20, 2020, and August 8, 2021. Cohort members younger than 18 years and those with asymptomatic infection or mild illness were excluded.

2.2. Study Variables

The patients’ structured clinical data, including vital signs, radiological findings, comorbidities, and hospitalization course and outcomes, were identified from electronic hospital records. Data also included age, gender, smoking status, height, weight, and laboratory results. Body mass index (BMI) was calculated as BMI = weight (kg)/height² (m²) and categorized based on the World Health Organization (WHO) classification [23]. Patients with BMI below 18.5 kg/m² were considered underweight, 18.5–24.9 kg/m² were of normal weight, 25.0–29.9 kg/m² were overweight, and patients with BMI equal to or more than 30.0 kg/m² were considered obese. Comorbidities were identified based on related International Classification of Diseases (ICD) codes, and laboratory results were interpreted based on the reference values of the hospital laboratory. The patient severity status at admission was classified according to the National Institutes of Health (NIH) Clinical Spectrum of SARS-CoV-2 Infection [24]. Patients with a positive test and no symptoms were considered “asymptomatic,” while those with any of the various signs and symptoms of COVID-19 but no shortness of breath, dyspnea, or abnormal chest imaging were considered to have “mild illness.” “Moderate illness” individuals included those who showed evidence of lower respiratory disease during clinical assessment or imaging and who had an oxygen saturation (SpO2) ≥94% on room air at sea level. Patients who had SpO2 <94%, a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) <300 mm Hg, a respiratory rate >30 breaths/min, or lung infiltrates >50% were considered to have “severe illness.” “Critically ill” patients included those who had respiratory failure, septic shock, and/or multiple organ dysfunction.
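The classification rules described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function and parameter names are hypothetical, the underweight cutoff of 18.5 kg/m² is the standard WHO value and is assumed here, and "critical" is reduced to a single flag standing for respiratory failure, septic shock, or multiple organ dysfunction.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_bmi_category(bmi_value: float) -> str:
    """Assumed WHO adult cutoffs at 18.5, 25.0, and 30.0 kg/m^2."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


def nih_severity(
    symptomatic: bool,
    lower_respiratory_disease: bool,
    spo2: float,
    pao2_fio2: Optional[float] = None,
    respiratory_rate: Optional[float] = None,
    infiltrates_pct: Optional[float] = None,
    organ_failure: bool = False,
) -> str:
    """Admission severity following the criteria listed in the text."""
    # Respiratory failure, septic shock, or multiple organ dysfunction.
    if organ_failure:
        return "critical"
    # Any single severe criterion is enough; unmeasured values are skipped.
    severe = (
        spo2 < 94
        or (pao2_fio2 is not None and pao2_fio2 < 300)
        or (respiratory_rate is not None and respiratory_rate > 30)
        or (infiltrates_pct is not None and infiltrates_pct > 50)
    )
    if severe:
        return "severe"
    if lower_respiratory_disease:
        return "moderate"
    if symptomatic:
        return "mild"
    return "asymptomatic"


# Hypothetical patient: 95 kg, 1.70 m, pneumonia on imaging, SpO2 91%.
category = who_bmi_category(bmi(95, 1.70))
severity = nih_severity(True, True, spo2=91)
```

Checking the most severe category first means a patient who meets several criteria is assigned the highest applicable level, which matches how an ordered severity scale is normally applied.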

2.3. Statistical Analysis

Analysis began with a distributional analysis of patients’ characteristics. All categorical data were encoded using a one-hot approach, which transforms categorical variables into binary vectors. Summary tables were created to examine the proportion of inpatients with COVID-19 by age group, gender, and clinical characteristics. The mortality rates associated with each characteristic were analyzed using χ2 tests to evaluate statistical differences in the frequencies of the categorical groups. In addition to testing proportions, we used univariate logistic regression to produce the odds ratio and statistical significance for each one-hot transformed characteristic. Statistical significance was assessed using 2-sided p values.
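To make the univariate step concrete, the sketch below one-hot encodes a categorical variable, builds the 2×2 table against the outcome, and computes a χ2 test and an odds ratio with a Wald 95% CI. It relies on the fact that, for a single binary predictor, the logistic regression odds ratio equals the cross-product ratio of the 2×2 table. The counts are a hypothetical toy cohort, not the study data, the function names are illustrative, and the χ2 statistic is computed without continuity correction, which is an assumption.

```python
import math


def one_hot(values):
    """Turn a list of category labels into one 0/1 indicator list per level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def two_by_two(exposed, died):
    """Count the cells of the exposure-by-outcome table from 0/1 lists."""
    pairs = list(zip(exposed, died))
    a = sum(1 for e, y in pairs if e and y)          # exposed, died
    b = sum(1 for e, y in pairs if e and not y)      # exposed, survived
    c = sum(1 for e, y in pairs if not e and y)      # unexposed, died
    d = sum(1 for e, y in pairs if not e and not y)  # unexposed, survived
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio with a Wald confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical toy cohort (not the study data): 50 current smokers, 250 others.
smoking = ["current"] * 50 + ["never"] * 250
died = [1] * 30 + [0] * 20 + [1] * 70 + [0] * 180

indicators = one_hot(smoking)
a, b, c, d = two_by_two(indicators["current"], died)
chi2, p_value = chi_square_2x2(a, b, c, d)
odds_ratio, ci_low, ci_high = odds_ratio_wald_ci(a, b, c, d)
print(f"chi2={chi2:.2f}, p={p_value:.2g}, OR={odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

A variable with more than two levels yields one indicator per level, so each level can be tested against all others in the same way.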

After examining the association of each characteristic with mortality among COVID-19 inpatients, multivariable logistic regression and machine learning (ML) methods were used to study variable importance comprehensively in the context of classification modeling. Variable importance is generally defined as the sum of the decrease in error when the variable is included in the model. For linear models, variable importance is based on the absolute t-value of each model parameter used. MARS models look at reductions in the modified generalized cross-validation (MGCV) estimate of error,

$$\mathrm{MGCV} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{f}(x_i)\right)^2}{\left(1 - \frac{C(M)}{N}\right)^2},$$

where $N$ is the number of observations, $y_i$ and $\hat{f}(x_i)$ are the observed and fitted values for observation $i$, and $C(M)$ is a complexity penalty.
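As a numerical illustration of the cross-validation error estimate used for MARS, the sketch below assumes the commonly cited form of the criterion: the mean squared error divided by (1 − C(M)/N)², where C(M) is the complexity penalty. The function names and the toy values are hypothetical, and the importance of a term is taken as the increase in the criterion when that term is removed.

```python
def gcv(observed, fitted, complexity):
    """Generalized cross-validation error: MSE / (1 - C(M)/N)^2.

    `complexity` plays the role of C(M), the effective number of
    parameters; it must be smaller than the number of observations.
    """
    n = len(observed)
    if len(fitted) != n:
        raise ValueError("observed and fitted must have the same length")
    if not 0 <= complexity < n:
        raise ValueError("complexity must lie in [0, N)")
    mse = sum((y - f) ** 2 for y, f in zip(observed, fitted)) / n
    return mse / (1 - complexity / n) ** 2


def gcv_reduction(observed, fitted_with, c_with, fitted_without, c_without):
    """Importance of a term: GCV without the term minus GCV with it."""
    return gcv(observed, fitted_without, c_without) - gcv(observed, fitted_with, c_with)


# Hypothetical outcomes and fitted probabilities for four patients.
y = [0, 1, 1, 0]
with_term = [0.1, 0.9, 0.8, 0.2]     # model that includes the term
without_term = [0.4, 0.6, 0.5, 0.5]  # smaller model without it

error_with = gcv(y, with_term, complexity=2)
importance = gcv_reduction(y, with_term, 2, without_term, 1)
```

The penalty in the denominator inflates the error of larger models, so a term counts as important only if it lowers the fitted error by more than its added complexity costs.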

The relative importance is the variable importance divided by the highest variable importance value so that values are scaled between 0 and 1. This makes it convenient to compare and combine variable performance across multiple methods.
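The scaling described here can be written out directly. In the sketch below the raw importance values and variable names are hypothetical, and averaging the scaled values across methods is only one plausible way of combining them; the text does not specify the exact combination rule.

```python
def relative_importance(raw):
    """Divide each importance by the largest one so values fall in [0, 1]."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: value / top for name, value in raw.items()}


def combine_methods(per_method):
    """Average relative importance across methods (assumed rule).

    A variable that a method did not rank contributes 0 for that method.
    """
    names = {name for scores in per_method.values() for name in scores}
    scaled = {m: relative_importance(s) for m, s in per_method.items()}
    return {
        name: sum(scaled[m].get(name, 0.0) for m in scaled) / len(scaled)
        for name in names
    }


# Hypothetical raw importances on two different scales.
raw_scores = {
    "RF": {"age": 40.0, "ckd": 20.0, "sodium": 10.0},
    "CART": {"age": 0.6, "sodium": 0.3},
}
rf_scaled = relative_importance(raw_scores["RF"])
consensus = combine_methods(raw_scores)
```

Because each method is rescaled by its own maximum, raw importances that are measured in unrelated units become comparable before they are combined.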

The ML methods considered were Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), K-Nearest Neighbor (KNN), Extreme Gradient Boosting (XGB), and Classification and Regression Trees (CART). There are other ML methods and classification methods that can be entertained in addition to the methods employed here. For example, a stepwise regression method can be used to directly eliminate model parameters in the model given an overparameterized starting model. However, we feel the methods selected are among the most consistent in producing classification accuracy.

In MARS model notation, each basis function is written $(x - t)_+$, where $t$ is the threshold value for the predictor variable $x$, and $(\cdot)_+$ denotes the spline function, which takes the value 0 if the expression inside $(\cdot)_+$ is negative and the value of the expression itself otherwise. Of the other ML methods considered, RF, XGB, and CART are based on classification trees and differ in the algorithm used to split variables recursively, whereas KNN classifies each patient according to the most similar patients in the training data. These algorithms are quite involved and would take considerable space to describe here; further information can be found in Generalized Additive Models by Hastie and Tibshirani [25].
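The spline (hinge) function is simple enough to show in code. The sketch below is illustrative only: the knot at 65 years, the coefficients, and the function names are hypothetical, and the mirrored form with a negative direction is the usual companion basis function in MARS rather than something stated in the text.

```python
def hinge(x, threshold, direction=1):
    """(x - t)_+ for direction=1, or (t - x)_+ for direction=-1."""
    return max(0.0, direction * (x - threshold))


def mars_linear_predictor(x, intercept, terms):
    """Sum of weighted hinge terms; each term is (coefficient, threshold, direction)."""
    return intercept + sum(coef * hinge(x, t, d) for coef, t, d in terms)


# Hypothetical single-variable model with one knot at age 65.
age_terms = [(0.05, 65, 1)]
below_knot = mars_linear_predictor(60, intercept=-1.0, terms=age_terms)
above_knot = mars_linear_predictor(75, intercept=-1.0, terms=age_terms)
```

Below the knot the term contributes nothing, and above it the contribution grows linearly, which is how MARS represents a change in slope at a threshold.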

A ten-fold cross-validation method was used to train the ML methods and extract the combined relative importance of the variables identified in each fold. The results from the ten folds were then combined to produce a collective measure of variable importance based on the number of times the variable appeared across folds and the magnitude of its predictive power. To combine variable importance across multiple methods, the importance measures were scaled between 0 and 100. The variable importance values for all methods were then combined into a single importance measure and presented using a lollipop chart.

The reduced set of variables was then used to refine a multivariable logistic regression model that can quantify potential risk factors for death, adjusted for other confounding characteristics. The backward stepwise variable selection method was used to determine the variables included in the final multivariable logistic regression model. A prespecified significance level was used to retain variables in the final logistic model.

To assess the classification accuracy of the final multivariable logistic model compared to the other ML methods, the dataset was randomly partitioned into a training set (70% of the data) and a testing set (30% of the data) using a stratified method so that the distributions are as consistent as possible. A panel of receiver operating characteristic (ROC) charts was created. The ROC charts show the performance of a classification model at all classification thresholds using the holdout testing set. The curve plots the true positive rate against the false positive rate. The area under the curve (AUC) is an aggregate measure of performance across all possible classification thresholds. All analyses were carried out using R version 4.1.0.
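The resampling and evaluation steps can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' R workflow: the helper names and the seed are hypothetical, stratification is done on the outcome only, and the AUC is computed through its rank interpretation (the probability that a randomly chosen deceased patient receives a higher score than a randomly chosen survivor, with ties counted as one half).

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices so each outcome class keeps roughly the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Outcome vector with the cohort's reported class sizes (609 deaths, 1,004 survivors).
outcome = [1] * 609 + [0] * 1004
train_idx, test_idx = stratified_split(outcome)
folds = k_fold_indices(len(train_idx), k=10)

# Hypothetical scores for four held-out patients.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise AUC is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-based implementation would be preferable for much larger datasets.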

3. Results

Over the study period, a total of 1,613 patients with confirmed COVID-19 who were admitted to KAUH met the inclusion criteria and were included in this study. Of these patients, 1,004 (62.2%) were discharged from the hospital (survived), and 609 (37.8%) died (deceased). Table 1 shows the characteristics, comorbidities, and disease severity of the patients by their hospitalization outcome (survived or deceased). Among all patients, 945 (58.6%) were male, 669 (41.5%) were obese (BMI ≥30 kg/m²), and 963 (59.7%) and 802 (49.7%) had hypertension and diabetes, respectively. Surviving patients tended to be younger (18–40 years old), nonsmokers, and of moderate severity on admission (patients with lower respiratory disease and SpO2 ≥94%). Higher proportions of deceased than surviving patients were observed among those with heart failure (n = 73, 51.0%; p value = 0.001), cerebrovascular accident (CVA) (n = 61, 55.5%; p value <0.001), chronic kidney disease (CKD) (n = 85, 66.9%; p value <0.001), immunocompromised status (n = 37, 56.9%; p value = 0.002), and malignancy (n = 65, 54.6%; p value <0.001).

The laboratory findings from COVID-19 patients are shown in Table 2. Among all patients, a large proportion had high levels of C-reactive protein (CRP) (n = 1,129, 70.0%), D-dimer (n = 1,192, 73.9%), and lactate dehydrogenase (LDH) (n = 1,112, 68.9%). Higher proportions of deceased than surviving patients were observed among those with hypernatremia (n = 43, 89.6%; p value <0.001), hyperkalemia (n = 61, 54.5%; p value <0.001), and high troponin level (n = 15, 62.5%; p value = 0.041).

Multivariable logistic regression showed that elderly age (>65 years old) (OR, 2.01; 95% CI, 1.28–3.16; p value = 0.003) and current smoking status (OR, 1.61; 95% CI, 1.17–2.23; p value = 0.004) were strongly associated with in-hospital mortality (Table 3). The odds of mortality rose with severity status on admission, almost doubling from severe (OR, 1.56; 95% CI, 1.05–2.32; p value <0.001) to critical (OR, 2.94; 95% CI, 2.02–4.27; p value <0.001) cases. Although all of the following comorbidities were significant, the odds of in-hospital mortality were higher for CKD (OR, 2.90; 95% CI, 1.90–4.43; p value <0.001), deep venous thrombosis (DVT) (OR, 2.62; 95% CI, 1.08–6.35; p value = 0.033), and malignancy (OR, 2.22; 95% CI, 1.46–3.38; p value <0.001) than for diabetes (OR, 1.31; 95% CI, 1.04–1.65; p value = 0.022) and heart failure (OR, 1.51; 95% CI, 1.02–2.23; p value = 0.041). Hypernatremia was the risk factor most strongly associated with mortality among COVID-19 patients (OR, 11.37; 95% CI, 4.33–29.81; p value <0.001). Other laboratory results significantly associated with mortality included high aspartate aminotransferase (AST) (OR, 1.81; 95% CI, 1.42–2.31; p value <0.001), hypoalbuminemia (OR, 1.75; 95% CI, 1.37–2.25; p value <0.001), and low platelet level (OR, 1.43; 95% CI, 1.05–1.95; p value = 0.024).
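For readers who want to see how an odds ratio, its confidence interval, and its p value relate, the sketch below back-calculates the Wald statistic from a reported estimate. It assumes the intervals are Wald intervals that are symmetric on the log-odds scale, which is the usual output of logistic regression software but is not stated in the text; the function name is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds coefficient, its standard error, z, and 2-sided p.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# Reported estimate for elderly age: OR 2.01 (95% CI, 1.28-3.16).
beta, se, z, p = wald_from_or_ci(2.01, 1.28, 3.16)
print(f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Because the published odds ratio and interval limits are rounded to two decimals, the recovered p value is approximate and should only be expected to agree with the reported value to about one significant digit.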

Figure 1 shows the variable reduction based on the consensus variable importance extracted from multiple ML methods. Logistic regression was then applied to the consensus. The variables selected for the final multiple logistic regression model are highlighted in black color.

A sunburst chart with nested rings illustrating the hierarchical breakdown of identified risk factors segmented by patients’ outcome, that is, death versus discharge, is shown in Figure 2.

The panel of receiver operating characteristic (ROC) charts is shown in Figure 3, ordered by predictive accuracy. The multivariable logistic regression model had the best predictive classification power among the methods considered.

4. Discussion

In this study, we observed a high hospital mortality (37.8%) among hospitalized COVID-19 patients. Almost half of the deceased patients (45.4%) had critical disease on admission. The study identified several risk factors for increased hospital mortality including older age, current smoking status, critical disease on admission, the presence of comorbidities, and initial laboratory derangements. To our knowledge, this study is the first to examine the risk factors associated with mortality in the Middle East.

Several studies have found that older age was significantly associated with increased mortality of COVID-19 [17, 26–29]. Consistent with our findings, a meta-analysis of 42 studies comprising 423,117 patients showed an increased risk of mortality among older people (pooled OR 2.61; 95% CI 1.75–3.47) [26]. Specifically, among 16 countries, COVID-19 patients aged 65 or older had a 62-fold higher risk of mortality than younger age groups (IRR = 62.1; 95% CI = 59.7, 64.7) [30]. The increase in mortality among the elderly may be attributed to the high prevalence of comorbidities [9]. However, a study that used UK Biobank data and included 470,034 participants concluded that, although to a lower extent, healthy elderly COVID-19 patients still have an independently increased risk of mortality [31]. This would suggest a role of other age-related factors in COVID-19 mortality, such as a decreased reserve capacity of vital organs or weaker immune defenses [32].

Our results were not consistent with the findings of other studies that showed evidence of increased mortality in male COVID-19 patients [33–36]. However, most of these studies were at the early stages of COVID-19 and were based on records that under urgent circumstances may have had missing data and used unadjusted risk estimates [33]. On the other hand, female cases and deaths from COVID-19 may have been underreported due to social norms of lower access to healthcare [37]. Therefore, gender differences should be further investigated to avoid bias in COVID-19 treatment, particularly with the similar risk of dying observed between males and females in severe cases [38, 39].

A recent systematic review and meta-analysis of 186 studies representing 210,447 deaths among 1,304,587 patients with COVID-19 found a significant increase in mortality among patients with diabetes (summary relative risk (SRR) = 1.54; 95% CI 1.44–1.64), hypertension (SRR = 1.42; 95% CI 1.30–1.54), obesity (SRR = 1.45; 95% CI 1.31–1.61), and smoking (SRR = 1.28; 95% CI 1.17–1.40 for ever smoking, SRR = 1.29; 95% CI 1.03–1.62 for current smoking, and SRR = 1.25; 95% CI 1.11–1.42 for ex-smokers compared with nonsmokers) [40]. However, in our study, obesity and hypertension did not appear statistically significant in the final model, and ex-smoker status was no longer significant in the multivariable analyses. Similar to our findings, CKD was significantly associated with an increased risk of death (pooled OR = 5.58; 95% CI 3.27–9.54) [41]. Other comorbidities, including heart failure, history of DVT and pulmonary embolism (PE), and malignancy, were also found to be significantly associated with increased mortality in COVID-19 patients in several studies [42–46].

In the Health Outcome Predictive Evaluation (HOPE) for COVID-19 registry analysis, both hypernatremia (OR 2.38, 95% CI 1.18–4.78) and hyponatremia (OR 1.5, 95% CI 1.08–2.09) were found to be independently associated with COVID-19 mortality [47]. This is consistent with our findings for hypernatremia but not hyponatremia. However, the admittance frequency with hyponatremia (20.5%) was higher than that with hypernatremia (3.7%), as observed in our study [47]. Consistent with our findings, several studies have identified hypoalbuminemia as a predictor of mortality among admitted COVID-19 patients, suggesting that the determination of serum albumin on admission may be useful and that albumin infusion could have therapeutic value in COVID-19 management [48–50]. Similarly, low platelet level was found to be significantly associated with COVID-19 mortality, as observed in our study [51, 52]. However, the prognostic value of the thrombocytopenia therapeutic approach appears to be complicated and careful consideration is advised [53].

This study was relatively large and included data from the Middle East with PCR-confirmed COVID-19 cases. It used in-patient record data from a tertiary hospital to reflect the reality of COVID-19 clinical management in the region. However, because of its observational nature, this study cannot establish the temporal sequence between risk factors and outcome, and therefore the associations observed cannot be interpreted as causal.

5. Conclusions

Several demographic, clinical, and laboratory risk factors for COVID-19 mortality were identified, including severity status on admission. Further studies in real-life settings are required, particularly in the region, to identify early predictors and provide better management of the illness to control the pandemic and reduce its impact.

Data Availability

Data are available based upon request from KAUH.

Additional Points

Problem Statement. What is already known about this topic? (i) COVID-19 is a highly contagious illness that causes high mortality worldwide. (ii) Mortality in COVID-19 patients has been associated with several risk factors including demographics and preexisting comorbidities. (iii) Similar studies using real data in the Middle East are scarce. What does this article add? (i) The study is the first to identify potential risk factors for COVID-19 mortality in the Middle East. (ii) Machine learning methods were used to study variable importance. (iii) Recognition of the full impact of COVID-19 is important to control the pandemic.

Ethical Approval

Ethical approval for this study was obtained from the Institutional Review Board (IRB) Committee of Jordan University of Science and Technology, Irbid, Jordan (27/137/2021).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Authors’ Contributions

All authors conceptualized the study; R. K., B. Y. K., S. A.-A., A.-H. A.-M., S. K., S. S. H., and M. A. A. were responsible for methodology; R. K., B. Y. K., S. A.-A., A.-H. A.-M., S. K., and M. A. A. validated the data; R. K., W. J. L., M. A., and M. A. A. performed formal analysis; all authors interpreted the data, reviewed and edited the manuscript, critically revised the report, and approved the final version to be submitted for publication; R. K. prepared the original draft.