Introduction

Healthcare-associated infection (HAI) is a major public health problem worldwide1,2,3. Patients with HAI may have prolonged hospital stays and high morbidity and mortality, which adds to the economic burden on the healthcare system4. Pneumonia and other lower respiratory tract infections (LRTIs) are the most common types of HAI1,5. According to a large multicenter epidemiological survey from China, 8,739 (59.55%) of 14,674 HAI cases were LRTIs6.

It has been suggested that risk prediction models may be useful for identifying high-risk patients and evaluating the effectiveness of infection control measures7,8,9. Several studies have identified risk factors for nosocomial LRTIs, including tracheal intubation, underlying chronic lung disease, supine body position, mechanical ventilation, thoracic or upper abdominal surgery, a prior episode of large-volume aspiration, and age older than 70 years10,11,12. However, these findings are difficult to apply directly to risk prediction. A risk index based method is therefore needed, because it offers a simple and feasible way to predict the probability of acquiring a nosocomial LRTI.

Many risk index based systems are currently used in clinical practice, such as the acute physiology and chronic health evaluation II (APACHE II), the therapeutic intervention scoring system (TISS) and the simplified acute physiology score (SAPS)13,14; however, these systems do not target HAIs. To date, some studies have evaluated the use of a risk index for the prediction of HAIs15 or surgical site infections (SSI)16, but no study has focused on the prediction of nosocomial LRTIs.

The purpose of this study was to develop a risk index based system for predicting nosocomial LRTIs using data from a large point-prevalence survey of HAIs, and to evaluate its sensitivity and specificity in identifying infection.

Methods

Data source and study population

The data used in this study come from a large one-day point-prevalence survey of HAIs conducted between October 2014 and March 201517. All variables for each patient were extracted from the individual report form of this survey, including demographic data, days of hospitalization on the survey date, invasive procedures, use of antibiotics and underlying diseases. Each patient’s underlying diseases were coded as ICD-10-CM three-character categories and treated as binary variables. Detailed information on these variables is shown in Supplemental Table S1. A total of 49,328 patients from 46 hospitals with complete information for all variables were included in this study. This study was approved by the institutional review board (IRB) of the Academy of Military Medical Science. All methods were performed in accordance with the relevant guidelines and regulations. As all the data were analyzed anonymously, the IRB of the Academy of Military Medical Science waived the informed consent requirement.

Case definition

The definition of nosocomial LRTIs followed the diagnostic guideline issued by the National Health and Family Planning Commission of the People’s Republic of China (NHFPC; formerly the Chinese Ministry of Health) in 2001. Nosocomial LRTIs, which comprise the CDC categories of ‘pneumonia’ and ‘lower respiratory tract infection other than pneumonia’, refer to infections acquired after 48 hours of admission, unless there is a clear incubation period for the infection. Ventilator-associated pneumonia (VAP) was excluded from the analysis in this study.

Logistic regression and Fisher discriminant analysis

Several options exist for diagnostic and prognostic tasks in clinical medicine. We chose multivariate logistic regression and Fisher discriminant analysis, both of which are so-called white box models that allow interpretation of the model parameters18. For multivariate logistic regression, a backward selection algorithm was used and the coefficient (β) of each variable was estimated. The Wald χ2 test was used to assess the covariate-adjusted p value. The p value threshold for excluding a variable from the model was set to 0.05. Candidate risk factor variables are listed in Supplemental Table S1. For Fisher discriminant analysis, the variables that were statistically significant in the logistic regression analysis were used as input factors, and whether or not an individual had a nosocomial LRTI was the classification factor. The area under the receiver operating characteristic (ROC) curve was calculated for both predictive models.
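
As an illustration of the last step, the area under the ROC curve can be computed directly from predicted scores and observed outcomes. The sketch below uses the pairwise rank (Mann-Whitney) formulation in Python; it is not the SAS procedure used in this study, and the function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise rank (Mann-Whitney) formulation.

    Over all (positive, negative) pairs, counts how often the positive case
    receives the higher score; tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes (1 = nosocomial LRTI).
toy_scores = [0.9, 0.8, 0.35, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

The pairwise form is quadratic in the number of patients, so it is suited to small illustrations only; a sorted-rank implementation would be preferable for a data set of the size analyzed here.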

Construction of risk index based system

The risk index based system was constructed by simplifying the parameters of the logistic regression model, which had the better ROC curve performance. A 10-fold cross validation scheme was applied to test the robustness of the logistic model. That is, we randomly divided the data set into 10 partitions and ran the classification method 10 times, each time fitting the model on 9 partitions and using the remaining partition to test performance. After these 10 classification tasks, we derived a ROC curve using the mean of each performance parameter.
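
The partitioning step can be sketched as follows. This is a minimal Python illustration, assuming simple random assignment of patients to folds without stratification by infection status; the function names, the seed and the sample size of 25 are hypothetical and do not come from the study.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly partition range(n_samples) into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list keeps fold sizes within one of each other.
    return [indices[i::n_folds] for i in range(n_folds)]


def train_test_splits(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, n_folds, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Hypothetical data set of 25 patients split into 10 train/test pairs.
splits = list(train_test_splits(25))
```

In each of the 10 iterations the model would be fitted on `train` and evaluated on `test`, and the resulting performance parameters averaged.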

If the difference between the ROC curves of the original logistic model and the 10-fold cross validation is small, the logistic model can be deemed robust. The risk index was then derived from the original logistic model. We first determined the smallest absolute value u of the coefficients, then divided all the coefficients by u and rounded the results to the nearest integer. To further simplify the risk index system, items with similar index values were assigned the same score. A patient’s total risk score is calculated by adding the scores of all risk items present for that patient.
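
The scaling and summation steps described above can be sketched in Python as follows. The coefficient values and variable names are hypothetical and are not the fitted values from this study; the further manual step of assigning similar index values the same score is not shown.

```python
def coefficients_to_scores(coefficients):
    """Divide each coefficient by the smallest absolute value u and round."""
    u = min(abs(b) for b in coefficients.values() if b != 0)
    # Note: Python's round() sends exact halves to the nearest even integer.
    return {name: round(b / u) for name, b in coefficients.items()}


def total_risk_score(scores, patient_factors):
    """Sum the scores of the risk items present for one patient."""
    return sum(scores[name] for name in patient_factors)


# Hypothetical logistic regression coefficients, NOT the values from this study.
example_coefficients = {
    "tracheotomy": 1.6,
    "urinary_tract_intubation": 0.8,
    "male": 0.25,
    "prophylactic_antibiotics": -0.55,
}
example_scores = coefficients_to_scores(example_coefficients)
example_total = total_risk_score(example_scores, ["tracheotomy", "male"])
```

With these hypothetical inputs u is 0.25, so the four items receive scores of 6, 3, 1 and -2, and a male patient with a tracheotomy has a total score of 7.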

The prevalence of LRTIs was calculated for patients with different risk scores. Risk levels were determined according to each patient’s total risk score. The sensitivity and specificity at different risk scores were calculated to determine the best cut-off point. The performance of the cut-off point in predicting LRTIs was also evaluated in patients with different underlying diseases. All data were analyzed with SAS 9.4. All tests were considered significant at p < 0.05.
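
The evaluation of candidate cut-off points can be sketched in Python as follows. Two assumptions are made here that are not stated in the text: a patient is classified as high risk when the total score is greater than or equal to the cut-off, and the best cut-off is taken as the one maximizing Youden's index (sensitivity + specificity - 1). The toy scores and outcomes are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Treat score >= cutoff as predicted infection; return (sens, spec)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return the observed score that maximizes Youden's index (assumed criterion)."""
    def youden(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return sens + spec - 1.0

    return max(sorted(set(scores)), key=youden)


# Hypothetical total risk scores and outcomes (1 = nosocomial LRTI).
toy_scores = [2, 5, 9, 14, 15, 20, 22, 30]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
chosen = best_cutoff(toy_scores, toy_labels)
chosen_sens, chosen_spec = sensitivity_specificity(toy_scores, toy_labels, chosen)
```

In practice the criterion would depend on the task: a screening application might accept lower specificity in exchange for higher sensitivity.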

Results

The prevalence of LRTIs

Among the 49,328 patients included in this study, there were 839 cases of nosocomial LRTIs, giving an overall prevalence of 1.70% (95% confidence interval [CI], 1.64% to 1.76%). The prevalence of nosocomial LRTIs among patients with different underlying diseases ranged from 0.2% to 50.0% (Supplemental Table S2). High prevalence was detected in patients with the following diseases: other respiratory disorders (17.3%), hydrocephalus (10.9%), nontraumatic intracerebral hemorrhage (10.3%), acute respiratory distress syndrome (7.1%), sequelae of cerebrovascular disease (5.9%), emphysema (5.8%), myeloid leukemia (5.5%) and lymphoid leukemia (5.3%). These diseases, which might jeopardize respiratory function, compromise the immune system or prolong patients’ hospital stays, are considered to be significantly associated with the occurrence of LRTIs.

Prediction of LRTIs by logistic regression and Fisher discriminant analysis

Logistic regression analysis showed that age, male sex, length of hospital stay, tracheotomy, arterial or venous catheterization and urinary tract intubation were risk factors for nosocomial LRTIs, whereas prophylactic use of antibiotics was associated with a reduced risk of acquiring LRTIs. Judging from the estimated standardized regression coefficients, age had the strongest influence on whether a patient would acquire an LRTI (standardized coefficient 0.36), followed by length of hospital stay (0.31) and urinary tract intubation (0.22). The areas under the ROC curve for logistic regression and Fisher discriminant analysis were 0.907 (95% CI, 0.897 to 0.917) and 0.902 (95% CI, 0.892 to 0.912), respectively. As shown in Fig. 1, there is little difference between the two statistical methods, especially in the upper left corner of the curve, where the best cut-off point most likely lies, suggesting that the predictive efficiency of the two models is almost the same.

Figure 1

The ROC curves for predicting nosocomial lower respiratory tract infections derived from logistic regression and Fisher discriminant analysis.

Prediction of LRTIs by risk index based system

As Fig. 2 shows, the area under the ROC curve was 0.907 for internal validation and 0.888 for 10-fold cross validation. The sensitivity and specificity at the best cut-off point were 0.86 and 0.79, respectively, for internal validation of the logistic regression, and 0.87 and 0.74, respectively, for 10-fold cross validation. Because the logistic model was stable, we used the logistic regression coefficients to construct the risk index based system for predicting nosocomial LRTIs. The system includes ten categories of risk factors, each of which corresponds to a specific risk score (Table 1). The overall risk score for each patient is the sum of the risk scores for all the risk factors present in that patient (Table 1). The ROC curves of the logistic regression and the risk index scoring system are very similar (Fig. 3), indicating no significant loss of accuracy. The area under the ROC curve for the risk index system was 0.905 (95% CI, 0.895 to 0.915). As the risk score increases, the prevalence of LRTIs increases sharply (Fig. 4). For most patients, the total risk score reflects the probability of acquiring a nosocomial LRTI. Six risk levels, ranging from 0 to 5, were generated according to the total risk scores of the study population; the corresponding prevalences of nosocomial LRTIs were 0.00%, 0.39%, 3.86%, 12.38%, 28.79% and 44.83%, respectively (Table 2).

Figure 2

The ROC curves for predicting nosocomial lower respiratory tract infections derived from internal validation and the 10-fold cross validation scheme based on the logistic regression model.

Table 1 The scoring system based on risk factors of nosocomial lower respiratory tract infections.

Figure 3

The ROC curves for predicting nosocomial lower respiratory tract infections derived from logistic regression and risk index based system.

Figure 4

The number of patients and prevalence of nosocomial lower respiratory tract infections among patients with different risk scores.

Table 2 The prevalence of lower respiratory tract infections among patients with different risk levels.

All the potential cut-off points are shown in Table 3. Both the sensitivity and the specificity were higher than 0.70, while the positive and negative likelihood ratios ranged from 3.50 to 7.20 and from 0.12 to 0.31, respectively. In practical application, the best cut-off point can be chosen according to the specific task. In this study, we suggest 14 as the best cut-off point, for which the sensitivity and specificity were 0.87 and 0.79, respectively. To assess the predictive ability of a cut-off of 14, we calculated the sensitivity and specificity in subgroups with different diagnoses. This cut-off point showed high discriminatory power in the majority of the subgroups with different underlying diseases, such as respiratory tuberculosis, malignant neoplasm of nasopharynx, malignant neoplasm of stomach and unspecified diabetes mellitus (Supplemental Table S3).
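
For reference, the likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A short Python sketch using the values reported for the suggested cut-off of 14 is given below; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary classification rule."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity reported above for the cut-off of 14.
lr_pos, lr_neg = likelihood_ratios(0.87, 0.79)
```

These inputs give a positive likelihood ratio of about 4.1 and a negative likelihood ratio of about 0.16, both within the ranges quoted above.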

Table 3 The performance of prediction for lower respiratory tract infections using different cut-off values.

Discussion

Prediction of LRTIs among non-ventilated patients is very important for the early implementation of prevention strategies, including oral care, early mobilization interventions, swift diagnosis and treatment of dysphagia, and antimicrobial prophylaxis19,20. This study developed a risk index based system for LRTIs, which assigns a risk score to each patient according to the presence of risk factors, e.g. underlying diseases and medical treatment. The total score derived from a patient’s risk factors represents his/her risk of acquiring an LRTI. Healthcare workers could implement targeted prevention and control strategies according to the results of the risk evaluation, which is expected to improve the cost-effectiveness of infection control programs in hospitals.

In general, the discriminatory power of the two statistical models and the risk index based system in predicting nosocomial LRTIs was excellent and comparable with that of previous studies9,15,21. The overall prevalence of LRTIs in this study was 1.7%, whereas for patients at risk levels four and five the prevalence was as high as around 30% and 40%, respectively, which indicates that the system and the recommended best cut-off point could be used to identify high-risk patients. Active preventive measures could be taken for these patients19,22,23.

As susceptibility to LRTIs might differ among patients with different underlying diseases, it is reasonable to treat each patient’s underlying diseases as important risk factors and to include them in the construction of the risk index model. The number of LRTI patients fluctuated when the total risk score was above 30 (Fig. 4), which was possibly associated with the reduction in the number of susceptible patients. For patients with certain underlying diseases, the predictive effectiveness was suboptimal, possibly owing to a lack of information on the patients’ physiological status.

This study has some limitations. Firstly, no external validation was conducted, so the applicability of this risk prediction system in clinical practice deserves further study. Secondly, the data come from a point-prevalence survey and no follow-up was conducted for the study population, so some patients who acquired an LRTI after the survey date might have been misclassified as non-LRTI cases. Thirdly, missing data for some risk factors might affect the accuracy of model construction. However, the high sensitivity and specificity of the system suggest that it might be applied to support more rational and cost-effective infection control programs for nosocomial LRTIs.