Introduction

Multimorbidity, defined as the coexistence of two or more conditions in the same individual, is common and increasing1,2. Patients with multimorbidity have poorer health outcomes and make greater use of acute unscheduled hospital healthcare3.

Patients with multimorbidity are heterogeneous and can have a wide range of different combinations of conditions. Broad overall descriptions of the health outcomes and needs of patients with multimorbidity, i.e. those based on counts of conditions, are unhelpful for tailoring health care design. Accordingly, there have been recent calls to move away from simply counting diseases towards a more tailored understanding of which health conditions commonly co-occur3,4. This will enable us to anticipate the specific health needs of, and implications for, patients with particular conditions in combination.

Cluster analysis is a statistical technique that categorises items or properties into groups so that items in the same group are more statistically similar to each other than to items in other groups. Our literature search highlighted that cluster analysis has previously been used to identify clusters of conditions in individuals from the general population, presenting in primary care, within narrow specialty subsets, or focussing specifically on older age groups5. We identified few studies that included hospitalised patients: three focussed on patients ≥ 65 years6,7,8, and one focussed on medical patients9. To our knowledge, no previous study has identified which conditions commonly cluster among unselected patients presenting to hospital, and yet this is a setting of high strain on health systems globally.

As a first step to understanding the implications of disease clusters, we aimed to identify and characterise multimorbidity clusters in a cohort of patients hospitalised in the Grampian region of Scotland. This builds on our previous work describing the overall extent of hospital multimorbidity10,11.

Methods

Study design and setting

This study was prospectively pre-registered on the Open Science Framework and is reported as per RECORD guidelines12. This is a population-based observational study using linked electronic health records, carried out in a secondary care setting in a single health region in north-east Scotland (Grampian region; total population in 2014: 584,220)13. The region consists of one large urban centre and is spread over approximately 3000 square miles of city, town, village and rural communities14. Full details have been previously published10,11.

Data sources

We used inpatient hospital episode data, namely the Scottish Morbidity Record (SMR)15, from general/acute (SMR01) and psychiatric (SMR04) admissions, from the years 2009–2014. SMR is an episode-based patient record relating to all patients discharged from hospital in Scotland. SMR data is collated in a national database, managed by Information Services Division Scotland16, and data is returned to each regional health authority on an ongoing basis. Data collected includes patient identifiable and demographic details, episode management details, general clinical information and death data. Clinical information is recorded as main diagnosis and up to five other significant diagnoses and coded using the World Health Organization’s International Classification of Diseases (ICD-10).

Study population

Adult patients (≥ 18 years) admitted to any hospital as an inpatient during 2014, in a single regional health authority (NHS Grampian) were included. A patient’s first admission in 2014 was classified as their “index admission”, and the admission date was classified as their “index date”. We excluded day case, obstetric and psychiatric admissions when identifying the study population. The flow diagram for identifying the study population is shown in Fig. 1. Patients with multimorbidity (≥ 2 conditions) were included in the present analysis (n = 11,389).
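The index-admission rule described above can be illustrated with a minimal Python sketch. It is not the study's code: the record layout, the field names (`patient_id`, `admit_date`, `kind`, `age`) and the helper `find_index_admissions` are hypothetical and do not reflect the SMR data specification.

```python
from datetime import date

# Hypothetical admission records; field names are illustrative only.
admissions = [
    {"patient_id": "A", "admit_date": date(2014, 3, 2), "kind": "inpatient", "age": 70},
    {"patient_id": "A", "admit_date": date(2014, 1, 15), "kind": "inpatient", "age": 70},
    {"patient_id": "B", "admit_date": date(2014, 5, 9), "kind": "day case", "age": 45},
    {"patient_id": "C", "admit_date": date(2013, 12, 30), "kind": "inpatient", "age": 60},
    {"patient_id": "D", "admit_date": date(2014, 7, 1), "kind": "inpatient", "age": 17},
]

# Admission types excluded when identifying the study population.
EXCLUDED_KINDS = {"day case", "obstetric", "psychiatric"}


def find_index_admissions(records, year=2014, min_age=18):
    """Return {patient_id: index_date}: the earliest eligible admission in `year`."""
    index_dates = {}
    for rec in records:
        if rec["admit_date"].year != year:
            continue  # outside the study year
        if rec["kind"] in EXCLUDED_KINDS or rec["age"] < min_age:
            continue  # excluded admission type or not an adult
        pid = rec["patient_id"]
        if pid not in index_dates or rec["admit_date"] < index_dates[pid]:
            index_dates[pid] = rec["admit_date"]
    return index_dates


index_dates = find_index_admissions(admissions)
```

In this toy example only patient A is retained, with the earlier of the two 2014 admissions as the index date.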

Figure 1

Flowchart of study population and data linkage. SMR, Scottish Morbidity Record; SIMD, Scottish Index of Multiple Deprivation. a Community Health Index number was missing or invalid for 662 inpatient general/acute admissions in 2014 (patients ≥ 18 years); these admissions were therefore not included in the study population.

Multimorbidity measure

Multimorbidity was defined as having recorded diagnoses of ≥ 2 chronic conditions17,18. Conditions were identified from general/acute and psychiatric admissions in the 5 years prior to index date. We used the multimorbidity measure developed by Tonelli et al.19. This was based on the measure developed by Barnett et al.20 for measuring multimorbidity in a primary care population, using coding unique to primary care in the UK21. Tonelli et al. developed a corresponding validated coding scheme for use with administrative data based on the ICD system19. The specific ICD-10 codes for the 30 conditions included are detailed in Additional file 1, with a note of minor amendments made. ICD-10 codes recorded as main or other diagnoses were included.
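As a hedged illustration of how conditions might be ascertained from coded diagnoses, the Python sketch below maps ICD-10 codes to conditions within a 5-year lookback window and flags multimorbidity (≥ 2 conditions). The four code prefixes are illustrative assumptions only; the study's actual code list for all 30 conditions is given in Additional file 1. The function name `conditions_in_lookback` is hypothetical, and excluding the index date itself from the window is an assumption based on the Limitations section.

```python
from datetime import date, timedelta

# Illustrative ICD-10 prefixes only (assumption); not the study's code list.
CONDITION_PREFIXES = {
    "hypertension": ("I10",),
    "diabetes": ("E10", "E11"),
    "chronic kidney disease": ("N18",),
    "atrial fibrillation": ("I48",),
}


def conditions_in_lookback(diagnoses, index_date, years=5):
    """Return the set of conditions coded in the `years` before `index_date`.

    `diagnoses` is an iterable of (admission_date, icd10_code) pairs for one
    patient, covering main and other diagnoses alike.
    """
    window_start = index_date - timedelta(days=round(365.25 * years))
    found = set()
    for admitted, code in diagnoses:
        # Assumption: the index admission itself is not part of the lookback.
        if not (window_start <= admitted < index_date):
            continue
        normalised = code.replace(".", "").upper()
        for condition, prefixes in CONDITION_PREFIXES.items():
            if normalised.startswith(prefixes):
                found.add(condition)
    return found


history = [
    (date(2012, 6, 1), "I10"),
    (date(2013, 2, 1), "e11.9"),
    (date(2008, 1, 1), "N18.3"),  # before the lookback window
    (date(2014, 3, 1), "I48"),    # recorded on the index date, so not counted
]
found = conditions_in_lookback(history, index_date=date(2014, 3, 1))
has_multimorbidity = len(found) >= 2
```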

Other variables

Other baseline characteristics were sex, age, deprivation, urban–rural area, and admission type. Age was categorised into six groups. Deprivation was measured using the Scottish Index of Multiple Deprivation (SIMD) 2012, categorised as quintiles (quintile 1 is the most deprived and quintile 5 the least deprived)22. Urban–rural status was measured using the Scottish Government sixfold Urban Rural Classification 2009/1023. SIMD and Urban Rural Classification were identified from postcodes using the Scottish Government’s publicly available look-up files24,25.

Data linkage

NHS Grampian SMR data were held in a dedicated secure server, managed by the accredited Grampian Data Safe Haven (DaSH)26. The Community Health Index (CHI) number, a unique patient identifier used throughout the Scottish health care system, was used to link the study population to hospital episode data by DaSH. Postcodes were used to link the study population to the SIMD and Urban Rural Classification. The de-identified dataset was prepared and hosted by the Grampian DaSH, allowing secure controlled access for researchers while ensuring data security.

There were 662 admissions with missing CHI numbers in 2014 (inpatient general/acute, ≥ 18 years); these admissions were therefore not included. There were 314 patients who could not be linked with SIMD, and 576 patients who could not be linked with Urban Rural Classification, because of missing or invalid postcodes (Fig. 1). The characteristics of patients with missing values are reported in Additional file 2.

Statistical analysis

Descriptive analyses were expressed as frequencies and percentages or median and interquartile range (IQR) for categorical and continuous variables, respectively. Baseline characteristics were summarised by age, sex, admission type, SIMD quintile and Urban Rural category. The overall prevalence (%) was estimated for each condition and counts of conditions were calculated.
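The descriptive quantities named above (prevalence of each condition, counts of conditions per patient, and median with IQR) can be sketched in a few lines of Python. The toy cohort and variable names are hypothetical, and the quartile method chosen here (`method="inclusive"`) is an assumption; Stata and R may use different default quantile definitions.

```python
import statistics

# Hypothetical toy cohort: one dict per patient, 1 = condition recorded, 0 = not.
patients = [
    {"age": 51, "hypertension": 0, "diabetes": 1, "ckd": 1},
    {"age": 64, "hypertension": 1, "diabetes": 1, "ckd": 0},
    {"age": 70, "hypertension": 1, "diabetes": 0, "ckd": 1},
    {"age": 79, "hypertension": 1, "diabetes": 1, "ckd": 1},
    {"age": 83, "hypertension": 1, "diabetes": 0, "ckd": 1},
]
conditions = ["hypertension", "diabetes", "ckd"]

# Prevalence (%) of each condition across the cohort.
prevalence = {c: 100 * sum(p[c] for p in patients) / len(patients) for c in conditions}

# Count of conditions for each patient.
condition_counts = [sum(p[c] for c in conditions) for p in patients]

# Median and interquartile range for a continuous variable.
ages = sorted(p["age"] for p in patients)
median_age = statistics.median(ages)
q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
```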

Clustering conditions, with each condition belonging exclusively to one cluster, is a widely used approach to identify multimorbidity clusters. However, the same condition might occur in different combinations with other conditions in different patients. Patients with these different combinations of conditions, even if they share a condition (e.g. Chronic Kidney Disease (CKD) and Diabetes, CKD and Chronic Heart Failure (CHF), only diabetes, only CKD, only CHF), might need a different plan of care.

An alternative valid clustering approach is to cluster patients instead, according to those combinations of conditions, which allows conditions to belong to more than one cluster of patients. While both methods are valid, we chose to cluster “patients” rather than “conditions”, as it better aligns with the purpose of identifying clusters of multimorbidity for improved person-centred care.

Relevant diagnosed chronic conditions in the previous 5 years were used to cluster patients with ≥ 2 conditions. Conditions were coded as binary variables, with a value of “1” when the condition was present and “0” when absent. Prior to performing the cluster analysis, we evaluated whether the data contained non-random structures by visually inspecting the data (principal component analysis scatterplot; Additional file 3) and using the Hopkins statistic27. These showed that the data was non-random, and therefore clusterable (Hopkins 0.28).
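To make the clusterability check concrete, the following Python sketch computes one common form of the Hopkins statistic on a binary patient-by-condition matrix. It is not the study's implementation. Several conventions exist; this sketch assumes the one in which values well below 0.5 suggest clustered data, which appears consistent with the reported value of 0.28 being read as clusterable. The function name `hopkins`, the use of Euclidean distance, and the toy data (deliberately extreme, consisting of exact duplicates) are all assumptions.

```python
import math
import random


def hopkins(data, sample_size, rng):
    """Hopkins statistic, convention: values well below 0.5 suggest clustering."""
    n, dims = len(data), len(data[0])
    lows = [min(row[j] for row in data) for j in range(dims)]
    highs = [max(row[j] for row in data) for j in range(dims)]

    # w: distance from each sampled real point to its nearest *other* real point.
    sampled = rng.sample(range(n), sample_size)
    w = [
        min(math.dist(data[i], data[k]) for k in range(n) if k != i)
        for i in sampled
    ]

    # u: distance from each uniformly generated point to its nearest real point.
    u = []
    for _ in range(sample_size):
        point = [rng.uniform(lows[j], highs[j]) for j in range(dims)]
        u.append(min(math.dist(point, row) for row in data))

    return sum(w) / (sum(u) + sum(w))


# Toy binary matrix: three condition patterns, each shared by ten patients.
patterns = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
data = [list(p) for p in patterns for _ in range(10)]
h = hopkins(data, sample_size=10, rng=random.Random(0))
```

Because every toy patient has an exact duplicate, the within-data nearest-neighbour distances are zero and the statistic sits at the clustered extreme; real data would give an intermediate value.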

The Gower distance28 (equivalent to Jaccard29 when using only binary data) was used to measure the dissimilarity between observations. The Partitioning around Medoids (PAM) algorithm30 was used to identify distinct groups of patients with similar patterns of conditions, classifying individuals into mutually exclusive groups. The Silhouette method31 was used as an internal validation metric to determine the optimal number of patient groups, which was the number of groups that yielded the highest silhouette value. The groups were interpreted using descriptive statistics, and the dimension reduction technique t-distributed stochastic neighbour embedding (t-SNE) was used to visualise the clusters32.
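The pipeline described above (binary dissimilarity, medoid-based partitioning, silhouette-based choice of the number of groups) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the study's R code: the Jaccard distance stands in for Gower on binary data, and an exhaustive search over medoid sets replaces PAM's build/swap heuristic, which is feasible only for toy data. All function names and the six toy patients are hypothetical.

```python
import itertools


def jaccard_distance(a, b):
    """1 - |a AND b| / |a OR b| for two binary vectors; joint absences are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def total_cost(dist, medoids):
    """Sum of each observation's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids(dist, k):
    """Exhaustive search for the medoid set with the lowest total cost (toy data only)."""
    medoids = min(
        itertools.combinations(range(len(dist)), k),
        key=lambda ms: total_cost(dist, ms),
    )
    labels = [
        min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(len(dist))
    ]
    return list(medoids), labels


def mean_silhouette(dist, labels):
    """Average silhouette width; singleton clusters score 0 by convention."""
    n = len(dist)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n


# Toy patients (rows) by conditions (columns), coded 1 = present, 0 = absent.
patients = [
    [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0],
]
dist = [[jaccard_distance(a, b) for b in patients] for a in patients]

# Choose the number of groups with the highest mean silhouette width.
silhouettes = {}
for k in range(2, 5):
    _, labels = k_medoids(dist, k)
    silhouettes[k] = mean_silhouette(dist, labels)
best_k = max(silhouettes, key=silhouettes.get)
```

On these toy data the two-group solution has the highest mean silhouette width, separating the patients who share the first pair of conditions from those who share the second.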

We carried out several sensitivity analyses. We compared the results obtained by: 1. replacing Gower with the Hamming distance33; 2. excluding the most common condition (hypertension) from the clustering process; and 3. excluding conditions with a prevalence of < 5% from the clustering process. Analyses were conducted using Stata v13.0 and R version 3.6.1.
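The three sensitivity analyses alter either the dissimilarity measure or the set of conditions entering the clustering. The Python sketch below illustrates both kinds of change on toy data; it is an assumption-laden illustration rather than the study's code. The Hamming distance is shown in its normalised form (proportion of differing positions), and the helper `drop_columns` and the toy matrix are hypothetical.

```python
def hamming_distance(a, b):
    """Proportion of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a, b):
    """1 - |a AND b| / |a OR b|; unlike Hamming, joint absences are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def drop_columns(matrix, names, exclude=(), min_prevalence=0.0):
    """Keep columns not in `exclude` whose prevalence (%) is at least `min_prevalence`."""
    n = len(matrix)
    keep = [
        j for j, name in enumerate(names)
        if name not in exclude
        and 100 * sum(row[j] for row in matrix) / n >= min_prevalence
    ]
    return [[row[j] for j in keep] for row in matrix], [names[j] for j in keep]


# Two patients who share one condition and are both free of six others.
a = [1, 1, 0, 0, 0, 0, 0, 0]
b = [1, 0, 0, 0, 0, 0, 0, 0]
d_hamming = hamming_distance(a, b)   # shared absences count as agreement
d_jaccard = jaccard_distance(a, b)   # shared absences are ignored

# Toy matrix of 40 patients: prevalences of 100%, 50% and 2.5%.
names = ["hypertension", "diabetes", "psoriasis"]
matrix = [[1, i % 2, 1 if i == 0 else 0] for i in range(40)]

_, without_hypertension = drop_columns(matrix, names, exclude={"hypertension"})
_, common_only = drop_columns(matrix, names, min_prevalence=5.0)
```

The comparison of `d_hamming` and `d_jaccard` shows why the choice of measure can matter for sparse binary data: when most conditions are absent in most patients, a measure that counts joint absences as agreement makes patients look more alike.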

Defining clusters of conditions

Prior to analysis, we documented the clusters we would expect to observe. The patterns of conditions present in the resulting groups of patients were clinically reviewed by clinical members of the study group (CB, MJ, SS). Clusters of conditions within each patient group were defined based on a combination of clinical review and assessment of the highest prevalence conditions within each patient group and labelled according to the condition with the highest prevalence.

Study registration

This study was prospectively pre-registered on the Open Science Framework on 26 September 2019 (https://osf.io/qnpw2). Deviations from the pre-registered protocol are documented in Additional file 4 and analysis R code is available in Additional file 5.

Ethics approval and consent to participate

This study was approved by the North Node Privacy Advisory Committee (NNPAC Ref No. 6/001/19). The remit of this Committee is to provide researchers with access to NHS patient/health data within NHS Grampian for research purposes via a streamlined approach that incorporates Sponsorship, Ethics, Caldicott and R&D. Informed consent was waived by the North Node Privacy Advisory Committee as this research falls within the conditions for processing personal data that is “necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller” (Article 6(1)(e) of the UK General Data Protection Regulation (GDPR)). Data was de-identified pre-analysis. All methods were performed in accordance with the relevant guidelines and regulations.

Results

Baseline characteristics

Of a total of 41,545 patients hospitalised in 2014, 11,389 (27.4%) had multimorbidity (≥ 2 conditions). Table 1 shows that patients with multimorbidity were older and more frequently admitted as an emergency than those with < 2 conditions. Just over half of patients were from the two least deprived quintiles. Counts of conditions in patients with multimorbidity ranged from 2 to 11, and the most common conditions were hypertension (56.5%), diabetes (27.0%), chronic kidney disease (26.0%), atrial fibrillation (19.9%) and chronic pulmonary disease (18.6%) (Table 2). The least common conditions were multiple sclerosis (1.1%), schizophrenia (0.8%), psoriasis (0.7%), peripheral vascular disease (0.2%) and chronic viral hepatitis B (0.1%).

Table 1 Baseline characteristics and counts of conditions.
Table 2 Prevalence of individual conditions among patients with multimorbidity.

Multimorbidity clusters

Cluster analysis of disease occurrence identified ten groups of patients. Table 3 describes the prevalence of all conditions in each patient group. Within each patient group, clusters of conditions have been highlighted in bold and labelled according to the most prevalent condition in each group. Other conditions that featured in a patient group (i.e. less common conditions with a higher prevalence than in other groups) are highlighted in italics.

Table 3 Prevalence of 30 conditions by patient group.

For example, Group 1 (labelled “hypertension”) was characterised by hypertension (77.5%) and atrial fibrillation (59.0%). Other feature conditions were non-metastatic cancer and chronic heart failure. Group 3 (“alcohol misuse”) was characterised by alcohol misuse (75.4%) and depression (54.5%). Other feature conditions in this group were asthma, cirrhosis, diabetes, epilepsy and schizophrenia.

The number of patients in each group ranged from 508 (hypothyroidism) to 2590 (hypertension). Several conditions were present in multiple groups of patients. Seven of the ten groups included hypertension, three included chronic kidney disease, two diabetes, and two atrial fibrillation. Multimorbidity clusters are summarised in Table 4.

Table 4 Multimorbidity clusters.

Table 5 describes the characteristics of patients in each group. Median age ranged from 51 (Group 3 alcohol misuse) to 79 (Group 8 chronic heart failure) years. The groups with the highest proportion of females were Group 6 (chronic pain), and Group 10 (hypothyroidism). The groups with the highest proportion of males were Group 3 (alcohol misuse), and Group 8 (chronic heart failure). The proportion of patients from the most deprived quintile ranged from 6.3% to 14.1%. Group 3 (alcohol misuse) had the highest proportion of patients from the most deprived and large urban areas, while Group 7 (cancer) had the highest proportion of patients from the least deprived and rural areas. Median counts of conditions ranged from 2 to 4, with Group 4 (CKD/diabetes) and Group 8 (chronic heart failure) having the highest proportion of patients with five or more conditions. The highest proportion of patients admitted as an emergency was in Group 3 (alcohol misuse) and Group 8 (chronic heart failure).

Table 5 Characteristics by patient group.

Sensitivity analyses

Results of the three sensitivity analyses are shown in Additional file 6. The sensitivity analyses using the Hamming distance or excluding hypertension resulted in similar clusters being identified. Excluding conditions with a prevalence of < 5% resulted in 13 clusters, with some clusters split into a larger number of smaller groups compared with the main analysis. For example, the asthma and chronic pulmonary disease cluster was split into two separate clusters. However, overall, the same conditions were identified.

Discussion

To our knowledge, this is the first study to describe multimorbidity clusters in an unselected inpatient adult population, and the first population-level study of multimorbidity clusters in a Scottish/UK hospitalised population. Of 41,545 patients admitted to hospital, approximately one quarter (11,389) had multimorbidity, and our analysis identified ten clusters of co-occurring conditions.

The clusters revealed recognisable co-occurrences where the link was potentially causal, e.g. hypertension leading to atrial fibrillation34 and diabetes leading to kidney disease35. Clusters also revealed shared underlying disease mechanisms, for example chronic heart failure, myocardial infarction, atrial fibrillation, stroke and kidney disease as vascular conditions of older age36. We identified a group of patients with a high prevalence of alcohol misuse co-occurring with depression and asthma, predominantly male and from more deprived quintiles, possibly indicating an underlying social driver. This finding supports known inequalities in alcohol-attributable harms, given that disadvantaged social groups have greater alcohol-attributable harms (admissions or death) compared with more advantaged individuals37. There were also clusters that represented an artefact of how conditions are classified, e.g. metastatic disease with non-metastatic cancer as two conditions in one person. Conditions with a high prevalence also had an impact: for example, hypertension was present in more than half of those with multimorbidity and was a key condition in seven out of ten clusters.

While these clusters have face validity, their usefulness depends upon how they might delineate groups of people with specific health and social needs. Relevantly, those in the chronic heart failure cluster were the oldest (median age 79) and more likely to have five or more health conditions, whereas those in the alcohol misuse cluster were more likely to be of working age (median age 51), live in a deprived area and present to hospital as an emergency. Thus, notwithstanding the artefact of associations between very common conditions, e.g. hypertension, and those that are prerequisites of another, e.g. metastatic cancer, we have shown the potential of identifying key multimorbidity clusters to which people may belong, so that we can ensure that health and social support is prioritised to the inpatient areas.

Methodological heterogeneity in studies investigating multimorbidity clusters makes it difficult to make comparisons. Studies vary with regard to the number and type of conditions included, data sources, populations, settings, and clustering methods. The most comparable study, in adult medical inpatients, identified five clusters of conditions: neurological diseases, heart/kidney diseases, malignancy, psychiatric diseases and miscellaneous diseases, from a list of 17 conditions9. We also identified similar chronic heart failure and cancer clusters.

This was a large, population-based study, and to our knowledge, the first study to characterise patterns of multimorbidity in an unselected hospitalised population. We ascertained conditions over the 5 years prior to index date, as longer lookback periods are more effective for identifying conditions38,39. We used high quality administrative data, with quality assurance assessments undertaken to ensure that inpatient data items were being recorded consistently and to a high standard40. Our results should be generalisable to other hospitalised populations with similar characteristics, and furthermore, the methodology used in this study would be applicable to health systems worldwide.

Limitations should also be noted. Cluster analysis is an exploratory classification method, and different clustering algorithms may produce different results. We found that hierarchical cluster analysis did not produce clinically relevant clusters, and therefore have reported results from PAM. To mitigate this, sensitivity analyses were conducted, the final clustering solution was clinically reviewed, and we have transparently and comprehensively reported our methods and deviations from the pre-registered protocol (Additional file 4). Another limitation was that, as conditions were identified from hospital episode data in the 5 years prior to index admission, we will not have recorded conditions for patients who were first time presenters on the index date. We did not include conditions from primary care records, which will have underestimated the multimorbidity burden among people with conditions predominantly looked after in primary care. However, it is reasonable to hypothesise that conditions which are rare in the hospital setting would have less influence on the health care needs of people in hospital. Finally, multimorbidity clustering was based specifically on the conditions in Tonelli’s measure of multimorbidity19. There are many other heterogeneous measures of multimorbidity available and we acknowledge that our findings may change if other conditions are studied. However, there is no single recommended measure of multimorbidity available. Therefore, we selected Tonelli as it is a validated adaptation of the highly influential Barnett measure.

The value of identifying clusters of conditions in hospitalised patients is as a first step towards identifying opportunities to target patient-centred care towards people with unmet needs. An important next step will be to determine the clinical outcomes of patients in each cluster, reasons why patients within some clusters have poor outcomes, and the pathways through healthcare that patients in each cluster predominantly take.

Conclusions

Identifying clusters of conditions in hospital patients is a first step towards identifying opportunities to target patient-centred care towards people with unmet needs, leading to improved outcomes and increased efficiency. Here we have demonstrated the face validity of cluster analysis as an exploratory method for identifying clusters of conditions in hospitalised patients with multimorbidity.