Introduction

Approximately 1.2 million people died due to chronic kidney disease (CKD) worldwide in 2015 [1], representing an increase of 32% since 2005. CKD is now ranked 17th in the list of diseases which cause the most “years of lost life”, rising from 21st in 2005 and 25th in 1990 [1]. 2.6 million people received dialysis in 2010, and treatment of end-stage renal disease (ESRD) accounts for 2–3% of the healthcare budgets of high-income countries [2]. Understanding the genetic influences which predispose people to kidney dysfunction will have important applications for the diagnosis and treatment of a globally significant disease.

A number of human leukocyte antigens (HLA) encoded within the major histocompatibility complex are associated with increased or decreased risk of renal failure3. For example, HLA-B*51 was associated with ESRD in Venezuelan4 and Brazilian5 subjects, while A*26 was protective against ESRD in Saudi Arabia6 and Turkey7. Approximately 100 different HLA types have been linked to renal function by various studies worldwide8, including studies of European or white populations9,10,11,12,13. These were mostly reported in case–control studies comprising fewer than 500 subjects of a single nationality, and many associations remain unreplicated or have been contradicted by other studies. Despite mounting evidence that an association between HLA and renal function exists, it has not been confirmed in sufficiently powered studies. We interrogated a large cohort of white British subjects to test the hypothesis that the HLA region is associated with renal function.

Methods

Study population and quality control

This is a UK Biobank (UKB) retrospective cohort study using data from 502,616 subjects aged 39–73 years at the time of recruitment between 2006 and 2010 [14]. 88% of the cohort self-identifies as “white British”, and principal component analysis conducted by UKB concluded that 82% of the UKB cohort is white British [15]. Analysis was restricted to this group to reduce population stratification; 92,858 subjects who were not white British were analysed separately [16].

Individuals within the cohort whom UKB deemed to be related17 (kinship coefficient ≥ 0.044) were also excluded (n = 7318) to avoid HLA frequency bias18. Where subjects were related, the individual with the most complete set of genetic data, based on a set of “high-quality markers”17, was included. Genetic sex influences kidney function19, so only individuals whose sex could be clearly assigned were included. Subjects identified by UKB to have sex chromosome karyotypes other than XX or XY20 and those whose genetic sex, as calculated by UKB, did not match their self-reported sex21 were removed (n = 786 in total). Finally, 347 subjects were excluded at UKB’s recommendation due to missing genetic data22. A total of 101,309 subjects were excluded during quality control, leaving 401,307 subjects for analysis. All quality control was performed using Stata/SE 13.0 (StataCorp).
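The exclusion counts quoted above can be checked with simple arithmetic. The following is a minimal sketch using only the figures stated in the text; the dictionary keys are illustrative labels, not UKB field names.

```python
# Counts quoted in the text; the labels are illustrative, not UKB field names.
recruited = 502_616
excluded = {
    "not_white_british": 92_858,
    "related": 7_318,
    "sex_unclear_or_mismatched": 786,
    "missing_genetic_data": 347,
}

total_excluded = sum(excluded.values())
remaining = recruited - total_excluded

print(total_excluded)  # 101309
print(remaining)       # 401307
```

Both totals agree with the 101,309 exclusions and 401,307 analysed subjects reported above.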

HLA typing

Imputation estimates a person’s most likely HLA type based on the presence of particular single nucleotide polymorphisms23. HLA types were imputed for each subject by UKB using HLA*IMP:02 software24 at the following loci: HLA-A, B, C (Class I) and DPA1, DPB1, DQA1, DQB1, DRB1, DRB3, DRB4, DRB5 (Class II)25 at a level equivalent to high resolution typing using eight reference datasets26. 362 HLA types were imputed. Two of these (HLA-DQB1*02:02 and DPB1*03:01) were not in Hardy–Weinberg equilibrium (HWE, P < 0.00014) so were excluded from this study; the remaining 360 alleles were included. Table 1 shows the 100 HLA types with frequency > 1% in the cohort.
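The text does not state how HWE was tested. As an illustration only, the sketch below assumes a one-degree-of-freedom chi-square test in which each HLA type is treated as biallelic (the allele of interest versus all others); the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square_p(n_two_copies, n_one_copy, n_zero_copies):
    """Chi-square (1 d.f.) test of Hardy-Weinberg equilibrium for one allele,
    treated as biallelic: the allele of interest versus all other alleles.
    Returns (chi-square statistic, P value)."""
    n = n_two_copies + n_one_copy + n_zero_copies
    p = (2 * n_two_copies + n_one_copy) / (2 * n)  # frequency of the allele
    if p <= 0.0 or p >= 1.0:
        raise ValueError("allele is absent or fixed; HWE test is undefined")
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_two_copies, n_one_copy, n_zero_copies)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set matches HWE proportions, the second
# has an excess of homozygotes.
balanced = hwe_chi_square_p(100, 1800, 8100)
skewed = hwe_chi_square_p(400, 1200, 8400)
print(balanced[1] < 0.00014)  # False: retained
print(skewed[1] < 0.00014)    # True: excluded under the threshold used here
```

An exact test is often preferred for rare alleles; the chi-square form is shown only because it is short and self-contained.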

Table 1 HLA types with frequency > 1%. This shows the 44 Class I and 56 Class II types which have frequency > 1% in the cohort. They are split into Class I and Class II and sorted in descending order of frequency.

Measures of renal function

Renal function was determined using estimated glomerular filtration rate (eGFR), a measure of toxin filtration calculated using serum biomarkers such as creatinine and cystatin27. High levels of these biomarkers are indicative of poor renal function and manifest in a lower eGFR. This study used the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) eGFR calculation which adjusted for age and sex28. Three eGFR values were calculated for each subject, using measures of: creatinine; cystatin; and both creatinine and cystatin29. Pairwise correlation confirmed that the three eGFR values were similar (Pearson’s correlation coefficients > 0.6; P < 0.0001). The cystatin-based eGFR value provided the most complete dataset; only this value was used for analysis to avoid repetition of testing using closely correlated variables.
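The cystatin-based formula is not reproduced in the text. The sketch below assumes it is the 2012 CKD-EPI cystatin C equation, with serum cystatin C in mg/L; the function and parameter names are illustrative.

```python
def egfr_ckd_epi_cystatin(cystatin_mg_per_l, age_years, female):
    """eGFR (mL/min/1.73 m^2) from serum cystatin C, assuming the
    2012 CKD-EPI cystatin C equation."""
    ratio = cystatin_mg_per_l / 0.8
    egfr = (
        133.0
        * min(ratio, 1.0) ** -0.499   # applies when cystatin C <= 0.8 mg/L
        * max(ratio, 1.0) ** -1.328   # applies when cystatin C > 0.8 mg/L
        * 0.996 ** age_years          # age adjustment
    )
    if female:
        egfr *= 0.932                 # sex adjustment
    return egfr


# Higher cystatin C gives a lower eGFR, as described in the text.
print(egfr_ckd_epi_cystatin(0.8, 50, female=False))
print(egfr_ckd_epi_cystatin(1.6, 50, female=False))
```

The age and sex terms are the adjustments referred to above; no other covariates enter this form of the equation.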

Clinical histories for each subject were used as secondary outcomes. Subjects with kidney dysfunction were identified by examining self-reported questionnaires in addition to data relating to clinical diagnoses and procedures undertaken. These were identified using a combination of the International Classification of Diseases (ICD)-9 and -10 [30], Office of Population Censuses and Surveys (OPCS)-3 and -4 [31], and UKB’s own coding systems [32,33]. Subjects were categorised as: ESRD patients (yes or no); kidney transplant recipients (yes or no); dependent on renal replacement therapy (RRT) including transplantation at any point (yes or no); and CKD patients of any stage (yes or no) (see Table 6).

Statistical analysis

Linear regression analysis was performed to test for associations between HLA alleles and eGFR as a continuous variable. All 360 HLA alleles which were in HWE were included, with a Bonferroni threshold of P < 0.00014 considered significant34 (0.05/360). Subjects who had ever received RRT were excluded as their eGFR values may have suggested healthy renal function even though their native function was poor.

Logistic regression was used to test for associations between HLA types and adverse clinical outcomes (ESRD, RRT, CKD, and kidney transplantation; binary variables). Age at recruitment and sex were included as covariates, and only alleles in HWE with minor allele frequency > 5% were considered (n = 50) in order to increase statistical power. P < 0.001 was considered significant after Bonferroni correction. All regression analysis was performed using Plink software35.
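The two significance thresholds follow directly from dividing a family-wise alpha of 0.05 by the number of alleles tested. A minimal sketch (the function name is illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


linear_threshold = bonferroni_threshold(0.05, 360)   # eGFR, 360 alleles
logistic_threshold = bonferroni_threshold(0.05, 50)  # clinical outcomes, 50 alleles

print(f"{linear_threshold:.5f}")    # 0.00014
print(f"{logistic_threshold:.3f}")  # 0.001
```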

Ethical approval

All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by UKB’s Research Ethics Committee. Informed consent was obtained for all subjects. UKB has obtained Research Tissue Bank approval from its ethics committee that covers the majority of proposed uses of the resource, so researchers do not typically need to obtain separate ethics approval.

Results

Variation in renal function

Variation in renal function within the UKB cohort is outlined in Table 2. eGFR values could not be calculated for around 18,000 subjects (< 5%) due to missing creatinine and/or cystatin measurements. Subjects dependent on RRT were excluded from the analysis, although their eGFR values are listed in Table 2, which shows calculated eGFR values and the corresponding CKD stages36 as well as the number of subjects in the final analysis.

Table 2 Distribution of subjects’ eGFR.
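The exact bins used in Table 2 are not given in the text. As an assumption, the sketch below uses the conventional five-stage eGFR cut-offs (90, 60, 30 and 15 mL/min/1.73 m²); the function name is illustrative.

```python
def ckd_stage_from_egfr(egfr):
    """Map an eGFR value (mL/min/1.73 m^2) to a stage from 1 to 5,
    assuming the conventional cut-offs of 90, 60, 30 and 15."""
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5


for value in (95.0, 72.5, 41.0, 22.0, 9.0):
    print(value, ckd_stage_from_egfr(value))
```

An eGFR in the stage 1 or 2 range does not by itself indicate CKD, since a clinical diagnosis at those stages also requires other evidence of kidney damage.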

The calculated eGFR values were compared to average values to check that they were plausible. Average eGFR values for different age categories were taken from the National Kidney Foundation (NKF)37. The values calculated by this study were in line with NKF’s estimates, as shown in Table 3. This increases confidence in the calculated values.

Table 3 eGFR values by age.

11,379 subjects were identified with ESRD. Of these, 437 were renal transplant recipients and 1412 subjects had RRT. 4794 subjects had a clinical diagnosis of CKD (36 stage 1, 300 stage 2, 3557 stage 3, 397 stage 4, and 504 stage 5).

Regression analysis

33 HLA types were significantly associated with renal function after correction for multiple testing. Table 4 lists the 11 HLA alleles linked with decreased renal function (defined by either decreased eGFR or the presence of CKD or ESRD). Table 5 shows the 22 HLA alleles associated with increased renal function. No HLA associations were identified with kidney transplant status or RRT status. Tables 4 and 5 also show the population frequency of the alleles, the beta value or odds ratio (OR) of each effect, and the level of significance of the associations.

Table 4 Alleles which are associated with decreased kidney function.
Table 5 Alleles which are associated with increased kidney function.

Associations with decreased renal function

HLA types are inherited in maternal and paternal haplotypes and are not randomly distributed. Of the 11 HLA associations with decreased eGFR, seven were also linked to development of CKD, ESRD or both. 10 of these 11 HLA alleles are inherited in two well-documented haplotypes: HLA-A*01:01, B*08:01, C*07:01, DRB1*03:01, DRB3*01:01, DQA1*05:01, DQB1*02:01; and A*03:01, B*07:02, C*07:02 [38]. All alleles in the former haplotype are associated with decreased eGFR, and all but HLA-DRB3*01:01 are also linked to increased risk of CKD, ESRD, or both. This haplotype is seen in 9.5% of the English population [39]. The “absence of DRB3 genes” was also associated with decreased eGFR, which may either indicate increased susceptibility in subjects homozygous for this common haplotype, or may reflect individuals with the latter haplotype, which is in linkage disequilibrium (LD) with HLA-DRB1*15:01 and therefore has no associated DRB3 genes present. Alternatively, a closely linked haplotype, HLA-A*03:01, B*07:02, C*07:02, DRB1*03:01, DQB1*02:01 (present in 0.5% of the English population [39]) may be implicated here.

Associations with increased renal function

The HLA associations with increased eGFR values do not appear to belong to full length haplotypes, but can be separated into groups of two or three HLA alleles which are often co-inherited. For example: HLA-DRB1*04:01, DQA1*03:01, DQB1*03:02 (seen in 8.2% of the English population40); DRB1*07:01, DQA1*02:01 (10.5% of the English population40); A*29:02, B*14:02, C*08:02 (2.1% of the Northern Irish population40); and B*44:03, C*16:01 (4.7% of the Northern Irish population40, also commonly associated with A*25:01) were all linked to increased eGFR. None of the 22 alleles associated with increased eGFR was shown to reduce the risk of adverse renal-related clinical outcomes.

Discussion

We identified significant HLA associations with renal function in the largest reported study to date. 22 HLA alleles were associated with increased renal function and 11 with decreased function. The HLA associations with increased renal function did not suggest a protective effect against CKD or ESRD, but the 11 associations with decreased renal function (seven of which were also linked to ESRD and/or CKD) were of particular interest. HLA genes are inherited through maternal and paternal haplotypes, so it is likely that these alleles are not independently associated with renal function, but rather that the associations reflect the non-random co-inheritance of alleles within the population. Specifically, individuals who carry the haplotypes listed are at increased risk of developing renal dysfunction, and may carry sub-clinical levels of impairment even in the absence of identifiable disease. This clustering of the HLA genes within well-documented haplotypes adds validity, which is reinforced as the primary and secondary outcome measures were calculated using the independent phenotypes of biomarkers and clinical outcomes.

It should be noted that some significant alleles appear to be alone in significance (that is, the alleles that they are in LD with were not significant). Examples include HLA-A*32:01 and B*14:01, among others. In these cases, it is possible that the allele itself is linked to kidney function, independent of its haplotype, or it is possible that the other alleles in LD with this allele are also significant, and this study failed to detect this.

The CKD-EPI calculation of eGFR was selected rather than MDRD41 or Cockcroft-Gault42 due to its increased accuracy when assessing subjects with normal renal function (eGFR > 60)43. Using only one eGFR value avoided multiple testing of closely related variables; the formula based on cystatin was selected as it had the fewest missing values. For comparison, the two other CKD-EPI eGFR formulae (one based on creatinine, and another based on both creatinine and cystatin) were used and the data re-analysed. In addition to the associations already described, three additional associations were identified as significant (assuming the same Bonferroni threshold of P < 0.00014): HLA-A*23:01 and DRB3*02:02 were linked to decreased renal function, and B*27:05 was linked to increased function.

Comparison with previous research

Previous literature has reported conflicting HLA associations with renal function in populations of different ethnic origin. These contradictory findings may include false positives arising from inadequate statistical power, multiple testing, publication bias or methodological differences. Alternatively, it is possible that HLA associations with kidney function differ between populations due to varied heritage. Limiting this study to only white British subjects reduced the likelihood of bias due to population stratification.

Almost 100 HLA associations with ESRD have been described, only 11 of which have been confirmed by two or more independent studies. Our study replicated one of these 11 observations but refuted two. HLA-DRB1*03 was previously associated with renal dysfunction by four groups with a combined total of 1261 ESRD subjects and over 3000 controls5,6,7,44. We found not only HLA-DRB1*03:01 but an entire haplotype to be associated with decreased eGFR and increased risk of poor clinical outcome. However, HLA-B*07 was reported to be protective against ESRD in 1620 ESRD patients and 1211 controls by Doxiadis et al.10, and Karahan’s study of 587 patients and 2643 controls7. In this population, HLA-B*07:02 was associated with decreased renal function. Furthermore, HLA-DRB1*04 was associated with adverse renal outcomes in three previous studies with over 4000 ESRD subjects12,45,46, but here, DRB1*04:01 was linked to increased renal function. The remaining eight previously replicated HLA associations were not significant in this study. Overall, 14 of our associations confirmed previous observations5,6,7,12,44,45,47,48,49, while 12 of our findings refuted previous results7,10,12,45,46,49,50.

It is worth noting, however, that this study is much larger than any previous study. Most previous studies used case–control methodology (see “Strengths and weaknesses” below) and many failed to adjust for multiple testing. Therefore, the findings reported here, which have undergone more stringent statistical testing, may be less prone to type I or II error.

Implications

This study is unique in that some of the HLA alleles associated with decreased renal function form a well-characterised haplotype. Both this haplotype and its individual component HLA alleles have been associated with multiple diseases which result in CKD or ESRD, including systemic lupus erythematosus and IgA deficiency [51]. Our study indicates that even within a healthy population, renal function may be sub-clinically impaired in subjects with these alleles. These findings have the potential to impact upon clinical practice. HLA typing is already used as a diagnostic tool for disorders with strong HLA associations such as coeliac disease [52], ankylosing spondylitis [53], and actinic prurigo [54]. It may be advisable for clinicians to use HLA disease association typing to aid the diagnosis of renal failure, which could ensure timely therapeutic intervention. However, HLA associations with these diseases are much stronger than those reported here: the association between B*27 and ankylosing spondylitis has an odds ratio of 171 [55], while HLA associations with coeliac disease have OR > 10 [56], compared to ORs < 1.13 in this study. Clinicians and national kidney transplantation programmes may also use the HLA types associated with increased renal function to help identify suitable kidney donors.

Strengths and weaknesses

A key advantage of this study is the cohort size, which is larger than any previously published research. 382,204 subjects were included in the analysis of the primary outcome measure (eGFR), and the secondary analysis consisted of 11,379 cases of ESRD (and 389,928 controls). This study uses a variety of measures of renal function, most of which are calculated independently and are therefore unlikely to be subject to systematic bias. eGFR is a useful outcome measure because it provides a continuous scale, giving an accurate and precise estimate of renal function. Many previous studies used case–control methodology, reducing kidney function from a spectrum to binary categorisations such as “ESRD or healthy”. Measuring renal function on a spectrum may strengthen the statistical and clinical significance of this study.

A limitation of this investigation is that HLA types were obtained by imputation rather than by direct genotyping, which is the more accurate method57. Direct HLA typing of a cohort of this size using traditional methods would be prohibitively expensive. The imputation program used for the UKB population was HLA*IMP:02, though Karnes’ review57 of competing programs suggests that SNP2HLA is more accurate. Nevertheless, the review stated that HLA*IMP:02 is 94% accurate when imputing white subjects which, given the size of our cohort, is acceptable within the scope of this study. Furthermore, 360 of the 362 imputed alleles were in HWE (P > 0.00014), suggesting that the majority of imputed allele frequencies were consistent with frequencies that might be expected in a stable population. The two alleles which were not in HWE (HLA-DQB1*02:02 and DPB1*03:01) were excluded from the analysis.

Some HLA associations found in this study do not appear to be part of a haplotype. These alleles may be independently associated with renal function, or they may be false positives caused by inaccurate imputation. For example, HLA-B*44:02 was not significant in this study, but is commonly associated with C*05:01, DRB1*04:01, DQA1*03:01 and DQB1*03:02 in combination with either A*25:01, A*29:02, or A*32:01, which were all linked to increased renal function. Our study identified HLA-B*44:03, rather than B*44:02, to be associated with increased renal function. This discrepancy may suggest that HLA-B*44:02 alleles were incorrectly imputed as B*44:03, though the fact that C*16:01 (which is often seen in LD with B*44:03) was also associated with increased renal function may validate the observation regarding B*44:03. Any dubious observations may be resolved by repeating the imputation using an alternative imputation programme or additional reference panels.

It is possible that the strategy employed to identify subjects with adverse kidney-related clinical outcomes was insufficiently comprehensive to capture all cases. If data held by UKB were incomplete, or if relevant codes were not included (see Table 6), subjects with poor renal outcomes would be mischaracterised as healthy. This could be averted by obtaining a peer-reviewed validation of the coding systems that documents exactly which codes are representative of adverse renal outcomes, but to the best of our knowledge no such validation exists. Clinical outcomes were secondary outcome measures in this study; the primary outcome of eGFR is not affected by this limitation.

Table 6 Codes included in definitions of ESRD, kidney transplant status, RRT status, and CKD. This shows the various coding systems and codes used to identify subjects with adverse clinical outcomes.

A final limitation of this study is that the sizes of the associations with eGFR were smaller than previously published HLA disease associations55,56 and possibly too small to be considered clinically relevant. 25 out of 33 (76%) significant associations with eGFR had a beta value between −0.5 and 0.5, suggesting that the presence of the allele has only a minor effect on kidney function. However, in seven cases, these apparently small effects were corroborated by associations with adverse clinical outcomes, implying that small beta values do not preclude clinical relevance.

Conclusions

This study has identified 22 HLA types which are associated with increased kidney function, and 11 which are linked to decreased kidney function in a large UK population. Many of these are commonly inherited together in haplotypes. Importantly, seven alleles, each of which is seen in 14–34% of the cohort, were linked to both decreased eGFR and increased incidence of adverse clinical outcomes. Due to the constitution of the cohort, the results of this study can only be applied to white British people aged 39–73. Repeating the analyses with alternative cohorts may add considerably to our current knowledge and allow a better assessment of the implications for population health.