Background

Osteoporosis and the associated fractures are a major global health burden for patients, their social network, and society [1,2,3]. Ireland has one of the greatest illness burdens, and the highest projected increases in osteoporotic fractures in Europe over the coming decade [3, 4]. National publications suggest that available data reflect the ‘tip of the iceberg,’ and that the financial costs of managing people with these fractures will double during this decade, rising to €2 billion by 2030 [5,6,7]. Recent programs established national standards for the management and audit of hip fracture care among adults aged ≥ 60 years and fracture liaison services [8, 9]. These reflect the current state of fragility fracture care: variation in osteoporosis diagnosis, risk assessment, and management, and some progress, while also highlighting substantial needs including increases in resources, data, policy, priority, and logistics [3, 5,6,7,8,9,10].

Access to quality risk assessment, diagnosis, and treatment of osteoporosis is heterogeneous and inadequate around the globe [1,2,3, 11,12,13,14,15,16,17]. Many algorithms are available to decide whom to test, how to assess fracture risk, when to intervene, and how to monitor the effects of interventions [1,2,3, 12, 15, 16, 18,19,20,21]. Their performance varies considerably among different populations, with no single method substantially superior to others [18,19,20,21,22]. The Osteoporosis Self-assessment Tool (OST) is one of the simplest algorithms requiring only age and weight to aid in the identification of people likely to have a DXA diagnosis of osteoporosis [18,19,20,21,22,23]. Appropriate use of the OST index could significantly reduce the number of people requiring a DXA screening test [21,22,23,24,25].

Fracture risk may be estimated using various methods, each with strengths and limitations [2, 18,19,20,21,22]. Substantial efforts over many years supported the development of the FRAX tool such that it has become the dominant fracture risk assessment tool worldwide [26] and is the preferred algorithm of global professional bodies in skeletal health [2, 3, 26,27,28,29]. FRAX estimates the 10-year probability of hip fracture (HF) and certain major osteoporotic fractures (MOF) in people aged 40 to 90 years, with country-specific estimate options [26, 29]. Strengths of the FRAX algorithm include the availability of an online calculation option, with or without DXA testing, and the ability to include additional risk factors such as glucocorticoids, secondary causes, and a parental history of hip fracture [26, 29]. This is particularly attractive when access to DXA is limited [14, 20, 26, 29], such as in our center, where reducing unnecessary testing is important [25, 30]. Irish legislation governing the justification of the risks associated with exposure to ionizing radiation prohibits undertaking such exposures if alternative methods are available which can achieve the same objectives [31].

Clinical risk factors included in FRAX are prevalent in Irish adults, including those who are hospitalized and those with and without fractures [17, 32,33,34]. FRAX probabilities and potential intervention thresholds for Ireland were derived using a limited data set of public hospital admissions, population statistics, and several assumptions, though neither individual-level data nor DXA results were available [4]. Resulting software exists to calculate fracture risk estimates either via the FRAX website (www.sheffield.ac.uk/FRAX/), or on most modern DXA machines. Sometimes we note discordance between FRAX estimates in correspondence we receive and those derived from our DXA machines which include BMD.

The DXA-HIP cohort was established to examine and validate international DXA criteria and osteoporosis diagnostic and prediction algorithms for Irish adults [32]. In order to understand the importance of BMD inclusion when calculating FRAX probabilities for our population, we compare the agreement between Web-based and DXA-based FRAX derivatives for Ireland.

Methods

Details of the entire DXA-HIP cohort have been described [25, 32]. In brief, a convenience cohort was established for clinical research using DXA data from 3 centers which include 4 GE-Lunar Prodigy DXA machines, using G.E. Encore software version 17. Femoral neck T-scores for men and women are generated using NHANES III ISCD-recommended calculations [35]. All scans are performed and reported by staff trained to ISCD standards and recommendations. The staff have regular weekly meetings to discuss discrepancies and complex cases, and to audit procedures, performance, and reports. The collection and analysis of the data for the DXA-HIP project were approved by our Institution’s Ethics Committee and comply with GDPR regulations [25, 32, 36]. In this study, due to the inherent thresholds in the FRAX tool [26], we include Caucasian subjects aged between 40 and 90 years with weight less than 125 kg.

Preliminary data for this study were supplied from 1 clinical site between June and December 2021. Data were collected and compiled when auditing our DXA-FRAX estimates at the time of scanning and reporting (G.E. Lunar FRAX estimates for Ireland, version 3.8). Contemporaneously, we derived Web-FRAX estimates for these same men and women from the FRAX website for Ireland (https://www.sheffield.ac.uk/FRAX/, version 4, country code = 48), with and without femoral neck BMD values in g/cm² for GE Lunar. All data were subsequently rechecked on 2 further occasions by 2 of the investigators, and FRAX estimates were recalculated, to ensure the information being used was accurate, consistent, and complete. The data were merged, anonymized, and stored for analysis.

We chose to compare FRAX estimates between DXA-FRAX and Web-FRAX with and without the inclusion of femoral neck BMD for both men and women. In order to highlight the extent and magnitude of the difference between various FRAX estimations, we show the proportion of men and women whose absolute difference between DXA-FRAX with BMD and the corresponding Web-FRAX without BMD was < 3%, between 3 and 6%, or > 6% for HF, and < 10%, between 10 and 20%, or > 20% for MOF. We used box and whisker plots and Bland–Altman plots to assess the overall and within-person differences between Web-FRAX with and without BMD and DXA-FRAX and the patterns of bias. We used paired t-tests, Chi-squared tests, Fisher’s exact tests, and Wilcoxon rank-sum tests to compare means and medians as appropriate. All analyses were planned ad hoc. All analyses were performed using Python 3.6. We performed sensitivity analyses by excluding those whose prior fracture site was unknown, and those with multiple prior fragility fractures.
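As an illustration of the Bland–Altman approach described above, the following is a minimal sketch rather than the study's actual analysis code. It assumes the conventional definition of the limits of agreement (mean difference ± 1.96 standard deviations of the paired differences); the function name `bland_altman`, the variable names, and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Summarize agreement between two paired sets of FRAX estimates (in %).

    Assumption: limits of agreement are bias +/- z * SD of the paired
    differences, the conventional Bland-Altman definition.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired estimates must have equal length")

    # Within-person difference and within-person average for each subject
    diffs = [a - b for a, b in zip(method_a, method_b)]
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]

    bias = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation
    lower = bias - z * spread
    upper = bias + z * spread

    # Indices of subjects whose difference lies beyond the limits of agreement
    outside = [i for i, d in enumerate(diffs) if d < lower or d > upper]

    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "points": list(zip(averages, diffs)),  # (x, y) pairs for plotting
        "outside": outside,
    }


# Made-up illustrative values, not study data
dxa_frax = [2.0, 5.0, 10.0, 20.0]
web_frax_no_bmd = [3.0, 7.0, 13.0, 26.0]
result = bland_altman(dxa_frax, web_frax_no_bmd)
```

The `points` entry holds the (average, difference) pairs that would be drawn on the plot, and `outside` flags individuals beyond the limits of agreement, the kind of subgroup whose characteristics can then be summarized separately.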

Results

A total of 2090 records were collected during an audit of vertebral fracture assessment (VFA) scans between 2019 and 2021 including patients’ demographic information such as age, gender, weight, height, and BMI; risk factors such as previous fracture and femoral neck BMD; and results of DXA-FRAX estimations. Subjects with missing or incomplete information were excluded. Complete data on 1254 adults aged between 40 and 90 years were available for this study, including 290 (23.1%) men and 964 (76.9%) women. A summary of patient details including the variables used in FRAX calculations is shown in Table 1, broken down by gender. Women were significantly lighter and shorter than men, had lower BMD, and were less likely to take corticosteroids or drink excessively, but more likely to have a parent who had a hip fracture. Almost half of the men and women had a previous MOF, while almost 36% had another disorder strongly associated with osteoporosis such as early menopause, diabetes mellitus, or coeliac disease.

Table 1 Characteristics of study subjects

Women had higher FRAX scores than men using all 3 calculation methods, for both HF and MOF, as shown in Figs. 1 and 2. The majority of men and women had DXA-FRAX HF scores below 5% and MOF scores below 20%. A small number of female patients had very high scores (> 50%) for both HF and MOF. Overall, Web-FRAX scores without BMD were significantly higher (P < 0.001) than Web-FRAX scores with BMD or DXA-FRAX scores for both men and women, and for both HF and MOF (Table 2), though for individuals they were sometimes lower (Figs. 3 and 4). Differences between DXA-FRAX and Web-FRAX with BMD were very small and not statistically significant (P values HF: 0.914, MOF: 0.967), as shown in Appendix Figs. 5 and 6, whereas the differences between DXA-FRAX and Web-FRAX without BMD, as well as the differences between Web-FRAX with BMD and Web-FRAX without BMD, were sometimes large and were statistically significant, P < 0.001 (Figs. 3 and 4). The prevalence of hip fracture was 3.4% in women and 6.6% in men, and the prevalence of major osteoporotic fracture was 47.5% in women and 48.6% in men.

Fig. 1

Box and whisker plots comparing DXA-FRAX, Web-FRAX with BMD, and Web-FRAX without BMD for MOF by gender. Note: the significance is reported for the following levels: ns: not significant; *P < 0.05, **P < 0.01, and ***P < 0.001

Fig. 2

Box and whisker plots comparing DXA-FRAX, Web-FRAX with BMD, and Web-FRAX without BMD for HF by gender. Note: the significance is reported for the following levels: ns: not significant; *P < 0.05, **P < 0.01, and ***P < 0.001

Table 2 Comparison of DXA-FRAX and Web-FRAX scores with and without BMD for MOF and HF
Fig. 3

Bland–Altman plots comparing DXA-FRAX to Web-FRAX without BMD for HF by gender

Fig. 4

Bland–Altman plots comparing DXA-FRAX to Web-FRAX without BMD for MOF by gender

In contrast, we found substantial differences within individuals when we compared the absolute difference in HF and MOF between their DXA-FRAX and their Web-FRAX scores without BMD, particularly among those with higher scores, as shown in the Bland–Altman plots in Figs. 3 and 4. However, for those with low scores (HF < 5% and MOF < 10%), the absolute differences were generally small. A similar pattern was noted when we compared Web-FRAX scores with BMD to Web-FRAX scores without BMD (data not shown). The differences observed between DXA-FRAX and the corresponding Web-FRAX with BMD estimates for HF and MOF were generally very small, with limits of agreement of < 1% and maximum differences of < 2.4%, as shown in Appendix Figs. 5 and 6. Table 3 presents the proportion of individuals with various differences between DXA-FRAX and Web-FRAX scores without BMD: < 3%, 3 to 6%, and > 6% for HF, and < 10%, 10–20%, and > 20% for MOF. These show that a greater proportion of women than men had absolute differences > 10% for MOF. However, 43% of the patients had an absolute difference in HF of > 3%. Moreover, 28% of females and 14% of males had an absolute difference in HF estimates of > 6%. The range of differences was 0–39.2% for MOF and 0–45.3% for HF in women, and 0–17.0% for MOF and 0–14.5% for HF in men.
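The banding of absolute differences reported in Table 3 could be computed along the following lines. This is a hedged sketch only: the function names, the labels "small", "moderate", and "large", and the example values are hypothetical, and the handling of values falling exactly on a cut-point (assigned here to the middle band) is an assumption, since the boundaries are not specified in the text.

```python
from collections import Counter

# Cut-points in percentage points, taken from the categories described in the text
HF_CUTS = (3.0, 6.0)
MOF_CUTS = (10.0, 20.0)

BANDS = ("small", "moderate", "large")


def difference_band(dxa_frax, web_frax_no_bmd, cuts):
    """Label the absolute difference between two FRAX estimates (in %).

    Assumption: values exactly equal to a cut-point fall in the middle band.
    """
    low, high = cuts
    diff = abs(dxa_frax - web_frax_no_bmd)
    if diff < low:
        return "small"
    if diff <= high:
        return "moderate"
    return "large"


def band_proportions(pairs, cuts):
    """Return the proportion of (DXA-FRAX, Web-FRAX) pairs in each band."""
    if not pairs:
        raise ValueError("at least one pair of estimates is required")
    counts = Counter(difference_band(d, w, cuts) for d, w in pairs)
    return {band: counts.get(band, 0) / len(pairs) for band in BANDS}


# Made-up illustrative HF estimates (DXA-FRAX, Web-FRAX without BMD), not study data
hf_pairs = [(1.2, 2.0), (4.0, 8.5), (3.0, 12.0), (0.5, 0.9)]
proportions = band_proportions(hf_pairs, HF_CUTS)
```

Running the same function separately for men and women, and for HF and MOF with the corresponding cut-points, would give one set of proportions per group.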

Table 3 Proportion of individuals with various differences in HF and MOF between DXA-FRAX and Web-FRAX without BMD scores

Table 4 summarizes the characteristics of people whose DXA-FRAX and corresponding Web-FRAX without BMD scores differed by a small, moderate, or large amount. Fractures, secondary osteoporosis, and rheumatoid arthritis were more prevalent among those with larger differences, who were also older and lighter and had lower BMI. In Tables 5 and 6, we summarize the characteristics of those women and men, respectively, whose differences between their DXA-FRAX and Web-FRAX scores were greater than or less than the limits of agreement derived from our Bland–Altman results (Figs. 3 and 4). Women with more extreme differences were older and lighter and had lower BMI and BMD; a greater prevalence of fractures, rheumatoid arthritis, and glucocorticoid use; and a much higher or lower prevalence of secondary osteoporosis, tobacco use, or a parent with a previous hip fracture. Men with more extreme differences were similarly lighter and had a lower BMI; a greater prevalence of fractures and excessive alcohol use; a lower prevalence of parents with a hip fracture; and a higher or lower age, BMD, and prevalence of smoking, glucocorticoid use, rheumatoid arthritis, and secondary osteoporosis.

Table 4 Characteristics of individuals with various differences in HF and MOF between DXA-FRAX and Web-FRAX without BMD scores
Table 5 Comparison of extreme differences between DXA-FRAX and Web-FRAX for women
Table 6 Comparison of extreme differences between DXA-FRAX and Web-FRAX for men

Discussion

In this paper comparing different FRAX calculations in older Irish men and women, we found excellent agreement between the Web version and the DXA version when femoral neck BMD was included. However, when we compared estimations without BMD to estimations that included BMD, there were notable differences for some individuals or extreme cases which at times were quite large, up to 40% absolute difference for major osteoporotic fracture and 46% absolute difference for hip fracture, shown in Figs. 3 and 4. Such differences are more likely to be observed at extremes of weight, BMI, BMD, or prevalence of rheumatoid arthritis or secondary causes of osteoporosis, as well as where fractures or glucocorticoid use is present.

FRAX is a clinical tool designed to improve the estimation of fracture risk by combining some of the most important determinants in a multivariate algorithm, which should be more robust than using any single factor [29, 37,38,39,40]. Additionally, the use of absolute risk rather than relative risk or a single BMD threshold is an important advance [26, 29]. The original algorithm was derived from 9 cohorts including 46,340 men (32%) and women (68%) and validated across 11 cohorts including 230,486 men (< 1%) and women (> 99%) from 23 countries across the globe [37]. BMD was available for 37,305 (80.5%) of the development group, but only 28,660 (12.4%) of the validation group [37]. This algorithm is now available in a number of formats, including a web-based calculator and a DXA-based calculator [39, 41]. The current web-based tool (31st October 2022) includes 87 populations: 18 Asian populations, 36 European populations, 19 Middle East and African populations, 5 North American populations, 7 Latin American populations, and 2 Oceania populations, while the DXA-based tool has 58 available populations to select from. Previous attempts to calibrate FRAX for Ireland used national hip fracture estimates but no patient-level data, and the authors note the inclusion of BMD could be problematic [4]. Today, FRAX is widely used in the assessment of individuals, despite the lack of validation within a large representative population [3]. Since the FRAX tool has been incorporated in over 80 guidelines worldwide [39, 42], there is a need for assurances of accuracy and consistency in outputs.

Several authors compare the performance of the tool using different calculation methods, with and without BMD, and to other risk algorithms, showing variability within and between populations [33, 37, 41, 43,44,45,46,47,48,49,50,51,52]. Some suggest FRAX performs similarly with and without BMD [43, 46, 49, 50], and using different calculation methods [41], while others suggest FRAX without BMD is not sensitive enough to identify those in need of treatment [33, 46,47,48, 51, 52]. A Japanese study comparing 4 different FRAX calculation methods for several thousand men and women shows that they provide similar estimates [41], while a group of Danish authors suggests the addition of BMD may be of limited benefit [43]. In our study, the inclusion of BMD reduces the mean FRAX Ireland estimates for both 10-year risk of HF and MOF, for both men and women. More importantly, for some individuals, there were very large differences when BMD was included in their calculation. Such differences could have a significant influence on patient and clinician decisions on whether, and how, to intervene, and on the downstream clinical consequences for the patient in terms of benefit, risk, and cost.

Prior studies compare FRAX estimates with and without BMD using ROC (receiver operating characteristic) curves and AUC (area under the curve) analyses, or a comparison of means [37, 44,45,46, 49,50,51]. ROC analysis is commonly used to assess the accuracy of diagnostic testing, but has important limitations, particularly when examining risk [53, 54]. Common errors in the medical literature include interpreting comparisons between two effects without directly comparing them, and over-interpreting non-significant results [55]. Inevitably, there will be differences between measures when different methods are applied; hence, the key issue is really the size of these differences [56]. In our study, the AUC values obtained with and without BMD are similar in pattern to prior publications whereby the inclusion of BMD improved the value. Unlike other studies, the AUC for MOF was greater than the AUC for hip fractures, likely due to the overfitting of the model with a very high fracture prevalence, particularly non-hip MOF in our sample. A sensitivity analysis in which we excluded those with multiple fractures or missing fracture sites provided a marginal improvement. However, the key aim of the study was to examine the within-person difference in FRAX estimates for different calculation methods. A more formal analysis of the differences between estimations within individuals displays a far more accurate picture of the size of the problem, and where those problems tend to arise. We also show when such differences are more likely to be seen. It would appear from our data that FRAX estimates calculated without BMD should be interpreted cautiously for individual patients, especially older patients or those deemed higher risk.
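For readers less familiar with the AUC, the sketch below shows one standard way to compute it for a cross-sectional comparison of this kind: the probability that a randomly chosen person with a prevalent fracture has a higher FRAX estimate than a randomly chosen person without one, counting ties as one half (the Mann–Whitney formulation). It is not necessarily how the AUC values in this study were obtained; the function name `auc` and the example values are hypothetical and are not study data.

```python
def auc(scores, has_fracture):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: FRAX estimates (in %), one per person
    has_fracture: matching booleans, True if a prevalent fracture is present
    """
    cases = [s for s, y in zip(scores, has_fracture) if y]
    controls = [s for s, y in zip(scores, has_fracture) if not y]
    if not cases or not controls:
        raise ValueError("need at least one person with and one without fracture")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0   # case ranked above control
            elif case_score == control_score:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Made-up illustrative values, not study data
frax_scores = [5.0, 12.0, 8.0, 20.0, 3.0, 8.0]
fracture = [False, True, False, True, False, True]
example_auc = auc(frax_scores, fracture)
```

Because the AUC depends only on how individuals are ranked, two calculation methods can yield similar AUC values while still giving very different absolute estimates for the same person, which is why the within-person differences are examined separately.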

Our study has important limitations. Firstly, these data represent a small sample of a larger dataset, but this analysis is an important first step in a multi-step process to examine and understand the validity of FRAX and other tools for our population with and without BMD. Secondly, the data are cross-sectional in nature, so while we can use the tool to estimate risk, and discriminate between those with and without prevalent fracture, we cannot calibrate the results. These results are important however as such assessments with and without BMD are in widespread use in clinical practice today in Ireland. Thirdly, all subjects were referred for a DXA scan for a reason and almost 50% have a prevalent fracture, so these results may not apply to a more general population, or those without prior fractures. Current studies are assessing the performance among those with and without risk factors, and with and without prior fractures in the larger dataset, and longitudinal analyses to calibrate this and other risk algorithms in a larger cohort. Our larger dataset is incomplete and has some missing data, but this small subset represents a sample that has been triple-checked for the accuracy and completeness of the data for all study subjects enabling a more robust comparison. Finally, there are many different versions of the FRAX tool in use today, and our results may not apply to other populations where the importance of BMD has been clarified or remains unknown.

Conclusions

Significant differences exist in the results of DXA-FRAX and Web-FRAX for Ireland, particularly for men and those with higher risk estimates, so these results should be interpreted cautiously. Reassuringly, results were similar for those deemed at lower risk and for women. These results support the need for a more formal longitudinal analysis to calibrate FRAX and other risk tools for our population, with and without BMD.