Introduction

Osteoporotic fractures impose a substantial and globally increasing financial burden, especially on countries with rapidly aging populations [1]. By 2050, people over 60 years old are expected to account for 40% of the population in developed countries and 35% of the population in Europe [2]. This fracture-related burden in Europe alone is estimated to increase by 23% (3.3 million fragility fractures) and cost €47.4 billion in 2030 [3].

Currently, areal bone mineral density (BMD) derived from dual-energy X-ray absorptiometry (DXA) remains the primary diagnostic approach for osteoporosis, whereby a T-score value below -2.5 is indicative of osteoporosis [4]. Specifically, according to the Bone Health and Osteoporosis Foundation (BHOF), areal BMD testing is recommended for men and women 50 years and older, presenting with any pre-determined clinical risk factors (i.e., parental history of hip fracture, prior osteoporotic fracture, oral glucocorticoid intake, or excessive alcohol consumption) [5]. However, areal BMD has been reported to have poor fracture risk predictive ability as approximately 50% of fractures occur in individuals that formally do not have osteoporosis, at least not at the time the DXA scan is acquired [6]. In addition, DXA imaging, while considered a relatively cost-effective option in comparison to more advanced imaging techniques like computed tomography (CT) or magnetic resonance imaging (MRI), still faces limitations due to factors such as affordability, lack of accessibility, and low screening rates [7]. Such challenges hinder the feasibility of implementing blanket population-based areal BMD testing as a comprehensive approach. Consequently, DXA imaging may not be a sustainable or economical screening strategy for identifying older people who are at a higher risk of osteoporotic fractures [8].
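The T-score criterion mentioned above can be illustrated with a minimal sketch. It assumes the conventional definition of the T-score as the number of standard deviations between a measured areal BMD and a young-adult reference mean, and the commonly used WHO cut-offs (osteoporosis at a T-score of -2.5 or lower). The function names and the reference values are hypothetical and serve only to show the arithmetic.

```python
def t_score(bmd, young_adult_mean, young_adult_sd):
    """Standard deviations between a measured areal BMD and the young-adult mean."""
    return (bmd - young_adult_mean) / young_adult_sd


def classify_t_score(t):
    """Assumed WHO-style categories: osteoporosis at T <= -2.5, low bone mass below -1.0."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "low bone mass (osteopenia)"
    return "normal"


# Hypothetical reference values in g/cm^2, not taken from any reference database.
t = t_score(bmd=0.58, young_adult_mean=0.90, young_adult_sd=0.12)
print(round(t, 2), classify_t_score(t))
```

A subject just above the cut-off is labelled as not having osteoporosis, which is one way to see why a substantial share of fractures occurs outside the formally osteoporotic group.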

Not only is there a need to improve the prediction of fracture risk, but there is also a need to cast a wider net for fracture risk screening onto populations that typically do not fall under the targeted screening programs. Clinical routine CT examinations obtained for clinical indications other than dedicated assessment of osteoporosis (e.g., lung, cardiovascular, malignancy-related indications) can be used for opportunistic screening for osteoporosis and fractures [9,10,11]. While especially non-contrast-enhanced CT examinations may be used for the purpose of opportunistic assessments, methods are currently being developed that may deliver accurate measures of areal and/or volumetric BMD even in cases of contrast-enhanced scans or inhomogeneous scanning protocols between CT scanners or sites [12,13,14,15]. Given that such imaging data used for this kind of screening are initially procured for non-osteoporosis-related assessments, the implementation comes at low cost without the need for additional imaging exams or radiation exposure to the patient [16].

Current studies have focused on evaluating the performance of opportunistic CT screening for osteoporosis assessment [16,17,18,19,20]. Some of these studies suggest that CT-based biomarkers, such as image attenuation, volumetric BMD, and finite element (FE)-derived biomarkers, are reasonably accurate and could be at least as accurate as DXA-based areal BMD in identifying osteoporosis [16, 20]. Further, FE-derived parameters such as patient-specific bone strength are an attractive alternative to conventional density measurements as they have the capability to integrate the three-dimensional (3D) mechanical behaviour of the bone and provide improved fracture risk predictions [21]. The ultimate objective of osteoporosis management is to prevent the occurrence of osteoporotic fractures. Implementing an opportunistic CT screening strategy for high fracture risk in primary care can be effective in reducing osteoporotic fractures [22].

To date, there is no review that has evaluated the discriminative ability of biomarkers from opportunistic CT screening for predicting osteoporotic fractures, mainly hip, vertebral, and major osteoporotic fractures. Therefore, the aim of this review was to qualitatively synthesize current literature on opportunistic CT screening in fracture discrimination as well as to quantify and compare the discriminative ability of currently established CT-based biomarkers for fracture screening.

Methods

Search strategy

A systematic literature search was conducted in four electronic databases: MEDLINE, EMBASE, Cochrane, and Web of Science (from the earliest available indexing date to July 1, 2023). The search strategy was originally designed for MEDLINE, accessed through the PubMed interface, and was then adapted to the other databases. The search term consisted of the following phrases: “computed tomography”, “routine”, “opportunistic”, and “fracture” (see Online Resource 1 for the full search strategy). To maximize sensitivity, the phrases in the search term covered free-text terms occurring in all fields. This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [23]. The references were imported into Rayyan [24], a free online application, which was used to eliminate duplicate studies and select studies based on the eligibility criteria.

Study selection

Studies were included if they fulfilled the following criteria: (1) the study used clinical routine CT scans from hospitals or other healthcare institutions, (2) biomarkers derived from CT scans (e.g., areal BMD, volumetric BMD, FE-derived parameters, texture features, bone or muscle attenuation, and any other unique biomarkers) were used for fracture discrimination, (3) any fracture including hip, vertebral, or major osteoporotic fractures were the outcome, and (4) the publication was written in English and conducted in human populations. Studies were excluded if (1) young populations below 40 years of age (on average) were evaluated, (2) the imaging modality used was high-resolution CT acquired in non-clinical domains (i.e., research purposes), (3) the biomarkers derived from CT scans were used for diagnosis of osteoporosis only, and (4) the studies were not original articles (i.e., case reports, reviews, or conference abstracts). Additionally, references from articles and reviews were also screened, and relevant articles identified from the manual search were included. Based on the pre-defined inclusion and exclusion criteria, two reviewers (ADP and NS) were involved in the independent screening of titles and abstracts as a first step to identify eligible studies. The two reviewers (ADP and NS) then independently screened full texts of studies for final inclusion. Disagreements between the two reviewers were resolved by discussion.

Data extraction

The data were extracted from studies by the two reviewers (ADP and NS). Data extracted included study design, follow-up period (if any), sample size, subject demographics such as age, sex, and prior fracture status, study location, fracture definition, fracture incidence, CT imaging indication, mean interval from CT imaging to fracture, type of CT imaging, CT scanner characteristics, type of CT calibration, region of interest (ROI) characterization, as well as all reported performance measures (e.g., sensitivity, specificity, hazard ratio [HR], odds ratio [OR], and the area under the curve [AUC]). The AUC from receiver operating characteristics (ROC) was the primary outcome of interest in this review. While sensitivity and specificity could have been considered to gain a clearer understanding of the discriminative ability of the CT-based biomarkers, the majority of studies in our review primarily reported AUC values, making it a more feasible option. Authors of studies that only reported the AUC without the 95% confidence intervals (CIs) were contacted to provide the requested data (n = 13). A formula, as detailed by Hanley and McNeil [25], was used to approximate the 95% CI values for the studies that we did not receive a response from (n = 8).
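The approximation of a 95% CI from a reported AUC can be sketched as follows. This is a minimal illustration that assumes the widely used standard error expression from Hanley and McNeil combined with a normal approximation; the text cites the reference without stating which exact form was applied, so the function names, the truncation to the interval [0, 1], and the example group sizes are assumptions.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC given the numbers of fracture and non-fracture subjects."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation CI, truncated to [0, 1] (an assumption of this sketch)."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical study: AUC of 0.75 with 50 fracture cases and 100 controls.
lower, upper = auc_confidence_interval(0.75, n_pos=50, n_neg=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Only the AUC and the two group sizes are required, which is why this kind of approximation is usable when a study reports nothing else.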

Quality assessment

The two reviewers (ADP and NS) independently screened the quality of the included studies using a checklist derived from the Quality Assessment of Diagnostic Accuracy Study 2 (QUADAS-2) tool [26]. The QUADAS-2 tool was modified to suit the purpose of this specific review (Table SI1; Online Resource 2). Disagreements between the two reviewers were resolved by discussion. We considered that a potential selection bias existed if fracture status was self-reported and if less than 90% of eligible patients were included in the analyses.

Statistical analysis

The studies included in the meta-analysis were stratified into four subgroups of imaging biomarkers: (1) areal BMD obtained from DXA or CT, (2) CT-based image attenuation (in Hounsfield Units), (3) CT-based volumetric BMD, and (4) FE-derived parameters. In the first subgroup, DXA-based areal BMD values were combined with CT-based areal BMD values due to well-established correlations in the literature [27,28,29]. The inclusion of DXA-based areal BMD values serves as a comprehensive baseline/reference standard for the meta-analysis. The included studies evaluated hip fractures, vertebral fractures, and/or major osteoporotic fractures, which were defined as all hip, vertebral, wrist, and humerus fractures. The overall estimates for each biomarker, i.e., areal BMD, volumetric BMD, attenuation, and FE-derived parameters, as well as the estimates for the subgroups of hip, major osteoporotic, and vertebral fractures for each biomarker, were calculated and reported. Since some studies reported more than one AUC value (i.e., for different anatomical locations), a three-level random effects model was fitted to the data to account for the dependence between reported values using the metafor package of the R programming environment (version 3.4.5 for Windows; The R Project for Statistical Computing: https://www.r-project.org/) [30]. The individual, subgroup, and overall AUC estimate values and the corresponding 95% CIs were illustrated in forest plots. Heterogeneity was assessed using the Q test as well as the multi-level I² statistic, which allows for separate estimation of within-study (I² Level 2) and between-study (I² Level 3) variance components [31]. Since individual effect sizes within a cluster/study are based on the same sample, their sampling errors can be assumed to be correlated [32]. By assuming a correlation factor of ρ = 0.60, cluster-robust variance estimation was used to account for the within-study dependence [32]. The test for moderators was used to test the statistical difference between the imaging biomarkers, using the areal BMD subgroup as reference. A p-value < 0.05 was considered statistically significant.
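To make the roles of the Q test and the I² statistic concrete, the sketch below pools a set of AUC values with a random effects model. It is deliberately simplified: it uses a conventional two-level DerSimonian-Laird estimator that treats every AUC as independent, not the three-level model with cluster-robust variance estimation described above, and it is written in Python rather than with the metafor package. All function names and input values are hypothetical.

```python
import math


def pool_random_effects(estimates, standard_errors):
    """Two-level DerSimonian-Laird pooling with Cochran's Q and a single I^2 value."""
    variances = [se ** 2 for se in standard_errors]
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect mean, needed to compute Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # Between-estimate variance (tau^2), truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))

    # Share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical AUC values and standard errors from four studies.
result = pool_random_effects([0.70, 0.78, 0.66, 0.81], [0.03, 0.04, 0.05, 0.03])
print(result)
```

In the three-level model, the heterogeneity captured here by a single I² is split into a within-study and a between-study component, and estimates from the same study are allowed to be correlated.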

Results

Study selection

Overall, out of the 2805 studies that passed the initial search strategy, 61 studies that fulfilled the eligibility criteria after screening titles and abstracts were included in the qualitative synthesis (Fig. 1). Eleven studies were excluded based on full-text screening. Reasons for exclusions after reviewing the full-text articles were (1) fracture not being the reference standard, (2) article being a conference abstract, (3) discrimination between individual vertebrae instead of patients, and (4) irrelevance (Fig. 1). Eleven studies were included manually, with the majority focusing on the evaluation of FE-derived parameters. These studies frequently employed technical keywords that may not be directly associated with clinical fracture evaluation, making their discovery with the search strategy challenging [33,34,35,36,37,38,39,40,41]. Hence, manual searches were conducted to ensure their inclusion in our review. Of the included studies, 35 reported AUC values and were incorporated in the meta-analysis. Articles that provided AUC values for other unique and less reported parameters or that provided other summary measures for the parameters analysed in this review were qualitatively synthesized with the remaining articles (no meta-analysis).

Fig. 1 Flow diagram of the review process in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines

Study characteristics

The main characteristics of all included studies are summarized in Table 1 for hip fractures, Table 2 for vertebral fractures, and Table 3 for major osteoporotic fractures. Amongst the 61 studies, most studies (n = 38) focused on the prediction of vertebral fractures (Table 2). Of those, only two studies evaluated a non-vertebral ROI in the prediction of vertebral fractures [42, 43] (Table 2). Within the studies (n = 14) that investigated hip fractures, five studies evaluated the spine region for hip fracture prediction [44,45,46,47,48] (Table 1). In the prediction of major osteoporotic fractures (n = 13 studies), two studies calculated the biomarkers at the distal radius and ulna regions [49, 50], and two studies at the femur [51, 52], while the remaining studies evaluated them at the spine [45, 46, 48, 52,53,54,55,56,57,58,59] (Table 3).

Table 1 Main characteristics of included studies that assessed hip fractures (n = 14)
Table 2 Main characteristics of included studies that assessed vertebral fractures (n = 38)
Table 3 Main characteristics of included studies that assessed major osteoporotic fractures (n = 13)

Many studies used a case–control design (n = 10 studies for hip fracture prediction, n = 11 studies for vertebral fracture prediction, and n = 2 studies for major osteoporotic fracture prediction; Tables 1, 2 and 3). The type of subjects that were evaluated also varied widely between studies, from a simple inclusion criterion based on age and the availability of CT examinations to the inclusion of subjects with diverse clinical indications such as rheumatoid arthritis, lymphoma, multiple myeloma, or oncological conditions (Tables 1, 2 and 3). In addition, amongst the 61 studies, 10 studies focused exclusively on females [33, 37, 38, 40, 54, 57, 60,61,62,63] (Tables 1, 2 and 3). The mean age of the subjects ranged from 40 to 80 years, while the follow-up interval ranged from 0 to 10 years (Tables 1, 2 and 3).

With regard to the four subgroups of imaging biomarkers, areal BMD obtained from DXA or CT was typically reported in the form of mg/cm² or derived T-scores, except for one study that additionally reported the areal BMD weighted with the probability of falling [38]. Furthermore, CT-based image attenuation (in Hounsfield Units) was measured at the thoracic or lumbar spine except for one study that measured it at the proximal femur [51], one study at the distal ulna [50], and one study at the distal radius [49] (Tables 1, 2 and 3). The CT-based volumetric BMD was also measured at the thoracic or lumbar spine except for five studies that measured it at the femur [38, 39, 42, 43, 52] (Tables 1, 2 and 3). The FE-derived parameters consisted mainly of bone strength at the hip or spine, but some studies reported the absolute risk of current fracture [33], fragility ratio [38], maximum volumetric strain [38], load bearing capacity [60], axial rigidity [60], bending rigidity [60], and displacement [64] (Tables 1, 2 and 3).

Assessment of methodological quality

The detailed domain-wise and item-wise quality assessment of all studies included in this review is shown in Online Resource 2. Overall, 19 out of 61 studies for patient selection, 14 for index test, 50 for reference standard, and 28 for flow and timing were considered as having low risk of bias (Table SI2). Furthermore, 42 and 56 studies for patient selection and reference standard, respectively, and all 61 studies for index test had low concern for applicability. Only two studies complied with all the QUADAS-2 items (Table SI2) [56, 57]. About 50% of studies did not report whether the enrolment of patients was random (Fig. SI1). Moreover, 26 out of 61 studies (~43%) used a case–control design and were considered as having high risk of bias (Fig. SI1). Most studies (n = 39) did not report definitively whether investigators extracting the CT-based imaging biomarkers were blinded to fracture status during the study (Fig. SI1). Twelve studies used pre-defined diagnostic thresholds to assess the discriminative ability [13, 34, 35, 41, 50, 59, 62, 65,66,67,68,69]. Except for a study that used machine learning to identify fractures without radiologist review [70] and a study that accepted patient-reported fracture status [55], fractures in most studies (n = 50) were verified using electronic health records or clinically accepted medical image detection by radiologists (Fig. SI1), and those were assessed to have low risk of bias. The remaining nine studies did not detail their fracture ascertainment and were considered as having unclear risk of bias [15, 33, 37, 40, 58, 71,72,73,74]. Eleven studies included less than 90% of eligible participants in the analysis (Fig. SI1) [13, 40, 43, 45, 58, 66, 67, 70, 72, 75, 76].

Meta-analysis

Ten studies reported 19 AUC values for areal BMD in the form of mg/cm² or derived T-scores obtained from either DXA or CT imaging. The overall pooled AUC of areal BMD for the prediction of any fracture was 0.73 (95% CI 0.71–0.75, Q = 19.27, p = 0.38), with I² Level 2 = 0% and I² Level 3 = 4.7% of the total variation attributed to the within-study and between-study heterogeneity, respectively (Fig. 2). Subgroup analysis revealed that the pooled AUC values for areal BMD regarding hip and vertebral fractures were 0.73 (95% CI 0.70–0.76, Q = 8.47, p = 0.75) and 0.71 (95% CI 0.63–0.78, Q = 10.06, p = 0.07; Fig. 2), respectively, where the test for heterogeneity did not reach statistical significance. Subgroup analysis could not be performed for major osteoporotic fractures as no study reported areal BMD values.

Fig. 2 Forest plot for the area under the curve (AUC) and 95% confidence intervals (CIs) of areal bone mineral density (BMD) or T-scores in discriminating subjects with and without hip or vertebral fractures

With regard to attenuation, 21 AUC values at various ROIs were reported in ten studies. The overall pooled AUC of attenuation for the prediction of any fracture was 0.73 (95% CI 0.66–0.79, Q = 607.23, p < 0.01; I² Level 2 = 9.07% and I² Level 3 = 89.67%), which was similar to that of areal BMD (Fig. 3). The pooled AUC values for attenuation for hip, vertebral, and major osteoporotic fractures were 0.66 (95% CI 0.52–0.79, Q = 9.09, p = 0.03), 0.77 (95% CI 0.66–0.87, Q = 560.80, p < 0.01), and 0.67 (95% CI 0.63–0.70, Q = 26.21, p < 0.01; Fig. 3), respectively, where the discriminative ability for hip and major osteoporotic fractures was the poorest.

Fig. 3 Forest plot for the area under the curve (AUC) and 95% confidence intervals (CIs) of attenuation in discriminating subjects with and without hip, major osteoporotic, or vertebral fractures

Eighteen studies reported 37 AUC values for volumetric BMD in total, with the majority of values reported for vertebral fractures. The overall pooled AUC value of volumetric BMD was 0.76 (95% CI 0.71–0.81, Q = 393.74, p < 0.01; I² Level 2 = 6.76% and I² Level 3 = 87.6%; Fig. 4). There were only two studies that reported AUC values for volumetric BMD regarding hip fractures, which did not allow for the heterogeneity to reach statistical significance (AUC = 0.75, 95% CI 0.50–1.00, Q = 2.01, p = 0.16; Fig. 4). The pooled AUC for volumetric BMD for major osteoporotic fractures was 0.74 (95% CI 0.61–0.87, Q = 84.37, p < 0.01), which was lower than that for vertebral fractures (AUC = 0.77, 95% CI 0.70–0.83, Q = 129.86, p < 0.01; Fig. 4).

Fig. 4 Forest plot for the area under the curve (AUC) and 95% confidence intervals (CIs) of volumetric bone mineral density (BMD) in discriminating subjects with and without hip, major osteoporotic, or vertebral fractures

Ten studies reported 26 AUC values for FE-derived parameters. The overall pooled AUC value of FE-derived parameters for any fracture was the highest at 0.77 (95% CI 0.72–0.81, Q = 69.30, p < 0.01; I² Level 2 = 9.16% and I² Level 3 = 63.85%; Fig. 5). Unlike for major osteoporotic fractures (AUC = 0.66, 95% CI 0.53–0.80, Q = 5.05, p = 0.08) and vertebral fractures (AUC = 0.80, 95% CI 0.73–0.87, Q = 8.33, p = 0.21), the test for heterogeneity reached statistical significance only for hip fractures (AUC = 0.78, 95% CI 0.72–0.83, Q = 40.70, p < 0.01; Fig. 5).

Fig. 5 Forest plot for the area under the curve (AUC) and 95% confidence intervals (CIs) of finite element (FE)-derived parameters in discriminating subjects with and without hip, major osteoporotic, or vertebral fractures

The overall pooled AUC estimates, as well as those for the subgroups of different fracture types for each biomarker, are tabulated in Table 4. The overall pooled AUCs for volumetric BMD (p < 0.01) and FE-derived parameters (p < 0.05), but not for attenuation, were significantly different from that of areal BMD (Table 4).

Table 4 Subgroup and overall AUC estimates (95% confidence intervals) obtained from meta-analysis

Discussion

This review aimed to comprehensively synthesize existing literature that utilized clinical routine CT imaging for opportunistic fracture discrimination using CT-based biomarkers. Image-based biomarkers from CT have the potential to address current clinical limitations in assessing the risk for osteoporotic fractures, as they offer the opportunity to incidentally identify high-risk patients using imaging data that have been acquired for other clinical indications. This is the first review of its kind to evaluate the predictive capabilities of CT-based biomarkers in assessing fracture risk through opportunistic CT screening. While previous work has shown a significant difference in areal BMD values between older subjects with and without fractures [42] or in patients with lymphoma [78], the bulk of studies, as shown by this review, demonstrated a limited discriminative ability for areal BMD, with an AUC ranging between 0.63 and 0.79. This drives the need to explore alternative imaging biomarkers such as attenuation, volumetric BMD, and FE-derived parameters to evaluate fracture risk in vulnerable populations.

Previous studies have attempted to demonstrate the potential of attenuation in distinguishing individuals with fractures [48,49,50,51, 53, 57, 62, 67, 68, 70, 82,83,84]. However, much like areal BMD, attenuation has exhibited a large variation in discriminative ability across different studies (AUC range: 0.61–0.90) [48,49,50,51, 53, 57, 62, 67, 68, 70, 82,83,84]. For instance, L1 trabecular attenuation was able to differentiate subjects with and without hip fractures with considerable accuracy [62, 67], indicating a possible association between spinal weakening and the overall deterioration in bone strength. However, a large cohort study (n = 9223) of older adults followed up to 12 years found poor discriminative ability in the prediction of both hip and major osteoporotic fractures [48]. Despite the limitations of attenuation measurements, their application in specific anatomical contexts could still be advantageous, and attenuation measures can be easily obtained by simple measurements within standard radiological picture archiving and communication systems. One study evaluated distal radius attenuation for the identification of subjects with distal radius fractures and found it to be indicative of higher fracture risk in patients with wrist fractures (OR: 3.4 for females and 5.5 for males) [49]. Distal radius fractures often precede hip or vertebral fractures by up to several decades [49], which highlights the potential utility of attenuation in identifying at-risk individuals well before more debilitating fractures are sustained (in the absence of dedicated areal or volumetric BMD assessments). Furthermore, the use of CT scans of the wrist could offer an efficient way to identify individuals at risk of future fragility fractures, as one study reported a sensitivity of 86% and specificity of 55% in predicting major osteoporotic fractures using distal ulnar attenuation [50]. Such work highlights the potential role of attenuation measurements to identify at-risk individuals seamlessly, despite their inherent limitations.

Additionally, this review demonstrated a less accurate prediction of major osteoporotic fractures, which typically encompass multiple anatomical sites, compared to the prediction of individual fracture types such as hip and vertebral fractures. In contrast, there was an improved prediction of vertebral fractures by CT-based imaging biomarkers, which were mostly captured at the spine itself, thereby situating vertebral fractures closest to the anatomical site of investigation. This finding seems consistent with a prior study that found that lumbar spine density measurements were not a reliable parameter for non-vertebral fractures and were only sensitive to the prediction of vertebral fractures [90]. Consequently, these results raise questions on the relevance of using biomarkers calculated at specific anatomical sites to predict fractures occurring at unrelated sites. The diminished accuracy in predicting major osteoporotic fractures might be attributed to the inherent limitations of two-dimensional (2D) biomarkers. These biomarkers tend to overlook bone abnormalities at the site of interest and are ill-equipped to detect them at other anatomical locations. This issue is compounded by the fact that major osteoporotic fractures encompass a wide range of anatomical sites. While measurements are predominantly taken at the hip or spine to predict fractures across various locations, the utilization of 2D biomarkers that are inherently limited further exacerbates this issue. This ultimately leads to poorer predictions for various types of fractures utilizing measurements obtained from unrelated sites of interest.

It is well-established that 2D measurements such as areal BMD and attenuation cannot fully capture the underlying structural changes that are important for characterizing fractures [9,10,11]. For instance, areal BMD alone can only explain 60–70% of the variation in bone strength as measured ex vivo [91]. This could be attributable to the averaging effect that disguises local regions of voids and defects resulting from osteoporosis that could contribute to the overall deterioration of structural integrity [88]. A promising solution to this could be to look at focal regions of deterioration, which is usually achieved with high-resolution imaging. A recent study showed that, with clinical routine CT imaging, a surrogate for microarchitectural deterioration based on focal bone mineral content loss, which considers the extent of demineralization and bone void space, could be associated with focal osteoporotic defects that lead to high fracture risk (OR = 2.07) and may allow people at risk to be identified more accurately [88].

Another solution could be to incorporate 3D structural information to better understand the intricate interplay behind fractures. This meta-analysis suggests that volumetric BMD could be a more accurate biomarker than areal BMD and attenuation, particularly for vertebral fractures. Volumetric BMD seems to be even more sensitive to fracture prediction than FE-derived parameters. For example, a recent study demonstrated that volumetric BMD had better sensitivity (79% vs 64%) but lower specificity (47% vs 60%) than vertebral strength [35]. In fact, our meta-analysis revealed that volumetric BMD and FE-derived markers almost performed equally and, thus, volumetric BMD could be a good trade-off between the constraints of inherently limited 2D biomarkers and the sophistication of more advanced 3D structural measures. However, like attenuation, opportunistic extraction of volumetric BMD from clinical routine CT is sensitive to contrast agents that can impact the accuracy of measures [92]. Integrating texture features of trabecular bone using support vector machines, which has been shown to have very good discriminative ability (accuracy of 83%) [72], could further leverage the opportunistic use of CT data even beyond the use of standard measurements such as volumetric BMD [15]. Moreover, the extraction of geometric features such as the anterior–posterior ratio and middle-posterior ratio of vertebral bodies could help to detect vertebral fractures and could also be useful in the prediction of incidental fractures [79].

The lack of FE studies using opportunistic CT imaging renders its discriminative ability inconclusive. This review demonstrated that FE-derived parameters appear to perform reasonably well for hip fractures, but further research is needed to determine their usefulness in predicting other fracture types. Despite the lack of conclusive evidence, FE-derived parameters have managed to enter the clinical domain through a software that calculates bone strength, which has been validated in previous studies [34, 35, 41, 66]. Pre-defined thresholds were established for “fragile bone strength” (spine: females ≤ 4500 N, males ≤ 6500 N; hip: females ≤ 3000 N, males ≤ 3500 N) [41]. Studies following these thresholds have not necessarily found FE-derived strength to be superior, but instead found it to be only equivalent to areal BMD at best [34, 66]. Nevertheless, the high potential of FE-derived parameters was demonstrated in a study with 100% sensitivity and 70% specificity using load-bearing capacity normalized by body mass index in patients with spinal metastases using CT data obtained at the pre-fracture state [60]. Notably, this previous work excluded patients with a history of vertebral fractures so as to eliminate bias that would result in elevated risk of subsequent fractures [60]. This may indicate that FE-derived parameters could especially be useful in the prediction of secondary osteoporotic fractures, occurring due to underlying comorbidities. For instance, in individuals affected by cancer, metastases due to cancer can impact bone integrity, but current clinical metrics are only moderately sensitive in terms of identifying individuals at risk of fractures [60].
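The pre-defined thresholds for "fragile bone strength" quoted above can be expressed as a small lookup. This is only an illustrative sketch of how such thresholds could be applied to an FE-derived strength value; the dictionary, the function name, and the example strength values are hypothetical, and the code does not reproduce the clinical software referred to in the text.

```python
# Thresholds in newtons as quoted in the text [41]; a strength at or below
# the threshold counts as "fragile bone strength".
FRAGILE_STRENGTH_THRESHOLDS_N = {
    ("spine", "female"): 4500,
    ("spine", "male"): 6500,
    ("hip", "female"): 3000,
    ("hip", "male"): 3500,
}


def has_fragile_bone_strength(site, sex, strength_newtons):
    """Return True if the FE-derived strength is at or below the site- and sex-specific threshold."""
    threshold = FRAGILE_STRENGTH_THRESHOLDS_N[(site, sex)]
    return strength_newtons <= threshold


# Hypothetical FE-derived strength values.
print(has_fragile_bone_strength("spine", "female", 4200))
print(has_fragile_bone_strength("hip", "male", 3600))
```

Because the classification is a fixed cut-off, studies applying it report a single sensitivity and specificity pair rather than a full ROC curve, which is consistent with the limited comparability noted above.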

Unfortunately, evidence regarding FE is not only limited by the relatively low number of studies but also by the relatively small sample sizes [40, 43, 64, 71,72,73]. Although most of the studies evaluated here investigated unique populations that could be at risk for fractures due to the presence of secondary osteoporosis, the sample sizes are typically too small to draw definitive conclusions on the risk of osteoporosis and fractures in patients with chronic conditions [71, 78]. Further research should investigate such populations using existing clinical routine CT scans to better understand the imaging biomarkers for fracture detection. Most FE studies used a case–control design, where the prevalence of fractures is pre-determined [33,34,35,36,37,38,39,40,41, 52, 66]. This hinders the applicability of findings from FE studies as fracture status is not known a priori in a clinical domain with real clinical cohorts [93]. Furthermore, assessing bone strength based on rigid anatomical bodies alone may not be sufficient. Inclusion of patient-specific impact loading as a result of falls by incorporating soft tissues may be a way to simulate real-life falls more realistically [94]. Including these considerations could improve not only the accuracy of the models but also their relevance for fracture prediction.

There are a number of promising approaches for improving fracture risk assessment beyond the biomarkers discussed herein. One approach involves improving the accuracy of CT-based biomarkers. For instance, this could be achieved through the quantification of sarcopenia, which may be an important attribute in capturing fracture risk in older individuals [95, 96]. In the context of cancer patients, a study revealed that muscle attenuation was not significantly different between patients with and without fractures [74]. However, reduced use of paraspinal muscles (e.g., due to back pain) and vertebral fractures can result in muscle atrophy [66], a factor that may not be reflected by mechanical properties. In contrast, Pickhardt et al. found that muscle attenuation can be used to predict fracture risk with reasonable accuracy [47], and that the accuracy is further improved when combined with bone attenuation [48]. This emphasizes the need for a holistic approach to fracture risk assessment that considers both bone and muscles. Another approach involves expediting the segmentation of ROIs, where most of the laborious time and effort is spent, thereby streamlining the downstream processing of biomarkers from CT imaging. One possible method to achieve this is the application of machine learning algorithms to automatically segment the target anatomies in CT scans. This approach can effectively reduce the time and effort typically required for manual segmentation and allow for remote identification of existing CT datasets to opportunistically screen for high fracture risk in the elderly [97]. Leveraging existing fractures as indicators for identifying future fractures is another feasible approach [98], as fractures detected by radiologists offer a swift and straightforward means of identifying individuals at risk. A previous study demonstrated that prevalent vertebral fractures can be a risk factor for future hip fractures (HR = 3.1) [44]. Furthermore, the severity of vertebral fractures may be a stronger indication for fracture risk (range of HR = 2.4–6.7) than the mere presence of fractures, owing to impaired bone microarchitecture [44]. This suggests that the presence of vertebral fractures and their severity could be biomarkers independent of areal BMD that relate to structural integrity [44]. However, where the objective is to prevent fractures before they even occur, this could be a secondary strategy in patients that were missed during primary screening for fracture risk.

This review has some limitations that are worth mentioning. Comparing AUC values across studies in the meta-analysis could be seen as a limitation, since the AUC is sensitive to the age composition of the sample populations investigated [99]. Instead of the AUC, sensitivity and specificity could have provided a clearer understanding of the discriminative ability of the evaluated CT-based biomarkers. However, most studies reported AUC values, making the AUC a feasible option for statistical analysis. The effects of osteoporosis medication were not considered due to the high variability in study populations between studies. Differences in study designs, sample populations, sample sizes, types of fractures, time intervals between imaging and fractures, and other inter-study variations could also have affected the applicability of the findings reported. The lack of racial heterogeneity (as most studies were conducted on Caucasian populations) also narrows the generalizability of the findings obtained in this review. The 95% CI of the AUC was estimated for studies that did not provide this information directly, which could also have influenced the pooled values obtained in the meta-analysis. Nevertheless, this review consolidated a comprehensive range of literature on opportunistic CT imaging for fracture assessment, both quantitatively and qualitatively, to better understand the performance of CT-based biomarkers.
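To illustrate how a 95% CI can be approximated when a study reports only an AUC and its group sizes, the following minimal sketch uses the Hanley and McNeil standard-error approximation. This is an assumption made for illustration: the text above does not state which estimation method was applied in this review, and the function name, parameter names, and example numbers are hypothetical.

```python
import math


def approx_auc_ci(auc, n_fracture, n_control, z=1.96):
    """Approximate the standard error and CI of a reported AUC.

    Assumption: Hanley-McNeil (1982) approximation, which needs only the
    AUC and the number of fracture and non-fracture subjects. It is not
    claimed to be the method used in the review.
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    if n_fracture < 1 or n_control < 1:
        raise ValueError("both groups need at least one subject")

    # Q1: probability that two random fracture cases both outrank one control.
    q1 = auc / (2.0 - auc)
    # Q2: probability that one fracture case outranks two random controls.
    q2 = 2.0 * auc ** 2 / (1.0 + auc)

    variance = (
        auc * (1.0 - auc)
        + (n_fracture - 1) * (q1 - auc ** 2)
        + (n_control - 1) * (q2 - auc ** 2)
    ) / (n_fracture * n_control)
    se = math.sqrt(variance)

    # Normal-approximation interval, clipped to the valid AUC range [0, 1].
    lower = max(0.0, auc - z * se)
    upper = min(1.0, auc + z * se)
    return se, lower, upper


# Hypothetical study: AUC of 0.80 with 50 fracture and 50 control subjects.
se, lower, upper = approx_auc_ci(0.80, n_fracture=50, n_control=50)
print(f"SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the approximation depends only on the AUC and the group sizes, intervals derived this way become wider for small or unbalanced cohorts, which is one reason estimated CIs may influence pooled values.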

In conclusion, volumetric BMD and FE-derived parameters obtained from opportunistic use of clinical routine CT data may perform better than areal BMD in assessing fracture risk. Furthermore, FE-derived parameters from CT imaging have the potential to improve fracture risk assessment. The use of artificial intelligence for segmentation and extraction of FE-derived parameters, together with the incorporation of fall-related and muscle deterioration surrogates, could be the way forward to achieve a cost-effective and efficient approach for the early and accurate detection of fracture risk.