Introduction

Although “exercise” is considered in the vast majority of clinical guidelines on osteoporosis (e.g., [1,2,3,4,5,6]), the currency, completeness, and applicability of reliable recommendations vary widely. One reason for this unsatisfactory situation is the complexity of exercise, with its numerous types, methods, exercise parameters, and training principles [7], which complicates a consistent summary of exercise effects on a given outcome [8]. Nevertheless, recommendations that address at least training frequency and exercise intensity (i.e., strain magnitude [9]) are crucial for deriving specific exercise protocols (e.g., [10,11,12,13]). With respect to exercise intensity, two recent meta-analyses that summarized the effect of different exercise interventions on bone mineral density (BMD) ultimately failed to detect differences between exercise intensity categories in postmenopausal women [14, 15]. Even focusing on dynamic resistance exercise [15] as a relatively homogeneous type of exercise did not alter this result. While early basic research [9, 16, 17] established a crucial effect of strain magnitude (i.e., strain intensity) on bone parameters, recent research on the molecular response to exercise (reviewed in [18, 19]) revealed only limited evidence of a relevant effect of exercise intensity [20, 21].

Because participant and exercise characteristics interact so closely, as noted above, the present meta-analysis focused exclusively on exercise studies that compared two study arms with different exercise intensities. We hypothesized that high-intensity exercise increases BMD at the lumbar spine (LS) and the proximal femur regions of interest (ROI) significantly more than low- to moderate-intensity exercise.

Methods

This systematic review and meta-analysis is part of the Austrian/German/Swiss S3 Guideline “körperliches Training zur Frakturprophylaxe” (physical exercise for the prevention of fractures; AWMF: 183-002).

Data sources and search strategy

We strictly followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [22] and registered the study in PROSPERO (ID: CRD42021246415). Briefly, five electronic databases (PubMed/Medline, Scopus, Web of Science, Science Direct, Cochrane) were searched for articles published up to April 1, 2021, without language restrictions. In addition, the databases were monitored regularly up to July 1, 2021.

The search strategy combined population, intervention, and outcome terms and was constructed around the key terms “Bone Mineral Density,” “Exercise,” and “Intensity.” Synonyms and subject headings (MeSH terms for Medline) were used to increase the sensitivity of the following search query: (“Bone density” or “Bone” or “BMD” or “Osteoporosis”) AND (“intensity” or “impact” or “load” or “dose–response”) AND (“Exercise” or “Training”). After the primary search and removal of duplicates, one reviewer (SK) screened studies by title and abstract against the eligibility criteria. The reference lists of all included articles were searched manually to identify further relevant studies. Authors of potentially eligible trials were contacted by e-mail for missing data (e.g., mean change of BMD or standard deviation (SD)) or for clarification of the data presented.

Inclusion and exclusion criteria

We included (1) randomized and non-randomized controlled trials with at least two exercise groups comparing high vs low/moderate intensity; (2) involving adult participants of either sex; (3) with an intervention duration of ≥ 6 months; (4) assessing areal BMD of the lumbar spine (LS), femoral neck (FN), and/or total hip (TH) at baseline and follow-up (5) by dual-energy X-ray absorptiometry (DXA) or dual-photon absorptiometry (DPA); (6) studies with participants on hormone replacement therapy (HRT) were included only if the number of participants on HRT was comparable between the exercise groups.

We excluded (1) studies applying novel exercise technologies (e.g., whole-body vibration, electromyostimulation); (2) studies of participants with diseases that substantially affect bone metabolism; (3) studies focusing on the synergistic/additive effect of exercise and pharmaceutical therapy; (4) duplicate/multiple publications from one study; and (5) review articles, case reports, editorials, conference abstracts, and letters.

Data extraction

We designed a pre-piloted extraction form to collect relevant data. The form covered publication details, methodology, participant characteristics, exercise characteristics, risk-of-bias assessment, and outcome characteristics at baseline and study end. Two reviewers (SK and WK) independently evaluated full-text articles and extracted data from the included studies; disagreements were resolved by a third reviewer (SvS).

Outcome measures

The outcome of interest was the change in areal BMD at the LS, FN, and TH ROIs, as assessed by DXA or DPA between baseline and follow-up. Because of missing data, we conducted a merged analysis for the proximal femur that included both FN and TH BMD; when data for both ROIs were available, we preferred the TH ROI [2]. In cases of multiple BMD assessments, we considered only changes between the baseline and final BMD assessments.

Quality assessment

All included studies were assessed for risk of bias by two independent raters (SK and WK) using the Physiotherapy Evidence Database (PEDro) scale [23] and the Tool for the assEssment of Study qualiTy and reporting in EXercise (TESTEX) [24]. Disagreements were resolved by a third reviewer (SvS).

Data synthesis

For details of the procedure used to impute missing standard deviations (SD), the reader is referred to the comprehensive meta-analysis of Shojaa et al. [15]. Briefly, where studies reported a confidence interval (CI) or standard error (SE), it was converted to an SD [25]. Furthermore, nine authors were contacted to provide the corresponding information. Where the SD of the change was unreported or missing, we imputed it from the pre- and post-intervention SDs and a correlation coefficient using the formula given in the Cochrane Handbook [25]: SD_change = √(SD_pre² + SD_post² − 2 × r × SD_pre × SD_post), where r is the correlation between baseline and follow-up values.
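The conversions and the change-SD imputation can be illustrated with a short sketch. This is a minimal Python illustration, not the authors' R code; the function names are hypothetical, the CI conversion assumes the large-sample multiplier 1.96 (the Cochrane Handbook recommends a t-distribution multiplier for small groups), and the correlation r has to be supplied from elsewhere (e.g., from studies that report all SDs).

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert the standard error of a group mean to an SD: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Convert a 95% CI of a group mean to an SD (large-sample approximation).

    SD = sqrt(n) * (upper - lower) / (2 * z); for small groups a t-quantile
    should replace z.
    """
    return math.sqrt(n) * (upper - lower) / (2 * z)


def sd_change(sd_pre: float, sd_post: float, r: float) -> float:
    """Impute the SD of the pre-post change from baseline and follow-up SDs.

    SD_change = sqrt(SD_pre^2 + SD_post^2 - 2 * r * SD_pre * SD_post),
    where r is the assumed correlation between baseline and follow-up values.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    variance = sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors


# Hypothetical example: LS-BMD SD of 0.12 g/cm² at both time points, r = 0.9
print(round(sd_change(0.12, 0.12, 0.9), 4))  # prints 0.0537
```

With identical baseline and follow-up SDs, a higher assumed r shrinks the imputed change SD, which is why the choice of r matters for the resulting effect sizes.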

If the mean change in BMD was missing, it was calculated by subtracting the pre-intervention mean from the post-intervention mean BMD value.

To determine the effect of exercise intensity, we included only studies with a high- and a low-intensity group (see eligibility criteria). Instead of setting our own thresholds for high and low intensity, we adopted the intensity categories assigned by the authors of each study.

Statistical analysis

Statistical analyses and forest plots were performed with the statistical software R (R Development Core Team) [26]. Effect sizes (ES) were expressed as standardized mean differences (SMD) with 95% confidence intervals (95% CI). Random-effects meta-analyses were performed using the metafor package [27]. Between-study heterogeneity was assessed with Cochran's Q test, with p-values < 0.05 considered significant, and its extent was quantified with the I² statistic [25]. A sensitivity analysis examined whether the overall result was robust to the use of imputed SDs. Potential publication bias was assessed statistically with the regression test and the rank correlation test between effect estimates and their standard errors, using the t-test and Kendall's τ statistic, respectively, and visually by inspecting funnel plots. To adjust the results for possible publication bias, we also conducted a trim-and-fill analysis using the L0 estimator proposed by Duval et al. [28]. Subgroup analyses were conducted as mixed-effects meta-analyses with “study duration” (≤ 7 months vs > 7 months) and “type of exercise” (resistance training (RT) vs impact exercise) as potential moderators of the effect of exercise intensity on BMD. The search process is summarized in Fig. 1.
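The effect size for a single comparison can be sketched as a bias-corrected SMD (Hedges' g) of the mean BMD change in the HI arm minus the LI arm. This is a hedged Python illustration rather than the analysis code: it assumes the common approximation J = 1 − 3/(4·df − 1) for the small-sample correction and the usual large-sample variance formula; metafor's escalc() applies a similar bias correction, so its values may differ slightly. The function names and example data are hypothetical.

```python
import math


def hedges_g(m_hi: float, sd_hi: float, n_hi: int,
             m_li: float, sd_li: float, n_li: int) -> tuple[float, float]:
    """Bias-corrected SMD (Hedges' g) of mean BMD change, HI minus LI.

    m_*, sd_*: mean and SD of the within-group change; n_*: group sizes.
    Returns (g, sampling variance of g).
    """
    df = n_hi + n_li - 2
    sd_pooled = math.sqrt(((n_hi - 1) * sd_hi**2 + (n_li - 1) * sd_li**2) / df)
    d = (m_hi - m_li) / sd_pooled          # Cohen's d on change scores
    j = 1 - 3 / (4 * df - 1)               # approximate small-sample correction
    g = j * d
    var_g = (n_hi + n_li) / (n_hi * n_li) + g**2 / (2 * (n_hi + n_li))
    return g, var_g


def ci95(g: float, var_g: float) -> tuple[float, float]:
    """Normal-approximation 95% CI, returned as (lower, upper)."""
    half_width = 1.96 * math.sqrt(var_g)
    return g - half_width, g + half_width


# Hypothetical arms: HI +0.020 g/cm² (SD 0.04, n = 20), LI +0.010 g/cm² (SD 0.04, n = 20)
g, v = hedges_g(0.020, 0.04, 20, 0.010, 0.04, 20)
print(round(g, 3), [round(x, 2) for x in ci95(g, v)])
```

A positive g favors the high-intensity arm, matching the orientation of the forest plots (HI vs LI).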

Fig. 1

Flow diagram of search process according to PRISMA

Results

In total, 11 exercise studies with 26 study arms were included in the analysis [29,30,31,32,33,34,35,36,37,38,39]. All studies randomly assigned participants to the exercise groups. Except for one study [38] that focused on women with osteopenia, no study applied bone status as an eligibility criterion. All studies included middle-aged or older people; eight studies focused on postmenopausal women, and three studies involved both men and women. One study reported data for men and women separately. All trials compared one group performing high-intensity (HI) exercise with one group performing low- to moderate-intensity (LI) exercise. Initial sample size varied between 5 [33] and 46 [30] participants per group; dropout rates ranged from 0 [34] to 47% [37]. The pooled initial sample size was 251 in the high-intensity and 265 in the low-intensity groups.

Notably, in four studies [30,31,32,33, 39], women (up to 75% [30]) received hormone replacement therapy (HRT), albeit with no relevant difference between the groups. All studies but one [38] included participants who had performed no regular exercise, or at least no RT, for at least 6 months prior to study start (Tables 1 and 2). The study of von Stengel et al. [38] built on a previous exercise study [40] that had applied mixed high-impact/RT training for 3 years prior to the present trial. The studies were conducted in Brazil [32], Japan [34], Germany [38], and the USA [29,30,31, 33, 35,36,37, 39]. Table 1 displays study and participant characteristics of the included exercise trials.

Table 1 Baseline characteristics of included studies (n = 11)
Table 2 Exercise characteristics of included studies (n = 11)

Intervention characteristics

Cholecalciferol, calcium supplementation

With respect to nutrition, only one study provided vitamin D and calcium supplementation (125 IU/day vitamin D, 600 mg/day calcium) to its exercise groups [29]. In another study [30], participants with low calcium intake [41] were instructed on how to increase it.

Exercise intervention characteristics

Table 2 lists the exercise characteristics of the included studies in alphabetical order. Seven studies with 16 study arms focused on resistance training (RT) [29, 30, 32, 35,36,37, 39]; three studies with 6 study arms applied weight-bearing/impact exercise [31, 33, 34]. One study prescribed a mixed weight-bearing/impact/RT protocol [38], but exercise intensity differed only for the RT component, so the study was classified as an RT study in the subgroup analysis. Study length varied between 6 [29, 32, 39] and 24 months [38]. All RT trials applied a training frequency of three sessions per week (s/w); besides exercise intensity, however, the study of Bemben et al. [30] also compared the effect of 2 vs 3 s/w on BMD. Although not consistently reported, attendance rates ranged between 70 and 94%; thus, the net training frequency of the RT studies varied between 1.6 [30] and 2.8 s/w [29]. The three studies that applied weight-bearing exercise (i.e., brisk walking [31, 34]) or impact exercise [33] prescribed 3–5 s/w, of which 2.4–3.9 s/w were actually performed after adjusting for attendance. Finally, the mixed training protocol of von Stengel et al. [38] prescribed 4 s/w, of which 2.7 s/w were completed. With the exception of Vincent et al. [39], all RT studies applied multiple-set protocols. The attendance-adjusted volume of brisk walking ranged from ≈90 to 200 min/w; the volume of impact loading in the corresponding study [33] was not reported. Apart from the two shorter studies (7 months) [31, 34], all studies regularly reassessed 1RM or VO2max to adjust relative exercise intensity (i.e., principle of progression).
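The attendance adjustment above is simple arithmetic: net frequency = prescribed sessions per week × attendance rate. The sketch below assumes attendance is expressed as a fraction; the helper name is hypothetical, and the example uses only the reported attendance bounds (70% and 94%) applied to three prescribed sessions, not the data of any specific included study.

```python
def net_frequency(prescribed_per_week: float, attendance: float) -> float:
    """Attendance-adjusted (net) training frequency in sessions per week."""
    if not 0.0 <= attendance <= 1.0:
        raise ValueError("attendance must be a fraction between 0 and 1")
    return prescribed_per_week * attendance


# Three prescribed sessions per week at the reported attendance bounds
for rate in (0.70, 0.94):
    print(f"{rate:.0%} attendance -> {net_frequency(3, rate):.1f} s/w")
```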

Relative exercise intensity in the RT studies was 40–60% 1RM in the low- and 80–90% 1RM in the high-intensity groups. One study did not fit perfectly into this scheme, since it emphasized strain rate rather than strain magnitude. However, owing to the explosive concentric movement velocity, the loading magnitude (i.e., exercise intensity) was 16% higher than with the slow-velocity approach (4 s–2 s–4 s) [38].

Weight-bearing exercise intensity, implemented as average walking velocity, was 5.5 and 6.2 km/h in the low-intensity groups and 6.4 to 7.2 km/h in the high-intensity groups. For impact exercise [33], ground reaction forces (GRF) were < 1.5 × body weight in the low- vs > 2 × body weight in the high-intensity group.

Outcome characteristics

All but one study [35] determined BMD at the lumbar spine. Eleven comparisons addressed BMD at the hip ROI [29,30,31,32, 35,36,37,38,39]. Borer et al. [31] analyzed the LS and hip region based on a total-body scan; thus, the total pelvis ROI (rather than the hip ROI) was included in the analysis. Apart from one study [33] that applied DPA, all studies used DXA.

Seven of the 12 high-intensity subgroups that addressed BMD at the lumbar spine [30, 33, 34, 36, 37] reported increases in BMD, compared with four low-intensity subgroups [30, 32, 37].

BMD of the hip increased in eight of the 11 high-intensity subgroups [30,31,32, 36, 37, 39] and in eight low-intensity subgroups [29, 30, 35,36,37, 39].

Methodologic quality

PEDro and TESTEX results of the included studies are listed in Table 3. Methodological quality of the trials ranged from 3 to 5 of 10 PEDro points and from 8 to 10 of 15 TESTEX points (Table 3). Because the trials were very similar in terms of quality, no subgroup analysis for methodological quality was performed.

Table 3 Assessment of risk of bias for included studies

Meta-analysis outcomes

Effects of low vs high exercise intensity on lumbar spine BMD

Figure 2 displays the results of high vs low exercise intensity on LS-BMD. The SMDs of the individual trials ranged widely, from 1.26 in favor of the low-intensity study arms to 1.27 in favor of the high-intensity study arms. The pooled random-effects estimate indicated a slightly more favorable effect of high-intensity exercise on LS-BMD (SMD: 0.19, 95% CI: −0.23 to 0.61), but the difference between the groups was far from significant (p = 0.373). Heterogeneity between the trials was substantial (I² = 71%) (Fig. 2).
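To make the pooling step concrete, a random-effects pooled SMD with Cochran's Q and I² can be sketched as below. This sketch uses the DerSimonian–Laird estimator of the between-study variance τ² because it has a closed form; metafor's rma() uses REML by default, so values computed this way will generally not reproduce the published estimates exactly. The function name is hypothetical; inputs are per-comparison SMDs and their sampling variances, and the p-value of Q itself (χ² with k − 1 degrees of freedom) is not computed here.

```python
import math


def random_effects_dl(y: list[float], v: list[float]) -> dict:
    """DerSimonian-Laird random-effects pooling of effect sizes y with variances v."""
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / vi for vi in v]                                  # inverse-variance weights
    sum_w = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum_w       # fixed-effect estimate
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))    # Cochran's Q
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                        # between-study variance
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0  # I² in percent
    w_re = [1 / (vi + tau2) for vi in v]                      # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(mu / se) / math.sqrt(2))                # two-sided z-test
    return {"mu": mu, "ci": (mu - 1.96 * se, mu + 1.96 * se),
            "p": p, "Q": q, "tau2": tau2, "I2": i2}


# Hypothetical SMDs and variances for four HI vs LI comparisons
print(random_effects_dl([0.5, -0.3, 0.2, 0.9], [0.10, 0.12, 0.08, 0.15]))
```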

Fig. 2

Forest plot of data on exercise intensity effects on BMD of the lumbar spine. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the high- (HI) vs low-intensity (LI) group

A sensitivity analysis examined whether the overall result was robust to the use of imputed SDs. The effects of exercise intensity remained largely comparable and non-significant regardless of whether the mean correlation coefficient (Fig. 2), the minimum correlation (yielding the maximum SD; SMD: 0.11, 95% CI: −0.59 to 0.80), or the maximum correlation (yielding the minimum SD; SMD: 0.22, 95% CI: −0.13 to 0.57) was imputed.
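The logic of this sensitivity analysis can be sketched by re-imputing the change SDs with a low, a mean, and a high correlation coefficient and recomputing the SMD for one comparison. The data, the three r values, and the tuple layout of each arm are hypothetical assumptions, and the SMD uses the approximate Hedges correction; the sketch only shows the direction of the effect: with equal baseline and follow-up SDs, a higher r yields a smaller imputed SD and hence a larger absolute SMD.

```python
import math


def sd_change(sd_pre: float, sd_post: float, r: float) -> float:
    """Cochrane Handbook imputation of the SD of a pre-post change."""
    return math.sqrt(max(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post, 0.0))


def smd(m_hi, sd_hi, n_hi, m_li, sd_li, n_li) -> float:
    """Hedges' g (approximate bias correction) of HI minus LI mean change."""
    df = n_hi + n_li - 2
    sd_pooled = math.sqrt(((n_hi - 1) * sd_hi**2 + (n_li - 1) * sd_li**2) / df)
    return (1 - 3 / (4 * df - 1)) * (m_hi - m_li) / sd_pooled


def sensitivity(arm_hi, arm_li, correlations) -> dict:
    """Recompute the SMD for each assumed pre-post correlation r.

    arm_*: hypothetical layout (mean change, SD baseline, SD follow-up, n).
    """
    results = {}
    for r in correlations:
        sd_hi = sd_change(arm_hi[1], arm_hi[2], r)
        sd_li = sd_change(arm_li[1], arm_li[2], r)
        results[r] = smd(arm_hi[0], sd_hi, arm_hi[3], arm_li[0], sd_li, arm_li[3])
    return results


# Hypothetical LS-BMD data (g/cm²) and correlations (minimum, mean, maximum)
hi_arm = (0.015, 0.12, 0.12, 20)
li_arm = (0.005, 0.11, 0.11, 22)
for r, g in sensitivity(hi_arm, li_arm, (0.80, 0.90, 0.95)).items():
    print(f"r = {r:.2f}: SMD = {g:.2f}")
```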

Effects of low vs high exercise intensity on proximal femur BMD

Results of high vs low exercise intensity on BMD of the hip are shown in Fig. 3. The SMDs of the individual trials ranged from 0.22 in favor of the low-intensity study arms to 1.74 in favor of the high-intensity study arms. The pooled random-effects estimate indicated a slightly more favorable effect of high-intensity exercise protocols compared with their low-intensity counterparts (SMD: 0.17, 95% CI: −0.04 to 0.38), but here too the difference was not significant (p = 0.109). In contrast to LS-BMD, heterogeneity between trials at the hip ROI was negligible (I² = 0%).

Fig. 3

Forest plot of data on exercise intensity effects on BMD of the hip. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the high- (HI) vs low-intensity (LI) group

The sensitivity analysis did not reveal different or significant effects of exercise intensity on hip-BMD when imputing the mean (Fig. 3), the minimum (maximum SD: SMD 0.15, 95% CI: −0.05 to 0.36), or the maximum correlation (minimum SD: SMD 0.18, 95% CI: −0.03 to 0.39).

Assessment of small study effects

BMD changes at the lumbar spine

The funnel plot showed no relevant evidence of a small study effect/publication bias (Fig. 4). Additionally, the regression (p = 0.99) and rank (p = 1.00) correlation tests for funnel plot asymmetry did not indicate any significant asymmetry.
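The regression test can be illustrated with the classical Egger approach: the standardized effect (y/SE) is regressed on precision (1/SE) by ordinary least squares, and an intercept far from zero suggests funnel plot asymmetry. This is a sketch under stated assumptions only: metafor's regtest() fits a related but not identical model by default, the function name is hypothetical, and the p-value, which follows from a t distribution with k − 2 degrees of freedom, is left to a statistics library.

```python
import math


def egger_test(y: list[float], se: list[float]) -> tuple[float, float, int]:
    """Classical Egger regression: y_i/se_i = b0 + b1 * (1/se_i) + error.

    Returns (intercept b0, t statistic of b0, degrees of freedom k - 2).
    """
    k = len(y)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in se]                       # precision
    z = [yi / s for yi, s in zip(y, se)]          # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx                                # OLS slope
    b0 = z_bar - b1 * x_bar                       # OLS intercept (asymmetry)
    rss = sum((zi - (b0 + b1 * xi)) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (k - 2)                            # residual variance
    se_b0 = math.sqrt(s2 * (1 / k + x_bar**2 / sxx))
    return b0, b0 / se_b0, k - 2


# Hypothetical SMDs and standard errors
print(egger_test([0.3, 0.1, 0.5, 0.2], [0.1, 0.2, 0.25, 0.5]))
```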

Fig. 4

Funnel plot of trials that address the lumbar spine-ROI

BMD changes at the hip-ROI

The trim-and-fill analysis revealed no evidence of small-study effects/publication bias (Fig. 5). This was confirmed by the regression (p = 0.168) and rank correlation (p = 0.164) tests for funnel plot asymmetry, neither of which indicated significant asymmetry.
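The rank correlation test (Begg and Mazumdar) can be sketched in the same spirit: effects are standardized against the fixed-effect estimate, and Kendall's τ between these standardized effects and their variances is computed. This illustration assumes a simple τ without tie correction and a normal approximation for the p-value; metafor's ranktest() relies on R's cor.test(), which may use an exact p-value for small numbers of studies, so p-values can differ. The function name and data are hypothetical.

```python
import math


def begg_rank_test(y: list[float], v: list[float]) -> tuple[float, float]:
    """Begg-Mazumdar rank correlation between standardized effects and variances.

    Returns (Kendall's tau without tie correction, two-sided normal-approximation p).
    """
    k = len(y)
    w = [1 / vi for vi in v]
    sum_w = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum_w       # fixed-effect estimate
    # Var(y_i - y_fe) = v_i - 1/sum_w, which is positive for k >= 2
    t_star = [(yi - y_fe) / math.sqrt(vi - 1 / sum_w) for yi, vi in zip(y, v)]
    s = 0                                                      # concordant minus discordant pairs
    for i in range(k):
        for j in range(i + 1, k):
            prod = (t_star[i] - t_star[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)
    tau = s / (k * (k - 1) / 2)
    var_s = k * (k - 1) * (2 * k + 5) / 18                     # null variance of s
    p = math.erfc(abs(s) / math.sqrt(var_s) / math.sqrt(2))
    return tau, p


# Hypothetical data in which larger variances go with larger effects
print(begg_rank_test([0.0, 0.2, 0.6, 1.2], [0.01, 0.04, 0.09, 0.16]))
```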

Fig. 5

Funnel plot of trials that address the hip-ROI

Subgroup analyses

As stated in the Methods, subgroup analyses were performed for study duration (≤ 7 months vs > 7 months) and type of exercise (RT vs weight-bearing/impact exercise, WB).

Effect of study duration on low- vs. high-intensity exercise effects on BMD at the LS and hip

Although the effect of higher exercise intensity on LS-BMD was considerably larger in studies > 7 months [30, 33, 35, 37, 38] (SMD: 0.27, 95% CI: −0.02 to 0.56) than in studies of 7 months or less [29, 31, 32, 34, 36, 39] (SMD: 0.07, 95% CI: −0.69 to 0.83), the difference was not significant (p = 0.060) (Fig. 6). Heterogeneity was substantial among studies ≤ 7 months (I²: 81%) but negligible among longer studies (I²: 2%).
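A common way to compare two subgroup estimates is a between-subgroup statistic Q = (θ₁ − θ₂)² / (SE₁² + SE₂²), referred to a χ² distribution with 1 degree of freedom, with SEs back-calculated from reported 95% CIs as (upper − lower)/(2 × 1.96). The sketch below is illustrative only: the mixed-effects moderator test used in this analysis (metafor, fitted to study-level data) is computed differently and need not coincide with this shortcut based on pooled estimates. The function names and example numbers are hypothetical.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def subgroup_difference(est1: float, se1: float,
                        est2: float, se2: float) -> tuple[float, float]:
    """Between-subgroup Q (1 df) and its p-value for two independent estimates."""
    q = (est1 - est2) ** 2 / (se1**2 + se2**2)
    p = math.erfc(math.sqrt(q / 2))   # survival function of chi-square with 1 df
    return q, p


# Hypothetical pooled SMDs for two duration subgroups
q, p = subgroup_difference(0.30, se_from_ci(0.10, 0.50), 0.05, se_from_ci(-0.25, 0.35))
print(f"Q = {q:.2f}, p = {p:.3f}")
```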

Fig. 6

Forest plot of data on the effect of study duration on exercise intensity effects on LS-BMD. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the high- (HI) vs low-intensity (LI) group

Surprisingly, the corresponding results for BMD of the hip differed from those for LS-BMD. Although the difference was not significant (p = 0.136), trials of shorter duration yielded a considerably larger effect size (SMD: 0.27, 95% CI: −0.02 to 0.61) than studies > 7 months (SMD: 0.07, 95% CI: −0.69 to 0.83). Heterogeneity between trials was moderate for studies ≤ 7 months (I²: 45%) and negligible for studies > 7 months (I²: 0%) (Fig. 7).

Fig. 7

Forest plot of data on the effect of study duration on exercise intensity effects on hip-BMD. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the high- (HI) vs low-intensity (LI) group

Effect of type of exercise on low- vs. high-intensity exercise effects on BMD at the LS and hip

The effect of high-intensity exercise on LS-BMD was slightly more pronounced in RT [29, 30, 32, 35,36,37,38,39] (SMD: 0.22, 95% CI: −0.22 to 0.66) than in WB/impact exercise [31, 33, 34] (SMD: 0.07, 95% CI: −1.24 to 1.38), although the difference between the groups was far from significant (p = 0.802). Heterogeneity was substantial in both subgroups (RT: I² = 70.1%; WB: I² = 77.7%) (Fig. 8).

Fig. 8

Forest plot of data on the effect of “type of exercise” on exercise intensity effects on LS-BMD. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the high- (HI) vs low-intensity (LI) group

Only one study [31] reported the effect of exercise intensity during WB exercise at the hip ROI (SMD: 0.97, 95% CI: −0.08 to 2.02); thus, the relevance of the subgroup difference (RT vs WB; p = 0.200) is limited. Among the RT trials, we observed a non-significant effect on hip-BMD in favor of the high-intensity groups (SMD: 0.14, 95% CI: −0.08 to 0.35; p = 0.206). Heterogeneity in the RT analysis was negligible (I²: 0%).

Discussion

Generating reliable exercise recommendations is difficult in general [8] and particularly so in the area of exercise and bone health [15]. Apart from varying participant characteristics, exercise characteristics in particular generate a complex and almost inextricable mixture of determinants with potential effects on BMD [42]. To address the relevance of exercise intensity for BMD changes reliably, and to avoid such confounding interactions, we focused on exercise trials that exclusively compared study arms with different exercise intensities. In summary, our meta-analysis of comparative trials did not provide significant evidence for a superior effect of high vs low exercise intensity on LS-BMD (SMD: 0.19, 95% CI: −0.23 to 0.61) or hip-BMD (SMD: 0.17, 95% CI: −0.04 to 0.38). Ours is not the first analysis to directly compare high vs low exercise intensity in terms of BMD. Souza et al. [43] evaluated the effects of high-load (≥ 70% 1RM) vs low-load (< 70% 1RM) resistance exercise (6 studies) and reported “similar effects” on BMD at the LS and hip. Building on this result, we extended our analysis to include “weight-bearing/impact” exercise. Although this approach complicates the proper categorization of exercise intensity, we believe that including other types of exercise relevant to bone [11, 12, 42] provides additional evidence on the relationship between exercise intensity and BMD changes.

Since most relevant exercise aspects (i.e., site specificity, progression of exercise intensity, training frequency) were either narrowly distributed (Table 2) or might be negligible in BMD studies ≤ 7 months (e.g., Figs. 6 and 7), our subgroup analysis focused on “study duration” and “type of exercise.” Given that formation modeling induced by heavy mechanical loading [44] might not be completed before ≈4 months [45], short exercise studies may fail to capture the fully mineralized bone matrix that results from progressively applied high mechanical strain. Our results are inconclusive: although non-significant, the effects of higher exercise intensity were more favorable in studies longer than 7 months at the LS, but the opposite was observed at the hip ROI (Figs. 6 and 7).

Another important moderator of the effect of exercise intensity might be the “type of exercise,” categorized here as “weight-bearing/impact” vs “resistance training (RT).” Although both types of exercise are similarly effective in increasing LS- and hip-BMD [47], our rationale was primarily the less pronounced difference between low and high strain magnitude in two [31, 34] of the three WB/impact studies. Aware of the low statistical power resulting from the predominance of RT studies in this analysis, we again found no significant differences for exercise intensity at the LS when considering type of exercise as a moderator (Fig. 8). A detailed review of the RT studies shows, however, that in contrast to the high-intensity groups, the majority of low-intensity study arms [29, 30, 36, 37, 39] applied low absolute intensity (“effort”); i.e., given the relative intensity (%1RM, Table 2), the number of repetitions performed was far below the repetition maximum (work to failure) [48]. Thus, in contrast to low-load-induced muscular hypertrophy [49], high absolute intensity [48] is apparently not the dominant trigger for bone adaptation, which is an important message for practitioners. There is also some evidence that a strain magnitude slightly below the bone-adaptive threshold might be compensated by more loading cycles ([50, 51], reviewed in [42]). This applies to the RT studies [29, 30, 32, 35,36,37], which usually prescribed about twice as many repetitions in the low- as in the high-intensity subgroup (10–20 vs 2–10 repetitions).

Although our comparative approach may have largely excluded confounding by participant and exercise characteristics, some methodological limitations and study particularities might nevertheless have affected our results. (1) Since meta-analytic results depend on the studies included [52], our eligibility criteria warrant brief discussion. First, we opted to include both WB/impact and RT trials. While all but one RT trial (see below) focused on strain magnitude, the dominant osteoanabolic stimulus of brisk walking [31, 34] or hopping/jumping [33] might be strain rate. Whereas (dynamic) RT addressed strain rate separately via movement velocity [54], WB/impact trials prescribed strain rate by the type or mode of exercise. We also included exercise trials that might not perfectly address the effect of exercise intensity on LS- and/or hip-BMD. This relates particularly to the RT trial of von Stengel et al. [38], which predominantly focused on strain rate, but also to the study of Borer et al. [31], which derived LS and hip data from a whole-body scan. Although the ROIs, particularly at the hip (i.e., proximal femur vs pelvis), varied considerably, the general effect of low vs high exercise intensity should be comparable. (2) Differences in exercise intensity were less pronounced in some studies. Apart from the two brisk walking studies [31, 34], Brentano et al. [32] in particular applied a comparable exercise intensity in both groups during the initial 2–3 months of their 6-month RT study. (3) With one exception [38], all studies were quite short (6–12 months). Presuming that unusual (i.e., “abnormal”) strain patterns that stress non-adapted bone generate positive effects even at lower strain magnitudes, we hypothesized that the relevance of higher mechanical strain would increase after the initial phase of bone conditioning. However, our subgroup analysis on study duration yielded conflicting results (Figs. 6 and 7).
(4) The eligible studies were rather old (1992–2011), which might suggest that the topic is regarded as sufficiently evaluated. We disagree; rather, well-designed and adequately powered studies should address the important aspect of exercise intensity much more precisely. (5) We observed substantial heterogeneity between trials at the LS but not at the hip ROI (I²: 71.1 vs 0%). Surprisingly, the two studies [31, 32] that contributed most to this finding showed a significant superiority of low intensity for LS-BMD (Fig. 2), whereas the effect at the hip ROI was the opposite (Fig. 3). We are unable to explain this finding by participant or exercise characteristics.

Finally, the design of our study does not allow the general effect of low, moderate, or high exercise intensity (compared with sedentary control groups) on BMD to be determined. In contrast, the recent meta-analysis of Kistler-Fischbacher et al. [14] provided significant evidence of positive exercise effects on BMD largely independent of exercise intensity. From a pragmatic point of view, this finding is very welcome for people unable or unmotivated to perform high-intensity exercise programs for bone health.

Conclusion

In summary, the main finding of this systematic review and meta-analysis of comparative studies with two study arms was that there is insufficient evidence to claim a superior effect of high-intensity exercise on areal BMD at the lumbar spine and hip in people aged 50 years and older. Considering that more general meta-analyses found the positive effect of exercise on BMD to be largely independent of whether low, moderate, or high intensity was applied [14], varying exercise intensity might be a promising option to address the issue of exercise intensity in intervention studies.