Introduction

Osteoporosis, a disease characterised by low bone mass and increased fracture risk [1], is associated with greater disability than that caused by many cancers and chronic non-communicable diseases [2]. The condition affects 3.2 million people aged over 50 in the UK alone and conveys a substantial economic burden, estimated to cost the UK around £5.5 billion by 2025 [3]. Despite being considered primarily a condition of old age [4], around 60% of osteoporosis risk can be attributed to the amount of bone that is acquired by attainment of the peak bone mass (PBM) in early adulthood [5]. Whilst genetics is suggested to account for 60–80% of the variability in PBM, physical activity (PA) during growth is one of the most important factors influencing the remaining modifiable component [6]. However, despite many observational [7,8,9] and intervention studies [10,11,12] demonstrating an osteogenic effect from PA, the recommended dose of PA (frequency, intensity, duration and type) that benefits bone health in children and adolescents remains unclear [6].

Summarising the precise dose–response effect that PA has on bone health may be hampered by the study designs reviewed and the variety of methods used to estimate PA. Many previous reviews summarising the relationship between PA and bone health have focused on PA or exercise interventions [13,14,15,16,17], which capture only a small subset of activity behaviour (acute changes over a short duration) and do not represent everyday habitual activity in the general population. When summarising the effect of habitual PA on bone health, most observational studies included in previous reviews [18, 19] used self-reported methods to obtain information about activity behaviour (8/9 studies in [19] and 7/10 in [18]). A problem with self-reported methods of estimating PA is that they provide imprecise information regarding the intensity, duration, frequency and pattern of accumulating activity, especially in children, due to their lower cognitive function and inability to accurately recall information and estimate time [20]. In recent years, the use of accelerometers to monitor habitual PA in relation to health outcomes in children has become commonplace. Accelerometers are small, lightweight and unobtrusive and allow several days or weeks of PA to be assessed over short sampling intervals of minutes or seconds [21]. Whilst accelerometers provide an objective measure of PA free from the random and systematic errors associated with self-report [22], there still remain several methodological challenges related to the collection, processing and interpretation of the acceleration data [23].

As minimal attention has been directed towards standardising methodological approaches [21], researchers are often required to make decisions regarding the accelerometer model (which may output raw and/or proprietary count-based data), wear criteria (definition of a valid day, non-wear time within a day, number of valid days required for inclusion) and whether to analyse a raw acceleration output directly, average outputs (raw/counts) over a certain length of epoch, classify the magnitude of the output into categories in an attempt to reflect different physiological intensities and if so, what cut-points to apply to facilitate this [23]. All of these may have a bearing on the quantity and quality of accelerometer data obtained [23]. Many studies evaluating relationships between accelerometer-derived estimates of habitual PA and health outcomes also only report activity as moderate-to-vigorous PA (MVPA), a metric that is included in PA guidelines and is proposed to reflect an intensity of activity that places a moderate-to-vigorous cardiovascular (aerobic) demand on the body. Activities upwards from and including brisk walking are suggested to elicit this cardiovascular demand. Whilst significant, positive associations between MVPA and bone health outcomes have been observed [8, 9], it is likely that these associations are driven by activities of a more vigorous intensity, rather than those at the lower end of moderate intensity, such as walking, as walking has been shown to be of little or no benefit to bone health [24]. A broad MVPA classification may therefore make it difficult to discern the precise threshold of intensity driving an association between PA and bone and could also risk a non-osteogenic type of activity (e.g. walking) being recommended to promote bone health.

A recently published systematic review [25] examining the associations between bone health outcomes and objectively measured PA intensities (sedentary, light (LPA), moderate (MPA), MVPA, vigorous (VPA) and total PA) in children and adolescents demonstrated that both MPA and VPA positively predicted bone development in this population. However, the magnitude of associations between these intensities and bone outcomes within studies was not compared. It is therefore not clear whether there is a consistently greater benefit of VPA over and above MPA in relation to bone health outcomes. The independent associations and greater benefits of objectively measured VPA over other PA intensities such as MPA or MVPA have been recently recognised for several other health outcomes in youth [26]. When looking at the magnitude of the relationships between MPA and/or MVPA and VPA, Gralla et al. [26] found that VPA was consistently a stronger predictor of improved body composition and fitness in comparison to MPA and/or MVPA. These findings emphasise the importance of stratifying for higher intensity activity and assessing the strength of associations between outcomes and independent activity intensities when trying to identify more precise dose–response relationships. With particular reference to bone health, adaptations in bone are threshold driven and brought about by activities that create dynamic, rapidly applied loads with a high magnitude of impact. Activities that elicit higher impacts provide a larger osteogenic effect [27]. Therefore, when bone outcomes are of interest, it would be particularly important to summarise findings from studies that have objectively and independently assessed the association of higher intensity activity over and above other intensities of habitual activity.

An assessment of the independent contributions that both moderate- (commonly referred to as MPA/MVPA) and high-intensity (commonly referred to as VPA) activity have on bone health will likely be influenced by the different accelerometer methods used to obtain PA data between studies. A summary of the range of methods employed will provide essential information to help identify potential methodological issues in the objective measurement of PA in relation to bone health. It remains important, however, to establish whether a particular intensity appears to be consistently more beneficial to the bone. This combined level of information will help to inform the direction that future research in the objective assessment of habitual PA in relation to bone health must take to improve the precision of measuring bone-specific PA and will also facilitate the identification of more specific dose–response relationships between PA and bone health in children and adolescents.

The review will therefore (1) summarise the accelerometry data collection and processing methods used in studies to estimate habitual PA when relating it to bone outcomes in children and adolescents; (2) determine whether habitual PA of at least moderate intensity (MPA/MVPA and VPA) is related to bone health in children and adolescents (independently of activities at a lower intensity); and (3) despite variations in accelerometer methods used to capture the data, determine whether the magnitude of association between PA and bone outcome measures is consistently stronger for a particular intensity of habitual PA (MPA/MVPA or VPA).

Methods

This review was guided by the Centre for Reviews and Dissemination’s guidance for undertaking reviews [28] and the COSMOS-E guidelines on conducting systematic reviews and meta-analyses of observational studies of aetiology [29] and is reported in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines [30]. The review protocol is registered on the PROSPERO International Prospective Register of Systematic Reviews (https://www.crd.york.ac.uk/prospero/) under the registration number CRD42018106493.

Search strategy

A detailed systematic electronic search combining free text and Medical Subject Headings (MeSH) was conducted in several electronic databases (MEDLINE (Ovid) 1946–present with MEDLINE (Ovid) in process and other non-indexed citations, EMBASE, Web of Science, SPORTDiscus and Cochrane Central Register of Controlled Trials) from their commencement up until May 4, 2020. Search terms relevant to physical activity (e.g. physical activity, habitual activity, MVPA, accelerometer, activity monitor, motion sensor) AND bone health (e.g. bone health, bone density, bone strength, bone structure) OR bone imaging methods (e.g. DXA or DEXA, quantitative ultrasound, quantitative tomography) AND children/adolescents (e.g. child, adolescent, paediatric or pediatric, youth) were used. An example of a full search conducted in the MEDLINE database is given in Online Resource 1. The Yale MeSH analyser [31] was used on a selection of potentially relevant studies to identify important MeSH terms to include in the search and ensure vital terms had not been missed. There were no limits placed upon the search; however, only articles published in the English language were considered for inclusion. Review articles, editorials, conference abstracts or proceedings, unpublished articles or dissertations were not considered for inclusion. Manual searches of the reference lists of included papers and relevant review articles were conducted to identify any additional articles.

Study selection and inclusion criteria

The review inclusion criteria were guided by the Population, Exposure, Control and Outcome(s) format outlined in the COSMOS-E guidelines for systematic reviews of epidemiological studies, which are in line with the Population, Exposure, Comparator, Outcome(s) and Study characteristics (PECOS) framework in the PRISMA guidelines. Two reviewers independently screened the titles and abstracts of all results from the electronic database search according to the pre-defined inclusion criteria. Any discrepancies were resolved through discussion and abstracts that were not eligible were discarded. Full-text articles for potentially relevant studies were obtained and screened by the same two reviewers. Studies that met the inclusion criteria were selected and included in the review. A third investigator was consulted if the reviewers were unable to reach a consensus on discrepancies.

For inclusion in the review, studies were required to have included generally healthy children and adolescents aged ≤ 18 years (including those who were overweight/obese), objectively measured habitual PA using an accelerometer, and to have reported VPA (or high-intensity activity) and MPA and/or MVPA (or moderate-intensity activity) on a continuous scale (e.g. minutes per day, proportion of total time, number of peaks per day) and measured their respective associations with at least one measure of bone health (e.g. strength, mass, structure). Studies reporting associations between bone outcomes and activity that were of a moderate (jogging/slow running) and high-intensity (e.g. faster running/jumping) but were not defined in terms of VPA and MPA and/or MVPA were included as a comparison of activity intensity could still be made and descriptions of activities within bands of intensity allowed respective bands to be included in MPA and VPA categories for the purposes of this review. Since habitual PA includes all types of bodily movement that result in energy expenditure [32], studies focusing solely on a particular subset of PA (e.g. exercise, sport, leisure time PA, school-time PA) were excluded as this does not portray habitual PA in its entirety and therefore does not concur with the aims of the review. Studies were required to be observational in design (cross-sectional and prospective), but intervention studies were considered for inclusion if associations between VPA and MPA and/or MVPA (or other intensity definitions) and bone outcomes had been conducted at baseline (cross-sectional analyses) or if there was a separate control group that could be considered as a cross-sectional or prospective analysis. If a number of studies drawing from the same cohort were identified, all were considered for inclusion in the review. 
Those measuring participants at different time points or that reported on different outcomes obtained through a separate imaging method or at additional anatomical sites were included. When multiple studies from the same cohort had reported on the same or similar outcomes with comparable analyses, the study that had the most complete descriptive information on the sample, activity intensities, bone outcomes and their respective associations that most closely coincided with the aims of the review was kept for inclusion. Studies were not excluded based on the imaging tool used to assess bone outcomes.

Quality assessment

Following exclusion of studies that did not meet the inclusion criteria, the quality and risk of bias of included studies were assessed by two independent investigators using the National Institutes of Health ‘Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies’ [33]. This consists of 14 components that relate to the design of the study, selection bias, bias in both the exposure and outcome (habitual PA and bone health outcomes), follow-up and whether statistical analyses adjusted for key confounders. Characteristics including age, sex, ethnicity, maturational stage and skeletal or body size should all be considered for statistical adjustment to reduce residual variability in regression models and improve statistical power, since they are associated with bone measures during growth [6]. Studies were given an overall rating of ‘poor’, ‘fair’ or ‘good’ based on the responses to the 14 items (response can be yes, no, cannot determine, not reported or not applicable for each of the 14 items). Item 9, which assesses whether the exposure measure was clearly defined, valid and reliable, was modified to account for the accelerometer data inclusion criteria. Amongst children and adolescents, the minimum number of days needed to achieve a reliable depiction of habitual PA ranges from 4 to 9 days [34]. It is also accepted that 10 h of wear time is sufficient to qualify as a valid day [22], so studies including participants that had PA data for ≥ 4 days with ≥ 10 h of wear were given a ‘yes’ response to item 9, and those with fewer than this received a ‘no’ response to this item. Studies were not excluded based on the results of the quality assessment; however, study quality was taken into account when interpreting findings.

Data extraction

A structured form was developed to extract the following data: authors, title, study design, participants (sample size, age, sex, maturity status), accelerometer measurement procedures (make and model, epoch length, wear location, number of days wear, valid days for inclusion, definition of non-wear time, MPA, MVPA and VPA cut-points used (or other intensity categories presented and how they were defined)), amount of activity for each intensity (e.g. minutes of MPA, MVPA and VPA), bone imaging tools, site(s) assessed and outcomes reported, statistical analyses and covariates, and observed associations (R², R² change, r, β, Std. β) between MPA, MVPA and VPA (or other moderate-/high-intensity PA classifications) and bone outcomes and their level of significance (p-value). When more than one regression model was presented, data were extracted for the most adjusted model. If the required information was not presented in the article, an email was sent to the corresponding author requesting it. If there was no response, a reminder email was sent, and if no reply was received, only the information provided in the paper was presented. Data extraction was cross-checked by reviewer 2.

Data synthesis and analysis

Due to the large variability in the methods used to assess bone outcomes (e.g. DXA, pQCT, QUS), the anatomical sites assessed (e.g. total body, femoral neck, tibia, radius, calcaneus) and numerous outcomes reported (e.g. bone mineral content, bone mineral density, bone stiffness, cortical density, polar strength-strain index), the heterogeneity between studies meant that the results of many of the studies were not directly comparable and therefore it was considered inappropriate to conduct a meta-analysis. In the absence of a fully quantitative meta-analysis, a semi-quantitative approach was employed, using chi-square tests to determine which PA intensity (MPA, MVPA or VPA) had the greatest proportion of ‘statistically significant associations’ with a bone outcome and which intensity had the greatest proportion of ‘strongest within-study associations’. These two ‘proportions’ were derived from a two-stage ‘vote’ counting procedure: Stage 1 involved awarding a ‘vote’ to any ‘PA intensity vs bone outcome’ association that was statistically significant (p < 0.05). All activity intensities within a study could potentially receive a vote at this stage. Stage 2 compared the magnitude of the statistically significant ‘PA intensity vs bone outcome’ associations within a study and only the intensity—MPA and/or MVPA and VPA—with the strongest association with the bone outcome received a vote (total count per analysis at this stage could only be 0 or 1). Only positive associations could be deemed as the ‘strongest association’ as they are consistent with a greater benefit to bone outcomes. When the association was statistically significant and of the same magnitude for two PA intensities, each intensity was counted as a vote in stage 1, but no vote was cast at stage 2. When negative associations were observed, a vote was counted in stage 1, but no vote was cast at stage 2. 
To present the results for vote-counting, studies were grouped based on the method used to assess bone outcomes and were further organised by anatomical site. The results for the significant counts or most strongly associated counts are expressed as a percentage of the total number of counts available (%(n/N); total counts are the number of counts regardless of statistical significance) for each intensity as studies differed in the combination of intensities reported and therefore the total number of counts available was different for MPA, MVPA and VPA, respectively. A 3 × 2 chi-square (χ²) test was used to determine whether the proportions of ‘statistically significant associations’ and ‘strongest within-study associations’ vote counts differed between the three PA intensities. When this omnibus test determined that there were statistically significant differences (p < 0.05) between at least two of the PA intensities, a priori follow-up analyses were carried out in the form of two 2 × 2 chi-square tests: ‘MPA vs VPA’ and ‘MVPA vs VPA’. The observed p-values of these 2 × 2 tests were multiplied by ‘2’ in order to adjust for multiple testing, creating a new Bonferroni-adjusted p-value. Fisher’s exact tests were used where the data violated the assumptions of a chi-square test. Table 1 includes all reported associations between PA and bone outcomes, regardless of statistical significance.
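
The two-stage vote count described above can be illustrated with a minimal Python sketch. The `Association` class, its field names and the `vote_count` function are hypothetical and are not taken from the review. The sketch also assumes one reading of the rules: negative associations are simply excluded from stage 2, and a tie for the largest positive coefficient means no stage 2 vote is cast.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Association:
    """One 'PA intensity vs bone outcome' result (hypothetical structure)."""
    coefficient: float  # e.g. a standardised beta or r
    p_value: float


def vote_count(
    analysis: Dict[str, Association], alpha: float = 0.05
) -> Tuple[Dict[str, bool], Optional[str]]:
    """Return (stage 1 votes, stage 2 winner) for a single analysis."""
    # Stage 1: every statistically significant intensity receives a vote,
    # whatever the direction of the association.
    stage1 = {name: a.p_value < alpha for name, a in analysis.items()}

    # Stage 2: only significant, positive associations are candidates.
    candidates = {
        name: a.coefficient
        for name, a in analysis.items()
        if stage1[name] and a.coefficient > 0
    }
    if not candidates:
        return stage1, None

    best = max(candidates.values())
    winners = [name for name, value in candidates.items() if value == best]
    # Assumed rule: a tie for the strongest association casts no vote.
    stage2 = winners[0] if len(winners) == 1 else None
    return stage1, stage2


example = {
    "MPA": Association(0.10, 0.20),
    "MVPA": Association(0.15, 0.03),
    "VPA": Association(0.22, 0.001),
}
stage1_votes, strongest = vote_count(example)
```

With the illustrative values above, MVPA and VPA each receive a stage 1 vote and VPA alone receives the stage 2 vote.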

Table 1 Study characteristics, accelerometry methods used and the relationships reported between MPA and/or MVPA and VPA with measures of bone health in children and adolescents for the 30 reviewed studies. Studies are grouped and presented by imaging method and study type (cross-sectional or longitudinal/prospective) then by epoch length (≥ 60 s, ≤ 15 s, raw data) and alphabetically within each epoch group

Results

The initial search strategy identified 10,017 potentially relevant articles. Following the removal of duplicates, 7389 titles and abstracts were screened and 7215 of these were not deemed to be eligible, leaving 174 articles for full-text review. Of these studies, 33 satisfied the pre-defined inclusion criteria. Four of the studies [35,36,37,38] were further excluded as multiple studies had reported on the same/similar outcomes using participants from the same cohort. The study with the most complete descriptive information on the sample, activity intensities and their respective associations with bone outcomes was kept for inclusion. An additional study [39] was obtained through the hand searching of reference lists of included studies and relevant reviews, making a total of 30 studies included in the review. A PRISMA flow diagram detailing the stages of study selection and reasons for exclusion of full texts can be seen in Fig. 1.

Fig. 1

Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) flow diagram of the study selection process. PA physical activity, MPA moderate physical activity, MVPA moderate-to-vigorous physical activity, VPA vigorous physical activity

Study characteristics and quality assessment

Study characteristics are presented in Table 1. Of the 30 included studies, 26 were cross-sectional [24, 39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63], three were longitudinal [64,65,66] and one was prospective in design [67]. Four studies included participants from the Avon Longitudinal Study of Parents and Children (ALSPAC) [24, 45, 46, 60], two from the Iowa Bone Development Study [50, 65], two from the European Youth Heart Study [42, 55] and two from the Children’s body composition and stress (CHiBS) study [44, 56]. Other studies also included participants from the Healthy Lifestyle in Europe by Nutrition in Adolescence (HELENA) study [47], the Copenhagen School Child Intervention Study (CoSCIS) [48] and the Identification and prevention of Dietary- and lifestyle-induced health EFfects In Children and infantS (IDEFICS) study [49]. The mean age of participants ranged from 4 to 18 years of age, and sample sizes ranged between 38 and 4465, with a mean sample size of 687 participants. The majority of studies (N = 26/30) included both boys and girls in their sample, with one study including only girls [63] and three studies including only boys [52, 66, 67]. Eleven studies assessed maturity using Tanner staging (self-reported in [24, 39, 42, 60, 63, 66, 67] and assessed by a physician in [44, 47, 55, 56]), five studies estimated it from maturity offset prediction equations [41, 51, 59, 64, 65] and three studies used skeletal age [43, 52, 66]. Two studies also assessed the presence of menarche via self-report [59, 62]. More detail regarding the characteristics of participants in included studies can be found in Table 1.

The bone imaging methods used in the included studies and anatomical sites assessed are summarised in Table 2. The majority of studies (n = 21/30) measured bone outcomes using dual-energy x-ray absorptiometry (DXA), with six studies using peripheral quantitative computed tomography (pQCT) and five studies using quantitative ultrasound (QUS). Two studies used both DXA and pQCT to measure bone outcomes [39, 57]. The respective associations between the PA intensities and all reported bone outcomes can be found in Table 1.

Table 2 A summary of the bone imaging methods used and anatomical sites assessed in all studies included in the review (N = 30)

Studies included in this review were required to have monitored habitual PA objectively using an accelerometer and to have reported both moderate and high intensities of activity. Accelerometer-derived methods used to collect and process this data are presented in Table 3. Fourteen of the included studies reported activity associations with MPA, MVPA and VPA; six reported MVPA and VPA; five reported MPA and VPA; and two studies reported MPA, MVPA, VPA and very vigorous PA (VVPA) (Table 3). Two studies [45, 46] did not report activity as MPA and/or MVPA and VPA, but instead used raw data and calculated the average number of acceleration peaks per day across different ‘impact bands’. Since the impact bands related to activities of various intensities (e.g. walking, brisk walking, running and jumping), these studies were included as it allowed an intensity comparison to be made between activities that are frequently classified elsewhere as MPA or VPA. One other study [48] reported various thresholds of activity intensity that were described as vigorous; however, since the lower thresholds were similar in magnitude to moderate intensity thresholds used in other studies already included in the review, this study was also included and an intensity comparison was made. Activity was reported as minutes per day in most (n = 24/30) studies [24, 39,40,41,42,43, 47, 49,50,51,52,53,54,55, 58,59,60,61,62,63,64,65,66,67], or as the proportion of recording time [44, 48, 56, 57], or number of peaks per day across various impact bands [45, 46].

Table 3 A summary of the accelerometer data collection and processing methods used in all studies included in the review (N = 30)

Quality of included studies

The majority of included studies (n = 20/30) were awarded a ‘fair’ quality rating [24, 39,40,41,42, 44,45,46,47,48,49,50,51,52,53,54,55,56, 59, 60], with four studies being deemed to be of ‘good’ quality [64,65,66,67] and six studies deemed to be of ‘poor’ quality [43, 57, 58, 61,62,63]. Only 9/30 studies [39, 41, 53, 54, 58, 61,62,63,64] had a ‘yes’ response to item 9, where participants were required to have ≥ 4 days with ≥ 10 h of accelerometer data. Despite satisfying this requirement, four of these studies [58, 61,62,63] were still deemed to be of poor quality as they had not controlled for important covariates (e.g. age, sex, ethnicity, maturational stage, skeletal/body size) in the analyses when investigating independent associations between bone outcomes and each activity intensity (MPA and/or MVPA and VPA). Others [57] also failed to adjust for appropriate covariates, and one study [43] was deemed to be of poor quality as the outcome measure used (proximal femur shape variation) was reported as having not been used in the area of bone health and therefore was not considered a valid and reliable outcome. Information on individual study ratings can be found in Online Resource 2.

Accelerometer data collection and processing methods used in all included studies

The accelerometer data collection and processing methods used in the included studies (n = 30) are detailed in Table 3. There was considerable variability between studies for all aspects reviewed.

Most studies (n = 24/30) used one model of an accelerometer to collect PA data; however, six studies [24, 44, 49, 56, 65, 66] used two or three different models (participants only wore one of the two/three models at a time) to obtain data and appear multiple times in this section of Table 3. The most commonly used accelerometer was the Actigraph GT1M (14/30 studies), followed by the MTI 7164 (6/30 studies). Other models included the Actigraph GT3X (n = 3/30), GT3X + (n = 3/30), Actitrainer (n = 3/30), wGT3X-BT (n = 1/30), the WAM 6471 (n = 2/30), Newtest monitor (n = 2/30), Actiwatch motion sensor (n = 1/30), Actical (n = 1/30), Lifecorder GS (n = 1/30) and GENEActiv (n = 1/30). Monitors were most commonly worn on the right hip, with 18/30 studies using this wear location. Other studies required participants to wear the accelerometer on the lumbar spine/lower back (n = 3/30), right waist (n = 1/30), waist midaxillary line (n = 1/30), right iliac crest (n = 1/30) or non-dominant wrist (n = 2/30). Four studies did not report the accelerometer wear location.

Epoch length ranged from 5 to 120 s, with 60 s being used most commonly in 11/30 studies, with the next most common being 15 s used in 8/30 studies. Three studies [42, 57, 60] did not report epoch length; however, activity was referred to in terms of counts per minute in the methods, so it was assumed that a 60-s epoch had been used. McCormack and colleagues [53] and Bielemann and colleagues [40] collected raw data that was then integrated into 10-s and 5-s epochs. Two studies [45, 46] did not use epochs and instead collected raw data and calculated the number of peaks that occurred each day within certain bands of acceleration that corresponded with peaks in impact typically encountered when exposed to different types of activity.

There was large variability in the ways in which studies defined the PA intensities of interest (Table 3). Eleven studies defined MPA and/or MVPA and VPA using the Evenson [68] cut-points of 2296–4011 cpm, ≥ 2296 cpm and ≥ 4012 cpm, respectively. Cut-points for MVPA ranged from > 500 to ≥ 3600 cpm and for VPA > 1000 to > 6500 cpm (Table 3). Not all cut-points were defined in terms of counts per minute. For example, Deere et al. [45, 46] separated raw acceleration data into different impacts using ‘g’ bands, where 1 g is equivalent to gravitational force. Six impact bands relating to normal walking, brisk walking, jogging/running and jumping were used in [45] and activity was defined as low (0.5–2.1 g), medium (2.1–4.2 g) and high (> 4.2 g) impact in [46]. One study [48] reported thresholds of ≥ 3000, ≥ 4000, ≥ 5200, ≥ 6500, ≥ 7000 and ≥ 8200 cpm. The ≥ 3000 cpm was comparable to thresholds used to define MVPA in other studies, so was used as MVPA, with all others treated as VPA.
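
To make the role of cut-points concrete, the following is a minimal Python sketch that applies the Evenson thresholds quoted above (MPA 2296–4011 cpm, MVPA ≥ 2296 cpm, VPA ≥ 4012 cpm) to a series of counts. It assumes 60-s epochs, so that each value is already in counts per minute and each epoch contributes one minute; the function names are hypothetical and do not reproduce the processing of any included study.

```python
from typing import Dict, Iterable

# Evenson thresholds as quoted in the text (counts per minute).
MPA_LOWER_CPM = 2296
VPA_LOWER_CPM = 4012


def classify_cpm(cpm: float) -> str:
    """Label one 60-s epoch by intensity band."""
    if cpm >= VPA_LOWER_CPM:
        return "VPA"
    if cpm >= MPA_LOWER_CPM:
        return "MPA"
    return "below_moderate"


def minutes_by_intensity(counts_per_minute: Iterable[float]) -> Dict[str, int]:
    """Sum minutes of MPA, VPA and MVPA (MVPA = MPA + VPA)."""
    totals = {"MPA": 0, "VPA": 0, "MVPA": 0}
    for cpm in counts_per_minute:
        label = classify_cpm(cpm)
        if label in ("MPA", "VPA"):
            totals[label] += 1
            totals["MVPA"] += 1  # MVPA covers everything at or above 2296 cpm
    return totals


sample_minutes = minutes_by_intensity([0, 1500, 2296, 4011, 4012, 7000])
```

Because MVPA pools every epoch at or above the moderate threshold, the same data yield different MPA, MVPA and VPA totals under each of the cut-point sets listed in Table 3, which is one reason results are difficult to compare across studies.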

The number of days participants were required to wear the accelerometer ranged from 2 days to 2 weeks, with most studies requesting participants to wear the monitor for 7 days (16/30 studies). Four days of wear were required with 2 weekdays and 2 weekend days in 5/30 studies and with one weekend day in 2/30 studies. Five days with both weekend days (3/30 studies) or one weekend day (1/30 studies) were also used.

Valid day definitions varied from 6 to 14 h (daily activity from 6 am to 8 pm; Table 3). Most commonly, a valid day was defined as having at least 10 h wear (15/30 studies), followed by at least 8 h of wear (8/30 studies). Non-wear was defined as periods of at least 10 min of consecutive zero counts in 5/30 studies (two of these studies also removed night activity), with periods of 20 (1/30 studies), 30 (1/30 studies) and 60 (1/30 studies) min of consecutive zero counts, and all-day consecutive zero counts (1/30 studies) also being used. Participants were asked to complete a diary of when the monitor was worn/removed in 4/30 studies. A large proportion of studies (n = 11/30) did not report how non-wear was defined.

The minimum number of days required for including participants’ accelerometer data in analyses ranged from at least 2 (4/30 studies) to 5 days (1/30 studies), with most requiring a minimum of 3 valid days (16/30 studies). Nine studies required participants to have at least 4 valid days to be included in the final sample. Many studies did not specify whether included days were required to be week and/or weekend days, whilst others required participants to have at least one or both weekend days in order to satisfy the inclusion criteria (Table 3).
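
The wear criteria summarised above combine three separate decisions: how non-wear is defined, how much wear makes a valid day, and how many valid days are needed for inclusion. A minimal Python sketch of how they interact is given below. It assumes 60-s epochs and uses the most commonly reported settings as defaults (non-wear as ≥ 10 min of consecutive zero counts, a valid day as ≥ 10 h of wear, inclusion with ≥ 3 valid days); the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def wear_minutes(day_counts: Sequence[int], nonwear_run: int = 10) -> int:
    """Minutes of wear in one day of 60-s epoch counts."""
    worn = 0
    for is_zero, run in groupby(day_counts, key=lambda count: count == 0):
        length = sum(1 for _ in run)
        # Runs of zeros shorter than the threshold still count as wear.
        if not (is_zero and length >= nonwear_run):
            worn += length
    return worn


def is_valid_day(
    day_counts: Sequence[int], min_wear_hours: float = 10, nonwear_run: int = 10
) -> bool:
    """A day is valid when wear time reaches the minimum number of hours."""
    return wear_minutes(day_counts, nonwear_run) >= min_wear_hours * 60


def include_participant(
    days: List[Sequence[int]],
    min_valid_days: int = 3,
    min_wear_hours: float = 10,
    nonwear_run: int = 10,
) -> bool:
    """Include a participant who has enough valid days."""
    valid = sum(is_valid_day(d, min_wear_hours, nonwear_run) for d in days)
    return valid >= min_valid_days


worn_day = [100] * 600 + [0] * 840      # 10 h of wear, then monitor removed
short_day = [100] * 599 + [0] * 841     # one minute short of 10 h
week = [worn_day, worn_day, worn_day, short_day]
```

With these illustrative data the participant is included under a 3-valid-day rule but excluded under the 4-day requirement used for item 9 of the quality assessment, which shows how the choice of criteria changes the analysed sample.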

Associations of MPA and/or MVPA and VPA with bone outcomes

The results for the vote count, conducted as a semi-quantitative alternative to a meta-analysis, are presented in Table 4. A more detailed reporting of each outcome at each site is given in Online Resource 3. Overall, there were 570 association analyses performed between a PA intensity and a bone outcome (all bone measurement methods: all anatomical sites), of which 33% (186/570) were statistically significant (p < 0.05). The chi-square tests provided very strong evidence that this proportion of significant associations differed depending on the PA intensity (3 × 2 χ² = 24.6, p < 0.001) and that it was significantly higher for VPA (44%: 101/228) than for MVPA (28%: 42/151, 2 × 2 χ² = 10.5, p = 0.002) and MPA (23%: 43/191, 2 × 2 χ² = 21.9, p < 0.001). From the within-study comparisons (where the PA intensity with the strongest association with the bone outcome received a count), the chi-square tests provided very strong evidence that the proportion of ‘strongest association’ counts differed by PA intensity (3 × 2 χ² = 86.6, p < 0.001) and that it was higher for VPA (39%: 90/228) than for MVPA (5%: 8/151, 2 × 2 χ² = 55.3, p < 0.001) and MPA (9%: 18/191, 2 × 2 χ² = 49.1, p < 0.001). The overall ‘all bone methods: all sites’ proportions of ‘statistically significant association’ total counts and within-study ‘strongest association’ counts for each intensity are displayed in Fig. 2. Repeating the vote count after the removal of poor-quality studies (a sensitivity analysis) did not influence the overall findings. The proportion of significant and most strongly associated counts for each intensity remained very similar, with VPA still having a higher proportion of ‘strongest association’ counts compared to the other intensities. Repeating the analyses using only studies that had reported all three intensities (MPA, MVPA and VPA) also obtained the same pattern of results.
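
As an illustration of how the stage 1 statistics relate to the reported counts, the following Python sketch computes a Pearson chi-square from the ‘significant/total’ counts given above. It assumes no continuity correction, which the text does not state; the function names are hypothetical, and the p-value helper covers only the 1 and 2 degrees of freedom needed here, where closed forms exist.

```python
import math
from typing import List, Tuple


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a contingency table (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value(stat: float, df: int) -> float:
    """Upper-tail chi-square probability for df = 1 or df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def rows(*groups: Tuple[int, int]) -> List[List[int]]:
    """Convert (significant, total) pairs to [significant, non-significant]."""
    return [[sig, total - sig] for sig, total in groups]


# Stage 1 counts reported in the text: (significant, total analyses).
vpa, mvpa, mpa = (101, 228), (42, 151), (43, 191)

omnibus = chi_square(rows(vpa, mvpa, mpa))      # 3 x 2 test, df = 2
mvpa_vs_vpa = chi_square(rows(vpa, mvpa))       # 2 x 2 test, df = 1
mpa_vs_vpa = chi_square(rows(vpa, mpa))         # 2 x 2 test, df = 1

# Bonferroni adjustment for the two follow-up tests: multiply p by 2.
adj_p_mvpa = min(1.0, 2 * p_value(mvpa_vs_vpa, df=1))
adj_p_mpa = min(1.0, 2 * p_value(mpa_vs_vpa, df=1))
```

Under these assumptions the three statistics come out close to the reported values of 24.6, 10.5 and 21.9, and the adjusted p-value for ‘MVPA vs VPA’ is close to the reported 0.002.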

Table 4 Results from the vote count for all included studies by imaging method (DXA, pQCT and QUS) and each anatomical site assessed. In stage 1, votes were counted based on whether each intensity was statistically significant (p < 0.05; 1 = yes). In stage 2, out of the significant intensities, only the intensity with the largest effect size (association) received a vote (only 1 count available out of the 2/3 intensities). Votes were counted for all analyses for each outcome included in a study (e.g. for the whole sample, boys and girls). When the value of association was the same for two intensities, votes were counted if significant in stage 1, but a stage 2 vote was not cast. When negative associations were observed, their significance was noted but again, no stage 2 vote was cast. Results are presented as the proportion of significant/most strongly associated counts out of the total number of counts available for each intensity (total counts are regardless of statistical significance), followed by the number of significant/most strongly associated counts and the number of total counts available for each intensity (% (n/N))
Fig. 2

Overall significant counts (stage 1) and ‘strongest within-study association’ counts (stage 2) expressed as a proportion of the total number of counts available for each intensity (total counts are the number of counts available, regardless of statistical significance; MPA, moderate physical activity; MVPA, moderate-to-vigorous physical activity; VPA, vigorous physical activity). In stage 1, votes were counted based on whether each intensity was statistically significant (p < 0.05; 1 = yes). In stage 2, out of the significant intensities within a study, only the intensity with the largest effect size (association) received a vote (only 1 count available out of the 2/3 intensities). Votes were counted for all analyses for each outcome included in a study (e.g. for the whole sample, boys and girls). When the value of association was the same for two intensities, votes were counted if significant in stage 1, but a stage 2 vote was not cast. When negative associations were observed, their significance was noted but again, no stage 2 vote was cast. p-values (vs VPA) represent the Bonferroni-adjusted p-values from the 2 × 2 chi-square tests for ‘MPA vs VPA’ and ‘MVPA vs VPA’ when the omnibus 3 × 2 chi-square test indicated that there was a significant difference between at least two of the three intensities. Significance was set at the 5% level
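The two-stage vote-counting rules described in the captions can be expressed as a short routine. This is a sketch of one reading of those rules, not the procedure as implemented by the authors: the function name, the tie tolerance and all numbers in the examples are illustrative assumptions (the near-identical effect sizes in the tie example only mimic the kind of case where no stage 2 vote was cast).

```python
ALPHA = 0.05  # significance level stated in the captions

def count_votes(analysis, tie_tolerance=0.002):
    """Return (stage-1 votes, stage-2 winner or None) for one association analysis.

    `analysis` maps an intensity label to (effect_size, p_value).
    `tie_tolerance` is an assumed threshold for treating two effects as equal.
    """
    significant = {name: effect for name, (effect, p) in analysis.items() if p < ALPHA}
    stage1 = set(significant)

    # Negative associations are noted in stage 1 but never receive a stage-2 vote.
    positive = sorted(
        ((effect, name) for name, effect in significant.items() if effect > 0),
        reverse=True,
    )
    if not positive:
        return stage1, None

    # No stage-2 vote when the two largest effects are (near-)identical.
    if len(positive) > 1 and positive[0][0] - positive[1][0] <= tie_tolerance:
        return stage1, None
    return stage1, positive[0][1]

# Hypothetical analyses, for illustration only
clear = {"MPA": (0.05, 0.20), "MVPA": (0.10, 0.03), "VPA": (0.18, 0.001)}
tied = {"MVPA": (0.109, 0.01), "VPA": (0.108, 0.01)}
negative = {"MPA": (0.02, 0.40), "VPA": (-0.21, 0.04)}

for label, analysis in (("clear", clear), ("tied", tied), ("negative", negative)):
    print(label, count_votes(analysis))
```

Summing the stage 1 sets and stage 2 winners over every analysis in every study would give the numerators of the proportions shown in Table 4 and Fig. 2; the denominators are the number of analyses available for each intensity, regardless of significance.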

DXA: all sites

Overall, there were 348 association analyses of DXA-derived bone outcomes (DXA: all sites), 39% (134/348) of which were statistically significant. The chi-square tests provided strong evidence that this proportion of statistically significant associations differed depending on the PA intensity (3 × 2 χ2 = 12.6, p = 0.002) and that it was significantly higher for VPA (50%: 68/136) than for MVPA (33%: 32/98, 2 × 2 χ2 = 7.0, p = 0.016) and MPA (30%: 34/114, 2 × 2 χ2 = 10.5, p = 0.002). From the within-study comparisons, the chi-square tests provided very strong evidence that the proportion of ‘strongest association’ counts differed by PA intensity (3 × 2 χ2 = 54.8, p < 0.001) and that it was higher for VPA (43%: 59/136) than for MVPA (5%: 5/98, 2 × 2 χ2 = 42.0, p < 0.001) and MPA (14%: 16/114, 2 × 2 χ2 = 25.4, p < 0.001).

At the whole body, 30% (7/23), 25% (6/24) and 32% (9/28) of counts were significant for MPA, MVPA and VPA, respectively. Of the 12 studies reporting bone outcomes at this site, six reported no significant associations with PA [39, 47, 56, 57, 66, 67]. Significant, positive associations were reported between MPA and BMC [53] and VPA and BMC [52, 55], with a significant, negative correlation also reported between VPA and BMC (r = −0.21, p < 0.05) [61]. All activity intensities (MPA, MVPA and VPA) were significantly associated with BMC and BMD in overweight/obese 8–12-year-olds with low adherence to the Mediterranean diet pattern (MDP) [54]. The MVPA and VPA β coefficients were very similar (BMC: β = 0.109 and 0.108 for MVPA and VPA; BMD: β = 0.185 and 0.183), so stage 2 counts were not determined. Tobias and colleagues [60] also found MPA, MVPA and VPA to be significantly associated with BMC, BMD and BA in 4457 children aged 11 years. In all instances, the regression coefficient was largest for MPA. The thresholds used to define activity intensities in that study were much higher (MPA 3600–6199 cpm, MVPA ≥ 3600 cpm, VPA ≥ 6200 cpm) than in other studies reporting strongest associations with VPA (MPA 2000–3999 cpm, VPA ≥ 4000 cpm in [55]; MVPA ≥ 2296 cpm, VPA ≥ 4012 cpm in [52]), and only 3–4 min/day of VPA were reported in comparison to 20–30 min/day [52, 55]. At the whole body, the proportion of counts most strongly associated with bone outcomes was 22%, 0% and 11% for MPA, MVPA and VPA, respectively (Table 4).

At the lumbar spine, 8% (1/13), 9% (1/11) and 24% (4/17) of MPA, MVPA and VPA counts were significant. Of the eight studies assessing bone outcomes at this site, five did not report any significant associations with PA [39, 47, 55, 66, 67]. One study [52] reported a significant, positive correlation between BMC and VPA, with another reporting significant associations between BMD and VPA, and BMD and MPA in 18-year-old boys and girls, respectively [40]. A longitudinal study found both MVPA and VPA from age 5 to 15 years significantly predicted lumbar spine BMC in boys (β estimate largest for VPA), but only VPA was significant in girls [65]. At the lumbar spine, the proportion of counts most strongly associated with bone outcomes was 8%, 0% and 24% for MPA, MVPA and VPA, respectively.

In comparison to the whole body and lumbar spine, a higher proportion of counts were significant for MPA (54% (7/13)), MVPA (33% (2/6)) and VPA (100% (18/18)) at the hip. Of the five studies reporting bone outcomes at this site, significant, positive associations were reported between BMD and VPA in 15-year-olds [47], and with BMD and impacts of 4.2–5.1 g and > 5.1 g (equivalent to running and jumping, counted as VPA) in 17.7-year-old adolescents [45]. Moderate-to-vigorous PA [65] and VPA [39, 47, 65] were also significantly associated with BMC, with VPA most strongly associated compared to MVPA for both boys and girls in the study assessing PA and BMC at ages 5, 8, 11, 13 and 15 [65]. In a cross-sectional study conducted in the same cohort [50], geometric indices obtained from hip structural analysis at age 5 were significantly correlated with MVPA in girls and VPA in both boys and girls. None of the significant MPA or MVPA counts was most strongly associated with outcomes at the hip whereas 94% of VPA counts were most strongly associated at this site. The amount of VPA reported in these studies ranged from around 2 to 40 min per day. Studies also assessed outcomes at the trochanter, intertrochanter and Ward’s area regions of the hip (counts for each in Table 4). Significant, positive associations were observed between BMD and MPA in boys [64], MVPA in boys and VPA in girls [41] and VPA in both boys and girls [42] at the trochanter, and between BMD and VPA [42, 47] and BMD and MPA in girls and MVPA in boys [41] at the intertrochanter. Shape variation in Ward’s area was significantly associated with MPA and MVPA (most strongly with MPA) in 9–10-year-old boys, but not girls [43].

At the femoral neck, 24% (8/33), 32% (6/19) and 63% (22/35) of total MPA, MVPA and VPA counts, respectively, were significant. Of the eleven studies assessing outcomes at this site, one did not report significant associations (including at the superolateral and inferomedial subregions) [64]. Significant, positive associations were reported between composite strength indices and BMC with VPA in 9–10-year-old boys and girls [55]. Significant associations were also reported for BMC and MPA and VPA (strongest for VPA) in 11–13-year-old boys [52]. A longitudinal study using boys from the same cohort (11–13 years at baseline) found VPA significantly predicted BMC over a 12-month period [67]. One study reported significant associations between BMC and MPA and MVPA (strongest for MPA), but not VPA in 10-year-olds [39]. In that study, a higher VPA cut-point of 6500 cpm was used compared to ~ 4000 cpm in the studies reporting significant, strongest associations with VPA, and only 2 min/day of VPA was reported compared to ~ 10–30 min/day in those studies. Significant associations were also reported between BMD and MPA [52], MVPA [41, 47, 67] and VPA [42, 47, 52, 67], with VPA being most strongly associated in the studies where MPA [52] and MVPA [67] were also significant. One study [47] used receiver operating characteristic curve analysis to assess the relationship between MVPA, VPA and BMD. Since more than 32 min/day of VPA was associated with increased BMD, compared to 78 min/day of MVPA, votes were counted in favour of VPA. A longitudinal study conducted in boys found that VPA (not MPA or MVPA) during the pubertal years significantly predicted BMC and BMD at 18 years [66]. Significant associations between BMD and MPA in both boys and girls, and VPA in boys [40] and impacts of 4.2–5.1 g and > 5.1 g (equivalent to running and jumping, counted as VPA) [45] were also reported in older adolescents aged ~ 18 years. Impacts of 4.2–5.1 g and > 5.1 g were also significantly associated with geometric and strength indices [45]. At the femoral neck, the proportion of counts most strongly associated with bone outcomes was 9%, 11% and 60% for MPA, MVPA and VPA, respectively.

Other sites assessed using DXA included the upper limbs, lower limbs, calcaneus and forearm (counts for each in Table 4). Both MVPA and VPA were significantly associated with BMC, BMD and BA (all β’s largest for VPA) in the upper limbs of 12-year-olds from the ALSPAC cohort. In the lower limbs, BMC, BMD and BA were significantly associated with MPA and MVPA, but not VPA (all β’s largest for MPA). Munoz-Hernandez et al. [54] reported significant associations between BMC and BMD with MPA and MVPA (same β, no stage 2 vote), but not VPA at the upper limbs and with MPA, MVPA and VPA at the lower limbs (BMC no stage 2 vote, BMD β largest for VPA), but only in those with low adherence to the MDP. Hasselstrom [48] assessed BMD of the calcaneus and distal forearm and reported significant associations with BMD for all intensities of activity (> 3000, > 4000, > 5200, > 6500, > 7000 and > 8200 cpm). The beta was largest for the > 6500 and > 7000 cpm thresholds at the calcaneus and at the forearm, and the beta was similar from the > 5200 cpm threshold onwards; therefore, VPA was deemed most strongly associated. One study [39] did not report any significant associations between forearm BMC and MPA, MVPA or VPA.

pQCT: all sites

Overall, there were 162 association analyses of pQCT-derived bone outcomes (pQCT: all sites), 16% (27/162) of which were statistically significant. The chi-square tests provided strong evidence that this proportion of significant associations differed depending on the PA intensity (3 × 2 χ2 = 11.6, p = 0.003) and that it was significantly higher for VPA (28%: 20/72) than for MPA (7%: 4/57, 2 × 2 χ2 = 9.1, p = 0.005), though only borderline significantly higher than for MVPA (9%: 3/33, 2 × 2 χ2 = 4.6, p = 0.063). From the within-study comparisons, the chi-square tests provided very strong evidence that the proportion of ‘strongest association’ counts differed by PA intensity (3 × 2 χ2 = 19.6, p < 0.001) and that it was higher for VPA (28%: 20/72) than for MVPA (3%: 1/33, 2 × 2 χ2 = 8.7, p = 0.007) and MPA (4%: 2/57, 2 × 2 χ2 = 13.2, p < 0.001). Several outcomes were reported at each site and are detailed in Online Resource 3.

At the distal tibia, only bone strength index (BSI) was significantly associated with VPA [51, 59]. At the tibial shaft, polar strength-strain index (SSIp) was significantly associated with MVPA in 15-year-olds [59] and both MVPA and VPA (strongest for VPA) in 11-year-olds [51]. Significant associations between VPA and cortical BMC, BA and periosteal circumference were observed at the mid-tibia in 1748 participants from the ALSPAC 15.5-year clinic [24]. In this study, significant negative associations were also observed between VPA and cortical BMC and endosteal circumference. At the 17-year ALSPAC clinic [46], high-impact activity > 4.2 g (equivalent to fast running, treated as VPA) was significantly associated with periosteal circumference, SSI and cross-sectional moment of inertia in boys. Although not significant in girls, there was a trend for high-impact activity to have a larger beta coefficient. At the radius, one study did not report any significant associations between MVPA or VPA and outcomes at the 4% or 65% sites [51]. Another reported significant correlations between MPA and total bone density and BSI at the 4% site and VPA and cortical area at the 65% site [39].

QUS: all sites

Overall, there were 60 association analyses of QUS-derived bone outcomes (QUS: all sites), 42% (25/60) of which were statistically significant. The chi-square tests provided moderate evidence that this proportion of significant associations differed depending on the PA intensity (3 × 2 χ2 = 7.1, p = 0.028) and that it was significantly higher for VPA (65%: 13/20) than for MPA (25%: 5/20, 2 × 2 χ2 = 6.5, p = 0.022), though not MVPA (35%: 7/20, 2 × 2 χ2 = 3.6, p = 0.116). Fisher’s exact tests were used for the within-study comparisons as the data violated one of the assumptions for chi-square tests (one cell count = 0). The Fisher’s exact tests provided very strong evidence that the proportion of ‘strongest association’ counts differed by PA intensity (3 × 2, p < 0.001) and that it was higher for VPA (55%: 11/20) than for MVPA (10%: 2/20, 2 × 2, p = 0.011) and MPA (0%: 0/20, 2 × 2, p < 0.001).
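The pairwise Fisher’s exact tests on the QUS ‘strongest association’ counts can be reproduced from the hypergeometric distribution. The sketch below is illustrative only: it assumes the common two-sided definition (summing all tables no more probable than the observed one) and a Bonferroni multiplier of 2, and the function name is not taken from the review.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability of k first-column counts in row 1
        return comb(col1, k) * comb(n - col1, row1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(col1, row1)
    observed = weight(a)
    # Sum over every table that is no more probable than the observed one
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)

# Rows: (strongest-association counts, remaining counts) out of 20 per intensity
vpa_vs_mvpa = min(1.0, 2 * fisher_exact_two_sided(11, 9, 2, 18))  # Bonferroni x 2
vpa_vs_mpa = min(1.0, 2 * fisher_exact_two_sided(11, 9, 0, 20))

print(round(vpa_vs_mvpa, 3), vpa_vs_mpa < 0.001)
```

Integer weights are compared before normalising, which avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.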

No significant associations were observed at the distal 1/3 radius [63]. At the calcaneus, MPA, MVPA and VPA were significantly associated with stiffness index (SI) in pre- and primary-school-aged children (both β’s largest for VPA) [44, 49], and in 10.8-year-old boys (β largest for MVPA), but not girls [62]. One study found all activity intensities (MPA, MVPA, VPA) were significantly associated with calcaneal broadband ultrasound attenuation (BUA; strongest for VPA) [44], with another [58] reporting significant associations between VPA and BUA, speed of sound (SOS) and bone quality index (BQI) in 10–12-year-old boys, but not girls. At the midshaft tibia, SOS was significantly correlated with MPA, MVPA and VPA in normal weight and overweight girls (10 years) and adolescents (15 years) combined (r largest for VPA at the non-dominant limb) [63]. Studies where VPA was most strongly associated with bone outcomes reported around 2–20 min/day of VPA.

Additional vote count by epoch length

The epoch length applied to accelerometer data has been shown to dramatically alter the PA data obtained. This is particularly prominent for VPA: in children, around four times more VPA is identified when activity is assessed using a 5-s epoch compared to a 60-s epoch [69]. To investigate whether the intensity with the highest proportion of counts most strongly associated with bone outcomes differed depending on epoch length, additional vote counting was conducted separating studies into those that had used a ≥ 60-s epoch and those that had used an epoch of 15 s or less. Regardless of whether ≥ 60-s or ≤ 15-s epochs were used, a higher proportion of counts were significant for VPA compared to MPA or MVPA (Online Resource 3) and a higher proportion of the total counts for VPA were identified as being most strongly associated with bone outcomes compared to the other activity intensities.

Discussion

This systematic review summarises the range of accelerometry data collection and processing methods used to estimate habitual PA in relation to bone outcomes in children and adolescents and, irrespective of the range of methods used, identifies whether a particular intensity of habitual PA (moderate (MPA/MVPA) or vigorous (VPA)) is more strongly associated with, and beneficial to, bone health in this population. Considerable heterogeneity in the accelerometry methods of reviewed studies was observed. Studies varied in terms of the monitor make and model used, wear criteria applied (definition of a valid day, non-wear time within a day, number of valid days required for inclusion), accelerometer output (raw or proprietary count-based), whether the output was averaged over an epoch, and if so, the length of epoch, and in the cut-points used to determine the activity intensity classifications. Regardless of the accelerometry methods employed, results were still indicative of a greater benefit of VPA over MPA/MVPA; however, the variability in accelerometer methods meant it was not possible to identify the precise amount of VPA (a key component of PA dose) required to benefit bone.

Habitually performed VPA was significantly and positively associated with several bone outcomes (bone mineral content and density, geometric and strength indices) at important load-bearing sites such as the hip and femoral neck. Associations between VPA and bone outcomes were often larger in comparison to MPA/MVPA, such that, for the same increase in the amount of time spent in each intensity, VPA would lead to greater gains in bone outcomes. Studies that conducted regression analyses where activity intensities were entered simultaneously into the model also demonstrated evidence of a threshold effect whereby lower intensities of activity no longer had any explanatory power once VPA/high-impact activity was included in the models [42, 45, 46]. However, variability in sample characteristics, the imaging method used to obtain bone outcome data, the anatomical sites assessed and range of bone outcomes reported at these sites, as well as differences in the ways in which accelerometer-derived habitual PA data was collected and processed, make it difficult to fully understand the relationship between VPA and bone. It is therefore not possible to identify the precise amount of VPA required to benefit bone health in this population.

In studies that observed significant, positive associations between habitual VPA and bone, the mean amount of time reportedly spent in VPA varied between around 2 and 40 min per day (Table 1). Large variability in the amount of reported VPA leads to a high level of uncertainty surrounding the recommended dose of bone-relevant PA in children and adolescents and prevents clear bone-specific activity recommendations from being made. Since the samples in these studies ranged from 5 to 18 years in age, differences in the amount of VPA reported could also be a reflection of the precipitous decline in habitual PA that occurs throughout adolescence [70]. However, even in studies with comparable sample characteristics, there was still considerable variability in the amount of VPA reported. For example, Sayers et al. [24], Janz et al. [65] and Gracia-Marco et al. [47] all described the VPA of 15-year-old boys and girls and reported around 3, 10 and 30 min per day, respectively. Differences were also observed between two studies that both analysed 5-year-old participants from the Iowa Bone Development Study. One reported 38 and 28 min of VPA per day in boys and girls [50] compared to only 13 and 10 min per day in the other [65]. Large variations in how accelerometer-derived measures of habitual activity are obtained, including (but not limited to) processing choices such as cut-point, epoch length and wear/non-wear criteria, are likely to contribute to these differences in VPA prevalence.

The studies included in the present review employed a diverse range of intensity thresholds to define MPA and/or MVPA and VPA. Vigorous PA was defined as being as low as > 1000 cpm to as high as > 6500 cpm, with many studies using a cut-point of ≥ 4000 or ≥ 4012 cpm (16/30 studies). Despite being designed and validated to reflect the same physiological intensity of activity (cardiovascular demand), the use of different cut-points to classify accelerometer outputs inevitably produces large differences in the estimates of activity behaviour [71, 72]. In addition to influencing the amount of time spent in VPA, differences in the cut-points used may have also influenced the intensity of activity identified as being most beneficial to bone outcomes. For example, Tobias et al. [60] found MPA, but not VPA to be significantly and most strongly associated with total body and lower limb BMC, BA, BMD and aBMC in 12-year-old children from the ALSPAC cohort. However, the cut-points used to define MPA and VPA were substantially larger (MPA, 3600–6199 cpm; VPA, ≥ 6200 cpm) than those used in other studies, and the MPA cut-point was similar in magnitude to how most other studies had defined VPA (≥ 4000 or ≥ 4012 cpm). In addition to the very high VPA threshold, the accelerometer output in this study was averaged over a 60-s epoch. As very high-intensity PA occurs in brief, sporadic episodes [73, 74], the averaging of PA data over a 60-s timeframe causes this activity to be misclassified as lower intensity activity and is likely the reason why only a small amount of VPA ≥ 6200 cpm was reported (~ 3 min/day).

With the exception of two studies that reported the number of peaks occurring each day within different impact bands from raw accelerometry data, the majority of reviewed studies reported accelerometer output (raw or count-based) over a range of epoch lengths. Epochs ranged from 5 to 120 s in duration, with 60 s being used most commonly in 11/30 studies (8/30 used 15 s, 4/30 10 s, 2/30 used 5 s). As epoch length increases, less time spent in VPA is reported due to the increased dilution of high-intensity activity amongst longer periods of lower intensity activity [71]. Significant differences in activity prevalence exist between shorter (5 s, 10 s, 15 s) and longer 60-s epochs [69, 72, 75] meaning it is not possible to compare the amount of activity accumulated between studies when such a large range of epoch lengths are used [75]. Variability in the length of epoch used therefore also contributes to the inability to recommend precisely how much time spent in VPA is required to be of benefit to bone. Whilst the shorter epochs of 5–15 s mean that activity data is averaged over a much smaller timeframe compared to 60 s and therefore less dilution of high-intensity activity is likely to occur, significant differences in the time spent in VPA have still been detected between epochs of 1, 5 and 15 s [71]. This is likely due to the fact that the vast majority (93%) of VPA bouts in children last for less than 10 s [74], with the mean duration of VPA bouts being only 3 s [73, 74]. Therefore, even when shorter epochs are used, over-smoothing of the short, sporadic, high-impact events most relevant to bone will still occur as the epoch length remains longer than the bout of activity being measured. Consequently, important characteristics of habitual VPA will almost certainly be misclassified as lower intensity activity even if studies use 15-s or 10-s epochs.
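The dilution effect described above can be illustrated with a small synthetic example. The sketch below is not drawn from any reviewed study: the 1-s count values, the bout schedule and the linear scaling of a 4000 cpm cut-point to each epoch length are all assumptions chosen purely to make the mechanism visible.

```python
VPA_CPM = 4000  # assumed VPA cut-point in counts per minute

def build_signal(length_s, bouts, baseline=10, burst=100):
    """Synthetic 1-s counts: `baseline` everywhere, `burst` during each (start, duration) bout."""
    signal = [baseline] * length_s
    for start, duration in bouts:
        for t in range(start, start + duration):
            signal[t] = burst
    return signal

def vpa_seconds(signal, epoch_s, cutpoint_cpm=VPA_CPM):
    """Seconds classified as VPA after summing 1-s counts into epochs of `epoch_s` seconds."""
    threshold = cutpoint_cpm * epoch_s / 60  # cut-point scaled linearly to the epoch length
    total = 0
    for i in range(0, len(signal) - epoch_s + 1, epoch_s):
        if sum(signal[i:i + epoch_s]) >= threshold:
            total += epoch_s
    return total

# Ten 3-s bursts plus a few longer bouts within 10 minutes (74 s of burst activity in total)
bouts = [(start, 3) for start in range(0, 300, 30)]
bouts += [(300, 6), (360, 6), (400, 12), (480, 20)]
signal = build_signal(600, bouts)

results = {epoch: vpa_seconds(signal, epoch) for epoch in (1, 5, 15, 60)}
print(results)  # {1: 74, 5: 40, 15: 15, 60: 0}
```

In this toy signal the 3-s bursts disappear as soon as a 5-s epoch is applied, and no VPA at all is detected with a 60-s epoch. With denser or more intense bouts, longer epochs can instead inflate VPA time, so the direction and size of the bias depend on the activity pattern as well as on the epoch length.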

The number of studies using accelerometers to measure PA in children and adolescents has greatly increased; however, a lack of standardisation means the methods employed are very diverse, reducing the comparability of findings [23]. In addition to cut-points and epoch length, comparability between findings of reviewed studies is further hindered by the fact that the majority used monitors that output data in proprietary counts. These are device and manufacturer-specific, and since no standard exists for producing these units of measure across device manufacturers, it continues to be unclear as to what a ‘count’ means, both physically and physiologically [76]. Studies also varied greatly in terms of wear criteria and only a small number (9/30) required participants to have a minimum of 4 valid days, with at least 10 h of recording per day, to be included. Since the number of days needed to achieve a reliability of 80% ranges from 4 to 9 in children and adolescents [34] and at least 10 h of recording per day is required to satisfy minimum wear time criteria to monitor daily exposure to PA [22], this raises questions as to whether the PA reported in included studies is representative of children’s habitual PA. Furthermore, not all studies specified whether the days sampled or included were week and/or weekend days, and since activity behaviour varies between these, both should be included [34]. The majority of studies used hip-worn accelerometers. Whilst these are thought to provide the most accurate estimation of activity intensity [77], wear compliance is significantly lower than with wrist-worn monitors [78]. Only two studies used a wrist-worn accelerometer; however, wrist-worn monitors should be considered in future studies to ensure greater wear compliance and more representative data [78].
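The wear criteria discussed above (at least 10 h of recording per day, at least 4 valid days, and both weekday and weekend days represented) can be written as a simple inclusion rule. This is a hypothetical sketch: the function name, the input format and the example participants are assumptions, and individual studies applied different combinations of these thresholds.

```python
from datetime import date

MIN_MINUTES_PER_DAY = 10 * 60  # at least 10 h of recording per day
MIN_VALID_DAYS = 4             # lower bound of the 4-9 days cited for 80% reliability

def meets_wear_criteria(wear_minutes_by_day, require_week_and_weekend=True):
    """Check one participant's record, given a mapping of date -> minutes of wear."""
    valid_days = [
        day for day, minutes in wear_minutes_by_day.items()
        if minutes >= MIN_MINUTES_PER_DAY
    ]
    if len(valid_days) < MIN_VALID_DAYS:
        return False
    if require_week_and_weekend:
        has_weekend = any(day.weekday() >= 5 for day in valid_days)  # Saturday or Sunday
        has_weekday = any(day.weekday() < 5 for day in valid_days)
        return has_weekend and has_weekday
    return True

# Hypothetical participants (1 January 2024 was a Monday)
weekdays_only = {date(2024, 1, d): 700 for d in (1, 2, 3, 4)}
too_few_days = {date(2024, 1, 4): 650, date(2024, 1, 5): 640,
                date(2024, 1, 6): 720, date(2024, 1, 7): 500}
mixed_week = {date(2024, 1, d): 660 for d in (3, 4, 5, 6)}

for name, record in (("weekdays_only", weekdays_only),
                     ("too_few_days", too_few_days),
                     ("mixed_week", mixed_week)):
    print(name, meets_wear_criteria(record))
```

Relaxing `require_week_and_weekend` shows how a participant with four valid weekdays would be retained under a rule that does not specify the type of day, which is one way sample composition can differ between otherwise similar studies.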

The variability of accelerometer data collection and processing methods used in the reviewed studies makes it likely that the VPA reported is not reflective of levels habitually performed by children and adolescents. However, the lack of a gold standard comparison makes it impossible to know which methods provide the most valid estimates of activity [71]. Due to the intermittent, transient nature of children’s PA patterns [73, 74] and the fact that short, dynamic bursts of high-intensity activity are required to initiate osteogenesis [27], the use of shorter epochs (such as 1 s) should be explored when investigating bone-relevant PA in free-living situations. Since jumping activities that are of benefit to bone generally last less than 1 s in duration, others have also suggested the use of the raw acceleration signal [79]. Two of the included studies [45, 46] conducted in the ALSPAC cohort used raw acceleration and computed the number of peaks that occurred within various impact bands using custom-designed code. High-impact PA > 4.2 g (jumping and running > 10 km/h) was identified as being most beneficial to hip BMD and structure in adolescents [45]. Raw acceleration is more reflective of the ground reaction forces experienced in everyday life and is better able to capture brief, sporadic PA episodes than epoch data [80]. Alternative accelerometer outputs that count the peaks within impact categories [45, 46] or that quantify daily loading into a score based on the osteogenic index [81] from raw accelerometry data are therefore likely to be more suited to evaluating PA in relation to bone health than currently used methods. However, limited information is currently available to interpret and infer activity type from raw metrics, which is an important characteristic for prescribing doses of activity. There are also several analytic and logistic challenges regarding the transmission and storage of large volumes of data and appropriate modelling methods, with raw-data-based analytical methods still in the process of being developed and optimised [82].
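Counting peaks within impact bands from a raw acceleration trace can be sketched in a few lines. The code below is an illustration of the general idea only and is not the custom code used in the ALSPAC studies [45, 46]: the simple local-maximum rule, the treatment of band edges and the example trace are assumptions, and a real pipeline would also need steps such as calibration and filtering that are omitted here.

```python
# Impact bands in g, following the two upper bands quoted in the text
IMPACT_BANDS = {
    "4.2-5.1 g": (4.2, 5.1),
    "> 5.1 g": (5.1, float("inf")),
}

def count_impact_peaks(acceleration_g, bands=IMPACT_BANDS):
    """Count local maxima of a raw acceleration trace (in g) falling in each impact band."""
    counts = {label: 0 for label in bands}
    for i in range(1, len(acceleration_g) - 1):
        value = acceleration_g[i]
        # A sample is a peak if it rises above its predecessor and does not rise further;
        # a flat-topped peak is therefore counted once, at its first sample.
        is_peak = acceleration_g[i - 1] < value >= acceleration_g[i + 1]
        if not is_peak:
            continue
        for label, (low, high) in bands.items():
            if low < value <= high:  # assumed: lower edge exclusive, upper edge inclusive
                counts[label] += 1
    return counts

# Hypothetical trace with three peaks in the bands and one smaller peak (3.9 g) below them
trace = [1.0, 1.2, 4.5, 1.1, 0.9, 5.6, 5.6, 1.0, 3.9, 1.0, 4.3, 4.8, 1.0]
peak_counts = count_impact_peaks(trace)
print(peak_counts)  # {'4.2-5.1 g': 2, '> 5.1 g': 1}
```

Because each peak is evaluated at the sampling resolution of the device, no epoch averaging takes place and sub-second impacts such as a single jump landing are retained.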

Several previous reviews that have included observational studies investigating PA in relation to bone have mostly included studies that used self-reported methods to assess habitual activity behaviour. A strength of the present review is that all included studies had objectively assessed habitual PA using accelerometry, which overcomes several limitations of self-report methods, particularly when the population of interest is children. A more recent systematic review that focused on accelerometer-derived PA in relation to bone reported that MPA and VPA were the PA intensities that positively influenced bone outcomes in children and adolescents [25]. However, that review did not compare the magnitude of associations or assess whether there was evidence of a greater benefit of one intensity over the other. The present review included studies that had performed analyses with activity data stratified by intensity (MPA/MVPA and VPA) and examined the magnitude of associations within each study, which allowed the contributions of higher intensity activity (which is reported as being most relevant to bone) to be assessed in more detail. Including all types of bone outcomes from several imaging methods meant that information on important indices of bone health that are not obtained through DXA could also be reported. However, studies that used three-dimensional techniques such as pQCT, which distinguishes between trabecular and cortical bone and is able to assess important geometric indices of bone, were few in number in comparison to DXA. The variability in the methods used to image bone, along with the anatomical sites assessed and range of outcomes reported at these sites, as well as the distinct heterogeneity in the accelerometer data collection and processing methods used, meant it was not appropriate to conduct a meta-analysis. The vote-counting procedure that was conducted as a semi-quantitative alternative is, however, limited by the fact that it does not provide an estimate of the magnitude of an effect across the studies reviewed and is only able to identify whether or not there is evidence of an effect [83, 84]. The procedure also does not take into account study size, with larger studies that have greater statistical power being treated the same as smaller studies with less power, which may have introduced some bias to the results [83]. However, this approach provides an interpretable way of summarising study findings when a meta-analysis is not possible [83, 84]. The requirement for included studies to have analysed both MPA and/or MVPA and VPA may also have meant that any studies only assessing VPA in relation to bone outcomes were not included. However, inclusion of both intensities allows the potential benefit of one over the other to be observed.

Since short, dynamic bursts of activity are particularly important for bone health and are likely to have been misrepresented in studies assessing the associations between habitual PA and bone, there is a need for studies to be conducted that investigate the use of shorter 1-s epochs that are better able to identify more sporadically performed, high-intensity activity [85] or that use raw acceleration (where no epoch is applied) that has the resolution to capture impact peaks within the data [80] to ensure that this type of activity is captured more fully. Improved bone-specific approaches to PA measurement will allow for a better understanding of important components (amount and intensity) of the dose–response relationship between PA and bone outcomes, which in turn will inform the design of PA interventions that aim to improve bone health in this population. Whilst stratifying for VPA in the present review allowed the independent contributions of this intensity to be explored, it is still a relatively broad category that includes both running and jumping activities, which differ in terms of impact magnitude. Running is classified as moderate-impact activity and jumping is a high-impact activity that has the potential to initiate a greater osteogenic response [27]. Whilst the magnitude of the accelerometry output is frequently classified in research according to the cardiovascular demands of activity, when osteogenic characteristics of PA are of interest, classifying accelerometer output in terms of loading (i.e. impact), which more closely reflects the physiological mechanisms underpinning bone adaptation [27], is likely to be more informative and would allow the osteogenic response of activities that differ in terms of impact magnitude to be more precisely examined. There is also a need to increase the comparability of findings between studies. Since methods for analysing raw acceleration data are still being developed, validated and optimised [82], averaging of raw data over shorter 1-s epochs using free-to-access, open-source software and investigating associations with a number of PA intensities (instead of MPA/MVPA and VPA, which were designed to reflect steady-state aerobic intensities of activity) may present a more viable, readily accessible method for improving the monitoring and identification of bone-specific PA intensities and comparability of findings between studies.

In conclusion, whilst there is evidence to suggest a greater benefit of VPA over MPA/MVPA to bone outcomes in children and adolescents, at present, it is not possible to discern the precise amount of VPA required to be of benefit in this population. This is due to the considerable variation in the methods used to obtain accelerometer data, which greatly impact on the amount of VPA reported. Since there is currently no consensus for accelerometer methodology, it is unclear which methods most accurately reflect bone-specific activity habitually performed by children and adolescents. Future research needs to investigate whether the use of shorter epochs allows for more of the sporadic, high-impact activity performed by children and adolescents to be identified and whether more specific, bone-relevant intensities of activity that focus on impact and loading characteristics from raw accelerometry data should be explored and recommended over traditionally used classifications. A data-driven approach that identifies the intensities of free-living activity that are most strongly associated with bone health outcomes may be more informative than relying on and investigating the associations with pre-defined intensity classifications that have been calibrated against measures of energy expenditure and are more relevant to cardiovascular and metabolic health outcomes, as opposed to bone.