Introduction

Determining maturity and understanding growth in a child is critical for medical and psychosocial purposes. Assessing bone age is important to investigate whether the maturity of bones is occurring at the same rate as the chronological ageing process. Furthermore, bone age assessment has a role in forensic and legal investigations when the individual’s chronological age is in doubt. For example, in asylum seekers and unaccompanied minors without valid documents to prove their ages [1], it is important to assess bone age using a reliable and suitable method [2]. Incorrectly assessing a child as an adult leaves the child with limited access to education, healthcare and other support provided to children.

There are two approaches widely used to determine bone age from a left hand radiograph: the Greulich and Pyle (G&P) and Tanner and Whitehouse (TW) methods [3, 4]. The population on which the G&P standard atlas was based comprised North American Caucasians of good socioeconomic status. The assessment process is typically based on comparing a hand-wrist radiograph of a child with the age-matched standard radiographs contained in the atlas. The G&P method depends on comparing overall maturational status; it is known to be straightforward and quick and is therefore widely used. In contrast, the TW method depends on assessing and scoring the skeletal maturity of each individual bone of the hand and hence takes longer than the G&P method. Since the establishment of the G&P atlas, many studies have been conducted in different parts of the world to determine whether it is applicable to different populations. This question is important, particularly given the increasing legal and illegal influx of immigrants to certain parts of Europe. This systematic review and meta-analysis aims to provide a better understanding of the applicability of the G&P atlas to children and adolescents who are of a different population from the original standard.

Materials and methods

Search strategy

A systematic search of the MEDLINE, Embase and Cochrane databases was conducted. We searched MEDLINE using the keywords ((Greulich and Pyle)) OR Greulich Pyle, ((bone age assessment OR bone age determination)) AND left hand and refined the search to include articles in English published between 1st January 1959 and 15th February 2017. No free text was used in this search. For Embase, we used the term (Greulich and Pyle) and refined the search to include articles in English published between 1st January 1959 and 15th February 2017. We also searched the Cochrane library using the keywords (Greulich and Pyle) and the MeSH term (Age Determination by Skeleton). This search was likewise refined to include articles in English published between 1st January 1959 and 15th February 2017. Each study’s title and abstract were screened to determine whether it presented data correlating bone age assessed by the G&P method with chronological age. The full text was retrieved when the reviewers could not decide on the study’s eligibility from the title and abstract alone. The following exclusion criteria were then applied:

  1. Health status of participants could not be confirmed from the article, or participants had developmental disorders or were subjected to nutritional supplementation (these represent unhealthy children expected to show delayed or advanced bone age)

  2. Using a modified method of G&P and/or using modalities other than conventional radiography

  3. Full text not available within the resources available to the reviewers

  4. Full text not in English

  5. Review articles

  6. Mean difference between bone age (BA) and chronological age (CA) not reported and not calculable by the reviewers from the study results presented

The search was independently carried out by two reviewers (KA and ACO), followed by a consensus meeting to agree the final selection of studies for inclusion in this review.

Quality assessment

Two reviewers (KA and ACO) independently assessed the quality of included studies using the tool developed by the National Institute for Health and Care Excellence (NICE, Appendix G) [5]. Discrepancies were resolved by discussion. The tool considers five aspects of a study: population, method of participant selection, outcomes, analysis and generalisability of the study. An overall quality grading is then given to each study for internal validity (IV), and a separate grading for external validity (EV), as follows:

  • ++ All/most of the checklist criteria have been fulfilled and the conclusions are unlikely to alter.

  • + Some of the checklist criteria have been fulfilled; the conclusions are unlikely to alter even when they have not been fulfilled.

  • − Few or no checklist criteria have been fulfilled, and the conclusions are likely or very likely to alter.

Data extraction

A single reviewer (KA) extracted and recorded the following data from eligible studies:

  1. Sample size (males and females)

  2. Ethnicity or country of origin

  3. Mean difference and standard deviation (SD) between bone age and chronological age (BA-CA)

  4. Mean and SD of bone age

  5. Mean and SD of chronological age

  6. Authors’ conclusions

  7. Applicability of the standard

Given the review question, studies were divided into four groups based on major ethnic groups: African, Asian, Caucasian and Hispanic. Data for each major ethnic group were summarised and analysed separately. Some studies reported the place/country from which participants were recruited, and in such cases, the study was grouped under the major ethnicity of that country. The mean differences between BA and CA are to be interpreted as follows: a positive value indicates that the child’s bone age exceeds the child’s chronological age and a negative value indicates delayed bone age compared to chronological age.
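The sign convention described above can be illustrated with a minimal sketch. The function names and the example child are hypothetical and are used only to show how a BA-CA value is read; they are not taken from any included study.

```python
def bone_age_difference(bone_age_years, chronological_age_years):
    """Return BA-CA in years (positive = advanced, negative = delayed)."""
    return bone_age_years - chronological_age_years


def describe(diff_years):
    """Translate a BA-CA value into the wording used in this review."""
    if diff_years > 0:
        return "advanced"
    if diff_years < 0:
        return "delayed"
    return "equal"


# Hypothetical child: bone age 9.2 years, chronological age 10.0 years.
diff = bone_age_difference(9.2, 10.0)
print(describe(diff))  # delayed
```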

Additionally, we defined four categories to reflect the applicability of the G&P standard to the studied population as follows: (a) applicable, (b) not applicable (determined by the authors’ use of words identical or similar to “applicable” or “not applicable”, respectively, in the study’s discussion or conclusion), (c) needs some modification (authors use phrases such as, “can be used with caution” or when the standard was found to be applicable to a certain age group but not others) and (d) not clear (when the study failed to mention whether the standard was applicable, not applicable or needed modification).

Statistical analysis

Random-effects meta-analyses by ethnicity (African, Asian, Caucasian and Hispanic) and sex were conducted using R software [6]. An overall meta-analysis of all ethnicities was also performed. Additionally, meta-regression with covariates (sex and ethnicity as explanatory variables) was performed. Yearly interval sub-analysis of Asians aged 6 to 17 years and Caucasians aged 10 to 17 years was carried out in males and females. Other ethnicities were excluded from interval sub-analysis as the age groups were not consistent between studies.

In total, 50 meta-analyses were performed using mean differences and standard deviations as summary statistics for the difference between bone age and chronological age. When a study examined more than one ethnicity, each ethnicity was treated as a separate study (only for the meta-analysis). Heterogeneity was assessed on a scale from 0% (no heterogeneity) to 100% (maximum heterogeneity) using the I-squared statistic. A funnel plot was produced to assess bias or the presence of any systematic heterogeneity.
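For readers unfamiliar with the approach, the sketch below shows one common way a random-effects pooled mean difference and the I-squared statistic can be computed. It is an illustration only: the analyses in this review were run in R, the between-study variance estimator used there is not stated, and the DerSimonian-Laird estimator, the function name and the input values below are all assumptions.

```python
import math


def random_effects_meta(means, ses):
    """Pool per-study mean differences (BA-CA, years) with a random-effects model.

    Assumes the DerSimonian-Laird estimator for the between-study variance.
    means: per-study mean differences
    ses:   per-study standard errors (SD / sqrt(n) for each study)
    """
    k = len(means)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I-squared, in percent
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)  # 95% CI
    return {"pooled": pooled, "ci": ci, "tau2": tau2, "i2": i2}


# Hypothetical inputs for three studies (not data from this review).
result = random_effects_meta([0.10, -0.20, 0.35], [0.10, 0.12, 0.15])
print(result)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the inverse-variance weights and the pooled estimate matches the fixed-effect estimate.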

Results

This systematic review identified 907 studies, of which 45 were eligible for inclusion (Fig. 1). Four additional studies were identified from the reference lists of the initial 45 extracted papers; therefore, the total number of included studies was 49 [7–55], of which 27 (55%) were related to Caucasian populations. The total number of children in the included studies was 21,081 (11,445 boys), comprising 11,194 Caucasians (5922 boys), 6776 Asians (3731 boys), 1705 Africans (1073 boys) and 1406 Hispanics (781 boys). As summarised in Table 1, there was minimal risk of bias for internal validity alone in one study [33], for external validity alone in five studies [17, 18, 25, 40, 50] and for both internal and external validity in 12 studies [11, 20,21,22, 27, 30, 35, 42, 43, 45, 51, 54]. There was significant risk of bias for internal validity alone in no studies, for external validity alone in two studies [8, 46] and for both internal and external validity in two studies [23, 29]. Sources of bias in these four studies, which require that their results be interpreted with caution, include:

  1. Absent documentation of statistical criteria such as p values and/or observer reliability [8, 23]

  2. Insufficient detail about the source of the study population [29]

  3. Non-representative samples [46]

Fig. 1 Flow chart to show article selection process

Table 1 Quality assessment of the included studies (after agreement between the two assessors)

Studies included in this systematic review reported the mean difference between bone age and chronological age in different forms. Twenty-nine studies (60%) [8,9,10,11,12,13,14, 18, 19, 22, 24, 26,27,28, 30, 35, 36, 38, 41, 42, 44, 45, 48,49,50,51,52,53,54] presented the mean difference for each year of age for each sex. In such cases, the maximum delay and advancement in bone age was extracted. Twelve studies [15, 17, 20, 25, 31, 33, 34, 37, 39, 40, 46, 47] divided their sample into subgroups, where each subgroup contains up to five age groups, e.g. children aged between 1 and 5 years old. For each subgroup, the overall mean difference for each sex is reported. Eight studies [7, 16, 21, 23, 29, 32, 43, 55] only reported the overall mean difference between bone age and chronological age, limiting the applicability of their results to individual age groups. Data relating to ethnicity or country of origin, sample size, mean BA-CA and the authors’ conclusions are summarised for each study in Tables 2, 3, 4, and 5.

Table 2 Summary of studies that assessed the reliability of the G&P atlas in Caucasian children
Table 3 Summary of studies that assessed the reliability of the G&P atlas in African children
Table 4 Summary of studies that assessed the reliability of the G&P atlas in Asian children
Table 5 Summary of studies that assessed the reliability of the G&P atlas in Hispanic children

Meta-analysis based on ethnicity

  1. Caucasian females: Fifteen studies were included in the meta-analysis. These 15 studies presented moderate heterogeneity (I-squared 76%, Fig. 2) but did not show any statistically significant results, with overall mean difference BA-CA of 0.13 years (95% CI -0.17, 0.43).

  2. Caucasian males: Seventeen studies were included in the meta-analysis. These 17 studies presented low heterogeneity (I-squared 22%, Fig. 2) and did not show any statistically significant results, with an overall mean difference BA-CA of -0.10 years (95% CI -0.24, 0.04).

  3. African females: Only three studies were included in the meta-analysis. The three studies were homogeneous (I-squared 0%, Fig. 3) and showed statistically significant results, with overall mean difference BA-CA of 0.37 years (95% CI 0.04, 0.69).

  4. African males: Only five studies were included in the meta-analysis. The five studies presented moderate heterogeneity (I-squared 78%, Fig. 3) but did not show any statistically significant results, with overall mean difference BA-CA of 0.62 years (95% CI -0.01, 1.26).

  5. Asian females: Only nine studies were included in the meta-analysis. These nine studies presented low to moderate heterogeneity (I-squared 27%, Fig. 4) but did not show any statistically significant results, with overall mean difference BA-CA of -0.10 years (95% CI -0.32, 0.12).

  6. Asian males: Ten studies were included in the meta-analysis. The studies were highly heterogeneous (I-squared 82%, Fig. 4) but did not show any statistically significant results, with overall mean difference BA-CA of 0.15 years (95% CI -0.30, 0.59).

  7. Hispanic females: Only two studies were included in the meta-analysis. The two studies presented no heterogeneity (I-squared 0%, Fig. 5) and did not show any statistically significant results, with overall mean difference BA-CA of 0.19 years (95% CI -0.23, 0.61).

  8. Hispanic males: Only three studies were included in the meta-analysis. The three studies presented low heterogeneity (I-squared 11%, Fig. 5) but did not show any statistically significant results, with overall mean difference BA-CA of -0.11 years (95% CI -0.41, 0.19).
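The significance statements in the list above can be cross-checked from the reported intervals: a pooled mean difference is significant at the 5% level when its 95% CI excludes zero. The short sketch below applies that rule to the figures as reported; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Pooled BA-CA estimates (years) with 95% CI bounds, as reported in the list above.
reported = {
    "Caucasian females": (0.13, -0.17, 0.43),
    "Caucasian males": (-0.10, -0.24, 0.04),
    "African females": (0.37, 0.04, 0.69),
    "African males": (0.62, -0.01, 1.26),
    "Asian females": (-0.10, -0.32, 0.12),
    "Asian males": (0.15, -0.30, 0.59),
    "Hispanic females": (0.19, -0.23, 0.61),
    "Hispanic males": (-0.11, -0.41, 0.19),
}


def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


significant = [
    group for group, (_, lower, upper) in reported.items()
    if excludes_zero(lower, upper)
]
print(significant)  # ['African females']
```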

Fig. 2 Forest plot of Caucasians (females and males)

Fig. 3 Forest plot of Africans (females and males)

Fig. 4 Forest plot of Asians (females and males)

Fig. 5 Forest plot of Hispanics (females and males)

In the meta-regression, the coefficient for Africans was statistically significant (p < 0.05) (Supplementary Table 2).

Meta-analyses by yearly interval (see also Supplementary Tables 3 to 6 and Supplementary Figs. 1 to 5)

For Caucasian males, seven studies were included [9, 19, 27, 30, 35, 41, 52]. These studies did not show any statistically significant results. The mean difference BA-CA ranged from -0.32 years (at 13 years old) to 0.44 years (at 17 years old). For Caucasian females, six studies were included [9, 27, 30, 35, 41, 52]. These studies did not show any statistically significant results, with mean difference BA-CA ranging from -0.20 (at 10 years old) to 0.34 (at 14 years old).

For Asians, five studies were included [24, 28, 38, 51, 53]. The studies did not show any statistically significant results in females, with mean BA-CA ranging from -0.27 (at 6 years old) to 0.50 years (at 15 years old). In males, however, the studies showed statistically significant results for the following ages:

  • Six years: overall mean difference BA-CA of -1.08 years (95% CI -1.49, -0.67)

  • Seven years: overall mean difference BA-CA of -1.35 years (95% CI -1.85, -0.85)

  • Eight years: overall mean difference BA-CA of -1.07 years (95% CI -1.97, -0.17)

  • Nine years: overall mean difference BA-CA of -0.80 years (95% CI -1.43, -0.18)

  • Seventeen years: overall mean difference BA-CA of 0.50 years (95% CI 0.08, 0.93)

Based on the results of the yearly interval meta-analysis, we produced graphs for Asians and Caucasians of both sexes (Fig. 6), which show BA according to our meta-analysis compared to BA as assessed by the G&P atlas.

Fig. 6 G&P bone age after adjustment based on meta-analysis (females and males)

Discussion

Bone age assessment is a frequently employed and (in the clinical setting) useful diagnostic technique. Its utility in assessing the age of immigrants and asylum seekers is less secure. Figures from the European Commission estimated that in 2016, about 95,000 unaccompanied minors migrated to Europe, of whom more than half were Asians [1]. Although there are no exact figures, many of these immigrants were without valid documents to prove their age. Being unable to prove age, or being incorrectly assessed as an adult, can restrict a child's access to rights such as healthcare and education [56] granted by law in European countries. Hence, it is important that reliable age estimation methods are used.

Given concerns about the reliability of the G&P atlas for different ethnic populations, we considered it important to ascertain its applicability to healthy children. Bias in studies can result in poor reproducibility and/or lead to distorted results and wrong conclusions. However, in this systematic review, the results of the four studies with high risk of bias [8, 23, 29, 46] had little impact on (the statistical significance of) our results. This is because the populations of these studies contributed less than 5% of the total included population, and only two of them [8, 29] were included in the meta-analysis, which reduced their impact on sample size and results. A funnel plot shows the absence of a large study with high power, as most of the studies are scattered toward the bottom; however, minimal risk of publication bias was observed, with three studies falling outside the funnel (Supplementary Fig. 6) [47, 48, 52].

The G&P atlas appears to be applicable to Caucasians, although some recent studies (included in the meta-analysis) have reported that bone age is advanced compared to chronological age in girls up to 13 years old and in boys aged 10 years and above, possibly highlighting the fact that children nowadays are maturing faster than when the atlas was established [32, 42]. Calfee et al [32] assessed the bone age of predominately Caucasian American adolescents (where the G&P atlas was developed). Their skeletal maturation exceeded their chronological age indicating advanced bone age. Perhaps this should not be surprising as Himes [57] reported that skeletal maturation increases by about 0.22 to 0.66 years per decade.

This systematic review and meta-analysis showed no significant difference between BA and CA in Caucasians, which indicates that the G&P atlas is applicable to this group. This is in line with an earlier meta-analysis conducted by Serinelli et al [58], in which no significant difference between BA and CA was found. Note that Serinelli et al included a smaller number of studies, reported only the overall mean difference between BA and CA and did not account for individual age groups.

Concerning the Asian population, three studies recruited Asians living in America [17, 31, 47] while the remaining 17 studies were all carried out in Asia. It seems that skeletal maturation does not conform to the G&P standard at least for some of those who live in East and South Asia. In boys, delay in skeletal maturity during early and middle childhood was followed by advancement during adolescence. Our meta-analysis confirms that there are significant differences between BA and CA in Asian males in two age categories: those aged 6 to 9 years and those aged 17 years. These differences are larger than the standard deviations reported in the G&P atlas for the corresponding age group (± 0.77, ± 0.84, ± 0.90 years at age of 6, 7, 8, and 9 years, respectively), which may have an impact on patient diagnosis and management. In the clinical context, a healthy Asian boy in early childhood could be misdiagnosed as having delayed bone age when using the G&P atlas. The significant advancement in BA compared to CA in Asian males at age 17 is important because this is a critical age in the forensic/legal context, with the individual judged by adult standards in certain legal instances [59].

The G&P standard also seems to be imprecise for Africans. Our meta-analysis of three papers [17, 20, 47] showed significant advancement in bone age of females at all ages (p < 0.01). Results from meta-regression with covariates support this difference with BA in Africans being statistically different (Fig. 3). Although our meta-analysis did not show significant difference between BA and CA in African males, some studies reported significant advancement (p < 0.01) in adolescence among African American males [15, 17, 47]. Concerning those living in Africa, some studies have shown retardation of bone age among males and females [23, 25, 36]. It is difficult to attribute these variations between Africans only to differences in socioeconomic status, as they were not reported across all studies.

In contrast, the G&P standard appears appropriate for the Hispanic population until adolescence. Our meta-analysis shows no significant difference between BA and CA, although only three studies were included [17, 18, 47]. However, Zhang et al reported that the G&P atlas significantly overestimated age in males aged between 10 and 13 years [31].

In the current review, a final analysis was performed combining Asians and Hispanics in order to compare our results to those of Serinelli et al, who used the Cavalli-Sforza classification of ethnicity [60], in which Asians and Hispanics are under one ethnic group (Mongoloid). Our meta-analysis of Asians and Hispanics combined, for both females and males, showed no significant results (Suppl. Fig. 6). This is in contrast to Serinelli’s meta-analysis, in which the G&P atlas significantly overestimated chronological age [58]. However, Serinelli et al included only three papers for the Mongoloid population: one related to the Asian population and two to the Hispanic population. One of these latter two studies [61] was excluded from the current systematic review because it included unhealthy children. We therefore believe our results to be more robust.

The major limitation identified in this review is the difficulty in separating ethnicity from socioeconomic status. Relatively few studies reported the socioeconomic status of their sample [9,10,11,12, 20, 22, 26, 27, 30, 31, 38, 42, 46, 48, 51]. Children in these studies seemed to follow the same pattern of advancement and delay in bone age as their peers of the same ethnicity in other studies. When bone age is accelerated, new social and cultural factors rather than economic conditions have been suggested to be the main driver [27]. However, our results suggest ethnicity should also be considered when assessing bone age. A further limitation of the study is that the mean absolute and root mean square errors were not calculated; these might have further confirmed the accuracy of the G&P atlas in relation to each population. The mean of each variable (BA and CA) was available for only 13 studies [18, 19, 24, 26,27,28, 35, 38, 49,50,51,52,53], and for these 13 studies, individual observations were not provided; therefore, the mean error could not be calculated.

Conclusion

This systematic review revealed that the ethnicity/origin of the child can influence the applicability of the G&P standard. The G&P standard is imprecise and should be used with caution in Asian and African populations, particularly when assessing age for forensic/legal purposes. Some caution is also required for Hispanics (particularly males). The G&P atlas can be used with most confidence in Caucasians. There is a complex inter-relationship between the impacts of socioeconomic status and ethnicity on bone age using the G&P atlas, which no study has clearly set out to address. Although the graphs in Fig. 6 may be helpful, until new ethnicity-related standards are created, clinicians should be aware of the limitations of the G&P method presented in this review.