Introduction

Age estimation in living or dead individuals is a major challenge in anthropology and forensic odontology. Age estimation not only assists in criminal or civil proceedings involving minors or undocumented adults, but also facilitates the creation of a biological profile when identification is mandatory [1].

Teeth have proven to be an important source for age estimation in both living and deceased individuals. Once tooth development is complete, degenerative or postformation dental changes are the appropriate markers for estimating age in adults [2]. The aging process in teeth is influenced by numerous external and internal factors (lifestyle, nutrition, type of work [3], toxic habits, diseases and treatments, among others) [4]. Common conditions and treatments such as tooth bleaching, orthodontics, trauma and prosthetic crowns may also alter the color of the tooth [5].

Because these factors can modify age-related dental changes, the discrepancy between chronological and biological age widens in adulthood and estimates become less accurate [1], which hinders adult age estimation in anthropology and forensic dentistry.

Several methods are available for age assessment in teeth. Biochemical techniques are applied to determine the age of an individual, such as aspartic acid racemization [6], methylation and telomere length [7,8,9]. However, biochemical studies are complex and involve tooth destruction [3], thus these methods are not the first choice in practical cases.

The most commonly used procedures are based on the study of morphological changes in dental structures caused by aging [1], such as secondary dentin apposition, root translucency, cementum apposition, attrition and dental color, among others. In this regard, panoramic dental radiographs have been used to estimate the age of adolescents and adults for forensic purposes. Apposition of secondary dentin is an important parameter for estimating age, as it is a continuous process throughout life that can be measured indirectly by the reduction of pulp area on radiographs [10, 11]. Other morphological techniques, such as microscopic techniques, oral cytology [12], computed tomography and magnetic resonance imaging [13], have also been applied to determine age. However, these techniques have the disadvantages of radiation exposure or destruction of teeth.

Changes in tooth color have been shown to contribute significantly to age estimation based on dental morphological changes [3]. The colors of enamel, dentin and cementum change with chronological age [14, 15]. Age-related changes in enamel color are due to surface cracking and increased nitrogen content, resulting in changes in light refraction. Changes in dentin color are attributable to changes in mineral and organic composition. The root color is less frequently affected than that of the crown, which is more exposed to external influences [2]. These changes make the enamel thinner and yellower in older individuals (>45 years) compared to younger individuals [16]. In addition, a decrease in tooth brightness (L*) [17, 18] and an increase in redness (a*) with age have been reported [18].

Tooth color can be determined by visual comparison with a known standard (dental shade guide), although the ability to distinguish colors varies among observers. This bias can be avoided by using electronic color measuring devices, such as colorimeters, spectrophotometers, or digital cameras, among others [19]. The measurement of tooth color has great advantages for age estimation: it does not involve the destruction of teeth, and it may also be very useful in living individuals, especially in countries that consider even low-dose exposure to radiological imaging to be inappropriate for age estimation. Several studies have compared different methods of age estimation [20,21,22,23,24], although a meta-analysis comparing the validation and best methods of age estimation using tooth color has not been performed to date. Therefore, the aim of this study was to evaluate the usefulness of age estimation methods based on accurate measurement of tooth color changes.

Material and methods

This systematic review was conducted following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [25] and in compliance with the Cochrane criteria. The protocol, the PI(E)COS question and the main search strategy of this systematic review were registered in the International Prospective Register of Systematic Reviews (CRD42022343371).

Selection criteria

The criteria for article selection were: descriptive studies, in vivo and ex vivo dental samples, human samples, color measurements as outcome variables, and relationship of tooth color variables with age. Systematic reviews, narrative reviews, case reports and studies with subjective color measurement systems or based on color guides were excluded.

Databases searched

The bibliographic reference repositories used were Web of Science and PubMed. All available databases were activated in the Web of Science search engine: Web of Science Core Collection, Medline, Current Contents Connect, SciELO Citation Index, KCI-Korean Journal Database, Derwent Innovations Index, and Russian Citation Index.

Search strategy

The original search strategy was “colorimeter*”, “technique measures color”, “color measurement”, “shade*” combined with the term “teeth”. The term “age” or related was not included in the initial search strategy to avoid the risk of bias due to search complexity. This term was subsequently included in the Rayyan manager [26]. No linguistic or temporal limits were set.

Selection process

The results obtained in the search were exported to the Rayyan systematic review manager [26]. In the Rayyan manager, the term "age" was included in the automated search tools as a criterion, and the studies obtained were extracted for the selection process. Subsequently, the duplicate detection tool was used. Two independent reviewers performed the initial selection process based on the title and the abstract of the results obtained. Then, the articles were read completely to determine their inclusion based on the previously established criteria. A third reviewer was involved in discordant decisions, and discussion ensued to reach consensus.

Data collection process

The included articles were screened by two independent reviewers for manual extraction of the data of interest from the studies; no automation procedures were used. The data obtained were compared among the reviewers to reach consensus.

Data Items

The following descriptive data were extracted from the articles included: author, year, origin of the sample, type of tooth, device used for the analysis of the sample, and analysis protocol used (setting device, measurement conditions, analysis system and location of the color measurement). Likewise, age estimation models based on the color variable and those that applied age to estimate color were also extracted from the studies in which they were developed.

Study risk of bias assessment

According to the established inclusion criteria and the requirements of the present systematic review, risk of bias could arise from the following factors: incompletely detailed protocol, missing outcome data (lack of mean or standard deviation values), measurement of the outcome (use of a non-comparable protocol), and selection bias (limited comparability of participants due to a lack of stratification by sample age). The risk detection process was performed manually and independently by two reviewers.

Effect measures

The outcome variables extracted were means and standard deviations of the color systems, and the reliability parameters of the statistical models (r, R2, standard error estimate, area under the curve, sensitivity and specificity).

Synthesis methods

The data extracted from the included studies were structured in five descriptive tables, three meta-analysis figures and an appendix to describe the protocol of the studies. Table 1 describes general aspects of the studies: author, year, country, type of tooth and type of sample. Table 2 shows outcome results: mean and standard deviation of the color variables for the age groups established by authors. Table 3 shows linear regression models with age as the dependent variable and Table 4 shows receiver operating characteristics. Table 5 shows linear regression models to estimate the color with age as an independent variable. A random-effects meta-analysis model of the results for the CIE L*a*b* color variables stratified by age groups (under 30, 30-59, 60 and older) was performed with the metafor package for R [27]. The random-effects model assumed a between-study variability τ2 (in addition to sampling variability), which explains differences between studies.
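As a rough illustration of the random-effects model described above, the following Python sketch pools group means using the DerSimonian-Laird estimator of τ2. This estimator is an assumption chosen for brevity: the review itself used metafor in R, which may rely on a different estimator, and the function name `pool_random_effects` and all input values are hypothetical rather than data from the included studies.

```python
import math

def pool_random_effects(means, sds, ns):
    """Pool group means with a DerSimonian-Laird random-effects model (illustrative)."""
    # Sampling variance of each group mean: SD^2 / n
    variances = [sd ** 2 / n for sd, n in zip(sds, ns)]
    k = len(means)

    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * m for wi, m in zip(w, means)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2
    q = sum(wi * (m - fixed_mean) ** 2 for wi, m in zip(w, means))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to every sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),  # approximate 95% CI
        "tau2": tau2,
    }

# Hypothetical L* means, standard deviations and sample sizes for three groups
result = pool_random_effects([75.0, 70.0, 80.0], [4.0, 5.0, 3.0], [50, 60, 40])
print(result)
```

When τ2 is large relative to the sampling variances, the random-effects weights become nearly equal, so small and large studies influence the pooled mean to a similar degree.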

Table 1 Summary of articles identified in the systematic literature review
Table 2 Age estimation based on the CIE system
Table 3 Age estimation based on equation models with age as dependent variable
Table 4 Receiver operating characteristic (ROC) curve analysis for age estimation with colorimetric variables (L*, a*, b*), chromaticity coordinates (x, y, z) or whiteness (WIC, Z%, WI) and yellowness (YI) indexes in upper incisors
Table 5 Linear regression models for color estimations with age as independent variable

Results

Figure 1 shows the study selection process applied in the systematic review. The initial search strategy yielded 3867 results, of which 15 were duplicates and 3108 were discarded by the automated "age" search filter. A total of 744 studies were selected for title and abstract review, of which 659 were discarded following the inclusion and exclusion criteria. The remaining 85 were selected for full-text review, 4 of which could not be retrieved; thus, 81 reports were assessed for eligibility. The level of agreement observed between the two reviewers was 84% (k = 0.811; 95%CI 0.71; 0.905; p < 0.05), indicating very good concordance [28]. Sixty-three studies were excluded after full-text reading and application of the established inclusion and exclusion criteria. Therefore, a total of 18 studies were included in the present systematic review (Fig. 1).
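The counts in the selection flow can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers reported in the paragraph above; the variable names are illustrative.

```python
# Counts reported in the study selection paragraph
identified = 3867
duplicates = 15
removed_by_age_filter = 3108
excluded_on_title_abstract = 659
full_text_unavailable = 4
excluded_on_full_text = 63

# Each stage subtracts the records dropped at that stage
screened = identified - duplicates - removed_by_age_filter
selected_for_full_text = screened - excluded_on_title_abstract
assessed_for_eligibility = selected_for_full_text - full_text_unavailable
included = assessed_for_eligibility - excluded_on_full_text

print(screened, selected_for_full_text, assessed_for_eligibility, included)
# prints: 744 85 81 18
```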

Fig. 1 Flow diagram for the systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines

The risk of bias analysis was performed by two independent reviewers. The results showed a low level of risk related to design, detailed protocol and outcome data (mean and standard deviation). However, the lack of stratification of the results by age (selection bias) limited their inclusion in the meta-analysis. Although all the analyzed studies employed the same measurement system (CIE L*a*b* or CIE L*C*H*), the devices used were different. Therefore, the results obtained may include small deviations attributable to differences between devices.

The countries that contributed the most studies were Spain (5 of 18) [17, 29,30,31,32] and Germany (3 of 18) [33,34,35] (Table 1). In eight of the included studies, it was not possible to identify the origin of the sample [5, 29, 30, 34,35,36,37,38]. All studies included in vivo upper anterior teeth for analysis, except one, which used ex vivo (non-vital) teeth [31]; the study by Greta et al. also analyzed non-vital teeth [39]. Five of the studies established smoking as an exclusion criterion [18, 29, 33, 40, 41]. However, the study by Kim et al. considered smokers among their inclusion criteria for a comparative analysis based on the color changes produced by smoking in teeth [37]. In general terms, the exclusion criteria defined by the authors were similar: restorations, aesthetic procedures (staining, fluorosis or whitening), attrition, gingival bleeding, caries, abrasion lesions, cavities and stains [32, 42, 43]. Therefore, it can be considered that all the studies included in the present systematic review were carried out with healthy teeth (Table 1).

In eleven studies, color was measured by spectrophotometry [5, 18, 29, 32,33,34,35,36,37, 39, 41]. Spectroradiometry was used in three of the analyzed studies [32, 38, 43], colorimeters in three other studies [30, 40, 44], and a digital camera and a computer for color analysis in one of them [42]. Thirteen of the included studies used portable color estimation devices that could be used in reproducible environmental conditions (Appendix 1).

The environmental and lighting conditions for sampling in the different studies were natural (in clinics or laboratories), except for the use of ultraviolet light [42] and a cabinet or special chamber [31, 40]. Among the studies that carried out the color measurement using spectrophotometry, some of them adopted standardized lighting conditions [17, 36, 39], while in others the environmental conditions were not standardized [18, 32, 33]. In these cases, color measurement was performed by placing the probe tip in contact with the tooth surface, with most authors considering the middle third of the facial tooth surface to be the most appropriate region (Appendix 1). Particular environments were designed for spectroradiometry measurements according to the protocols established by different studies [32, 38, 43]. In these studies, the measurement was performed at a standardized distance of 8-9 centimeters from the measured object (Appendix 1).

The age of the sample, the number of teeth, the color system used (CIE L*a*b*, CIE LCH or CIE XYZ), and statistically significant differences observed by the different studies are shown in Table 2. Concerning the stratification of the analyzed samples, some studies segmented the results by age into three or more groups, which allowed obtaining comparable values and observing the change in tooth color between groups [5, 32, 37, 41, 42]; however, studies without age range groups made it impossible to establish this differentiation [17, 30, 31, 34, 36, 38, 39, 43].

All studies included both men and women. Four of the analyzed studies showed the results segmented by sex [17, 31, 36, 37]. The number of samples differed among the analyzed studies; the largest sample size was found in Krasniqui et al., with 2295 teeth [41], and the smallest in Cho et al., with 47 teeth [44] (Table 2).

Regarding the measurement system, fifteen of the included studies used the CIE L*a*b* system [5, 17, 18, 30, 32, 34,35,36,37,38,39, 41,42,43,44], four studies used the CIE L*C*H* system [17, 33, 35, 39], and two studies employed the CIE XYZ system [31, 32] (Table 2). The results of all the analyzed studies showed the color variables based on the mean and standard deviation, except for two studies: one that only provided mean values [17] and another based on mean values and typical deviation [30].
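The CIE L*a*b* and CIE L*C*H* systems describe the same color in rectangular and polar coordinates, respectively, so values reported in one can be converted to the other. A minimal Python sketch of the standard conversion follows; the function name `lab_to_lch` and the input values are illustrative and not taken from any included study.

```python
import math

def lab_to_lch(l_star, a_star, b_star):
    """Convert CIE L*a*b* to lightness, chroma and hue angle (degrees)."""
    chroma = math.hypot(a_star, b_star)                      # C* = sqrt(a*^2 + b*^2)
    hue = math.degrees(math.atan2(b_star, a_star)) % 360.0   # hue angle in [0, 360)
    return l_star, chroma, hue

# Illustrative values only
lightness, chroma, hue = lab_to_lch(70.0, 3.0, 4.0)
print(lightness, chroma, round(hue, 2))
```

Lightness is unchanged by the conversion; only the a* and b* axes are re-expressed as chroma and hue.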

Table 3 shows linear regression models with age as a dependent variable. The results of the analysis models could vary according to the sample (in vivo [intraoral measure] or ex vivo [extracted tooth]), the measurement location and sex. In the case of the study by Devos et al., the best model for estimating dental age was in vivo, with a value of R2=0.56; the standard error of age estimation was not reported [40]. According to the authors' results, the proposed model underestimates the real age in the highest age categories and overestimates the chronological age in the lowest categories [40]. The linear regression model published by Martin de las Heras et al. (Age=19.3-0.44WIC) has a standard error of 11.7 years and r=0.75 for ex vivo fresh extracted teeth [31].
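To show how such a model would be applied in practice, the Python sketch below evaluates the reported equation (Age=19.3-0.44WIC) and attaches an approximate interval of ±1.96 standard errors. The interval construction is an assumption: it treats the reported 11.7 years as the standard error of the estimate and residuals as roughly normal, and it ignores uncertainty in the coefficients. The WIC value is hypothetical.

```python
def estimate_age_from_wic(wic, intercept=19.3, slope=-0.44,
                          standard_error=11.7, z=1.96):
    """Point estimate and approximate interval for age from a whiteness index."""
    point = intercept + slope * wic   # Age = 19.3 - 0.44 * WIC
    margin = z * standard_error       # assumed approximate 95% margin
    return point, (point - margin, point + margin)

# Hypothetical whiteness index reading
age, (low, high) = estimate_age_from_wic(-40.0)
print(round(age, 1), round(low, 1), round(high, 1))
```

With a standard error of this size the interval spans more than four decades, which is why reporting the standard error alongside the equation matters for forensic use.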

Table 4 shows the analysis of age as a dependent variable in Receiver Operating Characteristic (ROC) curve models. The different color variables were used to determine the sensitivity and specificity for estimating age in different age groups of ten years range (i.e., 20-29 years, 30-39 years, etc.). The Area Under the Curve (AUC) values ranged between 0.7 and 0.8 for all dental color parameters measured in the upper central incisors, except for the variables L* and Z%, with sensitivity values ranging between 56% and 93%, and specificity values between 38% and 92% [32].

Table 5 shows the regression models (equation) for estimating the tooth color variables, considering age as an independent variable in the model. The model with the highest predictive value was developed by Gomez Polo et al. (2015) [29]. The value of L* as dependent variable in regression models could be estimated with a reliability of 48.8% in men (y=87.98-0.28*years) and 45.1% in women (y=88.03-0.23*years) with a value of R2=0.48 and 0.45, respectively [29].
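The sex-specific equations above run in the opposite direction: they predict lightness from age. A brief Python sketch, using the reported coefficients and a hypothetical age, makes the direction explicit; the names `L_STAR_MODELS` and `predict_lightness` are illustrative.

```python
# (intercept, slope per year) as reported for each sex
L_STAR_MODELS = {
    "male": (87.98, -0.28),
    "female": (88.03, -0.23),
}

def predict_lightness(age_years, sex):
    """Predicted L* for a given age using the reported sex-specific equation."""
    intercept, slope = L_STAR_MODELS[sex]
    return intercept + slope * age_years

# Hypothetical 40-year-old individuals
print(round(predict_lightness(40, "male"), 2), round(predict_lightness(40, "female"), 2))
```

Because these models treat age as the predictor, solving them for age is not equivalent to fitting a regression with age as the dependent variable.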

Figures 2, 3 and 4 show the random-effects meta-analysis model performed on tooth color variables in the CIE L*a*b* system grouped by age: under 30 years (Fig. 2), 30-59 years (Fig. 3) and 60 years and older (Fig. 4). The study carried out by Da Silva et al. was not included in the meta-analysis, as it used non-comparable techniques (ultraviolet light) [42].

Fig. 2 Forest plot of the meta-analysis performed on L* (A), a* (B) and b* (C) values in studies with individuals younger than 30 years old

Fig. 3 Forest plot of the meta-analysis performed on L* (A), a* (B) and b* (C) values in studies with individuals between 30-59 years old

Fig. 4 Forest plot of the meta-analysis performed on L* (A), a* (B) and b* (C) values in studies with individuals over 60 years old

Seven groups of samples under 30 years of age from 6 studies were included (Fig. 2). In a total of 434 samples, heterogeneity (I2) was very high in all three analyses, thus the variability is not explainable by simple sampling error. The study by Kim contributed 200 samples (100 male and 100 female), representing 46.08% of the present analysis samples [37]. The L* mean value for this age group was 74.78 (95%CI 67.18; 82.39 p < 0.01) (Fig. 2A). The a* mean value was 1.07 (95%CI -0.93; 3.07 p < 0.01) (Fig. 2B). Likewise, the b* mean was 16.35 (95%CI 11.66; 21.04 p < 0.01) (Fig. 2C). For the 3 variables analyzed, heterogeneity was 98% or higher.

Ten groups of samples aged 30-59 years from 6 studies were included (Fig. 3). The study by Kim (2018) contributed 274 samples (137 male and 137 female), representing 38.05% of the analysis samples [37]. The L* mean value for this age range was 70.07 (95%CI 62.95; 77.18 p < 0.01) (Fig. 3A); the a* mean was 0.51 (95%CI -0.06; 1.09 p < 0.01) (Fig. 3B), and the b* mean was 16.17 (95%CI 12.05; 20.29 p < 0.01) (Fig. 3C). For the 3 variables analyzed, the heterogeneity of the results was very high.

Five groups comprising a total of 250 samples aged ≥60 years from 3 studies were included (Fig. 4). The study by Kim contributed 200 samples (100 male and 100 female), representing 80% of the analysis samples [37]. The L* mean value was 65.71 (95%CI 59.21; 72.22 p < 0.01) (Fig. 4A); the a* mean was 1.70 (95%CI 0.48; 2.92 p < 0.01) (Fig. 4B); and the b* mean was 17.77 (95%CI 11.49; 23.96 p < 0.01) (Fig. 4C). Heterogeneity was 98% for the L* and b* values, and 86% for the a* values.

Discussion

Age estimation is a major challenge in anthropology and forensic odontology laboratories, as well as in judicial settings, since it is one of the tools used in human identification. Tooth color measurement is a very valuable method for estimating age in living individuals, complying with medical ethical standards, especially in some countries where low-dose X-ray exposure is considered unethical for age estimation. This review provides a meta-analysis of methods for age estimation by accurate measurement of aging-related changes in tooth color.

According to the present review, the type of tooth used to estimate age based on color is healthy anterior teeth (incisors and canines) free of caries, cavities, attrition, endodontics, reconstruction, breakage, bleaching, or abnormal staining. Previous studies have analyzed tooth color measurement devices [19], mainly for clinical purposes, and concluded that spectrophotometers, colorimeters and imaging systems are useful and relevant tools. In our systematic review, the most commonly used instrument was the spectrophotometer; this may be mainly due to the fact that spectrophotometric shade analysis is accurate, reproducible and portable [45], which facilitates its transport in real forensic cases. Regarding the systems employed, we observed that fifteen of the included studies used the CIE L*a*b* system [5, 17, 18, 30, 32, 34,35,36,37,38,39, 41,42,43,44], four studies used the CIE L*C*H* system [17, 33, 35, 39], and two studies used the CIE XYZ system [31, 32] (Table 2). According to our results, the CIE L*a*b* system has been the most widely used, probably due to its advantages, such as an easier interpretation of the psychophysical dimensions of color perceptions and the possibility to estimate the magnitude of the differences between two color stimuli using the chromaticity diagram [40].
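One common way to express the magnitude of the difference between two color stimuli in CIE L*a*b* is the Euclidean distance ΔE*ab (CIE76). The Python sketch below applies it to the pooled means reported in the Results for the youngest and oldest groups. This is illustrative only: the included studies are not stated to have used this formula, and the pooled means carry very high heterogeneity.

```python
import math

def delta_e_ab(lab1, lab2):
    """CIE76 color difference between two (L*, a*, b*) triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

# Pooled means reported in the Results (L*, a*, b*)
under_30 = (74.78, 1.07, 16.35)
sixty_plus = (65.71, 1.70, 17.77)

difference = delta_e_ab(under_30, sixty_plus)
print(round(difference, 2))
```

Most of this difference comes from the L* axis, in line with the reported decrease in lightness with age.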

The statistical methods proposed to estimate age according to tooth color in adults are linear regression models and ROC curves, with the former being the most commonly used. These studies used age as a dependent (Table 3) or independent (Table 5) variable. The studies based on age as the dependent variable showed R2 values between 0.28 and 0.56, being higher in cases of ex vivo teeth (Table 3). Among the studies on ex vivo teeth, the results of studies that estimated age from fresh extracted teeth [31] should be differentiated from the results of others that analyzed stored extracted teeth [40], since storage time may affect tooth color. However, the results obtained from the ex vivo studies are similar in Martin de las Heras and Devos [31, 40], although the correlation coefficients of the former were higher. In addition, different works show that ex vivo teeth are darker than in vivo teeth [15, 40]; therefore, age estimation using tooth color in corpses should be taken with caution [15].

Although the standard error of the models should be applied in practical forensic cases and compared with other populations, some studies did not reflect this [29, 37, 40, 43]. Studies based on age as an independent variable showed an R2 with values ranging from 0.10 to 0.48. In this regard, the model with the highest predictive capacity was the one developed by Gomez Polo et al., with L* representing the best predictive parameter in both males and females [29]. However, these results should be taken with caution, as age is not considered a covariate in these studies and there may be an important effect of age on data interpretation. Regarding the ROC curves, to our knowledge, only the study by Martin-de-las-Heras et al. performed the analysis using this method [32]. This study showed AUC values from 0.7 to 0.8 for all dental color parameters measured in upper central incisors, except for the variables L* and Z%, with sensitivity values ranging from 56% to 93% and specificity values ranging from 38% to 92% [32]. In this sense, a diagnostic test is considered “highly accurate” with an AUC value >0.9, “useful for some purposes” with a value of 0.7–0.9, and “poor” with a value of 0.5–0.7 [46]. Applying this statistical interpretation, the method developed by Martin-de-las-Heras et al. is considered useful for age estimation [32].
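The AUC interpretation bands quoted above can be written as a small lookup. In this Python sketch, assigning the boundary value 0.7 to the "useful" band and labeling values below 0.5 as "worse than chance" are assumptions, since the quoted ranges do not specify them; the function name `interpret_auc` is illustrative.

```python
def interpret_auc(auc):
    """Map an AUC value to the qualitative bands quoted in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc > 0.9:
        return "highly accurate"
    if auc >= 0.7:   # boundary handling is an assumption
        return "useful for some purposes"
    if auc >= 0.5:
        return "poor"
    return "worse than chance"  # label not taken from the source

print(interpret_auc(0.75))
# prints: useful for some purposes
```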

Age estimation based on tooth color has certain limitations. Tooth color may vary as a result of age, but also due to other causes such as smoking, tobacco chewing, poor dental hygiene, consumption of foods and beverages with chemical components (coffee, tea, red wine), and excessive use of fluoride, as well as some treatments, illness and genetic factors [41]. Thus, most studies determined tooth color in non-restored, non-discolored and non-smokers’ teeth, which could limit the extrapolation to real cases [5, 32, 41].

Several aspects may influence age estimation. The most relevant seem to be the location of the color measurement and sex [40]. In ex vivo teeth, it is recommended to perform the color measurements on the vestibular surface of the crown and on the mesial and vestibular surface of the root. In the case of in vivo teeth, it is recommended to take measurements on the vestibular enamel at least 2 mm coronal to the gingival border [32, 40]. On both types of teeth, it is recommended to place the device in full contact with the tooth surface and measure color variables several times (from 3 to 5) using the average of the measurements [32, 40]. Regarding sex, men and women participated in all studies, but there was no homogeneity in the sample. In addition, there were discrepancies among studies. For example, Gomez Polo et al. observed that the mean value of L* and H* was significantly higher in women than in men, while the mean value of a*, b* and C* was significantly higher in men [17]. A similar situation occurred in the study by Gozalo-Díaz et al., where it was observed that women had higher values of lightness and lower values of yellowness than men in the analyzed teeth [43]. Martín de las Heras et al. showed statistically significant differences between sexes, especially in the values of the X and Z parameters [31], and Demirel and Tuncdemir observed that women had lighter teeth than men [5]. In contrast, for Hassel et al., sex was not a significant factor [35].
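The recommendation to repeat each measurement and use the average can be sketched as follows. The minimum of three readings follows the text above; the function name `average_readings` and the readings themselves are hypothetical.

```python
from statistics import mean

def average_readings(readings):
    """Average repeated (L*, a*, b*) readings taken on the same tooth site."""
    if len(readings) < 3:
        raise ValueError("at least 3 repeated readings are recommended")
    l_vals, a_vals, b_vals = zip(*readings)
    return mean(l_vals), mean(a_vals), mean(b_vals)

# Hypothetical repeated readings from one tooth
readings = [(72.0, 1.0, 16.0), (73.0, 1.2, 16.4), (71.0, 0.8, 15.6)]
averaged = average_readings(readings)
print(averaged)
```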

A random-effects meta-analysis model of the results was performed for the CIE L*a*b* color variables stratified by age. In order to pool the largest number of studies and samples and obtain higher forensic relevance in the meta-analysis, the age groups were classified as follows: under 30 years, 30 to 59 years, and 60 years and older. The random-effects model assumed a between-study variability τ2 (in addition to sampling variability), which explains differences between studies. Regarding the group under 30 years of age, our results showed a high heterogeneity for the parameters L* (I2=98%), a* (I2=99%), and b* (I2=99%). The group between 30 and 59 years of age showed high variability for the parameters L* (I2=99%), a* (I2=93%), and b* (I2=99%). The group aged 60 years and older also showed heterogeneity for the parameters L* (I2=98%), a* (I2=86%), and b* (I2=98%). All analyses showed very high heterogeneity (I2). The percentage of heterogeneity in effect sizes cannot be attributed to sampling error [47] or sample size. The heterogeneity is explained by several factors, such as the wide age range of each of the studies and the non-standardized conditions for color measurement [48]. In this regard, the colorimeters used in each study were different (with different sensitivity and precision), the environments and lighting conditions of the measurements were not the same, and the measurements were aggregated by very wide age ranges, affecting both the mean values and the confidence intervals.
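For readers less familiar with I2, it expresses the share of total variability across groups that exceeds what sampling error alone would produce, and it is commonly computed from Cochran's Q as (Q - df)/Q. The Python sketch below uses a hypothetical Q value; only the number of groups (seven, as in the under-30 analysis) comes from the text.

```python
def i_squared(q, k):
    """I^2 in percent from Cochran's Q and the number of pooled groups k."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical Q statistic for seven groups
print(round(i_squared(300.0, 7), 1))
# prints: 98.0
```

An I2 near 98% therefore means that Q is roughly fifty times its degrees of freedom, far beyond what sampling error would be expected to generate.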

According to our results, we recommend the use of linear regression models with age as the dependent variable, calculating the standard error for age estimation. It is also recommended to standardize the experimental conditions with a color measurement protocol, increasing the sample size of the studies, and including small age ranges, both sexes and different ethnicities.

To the best of our knowledge, this is the first systematic review to analyze methods for estimating age based on accurate measurement of tooth color changes. However, our study has some limitations. The articles were grouped into different age groups (under 30 years, 30 to 59 years, and 60 years and older) in order to include the maximum number of research papers in the meta-analysis, excluding others. For this reason, the results should be taken with caution; however, the random-effects meta-analysis model attempts to balance this. The lack of standardization among studies may also limit data extraction, not only in our research but also in future studies. In this regard, an international color-age estimation protocol tool is crucial to ensure consistency of data extraction and comparison between studies. In addition, despite the meticulous search strategy used, some articles may have been overlooked in the present study.

In conclusion, the use of linear regression models with age as the dependent variable to calculate the standard error is recommended to estimate the interval of dental age. Color measurement should be based on an accurate and objective technique such as spectrophotometry, which is the most commonly used method, placing the device in full contact with the tooth surface and measuring the color variables several times using the mean of the measurements. It is recommended to increase the sample size of the studies, including small age ranges, equation models for different types of teeth, both sexes, and different ethnicities. This systematic review also highlights the need to protocolize age estimation studies that measure tooth color in order to apply this method in different forensic settings.

Key points

1. This is the first systematic review and meta-analysis based on dental color estimation.
2. Linear regression models with standard error are recommended for age estimation.
3. The meta-analysis showed heterogeneity among the studies; several factors are discussed.
4. Small age ranges and standardized measurement conditions would improve age estimation.
5. Protocolized age estimation studies based on color changes are recommended.