New insights in the reproducibility of visual and electronic tooth color assessment for dental practice

Ratzmann, Anja; Welk, Alexander; Hoppe, Stephanie; Fanghaenel, Jochen; Schwahn, Christian

doi:10.1186/s13005-020-00248-w

Research
Open access
Published: 16 December 2020

Head & Face Medicine volume 16, Article number: 37 (2020)

Abstract

Background

The aim of the study was to compare a 2D and 3D color system concerning a variety of statistical and graphical methods to assess validity and reliability of color measurements, and provide guidance on when to use which system and how to interpret color distance measures, including ΔE and d(0M1).

Methods

The color of teeth 14 to 24 of 35 patients undergoing regular bleaching treatment was visually assessed and electronically measured with the spectrophotometer Shade Inspector™. Tooth color was recorded before bleaching treatment, after 14 days, and again after 6 months. VITAPAN® Classical (2D) and VITA-3D-Master® (3D) served as reference systems.

Results

Concerning repeated measurements, the 2D system was superior to the 3D system, both visually and electronically in terms of ΔE and d(0M1), for statistics of agreement and reliability. All four methods showed strong patterns in Bland-Altman plots. In the 3D system, hue was less reliable than lightness and chroma, which was more pronounced visually than electronically. The smallest detectable color difference varied among the four methods used, and was most favorable in the electronic 2D system. Comparing the methods, the agreement between the 2D and 3D system in terms of ΔE was not good. The reliability of the visual and electronic method was essentially the same in the 2D and 3D systems; this comparability is fair to good.

Clinical relevance

The 3D system may confuse human raters and even electronic devices. The 2D system is the simple and best choice.

Background

Valid and reliable measurements of tooth color are of major importance in esthetic and restorative dentistry as well as in dental technical practice. Tooth color is usually described based on the Munsell color space in terms of hue, value, and chroma [1, 2]. Hue measures the basic color, value indicates the lightness of a color, and chroma measures the saturation or intensity of a color. Value is determined first, followed by chroma, yielding hue as the third dimension. One of the most important prerequisites is the assessment of tooth color either via visual comparison with prefabricated color scales or using measuring devices such as a colorimeter, spectrophotometer or digital imaging systems with corresponding software [3]. The most common method in clinical practice is still the visual method using the VITAPAN® Classical shade guide, which is a 2D system. In 1998, the VITA 3D-Master® shade guide was launched on the dental market. It was developed to systematize color determination, thereby enhancing the likelihood of valid and reliable color measurements [4,5,6,7]. Concerning the systematic determination, however, an implicit prior belief about the VITA 3D-Master® was not checked in developing this color guide: namely, that any two 3D shades within the same dimension at given constant shade values of the other two dimensions can be well differentiated by the human eye. In fact, dentists and dental technicians believe that the third dimension (hue) is problematic and that the distance between adjacent 3D shades is not large enough in this dimension. To quantify color differences, ΔE as the Euclidean distance between two points in the color space of the three dimensions (value, chroma, and hue) has been used in the majority of dental color studies [8,9,10,11,12,13,14,15,16,17,18,19,20], although a modification of ΔE is preferable [21]. In addition, numerous studies comparing visual and electronic methods have been published over the past decade [3, 8, 11, 18,19,20, 22,23,24,25,26,27].

Taking tooth color measurements is a complex process. In psychology and statistics, it is well known that repeated measurements [28, 29] or groups of observations such as on patients’ teeth increase reliability [30, 31]. Moreover, the favored ΔE to measure color differences cannot be applied to important graphical and statistical methods for the assessment of validity and reliability, including Bland-Altman plots to examine patterns of disagreement and the intraclass correlation coefficient (ICC) to estimate measurement variability [32]. These limitations can be overcome by using the distance of each shade from 0M1 of the 3D color system, denoted by d(0M1) [33]. Because d(0M1) does not distinguish shades of the same radius from 0M1, d(0M1) and ΔE are complementary rather than competing. For example, in studying bleaching effects, d(0M1) may be favorable for 0M1 but less favorable for comparing shades by gender and age groups (or to study whether the gender difference in tooth color increases with age). In general, validity depends on the purpose [34] and is to be redefined for every research question; there is no such thing as a universal gold standard [35, 36]. Likewise, choosing methods to assess reproducibility depends on the purpose [37]. Whereas reliability is often related to calibration or comparability of examiners before and during performance of large cross-sectional or multicenter studies (only one measurement per participant in the full-scale investigation), the smallest detectable difference or the smallest detectable change is sought in longitudinal studies (at least two measurements per participant; measurement error occurs twice or more) [37], when the difference between repeated measurements is the focus of interest. The smallest detectable difference or, in the present context, the smallest detectable color difference (SDCD), describes a statistical property and is different from perceptible or acceptable color difference thresholds.
The SDCD of a row of teeth can easily be recalculated from the SDCD of a single tooth [31]. The SDCD may differ from method to method and from study to study; this contradicts the idea that color difference thresholds are universally valid. In other words, the concept of a universal color difference threshold is scientifically misleading because it confuses validity and reliability. Moreover, color metrics are arbitrary, color perception is subjective, and acceptable color shade differences vary among different colors (ΔE: 1.1 among red shades and 2.1 among yellow shades) [38]. Despite these limitations of color science, it can serve as a rough guide for color difference thresholds and may be useful in daily tooth color determination in dentistry. Therefore, different aspects must be considered when comparing the conventional 2D system with the newer 3D system. The 3D system seems more reasonable because it is more ordered. Ordering alone, however, may not be enough, because the human or electronic rater must have the chance to measure reliably. Whereas directly adjacent shades of the 3D system have mean ΔE values of about 3.8 for lightness (1M1-2M1-3M1) and 4.4 for chroma (2M1–2M2–2M3), the mean ΔE value is only about 1.5 for the six direct neighbors of hue (2L1.5–2R1.5;2L2.5–2R2.5) [38].

Thus, it can be hypothesized that hue is measured less reliably than lightness or chroma. This can be examined not only for an electronic rater but also for a human rater; within-subject comparisons are justified because the examiner serves as her/his own control (hue as exposure versus lightness or chroma as reference), similar to n-of-1 trials [39].

The aim of this study was to compare the 2D and the 3D color systems concerning a variety of statistical and graphical methods to assess validity and reliability, as well as to provide guidance on when to use which system and how to interpret ΔE and d(0M1).

Materials and methods

Subjects and clinical procedure

In order to better assess clinically relevant color changes, color measurements were performed in patients receiving a regular in-office bleaching treatment (BT). The tooth-inclusion criteria for performing BT were no caries, endodontic treatment or restorations. Patients with insufficient oral hygiene, previous BT, periodontal disease, pregnancy, and allergy or hypersensitivity to the bleaching agents were excluded. The study was approved by the ethics committee of the Medical Association (Ärztekammer) of Mecklenburg-Vorpommern (Reg. Nr. III UV 15/08). All patients gave informed consent. Thirty-five patients (24 women, 11 men, average age 30 years) from the Dental Clinic at the University of Greifswald participated. The complete clinical procedure was performed under standardized conditions according to the standardized clinical protocol for in-office bleaching under the supervision of an experienced dentist (AW). The bleaching procedure was performed on teeth 15 to 25 and 35 to 45. Supra- and subgingival plaque, stains and calculus were removed, and all teeth were polished with non-fluoridated, oil-free pumice before bleaching.

The gingiva was protected by a liquid gingiva protectant (Dental Dam, Schütz Dental, Rosbach, Germany) activated by a light-curing unit (Ortholux™ LED Curing Light, 3M Unitek). Bleach’n Smile, 35% H₂O₂ (Schütz Dental, Rosbach, Germany), was applied three times for 10 minutes according to the manufacturer’s recommendation.

After bleaching, all teeth were fluoridated with Elmex® gelée (CP GABA, Germany).

Visual and electronic color assessment

The color of labial surfaces of teeth 14 to 24 was visually assessed by an experienced dental technician, who was ophthalmologically examined before this study [40], under diffuse daylight between 11 a.m. and 3 p.m. The time needed for color assessment was not restricted. Electronic measurements were performed with the spectrophotometer Shade Inspector™ (Schütz-Dental, Rosbach, Germany) by a dentist calibrated prior to this study [40]. The color systems VITAPAN® Classical (2D-VC; VITA Zahnfabrik, Bad Säckingen, Germany) and VITA 3D-Master® (3D; VITA Zahnfabrik, Bad Säckingen, Germany) served as reference systems. The VC color system has a two-dimensional structure that enables the description of hue (category A to D) and lightness including chroma (group 1 to 4) [41]. It serves as the standard shade guide for visual color assessment in dental practice. The 3D color system has a three-dimensional structure that enables the separate description of lightness (1 to 5 and 0 for bleaching), chroma (1 to 3, including half points), and hue (M, L, R) [42]. For the measurement procedure, each tooth was categorized into the gingival (S₁), the body (S₂), and the incisal (S₃) segment. The incisal segment S₃ was not included in the analysis due to its transparency. Measurements were carried out as described in the previous study [33]. Time points of visual and electronic measurements were before BT (T₁/T₂, baseline), 14 days after BT (T₃/T₄), and 6 months after BT (T₅/T₆) (Fig. 1).

Statistical methods

ΔE = ((ΔL*)² + (Δa*)² + (Δb*)²)^(1/2) and ΔE₀₀ [43] were calculated. ΔE₀₀ is superior to ΔE, but its calculation is quite sophisticated. Irregularities in the color space are corrected as follows: 1. the differences in the individual dimensions are calculated; 2. weighting is carried out; 3. finally a term for the interaction between the chroma differences and the hue differences is added; the calculation includes 22 lines of formulae [43]. ΔE₀₀ values are usually smaller than those of ΔE [21]. Here, we focused on ΔE because it is more commonly used. The Bland-Altman plot [44] is one of the most frequently cited methods in medicine. Although several adaptations have been discussed [45,46,47,48,49], we present only the classical plot with the mean difference and the limits of agreement for d(0M1), which is ΔE of each shade from 0M1. For method comparisons, but not for intra-rater comparisons, the regression line was added. Out of 840 paired observations, a total of 30–55 observations can be expected to be outside the limits of agreement according to M. Bland [50]. Besides the limits of agreement (difference between measurements ± 1.96 × standard deviation of the difference [44]), we present the agreement within 2.7 [16] and 3.7 [51] units of d(0M1) and ΔE. These agreement statistics and the difference between the pairs of observations (denoted by d₂ – d₁ for d(0M1)), including standard deviation, are the only measurement error statistics also reported for ΔE. The standard error of measurement (SEM) is a further agreement statistic and reported in two versions [37], for which the values are very similar herein. The SDCD is defined as 1.96 × √2 × SEM ≈ 2.77 × SEM [37]. The SDCD on the level of groups of observations or patients’ teeth is calculated according to de Vet et al. [31].
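The agreement statistics defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Stata code used in the study: the function names are hypothetical, and the five paired d(0M1) readings are invented values, not study data.

```python
import math
from statistics import mean, stdev


def delta_e(lab1, lab2):
    """CIELAB Euclidean distance between two (L*, a*, b*) triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


def limits_of_agreement(first, second):
    """Mean difference d2 - d1 and the classical 95% limits of agreement."""
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def sdcd_from_sem(sem):
    """Smallest detectable color difference: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical repeated d(0M1) readings for five teeth (not study data).
d1 = [12.0, 15.5, 18.0, 21.0, 14.0]
d2 = [13.0, 15.0, 19.5, 20.0, 14.5]
bias, lower, upper = limits_of_agreement(d1, d2)

# Roughly 5% of pairs are expected outside the 95% limits of agreement.
n_pairs = 840
expected_outside = n_pairs * 5 / 100
```

With 840 pairs, the expected count of about 42 lies inside the quoted range of 30–55; how that range itself was derived is not restated here.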
In addition to agreement statistics, which are related to differences of repeated measurements, we present reliability statistics, which are related to calibration or comparability of raters or methods [34]. The fraction of the total measurement variance due to variance among teeth is estimated by three versions of the intraclass correlation coefficient (ICC) [28]. Whereas the ICC_(3,1) ignores systematic differences between the two methods, raters, or measurements of the same rater, the ICC_(2,1) includes an additional term of the variance among raters to account for the total measurement variance (denominator) [28, 37]. Thus, the greater the systematic difference between two raters, the smaller the ICC_(2,1) compared with the ICC_(3,1). The ICC is the most appropriate reliability statistic [37] and recommended besides the Bland-Altman plot [32]. To avoid confusing terminology, SEM, SDCD and ICC are presented in the terminology used in Shrout & Fleiss [28]. ICC and kappa, which are closely related [32, 52], are interpreted according to Byrt’s classification [53]. Graphics and statistical analyses were performed using Stata software, release 14.2 (Stata Corporation, College Station, TX, USA). As the American Statistical Association took a stand against Null Hypothesis Significance Testing [54, 55], we present confidence intervals as recommended [56]. Because accuracy requires a large sample size [44], we looked for at least 200 observations as recommended [57].
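How a systematic difference lowers the ICC_(2,1) relative to the ICC_(3,1) can be illustrated with a minimal Python sketch based on the two-way ANOVA mean squares. The function name `icc_two_way` and the paired values are assumptions made for illustration; the study itself used Stata, and no study data are reproduced here.

```python
def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a targets-by-raters matrix.

    Uses between-target (BMS), between-rater (JMS) and residual (EMS)
    mean squares from a two-way layout without replication.
    """
    n = len(ratings)       # targets, e.g. teeth
    k = len(ratings[0])    # raters, methods, or repeated measurements
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_targets - ss_raters

    bms = ss_targets / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))

    # ICC(3,1) ignores the rater variance; ICC(2,1) adds it to the denominator.
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    return icc21, icc31


# Hypothetical d(0M1) values for five teeth measured by two methods;
# the second method reads about 3 units higher (systematic difference).
paired = [[10.0, 13.5], [12.0, 14.5], [14.0, 17.0], [16.0, 19.5], [18.0, 20.5]]
icc21_demo, icc31_demo = icc_two_way(paired)
```

In this invented example the two methods rank the teeth almost identically, so the ICC_(3,1) is close to 1, whereas the ICC_(2,1) is clearly lower because the constant offset counts as measurement variance.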

Results

Intra-rater variability

The agreement within the limits of ΔE < 2.7 was better for 2D than for 3D, both visually and electronically (Table 1). Figure 2 shows how the difference between two values of d(0M1) is related to ΔE, for which the difference between visual and electronic measurements was chosen. This difference in d(0M1) was strongly and substantially symmetrically related to ΔE (Fig. 2; R² = 0.69 for 2D and R² = 0.59 for 3D). The agreement within the limits of |d(0M1)| < 2.7 was also better for 2D than for 3D, both visually and electronically (Table 2). The limits of agreement were narrower for 2D_elec than for the remaining three methods (Table 2; Fig. 3). The Bland-Altman plots show clear patterns of disagreement for all methods, which is most pronounced for 2D_vis (Fig. 3). The d(0M1) range is narrowest for 2D_vis (11.0) and widest for 3D_elec (21.6) (Fig. 3); the variability of d(0M1) in terms of the pooled standard deviation is highest for 3D_elec. The reliability in terms of the ICC is good to very good for d(0M1) (Table 2).

Table 1 Agreement of repeated measurements for four methods in terms of ΔE and ΔE₀₀ related to a single tooth

Table 2 Agreement and reliability of repeated measurements for four methods in terms of the distance from 0M1 related to a single tooth

As hypothesized for the three single dimensions, hue is less reliable than lightness or chroma, both electronically (kappa for hue = 0.45, 95% CI: 0.40–0.50; ICC_(1,1) for lightness = 0.76, 95% CI: 0.74–0.79; ICC_(1,1) for chroma = 0.67, 95% CI: 0.63–0.70) and visually (kappa for hue = 0.01, 95% CI: −0.05 to 0.06; ICC_(1,1) for lightness = 0.52, 95% CI: 0.47–0.57; ICC_(1,1) for chroma = 0.66, 95% CI: 0.62–0.69).

The standard errors of measurement and SDCDs were essentially the same for the four methods, except for 2D_elec, which was better (Table 2). On the level of groups of observations or patients’ teeth, the SDCD of 2D_elec diminished from 2.8 for a single tooth to 1.4 and 1.0 for four and eight teeth, respectively. The SDCD of 2D_vis decreased from 3.9 for a single tooth to 1.9 and 1.4 for four and eight teeth, respectively.
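The group-level values are consistent with dividing the single-tooth SDCD by the square root of the number of teeth. The sketch below assumes this scaling rule, which is how the approach in [31] is commonly applied; the function name is hypothetical.

```python
import math


def sdcd_for_group(sdcd_single, n_teeth):
    """Scale a single-tooth SDCD to a group of n teeth (assumed 1/sqrt(n) rule)."""
    return sdcd_single / math.sqrt(n_teeth)


# Single-tooth SDCDs as reported in the text above.
for label, single in (("2D_elec", 2.8), ("2D_vis", 3.9)):
    for n_teeth in (1, 4, 8):
        print(label, n_teeth, sdcd_for_group(single, n_teeth))
```

For 2D_elec this gives 1.4 for four teeth and about 0.99 for eight teeth; for 2D_vis it gives 1.95 and about 1.38, in line with the reported 1.9 and 1.4 if the published figures were derived from unrounded single-tooth values.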

Inter-method variability

The comparability of visual and electronic measurements was fair to good in 2D and slight to fair in 3D for the agreement within the limits of ΔE < 2.7 (Table 3). The corresponding agreement of 2D and 3D measurements was fair in the visual approach, and poor to slight in the electronic approach (Table 3).

Table 3 Comparing methods of measurements in terms of ΔE and ΔE₀₀: 2D versus 3D within visual or electronical measurement; visual versus electronical measurements within 2D and 3D

The comparability of visual and electronic measurements was good in 2D and fair in 3D for the agreement within the limits of |d(0M1)| < 2.7 (Table 4). The corresponding agreement of 2D and 3D measurements was good in the visual approach, and fair in the electronic approach (Table 4).

Table 4 Comparing methods of measurements of the distance from 0M1 related to a single tooth: 2D versus 3D within visual or electronical measurement; visual versus electronical measurements within 2D and 3D

Concerning the comparability of the visual and electronic measurements, the difference d₂ – d₁, which indicates systematic error, was moderate in 2D and small in 3D (Table 4; Fig. 4). The Bland-Altman plots show marked patterns of disagreement for both approaches.

Concerning the comparability of 2D and 3D measurements, the difference d₂ – d₁ indicates systematic error, which was pronounced in the electronic approach (Table 4; Fig. 4). This difference can be interpreted as constant bias. Assuming proportional bias, the regression line can be cautiously interpreted. The Bland-Altman plots, however, showed clear patterns of disagreement for both approaches; the bias between the 2D and 3D system is neither constant nor uniquely proportional.

The reliability in terms of the ICC was fair to good for visual and electronic measurements. The reliability in terms of the ICC_(3,1), which ignores systematic differences, was good to very good for 2D and 3D measurements. The reliability in terms of the ICC_(2,1), which takes into account systematic differences, was poor to very good.

Discussion

The 2D system proved superior to the 3D system both visually and electronically in terms of ΔE and d(0M1) for statistics of agreement and reliability to assess intra-rater variability. All four methods showed strong patterns of disagreement between repeated measurements in Bland-Altman plots. As hypothesized, the 3D system is less reliable for hue than for lightness and chroma, a phenomenon which was more pronounced visually than electronically. The SDCD differed among the four methods used and was most favorable in the electronic 2D system. The agreement between the 2D and 3D systems in terms of ΔE was not good. It was lower in the electronic than in the visual method. The comparability of the 2D and 3D systems was uncertain, because confidence intervals of ICCs accounting for systematic error were wide. The systematic error between the 2D and 3D systems cannot be neglected. The reliability of the visual and electronic method was substantially the same in the 2D and 3D systems; this comparability was fair to good.

Below, the following aspects are discussed: 2D and 3D, visual and electronic, ΔE and d(0M1), Bland-Altman plots and statistics (patterns and numbers), single shade designations of the 3D system, validity and reliability, statistical SDCD and known thresholds, agreement and reliability (comparability), human and machine, and intra- and inter-method variability.

2D and 3D systems

The 2D and 3D systems differ in the color space assessed [33]. Some 3D shades that are lighter (lightness) or stronger (chroma) are not well covered by the 2D system, which is especially pronounced for the additional bleaching shades available only in the 3D system. Compared to VC, hue ranges of 3D Master are extended toward yellow-red, and 3D Master shades are more uniformly spaced than those of VC [6]. In contrast, there are spatial gaps in the 3D system which are filled in the 2D system [33, 41]. In short, both guides are suboptimal and can be improved [14].

The variability between raters may favor the 3D Master shade guide over the VC shade guide [58]. The coverage error favors the 3D system, although it is unclear whether the difference between the 2D and 3D systems is clinically relevant [12, 14, 59,60,61]. However, the clear patterns in Bland-Altman plots for d(0M1) cast doubt on the meaningfulness of converting 3D shades into VC shades (2D) as suggested elsewhere [62].

Visual and electronic methods

The gaps mentioned above that are filled by the 2D system are supported by additional 2D shades to assess quarter-points for the second shade designation number [33], which is an important difference between the visual and electronic method. A further important difference is the extension of the second shade designation number from the visual four-point scale to the electronic five-point scale. Similarly, the electronic 3D system includes bleaching shades not used by the visual 3D system evaluated here. Thus, it could have been expected that a human rater is inferior to the electronic rater, especially for the 2D system. It is of note that the agreement of intra-rater variability in terms of ΔE and d(0M1) is better for the visual 2D measurement than that for the electronic 3D measurement.

Several studies have found that instrumental methods are more accurate or reliable than visual measurements [11, 19, 23,24,25, 63,64,65]. A recent study, however, has shown that clinically relevant differences between the visual evaluation and the intraoral scanning device (3Shape) are negligible [20]. According to Li & Wang, the reliability of shade matching can be ensured neither by the instrumental nor by the visual approach [66]. Furthermore, the difference in color matching between human-eye assessment and computerized colorimetry depends on tooth type [18] and shade [8].

ΔE and d(0M1)

ΔE supports only statistics on agreement; neither Bland-Altman plots nor reliability statistics are feasible. Essentially, d(0M1) enables evaluating patterns of disagreement, other agreement statistics such as SDCD, and reliability statistics including versions of ICC accounting for systematic errors. Regarding agreement of repeated measurements of the same rater, the differences among the four methods are substantially the same for ΔE < 2.7 and d(0M1). The level of agreement within fixed limits, however, is higher for d(0M1). For example, d(0M1) hardly differentiates 3M1 from 2L2.5 (d(0M1): 15.2 and 15.3, respectively) although ΔE is 8.3. Thus, if lightness is compensated by less chroma (or chroma by darkness), then d(0M1) will not work well. The systematic errors between 2D and 3D measurements in d(0M1) are plausible, because the 2D and 3D systems differ in the color space assessed (see above). Within the 2D system, systematic errors between visual and electronic measurements are small, which can be explained by the additional quarter-point shades in the electronic 2D system.
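The blind spot of d(0M1) described above can be made concrete with a small Python sketch. The CIELAB coordinates are invented for illustration: they are not the measured values of 0M1, 3M1, or 2L2.5, and the variable names are hypothetical.

```python
import math


def delta_e(lab1, lab2):
    """CIELAB Euclidean distance between two (L*, a*, b*) triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


# Hypothetical coordinates (L*, a*, b*), chosen only to illustrate the geometry.
reference = (90.0, 0.0, 10.0)  # stand-in for the reference shade 0M1
shade_a = (75.0, 0.0, 10.0)    # darker, same chroma as the reference
shade_b = (81.0, 0.0, 22.0)    # lighter than shade_a, but more chromatic

d_a = delta_e(shade_a, reference)        # distance of shade_a from the reference
d_b = delta_e(shade_b, reference)        # distance of shade_b from the reference
between = delta_e(shade_a, shade_b)      # direct ΔE between the two shades
```

Both shades lie on the same sphere around the reference (distance 15 in this example), so their difference in distance is zero, while the direct ΔE between them is well above the 2.7 limit used in this study.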

Bland-Altman plots and statistics – patterns and numbers

According to Bland-Altman plots, bias between the 2D and 3D systems is neither constant nor uniquely proportional. Even if these kinds of bias could be adjusted for – as suggested for uniquely proportional bias [48, 49] – the clear patterns are not appropriate for sophisticated statistical methods. Thus, Bland-Altman plots provide important information hardly available in numbers.

Single shade designations of the 3D system and d(0M1)

Although the reliability for the hue component of the visual 3D system is zero, the corresponding d(0M1) indicates good reliability. Likewise, the reliabilities are fair versus very good for the electronic 3D system, respectively. Thus, reliabilities of single shade designations can be misleading, especially for hue, for which ΔE values are only about 1.5 (see above). Nevertheless, the hue component of the 3D system is problematic, because its reliability is lower than those of lightness and chroma.

Validity and reliability

Colorimetry does not facilitate valid measurements. The value of d(0M1), however, supports pseudo-valid measurements, as the range of d(0M1) values differs across the four methods. The bleaching shades added to the electronic 3D system (not to the visual 3D system) make the difference: this range (21.6) is about twice as wide as that of visual 2D (11.0). Reliability in terms of the ICC depends on this range – if the variability of d(0M1) is small, the ICC will be small. As expected, the pooled standard deviation of the electronic 3D system is higher than that of the electronic 2D system. The ICC of the electronic 3D system, however, is lower, which emphasizes the problems with the 3D system – independent of human raters.
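The dependence of the ICC on the spread of the measured values follows from its definition as the share of between-tooth variance in the total variance. The sketch below uses a simplified one-way variance model with invented standard deviations; it is an illustration, not a re-analysis of the study data.

```python
def icc_from_sds(between_sd, error_sd):
    """ICC as between-target variance over total variance (one-way model)."""
    between_var = between_sd ** 2
    error_var = error_sd ** 2
    return between_var / (between_var + error_var)


# Same hypothetical measurement error, two different spreads of true values.
icc_narrow = icc_from_sds(between_sd=2.0, error_sd=1.0)  # narrow range of shades
icc_wide = icc_from_sds(between_sd=4.0, error_sd=1.0)    # wide range of shades

# A wider spread only helps if the error does not grow with it.
icc_wide_noisy = icc_from_sds(between_sd=4.0, error_sd=3.0)
```

With equal error, the wider spread yields the higher ICC. A method with a wider spread but a lower ICC, as described for the electronic 3D system, must therefore carry disproportionately more measurement error.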

Smallest detectable color difference, acceptable and perceptible thresholds

An acceptability threshold of 2.7 in ΔE and a perceptibility threshold of 1.2 in ΔE are known [16]. The SDCD in terms of d(0M1) depends on the method and decreases from 2.8 to 1.0 for a row of eight teeth using electronic 2D measurements. These are statistical values and can differ from study to study. However, it is plausible that electronic 2D is the method with the best agreement, including SDCD. For properties of ΔE and d(0M1), electronic 2D is the recommended method for study designs with repeated measurements, such as longitudinal studies.

Agreement and reliability (comparability)

Whereas intra-rater agreement of repeated measurements in terms of SEM and SDCD does not differ between visual and electronic 3D measurements, the reliability or ICC differ substantially. Thus, a single human rater is not worse than the electronic device for a longitudinal study when using the 3D system. The comparability of the four methods remains uncertain. Therefore, the same method should also be used in multicenter studies.

Human and machine

Compared with a set of human raters, a set of devices from the same electronic system should have higher levels of standardization [67], which corresponds to the more favorable ICCs observed. However, n-of-1 trials, as used herein for the single human rater, limit generalizability. It may be further argued that the human rater lacks the ability to perceive hue [39]. But even if the examiner had lacked this ability, this would not have invalidated our conclusions, because we did not make an isolated statement on hue, but rather compared hue with lightness and chroma. These intra-human comparisons are supported by the n-of-1 trial design. Moreover, the same intra-device comparisons support the hypothesis that hue is not well reproducible; the electronic reliability of hue is merely fair. In addition to our findings, background knowledge further supports that 3D hue cannot be well assessed (see Background).

Intra- and inter-method variability – validity revisited

Whereas the reliability within each of the four methods is good to very good, comparability of the visual and electronic measurements is only fair to good. This also questions the validity of visual and electronic measurements. In turn, this question also refers to the difference between the 2D and 3D system. In fact, Bland-Altman plots using the 2D system suggest that both visual and electronic values are valid only for d(0M1) values of about 12 (A1 – A2, B1 – B2) and greater than 20 (A4, B3 – B4, C3 – C4, D4). The shades B1 and A2 are not well covered by the 3D system [33], which is mirrored in the corresponding Bland-Altman plots. Conversely, 3D shades 1M1 and 1M2 (both with d(0M1) < 11.2, the minimum of the 2D system) are not well covered by the 2D system [33] and question the validity of adjacent 2D shades, namely A1, B1, and B2. In daily practice, the 3D system may be useful for shades not available in the 2D system. Nevertheless, switching between methods cannot be recommended in scientific studies. The 3D system, however, can be favorable in bleaching studies owing to the added bleaching shades.

Conclusion

The 3D system may confuse both human raters and electronic devices. The 2D system is the simple and best choice.

Availability of data and materials

All data are available on request at the Department of Orthodontics, Dental School, University Medicine, Walther-Rathenau-Strasse 42, 17475 Greifswald, Germany.

Abbreviations

2D_vis: 2D visual
2D_elec: 2D electronic
3D_vis: 3D visual
3D_elec: 3D electronic

References

Terry DA, Geller W, Tric O, Anderson MJ, Tourville M, Kobashigawa A. Anatomical form defines color: function, form, and aesthetics. Pract Proced Aesthet Dent. 2002;14:59–67 quiz 68.
PubMed Google Scholar
Sikri VK. Color: implications in dentistry. J Conserv Dent. 2010;13:249–55.
Article PubMed PubMed Central Google Scholar
Chu SJ, Trushkowsky RD, Paravina RD. Dental color matching instruments and systems. Review of clinical and research aspects. J Dent. 2010;38(Suppl 2):e2–16.
Article PubMed Google Scholar
Paravina RD, Powers JM, Fay RM. Dental color standards: shade tab arrangement. J Esthet Restor Dent. 2001;13:254–63.
Article CAS PubMed Google Scholar
Paravina RD, Majkic G, Imai FH, Powers JM. Optimization of tooth color and shade guide design. J Prosthodont. 2007;16:269–76.
Article PubMed Google Scholar
Paravina RD, Powers JM, Fay RM. Color comparison of two shade guides. Int J Prosthodont. 2002;15:73–8.
PubMed Google Scholar
Paravina RD. Performance assessment of dental shade guides. J Dent. 2009;37(Suppl 1):e15–20.
Article PubMed Google Scholar
Yap AU, Sim CP, Loh WL, Teo JH. Human-eye versus computerized color matching. Oper Dent. 1999;24:358–63.
CAS PubMed Google Scholar
Sim CP, Yap AU, Teo J. Color perception among different dental personnel. Oper Dent. 2001;26:435–9.
CAS PubMed Google Scholar
Wee AG, Monaghan P, Johnston WM. Variation in color between intended matched shade and fabricated shade of dental porcelain. J Prosthet Dent. 2002;87:657–66.
Article PubMed Google Scholar
Paul S, Peter A, Pietrobon N, Hammerle CH. Visual and spectrophotometric shade analysis of human teeth. J Dent Res. 2002;81:578–82.
Article CAS PubMed Google Scholar
Li Q, Yu H, Wang YN. In vivo spectroradiometric evaluation of colour matching errors among five shade guides. J Oral Rehabil. 2009;36:65–70.
Article CAS PubMed Google Scholar
Hassel AJ, Cevirgen E, Balke Z, Rammelsberg P. Intraexaminer reliability of measurement of tooth color by spectrophotometry. Quintessence Int. 2009;40:421–6.
PubMed Google Scholar
Cocking C, Cevirgen E, Helling S, Oswald M, Corcodel N, Rammelsberg P, Reinelt G, Hassel AJ. Colour compatibility between teeth and dental shade guides in Quinquagenarians and septuagenarians. J Oral Rehabil. 2009;36:848–55.
Article CAS PubMed Google Scholar
Olms C, Setz JM. The repeatability of digital shade measurement--a clinical study. Clin Oral Investig. 2013;17:1161–6.
Article PubMed Google Scholar
Paravina RD, Ghinea R, Herrera LJ, Bona AD, Igiel C, Linninger M, Sakai M, Takahashi H, Tashkandi E, Perez MM. Color difference thresholds in dentistry. J Esthet Restor Dent. 2015;27(Suppl 1):S1–9.
Article PubMed Google Scholar
Knezovic D, Zlataric D, Illes IZ, Alajbeg M, Zagar. In vivo evaluations of inter-observer reliability using VITA Easyshade(R) advance 4.0 dental shade-matching device. Acta Stomatol Croat. 2016;50:34–9.
Igiel C, Weyhrauch M, Wentaschek S, Scheller H, Lehmann KM. Dental color matching: a comparison between visual and instrumental methods. Dent Mater J. 2016;35:63–9.
Lehmann K, Devigus A, Wentaschek S, Igiel C, Scheller H, Paravina R. Comparison of visual shade matching and electronic color measurement device. Int J Esthet Dent. 2017;12:396–404.
Mehl A, Bosch G, Fischer C, Ender A. In vivo tooth-color measurement with a new 3D intraoral scanning system in comparison to conventional digital and visual color determination methods. Int J Comput Dent. 2017;20:343–61.
Gomez-Polo C, Portillo Munoz M, Lorenzo Luengo MC, Vicente P, Galindo P, Martin Casado AM. Comparison of two color-difference formulas using the Bland-Altman approach based on natural tooth color space. J Prosthet Dent. 2016;115:482–8.
van der Burgt TP, ten Bosch JJ, Borsboom PC, Kortsmit WJ. A comparison of new and conventional methods for quantification of tooth color. J Prosthet Dent. 1990;63:155–62.
Fani G, Vichi A, Davidson CL. Spectrophotometric and visual shade measurements of human teeth using three shade guides. Am J Dent. 2007;20:142–6.
Judeh A, Al-Wahadni A. A comparison between conventional visual and spectrophotometric methods for shade selection. Quintessence Int. 2009;40:e69–79.
Browning WD, Chan DC, Blalock JS, Brackett MG. A comparison of human raters and an intra-oral spectrophotometer. Oper Dent. 2009;34:337–43.
Gomez-Polo C, Gomez-Polo M, Celemin-Vinuela A, Martinez Vazquez De Parga JA. Differences between the human eye and the spectrophotometer in the shade matching of tooth colour. J Dent. 2014;42:742–5.
Parameswaran V, Anilkumar S, Lylajam S, Rajesh C, Narayan V. Comparison of accuracies of an intraoral spectrophotometer and conventional visual method for shade matching using two shade guide systems. J Indian Prosthodont Soc. 2016;16:352–8.
Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8.
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models. A modern perspective. 2nd ed. Boca Raton: Chapman & Hall/CRC; 2006.
Altman DG, Bland JM. Statistics notes - standard deviations and standard errors. Br Med J. 2005;331:903.
de Vet HCW, Bouter LM, Bezemer PD, Beurskens AJHM. Reproducibility and responsiveness of evaluative outcome measures - theoretical considerations illustrated by an empirical example. Int J Technol Assess Health Care. 2001;17:479–87.
Szklo M, Nieto FJ. Epidemiology. Beyond the basics. 3rd ed. Burlington: Jones & Bartlett Learning; 2014.
Ratzmann A, Schwahn C, Treichel A, Faltermeier A, Welk A. Assessing the effect of multibracket appliance treatment on tooth color by using electronic measurement. Head Face Med. 2018;14:22.
Porta M, Greenland S, Hernán M, dos Santos SI, Last JM, Burón A. A dictionary of epidemiology. 6th ed. Oxford: Oxford University Press; 2014.
Gigerenzer G, Marewski JN. Surrogate science: the idol of a universal method for scientific inference. J Manag. 2015;41:421–40.
Greenland S. Invited commentary: the need for cognitive science in methodology. Am J Epidemiol. 2017;186:639–45.
de Vet HCW, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol. 2006;59:1033–9.
Ahn JS, Lee YK. Color distribution of a shade guide in the value, chroma, and hue scale. J Prosthet Dent. 2008;100:18–28.
Senn S. Statistical issues in drug development. Chichester: Wiley; 2007.
Ratzmann A, Klinke T, Schwahn C, Treichel A, Gedrange T. Reproducibility of electronic tooth colour measurements. Biomed Tech (Berl). 2008;53:259–63.
Park JH, Lee YK, Lim BS. Influence of illuminants on the color distribution of shade guides. J Prosthet Dent. 2006;96:402–11.
Vita Z. Dental shade guides. J Am Dent Assoc. 2002;133:366–7.
Sharma G, Wu WC, Dalal EN. The CIEDE2000 color-difference formula: implementation notes, supplementary test data, and mathematical observations. Color Res Appl. 2005;30:21–30.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10.
Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet. 1995;346:1085–7.
Krouwer JS. Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Stat Med. 2008;27:778–80.
Carstensen B. Comparing methods of measurement: extending the LoA by regression. Stat Med. 2010;29:401–10.
Taffé P, Peng M, Stagg V, Williamson T. MethodCompare: an R package to assess bias and precision in method comparison studies. Stat Methods Med Res. 2019;28(8):2557–65. https://doi.org/10.1177/0962280218759693.
Taffe P, Peng MK, Stagg V, Williamson T. Biasplot: a package to effective plots to assess bias and precision in method comparison studies. Stata J. 2017;17:208–21.
Bland M. Should all my observations lie between the limits of agreement? 2004.
Johnston WM, Kao EC. Assessment of appearance match by visual observation and clinical colorimetry. J Dent Res. 1989;68:819–22.
Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: Wiley; 1981.
Byrt T. How good is that agreement? Epidemiology. 1996;7:561.
Wasserstein RL, American Statistical Association. ASA statement on statistical significance and P-values. Am Stat. 2016;70:131–3.
Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat. 2016;70:129–31.
Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–50.
How can I decide the sample size for a study of agreement between two methods of measurement? 2004. Last accessed October 5, 2020. https://wwwusers.york.ac.uk/~mb55/meas/sizemeth.htm.
Oh WS, Koh IW, O'Brien WJ. Estimation of visual shade matching errors with 2 shade guides. Quintessence Int. 2009;40:833–6.
Hassel AJ, Koke U, Schmitter M, Beck J, Rammelsberg P. Clinical effect of different shade guide systems on the tooth shades of ceramic-veneered restorations. Int J Prosthodont. 2005;18:422–6.
Bayindir F, Kuo S, Johnston WM, Wee AG. Coverage error of three conceptually different shade guide systems to vital unrestored dentition. J Prosthet Dent. 2007;98:175–85.
Ongul D, Sermet B, Balkaya MC. Visual and instrumental evaluation of color match ability of 2 shade guides on a ceramic system. J Prosthet Dent. 2012;108:9–14.
Zenthofer A, Wiesberg S, Hildenbrandt A, Reinelt G, Rammelsberg P, Hassel AJ. Selecting VITA classical shades with the VITA 3D-master shade guide. Int J Prosthodont. 2014;27:376–82.
Paul SJ, Peter A, Rodoni L, Pietrobon N. Conventional visual vs spectrophotometric shade taking for porcelain-fused-to-metal crowns: a clinical comparison. Int J Periodontics Restorative Dent. 2004;24:222–31.
Pimentel W, Tiossi R. Comparison between visual and instrumental methods for natural tooth shade matching. Sci Color. 2014:47–9.
Liberato WF, Barreto IC, Costa PP, de Almeida CC, Pimentel W, Tiossi R. A comparison between visual, intraoral scanner, and spectrophotometer shade matching: a clinical study. J Prosthet Dent. 2019;121:271–5.
Li Q, Wang YN. Comparison of shade matching by visual observation and an intraoral dental colorimeter. J Oral Rehabil. 2007;34:848–54.
Todorovic A, Todorovic A, Gostovic AS, Lazic V, Milicic B, Djurisic S. Reliability of conventional shade guides in teeth color determination. Vojnosanit Pregl. 2013;70:929–34.

Acknowledgements

Not applicable.

Informed consent

Informed consent was obtained individually from all participants included in the study.

Explanation of any issues relating to journal policies

No issues.

Confirmation that all authors have read and approved the manuscript

All authors have read and approved the manuscript.

Confirmation that the content of the manuscript has not been published or submitted for publication elsewhere

The authors confirm that the content of the manuscript has not been published or submitted for publication elsewhere.

Funding

The study was not funded. Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Orthodontics and Department of Dental Propaedeutics/Community Dentistry, Dental School, University Medicine, Walther-Rathenau-Strasse 42, 17475, Greifswald, Germany
Anja Ratzmann
Department of Restorative Dentistry, Periodontology, Endodontology, Preventive and Paediatric Dentistry, Dental School, University Medicine, Walther-Rathenau-Strasse 42, 17475, Greifswald, Germany
Alexander Welk
Private Dental Office, Promenadestrasse 2, 96047, Bamberg, Germany
Stephanie Hoppe
Department of Orthodontics, Dental School, University Medicine, Franz-Josef-Strauß-Allee 11, 93053, Regensburg, Germany
Jochen Fanghaenel
Department of Prosthetic Dentistry, Gerontology and Biomaterials, University Medicine, Walther-Rathenau-Strasse 42, 17475, Greifswald, Germany
Christian Schwahn

Contributions

AR: contributed to design, recruiting participants, analysis and interpretation, and writing of the manuscript. AW: contributed to design, supervision of clinical treatment, analysis and interpretation, and revising the manuscript. JF: critically revised the manuscript. StH: contributed to data acquisition. CS: contributed to statistical analysis and interpretation, and writing the statistical section of the manuscript. All authors gave final approval and agreed to be accountable for all aspects of the work.

Corresponding author

Correspondence to Anja Ratzmann.

Ethics declarations

Ethics approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee of the Ärztekammer Mecklenburg-Vorpommern (Reg. Nr. III UV 15/08) and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Consent for publication

Not applicable.

Competing interests

All authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ratzmann, A., Welk, A., Hoppe, S. et al. New insights in the reproducibility of visual and electronic tooth color assessment for dental practice. Head Face Med 16, 37 (2020). https://doi.org/10.1186/s13005-020-00248-w

Received: 08 July 2020
Accepted: 18 November 2020
Published: 16 December 2020
DOI: https://doi.org/10.1186/s13005-020-00248-w
