The Repeatability, Reproducibility, and Correlation of the Schirmer Test: A Comparison of Open versus Closed Eye

Eren Ekici; Cagatay Caglar; Esra Ün Akgümüş

doi:10.3341/kjo.2022.0006

Abstract

Purpose

To compare the repeatability and reproducibility of the Schirmer test (ST) without anesthesia when the eyes are open (STo) and closed (STc) in previously undiagnosed patients with symptoms suggestive of dry eye.

Methods

In this comparative, observational case series, 31 patients were included. STo and STc were applied alternately, for a total of six tests. The ST was first applied twice, once with the eyes open (S1) and once with the eyes closed (S2), by a single ophthalmologist. It was then repeated four times, twice with the eyes open (S3, S5) and twice with the eyes closed (S4, S6), by a single nurse.

Results

S1, S3, and S5 were 23.4, 23.7, and 23.3 mm, respectively. S2, S4, and S6 were 14.7, 15.6, and 16.6 mm, respectively. STc scores were significantly lower than STo scores in general (right: t = 2.033, p = 0.048; left: t = 3.474, p = 0.004). There was no statistically significant difference among the scores of the three tests with open eyes: S1, S3, and S5 (p = 0.462). There was also no statistically significant difference among the scores of the three tests with closed eyes: S2, S4, and S6 (p = 0.05).

Conclusions

Our study suggests that although administering the ST with the patient’s eyes open produces higher readings than STc in patients with symptoms suggestive of dry eye, reliability was acceptable for tests performed with the eyes open and with the eyes closed. Moreover, intraexaminer reliability was higher than interexaminer reliability under both conditions.

Keywords: Dry eye syndromes, Repeatability, Reproducibility, Schirmer test

Dry eye arises from a variety of genetic, environmental, and lifestyle conditions that produce changes in the tears and the ocular surface. It leads to discomfort, visual impairment, and tear film instability with possible damage to the ocular surface [1]. As reported by several researchers, dry eye is among the most frequently established diagnoses in ophthalmology practice and severely reduces the patient’s quality of life [2]. Although the presence of certain specific symptoms constitutes strong evidence for the diagnosis of dry eye disease, some tests should still be performed in these patients. Each type of test provides specific information about the condition of the ocular surface [3].

There is no “gold standard” diagnostic test for dry eye disease, and which dry eye diagnostic tests are most appropriate for clinical practice is still debated [4]. Although a great variety of tests of tear production exist, the Schirmer test (ST) is one of the most widely used methods for identifying and assessing ocular tear production. Otto Schirmer first described this simple test in 1903 [5], and it is still frequently applied in the office to evaluate aqueous tear production. Three variations of the ST have been described [4,6-9]. ST-I has two forms: without anesthesia and with topical anesthesia. When performed without anesthesia, the ST-I evaluates basal tear secretion of the main lacrimal gland together with trigeminal reflex tearing, in which the irritating nature of the filter paper stimulates tear secretion. By contrast, the ST-I performed after topical anesthesia measures basal lacrimal secretion. The ST-II is primarily used to measure reflex tear secretion of the main lacrimal gland by irritating the nasal mucosa with a cotton-tipped applicator before evaluating tear production [9-11]. While Prause et al. [12] found this widely used test to be reliable and reproducible, other studies have demonstrated a lack of precision and reproducibility for detecting dry eye, and wide fluctuation has been shown in the test outcomes of the same person taken at the same time each day for several days [4,6,13]. Moreover, although Schirmer originally defined the test as one to be performed with the patient seated and with open eyes, blinking freely [5], some reports specify that the test was conducted with eyes open, others with eyes closed, and others do not specify the eye position.

More recently, the effect of the eye position (open versus closed) on the value of the ST has been measured, and higher ST values in the open-eye position were reported [14]. When the eyes are open, the stimulation of tear secretion by the upper and lower lid margins and eyelashes, along with the influence of external factors such as evaporation, humidity, and temperature, may be increased. All of these components can contribute to higher ST scores.

Consequently, this study aimed to compare the repeatability and reproducibility of the ST without anesthesia when the eyes are open (STo) and closed (STc), and to investigate the correlation of these tests with each other.

Materials and Methods

Sixty-two eyes of 31 patients who had not previously been diagnosed with dry eye and who had complaints of stinging, burning, ocular fatigue, and grittiness were recruited into the study. Patients with these complaints underwent the ST under outpatient clinic conditions. Prior to the testing sessions, informed written consent was obtained from all participants. Study methods conformed to the ethical guidelines of the Declaration of Helsinki. The study was approved by the Clinical Research and Ethics Committee of Hitit University (No. 232/5.5.2020).

A routine ophthalmic examination was performed for all individuals. Patients with a history of dry eye diagnosis or artificial tear use, previous eye surgery, ocular infection, ocular allergy, contact lens use, ocular medication, systemic medications (e.g., antidepressants, antihistamines, decongestants, hormone replacement therapy, and drugs for acne and Parkinson’s disease) known to affect tear production, previous punctal occlusion procedures, tear gland damage from inflammation or radiation, conjunctival concretions, or eyelid malpositions were excluded from the study.

STo was administered as S1, S3, and S5, whereas STc was administered as S2, S4, and S6. During STo, the patients were instructed to keep both eyes open, looking at a higher point. They were permitted in principle to blink without restriction but were asked to refrain from blinking for as long as possible. During STc, in contrast, the patients were asked to keep their eyes closed. To evaluate reproducibility, the ST was performed once by each of two separate examiners: S1 and S2 by a doctor, and S3 and S4 by a nurse. To evaluate repeatability, it was performed once again by the nurse (S5 and S6). In other words, the ST without anesthesia was performed two times (S1, S2) by a single ophthalmologist (EE) and was then repeated four times (S3, S4, S5, and S6) by a single nurse (EÜA). Overall, the ST was performed six times, with 15-minute intervals, over a total of three visits. Before each ST, any noticeable fluid was gently removed from the lower lid margin with a cotton swab. After a 1-minute waiting period, a Schirmer strip of filter paper was placed in the lower cul-de-sac within 2 to 3 mm of the lateral canthus of each eye, with the patient seated and the eyes open or closed alternately, without anesthesia. After 5 minutes, the strip was removed, and the amount of wetting was measured in millimeters. Patients with readings above 40 mm (i.e., the test paper was completely wetted) were excluded because the amount of wetting could not be measured accurately. To control for environmental parameters such as light and temperature, all STs were performed in the same room, with no airflow and a steady temperature (range, 21°C to 24°C), within the same time interval of the day. To minimize test anxiety, patients were well informed about the procedure and familiarized with it.

To reduce a sequence effect of the two tests, the participants were randomly assigned to the order of the ST series (i.e., open or closed first). Moreover, the sequence was rotated across patients: for instance, if the previous patient’s series began with STo, the following patient’s series began with STc.
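The order rotation described above can be illustrated with a short sketch. This is only one possible reading of the procedure, not the authors' actual allocation method: it assumes the starting condition of the first patient is drawn at random and then alternated for each subsequent patient. The function name `assign_start_orders` and the `seed` parameter are hypothetical.

```python
import random


def assign_start_orders(n_patients, seed=None):
    """Return the starting condition ("STo" or "STc") for each patient.

    Assumption: the first patient's starting condition is random and the
    starting condition then alternates from one patient to the next.
    """
    rng = random.Random(seed)
    first = rng.choice(["STo", "STc"])
    other = "STc" if first == "STo" else "STo"
    # Even-indexed patients start like the first patient, odd-indexed ones do not.
    return [first if i % 2 == 0 else other for i in range(n_patients)]


orders = assign_start_orders(31, seed=1)
print(orders[:4])
```

With 31 patients, this scheme gives 16 patients one starting condition and 15 the other.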

Statistical analysis

IBM SPSS ver. 22.0 (IBM Corp., Armonk, NY, USA) was used for the data analysis. The “ggplot2” library in R ver. 3.5.0 (R Foundation for Statistical Computing, Vienna, Austria) was used for Bland-Altman plots. Quantitative data are presented as the mean ± standard deviation (SD). The normality of the data was evaluated with the Shapiro-Wilk test. Since the data were not normally distributed, the nonparametric Friedman test was used to compare three or more repeated measurements. Following the Friedman test, post hoc multiple comparison tests were performed to determine the difference between groups. The results were recorded according to the sequence of the tests carried out: STo (S1, S3, S5) and STc (S2, S4, S6). The results were then classified one by one to determine the intraclass correlation coefficient (ICC) of the two tests. These values were computed from the estimates of within- and between-subject errors associated with the analysis of variance using Bonferroni correction. To assess the test-retest agreement of the STo and STc measurements, the ICC (two-way mixed model with consistency type) with its 95% confidence interval (CI) and Bland-Altman plots with the 95% limits of agreement (LoA; defined as mean difference ±1.96 SD) were employed. The ICC values were interpreted as follows: poor reliability (<0.5); moderate reliability (0.5-0.75); good reliability (0.75-0.9); and excellent reliability (>0.9). A p-value less than 0.05 was considered significant.
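As an illustration of the ICC described above, the following minimal sketch computes a two-way, consistency-type ICC from the analysis-of-variance mean squares. It assumes the single-measures form, which the text does not specify, and it is not the SPSS routine used in the study. The function name `icc_consistency` and the example readings are hypothetical.

```python
def icc_consistency(ratings):
    """Two-way, consistency-type, single-measures ICC.

    ratings: one row per eye (subject), one column per test (e.g., S3 and S5).
    Formula: (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between tests
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical wetting lengths (mm) for five eyes measured twice.
example = [[22, 24], [10, 13], [30, 27], [18, 18], [25, 29]]
print(round(icc_consistency(example), 3))
```

Because the consistency form removes the between-test (column) effect, a constant offset between two tests does not lower the coefficient; an absolute-agreement ICC would penalize such an offset.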

Results

Sixty-two eyes of 31 subjects with a mean age of 45.93 ± 15.93 years (range, 18-69 years) were included in this study. Twenty-four of the participants were female, and seven were male. There was no significant difference in age between male and female patients (p = 0.085).

The comparison between the STc and STo

On each visit, STc scores were lower than STo scores, and the difference was statistically significant (Table 1). When the outcomes of STo and STc for the right and left eyes were examined at each visit, no statistically significant difference was found among the scores of the three tests: for STo within right eyes, S1 and S3 (p > 0.999), S1 and S5 (p > 0.999), and S3 and S5 (p > 0.999); for STo within left eyes, S1 and S3 (p = 0.856), S1 and S5 (p > 0.999), and S3 and S5 (p = 0.984); for STc within right eyes, S2 and S4 (p > 0.999), S2 and S6 (p > 0.999), and S4 and S6 (p > 0.999); and for STc within left eyes, S2 and S4 (p > 0.999), S2 and S6 (p = 0.260), and S4 and S6 (p = 0.580). Thus, neither the interexaminer nor the intraexaminer measurements differed significantly when STo and STc were analyzed separately for the right and left eyes. The p-value was above 0.05 in all measurements.
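For readers who want to see how a comparison of three repeated measurements works, the sketch below implements the Friedman test named in the statistical analysis, with the usual correction for tied ranks. It is an illustration under stated assumptions, not the SPSS procedure used in the study: the helper names and data are hypothetical, and the p-value uses the chi-square approximation with 2 degrees of freedom, whose upper tail has the closed form exp(−Q/2) and therefore applies only to exactly three conditions.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank.

    Also returns the sum of (t**3 - t) over tie groups of size t.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def friedman_three(rows):
    """Friedman statistic and approximate p-value for three repeated tests."""
    k = 3
    n = len(rows)
    if any(len(row) != k for row in rows):
        raise ValueError("each row must hold exactly three measurements")
    rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        ranks, tie_term = average_ranks(row)
        for j in range(k):
            rank_sums[j] += ranks[j]
        ties += tie_term
    q = 12.0 * sum(s * s for s in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    correction = 1.0 - ties / (n * k * (k * k - 1))
    if correction <= 0:  # every row fully tied: no evidence of any difference
        return 0.0, 1.0
    q = max(q / correction, 0.0)
    return q, math.exp(-q / 2.0)  # chi-square upper tail with 2 df


# Hypothetical readings (mm): one row per eye, columns S1, S3, S5.
q, p = friedman_three([[23, 25, 22], [30, 28, 29], [12, 15, 15], [20, 19, 24]])
print(q, p)
```

A p-value above 0.05 from such a test would correspond to the finding above that the three repeated measurements did not differ significantly.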

The intercorrelation of STc and STo

The ICCs and their 95% CIs are shown in Table 2. The ICC values for STc (S2-S4, S2-S6, and S4-S6) were 0.622, 0.599, and 0.744, respectively, whereas the ICC values for STo (S1-S3, S1-S5, and S3-S5) were 0.694, 0.698, and 0.837, respectively.

The correlation between STc and STo

The ICC values between STo and STc (S1-S2, S1-S4, S1-S6; S3-S2, S3-S4, S3-S6; and S5-S2, S5-S4, S5-S6) were in the range of 0.270 to 0.567 (Table 2). All of these comparisons showed statistically significant agreement: p < 0.001 for every pair except S2-S3 (p = 0.006) and S2-S5 (p = 0.016), which were still below the 0.05 threshold. These two pairs involved two different examiners: the doctor performed S2, while the nurse performed S3 (first visit) and S5 (second visit). They also had a poor correlation (ICC, 0.314 and 0.270, respectively). The ICC of all the other STo-STc comparisons was around 0.5.

Interexaminer (reproducibility) and intraexaminer (repeatability) reliability

The ICC values for interexaminer (doctor versus nurse) reliability were 0.694 (S1-S3) and 0.622 (S2-S4). On the other hand, the ICC values for intraexaminer (nurse) reliability were 0.837 (S3-S5) and 0.744 (S4-S6).
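The reported coefficients can be mapped onto the reliability bands defined in the statistical analysis. The sketch below is a small illustration; the function name `classify_icc` is hypothetical, and the handling of values that fall exactly on a boundary (0.5, 0.75, 0.9) is an assumption, since the stated ranges overlap at their endpoints.

```python
def classify_icc(icc):
    """Map an ICC to the bands used in this study.

    Assumption: 0.5 counts as moderate, 0.75 and 0.9 count as good.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


reported = {
    "S1-S3 (interexaminer, STo)": 0.694,
    "S2-S4 (interexaminer, STc)": 0.622,
    "S3-S5 (intraexaminer, STo)": 0.837,
    "S4-S6 (intraexaminer, STc)": 0.744,
}
for pair, icc in reported.items():
    print(pair, classify_icc(icc))
```

By these thresholds, only the intraexaminer STo comparison (0.837) reaches the "good" band; the other three fall in the "moderate" band, with the intraexaminer STc value (0.744) just below the cutoff.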

The agreement between STo and STc

Fig. 1A and 1B present the Bland-Altman plots of the interexaminer agreement for STo and STc. The lines show the mean of the differences and ±1.96 SD of these differences. In both plots, the measurement differences are randomly distributed around zero. The 95% LoA was slightly narrower for STo (doctor versus nurse) than for STc (doctor versus nurse): −14.98 to 14.34 and −17.42 to 15.68, respectively. Two cases fell outside the LoA for STo, and four cases fell outside the LoA for STc.

Fig. 1C and 1D present the Bland-Altman plots of the intraexaminer agreement for STo and STc. The lines show the mean of the differences and ±1.96 SD of these differences. In both plots, the measurement differences are randomly distributed around zero. The 95% LoA was slightly narrower for STo than for STc: −9.91 to 10.83 and −15.08 to 13.05, respectively. Two cases fell outside the LoA in each of the STo and STc plots.
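The limits of agreement follow directly from the definition given in the statistical analysis (mean difference ±1.96 SD). The sketch below shows the calculation on paired readings and, in reverse, how the mean difference and SD can be recovered from a reported pair of limits. The function names and the paired data are hypothetical; the sample SD (n − 1 denominator) is assumed, and the direction of subtraction is not stated in the text.

```python
import statistics


def limits_of_agreement(first, second):
    """Return (bias, sd, lower, upper) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


def recover_bias_sd(lower, upper):
    """Back out the mean difference and SD from reported 95% LoA."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * 1.96)


# Hypothetical paired readings (mm) from two examiners.
print(limits_of_agreement([22, 10, 30, 18, 25], [24, 13, 27, 18, 29]))

# Reported interexaminer LoA for STo: -14.98 to 14.34 mm.
print(recover_bias_sd(-14.98, 14.34))
```

Applied to the reported interexaminer limits for STo, this gives a mean difference of about −0.32 mm and an SD of the differences of about 7.48 mm.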

Discussion

The ST is still an important diagnostic test for the identification and evaluation of dry eye patients. The low cost, ready availability, and simplicity of the ST make it the most frequently used screening test in everyday practice for estimating tear production in patients under examination [10,14]. However, while the ST was originally described with open eyes [5], many authors have published results of the test performed with closed eyes [15,16]. As far as we are aware, the current study is the first to compare the repeatability and reproducibility of the two forms of the ST without anesthesia.

In our study, mean STc scores (14.74, 15.61, and 16.62 mm, respectively) were lower than the STo scores (23.43, 23.75, and 23.35 mm, respectively) at each visit, with a statistically significant difference (p < 0.001) between the two tests (Table 1). The fact that these values are all in the normal range may suggest that the patient cohort does not have dry eye. However, objective clinical signs often conflict with patient-reported symptoms [3]. A positive diagnosis of dry eye disease is often based heavily on the presence of symptoms, with the literature suggesting that symptoms are an essential component of the disease [1]. The results of our study were consistent with previous studies [8,9], which reported lower Schirmer scores with the eyes closed than with the eyes open. Closing the eyes during the ST may reduce the rate of blinking, ocular irritation due to eye movements over the paper strip, and the impact of external conditions such as temperature, evaporation, and humidity. This in turn diminishes excess reflex tearing, which is a leading factor that compromises the reliability of the ST [17]. Also, corneal sensitivity to different stimulus modalities (mechanical, thermal, chemical, etc.) was shown to be significantly reduced in patients with dry eye compared with age-matched normal subjects [18]. When the test was applied with the eyes closed, the physical stimulus delivered to the eyes decreased as well, producing lower readings that might be thought to affect reliability. The lower STc readings, which were statistically significant in the current study, support this decrease in reflex tearing while the eyes are closed throughout the ST.

When the results of the three tests performed with the eyes open were compared with each other, and likewise those of the three tests performed with the eyes closed, there was no statistically significant difference between them (S1-S3, S1-S5, S3-S5, S2-S4, S2-S6, S4-S6; p > 0.05 for all measurements).

The ICCs estimated within STc and within STo in the current study indicated moderate to good reliability by the thresholds given in the statistical analysis. The ICC was also highly statistically significant (ICC of about 0.6 or above and p < 0.001 in all comparisons) in all measurements made within both STo (S1-S3, S1-S5, S3-S5) and STc (S2-S4, S2-S6, S4-S6) (Table 2). Our results were inconsistent with those of Serin et al. [8]; however, their study included only healthy subjects without any symptoms, whereas we applied the ST to individuals with complaints that may be caused by dry eye. When STc and STo measurements were compared with each other, a weak correlation was found (ICC < 0.6 in all comparisons), yet every comparison was statistically significant: p < 0.001 for all pairs except two (S2-S3, p = 0.006; S2-S5, p = 0.016) (Table 2). These two comparisons, which involved two different examiners (S2 performed by the doctor, and S3 and S5 performed by the nurse), also had a poor correlation (ICC, 0.314 and 0.270, respectively). Thus, although the ICCs between the STo and STc scores were low, the correlation between them was statistically significant. Despite this significance, the values were low from the standpoint of clinical practice. Obtaining different scores even in consecutive tests suggests that a single ST score might be misleading in the clinic. Therefore, the ST score should be evaluated together with other findings and tests in the diagnosis of dry eye.

Although the ST is commonly used in daily ophthalmology practice, its inadequate accuracy and repeatability have been noted [13,17]. Although any reading under 10 mm is generally regarded as abnormal, some ophthalmologists consider the test an acceptable diagnostic tool only for severe dry eye because of its moderate reproducibility [19], and many practitioners regard only values of less than 5 mm as significant [6]. In the current study, the reproducibility (doctor versus nurse: S1-S3, 0.694 and S2-S4, 0.622) showed an acceptable level of reliability. Similar results were observed for intraexaminer reliability; that is, the repeatability (nurse versus nurse: S3-S5, 0.837 and S4-S6, 0.744) also showed an acceptable level of reliability for the ST. In addition, the reproducibility (0.694 versus 0.622) and repeatability (0.837 versus 0.744) of STo were slightly higher than those of STc (Table 2).

The Bland-Altman plots gave results similar to the ICCs with respect to reliability. Intraexaminer reliability was higher than interexaminer reliability. The 95% LoA was slightly narrower for STo than for STc in the plots. In the interexaminer plots, fewer measurements fell outside the LoA for STo than for STc. According to the Bland-Altman plots, as with the ICCs, STo measurements were slightly more reliable than STc measurements for both intraexaminer and interexaminer estimations (Fig. 1). In our study, the finding that interexaminer reliability was lower than intraexaminer reliability indicates that it would be appropriate for these tests to be performed by the same examiner, both for repeated measurements on the same day and at follow-up visits. The lower repeatability of the ST was one of the most crucial factors that reduced test reliability.

The higher reliability of intraexaminer STs compared with interexaminer STs was consistent with Lee et al. [13], who suggested that there is significantly higher error in examinations by separate examiners than in repeated examinations by one examiner. Factors such as the paper’s prolonged contact with the eyelashes (5 minutes), changes in light or other environmental parameters (temperature, etc.), a reduction in reflex tearing at the second visit, test anxiety (producing more reflex tearing on the first test), and changes in disease status between visits have been considered as explanations for the large inconsistencies in the reported repeatability of the ST [4,17].

The limitations of our study were the relatively small number of participants, owing to the strict criteria (e.g., absence of ocular allergy, drug use, contact lens use, and previous eye surgery), and the tiresome nature of the study. Despite these limitations, the current study has shown that administering the ST with the eyes open produced higher readings than administering it with the eyes closed. Considering that the study was conducted on individuals with complaints of dry eye, we would like to emphasize that the ST scores obtained with the eyes closed seemed more realistic. However, further studies are needed to determine which method gives a more accurate result in clinical practice.

Although the correlations between STo and STc were statistically significant, the correlation values were relatively low from a clinical standpoint. The different outcomes, even in consecutive tests, suggest that a single ST score might be misleading in clinical practice. For this reason, ST scores should be interpreted together with other clinical findings and tests for the correct diagnosis of dry eye.

Moreover, intraexaminer reliability (repeatability) was found to be higher than interexaminer reliability (reproducibility) in the present study. Our findings suggest that the ST should be performed by the same examiner for repeated measurements on the same day and at follow-up visits.

Table 1. STo and STc scores for each test (mean ± SD, mm)
Test	STo	STc	p-value
1 (Doctor)	23.43 ± 9.76	14.74 ± 9.80	<0.001^*
2 (Nurse, first visit)	23.75 ± 9.35	15.61 ± 9.62	<0.001^*
3 (Nurse, second visit)	23.35 ± 8.69	16.62 ± 10.42	<0.001^*

Table 2. ICC (95% CI) and p-value for each pair of tests
ST	S1	S2	S3	S4	S5	S6
S1	1	0.450 (0.228-0.628)	0.694^* (0.538-0.803)	0.474 (0.256-0.646)	0.698^* (0.544-0.806)	0.470 (0.251-0.643)
p-value	-	<0.001	<0.001	<0.001	<0.001	<0.001
S2	0.450 (0.228-0.628)	1	0.314 (0.072-0.521)	0.622^* (0.442-0.754)	0.270 (0.024-0.486)	0.603^* (0.412-0.738)
p-value	<0.001	-	0.006	<0.001	0.016	<0.001
S3	0.694 (0.538-0.803)	0.314 (0.072-0.521)	1	0.535 (0.331-0.692)	0.837^* (0.743-0.899)	0.491 (0.276-0.659)
p-value	<0.001	0.006	-	<0.001	<0.001	<0.001
S4	0.474 (0.256-0.646)	0.622 (0.442-0.754)	0.535 (0.331-0.692)	1	0.567 (0.372-0.715)	0.744^* (0.608-0.837)
p-value	<0.001	<0.001	<0.001	-	<0.001	<0.001
S5	0.698 (0.544-0.806)	0.270 (0.024-0.486)	0.837 (0.743-0.899)	0.567 (0.372-0.715)	1	0.556 (0.357-0.706)
p-value	<0.001	0.016	<0.001	<0.001	-	<0.001
S6	0.470 (0.251-0.643)	0.599 (0.412-0.738)	0.491 (0.276-0.659)	0.744 (0.608-0.837)	0.556 (0.357-0.706)	1
p-value	<0.001	<0.001	<0.001	<0.001	<0.001	-