Key Points for Decision Makers

The coronavirus disease 2019 (COVID-19) pandemic caused researchers to consider the use of videoconferencing interviews to collect cTTO data. There is little evidence about the quality of cTTO data from videoconferencing interviews in comparison with face-to-face interviews.

We have provided insights into the performance of videoconferencing interviews in comparison with face-to-face interviews, in terms of interviewer and respondent engagement.

No evidence suggests that the quality of cTTO data decreases when using videoconferencing compared with face-to-face interviews.

1 Introduction

Time trade-off (TTO) is one of the most widely used preference elicitation methods for valuing health states. It is a matching task for obtaining utility values, which can be operationalised in numerous ways. Some of the main versions of TTO that are being used have been well-described elsewhere [1,2,3]. In general, in a TTO task, respondents are asked to make trade-offs between the length of life and quality of life. The goal of the TTO is to obtain answers from respondents on how much time they would be willing to give up (i.e., trade off) to avoid having health impairments; the worse the health impairments become, the more time respondents are willing to give up. As such, the amount of time given up can be used as a measure for the severity of the health impairment. In TTO tasks, respondents are typically asked to answer a series of questions wherein they are forced to choose between a longer life with health impairments and a shorter life in full health. An iterative sequence is used to vary the amount of time shown in the life with full health until a point of indifference is reached. Respondents might be inclined to view very severe health impairments as ‘worse-than-dead’. TTO typically includes a worse-than-dead adapted task.

It is assumed that TTO tasks are not easy to understand for respondents. This is also the case for the composite TTO (cTTO), the version adopted by the EuroQol Group to elicit values for the EQ-5D-5L and EQ-5D-Y-3L instruments [4,5,6]. cTTO tasks can be abstract and confrontational, given the trade-offs between the length of life and quality of life. In addition, the task itself is complex, with the iterative procedure and separate tasks for ‘better-than-dead’ and ‘worse-than-dead’, which is why face-to-face interviews have traditionally been used for the collection of cTTO data and why the EuroQol Group has developed interviewer scripts used in all EQ-5D-5L valuation studies [7]. Face-to-face cTTO interviews have been shown to be reliable and to yield high-quality data [5, 7]. However, online self-completed TTO experiments have already been conducted. The results of one study have shown that online self-completed cTTO interviews induce a systematic downward bias [8]. Another study reported that TTO tasks conducted online and read by an automated voice had more inconsistencies and decreased engagement [9]. Thus, given the complexity of a cTTO task, the role of the interviewer is crucial. More specifically, the interviewer’s role consists of demonstrating the task to the respondent by using examples (being in a wheelchair in the EQ-5D valuation studies). The interviewer should explain all elements of the task and show how to perform it in terms of the iterative procedure used to vary the time given up when the respondent makes their own choices. In addition, the interviewer should show both the ‘better-than-dead’ and ‘worse-than-dead’ sides of the task. For these reasons, in a cTTO valuation study, it is important that both interviewers and respondents are properly engaged to gain high-quality data [7].

The coronavirus disease 2019 (COVID-19) pandemic created a need to innovate in how these interviews are conducted, as the face-to-face approach became infeasible due to social restrictions imposed in many jurisdictions. The social distancing measures made researchers consider using videoconferencing software (e.g., Skype or Zoom) as an alternative mode of administration to collect the necessary cTTO data. The videoconference-based administration approach was seen as having the potential to produce cTTO data of similar quality to that from traditional face-to-face interviews, as an interviewer would be present to explain and guide the cTTO exercises. As far as we are aware, only Lipman [10] has discussed the advantages and disadvantages of cTTO interviews conducted by videoconference, in comparison with face-to-face interviews, using empirical data from a study in The Netherlands. Lipman reported that a videoconferencing approach would have several advantages over a face-to-face approach, including greater geographical reach, additional convenience to respondents, lower study cost, and more rapid data collection. However, videoconference-based collection may be affected by selection bias, arising for example from differential internet access among segments of the population [11] or from differences in respondents' experience and ease with speaking over a computer. The conclusions of Lipman are limited to one interviewer and one country; thus, only limited evidence about the performance of videoconferencing interviews is available.

The aim of this manuscript was to provide insights into the performance of videoconferencing interviews in comparison with the performance of face-to-face interviews, based on interviewer and respondent engagement in the cTTO task under each approach.

2 Methods

2.1 The EQ-5D-Y-3L Valuation Protocol

The data used for this study were collected as part of the national EQ-5D-Y-3L valuation studies in Belgium and Spain. Both studies used the EQ-5D-Y-3L instrument, which consists of five dimensions (mobility, looking after oneself, usual activities, pain/discomfort, and worry/sadness/unhappiness) with three levels each: ‘no problems’, ‘some/a bit’ and ‘a lot/very much’, usually coded from 1 to 3. In this paper, a health state ‘profile’ describes a health state by the levels of its five dimensions. For example, ‘22222’ means some problems or a bit of problems occur in all five dimensions, and ‘12222’ means no problems occur in mobility and some problems or a bit of problems occur in the other four dimensions. The protocol used in both countries was the cTTO component of the international EQ-5D-Y-3L valuation protocol [6]. This protocol also included an online DCE component; however, the focus of our analysis is only the cTTO side. Briefly, the protocol recommends the use of a face-to-face cTTO to obtain preferences for EQ-5D-Y-3L states in a sample of 200 respondents. The main cTTO tasks included 10 health states: 3 mild, 2 moderate, and 5 severe, as suggested by the EQ-5D-Y-3L valuation protocol (Mild: 11112, 11121, 21111; Moderate: 22223, 22232; and Severe: 31133, 32223, 33233, 33323, 33333). All health states were valued from the perspective of a 10-year-old child.
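The profile notation described above can be illustrated with a short sketch. The function name `decode_profile` and the constant names are hypothetical and are not part of the EQ-5D-Y-3L instrument or of any EuroQol software; the dimension and level labels are taken from the text.

```python
# Dimension order and level labels as given in the text above.
DIMENSIONS = (
    "mobility",
    "looking after oneself",
    "usual activities",
    "pain/discomfort",
    "worry/sadness/unhappiness",
)
LEVELS = {1: "no problems", 2: "some/a bit", 3: "a lot/very much"}


def decode_profile(profile: str) -> dict[str, str]:
    """Map a five-digit profile such as '12222' to a label per dimension."""
    if len(profile) != 5 or any(ch not in "123" for ch in profile):
        raise ValueError("a profile must consist of five digits, each 1, 2 or 3")
    return {dim: LEVELS[int(ch)] for dim, ch in zip(DIMENSIONS, profile)}


decoded = decode_profile("12222")
```

Here `decoded["mobility"]` is 'no problems' and the remaining four dimensions are 'some/a bit', matching the '12222' example in the text.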

A cTTO task involves a series of choices between two alternative lives: life A, describing full health for t years; and life B, describing an impaired health state for 10 years, where t is ≤ 10 years (Fig. 1a). To allow for worse-than-dead preferences, the task is adapted so that life B now consists of 10 years in full health (so-called lead time: LT) followed by 10 years in the impaired health state. The trade-offs are then made between t years in life A (where t is again ≤ 10 years) and life B with a total of 20 years, 10 of which are spent in the impaired health state (Fig. 1b). The cTTO uses an iterative sequence to guide the respondent to their point of indifference between living life A or life B (i.e., valuing life A and life B the same) by varying t based on the answers provided by the respondent (Fig. 1c).

Fig. 1

cTTO task example. a Better-than-dead side; b worse-than-dead side; c iteration procedure. The iteration procedure used to vary t is described elsewhere [12], but briefly, it uses a ping-pong approach starting at t = 10 (10 years in full health = 'Life A' and 10 years in the impaired health state = 'Life B') and moving to t = 0 if a respondent chooses A. If the respondent then chooses B, t is increased to t = 5 (a) followed by 1-year increments/decrements or 6-month increments/decrements depending on the respondent's choices. If at t = 0 the respondent chooses A, the worse-than-dead side of the task is shown (b), where t = 10. If the respondent chooses A again, t is decreased to t = 5 followed by 1-year increments/decrements or 6-month increments/decrements depending on the respondent's choices. Utilities shown in Fig. 1c for the impaired health states are calculated using t at the point of indifference: U = t/10 for states considered better-than-dead, and U = (t − 10)/10 for states considered worse-than-dead. cTTO composite time trade-off
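The two utility formulas in the caption can be written as a minimal sketch. The function name `ctto_utility` and its boolean argument are hypothetical; only the formulas U = t/10 and U = (t − 10)/10 come from the text, and the iteration procedure itself is not reproduced here.

```python
def ctto_utility(t: float, worse_than_dead: bool) -> float:
    """Utility of the impaired state from the indifference point t (years in life A)."""
    if not 0 <= t <= 10:
        raise ValueError("t must lie between 0 and 10 years")
    if worse_than_dead:
        # Lead-time side of the task: U = (t - 10) / 10, ranging from -1 to 0.
        return (t - 10) / 10
    # Better-than-dead side of the task: U = t / 10, ranging from 0 to 1.
    return t / 10


# Illustrative indifference points (not study data).
u_mild = ctto_utility(7.5, worse_than_dead=False)
u_severe = ctto_utility(5, worse_than_dead=True)
```

With these illustrative inputs, indifference at t = 7.5 on the better-than-dead side gives a utility of 0.75, and indifference at t = 5 on the worse-than-dead side gives −0.5.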

2.2 Composite Time Trade-Off (cTTO) Interview Structure

The cTTO interviews were structured as follows:

  1. The information sheet stated the aims of the study and requested a participant's consent. If a respondent did not provide consent, the survey was terminated and only showed a message thanking the respondent for considering participation.

  2. Demographic questions asked about their geographical area, age, and sex to delimit quotas and ensure sample representativeness.

  3. The self-reported EQ-5D-Y-3L instrument asked about their health and was administered as a warm-up task.

  4. Three questions asked about their experience with illness.

  5. Two adaptive wheelchair examples were presented to allow the interviewer to explain the cTTO task. Depending on whether the first wheelchair example was used to explain the better-than-dead or the worse-than-dead side of the task, the second wheelchair example was adapted to explain the other side.

  6. Three practice states were provided, to allow participants to practice alone before the main tasks.

  7. The main 10 cTTO states (the same for all participants) were presented in random order.

  8. The standard feedback module (FBM) was used as in prior EQ-5D-5L valuation studies [13]. Participants were presented with the rank ordering of the health states based on their 10 cTTO valuations and asked to exclude health state valuations they felt did not have the appropriate location in the ranking.

  9. Additional background questions were asked about their educational level and whether respondents had children.

The interview was implemented electronically in the EuroQol Valuation Technology (EQ-VT), a web-based technology that stores all clicks and the times between clicks across the whole interview, both those the interviewer made to demonstrate the cTTO wheelchair examples and those recording the respondent's choices in the practice and main health state tasks.

2.3 Sampling and Data Collection

The studies in Belgium and Spain were conducted independently, and both aimed to collect 200 cTTO interviews each, as suggested by the EQ-5D-Y-3L valuation protocol [6]. The face-to-face data collection was conducted in March 2020 in Spain and between August and October 2020 in Belgium, while the videoconferencing data collection was conducted between July and August 2020 in Spain and between October and December 2020 in Belgium. Both projects were set up to involve only face-to-face interviews, and recruitment was handled via two different market agencies.

Due to COVID-19 social distancing measures, the face-to-face data collection was interrupted at 123 interviews in Spain and 121 interviews in Belgium. Since respondents’ and interviewers’ health was still at risk, the research team agreed to try collecting the remaining interviews needed to reach the target sample size using videoconferencing software. In Spain, interviewers were inactive for 5 months (March 2020–July 2020) before restarting the data collection via Skype, while in Belgium, the remaining interviews were conducted via Zoom without stopping the data collection. Videoconferencing interviews were completed with audio and video connection and sharing of the interviewer’s screen. In both modes of administration, the interviewer operated the cTTO software, and the participants stated their preferences verbally.

All interviewers were trained for this specific study and monitored on a weekly basis, following the standard quality control procedure of the EuroQol Group [7]. A total of four interviewers with previous experience conducting cTTO interviews participated in the Spanish data collection; however, one interviewer (interviewer 4) did not participate in the initial face-to-face data collection. The interviews conducted by interviewer 4 were therefore withdrawn from our analysis. In Belgium, the same three interviewers conducted both face-to-face and Zoom-based interviews.

2.4 Metrics Definition

As described by the EuroQol standard quality control methodology [7], and more recently by the quality assurance programme [14], we measured both interviewer and respondent engagement as follows:

Interviewer engagement pertains to how well they explain the cTTO task to respondents, which is measured to ascertain whether the quality control procedure also worked in the videoconferencing environment. Therefore, the following were measured:

  1. The amount of time (in seconds) on each cTTO wheelchair example. Short task duration explaining the task indicates poor engagement, as the proper explanation of a cTTO task requires some time. Based on previous studies [7], evidence suggests that about 5 min is necessary for this task.

  2. The number of moves from the iterative procedure used to complete each cTTO wheelchair example. Few moves would mean poor engagement, as the way that t (i.e., years in life A) is varied by the EuroQol cTTO iteration procedure may be difficult to understand for respondents. Thus, several moves of the procedure must be shown to properly explain the task. Based on previous studies, evidence suggests that 30–40 moves are required.

  3. The moves performed in the better-than-dead and worse-than-dead elements of the wheelchair examples. In order to learn whether both sides of the task were explained, we split the analysis of moves. Evidence suggests that the better-than-dead side is usually explained first, and that the split between moves shown for the better-than-dead task and worse-than-dead task should be about 75% and 25%, respectively [7].

To measure respondent engagement, we focused on the following parameters:

  1. The amount of time taken in seconds to complete each cTTO task (excluding wheelchair examples and practice cTTO tasks). Respondents require time to make multiple choices to reach their indifference point on the iteration procedure. Short task duration is associated with poor engagement, as speeding through the task or stopping early in the iteration procedure tends to generate imprecise responses. Evidence suggests that about 1 min is necessary for conducting a single cTTO task [7].

  2. The proportion of values on specific responses of the iteration procedure. The specific responses are defined as those that only required a few moves before ending the task. More specific responses therefore mean lower engagement. Those specific responses corresponded to the following values: 1, 0.5, 0, and −0.5.

  3. The proportion of responses in half-year units. To produce half-year-unit responses, respondents must make more of an effort than they must make to reach year responses; therefore, more half-year units were associated with higher engagement.

  4. The proportion of negative values. Since the worse-than-dead side of the task was more difficult to understand than the better-than-dead side and also required more steps in order to arrive at a final response, respondents may be reluctant to value a health state as worse-than-dead. Thus, a lower proportion of worse-than-dead responses may be associated with poorer engagement.

Note that respondent engagement was compared at the aggregate (i.e., interviewer) level and not at the individual respondent level. This is because a degree of variability between individual respondents due to differences in characteristics such as age is to be expected. A comparison at the interviewer level, in contrast, highlights whether interviewers were able to properly engage their respondents. To add clarity for points 2–4 of respondent engagement, we also looked at the value distribution for all responses. Additionally, a combined measure of both interviewer and respondent engagement is the duration of the whole interview. Shorter interview durations were associated with poorer engagement.
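Points 2–4 of respondent engagement can be sketched as three proportions computed over the cTTO values collected by one interviewer. This is a minimal illustration, not the study's analysis code: the function name `engagement_proportions` is hypothetical, and the half-year check assumes that every value is a multiple of 0.05, so that a value corresponds to a half-year response exactly when ten times the value is not a whole number.

```python
from fractions import Fraction

# Values reached after only a few moves, as listed in the text.
SPECIFIC_VALUES = {Fraction(1), Fraction(1, 2), Fraction(0), Fraction(-1, 2)}


def engagement_proportions(values):
    """Return the proportions of specific, half-year and negative cTTO values."""
    # Exact arithmetic avoids floating-point noise when testing for half-years.
    vals = [Fraction(str(v)) for v in values]
    if not vals:
        raise ValueError("at least one cTTO value is required")
    n = len(vals)
    specific = sum(v in SPECIFIC_VALUES for v in vals)
    # Assumption: values are multiples of 0.05, so 10 * U is either a whole
    # number of years or a whole number plus one half.
    half_year = sum((v * 10).denominator == 2 for v in vals)
    negative = sum(v < 0 for v in vals)
    return {
        "specific": specific / n,
        "half_year": half_year / n,
        "negative": negative / n,
    }


# Illustrative values only (not study data).
example = engagement_proportions([1, 0.95, 0.5, 0.25, 0, -0.5, -0.35, 0.7])
```

For the eight illustrative values, four are specific responses, three are half-year responses (0.95, 0.25 and −0.35), and two are negative.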

Furthermore, we looked at the face validity of the cTTO data and the interviewer’s effects on values. To determine face validity of the cTTO data, we used the proportion of respondents per interviewer that provided inconsistent values (e.g., providing a higher cTTO value for a logically worse health state). The proportion of respondents whose cTTO data contained at least one inconsistency before and after the FBM (removing responses that were flagged by respondents as not valid) was examined across all health states and for the worst EQ-5D-Y-3L health state. We used strict and weak criteria for calculating inconsistencies. To determine interviewer effects, we examined the value distribution of overall health states.
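The inconsistency check can be sketched as follows. The text does not define the strict and weak criteria, so this example does not attempt to reproduce them: it only offers a hypothetical `count_ties` switch that decides whether equal values for a logically ordered pair are also counted. The function names are likewise hypothetical, and a state is taken to be logically worse when it is at least as severe on every dimension and differs on at least one.

```python
from itertools import combinations


def dominates(better: str, worse: str) -> bool:
    """True if `better` has no higher level than `worse` on any dimension and the two differ."""
    return better != worse and all(b <= w for b, w in zip(better, worse))


def has_inconsistency(valuations: dict[str, float], count_ties: bool = False) -> bool:
    """Check one respondent's valuations (profile -> cTTO value) for logical inconsistencies."""
    for a, b in combinations(valuations, 2):
        for better, worse in ((a, b), (b, a)):
            if not dominates(better, worse):
                continue
            diff = valuations[worse] - valuations[better]
            # Inconsistent if the logically worse state received a higher value,
            # or (assumed option) the same value when count_ties is set.
            if diff > 0 or (count_ties and diff == 0):
                return True
    return False


# Illustrative valuations only (not study data).
respondent = {"11112": 0.9, "22223": 0.5, "33333": 0.5}
```

In this illustrative case `has_inconsistency(respondent)` is False, whereas setting `count_ties=True` flags the tie between 22223 and 33333. Pairs such as 11112 and 11121 are never compared, because neither state is logically worse than the other.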

2.5 Statistical Comparison between Modes of Administration

The mode of administration of the interview (videoconferencing vs. face-to-face) was not randomised. Videoconference cTTO interviews were conducted later than face-to-face interviews in both countries; thus, interviewers were more experienced during the videoconferencing interviews. We assumed that the first 20 interviews conducted by each interviewer were the most affected by potential interviewers’ learning effects, based on previous experience in EQ-5D valuation studies [7]. To reduce the influence of potential interviewer learning effects in the mode of administration comparison, we split our sample into the following three groups: (1) the first 20 face-to-face interviews conducted by each interviewer; (2) the subsequent face-to-face interviews; and (3) the videoconferencing interviews.
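The three-group split can be sketched as a single pass over the interviews in chronological order. The function name `assign_groups`, the mode labels `"face-to-face"` and `"video"`, and the input layout are assumptions made for illustration; only the threshold of 20 interviews per interviewer comes from the text.

```python
from collections import defaultdict


def assign_groups(interviews, learning_n=20):
    """Label each (interviewer_id, mode) pair, given in chronological order."""
    face_to_face_count = defaultdict(int)
    groups = []
    for interviewer, mode in interviews:
        if mode == "video":
            groups.append("videoconferencing")
            continue
        # Count face-to-face interviews separately for each interviewer.
        face_to_face_count[interviewer] += 1
        if face_to_face_count[interviewer] <= learning_n:
            groups.append("first face-to-face")
        else:
            groups.append("subsequent face-to-face")
    return groups


# Illustrative sequence only: 22 face-to-face interviews, then 3 by video.
sequence = [("A", "face-to-face")] * 22 + [("A", "video")] * 3
labels = assign_groups(sequence)
```

For this illustrative interviewer, the first 20 interviews fall in group 1, the next two in group 2, and the three videoconferencing interviews in group 3.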

Sample characteristics were reported using proportions in each subpopulation group categorised by the comparison groups defined above. Task durations and moves were reported using means and standard deviations. Histograms were used to report value distributions, and the proportions of specific values, half-year values, and negative values are reported in the corresponding figures. Finally, inconsistent respondents were reported using proportions. We used a Z-test for all proportion comparisons and the unequal variance unpaired t-test for comparing task durations and numbers of moves. We reported statistically significant results with and without Bonferroni correction at a 95% confidence level.
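The tests named above can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than the analysis code used in the study: the Z-test uses a pooled standard error (the text does not say which variant was applied), the Welch function returns only the test statistic and the Welch-Satterthwaite degrees of freedom, since a p-value needs a t-distribution that the standard library does not provide, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test for two proportions with a pooled standard error (assumed variant)."""
    pooled = (x1 + x2) / (n1 + n2)
    # Undefined (division by zero) when the pooled proportion is exactly 0 or 1.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def welch_t(sample1, sample2):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = variance(sample1) / len(sample1)
    v2 = variance(sample2) / len(sample2)
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(sample1) - 1) + v2 ** 2 / (len(sample2) - 1))
    return t, df


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Illustrative inputs only (not study data).
z, p = two_proportion_z_test(30, 100, 45, 100)
t_stat, df = welch_t([55, 62, 70, 58], [61, 75, 80, 66, 90])
threshold = bonferroni_alpha(0.05, 10)
```

A comparison would then be reported as significant after correction only if its p-value falls below `threshold` rather than below 0.05.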

3 Results

3.1 Descriptive Statistics

In Belgium, 218 cTTO interviews were conducted in total: 120 face-to-face and 98 via videoconferencing. In Spain, 16 interviews conducted by one interviewer who conducted only videoconferencing interviews were excluded. This resulted in 184 interviews in total, of which 123 were face-to-face and 61 were conducted via videoconferencing. The face-to-face interviews were divided into two groups: the first 20 interviews per interviewer and their subsequent interviews (60 and 63, respectively, in Spain; 60 and 60, respectively, in Belgium). The videoconference interviews were not split between first and subsequent interviews as the sample size was too small (61 in Spain and 98 in Belgium).

In Spain, the videoconference interviews included a lower proportion of respondents with a university degree than the subsequent face-to-face interviews (42.6% vs. 63.5%), whereas in Belgium there was an imbalance in the proportion of females, which was higher among the videoconference respondents (70.4% vs. 40%) (Table 1).

Table 1 Respondents' demographics by mode of administration

3.2 Interviewer Engagement

In both countries, task duration and the number of moves spent on the wheelchair examples were either higher for videoconferencing interviews or unaffected by the mode of administration, as almost no significant differences were found between videoconferencing interviews and subsequent face-to-face interviews (Table 2). As expected, learning effects were present in both studies, as shown by the significant differences between the first face-to-face interviews and the subsequent face-to-face interviews (Table 2).

Table 2 Interviewer and respondent engagement

3.3 Respondent Engagement

The proportions of values on specific responses or in half-year units were not affected by the mode of administration in either of the two countries. The observed time per main cTTO task was higher in videoconferencing interviews than in subsequent face-to-face interviews in Spain. In Belgium, the task durations and proportions of values on specific responses or in half-year units remained relatively unchanged between subsequent face-to-face interviews and videoconferencing interviews (Table 2). There was an observed increase of worse-than-dead values in Spain and Belgium in the videoconferencing interviews (Fig. 2), although neither increase was significant. Both the proportion of responses ending on specific responses and the proportion of responses ending in half-year responses remained stable in Belgium between the modes of administration, whereas these proportions varied more in Spain (Fig. 2). Further analyses of respondent values by interviewer showed non-significant differences between interviewers in Spain (Fig. 3). For interviewer 1, the observed proportion of half-year units increased between subsequent face-to-face and videoconferencing interviews, whereas for interviewer 3 it remained stable. Respondents from interviewer 2 produced the lowest observed proportion of half-year units in the videoconferencing interviews. The results by mode of administration were similar across all interviewers in Belgium (Fig. 3).

Fig. 2

cTTO value distribution for all health states. *The proportions test found no significant results at the 95% confidence level. cTTO composite time trade-off

Fig. 3

cTTO value distribution for each interviewer. *The proportions test found no significant results at the 95% confidence level. cTTO composite time trade-off

Regarding interviewer/respondent engagement, the observed interview duration was higher for videoconferencing interviews in both countries when comparing subsequent face-to-face and videoconferencing interviews (Table 2); however, these results were significant only in Spain.

3.4 Face Validity and Value Distribution Interviewer’s Effects

The observed proportion of inconsistent respondents was lower for videoconferencing interviews compared with the subsequent face-to-face interviews in Spain. In contrast, this proportion was higher for videoconferencing interviews in Belgium. However, these observed differences were not statistically significant (Table 3).

Table 3 Proportion of flagged health states and inconsistent respondents by interviewer and group

4 Discussion

This manuscript reports on the quality of cTTO data, in terms of interviewer and respondent engagement, collected via videoconferencing interviews. More specifically, we report the insights from two EQ-5D-Y-3L valuation studies in Belgium and Spain. We have defined the metrics to measure engagement and we have reported the results of these metrics. None of the defined outcomes for measuring interviewer and respondent engagement has suggested that worse results occur for videoconferencing interviews when compared with face-to-face interviews. All observed results show similar or higher engagement (i.e., higher data quality) for videoconferencing interviews. As a result, the quality of cTTO data in these valuation studies was not affected by introducing videoconferencing interviews.

Other researchers have conducted similar research. Lipman [10] has reported findings in line with our results, suggesting that videoconferencing interviews are feasible and produce values that are similar to those obtained from face-to-face interviews. In our analysis, we have focused on other aspects of the comparison; namely, differences with respect to engagement instead of differences of elicited values. Since our study was not designed as an experiment to compare face-to-face and videoconferencing modes of administration, but rather as national EQ-5D-Y-3L valuation studies, the study design put limitations on this comparison. Given our sample size and the nature of our cTTO data, it was more appropriate to look at engagement, as our non-controlled factors together with the potential effect of the pandemic on the study population’s preferences could induce an unfair comparison of elicited values.

We did not control at the design stage for other potential sources of variability; namely, interviewers' learning effects, respondent demographics, and potential changes to health preferences due to the COVID-19 pandemic. All of these factors may limit the generalisation of our findings. We tried to reduce the influence of these factors by dividing the face-to-face interviews (the mode affected by learning effects) into two groups (the first 20 and subsequent interviews). As shown in the analysis, this adjustment in sample subgroups was important, but its cost is a reduction in the power of the comparison, which may have further limited our capacity to detect potentially real differences between the modes of administration.

As expected, interviewer learning effects were present in the two countries but at a lower level in Belgium than in Spain. In Spain, interviewers were inactive for 5 months before restarting their videoconferencing interviews, and one interviewer was retrained before resuming the videoconferencing interviews; therefore, they may have lost practice during this period, meaning some learning effects could have occurred during videoconferencing interviews as well as during the face-to-face interviews. This may explain the small variations in results between the countries. However, testing specific learning effects in our cTTO data was not possible due to the small sample size. A recent EQ-5D-5L valuation study conducted by Finch et al. in Italy [15] was entirely completed using videoconference settings. Finch et al. reported initial quality control issues due to interviewers’ learning effects; however, it cannot be disentangled whether those learning effects stem from the cTTO task learning or the videoconferencing environment.

Other sources of potential variability stem from the fact that the sampling recruitment was designed to obtain a representative sample of the Belgian and Spanish populations in the EQ-5D-Y-3L valuation studies, but not to balance our comparison subgroups. This means that the sampling age/sex/location cells were not homogeneously filled for face-to-face and videoconferencing interviews, which has produced several differences in the characteristics of the comparison samples. Sample homogeneity may also be affected by respondents' IT skills, as recruitment for videoconferencing was limited by its feasibility. Given our sample size, further subgroup analysis to obtain any meaningful results was not possible. While this may be a significant issue for comparing elicited values, it is far less of an issue for comparing engagement. Indeed, we found a higher proportion of negative values for videoconferencing interviews. These results could indicate that respondents are more willing to share views that may not be socially desirable in videoconferencing interviews, as suggested by Lipman [10]. In our case, we cannot disentangle whether this results from participants being more honest in revealing their preferences in online interviews, from differences in population health preferences caused by the impact of the COVID-19 pandemic, as suggested by Webb et al. [16], or simply from differences between the samples' characteristics.

cTTO videoconferencing interviews appear promising and have potential advantages, such as reducing costs or reaching population subgroups that are typically excluded from valuation studies. However, further research is needed on different aspects of how they compare with face-to-face interviews, as pointed out by Lipman [10]. This further research should use controlled experiments that assign the mode of administration randomly and use the respondent's own perspective rather than a 10-year-old-child perspective. Such experiments can be conducted (1) between subjects (requiring large sample sizes); or (2) within-subject, using test–retest designs where the order of the mode of administration is randomised. In addition, collecting information on interview cancellations, which mode of administration respondents would have preferred if they had a choice, interviewers' and respondents' experiences, or the videoconferencing software used would be of interest.

5 Conclusion

When looking at interviewer and respondent engagement, no evidence suggested that the quality of cTTO data is reduced when using videoconferencing compared with face-to-face interviews.