
1 Introduction

The number of older adults who use the Internet is growing rapidly, and older adults are the fastest-growing group of Internet users [15]. The Internet has the potential to help older adults (e.g., as a source of information, education, and social support), yet older adults may also experience information overload and difficulty finding pertinent information. Some cognitive declines commonly occur with age, and various aspects of cognition are crucial to navigating Websites successfully. For example, spatial skills, short-term memory, processing speed, working memory, and sustained attention, all of which are well known to decline with age [6–11], are involved in navigating Websites [12]. Website designers often do not take into account the cognitive limitations of older users; as a result, many older Internet users end up frustrated and dissatisfied [1, 13–15]. Additionally, UX researchers often do not take into account the cognitive capabilities of research participants in general, which may lead to inaccurate assumptions about the usability of a product [16]. At present, there is little empirical evidence about the relationship between age-related cognitive differences and Internet performance.

The purpose of the present study was to examine the relationship between age-related decline in cognition and Internet performance. Younger, middle-age, and older adults completed a standard battery of cognitive tasks to assess processing speed and vocabulary (a backward counting task, the WAIS Digit Symbol Coding task, and the Shipley Vocabulary test), which represent fluid and crystallized ability, respectively, and have been shown to be predictive of computer use [17]. Participants also completed a computer and Internet experience questionnaire, five information-finding tasks on a US Census Bureau Website, and a satisfaction questionnaire. Performance on the Website was measured by accuracy (percent of tasks completed successfully), efficiency (mean time to complete tasks) and self-rated satisfaction, three commonly used usability metrics [18].

We hypothesized that: (1) there would be age-related differences in performance; (2) there would be a linear age-related difference such that middle-age adults would not differ significantly in performance from either younger or older adults; (3) older adults with higher cognitive function would perform similarly to younger adults, and older adults with lower cognitive function would perform worse than younger adults and their higher-functioning counterparts; and (4) older adults with lower cognitive function would report lower satisfaction with the Website.

2 Method

2.1 Participants

Twenty-one younger adults (6 males, 15 females), 19 middle-age adults (9 males, 10 females) and 21 older adults (9 males, 12 females) participated in the study. Participants were residents of the metropolitan Washington DC area and were recruited via advertisements in local newspapers or through a database that is maintained by the US Census Bureau’s Human Factors and Usability Research Group. Participants were given a $40 honorarium. They completed a questionnaire about their computer use and Internet experience and completed tasks that measured processing speed and vocabulary. All participants reported being experienced with computers and the Internet but unfamiliar with the Website used in this study. There was no age-related difference in reported difficulty in using the Internet, but older adults reported greater difficulty in learning to use new Websites, compared to younger and middle-age adults. See Table 1 for participants’ self-reported demographics and cognition scores.

Table 1. Mean (and range) demographics and cognition scores by age group

2.2 Procedure

Participants came to the Human Factors and Usability Research Group’s laboratory at the US Census Bureau headquarters to participate in the study. Each participant sat individually in a 10′ × 12′ room, facing one-way glass and a wall camera.

The test administrator (TA) explained the study, and the participant read and signed a consent form. Participants completed a backward counting task, the WAIS Digit Symbol Coding task [19], and the Shipley Vocabulary test [20], each of which is described below. The TA then left the room; for the remainder of the session, the participant and the TA sat in separate rooms and communicated via microphones and speakers. The TA began video recording from the opposite side of the one-way glass.

Participants worked on five pre-determined information-seeking tasks on the American FactFinder (AFF) Website. AFF is the Census Bureau’s primary data dissemination Website about the population, housing, and economy of the United States (see Fig. 1). The tasks were typical of those performed by general users of the Website, and all participants completed them in order from easiest to hardest (determined by the complexity and number of steps needed to find the information). We categorized the first two tasks as easy (requiring 2 and 3 steps, respectively) and the final three as hard (requiring 4, 4, and 6 steps, respectively). (For a list of tasks, see [21].) Participants read each task aloud, then worked silently on the Website to locate the information, and stated their answer aloud when they felt they had found it. After the participant completed each task, the TA loaded the main page of the site, and the participant proceeded to the next task. At the conclusion of the study, participants completed a final satisfaction questionnaire and answered debriefing questions from the TA.

Fig. 1. AFF Website home page

Most participants attempted to complete all five tasks. After 7 min on a task, the TA asked the participant whether they felt close to or far from the answer and whether they wanted to move on. If the participant wanted to move on, the task ended, the TA loaded the main page, and the participant began the next task. These tasks, as well as tasks that participants abandoned without a TA prompt, were classified as passes and were scored as 0 when calculating accuracy. Seven percent of the tasks for younger adults, 18 % of the tasks for middle-age adults, and 31 % of the tasks for older adults were passes.

2.3 Usability Metrics

We assessed accuracy, efficiency, and satisfaction. For our dependent measures, accuracy was scored for each task as successful or unsuccessful (passes were scored as 0), efficiency was calculated as the time it took participants to complete each successfully completed task, and satisfaction was based on participants’ ratings of the 11 items on the satisfaction questionnaire (1 = low, 9 = high). We then averaged accuracy and efficiency across tasks for each participant, so that a participant’s accuracy is the percentage of tasks completed successfully, and averaged accuracy, efficiency, and satisfaction across participants in each age group. We examined the relationship among age, cognition, and Website performance (usability metrics).
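To make these scoring rules concrete, the following is a minimal Python sketch of how per-participant and per-group usability metrics could be computed. The names `TaskOutcome`, `participant_metrics`, and `group_means` are hypothetical. The sketch assumes that a participant's satisfaction score is the mean of their 11 item ratings, and that efficiency is undefined for a participant who completed no task successfully. It is not the authors' analysis code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence, Tuple

@dataclass
class TaskOutcome:
    success: bool             # False for failed tasks and for passes
    seconds: Optional[float]  # time on task; only used when success is True

Metrics = Tuple[float, Optional[float], float]

def participant_metrics(tasks: Sequence[TaskOutcome],
                        ratings: Sequence[int]) -> Metrics:
    """Accuracy (% of tasks succeeded), efficiency (mean s), satisfaction."""
    # Passes and failures both count as 0 toward accuracy.
    accuracy = 100 * sum(t.success for t in tasks) / len(tasks)
    # Efficiency uses successfully completed tasks only.
    times = [t.seconds for t in tasks if t.success]
    efficiency = mean(times) if times else None
    # Assumption: satisfaction is the mean of the 11 item ratings (1-9).
    satisfaction = mean(ratings)
    return accuracy, efficiency, satisfaction

def group_means(rows: List[Metrics]) -> Metrics:
    """Average each metric across the participants in one age group."""
    effs = [r[1] for r in rows if r[1] is not None]
    return (mean(r[0] for r in rows),
            mean(effs) if effs else None,
            mean(r[2] for r in rows))
```

For example, a participant who succeeded on three of five tasks, taking 120, 200, and 340 s, would receive 60 % accuracy and an efficiency of 220 s. Participants with no successful task are skipped when efficiency is averaged within a group.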

2.4 Cognitive Metrics

We assessed processing speed with two tasks: a backward counting task and the Digit Symbol Coding task [19]. For the backward counting task, participants were given 30 s to count backward from 100, by ones, as quickly as possible. The score is the last number the participant reached (i.e., a lower number indicates faster processing and thus a better score). For the Digit Symbol Coding task, participants were given a sheet of paper with seven rows of 20 number-box pairs (the first seven were practice trials). At the top of the paper, a key paired each of the numbers 1–9 with a unique abstract symbol. Participants were required to fill in the boxes with the corresponding symbols, in order, as quickly as possible, for two minutes. The score is the total number of correctly filled-in boxes.
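The two processing speed scores can be sketched as follows. The symbol names in `KEY` are hypothetical placeholders, because the actual WAIS symbols are not reproduced here. Excluding the seven practice items from the Digit Symbol Coding score is also our assumption, since the text does not say whether practice items were counted. The helper `numbers_counted` is a hypothetical convenience for a higher-is-better version of the backward counting score and is not used in the paper.

```python
from typing import Dict, Optional, Sequence

# Hypothetical key pairing digits 1-9 with placeholder symbol names.
KEY: Dict[int, str] = {d: f"sym{d}" for d in range(1, 10)}
PRACTICE_ITEMS = 7  # the first seven boxes are practice trials

def digit_symbol_score(digits: Sequence[int],
                       responses: Sequence[Optional[str]],
                       key: Dict[int, str] = KEY,
                       practice: int = PRACTICE_ITEMS) -> int:
    """Count correctly filled boxes after the practice items.

    Boxes left empty when time ran out are passed as None and never match.
    """
    scored = zip(digits[practice:], responses[practice:])
    return sum(1 for digit, response in scored if key[digit] == response)

def backward_counting_score(last_number_reached: int) -> int:
    """Score is the last number reached in 30 s; lower means faster."""
    return last_number_reached

def numbers_counted(last_number_reached: int, start: int = 100) -> int:
    """Hypothetical helper: how many numbers were counted (higher = faster)."""
    return start - last_number_reached
```

For example, a participant who reached 53 after 30 s would score 53 on the backward counting task, having counted 47 numbers.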

We assessed verbal ability (i.e., vocabulary) with the Shipley Institute of Living Scale [20]. The test was administered on paper and consisted of 40 items. Each item was a target word printed in capital letters, followed by four alternatives. Participants were instructed to circle the one word (of the four) “that means the same thing, or most nearly the same thing, as the first word,” and an example was provided. The score is the number of correctly identified words.

3 Results

We asked the following research questions:

1. Are there age-related differences in performance (accuracy, efficiency, satisfaction)?
2. Do middle-age adults differ in performance from younger and older adults?
3. Do older adults with higher cognitive function perform differently than older adults with lower cognitive function?
4. Do older adults with higher cognitive function perform similarly to younger adults on the performance metrics?
5. Do older adults with lower cognitive abilities report lower satisfaction than older adults with higher cognitive abilities?

First we examined accuracy, efficiency, and satisfaction across age groups overall, using one-way ANOVAs with age group as the factor. The age-related difference in accuracy was marginally significant, F(2,58) = 2.95, p = 0.06. Planned two-tailed t-tests revealed no difference in accuracy between middle-age and older adults (p = 0.28) and no difference between middle-age and younger adults (p = 0.24). However, there was a significant difference between younger and older adults (p = 0.01), such that younger adults had higher accuracy than older adults. For efficiency and satisfaction, there were no age-related differences (p = 0.36 and p = 0.26, respectively). See Table 2.

Table 2. Mean (and range) performance by age group
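The omnibus tests in this section are one-way ANOVAs. The following standard-library Python function is a sketch of what such a test computes. It returns the F statistic and its degrees of freedom. The function name `one_way_anova` is ours. The p-value requires the F distribution, so it is left to a statistics package (for example, SciPy's `scipy.stats.f_oneway` returns both F and p). With group sizes of 21, 19, and 21, the degrees of freedom come out as (2, 58), matching the F(2,58) reported for accuracy.

```python
from statistics import mean
from typing import Sequence, Tuple

def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

As a quick check, the toy groups [1, 2, 3], [2, 3, 4], and [5, 6, 7] give F = 13 with (2, 6) degrees of freedom.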

Next we examined each cognitive measure to determine whether varying levels of cognitive function affect performance. For each cognitive measure, we split the older adults’ scores into thirds (upper, middle, and lower). We then compared the “high-functioning” older adults (the best-performing third on that measure), the “low-functioning” older adults (the worst-performing third), and the younger adults.
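The tertile split can be sketched as below. The function name `split_thirds` and the handling of ties and remainders are assumptions. Each extreme group receives n // 3 participants, and any remainder goes to the middle third. For 21 older adults, this yields groups of seven, as reported for the backward counting and vocabulary measures. The `higher_is_better` flag matters because, on the backward counting task, a lower score indicates faster processing.

```python
from typing import Dict, List, Tuple

def split_thirds(scores: Dict[str, float],
                 higher_is_better: bool = True
                 ) -> Tuple[List[str], List[str], List[str]]:
    """Split participants into (low, middle, high)-functioning thirds.

    scores maps a participant ID to a cognitive score. Ties keep input
    order because sorted() is stable.
    """
    # Best performers first: descending if higher is better, else ascending.
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    k = len(ranked) // 3
    high = ranked[:k]
    middle = ranked[k:len(ranked) - k]
    low = ranked[len(ranked) - k:]
    return low, middle, high
```

For the backward counting task, the sketch would be called with `higher_is_better=False`, so that the participants who reached the lowest numbers land in the high-functioning group.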

Processing Speed. First we examined the backward counting task, for which lower scores indicate faster processing. Seven older adults were classified as high-functioning (HF), with scores between 43 and 60 (\( \bar{x} \) = 53), and seven were classified as low-functioning (LF), with scores between 72 and 84 (\( \bar{x} \) = 76). Younger adults’ scores were between 34 and 75 (\( \bar{x} \) = 51). A one-way ANOVA revealed a significant group difference, F(2,32) = 20.39, p < 0.0001. A Tukey HSD post hoc test revealed a significant difference between HF and LF older adults (p < 0.01) and between LF older adults and younger adults (p < 0.01). It revealed no significant difference between HF older adults and younger adults.
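Tukey HSD post hoc comparisons test every pair of group means against the pooled within-group variance. The sketch below computes the studentized range statistic q in its Tukey-Kramer form, which allows unequal group sizes such as 7, 7, and 21. The function name `tukey_kramer_q` is ours. The critical value or p-value for q must come from a studentized range table or a package (recent SciPy versions provide `scipy.stats.tukey_hsd`). The sketch is meant to show what the test compares, not to reproduce the authors' software.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Dict, Sequence, Tuple

def tukey_kramer_q(groups: Dict[str, Sequence[float]]
                   ) -> Dict[Tuple[str, str], float]:
    """Studentized range statistic q for every pair of groups.

    A pair differs at level alpha when q exceeds q_crit(alpha, k, N - k),
    where k is the number of groups and N the total sample size.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups.values())
    # Pooled within-group mean square, as in the one-way ANOVA.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g)
                    for g in groups.values())
    ms_within = ss_within / (n_total - k)
    q_stats = {}
    for a, b in combinations(groups, 2):
        n_a, n_b = len(groups[a]), len(groups[b])
        std_err = sqrt(ms_within / 2 * (1 / n_a + 1 / n_b))
        q_stats[(a, b)] = abs(mean(groups[a]) - mean(groups[b])) / std_err
    return q_stats
```

Because every pair is compared against the same family-wise critical value, a significant omnibus F can coexist with no significant pairwise q. This is the pattern reported for backward counting accuracy.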

We compared accuracy across the three groups and found a significant difference between HF older adults, LF older adults, and younger adults, F(2,32) = 3.58, p < 0.05. However, a Tukey HSD post hoc test revealed no significant pairwise differences in accuracy. This held for HF versus LF older adults, for HF older adults (\( \bar{x} \) = 50 %) versus younger adults (\( \bar{x} \) = 62 %), and for LF older adults (\( \bar{x} \) = 32 %) versus younger adults. Thus, although no pairwise comparison reached significance, there was a trend for older adults with slower processing speed to complete fewer tasks successfully than younger adults. Older adults with faster processing speed, in contrast, did not differ from younger adults. See Table 3.

Table 3. Mean (and range) performance by age group: backward counting

We next compared efficiency and satisfaction by age group. For efficiency, we found no significant difference between HF older adults (\( \bar{x} \) = 228 s), LF older adults (\( \bar{x} \) = 357 s) and younger adults (\( \bar{x} \) = 160 s), F(2,30) = 2.05, p = 0.15. Similarly, for satisfaction, we found no significant difference between HF older adults (\( \bar{x} \) = 5.79), LF older adults (\( \bar{x} \) = 5.34) and younger adults (\( \bar{x} \) = 6.34), F(2,32) = 1.34, p = 0.28.

Next we examined the Digit Symbol Coding task. Six older adults were classified as HF, with scores between 64 and 91 (\( \bar{x} \) = 72), and six were classified as LF, with scores between 24 and 38 (\( \bar{x} \) = 34). Younger adults’ (N = 19) scores were between 48 and 116 (\( \bar{x} \) = 79). A one-way ANOVA revealed a significant group difference, F(2,28) = 19.59, p < 0.0001. A Tukey HSD post hoc test revealed no difference between HF older adults and younger adults. However, there were significant differences between LF older adults and younger adults (p < 0.01) and between HF and LF older adults (p < 0.01).

We next compared accuracy by age group and found a significant difference between HF older adults, LF older adults and younger adults, F(2,30) = 5.08, p = 0.01. Tukey HSD post hoc test revealed no significant difference in accuracy between HF older adults and younger adults and between LF older adults and younger adults. However, we found a significant difference between HF older adults and LF older adults (p < 0.05). Thus, for this measure, LF older adults completed fewer tasks successfully than HF older adults. See Table 4.

Table 4. Mean (and range) performance by age group: digit symbol coding

We next compared efficiency and satisfaction by age group. Consistent with the backward counting measure, we found no significant difference in efficiency between HF older adults, LF older adults, and younger adults, F(2,28) = 1.63, p = 0.21. Similarly, we found no significant difference in satisfaction between HF older adults, LF older adults and younger adults, F(2,30) = 1.80, p = 0.18.

Verbal Ability. Next we examined the Shipley Institute of Living Scale. Seven older adults were classified as HF, with scores between 35 and 40 (\( \bar{x} \) = 38), and seven were classified as LF, with scores between 22 and 29 (\( \bar{x} \) = 27). Younger adults’ scores were between 22 and 38 (\( \bar{x} \) = 31). A one-way ANOVA revealed a significant group difference, F(2,32) = 16.55, p < 0.0001. A Tukey HSD post hoc test revealed a significant difference between HF and LF older adults (p < 0.01) and between HF older adults and younger adults (p < 0.01). It revealed no significant difference between LF older adults and younger adults.

We next compared accuracy by age group and found a significant difference between HF older adults, LF older adults and younger adults, F(2,32) = 3.57, p < 0.05. However, Tukey HSD post hoc test revealed no significant difference in accuracy between HF and LF older adults, between HF older adults and younger adults, and between LF older adults and younger adults. See Table 5.

Table 5. Mean (and range) performance by age group: vocabulary

We next compared efficiency and satisfaction by age group. As with the processing speed measures, we found no significant difference in efficiency between HF older adults, LF older adults, and younger adults, F(2,30) = 0.78, p = 0.47. For satisfaction, however, we found a significant difference among the groups, F(2,32) = 4.53, p < 0.05. A Tukey HSD post hoc test revealed a significant difference in satisfaction between HF and LF older adults (p < 0.01). Neither older adult group differed significantly from younger adults. Thus, older adults with higher vocabulary reported lower satisfaction than their lower-vocabulary counterparts.

4 Conclusion

In this empirical study, we observed age-related differences in Website performance. Consistent with other research [e.g., 22, 23 (study 3), 24] and with our first hypothesis, older adults had lower accuracy than younger adults. Middle-age adults’ accuracy did not differ from that of either younger or older adults. This pattern may suggest a linear age-related decline. Such a decline would be consistent with previous literature on age-related cognitive decline in general [11, 25] and in line with our second hypothesis. However, the pattern may also reflect differences in strategy and experience [24]. Middle-age adults often vary more widely in both than younger and older adults, the “extreme” groups. We found this difference only for accuracy, not for efficiency or satisfaction.

When we split our older adults into “high-functioning” and “low-functioning” groups based on backward counting performance (processing speed), we found a significant group difference in accuracy. However, our post hoc analysis did not reveal significant pairwise differences. There was nonetheless a trend for older adults with slower processing speed to complete fewer tasks successfully than younger adults (32 % vs. 62 %). Older adults with faster processing speed (50 %) did not differ from younger adults. We found a similar pattern for the Shipley vocabulary test. There was an overall group difference in accuracy and a trend for older adults with lower vocabulary to complete fewer tasks successfully than younger adults, whereas older adults with higher vocabulary did not differ from younger adults. We believe the post hoc tests were not significant because we lacked sufficient power. The older adult groups were drastically reduced in size when we split them into thirds (high, middle, and low ability). For the Digit Symbol Coding task (processing speed), we also found a significant group difference in accuracy. For this task, post hoc analysis revealed a significant difference only between high-functioning and low-functioning older adults. Low-functioning older adults completed fewer tasks successfully than their high-functioning counterparts.

These findings suggest that cognitive abilities may play a role in older adults’ ability to complete tasks successfully (our third hypothesis), consistent with research on technology use more generally [17]. Here, higher cognitive abilities appeared to offset the performance differences typically observed in older users. We did not find these differences in efficiency. For satisfaction, contrary to our fourth hypothesis, older adults with higher vocabulary reported lower satisfaction with the Website than their lower-vocabulary counterparts. We speculate that HF older adults may have had higher expectations that the Website did not meet. LF older adults, in contrast, may have had lower expectations that the site met or exceeded, which would leave them more satisfied. In addition, based on anecdotal observations in the lab, we speculate that LF older adults may tend to blame themselves rather than the Website for any problems they encountered. Future work should seek to understand what participants consider when rating satisfaction and why this might differ by cognitive ability.

In this study, we assessed the mean performance across all tasks. However, previous research shows that older adults have more difficulty than younger adults with harder tasks [21, 26, 27]. Future research should couple cognitive measures with task difficulty to understand the multitude of factors that lead to successful user experiences for older users as well as other users with either reduced cognitive capabilities or limited Internet experience.

This study is the first to demonstrate that cognitive ability may play a role in older adults’ ability to complete tasks successfully. User experience researchers should include such measures to understand participants’ cognition and ensure that people with varying levels of processing speed and vocabulary are included in research. Otherwise, the assumptions we make about the usability of a product may be invalid.

In conclusion, the key findings from this study are that (a) a Website designed for a broad population of users posed problems for people of all ages, and participants could not always find what they were searching for; and (b) the difficulties were more pronounced for older adults with lower processing speed and verbal ability. The design implications are clear. User experience testing must include users of all ages, and Websites should minimize the demands they place on processing speed and verbal ability.