Identifying usability and fun problems in a computer game during first use and after some practice
Introduction
Testing products with representative users is one of the core aspects of user-centred design. While it is possible to perform a test to determine quantitative measures like efficiency, effectiveness, and satisfaction (ISO, 1998), another common goal is to identify specific parts of a system that cause users trouble (Hertzum and Jacobsen, 2001) in order to improve the system. This is often called formative testing (Barnum, 2002). In this paper the term ‘user test’ refers to this latter practice of identifying problems by observing actual users using a product. The term ‘user test’ here does not refer directly to assessing the quantitative measures, although detecting and solving the problems in a product should eventually lead to increases in efficiency, effectiveness, and satisfaction.
Ideally, a user test will reveal as many aspects of a product as possible that cause users trouble and that need to be modified. However, research has shown that the set of identified problems depends both on the users taking part in the user test (Nielsen, 1994) and on the evaluators analysing the data (Jacobsen, 1999; Hertzum and Jacobsen, 2001). Usually, products are tested with users who encounter them for the first time, but users' behaviour can be expected to change once they have become more familiar with a product. This difference might also result in different sets of identified problems, because some problems may have been overcome while other problems may arise. Often there is a pattern in the kinds of problems that are likely to be overcome and the kinds that will arise as users become more familiar with a product (Prümper et al., 1992).
When performing formative evaluations in practice, not only detecting problems but also separating severe problems from insignificant ones is important, because resources may not be available to correct all identified problems (Jacobsen, 1999). However, just as the numbers of certain types of detected problems may change, the overall severity estimates of problems may also change when users become more familiar with a game. Consequently, the list of the most important problems to fix might differ between a test of first use and a test of more familiar use.
Depending on the goal of the user test and the intended use of the product, these differences in the numbers and severity of certain problem types could have consequences for the way the user test is organized. For user tests of computer games with young children this seems especially relevant, because children often get help from their parents when playing a game (MPFS, 2003), or they play with somebody else (a sibling, friend, or parent) (Nikken, 2002). It is therefore likely that they will easily overcome some usability problems while others arise over time. Furthermore, computer games should be fun to play, not just the first time but preferably also after a while. Testing a game with children after they have become familiar with it might give a better, or at least different, idea of the problems that need to be fixed than testing its first use.
The study described in this paper examines the differences in the numbers of identified problems of different types, and in their severity, when testing adventure-type computer games with young children, between 5 and 7 years old, during first use and after they have practiced twice with the game for about half an hour. Although real practice effects, as in learning to drive a car or to do complex arithmetic, cannot arise in such a short time, many of the things that young children have to learn in computer games are much simpler. For example, children may not know at first that the purpose of a certain subgame is to catch all the blue flies, because the explanation of the subgame is unclear. Once they have figured this out, they will know what to do the next time they play the same subgame. This type of knowledge development belongs to the simplest types of learning in the cognitive domain according to Bloom's taxonomy of educational objectives (Bloom et al., 1964). In contrast to the more complex types of learning in this taxonomy, this type of learning can probably be attained by playing a game twice for half an hour, especially since children only have to recognize the way to play the subgames.
The current analysis was performed on an existing set of observations (Bekker et al., 2004) to identify the specific problems that can be detected when testing first use or more familiar use, within two theoretical frameworks: human problems in computer use and intrinsic motivation. For this purpose, two other evaluators used Observer logging software (Noldus, 2002) to re-examine the videotapes collected in the first study in much more detail. Subsequently, the method used to test the hypotheses is given, together with reliability analyses of both the problem detection procedure and the problem classification. The results show which types of problems are found more easily during first use, which are found after children have practiced with a game, and what the effects are on the severity estimates of problems and on the ranking of the most important problems. Furthermore, the changes in efficiency, effectiveness, and satisfaction are discussed. Finally, these results are discussed in terms of their generalizability and practical use.
Problems in computer game play
Zapf and colleagues (Frese and Zapf, 1991; Zapf et al., 1992) proposed a taxonomy of problems occurring in work with office computers, combining the work of Reason (1990), Norman and Draper (1986), Rasmussen (1982) and Hacker (1986). Norman's model of user-system interaction was created to describe interactions of humans with all sorts of systems and can easily be applied to games as well (Barendregt and Bekker, 2004). Rasmussen's classification refers to the degree of conscious control exercised by the user
Intrinsic motivation in computer games
Because pleasure and fun are key factors in a computer game (Pagulayan et al., 2003), problems that undermine fun form another category of problems that can occur and are therefore worth examining. In this paper, fun problems are defined as follows:
- Fun problems: problems that occur when there are aspects of the game that make it less motivating to use, even though they are not usability problems. For example, the music can be too scary, the characters can be
Usability problems
When children practice with a game they change from complete novices into more experienced players of the game. Although eight different types of usability problems have been defined, Zapf et al. (1992) found only three significant differences between novices and experts for adults using an office application:
- Experts have significantly fewer knowledge problems than novices.
- Experts have significantly fewer thought problems than novices.
- Experts have
Participants
To test the hypotheses and answer the other research questions, an experiment was run with 25 children from groups three and four (grades one and two) of De Brembocht, an elementary school in Veldhoven, The Netherlands. This school is situated in a neighbourhood mainly inhabited by people who received higher education and earn more than minimum wage. All children were between five and seven years old (mean age 84 months, S.D. = 5.5 months); 8 were girls and 17 were boys. They were recruited by means of a
Reliability of the problem detection
Two evaluators analysed eight of the 25 videotapes from the first session; all 25 videotapes of the last session were analysed by two evaluators. To check the inter-coder reliability of the two evaluators, any-two agreement measures were calculated for the results of the individual breakdown coding and for the lists of problems, as proposed by Hertzum and Jacobsen (2001):

any-two agreement = |P1 ∩ P2| / |P1 ∪ P2|

In this equation, P1 and P2 are the sets of problem indications or the sets of problems detected by the first and the second evaluator, respectively.
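For two evaluators, the any-two agreement thus reduces to the size of the intersection of the two problem sets divided by the size of their union (with more than two evaluators the measure is averaged over all pairs). The following sketch, using hypothetical problem identifiers, illustrates the computation; it is not the authors' own analysis code.

```python
def any_two_agreement(p1: set, p2: set) -> float:
    """Agreement between two evaluators' problem sets: |P1 n P2| / |P1 u P2|."""
    union = p1 | p2
    if not union:
        return 1.0  # two empty sets agree trivially
    return len(p1 & p2) / len(union)

# Hypothetical problem lists produced by two evaluators for the same videotape:
evaluator1 = {"B1", "B2", "B3", "B5"}
evaluator2 = {"B2", "B3", "B4", "B5"}
print(any_two_agreement(evaluator1, evaluator2))  # 3 shared of 5 in total -> 0.6
```

Note that the measure penalizes both problems found by only one evaluator and problems found by only the other, which is why it is stricter than simple percentage overlap with either individual list.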
Types of problems in both test sessions
In the first test session 98 problems were identified and in the last test session 115 problems. The distribution of all unique problems found in the two analysed sessions over the problem categories is given in Fig. 3.
Hypotheses problem types
Most children did not visit exactly the same parts of the game in the first and the last test session; they could visit subgames, story screens, or navigational screens in any order they liked. To test the hypotheses, therefore, only those subgames, story screens, and navigational screens that a child visited in both test sessions were included in the comparison.
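This per-child selection amounts to a simple set intersection of the parts visited in each session. The sketch below illustrates the idea with made-up screen names; it is not taken from the study's materials.

```python
# Hypothetical visit logs for one child: restrict hypothesis testing to the
# game parts (subgames, story screens, navigational screens) that were
# visited in BOTH the first and the last test session.
first_session = {"subgame_flies", "story_intro", "nav_map", "subgame_maze"}
last_session = {"subgame_flies", "nav_map", "subgame_race", "subgame_maze"}

comparable_parts = first_session & last_session
print(sorted(comparable_parts))  # ['nav_map', 'subgame_flies', 'subgame_maze']
```

Only problems occurring in these shared parts can be compared between sessions; parts visited in a single session would otherwise bias the problem counts.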
Considerations about the unconfirmed hypotheses
Contrary to the expectations, the number of knowledge inefficiencies was not significantly higher for the first test session than for the last. In both test sessions this was caused by the low number of inefficiencies, all of which were identified by only a few children. Inefficiencies can sometimes be hard to observe, for two reasons: first, the observers have to be knowledgeable about all the possibilities in the game; second, it must be clear what the child is trying to achieve.
Conclusion
The experiment described in this paper examined the differences in the outcomes of a test when children play a game for the first time and when they have become more familiar with it. The experiment showed that even after only 1 hour of practice with a game, children were able to finish significantly more subgames in the same time, increasing their efficiency in the game. Children were also able to finish a higher percentage of the subgames that they started, increasing the effectiveness.
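The two measures mentioned here can be expressed straightforwardly: efficiency as subgames finished per unit of playing time, effectiveness as the percentage of started subgames that were finished. The sketch below uses invented session numbers purely for illustration.

```python
def efficiency(subgames_finished: int, minutes_played: float) -> float:
    """Subgames finished per hour of play."""
    return subgames_finished / (minutes_played / 60.0)

def effectiveness(subgames_finished: int, subgames_started: int) -> float:
    """Percentage of started subgames that were finished."""
    return 100.0 * subgames_finished / subgames_started

# Hypothetical data for one child, first vs. last half-hour session:
print(efficiency(4, 30), effectiveness(4, 6))  # 8.0 subgames/hour, ~66.7%
print(efficiency(7, 30), effectiveness(7, 8))  # 14.0 subgames/hour, 87.5%
```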
Acknowledgements
This research was funded by the Innovation-Oriented Research Programme Human-Machine Interaction (IOP-MMI) of the Dutch government. We would like to thank Silvia Crombeen and Mariëlle Biesheuvel for conducting the test sessions. We would also like to thank the children and teachers of primary school de Brembocht for taking part in our research, and we would like to thank Prof. Dr. G.W.M. Rauterberg for bringing the relevant book of Frese and Zapf (1991) to our attention. Finally, we would like
References (46)

- Nielsen, J., 1994. Estimating the number of subjects needed for a thinking aloud test. International Journal of Human–Computer Studies.
- Anderson, J.R., 1983. The Architecture of Cognition.
- Barendregt, W., Bekker, M.M., 2004. Towards a framework for design guidelines for young children's computer games. In: ...
- Barendregt, W., Bekker, M.M., Speerstra, M., 2003. Empirical evaluation of usability and fun in computer games for ...
- Barendregt, W., Bekker, M.M., Bouwhuis, D.G., Baauw, E., 2006. Predicting effectiveness of children participants in ...
- Barnum, C.M., 2002. Usability Testing and Research.
- Bekker, M.M., Barendregt, W., Crombeen, S., Biesheuvel, M., 2004. Evaluating usability and fun during initial and ...
- Curiosity and Exploration. Science (1968).
- Bloom, B.S., et al., 1964. Taxonomy of Educational Objectives.
- Clanton, C., 1998. An interpreted demonstration of computer game design. In: Proceedings of the Conference on CHI 98 ...
- Intrinsic rewards and emergent motivation.
- Intrinsic Motivation.
- Frese, M., Zapf, D., 1991. Fehler bei der Arbeit mit dem Computer. Ergebnisse von Beobachtungen und Befragungen im Bürobereich (Errors in Working with Computers: Results of Observations and Interviews in the Office Field).
- Mental Models.
- Hacker, W., 1986. Arbeitspsychologie (Work Psychology).
- Hertzum, M., Jacobsen, N.E., 2001. The evaluator effect: a chilling fact about usability evaluation methods. International Journal of Human–Computer Interaction: Special Issue on Empirical Evaluation of Information Visualisations.
- Comparison of evaluation methods using structured usability problem reports. Behaviour & Information Technology.