Evaluating feedback devices for time-continuous mobile multimedia quality assessment
Introduction
For a long time, subjective multimedia quality assessment has been performed in fully controlled laboratory settings, following the guidelines described in ITU recommendations [1], [2], [3], [4]. Due to the increasing number of mobile devices, this traditional approach no longer seems adequate. Alternatives such as (semi-)living labs have been considered [5], [6], [7], and measurements in realistic environments have also been performed in the recent past [8]. Based on these considerations, a new ITU recommendation defining measurement settings for subjective video, audio and audiovisual quality assessment in any environment was published in January 2014 [9]. This specification presents a set of five methods, along with acceptable as well as discouraged modifications to these methods. In contrast to previous recommendations, the new ITU-T P.913 document does not contain any time-continuous rating method.
At first glance, collecting a single rating for each test sequence appears to have the advantage that a mapping between user ratings and media quality can be defined. In a mobile testing scenario, however, this is only partially true: in general, the stimuli used have a duration of around 8–10 s. In [1] it is stated that “for still pictures, a 3–4 second sequence and five repetitions (voting during the last two) may be appropriate” and that “for moving pictures with time-varying artefacts, a 10 s sequence with two repetitions (voting during the second) may be appropriate”. In the new standard [9] it is specified that stimuli should range from 5 to 20 s and that “eight- to ten-second sequences are highly recommended”.
The length of 8–10 s has been chosen to avoid uncertainties that might be caused by the primacy and recency effects. These psychological effects explain why judgments are disproportionately based on the earlier or the later parts of a sequence. However, within a time frame of 10 s, environmental conditions can change drastically and seriously affect the experienced quality: take, for example, a truck driving by, casting a shadow on the user's device and creating strong background noise. In this work, we did not analyze the effects of changing environments on the perceived quality. Here, we simply want to point out that the use of a time-continuous quality assessment methodology seems unavoidable when subjective testing is allowed in any environment, as aimed at in ITU-T P.913 [9].
One of the reasons why time-continuous methods have not been included in ITU-T P.913 [9] might be that no adequate rating device is available for outdoor or even mobile quality assessment. According to [1], [2], time-continuous subjective tests have to be performed using a desk-mounted slider. Such a slider cannot be used for measuring the quality of mobile multimedia: it is too large to be carried around, and it makes rating impossible while consuming mobile multimedia.
To overcome this problem, we developed several different tools for time-continuous rating in mobile quality assessment. We expected that some of these tools would be more applicable to mobile quality assessment than others. Therefore, we defined a set of measures that enabled us to compare the performance of the time-continuous rating schemes.
We decided to compare them to the currently used slider according to objective and subjective criteria. The precision of the rating methodology represented an important aspect of our research. We also measured the time needed to react and to perform the intended rating. We estimated the potential distraction caused by the rating methodology, and we anonymously collected biometric data that might have an impact on the efficacy of certain rating methodologies. Finally, we gathered the users' opinions on each method and asked the test persons to rank the methods according to their subjective preferences.
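The precision and reaction-time measures mentioned above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis code: the function name, the 5 s display window, the sampling rate and the correctness tolerance are all assumed values.

```python
import numpy as np

def reaction_time_and_precision(t, rating, change_time, target, window=5.0, tol=0.5):
    """Estimate reaction time and precision for one ground-truth change.

    t, rating   : sampled time stamps (s) and rating values of one trace
    change_time : moment the displayed target value changed (s)
    target      : the new target value the user should reproduce
    window      : how long the target stays on screen (s), assumed 5 s
    tol         : a rating counts as 'correct' within this tolerance
    """
    mask = (t >= change_time) & (t < change_time + window)
    seg_t, seg_r = t[mask], rating[mask]
    correct = np.abs(seg_r - target) <= tol
    # reaction time: first sample after the change at which the rating is correct
    rt = seg_t[correct][0] - change_time if correct.any() else np.nan
    # precision: mean absolute deviation from the target over the segment
    precision = float(np.mean(np.abs(seg_r - target)))
    return rt, precision

# toy trace: the user ramps from 2 toward target 4 after a change at t = 10 s
t = np.arange(10.0, 15.0, 0.1)
rating = np.clip(2 + (t - 10.0) * 1.7, 2, 4)
rt, prec = reaction_time_and_precision(t, rating, change_time=10.0, target=4)
```

In this toy trace the rating first comes within the tolerance about 0.9 s after the change, so `rt` is roughly 0.9 s, while `prec` aggregates the remaining deviation over the whole 5 s segment.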
The paper is structured as follows. In Section 2, we present a selection of related work concerning rating devices and scales. In Section 3, the different test methodologies and their implementations are described. Three different user tests were carried out to assess the performance of the test methodologies; these tests and the selected quality criteria are described in Section 4.
Section snippets
Related work
Research on the suitability and comparability of different subjective test methods has been carried out frequently [10], [11], [12]. It was found that the choice of the rating scale is not very critical: for example, user ratings based on an 11-point scale can be translated into a 5-point scale without loss of information [12]. This is beneficial, since using the simplest scale seems convenient for mobile testing, given the complexity of the situation. However, the issue of the
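Such a scale translation is typically a simple linear rescaling of one range onto the other. The sketch below illustrates the idea; it is our own illustration, and the exact mapping used in [12] may differ.

```python
def rescale(score, src=(0, 10), dst=(1, 5)):
    """Linearly map a rating from a source scale range onto a target range,
    e.g. an 11-point scale (0-10) onto a 5-point ACR scale (1-5)."""
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    return dst_lo + (score - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)

rescale(0)   # -> 1.0 (worst on both scales)
rescale(5)   # -> 3.0 (midpoint maps to midpoint)
rescale(10)  # -> 5.0 (best on both scales)
```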
Material and methods
Large or heavy devices are inappropriate for mobile multimedia quality assessment: since they are uncomfortable to use, they carry a high potential for distraction. In order to find a suitable alternative to the slider for future testing, we developed four alternatives, which are presented below.
Experimental settings
Since we aimed at studying the performance of the rating methodologies themselves, we set up a specific test scenario that differs from typical multimedia quality assessment settings. Instead of presenting stimuli containing artificially introduced impairments, we used videos in which a new number or category was shown to the users every five seconds. The numbers were randomly generated between 1 and 5, and the categories were randomly selected from the five ACR categories “excellent, good,
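The stimulus schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation; the function and parameter names are our own.

```python
import random

# five ACR categories, as named in the text
ACR_CATEGORIES = ["excellent", "good", "fair", "poor", "bad"]

def make_schedule(duration_s, interval_s=5, use_categories=False, seed=None):
    """Build a list of (onset_time_s, displayed_value) pairs: every
    `interval_s` seconds a new random number (1-5) or ACR category
    is shown to the user."""
    rng = random.Random(seed)
    schedule = []
    for onset in range(0, duration_s, interval_s):
        if use_categories:
            value = rng.choice(ACR_CATEGORIES)
        else:
            value = rng.randint(1, 5)
        schedule.append((onset, value))
    return schedule

schedule = make_schedule(30, seed=1)  # six targets for a 30 s video
```

Comparing each user's time-continuous rating trace against such a known ground-truth schedule is what makes the precision and reaction-time measures directly computable.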
User opinion
As described in Section 4.1, users were asked to state their opinion on the different methodologies. Every person had to select the one method they preferred over the others. The results of the second test, in which a camera-based implementation of the finger count method was used, can be observed in Fig. 9. Generally, it can be deduced that users prefer the finger count, the finger distance and the glove over the other methods.
Judging from the users' comments, the finger count seems to be the most
Conclusion
In this paper, we presented an extensive study on alternative rating methodologies to the slider, the commonly used device for time-continuous subjective quality assessment. During our research activities, we developed and tested several new methods that are suitable for mobile use. Results from three different user studies were used to understand the important issues of such a rating technology and successively led to important refinements. Finally, we were able to identify one
Acknowledgments
We thank the reviewers for their valuable comments on a previous version of the paper, as well as our students who acted as evaluators, and gratefully acknowledge Grant ICT08-005 from the Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF, Vienna Science and Technology Fund) to Helmut Hlavacs and Grant CS11-009 from WWTF to Ulrich Ansorge, Otmar Scherzer and Shelley Buchinger.
References (20)
- ITU-R BT.500-11, Methodology for the Subjective Assessment of the Quality of Television Pictures, ITU-R, ...
- ITU-T P.910, Subjective Video Quality Assessment Methods for Multimedia Applications, ITU-T, ...
- ITU-R BS.1283, Subjective Assessment of Sound Quality—A Guide to Existing Recommendations, ITU-R, ...
- ITU-R BS.1283-1, Methodology for the Subjective Assessment of Video Quality in Multimedia Applications, ITU-R, ...
- et al., Quantifying subjective quality evaluations for mobile video watching in a semi-living lab context, IEEE Trans. Broadcast. (2012)
- D. Schuurman, T. Evens, L. De Marez, A living lab research approach for mobile TV, in: EuroITV '09, ACM, Leuven, ...
- et al., Trends in the living room and beyond: results from ethnographic studies using creative and playful probing, Comput. Entertain. (2008)
- et al., Assessing quality of experience of IPTV and video on demand services in real-life environments, IEEE Trans. Broadcast. (2010)
- ITU-T P.913, Methods for the Subjective Assessment of Video Quality, Audio Quality and Audiovisual Quality of Internet ...
- M. Pinson, S. Wolf, Comparing subjective video quality testing methodologies, in: SPIE Video Communications and Image ...
Cited by (4)
- Designing Real-time, Continuous QoE Score Acquisition Techniques for HMD-based 360° VR Video Watching, 2022 14th International Conference on Quality of Multimedia Experience (QoMEX 2022)
- VRate: A Unity3D Asset for integrating Subjective Assessment Questionnaires in Virtual Environments, 2018 10th International Conference on Quality of Multimedia Experience (QoMEX 2018)
- Continuous subjective rating of perceived motion incongruence during driving simulation, IEEE Transactions on Human-Machine Systems, 2018
- Who is moving - User or device? Experienced quality of mobile 3D video in vehicles, ACM International Conference Proceeding Series, 2015