Evaluating feedback devices for time-continuous mobile multimedia quality assessment

https://doi.org/10.1016/j.image.2014.07.001

Highlights

  • We developed a scheme to objectively compare subjective rating methods.

  • We tested this scheme by comparing several rating devices.

  • Users would like to express experienced quality with a simple finger count method.

  • The finger count method outperformed other approaches also objectively.

  • Haptic feedback distracts users from their task instead of providing help.

Abstract

In January 2014, the new ITU-T P.913 recommendation for measuring subjective video, audio and multimedia quality in any environment was published. This document does not contain any time-continuous subjective method. However, environmental parameter values change continuously in the majority of outdoor and also in most indoor environments. To be aware of their impact on perceived quality, a time-continuous quality assessment methodology is necessary. Previous standards, targeting laboratory-based test settings, recommend a desk-mounted slider of substantial size. Unfortunately, there are many environments where such a device cannot be used.

In this paper, new feedback tools for mobile time-continuous rating are presented and analysed. We developed several alternatives to the generally adopted desk-mounted slider as a rating device. In order to compare the tools, we defined a number of performance measures that can be used in further studies. The suitability and efficacy of each rating scheme are compared on the basis of measurable parameters as well as user opinions. One method, the finger count, seems to outperform the others from all points of view. It was judged to be easy to use, with low potential for distraction. Furthermore, it reaches a precision level similar to that of the slider, while requiring lower user reaction and scoring times. Low reaction times are particularly important for time-continuous quality assessment, where a reliable mapping between impairments and user ratings plays an essential role.

Introduction

For a long time, subjective multimedia quality assessment has been performed in a fully controlled laboratory setting, following the guidelines described in ITU recommendations [1], [2], [3], [4]. Due to the increased number of mobile devices, this traditional approach does not seem to be adequate anymore. Alternatives, such as (semi) living-labs, have been considered [5], [6], [7], but also measurements in realistic environments have been performed [8] in the recent past. Based on these considerations, a new ITU recommendation defining measurement settings for subjective video, audio and audiovisual quality in any environment was published in January 2014 [9]. In this specification, a set of five methods and acceptable as well as discouraged changes to these methods are presented. In contrast to previous recommendations, the new ITU-T P.913 document does not contain any time-continuous rating method.

At first glance, collecting a single rating for each test sequence seems to have the advantage that a mapping between user ratings and media quality can be defined. But in a mobile testing scenario, this is only partially true: in general, the stimuli that are used have a duration of around 8–10 s. In [1] it is stated that “for still pictures, a 3–4 second sequence and five repetitions (voting during the last two) may be appropriate” and that “for moving pictures with time-varying artefacts, a 10 s sequence with two repetitions (voting during the second) may be appropriate”. The new standard [9] specifies that stimuli should range from 5 to 20 s and that “eight- to ten-second sequences are highly recommended”.

The length of 8–10 s has been chosen to avoid uncertainties that might be caused by the primacy and recency effects. These psychological effects describe the tendency of judgments to be based disproportionately on the earlier or later parts of a sequence. However, within a time frame of 10 s, environmental conditions can change drastically and can seriously affect the experienced quality. Take, for example, a truck driving by, casting a shadow on the user's device and creating strong background noise. In this work, we did not analyze the effects of changing environments on the perceived quality. Here, we simply want to point out that a time-continuous quality assessment methodology seems to be unavoidable when any environment is allowed for subjective testing, as aimed at in ITU-T P.913 [9].

One of the reasons why time-continuous methods have not been included in ITU-T P.913 [9] might be that no adequate rating device is available for outdoor or even mobile quality assessment. According to [1], [2], time-continuous subjective tests have to be performed by using a desk-mounted slider. Such a slider cannot be used for measuring the quality of mobile multimedia. It is too large to be carried around and it is impossible to perform ratings while consuming mobile multimedia.

To overcome this problem, we developed several different tools for a time-continuous rating in mobile quality assessment. We estimated that some of these tools would be more applicable for mobile quality assessment than others. Therefore, we defined a set of measures that enabled us to compare the performance of the time-continuous rating schemes.

We decided to compare them to the currently used slider according to objective and subjective criteria. The precision of the rating methodology represented an important aspect of our research. We also computed the time needed to react and to perform the intended rating. We estimated the potential distraction caused by the rating methodology, and we anonymously collected biometric data that might have an impact on the efficacy of certain rating methodologies. Finally, we gathered the users' opinions on each method and asked the test persons to rank them according to their subjective preferences.
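To make the objective criteria concrete, the following minimal sketch shows one way such measures could be computed from a time-stamped rating log. The event format, the matching rule (reaction time as the delay between a stimulus change and the first subsequent rating event; precision as the mean absolute error of the last rating recorded before the next change), and the function names are our illustrative assumptions, not the implementation used in the study.

```python
# Sketch: deriving reaction-time and precision measures from a rating log.
# Data format and matching rules are illustrative assumptions.

def reaction_times(stimulus_changes, rating_events):
    """For each (time, target) stimulus change, return the delay (s) until
    the first rating event recorded at or after that change."""
    delays = []
    for t_change, _target in stimulus_changes:
        later = [t for t, _ in rating_events if t >= t_change]
        if later:
            delays.append(later[0] - t_change)
    return delays

def precision_mae(stimulus_changes, rating_events):
    """Mean absolute error between each target score and the last rating
    recorded before the next stimulus change (None if no ratings match)."""
    errors = []
    for i, (t_change, target) in enumerate(stimulus_changes):
        t_next = (stimulus_changes[i + 1][0]
                  if i + 1 < len(stimulus_changes) else float("inf"))
        in_window = [score for t, score in rating_events if t_change <= t < t_next]
        if in_window:
            errors.append(abs(in_window[-1] - target))
    return sum(errors) / len(errors) if errors else None

# Example: targets change every 5 s; the user responds roughly 1 s later
# and makes one scoring error in the second window.
changes = [(0.0, 3), (5.0, 1), (10.0, 5)]
ratings = [(0.8, 3), (5.9, 2), (10.7, 5)]
print([round(d, 1) for d in reaction_times(changes, ratings)])  # [0.8, 0.9, 0.7]
print(round(precision_mae(changes, ratings), 2))               # 0.33
```

Separating the two measures matters: a method can be fast but imprecise (many quick but wrong ratings) or precise but slow, and only the combination identifies a usable rating device.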

The paper is structured as follows. In Section 2 we present a selection of related work concerning rating devices and scales. In Section 3, the different test methodologies and their implementation are described. Three different user tests were carried out to assess the performance of the test methodologies. Those tests and their selected quality criteria are described in Section 4.

Section snippets

Related work

Research on the suitability and comparability of different subjective tests has been frequently carried out [10], [11], [12]. It was found that the choice of the rating scale is not very critical. For example, user ratings based on an 11-point scale can be translated into a 5-point scale [12] without loss of information. This is beneficial, since it seems to be convenient to use the simplest scale for mobile testing, due to the complexity of the situation. However, the issue of the

Material and methods

Large or heavy devices are inappropriate for mobile multimedia quality assessment: since they are uncomfortable to use, they carry a high potential for distraction. In order to find a suitable replacement for the slider in future testing, we developed four alternatives that are presented below.

Experimental settings

Since we aimed at studying the performance of the rating methodology itself, we chose to set up a specific test scenario that differs from the typical multimedia quality assessment settings. Instead of presenting stimuli containing artificially introduced impairments, we used videos where every five seconds a new number or category was shown to the users. The numbers were randomly generated between 1 and 5 and the categories were randomly selected from the five ACR categories “excellent, good,
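A stimulus schedule of the kind described above can be sketched as follows. This is a hypothetical reconstruction for illustration only; the function, category labels, and seed are our assumptions, not the authors' test software.

```python
import random

# Sketch: every 5 s a new random target is shown — either a number from
# 1 to 5 or one of the five ACR categories. Labels/seed are assumptions.
ACR = ["bad", "poor", "fair", "good", "excellent"]  # ordered worst to best

def make_schedule(duration_s=60, interval_s=5, use_categories=False, rng=None):
    """Return a list of (start_time, target) pairs covering duration_s."""
    rng = rng or random.Random()
    schedule = []
    for t in range(0, duration_s, interval_s):
        value = rng.randint(1, 5)  # random score between 1 and 5
        schedule.append((t, ACR[value - 1] if use_categories else value))
    return schedule

sched = make_schedule(duration_s=20, rng=random.Random(42))
print(sched)  # four numeric targets, one every 5 s
```

Because the targets are known exactly and change at known instants, user ratings can be compared against ground truth, which is what makes this setup suitable for measuring the rating devices themselves rather than perceived media quality.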

User opinion

As described in Section 4.1, users were asked to state their opinion on the different methodologies. Every person had to select one method they preferred over the others. The results of the second test, where a camera based implementation of the finger count method was used, can be observed in Fig. 9. Generally, it can be deduced that users prefer the finger count, the finger distance and the glove over the other methods.

Judging from users' comments, the finger count seems to be the most

Conclusion

In this paper we presented an extensive study on alternative rating methodologies to the slider, which is the commonly used device for time-continuous subjective quality assessment. During our research activities, we started to develop and test several new methods that are suitable for mobile use. Results from three different user studies were used to understand the important issues of such a rating technology and successively led to important refinements. Finally, we were able to identify one

Acknowledgments

We thank the reviewers for their valuable comments on a previous version of the paper, as well as our students who have acted as evaluators, and gratefully acknowledge Grant ICT08-005 from the Wiener Wissenschafts-, Forschungs-, und Technologiefonds (WWTF, Vienna Science and Technology Fund) to Helmut Hlavacs and Grant CS11-009 from WWTF to Ulrich Ansorge, Otmar Scherzer and Shelley Buchinger.

References (20)

  • ITU-R BT.500-11, Methodology for the Subjective Assessment of the Quality of Television Pictures, ITU-R,...
  • ITU-T P.910, Subjective Video Quality Assessment Methods for Multimedia Applications, ITU-T,...
  • ITU-R BS.1283, Subjective Assessment of Sound Quality—A Guide to Existing Recommendations, ITU-R,...
  • ITU-R BS.1283-1, Methodology for the Subjective Assessment of Video Quality in Multimedia Applications, ITU-R,...
  • T. De Pessemier et al.

    Quantifying subjective quality evaluations for mobile video watching in a semi-living lab context

    IEEE Trans. Broadcast.

    (2012)
  • D. Schuurman, T. Evens, L. De Marez, A living lab research approach for mobile TV, in: EuroITV ׳09, ACM, Leuven,...
  • R. Bernhaupt et al.

    Trends in the living room and beyond: results from ethnographic studies using creative and playful probing

    Comput. Entertain.

    (2008)
  • N. Staelens et al.

    Assessing quality of experience of IPTV and video on demand services in real-life environments

    IEEE Trans. Broadcast.

    (2010)
  • ITU-T P.913, Methods for the Subjective Assessment of Video Quality, Audio Quality and Audiovisual Quality of Internet...
  • M. Pinson, S. Wolf, Comparing subjective video quality testing methodologies, in: SPIE Video Communications and Image...
