Elsevier

Computers & Graphics

Volume 86, February 2020, Pages 27-41

Special Section on SVR 2019
New interactive strategies for virtual reality streaming in degraded context of use

https://doi.org/10.1016/j.cag.2019.10.005

Highlights

  • The multimedia community has been active in finding ways to stream immersive video content under limited bandwidth.

  • Degrading the quality is not the only choice to lower the data rate, and not necessarily the best for the users.

  • Two new types of impairments, Virtual Walls and Slow Downs, exploit the human attentional process to this end.

  • Experiments confirm the potential of these alternative impairments and indicate in which situations they should be employed.

  • Network simulations with viewport-based adaptations confirm their potential to improve various streaming performance metrics.

Abstract

Virtual reality videos are an important element in the range of immersive contents, as they open new perspectives for storytelling, journalism and education. Accessing such immersive content through Internet streaming is, however, much more difficult, because the required data rates are much higher than for regular videos. While current streaming strategies rely on video compression, in this paper we investigate a radically new stance: we posit that degrading the visual quality is not the only way to reduce the required data rate, and not necessarily the best. Instead, we propose two new impairments, Virtual Walls (VWs) and Slow Downs (SDs), that change the way the user can interact with the 360 video when the available bandwidth is insufficient. User experiments with a double-stimulus approach show that, when triggered in proper time periods, these impairments are perceived more favorably than visual quality degradation from video compression. Network simulations confirm the usefulness of these new types of impairments: incorporated into a FoV-based adaptation, they enable reductions in stalls and startup delay, and increase quality in the FoV, even in the presence of substantial playback buffers.

Introduction

Content and equipment for Virtual Reality (VR) have been developing fast over the last couple of years, from both a technological and a commercial point of view. The technology is benefiting from major progress in VR headset design (such as the announced 18-megapixel Google-LG display) and compression [32]. From a business perspective, sales of VR headsets are forecast to reach 40 million units per year in 2022, with a market worth $215B [9]. Alongside games and AR applications, cinematic content, and 360 videos in particular, is an important element in the range of immersive contents. These are spherical videos meant to be watched in a VR headset so that the user gets immersed in the content’s world. They open new perspectives for storytelling, journalism and education.

As is currently the case for regular videos, their preferred mode of consumption will remain Internet streaming. However, a major obstacle to streaming 360 videos is their required data rate, or bandwidth. Owing to the short distance between the user’s eye and the screen when wearing a VR headset, the data rate must be two orders of magnitude higher than that of 4K videos. Given the resolution of the human fovea, a full impression of reality from sight alone would require 5 Gbps, even with the latest H.265 video coding standard [4]. Such data rates are not available on standard Internet accesses, and the network challenges entailed by massive distribution of immersive content are substantial.
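To see where figures of this magnitude come from, a back-of-envelope sketch can help (this is our illustration, not the derivation of [4]; the acuity, frame-rate and compression-ratio values below are assumptions):

```python
# Back-of-envelope estimate of the data rate needed for a full-sphere
# video matching foveal acuity. All parameter values are illustrative.

ACUITY_PPD = 60        # assumed pixels per degree to match foveal acuity
FPS = 60               # assumed frame rate for comfortable VR
BITS_PER_PIXEL = 24    # raw RGB
H265_RATIO = 100       # assumed (conservative) H.265 compression ratio

# A full equirectangular sphere covers 360 x 180 degrees.
width_px = 360 * ACUITY_PPD       # 21,600
height_px = 180 * ACUITY_PPD      # 10,800
raw_bps = width_px * height_px * BITS_PER_PIXEL * FPS
compressed_bps = raw_bps / H265_RATIO

print(f"raw: {raw_bps / 1e9:.0f} Gbps, compressed: {compressed_bps / 1e9:.1f} Gbps")
# -> raw: 336 Gbps, compressed: 3.4 Gbps
```

Even with a conservative compression ratio, the result lands in the multi-gigabit range, the same order of magnitude as the 5 Gbps figure cited above.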

A major question therefore arises: how can immersive content be streamed under limited bandwidth? This article contributes in this direction. The general principle in existing research is to send in high quality (i.e., with high encoding rates) the sector of the video the user faces, and the rest in lower quality. Transmission decisions therefore depend on the user’s behavior in the virtual environment. Deciding which part of the sphere the remote streaming server should send in high quality hence requires predicting the user’s future Field of View (FoV). Such prediction is only partly possible, and only over very short time horizons (on the order of a second or less), owing to the complex dependency on previous motion and content, and to inherent randomness [33]. For a given constrained bandwidth, the greater the discrepancy between the bandwidth and the highest video rate, the narrower the sector sent in highest quality, and the greater the probability that the user will face a low-quality sector.
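The principle described above can be sketched as a greedy tile-budget allocation (a minimal illustration under assumptions of our own: a one-dimensional grid of yaw tiles, a two-rate model, and a single predicted viewport center; this is not the paper’s exact algorithm):

```python
def allocate_tile_qualities(tiles, predicted_fov_center, budget_bps,
                            hi_rate, lo_rate):
    """Assign each tile 'hi' or 'lo' quality: tiles closest to the
    predicted viewport center get high quality until the per-segment
    bandwidth budget is exhausted; everything else stays low quality."""
    def angular_dist(a, b):
        # shortest distance on the yaw circle, in degrees
        d = abs(a - b) % 360
        return min(d, 360 - d)

    ranked = sorted(tiles, key=lambda yaw: angular_dist(yaw, predicted_fov_center))
    spent = len(tiles) * lo_rate          # every tile costs at least lo_rate
    quality = {yaw: "lo" for yaw in tiles}
    for yaw in ranked:
        if spent + (hi_rate - lo_rate) > budget_bps:
            break                          # budget exhausted: remaining tiles stay 'lo'
        quality[yaw] = "hi"
        spent += hi_rate - lo_rate
    return quality

# Example: 8 yaw tiles of 45 degrees, predicted FoV centered at yaw 90.
tiles = [0, 45, 90, 135, 180, 225, 270, 315]
q = allocate_tile_qualities(tiles, 90, budget_bps=14e6, hi_rate=4e6, lo_rate=1e6)
```

With this toy budget, only the two tiles nearest the predicted center are fetched in high quality; shrinking the budget shrinks that sector, which is exactly the narrowing effect the paragraph describes.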

This article investigates a radically new stance on the problem. Assuming that the goal of an immersive experience is to make the user feel as if in the real world through sight, and given the impact of visual degradation on the vestibular system (compared with watching a regular screen) and on the feeling of presence, we posit that degrading the visual quality is not the only way to reduce the required data rate, and not necessarily the best choice. Building on knowledge of the human attentional process, we identify new dimensions along which to impair the content to absorb the lack of bandwidth, complementary to the visual quality. Specifically, we design two types of impairments and show that, when triggered in proper time periods, they can be perceived more favorably than visual quality degradation from video compression, for the same amount of data to transfer.

Contributions:

  • We introduce two new types of impairments, named Virtual Walls (VWs) and Slow Downs (SDs), to improve the experience of 360 video streaming under limited bandwidth. We implement them in a streaming player compliant with the Spatial Relationship Description (SRD) amendment to the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard for 360 video streaming.

  • We carry out user experiments with 18 users and 11 video scenes to identify whether VWs and SDs are alternative impairments acceptable to the users that can improve the level of experience compared with quality adaptation alone. We use a double-stimulus approach so that every VW and SD version is compared with a reference version (both versions consuming the exact same data rate). The video content covers different categories and comes from reference datasets.

  • The results show that both VW and SD impairments are generally preferred by the users over the compression-only reference. A thorough analysis of quantitative subjective assessments and objective metrics (head motion collected from logs) makes it possible to understand the important factors involved in the users’ preference. Standardized SUS and AttrakDiff questionnaires confirm the acceptability of our approach.

  • Finally, we assess the gain in streaming performance that VW and SD can bring to different FoV-based adaptation logics, which prioritize buffering over responsiveness to head motion to varying degrees. Network simulations confirm the usefulness of these new types of impairments: incorporated into a FoV-based adaptation, they enable reductions in stalls and startup delay, and increase quality in the FoV, even in the presence of substantial playback buffers.
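To make the last contribution concrete, here is a hypothetical sketch of how an impairment trigger could sit inside a rate-adaptation loop (the thresholds, the phase input and the function name are invented for illustration; the paper’s actual adaptation logics differ):

```python
def choose_action(buffer_s, bandwidth_bps, hi_rate_bps, phase):
    """Toy adaptation step: when the bandwidth cannot sustain high
    quality in the FoV, prefer an interaction impairment over a quality
    drop. 'phase' is the assumed attention phase of the current scene:
    'exploration' (scanning the sphere) or 'focus' (attending one area)."""
    if bandwidth_bps >= hi_rate_bps and buffer_s > 2.0:
        return "stream-hi"        # enough headroom: no impairment needed
    if phase == "exploration":
        return "slow-down"        # SD: slow playback while data keeps buffering
    if phase == "focus":
        return "virtual-wall"     # VW: restrict navigation to a narrower hi sector
    return "degrade-quality"      # fallback: classic quality adaptation

# Comfortable link vs. two constrained situations:
print(choose_action(5.0, 20e6, 10e6, "focus"))        # stream-hi
print(choose_action(1.0, 5e6, 10e6, "exploration"))   # slow-down
print(choose_action(1.0, 5e6, 10e6, "focus"))         # virtual-wall
```

The mapping of SD to exploration phases and VW to focus phases follows the complementarity stated in the article; everything else in the sketch is an assumption.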

For reproducibility, the code and the user-experiment data collected for this work are publicly available at [39], [40].

The article is organized as follows. Section 2 presents related work. Section 3 introduces and motivates the proposed impairments. Section 4 details the experimental protocol. Section 5 analyzes the results of the user experiments, while Section 6 presents network simulations. Finally, we discuss some of the questions raised by our approach, including important perspectives, in Section 7, and conclude in Section 8.

Section snippets

Related works

We review below four aspects at the core of our goal: recent findings on attentional behavior in VR, the general classes of attention-guidance techniques, the perception of slow motion, and finally how the problem of streaming VR has been tackled so far.

Sitzmann et al. [44] provide an extensive study (involving 169 users) of how people explore static VR environments (i.e., 360 images). They show that the average exploration time, that is, the time a user takes to scan the entire

New types of impairments: VW and SD

We first present elements on the phases of human attention when watching a 360 video, before introducing the new types of impairments we propose, each aimed at being used in one of these phases.

Hypothesis and experimental protocol

This section details the specific hypotheses we make on the VW and SD impairments, as well as the evaluation of their overall usability and user experience. This evaluation follows a double-stimulus approach based on the guidelines of the International Telecommunication Union (ITU) [47]. We use standard and ad hoc questionnaires with specific metrics the users are asked to score. The evaluation is complemented by an analysis of the head-motion logs recorded during the experiments.

In

Results

We first analyze the results of the user experiments for VW, then for SD. We show to what extent they confirm hypotheses H1 and H2. We analyze the importance of each factor (visual quality, responsiveness and comfort scores) in the expressed preference. The last part analyzes the results of the SUS and AttrakDiff questionnaires.

System-level impact of VW and SD

The previous section presented results of user experiments aimed at verifying whether VW and SD are alternative impairments acceptable to the users that can improve the level of experience compared with compression alone. These alternative impairments are designed to support usage under limited bandwidth, and the experiments covered typical scenarios where these impairments are envisioned to help (countering quality degradation in the startup exploration phase for SD or in

Discussion

VW and SD will be particularly useful when there is a significant discrepancy between the available bandwidth and the bitrate of the highest quality of the sphere: the higher this discrepancy, the narrower the area where the quality can be maximal. This discrepancy will worsen with future headsets of significantly increased resolution (such as the newly released Varjo with 50 megapixels per eye). Resorting to SD and VW makes it possible to enlarge this area. This has been echoed by the findings of
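The relationship stated at the start of this section can be made explicit with a two-rate model (our simplification, not a formula from the article): if a fraction α of the sphere is sent at the high rate r_H and the rest at r_L, the constraint α·r_H + (1−α)·r_L ≤ B gives α = (B − r_L)/(r_H − r_L), so the maximal-quality area shrinks as r_H grows relative to the bandwidth B.

```python
def max_high_quality_fraction(bandwidth, hi_rate, lo_rate):
    """Largest sphere fraction streamable at hi_rate under the two-rate
    model alpha*hi + (1-alpha)*lo <= bandwidth, clamped to [0, 1]."""
    alpha = (bandwidth - lo_rate) / (hi_rate - lo_rate)
    return max(0.0, min(1.0, alpha))

# Same 20 Mbps link: a higher-resolution headset (larger hi_rate)
# leaves a much narrower maximal-quality area.
today = max_high_quality_fraction(20e6, hi_rate=30e6, lo_rate=5e6)    # 0.6
future = max_high_quality_fraction(20e6, hi_rate=100e6, lo_rate=5e6)  # ~0.16
```

The rate values are illustrative, but the trend is the one argued above: the discrepancy between bandwidth and highest-quality bitrate directly bounds the width of the high-quality sector.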

Conclusions and future works

This article has identified two new types of impairments to help stream VR videos under limited bandwidth. We have built on the recent characterization of human attention in VR to introduce Virtual Walls and Slow Downs, which we show to be well accepted and useful for improving the level of experience compared with quality adaptation alone. The SD and VW impairments are complementary in that they are meant to apply to different types of scenes (exploration and concentrated focus, respectively).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work has been supported by the French government, through the UCA JEDI and EUR DS4H Investments in the Future projects ANR-15-IDEX-0001 and ANR-17-EURE-0004.

References (53)

  • M. Almquist et al. The prefetch aggressiveness tradeoff in 360 degree video streaming. Proceedings of the 9th ACM Multimedia Systems Conference, MMSys ’18 (2018)
  • T. Ballard et al. RATS: adaptive 360-degree live streaming. Proceedings of the 10th ACM Multimedia Systems Conference, MMSys ’19 (2019)
  • A. Bangor et al. Determining what individual SUS scores mean: adding an adjective rating scale. J Usability Stud (2009)
  • E. Bastug et al. Toward interconnected virtual reality: opportunities, challenges, and enablers. IEEE Commun Mag (2017)
  • J.G. Beerends et al. The influence of video quality on perceived audio quality and vice versa. J Audio Eng Soc (1999)
  • J. Brooke. SUS – A quick and dirty usability scale
  • N. Carlsson et al. Optimized adaptive streaming of multi-video stream bundles. IEEE Trans Multimed (2017)
  • E.M. Caruso, Z.C. Burns, B. Converse. Slow motion increases perceived intent. Proc Natl Acad Sci USA...
  • International Data Corporation. Demand for augmented reality/virtual reality headsets expected to rebound in 2018. 2018. Industry...
  • A. Coutrot et al. Learning a time-dependent master saliency map from eye-tracking data in videos (2017)
  • S. Dambra et al. TOUCAN-VR. Software (2018)
  • S. Dambra et al. Film editing: new levers to improve VR streaming. Proceedings of the 9th ACM Multimedia Systems Conference, MMSys ’18 (2018)
  • E.J. David et al. A dataset of head and eye movements for 360 degree videos. Proceedings of the 9th ACM Multimedia Systems Conference, MMSys ’18 (2018)
  • Example manifest file. http://yt-dash-mse-test.commondatastorage.googleapis.com/media/car-20120827-manifest.mpd;...
  • Y. Farmani et al. Viewpoint snapping to reduce cybersickness in virtual reality. Graphics Interface (2018)
  • C.O. Fearghail et al. Director’s cut – analysis of aspects of interactive storytelling for VR films
  • V.R. Gaddam et al. Tiling in interactive panoramic video: approaches and evaluation. IEEE Trans Multimed (2016)
  • B. Girod et al. Adaptive playout for low latency video streaming. Proceedings of the IEEE International Conference on Image Processing (ICIP) (2001)
  • Google. VR180 cameras. 2019....
  • M. Graf et al. Towards bandwidth efficient adaptive streaming of omnidirectional video over HTTP: design, implementation, and evaluation. Proceedings of the 8th ACM Multimedia Systems Conference, MMSys ’17 (2017)
  • S. Grogorick et al. Subtle gaze guidance for immersive environments. Proceedings of the ACM Symposium on Applied Perception, SAP ’17 (2017)
  • M. Hassenzahl et al. User experience – a research agenda. Behav Inf Technol (2006)
  • H. Hu et al. Deep 360 pilot: learning a deep agent for piloting through 360 sports videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • M. Jeppsson et al. Efficient live and on-demand tiled HEVC 360 VR video streaming. Proceedings of the IEEE International Symposium on Multimedia (ISM) (2018)
  • V.A. de Jesus Oliveira et al. Designing a vibrotactile head-mounted display for spatial awareness in 3D spaces. IEEE Trans Vis Comput Graph (2017)
  • J. Kim et al. Voice activity detection using an adaptive context attention model. IEEE Signal Process Lett (2018)