Elsevier

Computers & Graphics

Volume 86, February 2020, Pages 27-41

Special Section on SVR 2019
New interactive strategies for virtual reality streaming in degraded context of use

https://doi.org/10.1016/j.cag.2019.10.005

Highlights

  • The multimedia community has been active in finding ways to stream immersive video content under limited bandwidth.

  • Degrading the quality is not the only choice to lower the data rate, and not necessarily the best for the users.

  • Two new types of impairments, Virtual Walls and Slow Downs, exploit the human attentional process to this end.

  • Experiments confirm the potential of these alternative impairments and indicate in which situations they should be employed.

  • Network simulations with viewport-based adaptations confirm their potential to improve various streaming performance metrics.

Abstract

Virtual reality videos are an important element in the range of immersive contents, as they open new perspectives for storytelling, journalism and education. Accessing such immersive content through Internet streaming is, however, much more difficult, because the required data rates are much higher than for regular videos. While current streaming strategies rely on video compression, in this paper we investigate a radically new stance: we posit that degrading the visual quality is not the only way to reduce the required data rate, and not necessarily the best. Instead, we propose two new impairments, Virtual Walls (VWs) and Slow Downs (SDs), that change the way the user can interact with the 360 video when the available bandwidth is insufficient. User experiments with a double-stimulus approach show that, when triggered in proper time periods, these impairments are perceived more favorably than visual quality degradation from video compression. Network simulations confirm the usefulness of these new types of impairments: incorporated into a FoV-based adaptation, they enable reductions in stalls and startup delay, and increase quality in the FoV, even in the presence of substantial playback buffers.

Introduction

Content and equipment for Virtual Reality (VR) have been developing fast over the last couple of years, from both a technological and a commercial point of view. The technology is benefiting from major progress in VR headset design (such as the announced 18-megapixel Google-LG display) and compression [32]. From a business perspective, sales of VR headsets are forecast to reach 40 million units per year in 2022, with a market worth $215B [9]. Alongside games and AR applications, cinematic content, and 360 videos in particular, is an important element in the range of immersive contents. These are spherical videos meant to be watched in a VR headset so that the user gets immersed in the content’s world. They open new perspectives for storytelling, journalism and education.

As is currently the case for regular videos, their preferred mode of consumption will remain Internet streaming. However, a major obstacle to streaming 360 videos is their required data rate, or bandwidth. Owing to the short distance between the user’s eye and the screen when wearing a VR headset, the data rate must be two orders of magnitude higher than that of 4K videos. Given the resolution of the human fovea, a full impression of reality from sight alone would require 5 Gbps, even with the latest H.265 video coding standard [4]. Such data rates are not available on standard Internet accesses, and the network challenges entailed by massive distribution of immersive content are substantial.
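To see where figures of this magnitude come from, a back-of-envelope sketch can help (this is our illustration, not the derivation of [4]; the acuity, frame-rate and compression-ratio values below are assumptions):

```python
# Back-of-envelope estimate of the data rate needed for a full-sphere
# video matching foveal acuity. All parameter values are illustrative.

ACUITY_PPD = 60        # assumed pixels per degree to match foveal acuity
FPS = 60               # assumed frame rate for comfortable VR
BITS_PER_PIXEL = 24    # raw RGB
H265_RATIO = 100       # assumed (conservative) H.265 compression ratio

# A full equirectangular sphere covers 360 x 180 degrees.
width_px = 360 * ACUITY_PPD       # 21,600
height_px = 180 * ACUITY_PPD      # 10,800
raw_bps = width_px * height_px * BITS_PER_PIXEL * FPS
compressed_bps = raw_bps / H265_RATIO

print(f"raw: {raw_bps / 1e9:.0f} Gbps, compressed: {compressed_bps / 1e9:.1f} Gbps")
# -> raw: 336 Gbps, compressed: 3.4 Gbps
```

Even with a conservative compression ratio, the result lands in the multi-gigabit range, the same order of magnitude as the 5 Gbps figure cited above.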

A major question therefore arises: how can immersive content be streamed under limited bandwidth? This article contributes in this direction. The general principle in existing research is to send in high quality (i.e., with high encoding rates) the sector of the video the user faces, and the rest in lower quality. Transmission decisions therefore depend on the user’s behavior in the virtual environment. Deciding which part of the sphere the remote streaming server should send in high quality hence requires predicting the user’s future Field of View (FoV). Such prediction is only partly possible, and only over very short time horizons (on the order of a second or less), owing to the complex dependency on previous motion and content, and to inherent randomness [33]. For a given constrained bandwidth, the greater the discrepancy between the bandwidth and the highest video rate, the narrower the sector sent in highest quality, and the greater the probability that the user will face a low-quality sector.
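The principle described above can be sketched as a greedy tile-budget allocation (a minimal illustration under assumptions of our own: a one-dimensional grid of yaw tiles, a two-rate model, and a single predicted viewport center; this is not the paper’s exact algorithm):

```python
def allocate_tile_qualities(tiles, predicted_fov_center, budget_bps,
                            hi_rate, lo_rate):
    """Assign each tile 'hi' or 'lo' quality: tiles closest to the
    predicted viewport center get high quality until the per-segment
    bandwidth budget is exhausted; everything else stays low quality."""
    def angular_dist(a, b):
        # shortest distance on the yaw circle, in degrees
        d = abs(a - b) % 360
        return min(d, 360 - d)

    ranked = sorted(tiles, key=lambda yaw: angular_dist(yaw, predicted_fov_center))
    spent = len(tiles) * lo_rate          # every tile costs at least lo_rate
    quality = {yaw: "lo" for yaw in tiles}
    for yaw in ranked:
        if spent + (hi_rate - lo_rate) > budget_bps:
            break                          # budget exhausted: remaining tiles stay 'lo'
        quality[yaw] = "hi"
        spent += hi_rate - lo_rate
    return quality

# Example: 8 yaw tiles of 45 degrees, predicted FoV centered at yaw 90.
tiles = [0, 45, 90, 135, 180, 225, 270, 315]
q = allocate_tile_qualities(tiles, 90, budget_bps=14e6, hi_rate=4e6, lo_rate=1e6)
```

With this toy budget, only the two tiles nearest the predicted center are fetched in high quality; shrinking the budget shrinks that sector, which is exactly the narrowing effect the paragraph describes.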

This article investigates a radically new stance on the problem. Assuming that the goal of an immersive experience is to make the user feel as if in the real world through sight, and given the impact of visual degradation on the vestibular system (compared with watching a regular screen) and on the feeling of presence, we posit that degrading the visual quality is not the only way to reduce the required data rate, and not necessarily the best choice. Building on knowledge of the human attentional process, we identify new dimensions along which to impair the content to absorb the lack of bandwidth, complementary to the visual quality. Specifically, we design two types of impairments and show that, when triggered in proper time periods, they can be perceived more favorably than visual quality degradation from video compression, for the same amount of data to transfer.

Contributions:

  • We introduce two new types of impairments, named Virtual Walls (VWs) and Slow Downs (SDs), to improve the experience of 360 video streaming under limited bandwidth. We implement them in a streaming player compliant with the Spatial Relationship Description (SRD) amendment to the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard for 360 video streaming.

  • We carry out user experiments with 18 users and 11 video scenes to identify whether VWs and SDs are alternative impairments acceptable to the users that can improve the level of experience compared with quality adaptation alone. We use a double-stimulus approach so that every VW and SD version is compared with a reference version (both versions consuming the exact same data rate). The video content covers different categories and comes from reference datasets.

  • The results show that both VW and SD impairments are generally preferred by the users over the compression-only reference. A thorough analysis of quantitative subjective assessments and objective metrics (head motion collected from logs) makes it possible to understand the important factors involved in the users’ preference. Standardized SUS and AttrakDiff questionnaires confirm the acceptability of our approach.

  • Finally, we assess the gain in streaming performance that VW and SD can bring to different FoV-based adaptation logics, which prioritize buffering over responsiveness to head motion to varying degrees. Network simulations confirm the usefulness of these new types of impairments: incorporated into a FoV-based adaptation, they enable reductions in stalls and startup delay, and increase quality in the FoV, even in the presence of substantial playback buffers.
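To make the last contribution concrete, here is a hypothetical sketch of how an impairment trigger could sit inside a rate-adaptation loop (the thresholds, the phase input and the function name are invented for illustration; the paper’s actual adaptation logics differ):

```python
def choose_action(buffer_s, bandwidth_bps, hi_rate_bps, phase):
    """Toy adaptation step: when the bandwidth cannot sustain high
    quality in the FoV, prefer an interaction impairment over a quality
    drop. 'phase' is the assumed attention phase of the current scene:
    'exploration' (scanning the sphere) or 'focus' (attending one area)."""
    if bandwidth_bps >= hi_rate_bps and buffer_s > 2.0:
        return "stream-hi"        # enough headroom: no impairment needed
    if phase == "exploration":
        return "slow-down"        # SD: slow playback while data keeps buffering
    if phase == "focus":
        return "virtual-wall"     # VW: restrict navigation to a narrower hi sector
    return "degrade-quality"      # fallback: classic quality adaptation

# Comfortable link vs. two constrained situations:
print(choose_action(5.0, 20e6, 10e6, "focus"))        # stream-hi
print(choose_action(1.0, 5e6, 10e6, "exploration"))   # slow-down
print(choose_action(1.0, 5e6, 10e6, "focus"))         # virtual-wall
```

The mapping of SD to exploration phases and VW to focus phases follows the complementarity stated in the article; everything else in the sketch is an assumption.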

For reproducibility, the code and the user-experiment data collected for this work are publicly available at [39], [40].

The article is organized as follows. Section 2 presents related work. Section 3 introduces and motivates the proposed impairments. Section 4 details the experimental protocol. Section 5 analyzes the results of the user experiments, while Section 6 presents network simulations. Finally, we discuss some of the questions raised by our approach, including important perspectives, in Section 7, and conclude in Section 8.

Section snippets

Related works

We review below four aspects at the core of our goal: recent findings on attentional behavior in VR, the general classes of attention-guidance techniques, the perception of slow motion, and finally how the problem of streaming VR has been tackled so far.

Sitzmann et al. [44] provide an extensive study (involving 169 users) of how people explore static VR environments (i.e., 360 images). They show that the average exploration time, that is, the time a user takes to scan the entire

New types of impairments: VW and SD

We first present elements on the phases of human attention when watching a 360 video, before introducing the new types of impairments we propose, each aimed at being used in one of these phases.

Hypothesis and experimental protocol

This section details the specific hypotheses we make on the VW and SD impairments, as well as the evaluation of their overall usability and user experience. This evaluation follows a double-stimulus approach based on the guidelines of the International Telecommunication Union (ITU) [47]. We use standard and ad hoc questionnaires with specific metrics the users are asked to score. The evaluation is complemented by an analysis of the head-motion logs recorded during the experiments.

In

Results

We first analyze the results of the user experiments for VW, then for SD. We show to what extent they confirm hypotheses H1 and H2. We analyze the importance of each factor (visual quality, responsiveness and comfort scores) in the expressed preference. The last part analyzes the results of the SUS and AttrakDiff questionnaires.

System-level impact of VW and SD

The previous section presented results of user experiments aimed at verifying whether VW and SD are alternative impairments acceptable to the users that can improve the level of experience compared with compression alone. These alternative impairments are designed to support usage under limited bandwidth, and the experiments covered typical scenarios where these impairments are envisioned to help (countering quality degradation in the startup exploration phase for SD or in

Discussion

VW and SD will be particularly useful when there is a significant discrepancy between the available bandwidth and the bitrate of the highest quality of the sphere: the higher this discrepancy, the narrower the area where the quality can be maximal. This discrepancy will worsen with future headsets of significantly increased resolution (such as the newly released Varjo with 50 megapixels per eye). Resorting to SD and VW makes it possible to enlarge this area. This has been echoed by the findings of
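The relationship stated at the start of this section can be made explicit with a two-rate model (our simplification, not a formula from the article): if a fraction α of the sphere is sent at the high rate r_H and the rest at r_L, the constraint α·r_H + (1−α)·r_L ≤ B gives α = (B − r_L)/(r_H − r_L), so the maximal-quality area shrinks as r_H grows relative to the bandwidth B.

```python
def max_high_quality_fraction(bandwidth, hi_rate, lo_rate):
    """Largest sphere fraction streamable at hi_rate under the two-rate
    model alpha*hi + (1-alpha)*lo <= bandwidth, clamped to [0, 1]."""
    alpha = (bandwidth - lo_rate) / (hi_rate - lo_rate)
    return max(0.0, min(1.0, alpha))

# Same 20 Mbps link: a higher-resolution headset (larger hi_rate)
# leaves a much narrower maximal-quality area.
today = max_high_quality_fraction(20e6, hi_rate=30e6, lo_rate=5e6)    # 0.6
future = max_high_quality_fraction(20e6, hi_rate=100e6, lo_rate=5e6)  # ~0.16
```

The rate values are illustrative, but the trend is the one argued above: the discrepancy between bandwidth and highest-quality bitrate directly bounds the width of the high-quality sector.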

Conclusions and future works

This article has identified two new types of impairments to help stream VR videos under limited bandwidth. We have built on the recent characterization of human attention in VR to introduce Virtual Walls and Slow Downs, which we show to be well accepted and useful for improving the level of experience compared with quality adaptation alone. The SD and VW impairments are complementary in that they are meant to apply to different types of scenes (exploration and concentrated focus, respectively).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work has been supported by the French government, through the UCA JEDI and EUR DS4H Investments in the Future projects ANR-15-IDEX-0001 and ANR-17-EURE-0004.

References (53)

  • M. Almquist et al. The prefetch aggressiveness tradeoff in 360 degree video streaming. Proceedings of the 9th ACM Multimedia Systems Conference, MMSys ’18 (2018)
  • T. Ballard et al. RATS: adaptive 360-degree live streaming. Proceedings of the 10th ACM Multimedia Systems Conference, MMSys ’19 (2019)
  • A. Bangor et al. Determining what individual SUS scores mean: adding an adjective rating scale. J Usability Stud (2009)
  • E. Bastug et al. Toward interconnected virtual reality: opportunities, challenges, and enablers. IEEE Commun Mag (2017)
  • J.G. Beerends et al. The influence of video quality on perceived audio quality and vice versa. J Audio Eng Soc (1999)
  • J. Brooke. SUS – A quick and dirty usability scale
  • N. Carlsson et al. Optimized adaptive streaming of multi-video stream bundles. IEEE Trans Multimed (2017)
  • E.M. Caruso, Z.C. Burns, B. Converse. Slow motion increases perceived intent. Proc Natl Acad Sci USA...
  • International Data Corporation. Demand for augmented reality/virtual reality headsets expected to rebound in 2018. 2018. Industry...
  • A. Coutrot et al. Learning a time-dependent master saliency map from eye-tracking data in videos (2017)
  • S. Dambra et al. TOUCAN-VR. Software (2018)
  • S. Dambra et al. Film editing: new levers to improve VR streaming. Proceedings of the 9th ACM Multimedia Systems Conference, MMSys ’18 (2018)
  • E.J. David et al. A dataset of head and eye movements for 360 degree videos. Proceedings of the 9th ACM Multimedia Systems Conference, MMSys ’18 (2018)
  • Example manifest file. http://yt-dash-mse-test.commondatastorage.googleapis.com/media/car-20120827-manifest.mpd;...
  • Y. Farmani et al. Viewpoint snapping to reduce cybersickness in virtual reality. Graphics Interface (2018)
  • C.O. Fearghail et al. Director’s cut – analysis of aspects of interactive storytelling for VR films
  • V.R. Gaddam et al. Tiling in interactive panoramic video: approaches and evaluation. IEEE Trans Multimed (2016)
  • B. Girod et al. Adaptive playout for low latency video streaming. Proceedings of the IEEE International Conference on Image Processing (ICIP) (2001)
  • Google. VR180 cameras. 2019....
  • M. Graf et al. Towards bandwidth efficient adaptive streaming of omnidirectional video over HTTP: design, implementation, and evaluation. Proceedings of the 8th ACM Multimedia Systems Conference, MMSys ’17 (2017)
  • S. Grogorick et al. Subtle gaze guidance for immersive environments. Proceedings of the ACM Symposium on Applied Perception, SAP ’17 (2017)
  • M. Hassenzahl et al. User experience – a research agenda. Behav Inf Technol (2006)
  • H. Hu et al. Deep 360 pilot: learning a deep agent for piloting through 360 sports videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • M. Jeppsson et al. Efficient live and on-demand tiled HEVC 360 VR video streaming. Proceedings of the IEEE International Symposium on Multimedia (ISM) (2018)
  • V.A. de Jesus Oliveira et al. Designing a vibrotactile head-mounted display for spatial awareness in 3D spaces. IEEE Trans Vis Comput Graph (2017)
  • J. Kim et al. Voice activity detection using an adaptive context attention model. IEEE Signal Process Lett (2018)