Special Section on SVR 2019
New interactive strategies for virtual reality streaming in degraded context of use
Introduction
Content and equipment for Virtual Reality (VR) have been developing fast in the last couple of years, both technologically and commercially. The technology is benefiting from major progress in VR headset design (such as the announced Google-LG 18-megapixel display) and in compression [32]. From a business perspective, sales of VR headsets are forecast to reach 40 million units a year in 2022, and the market $215B [9]. Alongside games and AR applications, cinematic content, and 360° videos in particular, are important elements in the range of immersive content. These are spherical videos meant to be watched in a VR headset so that the user is immersed in the content’s world. They open new perspectives for storytelling, journalism and education.
As is currently the case for regular videos, their preferred mode of consumption will remain Internet streaming. However, a major obstacle to streaming 360° videos is their required data rate, or bandwidth. Owing to the short distance between the user’s eye and the screen in a VR headset, the data rate must be two orders of magnitude higher than that of 4K videos. Given the resolution of the human fovea, a full visual impression of reality would require 5 Gbps, even with the latest H.265 video coding standard [4]. Such data rates are not available over standard Internet access, and the network challenges entailed by massive distribution of immersive content are substantial.
A major question therefore arises: how to stream immersive content under limited bandwidth? This article contributes in this direction. The general principle in existing research is to send the sector of the video the user faces in high quality (i.e., with high encoding rates), and the rest in lower quality. The transmission decisions therefore depend on the user’s behavior in the virtual environment. Deciding at the remote streaming server which part of the sphere to send in high quality hence requires predicting the user’s future Field of View (FoV). Such prediction is only partly possible, and only over very short time horizons (on the order of a second or less), owing to the complex dependency on previous motion and content, and to inherent randomness [33]. For a given constrained bandwidth, the greater the discrepancy between the bandwidth and the highest video rate, the narrower the sector sent in highest quality, and the greater the probability that the user will face a low-quality sector.
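The trade-off described above can be illustrated with a minimal sketch. It is not the adaptation logic of the article or of any cited player: the `Tile` structure, the two-level quality ladder, the bitrates and the greedy rule are all assumptions made for illustration. Every tile starts in low quality, and tiles are upgraded in order of angular distance to the predicted viewing direction until the bandwidth budget is exhausted.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """One tile of the sphere (hypothetical structure for this sketch)."""
    yaw_center: float  # tile center in degrees, in [0, 360)
    low_kbps: float    # assumed bitrate of the low-quality representation
    high_kbps: float   # assumed bitrate of the high-quality representation


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def allocate_qualities(tiles, predicted_yaw, budget_kbps):
    """Greedy sketch: start with all tiles in low quality, then upgrade the
    tiles closest to the predicted viewing direction while the budget allows.
    If the budget is below the sum of low-quality rates, nothing is upgraded
    and the returned rate exceeds the budget."""
    choice = ["low"] * len(tiles)
    used = sum(t.low_kbps for t in tiles)
    order = sorted(
        range(len(tiles)),
        key=lambda i: angular_distance(tiles[i].yaw_center, predicted_yaw),
    )
    for i in order:
        extra = tiles[i].high_kbps - tiles[i].low_kbps
        if used + extra > budget_kbps:
            break  # stop at the first tile that does not fit
        choice[i] = "high"
        used += extra
    return choice, used


if __name__ == "__main__":
    # Eight tiles of 45 degrees each, with made-up bitrates.
    tiles = [Tile(float(yaw), 500.0, 2000.0) for yaw in range(0, 360, 45)]
    for budget in (8000.0, 5000.0):
        choice, used = allocate_qualities(tiles, predicted_yaw=90.0, budget_kbps=budget)
        print(budget, choice.count("high"), used)
```

With these made-up numbers, lowering the budget from 8000 to 5000 kbps shrinks the high-quality sector from two tiles to none, which is the effect the paragraph describes: the larger the gap between bandwidth and the highest video rate, the narrower the sector that can be sent in highest quality.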
This article investigates a radically new stance on the problem. We assume that the goal of an immersive experience is to make the user feel as if in the real world through sight, and we take into account the impact that visual degradation has on the vestibular system (as compared with watching a regular screen) and on the feeling of presence. We therefore posit that degrading the visual quality is not the only way to reduce the required data rate, and not necessarily the best choice. Based on knowledge of the human attentional process, we identify new dimensions in which to impair the content to absorb the lack of bandwidth, complementary to visual quality. Specifically, we design two types of impairments and show that, when triggered at appropriate times, they can be perceived more favorably than visual quality degradation from video compression, for the same amount of data to transfer.
Contributions:
- We introduce two new types of impairments, named Virtual Walls (VWs) and Slow Downs (SDs), to improve the experience of 360° video streaming under limited bandwidth. We implement them in a streaming player compliant with the Spatial Relationship Description (SRD) amendment to the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard for 360° video streaming.
- We carry out user experiments with 18 users and 11 video scenes to identify whether VWs and SDs are alternative impairments that are acceptable to the users and can improve the level of experience compared with quality adaptation alone. We use a double-stimulus approach in which every VW and SD version is compared with a reference version (both versions consume the exact same data rate). The video content represents different categories and comes from reference datasets.
- The results show that both VW and SD impairments are generally preferred by the users over the compression-only reference. A thorough analysis of quantitative subjective assessments and objective metrics (head motion collected from logs) makes it possible to understand the important factors involved in the user’s preference. Standardized SUS and AttrakDiff questionnaires confirm the acceptability of our approach.
- Finally, we assess the gain in streaming performance that VW and SD can bring to different FoV-based adaptation logics, which prioritize buffering over responsiveness to head motion to varying degrees. We confirm with network simulations the usefulness of these new types of impairments: incorporated into a FoV-based adaptation, they can reduce stalls and startup delay and increase quality in the FoV, even in the presence of substantial playback buffers.
For the sake of reproducibility, the code developed and the user experiment data collected for this work are made publicly available at [39], [40].
The article is organized as follows. Section 2 presents related works. Section 3 introduces and motivates the proposed impairments. Section 4 details the experimental protocol. Section 5 analyzes the results of the user experiments, while Section 6 presents network simulations. Finally, we discuss some of the questions raised by our approaches, including important perspectives, in Section 7, and give conclusions in Section 8.
Related works
We review below four core aspects for our goal: the main recent findings on attentional behavior in VR, then the general classes of attention guidance techniques, the perception of slow motion and finally how the problem of streaming VR has been tackled so far.
Sitzmann et al. [44] provide an extensive study (involving 169 users) of how people explore static VR environments (i.e., 360° images). They show that the average exploration time, that is the time a user takes to scan the entire
New types of impairments: VW and SD
We first present elements on the phases of human attention when watching a 360° video, before introducing the new types of impairments we propose, each aimed at being used in one of the phases.
Hypothesis and experimental protocol
This section details the specific hypotheses we make on the VW and the SD impairments, as well as the evaluation of their overall usability and user experience. This evaluation is made using a double-stimulus approach following the guidelines of the International Telecommunication Union (ITU) [47]. We use standard and ad hoc questionnaires with specific metrics the users are asked to score. This evaluation is complemented by the analysis of the head motion logs recorded during the experiments.
In
Results
We first analyze the results of the user experiments for VW, then for SD. We show to what extent they confirm hypotheses H1 and H2. We analyze the importance of each factor (visual quality, responsiveness or comfort scores) in the expressed preference. The last part analyzes the results of the SUS and AttrakDiff questionnaires.
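For readers unfamiliar with SUS, the conventional scoring rule of the scale can be sketched as follows. This is the standard scoring from the usability literature, not necessarily the exact processing applied in this study; the function name `sus_score` is introduced here for illustration only. Each of the ten items is answered on a 1 to 5 scale; odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to yield a score between 0 and 100.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten answers on a 1-5 scale.

    Conventional SUS scoring: odd-numbered items (1, 3, ...) are positively
    worded and contribute (answer - 1); even-numbered items are negatively
    worded and contribute (5 - answer). The total is scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1
        else:
            total += 5 - answer
    return total * 2.5


if __name__ == "__main__":
    # A neutral respondent (all answers equal to 3) lands at mid-scale.
    print(sus_score([3] * 10))
```

Per-participant scores computed this way are typically averaged per condition before being interpreted, for instance against an adjective rating scale.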
System-level impact of VW and SD
The previous section presented the results of the user experiments, which aimed to verify whether VW and SD are alternative impairments that are acceptable to the users and can improve the level of experience compared with compression alone. These alternative impairments are designed to support usage under limited bandwidth, and the experiments were made for typical scenarios where these impairments are envisioned to help (counter quality degradation in the startup exploration phase for SD or in
Discussion
VW and SD will be particularly useful when there is a significant discrepancy between the available bandwidth and the bitrate of the highest quality of the sphere: the higher this discrepancy, the narrower the area where the quality can be maximum. This discrepancy will worsen with future headsets with significantly increased resolution (such as the newly released Varjo with 50 megapixels per eye). Resorting to SD and VW will make it possible to enlarge this area. This has been echoed by the findings of
Conclusions and future works
This article has identified two new types of impairments to help stream VR videos under limited bandwidth. We have built on the recent characterization of human attention in VR to introduce Virtual Walls and Slow Downs, which we show to be well accepted and useful to improve the level of experience compared with quality adaptation alone. The SD and VW impairments are complementary in that they are meant to apply to different types of scenes (exploration and concentrated focus, respectively).
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work has been supported by the French government, through the UCA JEDI and EUR DS4H Investments in the Future projects ANR-15-IDEX-0001 and ANR-17-EURE-0004.
References (53)
- et al. The prefetch aggressiveness tradeoff in 360 degree video streaming. Proceedings of the 9th ACM multimedia systems conference, MMSys ’18 (2018)
- et al. Rats: Adaptive 360-degree live streaming. Proceedings of the 10th ACM multimedia systems conference, MMSys ’19 (2019)
- et al. Determining what individual SUS scores mean: adding an adjective rating scale. J Usability Stud (2009)
- et al. Toward interconnected virtual reality: opportunities, challenges, and enablers. IEEE Commun Mag (2017)
- et al. The influence of video quality on perceived audio quality and vice versa. J Audio Eng Soc (1999)
- SUS – A quick and dirty usability scale
- et al. Optimized adaptive streaming of multi-video stream bundles. IEEE Trans Multimed (2017)
- Caruso E.M., Burns Z.C., Converse B.. Slow motion increases perceived intent. Proc Natl Acad Sci USA...
- Corporation I.D.. Demand for augmented reality/virtual reality headsets expected to rebound in 2018. 2018. Industry...
- et al. Learning a time-dependent master saliency map from eye-tracking data in videos (2017)
- TOUCAN-VR. Software
- Film editing: new levers to improve VR streaming. Proceedings of the 9th ACM multimedia systems conference, MMSys ’18
- A dataset of head and eye movements for 360 degree videos. Proceedings of the 9th ACM multimedia systems conference, MMSys ’18
- Viewpoint snapping to reduce cybersickness in virtual reality. Graph Interfaces
- Director’s cut – analysis of aspects of interactive storytelling for VR films
- Tiling in interactive panoramic video: approaches and evaluation. IEEE Trans Multimed
- Adaptive playout for low latency video streaming. Proceedings of the IEEE international conference on image processing (ICIP)
- Towards bandwidth efficient adaptive streaming of omnidirectional video over http: design, implementation, and evaluation. Proceedings of the 8th ACM on multimedia systems conference, MMSys ’17
- Subtle gaze guidance for immersive environments. Proceedings of the ACM symposium on applied perception, SAP ’17
- User experience – a research agenda. Behav Inf Technol
- Deep 360 pilot: learning a deep agent for piloting through 360° sports videos. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
- Efficient live and on-demand tiled HEVC 360 VR video streaming. Proceedings of the IEEE international symposium on multimedia (ISM)
- Designing a vibrotactile head-mounted display for spatial awareness in 3D spaces. IEEE Trans Vis Comput Graph
- Voice activity detection using an adaptive context attention model. IEEE Signal Process Lett