Quality assessment of multiply and singly distorted stereoscopic images via adaptive construction of cyclopean views

https://doi.org/10.1016/j.image.2021.116175

Highlights

  • Models are built for blind 3D IQA without requiring training on MOS/DMOS values.

  • Quality-compensated MCM is proposed for adaptive construction of cyclopean views.

  • Qualities of the two monocular views are adaptively combined by a new weighting strategy.

  • MUSIQUE-3D achieves competitive performance as compared with other 3D IQA methods.

Abstract

A challenging problem confronted when designing a blind/no-reference (NR) stereoscopic image quality assessment (SIQA) algorithm is to simulate the quality assessment (QA) behavior of the human visual system (HVS) during binocular vision. An effective way to solve this problem is to estimate the quality of the merged single view created in the human brain, also referred to as the cyclopean image. However, due to the difficulty of modeling the binocular fusion and rivalry properties of the HVS, obtaining effective cyclopean images for QA is non-trivial, and consequently previous NR SIQA algorithms either require the MOS/DMOS values of the distorted 3D images for training or ignore the quality analysis of the merged cyclopean view. In this paper, we focus on (1) constructing accurate and appropriate cyclopean views for QA of stereoscopic images by adaptively analyzing the distortion information of the two monocular views, and (2) training NR SIQA models without requiring the assistance of the MOS/DMOS values in existing databases. Accordingly, we present an effective opinion-unaware SIQA algorithm called MUSIQUE-3D, which blindly assesses the quality of multiply and singly distorted stereoscopic images by analyzing quality degradations of both monocular and cyclopean views. The monocular view quality is estimated by an extended version of the MUSIQUE algorithm, and the cyclopean view quality is computed from the distortion parameter values predicted by a two-layer classification-regression model trained on a large 3D image dataset. Tests on various 3D image databases demonstrate the superiority of our method compared with other state-of-the-art SIQA algorithms.

Introduction

Recently, the rapid development of virtual reality technology has provided users with the exciting visual experience of stereoscopic/3D content, and consequently various 3D services and applications, such as 3D television, 3D video conferencing, 3D cinema, and 3D games, have gained popularity. These 3D visual contents typically go through multiple stages of processing (e.g., image acquisition, compression, transmission, reception, and display) before ultimately being presented to consumers, and at each stage various types of distortions can be introduced, negatively impacting the quality of the user’s 3D visual experience. Thus, there is a need for effective and reliable stereoscopic image quality assessment (SIQA) algorithms that can estimate the perceptual quality of the ultimately observed 3D scene.

Although noticeable progress has been made in developing various kinds of 2D image quality assessment (IQA) algorithms to assess the perceptual quality of 2D images (see [1], [2] for reviews), designing effective SIQA algorithms to automatically assess the perceptual quality of a 3D scene is extremely challenging because many different factors, including heterogeneous distortions, mismatched inter-view perceptions, excessive binocular disparity, and inappropriate depth-of-focus, can all lead to visual discomfort. Although visual discomfort is certainly an important contributor to the overall 3D quality of experience, another very important factor is image distortion, which has been the focus of most existing works and is also the focus of this paper.

Even when considering only a single factor, predicting the quality of a 3D scene is still difficult, because the human visual system (HVS) perceives a 3D scene through two stereoscopic views and judges its quality based mainly on the merged single view created in the brain after complex binocular fusion and rivalry processes. Although such QA behavior comes naturally to humans, designing a 3D QA algorithm to mimic this process is non-trivial given that only the two view images are available. The difficulty arises first from the inevitable occlusion and border areas in the two view images, which result from the slightly different perspectives of the two cameras used to capture the scene [3]. The difficulty also arises from the fact that binocular combination under different distortion types must be considered. As claimed in [4], a higher-quality view that contains sufficient information helps suppress a lower-quality view affected by information-loss distortion (e.g., blurring), whereas for information-additive distortion (e.g., blockiness), the lower-quality view cannot be compensated [5]. Moreover, compared to the single-distortion scenario, the multiple-distortion scenario adds another level of difficulty for 3D QA: the algorithm must consider not only the joint effects of different distortions on the two views, but also the influence of these distortions on binocular combination behavior, because both monocular and binocular vision provide important information for the HVS to judge 3D image quality.

Despite these difficulties, a number of SIQA algorithms have been developed in recent years, and various kinds of features, models, and frameworks have been proposed. For no-reference (NR) SIQA, the most common approach is to learn regression models that map quality-related features of the test image to associated quality scores. Features that have been used include natural-scene-statistics (NSS) features based on monocular/cyclopean images [6], [7], [8]; saliency-based binocular features [9]; univariate and bivariate statistical features [10]; binocular quality-aware features based on eye-weighting and contrast-gain-control models [11]; and features extracted by deep neural networks (e.g., [4], [12], [13]) and autoencoders (e.g., [14]). These are often called “opinion-aware” approaches, because the regression models are trained on distorted images along with their human subjective rating scores. Consequently, their applicability is restricted by the limited number of existing 3D image quality databases. In comparison, far fewer “opinion-unaware” approaches have been presented. The main idea of this type of approach is to estimate quality differences between distorted images and pristine images by using various quality-related features and measurements, such as the BRISQUE [15] features and the Mahalanobis distance between multivariate Gaussian models adopted in [16], the amplitude/phase difference features and the visual-codebook-based quality-lookup method adopted in [17], [18], and the label-consistent K-singular-value-decomposition classification framework adopted in [19].

Although the aforementioned NR SIQA algorithms are effective, most of them were originally designed to work only for singly-distorted stereoscopic images (SDSIs), whereas in practice a stereoscopic image can be simultaneously contaminated by multiple distortions during the multiple stages of processing. With the recently developed multiply-distorted stereoscopic image quality database (i.e., the NBU-MDSID database [20], [21]), NR SIQA of multiply-distorted stereoscopic images (MDSIs) has begun to receive increased attention. However, due to the aforementioned difficulties introduced by multiple distortions, only a few related works have been reported. For example, Shao et al. [20] proposed a multi-model joint sparse representation framework based on learning modality-specific dictionaries and projection matrices from singly-distorted images. Later, Shao et al. [21] proposed another multi-model sparse representation framework which uses a local phase and amplitude description for dictionary learning and employs a multi-stage pooling strategy for quality estimation. Jiang et al. [3] proposed a unified NR quality evaluator for SDSIs and MDSIs based on learning monocular and binocular local visual primitives in order to characterize the local receptive field properties of the visual cortex.

Indeed, these three algorithms pioneered progress in the field of NR multiply-distorted SIQA; however, they all suffer from certain limitations. For example, both of Shao’s sparse-representation-based methods follow the traditional SIQA pipeline in which the qualities of the two monocular views are evaluated separately and then collapsed into one final quality score by a linear combination. Without analyzing the quality degradation of the merged view perceived by the HVS, both algorithms are unable to interpret how the binocular fusion and rivalry behaviors operate when symmetrically and asymmetrically distorted stereopairs are viewed, and thus do not fully mimic the intrinsic mechanism of the HVS in judging the visual quality of stereoscopic scenes. Although Jiang’s method takes binocular vision into account by incorporating a cyclopean framework similar to that of [22], the method operates in an “opinion-aware” manner and is thus also restricted by the limited quantity of training data. Finally, all three algorithms consider only three distortion types (white noise, Gaussian blur, and JPEG compression) and their combinations, while another common distortion type, JPEG2000 compression, is not included.

Surveying existing SIQA approaches, we argue that a grand challenge for effective blind QA of MDSIs and SDSIs lies in solving three fundamental problems: (1) how to assess the quality of two monocular views corrupted by multiple distortions without training the IQA models on human subjective rating scores; (2) how to construct accurate and appropriate cyclopean views to simulate the merged single view created in the human brain by using only the stereopair; and (3) how to combine the quality estimates corresponding to monocular and binocular vision into a single scalar that represents the overall 3D image quality. One promising approach to building IQA models without training on human subjective rating scores is our recently developed MUSIQUE algorithm [23], which decouples the QA task into two subtasks: (1) estimation of distortion parameters from the input distorted image, and (2) estimation of quality from the estimated parameters. Motivated by this approach, and to overcome the limitations of current NR SIQA works, we present in this paper an effective opinion-unaware SIQA algorithm called MUSIQUE-3D to blindly assess the quality of both MDSIs and SDSIs.
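
To make this decoupling concrete, the following Python sketch illustrates the two subtasks; the function names, the dummy parameter values, and the parameter-to-quality curves are hypothetical placeholders for illustration only, not the MUSIQUE implementation.

```python
import math

def estimate_distortion_params(image):
    """Subtask 1: predict distortion parameters (e.g., noise sigma, blur
    sigma, JPEG quality factor) from the distorted image alone.  A real
    system would use trained classification/regression models; dummy
    values are returned here purely for illustration."""
    return {"noise_sigma": 5.0, "blur_sigma": 1.0, "jpeg_q": 60.0}

def params_to_quality(params):
    """Subtask 2: map the estimated parameters to a scalar quality score
    via fixed parameter-to-quality curves, so no MOS/DMOS training is
    required (the curves below are illustrative, monotone stand-ins)."""
    q = 1.0
    q *= math.exp(-params["noise_sigma"] / 20.0)   # heavier noise  -> lower quality
    q *= math.exp(-params["blur_sigma"] / 4.0)     # heavier blur   -> lower quality
    q *= params["jpeg_q"] / 100.0                  # lower JPEG QF  -> lower quality
    return q

if __name__ == "__main__":
    print(params_to_quality(estimate_distortion_params(image=None)))
```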

Specifically, MUSIQUE-3D operates via three main stages, as shown in Fig. 1. In the first stage, the qualities of the two monocular views are estimated separately by an extension of the MUSIQUE algorithm that takes into account four common distortion types (white noise, Gaussian blur, JPEG compression, and JPEG2000 compression) and their combinations. To this end, we present a more advanced classification framework, trained on a large dataset of 2D images, to distinguish among nine different distortion cases. We also present a more advanced quality-fusion strategy which adaptively addresses the joint effects of the four distortion types, in particular the masking effect caused by noise.
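
As a rough illustration of this first stage, the sketch below assumes a classifier has already selected one of nine distortion cases and shows how per-distortion quality estimates might then be fused with a simple noise-masking rule; the case labels, the masking rule, and the weights are illustrative assumptions rather than the trained models used in the paper.

```python
# Illustrative sketch only; the paper's nine-way classifier and its actual
# fusion strategy are not reproduced here.

DISTORTION_CASES = [
    "pristine", "wn", "gblur", "jpeg", "jp2k",          # single-distortion cases
    "wn+gblur", "wn+jpeg", "gblur+jpeg", "gblur+jp2k",  # example combinations (assumed)
]

def fuse_monocular_quality(q_wn, q_blur, q_coding, noise_sigma):
    """Fuse per-distortion quality estimates (each in [0, 1]) for one
    monocular view.  Strong noise partially masks blur and coding
    artifacts, so their penalties are softened as the noise level grows."""
    masking = 1.0 / (1.0 + noise_sigma / 10.0)  # in (0, 1]; smaller = more masking
    return q_wn * (q_blur ** masking) * (q_coding ** masking)

# Example: a noisy, blurred view -- the blur penalty is partially masked.
print(fuse_monocular_quality(q_wn=0.6, q_blur=0.7, q_coding=1.0, noise_sigma=15.0))
```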

In the second stage, the quality of the cyclopean view is estimated by modeling the most crucial properties of the HVS in 3D viewing. Specifically, intermediate maps corresponding to luminance and pixel-based contrast are generated by using an optical-flow algorithm to compute a disparity map, and a quality-compensated multipathway contrast gain-control model (QC-MCM) to model the binocular fusion and rivalry behaviors of the HVS when viewing symmetrically and asymmetrically distorted 3D images. The cyclopean view quality is then estimated by a cyclopean IQA framework, which contains classification and regression models trained on a large dataset of 3D images to predict the distortion parameter values of the two intermediate maps (cyclopean luminance and contrast images).
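
The gist of this construction can be sketched as follows, assuming a disparity map has already been obtained (e.g., from an optical-flow method); local variance stands in for the contrast-energy term, and the paper's QC-MCM is considerably more elaborate than this basic gain-control weighting.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_energy(img, size=9):
    """Local variance as a crude stand-in for contrast energy."""
    mean = uniform_filter(img, size)
    return uniform_filter(img * img, size) - mean * mean

def cyclopean_luminance(left, right, disp, eps=1e-6):
    """Gain-control-style cyclopean synthesis (illustrative, not QC-MCM).
    left, right: float grayscale images; disp: per-pixel horizontal disparity."""
    h, w = left.shape
    xs = np.clip(np.arange(w)[None, :] + np.rint(disp).astype(int), 0, w - 1)
    right_warped = right[np.arange(h)[:, None], xs]    # right view warped to the left view
    e_l, e_r = local_energy(left), local_energy(right_warped)
    w_l = (e_l + eps) / (e_l + e_r + 2 * eps)          # normalized gain-control weights
    return w_l * left + (1.0 - w_l) * right_warped
```

A cyclopean contrast map can be formed analogously by applying the same per-pixel weights to the two views' contrast maps.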

In the final stage, the two quality estimates obtained from the monocular views are combined and then fused with the cyclopean view quality obtained in the second stage to yield the overall quality estimate of the stereoscopic image. To this end, we propose a new combination strategy that adaptively merges the quality estimates of the left and right views based not only on the contrast of each view, but also on whether the lower-quality view can be compensated by the other when the two views share similar perceived contrast. As we will demonstrate, these stages together allow MUSIQUE-3D to achieve better or competitive quality-prediction performance compared with many other FR/NR IQA algorithms on various 3D image quality databases.
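
A minimal sketch of this pooling idea is given below; the contrast-based weights, the compensation rule, and the mixing weight alpha are illustrative assumptions rather than the paper's exact strategy.

```python
def combine_monocular(q_left, q_right, c_left, c_right, compensable=True):
    """Contrast-weighted combination of the two monocular quality scores.
    If the dominant distortion is information-additive (e.g., blockiness),
    the poorer view cannot be compensated, so the result is biased toward
    the lower of the two scores (illustrative rule only)."""
    w_l = c_left / (c_left + c_right)
    q = w_l * q_left + (1.0 - w_l) * q_right
    if not compensable:
        q = 0.5 * (q + min(q_left, q_right))
    return q

def overall_quality(q_monocular, q_cyclopean, alpha=0.5):
    """Fold the pooled monocular quality and the cyclopean-view quality
    into one scalar; equal weighting here is an assumption."""
    return alpha * q_monocular + (1.0 - alpha) * q_cyclopean
```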

The rest of the paper is organized as follows. Section 2 describes details of the proposed MUSIQUE-3D algorithm. Section 3 analyzes the performance of MUSIQUE-3D on various multiply and singly distorted stereoscopic image databases. General conclusions are presented in Section 4.

Algorithm

The proposed MUSIQUE-3D algorithm is based on the assumption that the overall perceptual quality of a 3D scene can be evaluated by combining the two monocular view qualities and the merged binocular view quality. Thus, MUSIQUE-3D operates via three main stages as mentioned in Section 1: (1) MUSIQUE-based QA of the two monocular views; (2) QC-MCM-based QA of the cyclopean view; and (3) combination of the monocular and cyclopean views’ qualities to yield the final quality score of the stereoscopic image.

Results

In this section, we analyze MUSIQUE-3D’s ability to predict image quality by using various multiply and singly distorted stereoscopic image quality databases. We also compare the performance of MUSIQUE-3D with other FR and NR SIQA algorithms.

Conclusion

This paper presented an opinion-unaware NR algorithm for quality assessment of multiply and singly distorted stereoscopic images via adaptive construction of cyclopean views. Our method, called MUSIQUE-3D, operates under the principle that the quality degradation of a 3D image can be represented by the distortion parameters of both the monocular and cyclopean views, and can thereby lead to an improved estimate of quality. Accordingly, MUSIQUE-3D contains three main stages: (1) MUSIQUE-based QA of the two monocular views; (2) QC-MCM-based QA of the cyclopean view; and (3) combination of the monocular and cyclopean view qualities into the final quality score.

CRediT authorship contribution statement

Yi Zhang: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Damon M. Chandler: Software, Data curation, Visualization, Formal analysis, Writing - review & editing. Xuanqin Mou: Validation, Resources, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant No. 61901355), the China Postdoctoral Science Foundation (Grant No. 2018M640991), and the National Key Research and Development Program of China (Grant No. 2016YFA0202003).

References (54)

  • Jiang, Q., et al., Unified no-reference quality assessment of singly and multiply distorted stereoscopic images, IEEE Trans. Image Process. (2019).

  • Shao, F., et al., Toward a blind deep quality evaluator for stereoscopic images based on monocular and binocular interactions, IEEE Trans. Image Process. (2016).

  • Seuntiens, P., et al., Perceived quality of compressed stereoscopic images: Effects of symmetric and asymmetric JPEG coding and camera separation, ACM Trans. Appl. Percept. (2006).

  • Chen, M.-J., et al., No-reference quality assessment of natural stereopairs, IEEE Trans. Image Process. (2013).

  • Su, C.-C., et al., Oriented correlation models of distorted natural images with application to natural stereopair quality evaluation, IEEE Trans. Image Process. (2015).

  • Xu, X., et al., No-reference stereoscopic image quality assessment based on saliency-guided binocular feature consolidation, Electron. Lett. (2017).

  • Oh, H., et al., Blind deep S3D image quality evaluation via local to global feature aggregation, IEEE Trans. Image Process. (2017).

  • Mittal, A., et al., No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. (2012).

  • Zhou, W.-J., et al., Utilizing binocular vision to facilitate completely blind 3D image quality measurement, Signal Process. (2016).

  • Shao, F., et al., Blind image quality assessment for stereoscopic images using binocular guided quality lookup and visual codebook, IEEE Trans. Image Process. (2015).

  • Shao, F., et al., Learning receptive fields and quality lookups for blind quality assessment of stereoscopic images, IEEE Trans. Cybern. (2016).

  • Shao, F., et al., Toward domain transfer for no-reference quality prediction of asymmetrically distorted stereoscopic images, IEEE Trans. Circuits Syst. Video Technol. (2018).

  • Shao, F., et al., Learning sparse representation for blind quality assessment of multiply distorted stereoscopic images, IEEE Trans. Multimed. (2017).

  • Shao, F., et al., Multistage pooling for blind quality prediction of asymmetric multiply-distorted stereoscopic images, IEEE Trans. Multimed. (2018).

  • Zhang, Y., et al., Opinion-unaware blind quality assessment of multiply and singly distorted images via distortion parameter estimation, IEEE Trans. Image Process. (2018).

  • Martin, D., et al., A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics.

  • Scharstein, D., et al., High-resolution stereo datasets with subpixel-accurate ground truth.