A new reduced reference metric for color plus depth 3D video

https://doi.org/10.1016/j.jvcir.2013.12.009

Highlights

  • We propose a reduced reference quality metric for 3D videos.

  • This metric employs structural and statistical measures of 3D videos.

  • This metric is broadly validated in the presence of compression and transmission artifacts.

  • This metric can be employed to estimate the QoE of compressed or transmitted 3D videos.

Abstract

A new reduced reference (RR) objective quality metric for 3D video is proposed that incorporates spatial neighboring information. Contrast measures from gray level co-occurrence matrices (GLCM) for both the color and depth sections form the main part of this spatial information. Side information is extracted from the edge properties of the reference 3D video and sent through an auxiliary channel. The other important factor in the proposed metric is the unequal weighting of the color and depth sections, which maximizes the metric's performance for specific weight values. The performance of the proposed metric is validated through a series of subjective tests, considering both compression and transmission artifacts. The average correlation between the proposed metric and subjective quality scores is 0.82 for compressed 3D videos when the color-to-depth importance ratio is near 0.8; for transmitted 3D videos, the correlation is 0.857 at the same ratio.

Introduction

Evaluating the quality of 3D videos can be performed subjectively or objectively. Subjective evaluation requires real human observers to grade every video they view. To avoid this tedious and expensive process, objective quality metrics are used to estimate human assessment of video quality. Objective quality metrics are more reliable if they correlate highly with human perception of video. As numerical models, objective quality metrics are not vulnerable to human factors such as fatigue.

Rendered 3D video can be represented with two 2D sections: color video and depth (z) video, or left and right views. Commonly, existing 2D quality metrics such as the Structural Similarity Index Metric (SSIM) [1], the ANSI video quality metric (VQM) [2], and the Moving Picture Quality Metric (MPQM) [3] are applied to each 2D section to evaluate 3D video quality. Using 2D objective metrics for quality assessment of 3D videos has drawbacks; for example, they do not account for the importance of depth perception. Furthermore, having reduced reference and no reference metrics designed specifically for 3D videos is vital for real-time applications. In reduced reference (RR) and no reference (NR) metrics, there is limited or no access to the reference video [4], [5]. In RR metrics, partial information about the reference signal is available through an auxiliary channel; in NR calculations, no side information is available [4], [6]. Due to unquantifiable properties of human quality perception, NR metrics amount to blind measurement of the distortion in the decoded video [4], [5].

Other categories of metrics include statistical, structural, and psychophysical metrics based on the human visual system (HVS). Statistical measures include mean squared error (MSE) and PSNR; previous work shows that such measures cannot model human visual perception [7]. Structural measures, such as blocking or blurring distortion measurements, can be used under an NR scheme. Psychophysical metrics must account for the structure of the HVS, which mainly comprises contrast sensitivity, masking, frequency selectivity, color perception, and pooling [8], [9], [10].
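As a concrete reference point for the statistical measures named above, MSE and PSNR can be computed as follows. This is a standard textbook sketch, not code from the paper:

```python
import numpy as np

def mse(ref, dist):
    """Mean squared error between two equally sized frames."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    return np.mean((ref - dist) ** 2)

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(ref, dist)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / err)
```

Both are pixel-wise fidelity measures; as the text notes, neither captures perceptual quality, which motivates the structural and psychophysical alternatives that follow.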

One of the major FR structural metrics is the structural similarity index measure (SSIM), which is based on the HVS's ability to extract structural information [1]. SSIM is calculated over windows of size N × N on the luminance component; it yields a value between −1 and 1, reaching 1 when the compared windows x and y are identical. One of the main FR psychophysical metrics is the video quality metric (VQM), developed by the Institute for Telecommunication Sciences (ITS) and the American National Standards Institute (ANSI) [2]. VQM closely estimates subjective quality ratings from real human viewers [11]. It measures the perceptual effects of video impairments such as noise, blocking, blurring, and unnatural motion, and combines them into a single score [2]. VQM calculations extract perception-based (HVS) features, including loss of spatial information (blur), shifts of edges from vertical or horizontal to diagonal orientations, shifts of edges from diagonal to horizontal or vertical (blocking), changes in the distribution of U–V samples, the amount of motion in the video, and local color impairments; these features are combined to form the VQM score [2].
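The per-window SSIM computation described above can be sketched as follows. This is a minimal single-window version following the standard SSIM definition [1]; the stabilizing constants use the commonly published defaults (0.01 and 0.03 of the dynamic range), which are an assumption here:

```python
import numpy as np

def ssim_window(x, y, peak=255.0):
    """SSIM index for one N x N luminance window, per Wang et al. [1].
    Returns a value in [-1, 1]; 1 when x and y are identical."""
    C1 = (0.01 * peak) ** 2   # stabilizer for the luminance term
    C2 = (0.03 * peak) ** 2   # stabilizer for the contrast/structure term
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

A full-frame SSIM score averages this index over sliding windows; only the single-window core is shown here.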

In RR and NR metrics, there is limited or no access to the reference video. For RR metrics, only partial information about the reference signal is available through an auxiliary channel; this side information is delivered to the RR quality calculator along with the decoded video. Side information is not available in NR calculations [6]. Different approaches exist for RR quality metrics. The transmitter can send temporal or spatial information about the reference video over an auxiliary channel. Another approach is to embed a hidden pattern of bits in the video before encoding; these bits should not degrade video quality. The RR data line carries these bits, and at the RR quality calculator the error between the original hidden bits and the decoded bits reveals a measure of quality.
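The hidden-bit approach above boils down to a bit-error-rate comparison at the RR quality calculator. The function below is an illustrative sketch of that comparison only, not the embedding scheme of any cited work:

```python
def bit_error_rate(sent_bits, received_bits):
    """Fraction of hidden marker bits corrupted in the decoded video.
    A higher BER indicates stronger degradation of the video that
    carried the bits; 0.0 means the pattern survived intact."""
    assert len(sent_bits) == len(received_bits)
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)
```

In practice, the original bit pattern arrives over the RR side channel and the decoded pattern is re-extracted from the received video before this comparison.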

A watermark can also be used to obtain an RR quality grade; in this case, degradation of watermark quality at the decoder implies a quality decrease in the video. Previous research on hidden information and watermarking can be found in [12], [13]. Measuring NR metrics is harder than RR, since no information about the reference signal and its attributes is available at the receiver side; consequently, more perception-based parameters are involved in estimating quality [14], [15], [16], [17]. Due to unquantifiable parameters in human perception of quality, NR metrics reduce to measuring distortion in the received signal. Distortions can occur in the capturing, coding, transmission, decoding, and presentation stages. It has been reported in [4] that NR metrics based on the HVS and scene statistics can outperform simpler measures such as blurring or blocking. The Video Quality Experts Group (VQEG) is considering the standardization of NR and RR metrics, mostly for block DCT-based video compression [4].

Among NR metrics, blocking and blurring measures for color images have attracted much research activity. Blocking occurs in block-based coding schemes and appears at block borders as discontinuities or shifted edges along block boundaries. There are different methods to measure this artifact [14], [15], [18]; the approach in [14] is to find discontinuities along vertical or horizontal lines (borders). Blurriness, or luminance bleeding, is another artifact caused during compression, due to larger values of the quantization parameter (QP).
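A simple way to realize the border-discontinuity idea is to compare luminance jumps at block boundaries against jumps elsewhere. The sketch below is illustrative only (vertical boundaries of 8 × 8 blocks, a plain ratio as the score); it is not the exact measure of [14]:

```python
import numpy as np

def blockiness(frame, block=8):
    """Ratio of the mean absolute luminance jump across vertical block
    boundaries to the mean jump at non-boundary columns. Values well
    above 1 suggest visible blocking; ~1 suggests none."""
    f = np.asarray(frame, dtype=np.float64)
    diffs = np.abs(np.diff(f, axis=1))        # horizontal neighbor differences
    cols = np.arange(diffs.shape[1])
    at_border = (cols + 1) % block == 0       # diffs that straddle a block edge
    border = diffs[:, at_border].mean()
    inside = diffs[:, ~at_border].mean()
    return border / (inside + 1e-12)          # epsilon guards flat regions
```

A full measure would also scan horizontal boundaries and pool over frames; the single-direction core above is enough to show the principle.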

For 3D videos, different objective quality metrics can be derived from the color images or the left and right views. Objective metrics applicable to depth maps are PSNR and SSIM; in other words, statistical or structural metrics apply to depth maps, while for color maps psychophysical metrics are also applicable. To assess the capabilities of objective metrics, they should be validated through subjective tests, where a better-fitting metric correlates more strongly with subjective scores. In [19], it was shown that, among FR metrics, VQM is the most capable of modeling the perceived quality of 3D videos. Even 2D objective metrics that fit the HVS's understanding of video quality cannot guarantee human observer satisfaction with 3D video; various unquantifiable parameters change the way humans judge 3D video quality. Parameters such as aesthetics, cognitive relevance, ambient light, and eye comfort are not included in any of these objective metrics. However, to best model QoE, these attributes should be measured and considered.

The authors in [20] suggested an FR metric for compressed videos that correlates with the HVS. In [21], a compound FR metric is introduced for encoded 3D video. For transmission and compression, a reduced reference metric is introduced in [22], where edge information from the color and depth videos is extracted and used to construct the metric. In [23], VQM [2] is extended to model 3D video quality under varying ambient illumination. None of the aforementioned metrics considers spatial neighboring information, which is highly vulnerable during transmission. Moreover, the importance ratio of depth in overall 3D perception needs proper investigation in order to construct a new metric.

In this paper, a new RR 3D video quality metric is proposed that correlates highly with the overall 3D quality perceived at the receiver side. Spatial neighboring information and edge information are extracted to formulate the proposed metric. Gray level co-occurrence matrices [24], [25] and their contrast features for the color and depth maps are fundamental parts of the metric, which is a linear combination of spatial information elements. Furthermore, edge information from the original color and depth views serves, as an intrinsic feature, in the linear coefficients of the metric. Another feature used in its computation is the unequal weighting of the color and depth videos. The proposed metric is verified extensively over the 3D video data sets presented in Section 2, where the subjective scores, gathered through the subjective assessment methodology for video quality (SAMVIQ [26]), are also presented.

Section snippets

Data set

In this section, two types of common artifacts are considered for 3D videos: (a) compression artifacts that occur during encoding and (b) transmission artifacts due to the stochastic properties of the transmission channel. In both cases, several 3D videos in color plus depth format are considered. In this paper, the 3D videos1 vary in type of motion, depth perception, number of objects, and color palette…

The proposed metric

In this section, the technique for calculating the proposed metric is explained. The proposed metric is based on two sets of features for each of the color and depth sections: (1) neighboring properties and (2) edge properties. Neighboring properties change during encoding or transmission and can serve as a primary measure of distortion. By extracting neighboring characteristics before and after encoding/transmission, a simple fidelity metric can be constructed. The other important aspect is the way of…
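The neighboring-property feature described above rests on the GLCM contrast measure. The sketch below shows a minimal GLCM contrast computation; the quantization to 8 gray levels and the single displacement (1, 0) are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized gray level co-occurrence matrix for one displacement.
    Counts how often gray level i occurs (dx, dy) away from level j."""
    q = (np.asarray(img, dtype=np.float64) / 256.0 * levels).astype(int)
    q = np.clip(q, 0, levels - 1)            # quantize to `levels` bins
    P = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            P[q[i, j], q[i + dy, j + dx]] += 1
    return P / P.sum()

def glcm_contrast(img, **kw):
    """GLCM contrast feature: sum of P(i, j) * (i - j)^2.
    Zero for flat regions; large when neighboring pixels differ sharply."""
    P = glcm(img, **kw)
    i, j = np.indices(P.shape)
    return np.sum(P * (i - j) ** 2)
```

Comparing this contrast feature before and after encoding/transmission, for both the color and depth frames, yields the kind of neighboring-information fidelity measure the section describes.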

Metric validation

In this section, the computed metric is validated on the two data sets. First, SVQM is validated over the compressed data set to see how well it correlates with subjective scores. In another experiment, SVQM is tested over the transmitted data set. As stated earlier, the color weight is larger than the depth weight. For this purpose, we ran our experiments with χ values between 0.6 and 0.95; however, only the interesting points with better performance are presented in the following subsections. Moreover, at the…
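The unequal color/depth weighting described above might be realized as a simple convex combination of per-section scores. The sketch below is an assumption for illustration: the function name `pool_3d_score` and the interpretation of χ as a convex weight are not taken from the paper's snippet:

```python
def pool_3d_score(color_score, depth_score, chi=0.8):
    """Weighted pooling of color and depth fidelity scores.
    chi is the color importance weight; the paper reports its best
    correlations around a color-to-depth importance ratio near 0.8."""
    assert 0.0 <= chi <= 1.0
    return chi * color_score + (1.0 - chi) * depth_score
```

Sweeping chi over the 0.6–0.95 range mentioned in the text, and recording the correlation of the pooled score with subjective scores at each value, corresponds to the experiment described in this section.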

Conclusions

In this paper, pixel-wise neighboring and edge information are used to construct the proposed RR 3D video quality metric. Neighboring pixel values change during encoding and can be used as a measure of distortion. To model the neighboring information, contrast features from the GLCM are computed for all frames of the original and compressed videos. The GLCM contrast measures and edge information of the original frames are transferred to the receiver side. The other component in the…

References (35)

  • Zhou Wang et al.

    Video quality assessment based on structural distortion measurement

    Signal Process. Image Commun.

    (2004)
  • Margaret H. Pinson et al.

    A new standardized method for objectively measuring video quality

    IEEE Trans. Broadcast.

    (2004)
  • Perceptual quality measure using a spatio-temporal model of the human visual...
  • Stefan Winkler

    Digital Video Quality: Vision Models and Metrics

    (2005)
  • Arthur A. Webster et al.

    An objective video quality assessment system based on human perception

    SPIE Human Vision Visual Process. Digital Disp. IV

    (1993)
  • B. Girod

    What's wrong with mean-squared error?

    (1993)
  • C.J. van den Branden Lambrecht, Perceptual models and architectures for video coding applications (Ph.D. thesis),...
  • C.J. van den Branden Lambrecht, A working spatio-temporal model of the human visual system for image restoration and...
  • J. Lubin

    The use of psychophysical data and models in the analysis of display system performance

  • S. Wenger

    H.264/AVC over IP

    IEEE Trans. Circuits Syst. Video Technol.

    (2003)
  • O. Sugimoto, R. Kawada, M. Wada, S. Matsumoto, Objective measurement scheme for perceived picture quality...
  • M.C.Q. Farias, S.K. Mitra, M. Carli, A. Neri, A comparison between an objective quality measure and the mean annoyance...
  • Zhou Wang, Alan C. Bovik, B.L. Evans, Blind measurement of blocking artifacts in images, in: Int. Conf. Image Process.,...
  • A. C. Bovik, L. Shizhong, DCT-domain blind measurement of blocking artifacts in DCT-coded images, in: IEEE Int. Conf....
  • P. Gastaldo, S. Rovetta, and R. Zunino, Objective assessment of MPEG-video quality: a neural-network approach, in: Int....
  • M. Knee, A robust, efficient and accurate single-ended picture quality measure for MPEG-2, VQEG,...