Journal of Visual Communication and Image Representation
A new reduced reference metric for color plus depth 3D video
Introduction
The quality of 3D videos can be evaluated subjectively or objectively. Subjective evaluation requires human observers to grade every video they view. To avoid this tedious and expensive process, objective quality metrics are used to estimate human assessment of video quality. An objective metric is reliable to the extent that it correlates with human perception of video quality. As numerical models, objective metrics are also not vulnerable to human factors such as fatigue.
A rendered 3D video can be represented by two 2D components: a color video plus a depth (z) video, or left and right views. Commonly, existing 2D quality metrics such as the Structural Similarity Index Metric (SSIM) [1], the ANSI video quality metric (VQM) [2] and the Moving Picture Quality Metric (MPQM) [3] are applied to each 2D component to evaluate 3D video quality. Using 2D objective metrics for 3D video quality assessment has drawbacks; in particular, such metrics do not account for the importance of depth perception. Furthermore, reduced reference and no reference metrics designed exclusively for 3D videos are vital for real-time applications. In reduced reference (RR) and no reference (NR) metrics, access to the reference video is limited or absent [4], [5]. In RR metrics, partial information about the reference signal is available through an auxiliary channel; in NR metrics, no side information is available [4], [6]. Because of unquantifiable properties in human perception of quality, NR metrics amount to blind measurement of the distortion in the decoded video [4], [5].
Other categories of metrics include statistical, structural, and psychophysical metrics based on the human visual system (HVS). Statistical measures include mean squared error (MSE) and PSNR. Previous work shows that measures such as PSNR and MSE do not model human visual perception well [7]. Structural measures, such as blocking or blurring distortion measurements, can be used in an NR scheme. Psychophysical metrics model the structure of the HVS, which mainly consists of the contrast sensitivity function, masking, frequency selection, color perception and pooling [8], [9], [10].
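As a concrete illustration of the statistical family, MSE and PSNR for an 8-bit frame can be computed as follows (a minimal sketch; the function names are ours, not from the paper):

```python
import numpy as np

def mse(ref, dist):
    """Mean squared error between two frames of equal shape."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    return float(np.mean((ref - dist) ** 2))

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical frames)."""
    err = mse(ref, dist)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / err)
```

Both quantities depend only on pixel-wise differences, which is precisely why they fail to capture structural or perceptual degradations.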
One of the major FR structural metrics is the structural similarity index measure (SSIM), which exploits the HVS capability of extracting structural information [1]. SSIM is calculated over a window of size N × N on the luminance component and yields a value between −1 and 1, reaching 1 when the two compared windows x and y are identical. One of the main FR psychophysical objective metrics is the video quality metric (VQM), developed by the Institute for Telecommunication Sciences (ITS) and the American National Standards Institute (ANSI) [2]. VQM closely estimates subjective quality ratings from real human viewers [11]. It measures the perceptual effects of video impairments such as noise, blocking, blurring, and unnatural motion, and combines them into a single score [2]. VQM calculations extract perception-based (HVS) features and combine them into the final metric. Important perception-based parameters include loss of spatial information (blur), shift of edges from vertical or horizontal to diagonal orientations, shift of edges from diagonal to horizontal or vertical (blocking), changes in the distribution of U-V samples, the amount of motion in the video, and local color impairments [2].
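A single-window SSIM computation, following the standard formulation in [1] with a uniform window and the usual default constants, can be sketched as below. This is an illustrative simplification, not the authors' implementation:

```python
import numpy as np

def ssim_window(x, y, peak=255.0, k1=0.01, k2=0.03):
    """SSIM for one N x N luminance window; equals 1.0 when x and y
    are identical, and lies in [-1, 1] in general."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * peak) ** 2          # stabilizes the luminance term
    c2 = (k2 * peak) ** 2          # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

The full-frame SSIM averages this value over sliding windows; the reference implementation additionally applies Gaussian weighting within each window.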
In RR and NR metrics, access to the reference video is limited or absent. For RR metrics, only partial information about the reference signal is available through an auxiliary channel: side information about the reference is sent to the RR quality calculator together with the decoded video. No side information is available in NR calculations [6]. Different approaches exist for RR quality metrics. The transmitter can send temporal or spatial information about the reference video over an auxiliary channel. Another approach is to embed a hidden bit pattern in the video before encoding; these bits should not degrade video quality. The RR data line carries these bits, and at the RR quality calculator the error between the original hidden bits and the decoded bits yields a measure of quality.
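The hidden-bit approach reduces to comparing the embedded pattern against what survives decoding; the surviving fraction serves as the RR quality indicator. A sketch (illustrative only; the embedding scheme itself is outside the scope of this snippet):

```python
def hidden_bit_quality(original_bits, decoded_bits):
    """RR quality indicator: fraction of hidden marker bits recovered
    intact. 1.0 means every embedded bit survived encoding/transmission;
    lower values indicate stronger distortion."""
    if len(original_bits) != len(decoded_bits):
        raise ValueError("bit sequences must have equal length")
    matches = sum(a == b for a, b in zip(original_bits, decoded_bits))
    return matches / len(original_bits)
```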
A watermark can also be used to obtain an RR quality grade: degradation of the watermark at the decoder implies a quality decrease in the video. Previous research on hidden information and watermarking can be found in [12], [13]. Measuring NR metrics is harder than RR, since no information about the reference signal or its attributes is available at the receiver side, and more perception-based parameters come into play in the understanding of quality [14], [15], [16], [17]. Because of unquantifiable parameters in human perception of quality, NR metrics reduce to measuring distortion in the received signal. Distortions can occur in the capturing, coding, transmission, decoding and presentation stages. It has been reported in [4] that distortion measures based on HVS properties and scene statistics can outperform simpler ones such as blurring or blocking measures. The Video Quality Experts Group (VQEG) is considering the standardization of NR and RR metrics, mostly for block DCT-based video compression [4].
Among NR metrics, blocking and blurring measures for color images have attracted much research. Blocking occurs in block-based coding schemes and appears at block borders as discontinuities or edge shifts along block boundaries. Different methods exist to measure this artifact [14], [15], [18]; the approach in [14] is to find discontinuities along vertical or horizontal lines (borders). Blurriness, or luminance bleeding, is another artifact introduced during compression, caused by larger values of the quantization parameter (QP).
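In the spirit of the boundary-discontinuity approach of [14], a simple blockiness estimate averages the luminance jumps across 8 × 8 block boundaries (a sketch under our own simplifying assumptions; the actual measure in [14] differs in detail):

```python
import numpy as np

def blockiness(frame, block=8):
    """Mean absolute luminance jump across vertical and horizontal
    block boundaries; larger values suggest stronger blocking."""
    f = frame.astype(np.float64)
    # jumps across vertical boundaries (between columns b-1 and b)
    v = [np.abs(f[:, b] - f[:, b - 1]).mean()
         for b in range(block, f.shape[1], block)]
    # jumps across horizontal boundaries (between rows b-1 and b)
    h = [np.abs(f[b, :] - f[b - 1, :]).mean()
         for b in range(block, f.shape[0], block)]
    return float(np.mean(v + h))
```

Because it needs no reference frame, such a measure fits the NR scheme directly.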
For 3D videos, different objective quality metrics can be derived for the color images or the left and right views. Objective metrics applicable to depth maps are PSNR and SSIM; in other words, depth maps admit statistical or structural metrics, while color maps also admit psychophysical metrics. To assess the capabilities of objective metrics, they should be validated through subjective tests, where a better-fitting metric correlates more strongly with subjective scores. In [19], it was shown that, among FR metrics, VQM is the most capable of modeling perception-based quality of 3D videos. Even 2D objective metrics that fit the HVS understanding of video quality cannot guarantee human observer satisfaction with 3D video: various unquantifiable parameters change the way humans judge 3D video quality. Parameters such as aesthetics, cognitive relevance, ambient light and eye comfort are not included in any of these objective metrics; however, to model QoE properly, these attributes should be measured and considered.
The authors in [20] suggested an FR metric for compressed videos that correlates with HVS perception. In [21], a compound FR metric is introduced for encoded 3D video. For transmission and compression, a reduced reference metric is introduced in [22], where edge information for the color and depth videos is extracted and used to construct the metric. In [23], VQM [2] is extended to model 3D video quality under ambient illumination. None of the aforementioned metrics considers spatial neighboring information, which is very vulnerable during transmission. Moreover, the ratio of depth importance in overall 3D perception needs proper investigation to construct a new metric.
In this paper, a new RR 3D video quality metric is proposed that correlates strongly with the overall 3D quality perceived at the receiver side. Spatial neighboring information and edge information are extracted to formulate the proposed metric. Gray level co-occurrence matrices (GLCMs) [24], [25] and their contrast features for the color and depth maps are the fundamental parts of the proposed metric, which is a linear combination of spatial information elements. Edge information for the original color and depth views serves as intrinsic features that provide the linear coefficients of the metric. Another feature used in its computation is the unequal weighting of the color and depth videos. The proposed metric is verified extensively over the 3D video data sets presented in Section 2; subjective scores are gathered through the subjective assessment methodology for video quality (SAMVIQ [26]) and are also presented in Section 2.
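The GLCM contrast feature at the core of the proposed metric can be sketched in pure NumPy as follows. The quantization level and the single horizontal neighbor offset below are our illustrative assumptions, not necessarily the settings used in the paper:

```python
import numpy as np

def glcm_contrast(img, levels=8, peak=255):
    """Contrast feature of the gray level co-occurrence matrix,
    for a horizontal neighbor offset (0 degrees, distance 1)."""
    # quantize intensities into `levels` gray bins
    q = (img.astype(np.float64) * levels / (peak + 1)).astype(int)
    q = np.clip(q, 0, levels - 1)
    # accumulate co-occurrence counts of horizontally adjacent pairs
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1
    p = glcm / glcm.sum()                  # joint probabilities
    rows, cols = np.indices(p.shape)
    return float(np.sum(p * (rows - cols) ** 2))   # sum_ij p(i,j)(i-j)^2
```

In an RR setting, such contrast values computed at the sender are transmitted as side information and compared with the receiver-side values; the proposed metric combines these comparisons with edge information, weighting color more heavily than depth.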
Data set
In this section, two types of common artifacts are considered for 3D videos: (a) compression artifacts that occur during encoding and (b) transmission artifacts that are due to the stochastic properties of the transmission channel. In both cases, several 3D videos in color plus depth format are considered. In this paper, the 3D videos differ in type of motion, depth perception, number of objects, color palette
The proposed metric
In this section, the technique for calculating the proposed metric is explained. The proposed metric is based on two sets of features for the color and depth sections: (1) neighboring properties and (2) edge properties. Neighboring properties change during encoding or transmission and can serve as a primary measure of distortion. By extracting neighboring characteristics before and after encoding/transmission, a simple fidelity metric can be constructed. The other important aspect is the way of
Metric validation
In this section, the computed metric is validated over two data sets. First, SVQM is validated over the compressed data set to see how well it correlates with subjective scores. In a second experiment, SVQM is tested over the transmitted data set. As stated earlier, the color weight is larger than the depth weight; our experiments covered weight values between 0.6 and 0.95, although only the interesting points with better performance are presented in the following subsections. Moreover, at the
Conclusions
In this paper, pixel-wise neighboring and edge information are used to construct the proposed RR 3D video quality metric. Neighboring pixel values change during encoding and can be used as a measure of distortion. To model neighboring information, contrast features from the GLCM are computed for all frames of the original and compressed video. The GLCM contrast measures and edge information of the original frames are transferred to the receiver side. The other component in the
References (35)
- et al., Video quality assessment based on structural distortion measurement, Signal Process.: Image Commun. (2004)
- et al., A new standardized method for objectively measuring video quality, IEEE Trans. Broadcast. (2004)
- Perceptual quality measure using a spatio-temporal model of the human visual...
- Digital Video Quality: Vision Models and Metrics (2005)
- et al., An objective video quality assessment system based on human perception, SPIE Human Vision, Visual Process., Digital Disp. IV (1993)
- What's wrong with mean-squared error? (1993)
- C.J. van den Branden Lambrecht, Perceptual models and architectures for video coding applications (Ph.D. thesis), ...
- C.J. van den Branden Lambrecht, A working spatio-temporal model of the human visual system for image restoration and...
- The use of psychophysical data and models in the analysis of display system performance
- H.264/AVC over IP, IEEE Trans. Circuits Syst. Video Technol.