Adaptive Block-size Transform based Just-Noticeable Difference model for images/videos
Introduction
In psychophysics, the Just-Noticeable Difference (JND), also known as the difference limen or differential threshold, is the smallest detectable difference between an initial and a second level of a particular sensory stimulus [1]. JND models offer a promising way to characterize the properties of the Human Visual System (HVS) accurately and efficiently, and have been applied in many image/video processing research fields, such as perceptual image/video compression [2], [3], [4], [11], [12], image/video perceptual quality evaluation [5], [6], [7], [18], and watermarking [8].
Generally, a JND model for images can be determined in the spatial domain, in a transform domain such as the Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT), or by a combination of the two schemes [17]. JND models built in the spatial domain [9], [10], known as pixel-based JND models, mainly account for background luminance adaptation and spatial contrast masking. Yang et al. [11], [12] model the overlapping effect of luminance adaptation and spatial contrast masking to refine the JND model of [9]. However, pixel-based JND models do not consider the different sensitivities of human vision to different frequency components, and therefore cannot describe the HVS properties accurately. JND models built in the transform domain, known as subband-based JND models, usually incorporate all the major contributing factors, such as the Contrast Sensitivity Function (CSF), luminance adaptation, and contrast masking. In [2], a JND model is developed from the spatial CSF; the DCTune JND model [3] extends it by considering contrast masking. Hontsch and Karam [4] modify the DCTune model by replacing a single pixel with a foveal region, and Zhang et al. [13] refine the JND model by reformulating the luminance adaptation adjustment and contrast masking. More recently, Wei and Ngan [15] incorporate new formulae for luminance adaptation, contrast masking, and Gamma correction to estimate the JND threshold in the DCT domain, and Zhang et al. [17] estimate the JND profile by summing the effects in the DCT and spatial domains.
To extend the JND profile from the spatial to the temporal dimension, the temporal characteristics of the HVS must be considered. Previous works mostly focus on the perceptual differences between an original video sequence and its processed version [7], [18]. In fact, the temporal HVS properties are highly correlated with the video signal and can be approximated by a computational model. In [9], [11], [12], an empirical function of the luminance difference between adjacent frames is proposed to model the temporal masking property. Kelly [22] measures the spatio-temporal CSF at a constant retinal velocity, tuned to a particular spatial frequency. Daly [26] refines this model by taking retinal movement compensation into account. Jia et al. [23] estimate the JND for video sequences by considering both the spatio-temporal CSF and eye movements. Wei and Ngan [14], [15] take the directionality of motion into consideration to generate a temporal modulation factor.
However, all the existing DCT-based JND models are computed on the 8×8 DCT and do not consider the perceptual properties of the HVS over transforms of different block sizes. Recently, the Adaptive Block-size Transform (ABT) has attracted researchers' attention for its coding efficiency in image and video compression [19], [20], [27]. It not only improves coding efficiency but also provides subjective benefits, especially for High Definition (HD) movie sequences, from the viewpoint of subtle texture preservation [34], [35]. Specifically, transforms over larger blocks can better exploit the correlation within the block, while smaller block sizes adapt better to the local structures of the image [16]. Therefore, by incorporating ABT into the JND, an adaptive JND model is obtained that models the spatio-temporal HVS properties more precisely. Furthermore, since ABT has been adopted in current video coding standards, an ABT-based JND model for images/videos should be considered for applications such as video compression, image/video quality assessment, and watermarking.
In this paper, the 8×8 DCT-based JND is extended to a 16×16 DCT-based JND by conducting a psychophysical experiment to parameterize the CSF for the 16×16 DCT. For still images or INTRA video frames, a new spatial selection strategy based on Spatial Content Similarity (SCS) is utilized to yield the JND map. For INTER video frames, a temporal selection strategy based on Motion Characteristic Similarity (MCS) determines the transform size used to generate the JND map. Furthermore, applications to image/video quality assessment and perceptual video coding are demonstrated to evaluate the model's efficiency in capturing the HVS properties.
The rest of the paper is organized as follows. Section 2 briefly introduces the extension from the 8×8 JND to the 16×16 JND. The proposed spatial and temporal selection strategies are presented in Section 3. The experimental performance is demonstrated and compared with existing relevant models in Section 4. Finally, Section 5 concludes the paper.
JND model based on transforms of different block sizes
The JND model in the DCT domain is determined by a basic visibility threshold Tbasic together with spatial and temporal modulation factors. It can be expressed as

TJND(k, m, n, i, j) = Tbasic(i, j) × αlum(k, m, n) × αcm(k, m, n, i, j) × αt(k, m, n, i, j),

where k denotes the frame index of the video sequence, (m, n) is the position of the DCT block in the current frame, (i, j) indicates the DCT coefficient position, αlum and αcm denote the luminance adaptation and contrast masking factors, respectively, and αt is the temporal modulation factor (equal to 1 for still images and INTRA frames).
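The multiplicative composition Tbasic × αlum × αcm can be sketched as follows. This is an illustrative stand-in, not the paper's parameterization: the function name `jnd_threshold` and the piecewise luminance curve and masking exponent below are assumptions chosen only to show the structure of the model.

```python
def jnd_threshold(t_basic, avg_luminance, block_energy):
    """Illustrative composition of a DCT-domain JND threshold.

    t_basic       -- base visibility threshold of one DCT subband (from the CSF)
    avg_luminance -- mean luminance of the block, normalized to [0, 1]
    block_energy  -- rough texture-energy measure driving contrast masking

    The multiplicative structure T_basic * a_lum * a_cm follows the model in
    the text; the specific functional forms are placeholders, not the paper's.
    """
    # Luminance adaptation: thresholds rise in very dark and very bright blocks
    if avg_luminance < 0.2:
        a_lum = 1.0 + (0.2 - avg_luminance) * 2.0
    elif avg_luminance > 0.6:
        a_lum = 1.0 + (avg_luminance - 0.6) * 1.5
    else:
        a_lum = 1.0
    # Contrast masking: textured blocks tolerate more distortion (never < 1)
    a_cm = max(1.0, block_energy ** 0.36)
    return t_basic * a_lum * a_cm
```

For a mid-grey, smooth block the modulation factors reduce to 1 and the threshold equals the CSF base threshold; darker, brighter, or more textured blocks raise it.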
Selection strategy between transforms of different block sizes
The previous section described the formulations of the JND models for the 8×8 and 16×16 DCT transforms. This section discusses how to decide the proper transform size, i.e., 8×8 or 16×16, for generating the JND map.
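A minimal sketch of such a selection rule is given below. The paper's SCS measure is not reproduced in this excerpt, so the variance-spread criterion and the function name `choose_transform_size` are assumptions; the sketch only illustrates the principle from the introduction that homogeneous content favors the larger transform while structured content favors 8×8.

```python
import numpy as np

def choose_transform_size(block16, similarity_threshold=0.5):
    """Pick one 16x16 DCT or four 8x8 DCTs for a 16x16 pixel block.

    Stand-in for the paper's Spatial Content Similarity (SCS) criterion:
    if the four 8x8 sub-blocks have similar variance (homogeneous content),
    prefer the larger transform; otherwise fall back to 8x8.
    """
    subs = [block16[i:i + 8, j:j + 8] for i in (0, 8) for j in (0, 8)]
    variances = np.array([s.var() for s in subs])
    # Relative spread of sub-block variances as a crude similarity measure
    spread = variances.max() - variances.min()
    mean_var = variances.mean() + 1e-9  # avoid division by zero on flat blocks
    return 16 if spread / mean_var < similarity_threshold else 8
```

A flat block yields 16, while a block whose quadrants differ sharply in activity yields 8, mirroring the intended adaptivity.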
JND model performance evaluation
In order to demonstrate the efficiency of the proposed ABT-based JND model, noise is injected into each DCT coefficient of each image or video frame to evaluate the error tolerance ability of the HVS:

Ĩ(k, m, n, i, j) = I(k, m, n, i, j) + R × TJND(k, m, n, i, j),

where Ĩ(k, m, n, i, j) is the noise-contaminated DCT coefficient located at the (i, j)th position of the (m, n)th block in the kth frame, and I(k, m, n, i, j) is the original coefficient. For still images, k is set to 0. R randomly takes the value +1 or −1 to avoid introducing a fixed pattern of distortion.
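This noise-injection test can be sketched for a single 8×8 block as follows, assuming an orthonormal DCT-II. The function names and the use of a single scalar threshold are illustrative simplifications; in the model the threshold varies per coefficient.

```python
import numpy as np

def _dct_matrix(n):
    """Orthonormal DCT-II basis matrix: C[f, i] = c(f) cos(pi (2i+1) f / 2n)."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def dct2(block):
    C = _dct_matrix(block.shape[0])
    return C @ block @ C.T

def idct2(coeffs):
    C = _dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C

def inject_jnd_noise(block, jnd, rng):
    """Add +/-JND noise to every DCT coefficient, return the pixel-domain block."""
    coeffs = dct2(block.astype(float))
    # Random sign per coefficient, so no fixed distortion pattern is introduced
    R = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return idct2(coeffs + R * jnd)
```

Because the transform is orthonormal, the injected pixel-domain error energy equals the sum of the squared JND perturbations (Parseval), so larger JND thresholds translate directly into more tolerated distortion.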
Conclusions
In this paper, a novel ABT-based JND profile for images/videos is proposed by exploiting the HVS properties over transforms of different block sizes. Novel spatial and temporal selection strategies are designed to determine which block-size transform is employed, for still images/INTRA video frames and for INTER video frames, respectively. The experimental results have demonstrated that the ABT-based JND profile can effectively model the HVS properties. Based on the proposed JND model, a simple scheme is developed to demonstrate its applications in image/video quality assessment and perceptual video coding.
Acknowledgment
This work was partially supported by a grant from the Chinese University of Hong Kong under the Focused Investment Scheme (Project 1903003). Thanks to the area editor and all the anonymous reviewers for their constructive comments and useful suggestions, which led to improvements in the quality, presentation, and organization of the paper. The authors are grateful to Dr. Zhenyu Wei and Prof. Weisi Lin for providing their JND codes for comparison, and thank Dr. Jie Dong for her valuable suggestions.
References
- et al., Just noticeable distortion model and its applications in video coding, Signal Processing: Image Communication (2005)
- et al., Improved estimation for just-noticeable visual distortion, Signal Processing (2005)
- et al., Just-noticeable difference estimation with pixels in images, Journal of Visual Communication and Image Representation (2008)
- Weber's Law of Just Noticeable Differences, ...
- et al., Luminance-model-based DCT quantization for color image compression, Proceedings of the SPIE, Human Vision, Visual Processing, and Digital Display III (1992)
- DCTune: a technique for visual optimization of DCT quantization matrices for individual images, Society for Information Display (SID) Digest (1993)
- et al., Adaptive image coding with perceptual distortion control, IEEE Transactions on Image Processing (2002)
- et al., Visual distortion gauge based on discrimination of noticeable contrast changes, IEEE Transactions on Circuits and Systems for Video Technology (2005)
- et al., Modeling visual attention's modulatory aftereffects on visual sensitivity and quality evaluation, IEEE Transactions on Image Processing (2005)
- et al., Digital video quality metric based on human vision, Journal of Electronic Imaging (2001)
- Perceptual watermarks for digital images and video, Proceedings of the IEEE
- A perceptually optimized 3-D subband codec for video communication over wireless channels, IEEE Transactions on Circuits and Systems for Video Technology
- A software-only video codec using pixelwise conditional differential replenishment and perceptual enhancements, IEEE Transactions on Circuits and Systems for Video Technology
- Motion-compensated residue pre-processing in video coding based on just-noticeable-distortion profile, IEEE Transactions on Circuits and Systems for Video Technology
- Spatial-temporal just noticeable distortion profile for grey scale image/video in DCT domain, IEEE Transactions on Circuits and Systems for Video Technology
- Variable block size coding of images with hybrid quantization, IEEE Transactions on Circuits and Systems for Video Technology
- Approach to compatible adaptive block-size transforms, Proceedings of VCIP
- Adaptive cosine transform coding of images in perceptual domain, IEEE Transactions on Acoustics, Speech, and Signal Processing
- Motion and vision II. Stabilized spatio-temporal threshold surface, Journal of the Optical Society of America
- Estimating just-noticeable distortion for video, IEEE Transactions on Circuits and Systems for Video Technology
- Spatial and temporal contrast sensitivity functions of the visual system, Journal of the Optical Society of America
- Video Processing and Communications
- Engineering observations from spatiovelocity and spatiotemporal visual models, Proceedings of the SPIE
- 2D order-16 integer transforms for HD video coding, IEEE Transactions on Circuits and Systems for Video Technology