Adaptive Block-size Transform based Just-Noticeable Difference model for images/videos

https://doi.org/10.1016/j.image.2011.02.002

Abstract

In this paper, we propose a novel Adaptive Block-size Transform (ABT) based Just-Noticeable Difference (JND) model for images/videos. Extension from the 8×8 Discrete Cosine Transform (DCT) based JND model to a 16×16 DCT based JND model is first performed by considering both the spatial and temporal Human Visual System (HVS) properties. For still images or INTRA video frames, a new spatial selection strategy based on the Spatial Content Similarity (SCS) between a macroblock and its sub-blocks is proposed to determine the transform size employed to generate the JND map. For INTER video frames, a temporal selection strategy based on the Motion Characteristic Similarity (MCS) between a macroblock and its sub-blocks is presented to decide the transform size for the JND. Compared with other JND models, the proposed scheme can tolerate more distortion while preserving better perceptual quality. In order to demonstrate the efficiency of the ABT-based JND in modeling the HVS properties, a simple visual quality metric is designed by considering the ABT-based JND masking properties. Evaluated on subjective image and video databases, the proposed metric delivers a performance comparable to the state-of-the-art metrics, confirming that the ABT-based JND model is highly consistent with the HVS. The proposed quality metric is also applied to ABT-based H.264/Advanced Video Coding (AVC) for perceptual video coding. The experimental results demonstrate that the proposed method can deliver video sequences with higher visual quality at the same bit-rates.

Introduction

In psychophysics, the Just-Noticeable Difference (JND) is the smallest detectable difference between a starting and a secondary level of a particular sensory stimulus [1]; it is also known as the difference limen or differential threshold. The JND model provides a promising way to characterize the properties of the Human Visual System (HVS) accurately and efficiently, and has been applied in many image/video processing research fields, such as perceptual image/video compression [2], [3], [4], [11], [12], image/video perceptual quality evaluation [5], [6], [7], [18], watermarking [8], and so on.

Generally, a JND model for images can be determined in the spatial domain, in a transform domain such as the Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT) domain, or by a combination of the two schemes [17]. JND models generated in the spatial domain [9], [10], known as pixel-based JND models, mainly focus on background luminance adaptation and spatial contrast masking. Yang et al. [11], [12] model the overlapping effect of luminance adaptation and spatial contrast masking to refine the JND model in [9]. However, pixel-based JND models do not consider the sensitivity of human vision to different frequency components and therefore cannot describe the HVS properties accurately. JND models generated in the transform domain, known as subband-based JND models, usually incorporate all the major influencing factors, such as the Contrast Sensitivity Function (CSF), luminance adaptation, and contrast masking. In [2], the JND model is developed from the spatial CSF. The DCTune JND model [3] then extends it by considering contrast masking. Hontsch and Karam [4] modify the DCTune model by replacing a single pixel with a foveal region, and Zhang et al. [13] refine the JND model by formulating the luminance adaptation adjustment and contrast masking. More recently, Wei and Ngan [15] incorporate new formulae of luminance adaptation, contrast masking, and Gamma correction to estimate the JND threshold in the DCT domain. Zhang et al. [17] propose to estimate the JND profile by summing the effects in the DCT and spatial domains.

To extend the JND profile from the spatial to the temporal dimension, the temporal characteristics of the HVS must be considered. Previous works mostly focus on the perceptual differences between an original video sequence and its processed version [7], [18]. In fact, the temporal HVS properties are highly correlated with the video signal and can be approximated by a computational model. In [9], [11], [12], an empirical function based on the luminance difference between adjacent frames is proposed to model the temporal masking property. Kelly [22] measures the spatio-temporal CSF at a constant retinal velocity, tuned to a particular spatial frequency. Daly [26] refines the model by taking retinal movement compensation into consideration. Jia et al. [23] estimate the JND for video sequences by considering both the spatio-temporal CSF and eye movements. Wei and Ngan [14], [15] take the directionality of motion into consideration to generate the temporal modulation factor.

However, all existing DCT-based JND models are calculated based on the 8×8 DCT and do not consider the perceptual properties of the HVS over transforms of different block sizes. Recently, the Adaptive Block-size Transform (ABT) has attracted researchers' attention for its coding efficiency in image and video compression [19], [20], [27]. It not only improves coding efficiency but also provides subjective benefits, especially for High Definition (HD) movie sequences, from the viewpoint of subtle texture preservation [34], [35]. Specifically, transforms over larger blocks can better exploit the correlation within the block, while smaller block sizes are more suitable for adapting to the local structures of the image [16]. Therefore, by incorporating the ABT into the JND, an adaptive JND model is obtained that can model the spatio-temporal HVS properties more precisely. Furthermore, since the ABT has been adopted in current video coding standards, the ABT-based JND model for images/videos should be considered for applications such as video compression, image/video quality assessment, watermarking, and so on.

In this paper, the extension from the 8×8 DCT-based JND to a 16×16 DCT-based JND is performed by conducting a psychophysical experiment to parameterize the CSF for the 16×16 DCT. For still images or INTRA video frames, a new spatial selection strategy based on the Spatial Content Similarity (SCS) is utilized to yield the JND map. For INTER video frames, a temporal selection strategy based on the Motion Characteristic Similarity (MCS) is employed to determine the transform size for generating the JND map. Furthermore, applications to image/video quality assessment and perceptual video coding are demonstrated to evaluate the model's efficiency in capturing the HVS properties.

The rest of the paper is organized as follows. Section 2 briefly introduces the extension from the 8×8 JND to the 16×16 JND. The proposed spatial and temporal selection strategies are presented in Section 3. Experimental results are demonstrated and compared with existing relevant models in Section 4. Finally, Section 5 concludes the paper.

Section snippets

JND model based on transforms of different block sizes

The JND model in the DCT domain is determined by a basic visibility threshold $T_{\mathrm{basic}}$ together with spatial and temporal modulation factors. It can be expressed as

$$T(k,m,n,i,j) = T_{\mathrm{spatio}}(m,n,i,j) \times \alpha_{\mathrm{tempo}}(k,m,n,i,j),$$
$$T_{\mathrm{spatio}}(m,n,i,j) = T_{\mathrm{basic}}(i,j) \times \alpha_{\mathrm{lum}}(m,n) \times \alpha_{\mathrm{cm}}(m,n,i,j),$$

where $k$ denotes the frame index of the video sequence, $(m,n)$ is the position of the DCT block in the current frame, $(i,j)$ indicates the DCT coefficient position, and $\alpha_{\mathrm{lum}}$ and $\alpha_{\mathrm{cm}}$, denoting the luminance adaptation and contrast masking, respectively,
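Once the individual factors are available, combining them per the two equations above is a simple element-wise product. Below is a minimal sketch in Python of how the spatio-temporal JND threshold of one DCT block could be assembled; the function and variable names are illustrative, and the per-block factors are assumed to have been computed beforehand by their respective models.

```python
import numpy as np

def jnd_threshold(t_basic, alpha_lum, alpha_cm, alpha_tempo=None):
    """Combine the JND factors for one DCT block.

    t_basic     : (N, N) base visibility thresholds T_basic(i, j)
    alpha_lum   : scalar luminance-adaptation factor for block (m, n)
    alpha_cm    : (N, N) contrast-masking factors for block (m, n)
    alpha_tempo : optional temporal modulation factors for frame k
    """
    t_spatio = t_basic * alpha_lum * alpha_cm   # T_spatio = T_basic * a_lum * a_cm
    if alpha_tempo is None:                     # still image or INTRA frame
        return t_spatio
    return t_spatio * alpha_tempo               # T = T_spatio * a_tempo

# Example: an 8x8 block with a flat base threshold and no contrast masking.
T_basic = np.full((8, 8), 10.0)
T = jnd_threshold(T_basic, alpha_lum=1.2, alpha_cm=np.ones((8, 8)))
```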

Selection strategy between transforms of different block sizes

In the previous section, the formulations of the JND models for the 8×8 and 16×16 DCT were described. The method for deciding the proper transform size, i.e., 8×8 or 16×16, is discussed in this section; an illustrative sketch of such a block-size decision is given below.
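The paper's exact SCS and MCS measures are defined in the body of this section; as a hedged illustration only, the sketch below uses block variance as a hypothetical stand-in content descriptor, along with a hypothetical similarity threshold. If all four 8×8 sub-blocks are sufficiently similar to their 16×16 macroblock, the larger transform is selected; otherwise the smaller one.

```python
import numpy as np

def choose_transform_size(mb, similarity_thresh=0.8):
    """Illustrative block-size decision for a 16x16 macroblock.

    Block variance stands in for the paper's similarity measure; the
    threshold value is likewise hypothetical.
    """
    mb = np.asarray(mb, dtype=np.float64)        # 16x16 luma macroblock
    mb_var = np.var(mb)
    sub_vars = [np.var(mb[r:r + 8, c:c + 8])     # four 8x8 sub-blocks
                for r in (0, 8) for c in (0, 8)]
    # Ratio of each sub-block variance to the macroblock variance,
    # folded into [0, 1] so that 1 means "identical spread".
    ratios = [min(v, mb_var) / max(v, mb_var) if max(v, mb_var) > 0 else 1.0
              for v in sub_vars]
    return 16 if min(ratios) >= similarity_thresh else 8
```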

JND model performance evaluation

In order to demonstrate the efficiency of the proposed ABT-based JND model, noise is injected into each DCT coefficient of each image or video frame to evaluate the error tolerance ability of the HVS:

$$\tilde{I}_{\mathrm{typ}}(k,m,n,i,j) = I_{\mathrm{typ}}(k,m,n,i,j) + R \cdot T_{\mathrm{typ}}(k,m,n,i,j),$$

where $\tilde{I}_{\mathrm{typ}}$ is the noise-contaminated DCT coefficient located at the $(i,j)$th position of the $(m,n)$th block in the $k$th frame. For still images, $k$ is set to 0. $R$ randomly takes the value $+1$ or $-1$ to avoid introducing a fixed pattern of
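The injection step amounts to adding a JND-scaled, random-sign perturbation to every coefficient. A minimal sketch, assuming the DCT coefficients and the matching JND map are already available as same-shaped arrays:

```python
import numpy as np

def inject_jnd_noise(dct_coeffs, jnd_map, rng=None):
    """Perturb each DCT coefficient by its JND threshold:
    I~ = I + R * T, with the sign R drawn uniformly from {+1, -1}
    so that no fixed noise pattern is introduced.
    """
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice([-1.0, 1.0], size=np.shape(dct_coeffs))
    return dct_coeffs + signs * jnd_map
```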

Conclusions

In this paper, a novel ABT-based JND profile for images/videos is proposed by exploiting the HVS properties over transforms of different block sizes. Novel spatial and temporal selection strategies are designed to determine which block-size transform is employed for still images/INTRA video frames and INTER video frames, respectively. The experimental results have demonstrated that the ABT-based JND profile can effectively model the HVS properties. Based on the proposed JND model, a simple

Acknowledgment

This work was partially supported by a grant from the Chinese University of Hong Kong under the Focused Investment Scheme (Project 1903003). Thanks to the area editor and all the anonymous reviewers for their constructive comments and useful suggestions that led to the improvements in the quality, presentation and organization of the paper. The authors are grateful to Dr. Zhenyu Wei and Prof. Weisi Lin for providing their JND codes for comparisons, and thank Dr. Jie Dong for her valuable

References (54)

  • X. Yang et al., Just noticeable distortion model and its applications in video coding, Signal Processing: Image Communication (2005)
  • X. Zhang et al., Improved estimation for just-noticeable visual distortion, Signal Processing (2005)
  • X. Zhang et al., Just-noticeable difference estimation with pixels in images, Journal of Visual Communication and Image Representation (2008)
  • Weber's Law of Just Noticeable Differences,...
  • A.J. Ahumada et al., Luminance-model-based DCT quantization for color image compression, Proceedings of the SPIE, Human Vision, Visual Processing, and Digital Display III (1992)
  • A.B. Watson, DCTune: a technique for visual optimization of DCT quantization matrices for individual images, Society for Information Display (SID) Digest (1993)
  • I. Hontsch et al., Adaptive image coding with perceptual distortion control, IEEE Transactions on Image Processing (2002)
  • W. Lin et al., Visual distortion gauge based on discrimination of noticeable contrast changes, IEEE Transactions on Circuits and Systems for Video Technology (2005)
  • Z. Lu et al., Modeling visual attention's modulatory aftereffects on visual sensitivity and quality evaluation, IEEE Transactions on Image Processing (2005)
  • A.B. Watson et al., Digital video quality metric based on human vision, Journal of Electronic Imaging (2001)
  • R.B. Wolfgang et al., Perceptual watermarks for digital images and video, Proceedings of the IEEE (1999)
  • C. Chou et al., A perceptually optimized 3-D subband codec for video communication over wireless channels, IEEE Transactions on Circuits and Systems for Video Technology (1996)
  • Y. Chin et al., A software-only videocodec using pixelwise conditional differential replenishment and perceptual enhancements, IEEE Transactions on Circuits and Systems for Video Technology (1999)
  • X. Yang et al., Motion-compensated residue pre-processing in video coding based on just-noticeable-distortion profile, IEEE Transactions on Circuits and Systems for Video Technology (2005)
  • Z. Wei, K.N. Ngan, A temporal just-noticeable distortion profile for video in DCT domain, in: Proceedings of the...
  • Z. Wei et al., Spatial–temporal just noticeable distortion profile for grey scale image/video in DCT domain, IEEE Transactions on Circuits and Systems for Video Technology (2009)
  • Y. Huh et al., Variable block size coding of images with hybrid quantization, IEEE Transactions on Circuits and Systems for Video Technology (1996)
  • S.J.P. Westen, R.L. Lagendijk, J. Biemond, A quality measure for compressed image sequences based on an eye movement...
  • J. Dong et al., Approach to compatible adaptive block-size transforms, Proceedings of VCIP (2005)
  • H. Qi, W. Gao, S. Ma, D. Zhao, Adaptive block-size transform based on extended integer 8×8/4×4 transforms for...
  • K.N. Ngan et al., Adaptive cosine transform coding of images in perceptual domain, IEEE Transactions on Acoustics, Speech, and Signal Processing (1989)
  • D.H. Kelly, Motion and vision II. Stabilized spatio-temporal threshold surface, Journal of the Optical Society of America (1979)
  • Y. Jia et al., Estimating just-noticeable distortion for video, IEEE Transactions on Circuits and Systems for Video Technology (2006)
  • J.G. Robson, Spatial and temporal contrast-sensitivity functions of the visual system, Journal of the Optical Society of America (1966)
  • Y. Wang et al., Video Processing and Communications (2002)
  • S. Daly, Engineering observations from spatiovelocity and spatiotemporal visual models, Proceedings of the SPIE (1998)
  • J. Dong et al., 2D order-16 integer transforms for HD video coding, IEEE Transactions on Circuits and Systems for Video Technology (2009)