
Counteracting geometrical attacks on robust image watermarking by constructing a deformable pyramid transform

Abstract

Counteracting geometrical attacks remains one of the most challenging problems in robust watermarking. In this paper, we resist rotation, scaling, and translation (RST) by constructing a kind of deformable pyramid transform (DPT) that is shift-invariant, steerable, and scalable. The DPT is extended from a closed-form polar-separable steerable pyramid transform (SPT). The radial component of the SPT's basis filters is taken as the kernel of the scalable basis filters, and the angular component is used for the steerable basis filters. The shift-invariance is inherited from the SPT by retaining undecimated high-pass and band-pass subbands. Based on the designed DPT, we theoretically derive interpolation functions for steerability and scalability and synchronization mechanisms for translation, rotation, and scaling. By exploiting the preferable characteristics of DPT, we develop a new template-based robust image watermarking scheme that is resilient to RST. Translation invariance is achieved by taking the Fourier magnitude of the cover image as the DPT's input. The resilience to rotation and scaling is obtained using the synchronization mechanisms for rotation and scaling, for which an efficient template-matching algorithm has been devised. Extensive simulations show that the proposed scheme is highly robust to geometrical attacks, such as RST, cropping, and row/column line removal, as well as common signal processing attacks such as JPEG compression, additive white Gaussian noise, and median filtering.

1. Introduction

Counteracting geometrical attacks such as rotation, scaling, and translation (RST) remains one of the most challenging problems for robust watermarking. This is because geometrical attacks easily desynchronize the watermark, degrading its robustness dramatically. To address such problems, a number of RST-invariant blind watermarking schemes have been developed over the past two decades. These schemes can be roughly categorized into five paradigms, namely exhaustive search, invariant domain, auto-correlation, feature-based implicit synchronization, and geometrical correction [14]. These are briefly described below.

The exhaustive search method [5–7] iteratively corrects each geometrical distortion in the search space and then evaluates the watermark extracted from the geometrically corrected carrier accordingly. This method generally leads to high computational complexity and a large probability of false positives. The invariant domain approach [8–14] eliminates the need to identify geometrical distortions by embedding the watermark in a domain that is invariant to such distortions. However, this method may encounter the issue of interpolation approximation during the geometrically invariant transform. Auto-correlation-based techniques periodically insert the watermark in the cover and use a cross-correlation function to locate the periodic autocorrelation peaks, which indicate the geometrical transform that has been performed. The schemes discussed in [15–17] are typical examples of this category. The fourth category exploits salient features to achieve geometrical synchronization, as presented in the schemes of [3, 10], and [18–20]. Under this approach, the embedder binds the watermark with geometrically invariant salient features. The watermark is recovered inversely by the receiver, who seeks the salient features that still exist, even after severe geometrical distortion. In general, this category is somewhat robust against geometrical distortions, but it may degrade greatly if the salient feature detection fails. The final category estimates the geometrical distortion parameters, thus permitting geometrical correction and watermark extraction. A template is generally constructed and embedded in the cover, and the geometrical parameters are sought via a particular technique. Several examples are given by the schemes described in [21–24]. In addition to the template, support vector machines (SVMs) have also been incorporated to obtain the geometrical parameters. For example, a number of recently developed schemes [25–28] generate patterns, such as the inserted template and Zernike moments, from geometrically attacked watermarked images. These patterns are then input to the SVM to train a classification model, and finally, the trained model is used to predict the geometrical parameters of the to-be-checked image.

In this paper, we develop a new geometrical correction-based robust image watermarking scheme by constructing a deformable pyramid transform (DPT) that is shift-invariant, steerable, and scalable. This is motivated by the scheme of [24], in which a steerable pyramid transform (SPT) with shift-invariance and steerability [29] is exploited to estimate, with an auxiliary inserted template, the rotation angle. This allows for rotation correction and watermark extraction. Although the scheme in [24] is highly robust against rotation, it cannot resist scaling attacks because of the lack of scalability in SPT. To counteract both the rotation and scaling, a kind of pyramid transform (PT) with shift-invariance, steerability, and scalability is needed. However, such a PT has not, to the best of our knowledge, been reported in the literature. Inspired by this situation, we design a shift-invariant, steerable, and scalable DPT. This is extended from SPT as follows.

We start by introducing the SPT. In essence, SPT is a variant of the wavelet transform (WT). As illustrated in [29], the conventional orthogonal or bi-orthogonal WT is sensitive to translation because of its critical sampling. That is, once the input signal has been translated slightly, its wavelet coefficients are not the translated versions of the original wavelet coefficients, and the information represented within a wavelet subband of the translated signal is not the same as that in the original wavelet subband. To address this issue, Freeman and Adelson [30] proposed a kind of steerable filter that can be used to synthesize any filter at an arbitrary orientation via a linear interpolation. This is termed steerability. Furthermore, Perona [31, 32] developed scalable filters that can be used to interpolate any filter on a scale within a certain range, which is called scalability. These steerable and scalable filters were further integrated to give deformable filters.

In [29], Simoncelli et al. analyzed the translation invariance of WT and then generalized it to the concept of shiftability. In brief, shiftability implies that any filter at an arbitrary position, orientation, and scale can be obtained through a linear interpolation of the designed shiftable, steerable, and scalable basis filters, respectively. The shiftability in orientation and scale is essentially equivalent to the steerability and scalability proposed in [30–32], respectively. In addition, Simoncelli et al. also proposed the concept of joint shiftability, which allows shiftability in a subset of position, orientation, and scale to be achieved simultaneously. As an illustration of these concepts, they also designed a kind of SPT that is shift-invariant and has shiftability in orientation (i.e., steerability).

In [33], Karasaridis and Simoncelli analyzed constraints for SPT and subsequently designed an SPT under these constraints via a numerical approach. Unfortunately, this SPT has non-perfect reconstruction. In contrast, Portilla et al. [34] developed an SPT with perfect reconstruction.

In summary, the filters developed in [30, 33, 34] are mainly steerable analysis filters or SPTs with steerability but without scalability. Although the filters designed in [30, 31] have both steerability and scalability, they do not incorporate synthesis filters for reconstruction and thus cannot be considered as PTs. To the best of our knowledge, no PTs with shift-invariance, steerability, and scalability have been reported in the literature. In the interest of counteracting RST in robust image watermarking, we are motivated to extend the SPT with shift-invariance and steerability to include scalability. This is termed the DPT for convenience. To this end, we adopt the SPT with perfect reconstruction developed in [34]. The steerable filters of this SPT are represented in a polar-separable form, where the angular components are designed so as to achieve the steerability. This implies that scaling the steerable filters would be equivalent to dilating the radial component. Thus, according to the shiftability framework in [29], we can take the radial component of the SPT's steerable filters as the kernel for constructing the scalable filters. Furthermore, combining the scalable and steerable filters derived from the radial and angular components, respectively, gives rise to the scalability and steerability of the DPT. Its shift-invariance is inherited from the SPT by retaining undecimated high-pass and band-pass coefficients. In this way, we construct a DPT with shift-invariance, steerability, and scalability.

In an attempt to apply the DPT in robust watermarking to counteract RST, we first exploit the shift-invariance, steerability, and scalability of DPT to theoretically derive a mechanism for RST synchronization. As will be shown, the DPT coefficients of the translated signal are the translated versions of those of the original signal, which is the essence of shift-invariance. The relationship between the DPT coefficients of the original signal and those of the rotated and scaled signal is characterized by a linear interpolation function parameterized using the rotation angle and scaling factor.

In this paper, based on the derived RST synchronization mechanism, we develop a new robust image watermarking scheme that is resilient to RST. According to the aforementioned essence of the DPT's shift-invariance, a translation of the input signal would affect the synchronization of rotation and scaling. To uncouple the translation from rotation and scaling, we take the Fourier magnitude of the cover image as the input to the DPT, which achieves true translation invariance. Rotation and scaling attacks are counteracted by first deploying the rotation and scaling synchronization mechanism to estimate their parameters. The estimated parameters are then used to correct the rotation and scaling that has been performed. In blind watermarking, the original signal is not available to the receiver, so we resort to a template to estimate the parameters of rotation and scaling attacks. Specifically, we insert the template at level 1 and the watermark at levels 2 and 3 of the DPT pyramid in the embedding process. During the detection process, we exploit the rotation and scaling synchronization mechanism to identify the rotation angle and scaling factor, and further use these estimated parameters to correct the rotation and scaling distortions and recover the watermark from the geometrically corrected signal. Extensive experimental results demonstrate that the proposed algorithm is highly robust against geometrical attacks such as RST and exhibits favorable performance against common signal processing attacks, including JPEG compression, median filtering, Gaussian noise, and low-pass filtering. In addition, we observe comparable or higher robustness with respect to other algorithms in the simulation.

The rest of this paper is organized as follows. Section 2 reviews the SPT with shift-invariance and steerability, and the construction of the DPT with shift-invariance, steerability, and scalability is detailed in Section 3. In Section 4, we theoretically derive the RST synchronization mechanism. We describe the proposed robust image watermarking scheme in Section 5, and present our experimental results in Section 6. Finally, the conclusions are discussed in Section 7.

2. Steerable pyramid transform with shift-invariance and steerability

In this section, we review the SPT with shift-invariance and steerability. We first describe the constraints for SPT given in [33] and then introduce one closed-form SPT presented in [34].

In [33], Karasaridis and Simoncelli derived the constraints for the recursive multi-scale SPT. Figure 1 illustrates a single-stage SPT; the multi-stage version can be formed by recursively inserting the block enclosed in the dashed box at the location of the filled circle. To achieve perfect reconstruction, the SPT should meet the following constraints [33]:

$$|H_0(\omega)|^2 + |L_0(\omega)|^2\left[|L_1(\omega)|^2 + \sum_{k=0}^{K-1}|B_k(\omega)|^2\right] = 1$$
(1-1)
$$\left|L_1\!\left(\tfrac{\omega}{2}\right)\right|^2\left[|L_1(\omega)|^2 + \sum_{k=0}^{K-1}|B_k(\omega)|^2\right] = \left|L_1\!\left(\tfrac{\omega}{2}\right)\right|^2$$
(1-2)
$$L_1(\omega) = 0, \quad |\omega| > \frac{\pi}{2},$$
(1-3)

where ω = (ω_x, ω_y) is the frequency vector in the Fourier domain, H_0(ω) and L_0(ω) denote the non-oriented high-pass and low-pass filters, respectively, and L_1(ω) and B_k(ω) (k = 0, …, K − 1) represent the narrow-band low-pass filter and the oriented band-pass filters, respectively. Eqs. (1-1), (1-2), and (1-3) describe the unit system response amplitude, the recursion relationship, and aliasing cancellation, respectively. Furthermore, the following constraint must hold to achieve steerability:

$$B_k(\omega) = B(\omega)\left[-j\cos(\theta - \theta_k)\right]^{K-1},$$
(2)

where θ = arg(ω), θ_k = πk/K, and $B(\omega) = \sqrt{\sum_{k=0}^{K-1}|B_k(\omega)|^2}$.

Figure 1. System diagram of SPT. Recursively inserting the block in the dashed box at the location of the filled circle gives rise to the multi-scale SPT.

Under the constraints in Eqs. (1) and (2), Karasaridis and Simoncelli employed a numerical technique to design the SPT [33], but unfortunately, this resulted in an SPT with non-perfect reconstruction. In contrast, Portilla et al. [34] devised an SPT with perfect reconstruction. This satisfies the constraints described above and can be represented in a closed form. Because perfect reconstruction is a natural requisite for watermarking, we only introduce the closed-form SPT in [34]. This SPT is represented in the Fourier domain, with polar-separable filters written as:

$$L_1(r, \theta) = \begin{cases} \cos\!\left(\frac{\pi}{2}\log_2\frac{4r}{\pi}\right), & \frac{\pi}{4} < r < \frac{\pi}{2} \\ 1, & r \le \frac{\pi}{4} \\ 0, & r \ge \frac{\pi}{2} \end{cases}$$
(3)
$$B_k(r, \theta) = H(r)\,G_k(\theta), \quad k = 0, \ldots, K-1,$$
(4)

where $r = |\omega| = \sqrt{\omega_x^2 + \omega_y^2}$, θ = arg(ω), and H(r) and G_k(θ) are defined as:

$$H(r) = \begin{cases} \cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right), & \frac{\pi}{4} < r < \frac{\pi}{2} \\ 1, & r \ge \frac{\pi}{2} \\ 0, & r \le \frac{\pi}{4} \end{cases}$$
(5)
$$G_k(\theta) = 2^{K-1}\frac{(K-1)!}{\sqrt{K\left[2(K-1)\right]!}}\left[\cos\!\left(\theta - \frac{\pi k}{K}\right)\right]^{K-1}.$$
(6)

The filters L 0(r, θ) and H 0(r, θ) are thus constructed as:

$$L_0(r, \theta) = L_1\!\left(\frac{r}{2}, \theta\right)$$
(7)
$$H_0(r, \theta) = H\!\left(\frac{r}{2}\right).$$
(8)

Note that the high-pass filter H 0(r, θ) in [34] is also split into a number of oriented subbands, i.e., H 0(r, θ) = H(r / 2)G k (θ). Because the oriented high-pass subbands will not be used in our scheme, they have been equivalently simplified as Eq. (8).
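To make the closed-form construction concrete, the following NumPy sketch evaluates the polar-separable filters of Eqs. (3) to (8) on a square DFT grid and numerically checks the unit-response constraint of Eq. (1-1). This is our own illustration; the grid size, the function names, and the choice K = 2 are assumptions rather than part of the original design.

```python
import numpy as np
from math import factorial

def spt_filters(size=256, K=2):
    """Closed-form polar-separable SPT filters of Eqs. (3)-(8) on a size x size grid."""
    w = np.fft.fftshift(np.fft.fftfreq(size)) * 2 * np.pi   # radian frequencies in [-pi, pi)
    wx, wy = np.meshgrid(w, w)
    r = np.hypot(wx, wy)                                    # radial frequency |omega|
    theta = np.arctan2(wy, wx)                              # angular coordinate arg(omega)

    def H(r):                                               # Eq. (5)
        out = np.zeros_like(r)
        band = (r > np.pi / 4) & (r < np.pi / 2)
        out[band] = np.cos(np.pi / 2 * np.log2(2 * r[band] / np.pi))
        out[r >= np.pi / 2] = 1.0
        return out

    def L1(r):                                              # Eq. (3)
        out = np.zeros_like(r)
        band = (r > np.pi / 4) & (r < np.pi / 2)
        out[band] = np.cos(np.pi / 2 * np.log2(4 * r[band] / np.pi))
        out[r <= np.pi / 4] = 1.0
        return out

    alpha = 2 ** (K - 1) * factorial(K - 1) / np.sqrt(K * factorial(2 * (K - 1)))
    G = [alpha * np.cos(theta - np.pi * k / K) ** (K - 1) for k in range(K)]  # Eq. (6)
    B = [H(r) * Gk for Gk in G]                             # Eq. (4)
    return H(r / 2), L1(r / 2), L1(r), B                    # H0 (Eq. 8), L0 (Eq. 7), L1, B_k

# numerical check of the unit system response, Eq. (1-1)
H0, L0, L1_, B = spt_filters()
recon = H0 ** 2 + L0 ** 2 * (L1_ ** 2 + sum(Bk ** 2 for Bk in B))
assert np.allclose(recon, 1.0)
```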

3. Design of deformable pyramid transform with shift-invariance, steerability, and scalability

In the interest of counteracting RST in robust watermarking, we are motivated to design a DPT with shift-invariance, steerability, and scalability. We take the SPT in [34] with shift-invariance and steerability as the starting point. Based on such an SPT, we further achieve scalability by constructing scalable basis filters from the steerable ones, B k (r, θ). According to the theory of shiftability in [29], scalable basis filters are essentially scaled versions of B k (r, θ). As scaling B k (r, θ) is equivalent to scaling H(r) according to Eq. (4), H(r) can be taken as the kernel for constructing scalable basis filters. That is, the radial component H(r) in Eq. (4) is used to achieve scalability, and the angular components G k (θ) are used to satisfy steerability. Together, this results in the joint steerability and scalability. Furthermore, keeping the high-pass and band-pass subbands undecimated, as in the SPT, yields the property of shift-invariance. Continuing with this line of thought, we can construct the DPT, as shown in Figure 2. This achieves the desired characteristics of shift-invariance, steerability, and scalability. The C j (r)(j = 0, 1, …, J − 1) in Figure 2 denote the scalable filters designed from the kernel H(r).

Figure 2. System diagram of DPT represented in the Fourier domain. C_j(r) (j = 0, 1, …, J − 1) are the scalable basis filters designed from the kernel H(r), and the other filters are the same as those in Figure 1. Recursively inserting the sub-system in the dashed box at the location of the filled circle forms the multi-scale DPT.

It can be observed from Figure 2 that perfect reconstruction requires the following constraint to be satisfied:

$$|H_0(r)|^2 + |L_0(r)|^2\left[|L_1(r)|^2 + \sum_{j=0}^{J-1}|C_j(r)|^2\sum_{k=0}^{K-1}|G_k(\theta)|^2\right] = 1.$$
(9)

By comparing Eq. (9) to Eqs. (1-1) and (4), we have

$$\sum_{j=0}^{J-1}|C_j(r)|^2 = |H(r)|^2.$$
(10)

Below, we determine a suitable number of scalable basis filters, J, derive the closed-form C j (r), and obtain the interpolation functions for steerability and scalability.

3.1. Construction of scalable basis filters

According to the sufficient and necessary condition of shiftability in [29], the number of basis filters must be equal to or greater than the number of Fourier frequencies of the kernel with non-zero magnitude, where a Fourier frequency refers to the frequency of a complex-exponential component of the kernel. Because H(r) is a piecewise function in the Fourier domain, we determine the number of scalable basis filters in a piecewise fashion, as follows.

First, consider the case r ∈ (π/4, π/2), where H(r) = cos((π/2) log_2(2r/π)). Here, H(r) can be treated as a function that has undergone a logarithmic warping operation, i.e., H(r) = cos(ρ(2r/π)), where ρ(2r/π) = (π/2) log_2(2r/π) ∈ (−π/2, 0). Because warping operations do not, according to [29], affect the property of shiftability, the number of scalable basis filters for r ∈ (π/4, π/2) depends on the non-warped kernel $\tilde{H}(r) = \cos(2r/\pi) = \left(e^{j2r/\pi} + e^{-j2r/\pi}\right)/2$. Clearly, there are two Fourier frequencies with non-zero magnitude, and thus, the number of scalable basis filters for r ∈ (π/4, π/2) satisfies J ≥ 2. For simplicity, we choose J = 2 and construct, according to [29], the two scalable basis filters C_j(r) (j = 0, 1) as:

$$C_j(r) = a_j\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi} + R_j\right),$$
(11)

where a_j (a_j > 0) meets the constraint in Eq. (10), and R_j ∈ (−π/2, 0) is set as:

$$R_j = -\frac{\pi}{2} + (j+1)\frac{\pi}{6}, \quad j = 0, 1,$$
(12)

which aims to generate frequency subbands with equal size on a logarithmic axis. To make the scalable basis filter reflection-shiftable [29], we further design C j (r)(j = 0, 1) as:

$$C_j(r) = a_j\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi} + R_j\right) + a_j\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi} - R_j\right) = 2a_j\cos(R_j)\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right).$$
(13)

By substituting Eqs. (13) and (12) into Eq. (10), we have

$$4a_0^2\cos^2\!\left(\frac{\pi}{3}\right) + 4a_1^2\cos^2\!\left(\frac{\pi}{6}\right) = 1.$$
(14)

As Eq. (14) is underdetermined, there exist many values of a j that satisfy Eq. (14). By simply setting a 0 = a 1, we have the following solutions:

$$a_0 = a_1 = \frac{1}{2}.$$
(15)

Therefore, the two scalable basis filters for the case r ∈ (π/4, π/2) are constructed as:

$$C_0(r) = \cos\!\left(\frac{\pi}{3}\right)\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right) = \frac{1}{2}\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right), \qquad C_1(r) = \cos\!\left(\frac{\pi}{6}\right)\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right) = \frac{\sqrt{3}}{2}\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right).$$
(16)

We proceed to handle the case r ∈ (0, π/4]. Here the kernel is H(r) = 0, and thus J ≥ 0 holds. For the case r ∈ [π/2, π], H(r) = 1, and hence we have J ≥ 1. For convenience of construction, we uniformly adopt J = 2 scalable basis filters for all three cases. Under the constraint of Eq. (10), the two scalable basis filters for r ∈ (0, π/4] and r ∈ [π/2, π] are derived as C_0(r) = C_1(r) = 0 and $C_0(r) = C_1(r) = 1/\sqrt{2}$, respectively.

In summary, the two scalable basis filters are constructed as follows:

$$C_0(r) = \begin{cases} \frac{1}{2}\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right), & \frac{\pi}{4} < r < \frac{\pi}{2} \\ \frac{1}{\sqrt{2}}, & r \ge \frac{\pi}{2} \\ 0, & r \le \frac{\pi}{4} \end{cases} \qquad C_1(r) = \begin{cases} \frac{\sqrt{3}}{2}\cos\!\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right), & \frac{\pi}{4} < r < \frac{\pi}{2} \\ \frac{1}{\sqrt{2}}, & r \ge \frac{\pi}{2} \\ 0, & r \le \frac{\pi}{4} \end{cases}$$
(17)
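As a quick sanity check on Eq. (17), the short snippet below (our own illustration, not part of the original scheme) evaluates C_0(r) and C_1(r) on a radial grid and verifies the constraint of Eq. (10) against the kernel H(r) of Eq. (5).

```python
import numpy as np

def scalable_filters(r):
    """Scalable basis filters C_0(r) and C_1(r) of Eq. (17)."""
    C0, C1 = np.zeros_like(r), np.zeros_like(r)
    band = (r > np.pi / 4) & (r < np.pi / 2)
    kernel = np.cos(np.pi / 2 * np.log2(2 * r[band] / np.pi))
    C0[band] = 0.5 * kernel                 # cos(pi/3) * kernel
    C1[band] = np.sqrt(3) / 2 * kernel      # cos(pi/6) * kernel
    C0[r >= np.pi / 2] = 1 / np.sqrt(2)
    C1[r >= np.pi / 2] = 1 / np.sqrt(2)
    return C0, C1

r = np.linspace(1e-6, np.pi, 1001)
C0, C1 = scalable_filters(r)
H = np.where(r >= np.pi / 2, 1.0,
             np.where(r > np.pi / 4, np.cos(np.pi / 2 * np.log2(2 * r / np.pi)), 0.0))
assert np.allclose(C0 ** 2 + C1 ** 2, H ** 2)   # Eq. (10)
```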

3.2. Derivation of interpolation function

Under the shiftability framework [29], the interpolation function is parameterized by translation distance, rotation angle, or scaling factor, and will be used to interpolate the filter (response) at an arbitrary spatial position, orientation, or scale. Because the designed DPT is shift-invariant, we mainly derive interpolation functions for steerability and scalability.

We start with the derivation of the interpolation function for steerability. In the interest of reducing the computational complexity of geometrical synchronization, we adopt K = 2 steerable basis filters, i.e., G 0(θ) = cos (θ) and G 1(θ) = cos(θ − π / 2) according to Eq. (6). From the sufficient and necessary condition of shiftability [29], the steerable interpolation function b k (ϕ) satisfies the following equation:

$$e^{j\phi} = \begin{bmatrix} e^{j0} & e^{j\frac{\pi}{2}} \end{bmatrix}\begin{bmatrix} b_0(\phi) \\ b_1(\phi) \end{bmatrix},$$
(18)

where ϕ denotes an arbitrary rotation angle. By requiring that both the real and imaginary parts of Eq. (18) agree, we obtain the following interpolation function for steerability:

$$b_0(\phi) = \cos\phi, \qquad b_1(\phi) = \sin\phi.$$
(19)

We proceed to derive the interpolation function for scalability. As mentioned previously, both H(r) and C_j(r) (j = 0, 1) are piecewise. Thus, the scalable interpolation functions, say s_j(σ), should also be piecewise, where σ (σ > 0) is an arbitrary scaling factor. We first handle the case r ∈ (π/4, π/2). As analyzed in Section 3.1, the Fourier frequency with non-zero amplitude depends only on the kernel before the warping operation. Therefore, the Fourier frequency in this case is equal to k = 2/π. According to [29], s_j(σ) satisfies the following equation:

$$e^{j\frac{2}{\pi}\sigma} = \begin{bmatrix} e^{j\frac{2}{\pi}R_0} & e^{j\frac{2}{\pi}R_1} \end{bmatrix}\begin{bmatrix} s_0(\sigma) \\ s_1(\sigma) \end{bmatrix},$$
(20)

where R j (j = 0, 1) is defined in Eq. (12). Given that both the real and imaginary parts of Eq. (20) agree, we obtain

$$s_0(\sigma) = \frac{\sin\!\left(\frac{2}{\pi}(\sigma - R_1)\right)}{\sin\!\left(\frac{2}{\pi}(R_0 - R_1)\right)}, \qquad s_1(\sigma) = \frac{\sin\!\left(\frac{2}{\pi}(\sigma - R_0)\right)}{\sin\!\left(\frac{2}{\pi}(R_1 - R_0)\right)}.$$
(21)

For the case r ∈ (0, π/4], no Fourier frequency has non-zero amplitude, and hence, s_j(σ) can take any value. In our scheme, we simply set s_j(σ) = 0 for r ∈ (0, π/4]. For the case r ∈ [π/2, π], the Fourier frequency with non-zero amplitude is k = 0. As a result, we have

$$1 = \begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} s_0(\sigma) \\ s_1(\sigma) \end{bmatrix}.$$
(22)

Because C 0(r) = C 1(r) has been adopted in the DPT construction, we similarly set s 0(σ) = s 1(σ) and obtain s 0(σ) = s 1(σ) = 1 / 2.

By summarizing the aforementioned results, we derive the following interpolation functions for scalability:

$$s_0(\sigma) = \begin{cases} \sin\!\left(\frac{2}{\pi}(\sigma - R_1)\right)\big/\sin\!\left(\frac{2}{\pi}(R_0 - R_1)\right), & \frac{\pi}{4} < r < \frac{\pi}{2} \\ \frac{1}{2}, & r \ge \frac{\pi}{2} \\ 0, & r \le \frac{\pi}{4} \end{cases} \qquad s_1(\sigma) = \begin{cases} \sin\!\left(\frac{2}{\pi}(\sigma - R_0)\right)\big/\sin\!\left(\frac{2}{\pi}(R_1 - R_0)\right), & \frac{\pi}{4} < r < \frac{\pi}{2} \\ \frac{1}{2}, & r \ge \frac{\pi}{2} \\ 0, & r \le \frac{\pi}{4} \end{cases}$$
(23)

Using the steerable and scalable interpolation functions, we can interpolate the deformable filter at arbitrary orientation ϕ and scale σ, say F ϕ, σ(r, θ), via the following construction:

$$F^{\phi,\sigma}(r, \theta) = \left[s_0(\sigma)C_0(r) + s_1(\sigma)C_1(r)\right]\times\left[b_0(\phi)G_0(\theta) + b_1(\phi)G_1(\theta)\right],$$
(24)

where (r, θ) are the polar coordinates in the Fourier domain. For convenience, Eq. (24) is called the deformable interpolation.

Suppose that Q_jk^l(r, θ) (j, k ∈ {0, 1}; l = 1, 2, …) denotes the DPT basis subband at the l-th pyramid level. The filter response at orientation ϕ and scale σ can then be obtained via the deformable interpolation as:

$$Q^{l,\sigma,\phi}(r, \theta) = s_0(\sigma)\left[b_0(\phi)Q_{00}^l(r, \theta) + b_1(\phi)Q_{01}^l(r, \theta)\right] + s_1(\sigma)\left[b_0(\phi)Q_{10}^l(r, \theta) + b_1(\phi)Q_{11}^l(r, \theta)\right].$$
(25)

Although both Eqs. (24) and (25) are represented in the Fourier domain, performing the inverse Fourier transform on them leads to a straightforward interpolation expressed in the spatial-frequency domain.
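The interpolation machinery of Eqs. (19), (23), and (25) can be summarized in a few lines of NumPy. The sketch below is illustrative: the basis subbands are assumed to be supplied as same-sized arrays indexed by (j, k), together with the radial-frequency grid r of the corresponding pyramid level, and these argument conventions are ours rather than the paper's.

```python
import numpy as np

R0, R1 = -np.pi / 3, -np.pi / 6                 # Eq. (12)

def steer_weights(phi):
    """Steerable interpolation function of Eq. (19)."""
    return np.cos(phi), np.sin(phi)

def scale_weights(sigma, r):
    """Scalable interpolation function of Eq. (23), piecewise in the radial frequency r."""
    k = 2 / np.pi
    s0_band = np.sin(k * (sigma - R1)) / np.sin(k * (R0 - R1))
    s1_band = np.sin(k * (sigma - R0)) / np.sin(k * (R1 - R0))
    s0 = np.where(r >= np.pi / 2, 0.5, np.where(r > np.pi / 4, s0_band, 0.0))
    s1 = np.where(r >= np.pi / 2, 0.5, np.where(r > np.pi / 4, s1_band, 0.0))
    return s0, s1

def deformable_interpolation(Q, phi, sigma, r):
    """Eq. (25): response at orientation phi and scale sigma from basis subbands Q[(j, k)]."""
    b0, b1 = steer_weights(phi)
    s0, s1 = scale_weights(sigma, r)
    return (s0 * (b0 * Q[(0, 0)] + b1 * Q[(0, 1)]) +
            s1 * (b0 * Q[(1, 0)] + b1 * Q[(1, 1)]))
```

Because the interpolation is linear, the same weights apply whether the subbands are held in the Fourier domain or, after the inverse FT, in the spatial-frequency domain.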

4. Mechanism for geometrical synchronization

In this section, in an attempt to counteract geometrical attacks in robust watermarking, we exploit the characteristics of shift-invariance, steerability, and scalability in the DPT to theoretically derive synchronization mechanisms for translation, rotation, and scaling. The derivation is as follows.

4.1. Synchronization for translation

Let I(x, y) and I_{x_0,y_0}(x, y) be the original image and its translated version, respectively, i.e., I_{x_0,y_0}(x, y) = T_{x_0,y_0}I(x, y) = I(x − x_0, y − y_0), where (x_0, y_0) is the translation distance and T_{x_0,y_0} is the translation operator. The corresponding Fourier transforms (FTs) are denoted as I(ω_x, ω_y) and $I(\omega_x, \omega_y)\,e^{-j(\omega_x x_0 + \omega_y y_0)}$, respectively.

Assume that ω^1 = (ω_x, ω_y) represents the frequency coordinate at the first (finest) level of the DPT pyramid. Its corresponding coordinate at the l-th (l ≥ 1) pyramid level is then computed as ω^l = (ω_x^l, ω_y^l) = (ω_x/2^{l−1}, ω_y/2^{l−1}) (see also Figure 2). Suppose that Q_jk^l(ω^l) and Q_jk^{l,x_0,y_0}(ω^l) (j, k ∈ {0, 1}; l = 1, 2, …) are the DPT basis subbands in the Fourier domain for I(x, y) and I_{x_0,y_0}(x, y), respectively. According to Figure 2, we have

$$Q_{jk}^l(\omega^l) = I(\omega^l)\,L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_j(\omega^l)\,G_k(\omega^l),$$
(26)
$$Q_{jk}^{l,x_0,y_0}(\omega^l) = e^{-j\left(\omega_x^l x_0 + \omega_y^l y_0\right)}\,I(\omega^l)\,L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_j(\omega^l)\,G_k(\omega^l).$$
(27)

By considering Eqs. (26) and (27), we clearly find that

$$Q_{jk}^{l,x_0,y_0}(\omega_x, \omega_y) = e^{-j\left(\omega_x^l x_0 + \omega_y^l y_0\right)}\,Q_{jk}^l(\omega_x, \omega_y) \;\Longleftrightarrow\; q_{jk}^{l,x_0,y_0}(x, y) = T_{x_0,y_0}\,q_{jk}^l(x, y),$$
(28)

where q_jk^{l,x_0,y_0}(x, y) and q_jk^l(x, y) are the inverse FTs of Q_jk^{l,x_0,y_0}(ω_x, ω_y) and Q_jk^l(ω_x, ω_y), respectively.

Equation (28) implies that the DPT basis subband q_jk^{l,x_0,y_0}(x, y) in the spatial-frequency domain for the translated input signal I_{x_0,y_0}(x, y) is the translated version of the subband q_jk^l(x, y) for the original input signal. This is the essence of shift-invariance in the DPT.

4.2. Synchronization for rotation and scaling

According to the construction of shift-invariance in the DPT, a translation of the input would disturb the synchronization of rotation and scaling. To uncouple the translation from rotation and scaling, we adopt the Fourier magnitude of the input signal as the DPT's input, which in turn achieves true translation invariance. Under such a setting, we derive the synchronization mechanism for rotation and scaling as follows.

Denote by I_{ϕ,σ}(x, y) = G_{ϕ,σ}I(x, y) the rotated and scaled version of the original image I(x, y), where G_{ϕ,σ} is an operator that rotates counter-clockwise by ϕ and dilates by σ about the origin. Let M(ω_x, ω_y) and M_{ϕ,1/σ}(ω_x, ω_y) be the Fourier magnitudes of I(x, y) and I_{ϕ,σ}(x, y), respectively. Then, we have M_{ϕ,1/σ}(ω_x, ω_y) = G_{ϕ,1/σ}M(ω_x, ω_y) according to the rotation and scaling properties of the FT.

As defined in Section 4.1, let ω^l = (ω_x/2^{l−1}, ω_y/2^{l−1}) denote the frequency coordinate at the l-th (l ≥ 1) level of the DPT pyramid. Assume that M_{ϕ,1/σ}(ω^1) and M(ω^1) are decomposed via the DPT into l (l ≥ 1) pyramid levels to yield the basis subbands Q_jk^{l,ϕ,σ}(ω^l) and Q_jk^l(ω^l) (j, k ∈ {0, 1}; l = 1, 2, …), respectively. By virtue of the steerable and scalable properties in Eq. (25), we use Q_jk^l(ω^l) to interpolate the response at orientation ψ and scale λ as:

$$\begin{aligned} Q^{l,\psi,\lambda}(\omega^l) &= \sum_{j=0}^{1}s_j(\lambda)\sum_{k=0}^{1}b_k(\psi)\,Q_{jk}^l(\omega^l) \\ &= \sum_{j=0}^{1}s_j(\lambda)\sum_{k=0}^{1}b_k(\psi)\,M(\omega^l)\,L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_j(\omega^l)\,G_k(\omega^l) \\ &= M(\omega^l)\,F^{l,\psi,\lambda}(\omega^l), \end{aligned}$$
(29)

where $F^{l,\psi,\lambda}(\omega^l) = \sum_{j=0}^{1}s_j(\lambda)\sum_{k=0}^{1}b_k(\psi)\,L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_j(\omega^l)\,G_k(\omega^l)$. Similarly, we further use Q_jk^{l,ϕ,σ}(ω^l) to obtain the response at orientation ϕ + ψ and scale λ/σ as:

$$\begin{aligned} Q^{l,\phi+\psi,\frac{\lambda}{\sigma}}(\omega^l) &= \sum_{j=0}^{1}s_j\!\left(\frac{\lambda}{\sigma}\right)\sum_{k=0}^{1}b_k(\phi+\psi)\,Q_{jk}^{l,\phi,\sigma}(\omega^l) \\ &= \sum_{j=0}^{1}s_j\!\left(\frac{\lambda}{\sigma}\right)\sum_{k=0}^{1}b_k(\phi+\psi)\,M_{\phi,\frac{1}{\sigma}}(\omega^l)\,L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_j(\omega^l)\,G_k(\omega^l) \\ &= M_{\phi,\frac{1}{\sigma}}(\omega^l)\,F^{l,\phi+\psi,\frac{\lambda}{\sigma}}(\omega^l), \end{aligned}$$
(30)

where $F^{l,\phi+\psi,\lambda/\sigma}(\omega^l) = \sum_{j=0}^{1}s_j(\lambda/\sigma)\sum_{k=0}^{1}b_k(\psi+\phi)\,L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_j(\omega^l)\,G_k(\omega^l)$. In the framework of shiftability [29], F^{l,ψ,λ}(ω^l) represents the filter at orientation ψ and scale λ in the l-th level of the multi-scale DPT (see also Figure 2). This is actually the rotated and scaled version of the kernel $F^{l,0,R_0}(\omega^l) = L_0(\omega^l)\left[L_1(\omega^l)\right]^{l-1}C_0(\omega^l)\,G_0(\omega^l)$ at orientation 0 and scale R_0 (see also Eq. (12)). In other words, F^{l,ψ,λ}(ω^l) = G_{ψ,λ/R_0}F^{l,0,R_0}(ω^l) holds, and so does F^{l,ϕ+ψ,λ/σ}(ω^l) = G_{ϕ+ψ,λ/(σR_0)}F^{l,0,R_0}(ω^l). Therefore, we have

$$F^{l,\phi+\psi,\frac{\lambda}{\sigma}}(\omega^l) = G_{\phi,\frac{1}{\sigma}}\,F^{l,\psi,\lambda}(\omega^l).$$
(31)

Taking Eq. (31) and M_{ϕ,1/σ}(ω^l) = G_{ϕ,1/σ}M(ω^l) into account, Eqs. (29) and (30) essentially imply the following synchronization mechanism for rotation and scaling:

$$Q^{l,\phi+\psi,\frac{\lambda}{\sigma}}(\omega^l) = G_{\phi,\frac{1}{\sigma}}\,Q^{l,\psi,\lambda}(\omega^l) \;\Longleftrightarrow\; Q^{l,\psi,\lambda}(\omega^l) = G_{-\phi,\sigma}\,Q^{l,\phi+\psi,\frac{\lambda}{\sigma}}(\omega^l).$$
(32)

Performing the inverse FT leads to the rotation and scaling synchronization mechanism in the spatial-frequency domain:

$$q^{l,\phi+\psi,\frac{\sigma}{\lambda}}(s^l) = G_{\phi,\sigma}\,q^{l,\psi,\frac{1}{\lambda}}(s^l) \;\Longleftrightarrow\; q^{l,\psi,\frac{1}{\lambda}}(s^l) = G_{-\phi,\frac{1}{\sigma}}\,q^{l,\phi+\psi,\frac{\sigma}{\lambda}}(s^l),$$
(33)

where s^l = (x/2^{l−1}, y/2^{l−1}) is the spatial-frequency coordinate at pyramid level l, and q^{l,ϕ+ψ,σ/λ}(s^l) and q^{l,ψ,1/λ}(s^l) are the inverse FTs of Q^{l,ϕ+ψ,λ/σ}(ω^l) and Q^{l,ψ,λ}(ω^l), respectively.

Based on Eq. (32), the synchronization for rotation and scaling can be performed as follows: decompose the Fourier magnitude of I_{ϕ,σ}(x, y) into an l-level DPT pyramid to generate the basis subbands Q_jk^{l,ϕ,σ}(ω^l) (j, k ∈ {0, 1}; l = 1, 2, …). Then, interpolate the response at orientation ϕ + ψ and scale λ/σ as Q^{l,ϕ+ψ,λ/σ}(ω^l). Finally, rotate the interpolated subband Q^{l,ϕ+ψ,λ/σ}(ω^l) counter-clockwise by −ϕ and dilate it by σ to yield the response Q^{l,ψ,λ}(ω^l) at orientation ψ and scale λ. This Q^{l,ψ,λ}(ω^l) is equivalent to the subband at orientation ψ and scale λ synthesized from the DPT basis subbands Q_jk^l(ω^l) of the original image I(x, y). The rotation and scaling synchronization based on Eq. (33) proceeds similarly.
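A rough sketch of this interpolate-then-correct procedure is given below. It is only an outline under our own assumptions: it reuses the hypothetical deformable_interpolation() helper sketched after Eq. (25) (passed in as a callable), assumes real-valued subbands, and leaves the sign/axis conventions of the rotation and the handling of array boundaries and resizing to the implementer.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def synchronize(q_attacked, phi, sigma, psi, lam, r, interpolate):
    """Recover the response of the original image at orientation psi and scale lam
    from the basis subbands q_attacked[(j, k)] of the rotated/scaled image (Eqs. (32)/(33)).
    phi, sigma: rotation angle (radians) and scaling factor of the attack;
    interpolate: a callable implementing the deformable interpolation of Eq. (25)."""
    # interpolate the attacked subbands at orientation phi + psi and scale lam / sigma
    tuned = interpolate(q_attacked, phi + psi, lam / sigma, r)
    # undo the attack: rotate counter-clockwise by -phi, then dilate by sigma
    # (rotation direction depends on the axis convention of the array)
    corrected = rotate(tuned, np.degrees(-phi), reshape=False, order=1)
    corrected = zoom(corrected, sigma, order=1)
    return corrected
```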

5. Proposed robust watermarking scheme

In this section, we present the proposed robust image watermarking algorithm, which is RST-resilient. The translation invariance is achieved by taking the Fourier magnitude of the cover image I(x, y) as the DPT input, and the rotation and scaling are counteracted using the inserted template and the rotation and scaling synchronization. The details are given below, where only K = 2 steerable basis filters are adopted to reduce the computational complexity of the rotation and scaling synchronization.

5.1. Template and watermark insertion

Assume that the size of the cover image I(x, y) is H × W. To obtain favorable resolution for template matching, we symmetrically pad (crop) the rows/columns of I(x, y) to a size of 1,024 if the height/width, H/W, is smaller (larger) than 1,024. We then calculate its Fourier magnitude M(ω_x, ω_y) and phase Ψ(ω_x, ω_y), and further decompose M(ω_x, ω_y) into a three-level DPT pyramid to generate the spatial-frequency basis subbands q_jk^l(x, y) (j, k ∈ {0, 1}; l = 1, 2, 3). Among these, the subbands at the first (finest) level, q_jk^1(x, y), are used for template insertion, whereas those at the other two levels, q_jk^l(x, y) (l = 2, 3), are used for watermark embedding. We choose to embed in the spatial-frequency domain instead of the Fourier domain because the symmetry of the Fourier magnitude would decrease the number of candidate coefficients for watermarking and thus the embedding capacity. The template and watermark embedding process is illustrated in Figure 3 and explained as follows.

Figure 3. System diagram of template and watermark insertion, where RA, DPT, and FT denote the repeat-accumulate code, the deformable pyramid transform, and the Fourier transform, respectively.

5.1.1. Template embedding

(1) Generate, via a secret key KEY_t1, a random sequence P = {p_i ∈ {+1, −1}, i = 1, …, N_t} of length N_t as the template.

(2) To enhance security, we tune q_jk^1(x, y) to the predefined secret orientation θ_t and scale σ_t and obtain q^{1,θ_t,σ_t}(x, y). According to the steerability and scalability in Eq. (25), we have

$$q^{1,\theta_t,\sigma_t} = \cos\theta_t\,\mathcal{F}^{-1}\!\left[s_0(\sigma_t)Q_{00}^1 + s_1(\sigma_t)Q_{10}^1\right] + \sin\theta_t\,\mathcal{F}^{-1}\!\left[s_0(\sigma_t)Q_{01}^1 + s_1(\sigma_t)Q_{11}^1\right],$$
(34)

where Q_jk^1(ω_x, ω_y) denotes the FT of q_jk^1(x, y) and $\mathcal{F}^{-1}$ is the inverse FT. Note that the coordinates in Eq. (34) are omitted for compactness.

(3) Randomly select N_t template positions from q^{1,θ_t,σ_t}(x, y) using a secret key KEY_t2, denoted as PS = {(x_i, y_i), i = 1, 2, …, N_t}. As a trade-off between robustness and imperceptibility, we prefer the (x_i, y_i) located in the spatial-frequency region with normalized radius r ∈ (π/4, π/2). Then, embed the template in the selected positions using

$$u^{1,\theta_t,\sigma_t}(x_i, y_i) = q^{1,\theta_t,\sigma_t}(x_i, y_i) + \beta_t\,p_i,$$
(35)

where β_t is the embedding strength.

(4) Tune u^{1,θ_t,σ_t}(x_i, y_i) backward to obtain the watermarked basis subbands u_jk^1(x, y). This, however, is non-trivial for two reasons. First, it is difficult to interpolate Eq. (35) backward to yield the four embedded basis subbands u_jk^1(x, y) (j, k ∈ {0, 1}). Second, s_j(σ_t) is a piecewise function of the radial frequency r according to Eq. (23), and thus the interpolation in Eq. (34) cannot be implemented directly in the spatial-frequency domain. The latter implies that multiple FTs would be required to complete the template insertion, which would significantly degrade the performance of brute-force template matching at the receiver and consequently make the template matching unaffordable.

To simplify the template insertion and template matching, we are motivated to adopt a non-piecewise s j (σ t ), e.g., setting s j (σ t ) to a fixed value u(u > 0), which turns Eq. (34) into

$$q^{1,\theta_t,\sigma_t} = u\cos\theta_t\left(q_{00}^1 + q_{10}^1\right) + u\sin\theta_t\left(q_{01}^1 + q_{11}^1\right).$$
(36)

Because s_j(σ_t) is piecewise, we determine a suitable u in a piecewise manner. As pointed out in Section 3.2, for r ∈ (0, π/4], s_j(σ_t) can take any value. Thus, we merely consider the cases r ∈ (π/4, π/2) and r ∈ [π/2, π]. For the case r ∈ [π/2, π], the setting s_j(σ_t) = 1/2 is already a fixed value. For r ∈ (π/4, π/2), taking Eqs. (16) and (5) into account, we calculate the expression s_0(σ_t)Q_{0k}^1 + s_1(σ_t)Q_{1k}^1 in Eq. (34) as:

$$\begin{aligned} s_0(\sigma_t)Q_{0k}^1 + s_1(\sigma_t)Q_{1k}^1 &= s_0(\sigma_t)\,M L_0 C_0 G_k + s_1(\sigma_t)\,M L_0 C_1 G_k \\ &= \frac{s_0(\sigma_t) + \sqrt{3}\,s_1(\sigma_t)}{2}\,M L_0 H G_k = \frac{s_0(\sigma_t) + \sqrt{3}\,s_1(\sigma_t)}{1 + \sqrt{3}}\cdot\frac{1 + \sqrt{3}}{2}\,M L_0 H G_k. \end{aligned}$$
(37)

Given that the scale range of concern in our scheme is [0.5, 2] (a broader scale range would degrade the robustness to scaling attacks), the value of $\left(s_0(\sigma_t) + \sqrt{3}\,s_1(\sigma_t)\right)/\left(1 + \sqrt{3}\right)$ lies in the range [0.69, 0.95]. Thus, we roughly set s_j(σ_t) = u = 0.7, which approximates s_j(σ_t) in both the r ∈ (π/4, π/2) and r ∈ [π/2, π] cases. Although such an approximation leads to interpolation errors, it is demonstrated to be feasible by the extensive experimental results in Section 6.

Via the simplified Eq. (37), we equivalently embed the template in the DPT basis subbands q jk 1(x i , y i ) as follows:

$$u_{j0}^1(x_i, y_i) = q_{j0}^1(x_i, y_i) + \frac{\beta_t\,p_i\cos\theta_t}{u}, \qquad u_{j1}^1(x_i, y_i) = q_{j1}^1(x_i, y_i) + \frac{\beta_t\,p_i\sin\theta_t}{u}, \qquad j = 0, 1,$$
(38)

which avoids both the forward and backward interpolations and solves the problem that exists in the backward interpolation.
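The following sketch shows how Eq. (38) spreads each template bit over the four level-1 basis subbands. It is a minimal illustration under our own assumptions: the dictionary layout of the subbands, the argument names, and the use of the template positions directly as integer array indices are not prescribed by the paper.

```python
import numpy as np

def embed_template(q1, positions, p, theta_t, beta_t, u=0.7):
    """Template insertion of Eq. (38).
    q1: dict {(j, k): 2-D array} of level-1 basis subbands q_jk^1 (modified in place);
    positions: template positions (x_i, y_i) selected with KEY_t2; p: +/-1 template bits."""
    for (x, y), p_i in zip(positions, p):
        for j in (0, 1):
            q1[(j, 0)][x, y] += beta_t * p_i * np.cos(theta_t) / u
            q1[(j, 1)][x, y] += beta_t * p_i * np.sin(theta_t) / u
    return q1
```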

5.1.2. Watermark embedding

(1) Generate N_m random bits b = {b_i, i = 1, …, N_m} as the message using a secret key KEY_w1.

(2) Encode b with the repeat-accumulate (RA) code of rate rate [35] to generate the encoded binary sequence e = {e_i, i = 1, …, N_m/rate}; the RA code is adopted for its excellent encoding and decoding performance.

(3) Because there exists a natural quad-tree structure between q_jk^3(x, y) (j, k ∈ {0, 1}) and {q_jk^2(2x − 1, 2y − 1), q_jk^2(2x, 2y − 1), q_jk^2(2x − 1, 2y), q_jk^2(2x, 2y)}, we group the four quad-trees from the four different subbands q_jk^l(x, y) together to form a 20-element vector tree T_i = {T_iv, v = 1, …, 20} (i = 1, …, 1,024 × 1,024/16), as illustrated in Figure 4, where the child coefficients of q_00^3(x, y) are listed but the other child coefficients are omitted from the figure for compactness. In our scheme, each vector tree is taken as the basic unit for watermarking, in an attempt to achieve a reasonable trade-off between robustness and embedding capacity.

(4) In the interest of resisting cropping, we choose, via a secret key KEY_w2, N_m/rate vector trees located in the central region for watermark insertion. Assume that each vector tree T_i is inserted with one encoded bit e_i (e_i = 0, 1). We then need a 20-element vector to represent e_i. To enhance the watermark detection performance, we set the 20-element vector to w_0 = (−1, …, −1) (twenty "−1"s) for e_i = 0 and to w_1 = (+1, …, +1) (twenty "+1"s) for e_i = 1, which achieves the maximum codeword distance and thus decreases the detection error probability.

(5) Associate the allocated bit e_i with w_{e_i} and perform the embedding as follows:

$$Y_i = T_i + \beta_w\,w_{e_i},$$
(39)

where Y_i is the watermarked vector tree and β_w is a non-adaptive embedding strength because, to the best of our knowledge, no suitable human visual model has been reported in the literature for the situation in our scheme. Equation (39) is equivalently written as:

$$u_{jk}^l(x_{iv}, y_{iv}) = q_{jk}^l(x_{iv}, y_{iv}) + \beta_w\,(2e_i - 1), \quad v = 1, \ldots, 20, \; l = 2, 3,$$
(40)

where (x_{iv}, y_{iv}) are the coordinates corresponding to the v-th element of T_i (a code sketch of this embedding step is given after Figure 4).

(6) Embed all bits e_i into the chosen vector trees by iteratively implementing step (5).

(7) Perform the inverse DPT on the watermark-inserted basis subbands u_jk^l (l = 2, 3) and the template-embedded subbands u_jk^1 from Section 5.1.1 to obtain the watermarked Fourier magnitude M_w(ω_x, ω_y).

(8) Multiply M_w(ω_x, ω_y) by the original phase Ψ(ω_x, ω_y) and perform the inverse FT to obtain the watermarked image I_w^pre(x, y) of size 1,024 × 1,024.

(9) Execute the inverse padding (cropping) operation on I_w^pre(x, y) to obtain the final watermarked image, I_w(x, y), of size H × W.

Figure 4. Illustration of a 20-element vector tree.
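A minimal sketch of the per-tree embedding of Eqs. (39) and (40) is given below; the way a tree's 20 coefficients are addressed (here, by explicit (level, j, k, x, y) tuples) is our own bookkeeping convention, not the paper's.

```python
def embed_bit(q, tree_coords, e_i, beta_w):
    """Embed one RA-coded bit e_i into a 20-element vector tree, Eq. (40).
    q: dict {(l, j, k): 2-D array} of DPT basis subbands at levels 2 and 3 (modified in place);
    tree_coords: 20 tuples (l, j, k, x, y) addressing the elements of the tree T_i."""
    sign = 2 * e_i - 1                      # e_i = 0 -> -1, e_i = 1 -> +1 (codewords w_0, w_1)
    for (l, j, k, x, y) in tree_coords:
        q[(l, j, k)][x, y] += beta_w * sign
    return q
```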

5.2. Efficient template-matching algorithm

Because translation invariance has been achieved by taking the Fourier magnitude of the cover image as the DPT input, we merely use the inserted template to estimate the rotation angle and scaling factor. These will be used to correct the rotation and scaling before watermark extraction. Based on the synchronization mechanisms for rotation and scaling in Section 4.2, we develop the efficient template-matching algorithm as follows.

Assume that the received image is I_r(x, y). We first preprocess I_r(x, y) with the same method as in the embedding stage to give an image of size 1,024 × 1,024. We then calculate the Fourier magnitude of the preprocessed image and decompose the resulting magnitude into a one-level DPT pyramid, because only the template inserted at level 1 is required for template matching. This yields the DPT basis subbands q_jk^1(x, y) (j, k ∈ {0, 1}). According to Eq. (36), the template matching for rotation and scaling estimation can be performed as follows. The basis subbands q_jk^1(x, y) are tuned to a candidate orientation and scale, and the tuned subband is then inversely rotated and dilated. The template is extracted accordingly to compute its correlation with the original template. After all candidate rotation angles and scaling factors have been searched in this way, the orientation and scale corresponding to the maximum correlation are adopted as the estimated rotation and scaling parameters.

From the process discussed above, it can be seen that only a limited number of template points are involved in template matching. This motivates us to simplify the template matching by merely interpolating the relevant template points, as described below.

(1) Set the range [−180, 180) with step Δ_ϕ (e.g., Δ_ϕ = 0.5) as the search space for the rotation angle, and [σ_1, σ_2] (e.g., [0.5, 2.0]) with step Δ_σ (e.g., Δ_σ = 0.01) as that for the scaling factor. Initialize the search parameters as ϕ = −180 and σ = σ_1.

(2) For each parameter pair (ϕ, σ), compute the candidate template positions as:

$$x_i' = \mathrm{round}\!\left(\frac{(x_i - cx)\cos\phi + (y_i - cy)\sin\phi}{\sigma} + cx\right), \qquad y_i' = \mathrm{round}\!\left(\frac{(y_i - cy)\cos\phi - (x_i - cx)\sin\phi}{\sigma} + cy\right),$$
(41)

where (x_i, y_i) (i = 1, 2, …, N_t) denotes the original template coordinates determined by the key KEY_t2, (cx, cy) is the geometrical center, and round(·) is the rounding operation.

(3) Obtain, via the steerability and scalability in Eq. (36), the coefficients at the locations (x_i', y_i') as:

$$q^{1,\phi,\sigma}(x_i', y_i') = u\cos\phi\left[q_{00}^1(x_i', y_i') + q_{10}^1(x_i', y_i')\right] + u\sin\phi\left[q_{01}^1(x_i', y_i') + q_{11}^1(x_i', y_i')\right].$$
(42)
(4) Calculate the correlation between the extracted and original templates as:

$$\mathrm{Corr}(\phi, \sigma) = \sum_{i=1}^{N_t} q^{1,\phi,\sigma}(x_i', y_i')\,p_i.$$
(43)
(5) Increase the candidate scale σ to σ = σ + Δ_σ while keeping ϕ unchanged. Repeat steps (2) to (4) until σ ≥ σ_2.

(6) Augment the candidate rotation angle ϕ by Δ_ϕ, i.e., ϕ = ϕ + Δ_ϕ, and re-execute steps (2) to (5) until ϕ ≥ 180.

(7) Find the maximum correlation value Corr(ϕ, σ) and take the corresponding geometrical parameters (ϕ_est, σ_est) as the estimated rotation angle and scaling factor.

(8) Calculate the real parameters of the rotation and dilation attacks as ϕ_attack = ϕ_est − θ_t and σ_attack = σ_est/σ_t, respectively. This is because ϕ_est and σ_est are essentially, according to Section 4.2, equal to ϕ_est = θ_t + ϕ_attack and σ_est = σ_t σ_attack, respectively, where θ_t, ϕ_attack, σ_t, and σ_attack correspond to ψ, ϕ, 1/λ, and σ in Section 4.2, respectively.

Although the above template-matching algorithm only addresses symmetrical scaling, i.e., identical scaling factors along the x- and y-axes, it can easily be extended to the situation with different scaling factors. To this end, set the parameter space (ϕ, σ_x, σ_y) with ϕ ∈ [−180, 180) and σ_x, σ_y ∈ [σ_1, σ_2], and then search each parameter successively, as in the above algorithm. Nevertheless, this would significantly increase the computational complexity.
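The brute-force search of steps (1) to (7) can be sketched as follows. The container layout of the level-1 subbands, the treatment of positions as integer array indices, and the omission of bounds clipping for out-of-range candidate positions are illustrative assumptions of this sketch.

```python
import numpy as np

def match_template(q1, p, positions, cx, cy, u=0.7,
                   phis=np.arange(-180.0, 180.0, 0.5),
                   sigmas=np.arange(0.5, 2.0 + 1e-9, 0.01)):
    """Template matching of Section 5.2 (Eqs. (41)-(43)).
    q1: dict {(j, k): 2-D array} of level-1 DPT subbands of the received image;
    p: original template bits; positions: original template coordinates (x_i, y_i)."""
    xs = np.array([x for x, _ in positions], dtype=float)
    ys = np.array([y for _, y in positions], dtype=float)
    best = (-np.inf, None, None)
    for phi_deg in phis:
        phi = np.radians(phi_deg)
        c, s = np.cos(phi), np.sin(phi)
        for sigma in sigmas:
            # Eq. (41): candidate template positions under rotation phi and scaling sigma
            xi = np.round(((xs - cx) * c + (ys - cy) * s) / sigma + cx).astype(int)
            yi = np.round(((ys - cy) * c - (xs - cx) * s) / sigma + cy).astype(int)
            # Eq. (42): interpolate the tuned coefficients at the candidate positions
            vals = (u * c * (q1[(0, 0)][xi, yi] + q1[(1, 0)][xi, yi]) +
                    u * s * (q1[(0, 1)][xi, yi] + q1[(1, 1)][xi, yi]))
            corr = float(np.dot(vals, p))   # Eq. (43)
            if corr > best[0]:
                best = (corr, phi_deg, sigma)
    return best                              # (max correlation, phi_est in degrees, sigma_est)
```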

5.3. Geometrical correction and watermark extraction

Suppose that M θ,1/σ(ω x , ω y ) denotes the Fourier magnitude of the preprocessed image of size 1,024 × 1,024. We then correct M θ,1/σ(ω x , ω y ) by rotating counter-clockwise by − ϕ attack and scaling by σ attack about the origin, and obtain the Fourier magnitude M 0,1(ω x , ω y ) corresponding to the original watermarked image at orientation 0 and scale 1. Next, the watermark is recovered from M 0,1(ω x , ω y ) as follows.

(1) Decompose M_{0,1}(ω_x, ω_y) into a three-level pyramid via the DPT, which yields the DPT basis subbands q_jk^l(x, y) (l = 1, 2, 3; j, k ∈ {0, 1}).

(2) Use q_jk^l(x, y) to construct the 20-element vector trees Z_i (i = 1, …, 1,024 × 1,024/16) in the same way as in watermark embedding. Choose the N_m/rate vector trees located in the central area via the secret key KEY_w2.

(3) For each selected vector tree Z_i, extract the encoded message bit as follows:

$$\sum_{v=1}^{20} Z_{iv}\,w_{0v} \;\underset{b=1}{\overset{b=0}{\gtrless}}\; \sum_{v=1}^{20} Z_{iv}\,w_{1v},$$
(44)

where w_b (b = 0, 1) are as in Eq. (39) (a decoding sketch is given after this list).

(4) After extracting the encoded message bits from all N_m/rate selected vector trees, run the RA decoding to recover the raw message b̂.
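Because w_1 is the all-ones vector and w_0 its negative, the decision rule of Eq. (44) reduces to the sign of each tree's coefficient sum, as the short sketch below illustrates (the array layout is assumed by us).

```python
import numpy as np

def extract_coded_bits(trees):
    """Eq. (44): decide one RA-coded bit per 20-element vector tree Z_i."""
    trees = np.asarray(trees)                    # shape: (number of trees, 20)
    return (trees.sum(axis=1) > 0).astype(int)   # sum > 0 -> b = 1, else b = 0
```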

6. Experimental results and discussion

In this section, we assess the proposed watermarking scheme via experimental simulations. We test 20 grey-level images of size 512 × 512 with different textures. For each test image, we decompose its Fourier magnitude into a three-level DPT pyramid. The first (finest) level is used for template insertion, and the other two levels are used for watermark embedding. The template consists of N_t = 105 random bits and is inserted at positions with normalized radiuses r ∈ {0.3, 0.35, 0.4} and angles θ ∈ {1:1:10, 15:10:95, 100:17:359}, where 1, 10, and 17 denote the secret steps. The watermark is a sequence of 720 bits formed by encoding the N_m = 60 random message bits with the RA code of rate rate = 1/12. The embedding strengths β_t and β_w are adjusted image-by-image such that the peak signal-to-noise ratio (PSNR) is 40 dB. Figure 5 illustrates several watermarked images and demonstrates that the images watermarked by the proposed scheme have feasible visual fidelity. The mean and variance of all PSNRs are 40.01 dB and 0.01, respectively.

Figure 5. Illustration of watermarked images. From left to right, the images are Baboon (PSNR = 40.00 dB), Lena (PSNR = 40.16 dB), and Tank (PSNR = 40.01 dB), respectively.

For the 20 generated watermarked images, we impose geometrical attacks (e.g., rotation, scaling, cropping) and common signal processing attacks (e.g., JPEG compression, additive white Gaussian noise (AWGN), median filtering, convolution filtering). We then deploy the efficient template-matching algorithm in Section 5.2 to achieve rotation and scaling synchronization, where the search spaces for rotation and scaling are set as ϕ ∈ [−180, 180) with step Δ_ϕ = 0.5 and σ ∈ [0.5, 2.0] with step Δ_σ = 0.01, respectively. The performance against these attacks is summarized below.

6.1. Performance against geometrical attacks

As translation invariance is theoretically ensured by taking the Fourier magnitude of the cover image as the DPT input, translation is not assessed further in this paper. We mainly examine the performance against geometrical attacks such as rotation, scaling, cropping, and row/column line removal, which are implemented in practice using StirMark [36, 37].

In StirMark, rotation attacks include rotation without auto cropping, rotation with auto cropping, and rotation with auto cropping and scaling. For these three types of attack, we set the rotation angles as ±2°, ±1°, ±0.75°, ±0.5°, ±0.25°, 45°, and 90°. We then use the efficient template-matching algorithm in Section 5.2 to estimate the rotation angle and scaling factor, followed by geometrical correction and watermark extraction. The experimental simulation shows that the bit error rates (BERs) for all concerned parameters are exactly 0, which demonstrates the high robustness of the proposed scheme to differently implemented rotations.

For scaling attacks in StirMark, we set the scaling factors in the range [0.5, 2.0] with step 0.1. The performance is shown in Figure 6, where the BER is averaged over all 20 test images. It is observed that the proposed scheme achieves BER = 0 for scaling factors from 0.7 to 1.6 and BER < 0.1 for scaling factors from 1.7 to 1.9. However, it is vulnerable to scaling factors of 0.5 and 2. The failure to counter scaling with a factor of 0.5 can be attributed to the loss of 75% of the image information, and the weakness to scaling with a factor of 2 comes from the interpolation approximation via Eq. (36).

Figure 6. Averaged performance against scaling.

Figure 7 summarizes the averaged performance against cropping attacks in the cropping ratio range [40%, 100%] with step 5%, where the ratio is that of the cropped image to the original image. It is found that the proposed scheme achieves BER = 0 for cropping ratios in the range [75%, 100%] and BER < 0.03 for ratios from 60% to 75%. Nevertheless, it is sensitive to cropping with ratios below 60%, which is to be expected as such cropping would lose more than 60% of the image information. In this sense, our scheme has sufficient robustness to cropping.

Figure 7. Averaged performance against cropping.

In addition, we also examine row/column line removal attacks, which are considered to be a kind of geometrical manipulation resulting in local distortions. The frequencies of the removed row and column lines are set in the range [10,100] with step size 10. Simulation results show that the proposed scheme can successfully counteract this attack by achieving BER = 0.

6.2. Performance against common signal processing attacks

Common signal processing attacks include JPEG compression, AWGN, median filtering, and convolution filtering. We first evaluate the performance against JPEG compression. In the simulation, we set the quality factor (QF) range of JPEG compression to be [10, 38] with step size 2. Figure 8 plots the performance averaged over the 20 test images. It is shown that the proposed scheme has excellent performance against JPEG compression with QF ≥ 22, as well as favorable robustness for QF ∈ [16, 22). This preferable performance may be attributed to the redundant representation of the DPT, which is, from a mathematical viewpoint, essentially a frame expansion with promising robustness to added noise.

Figure 8. Averaged performance against JPEG compression.

We further impose AWGN on the aforementioned watermarked images. We set the noise levels for AWGN in StirMark from 1 to 7 and then examine the performance in terms of BER. Figure 9 depicts the averaged performance. It is seen that the BER is 0 for noise levels of less than or equal to 3 and smaller than 0.05 for noise level 4, although it is large for the other cases. This indicates that the proposed scheme has sufficient robustness to AWGN.

Figure 9. Averaged performance against AWGN.

In examining the performance against median filtering (cut), we set the size of the median filter in the range [2, 5] with step 1. The simulation results are summarized in Table 1. It is observed that the proposed scheme has high robustness to median filtering with sizes 2 and 3, but the performance suffers for sizes 4 and 5. This implies that the proposed scheme is moderately, though not fully, robust against median filtering.

Table 1 Averaged performance against median filtering

Finally, we test the robustness of the proposed scheme to convolution filtering, which includes sharpening and Gaussian filtering. The simulation result shows that the average BER for sharpening is exactly 0 and that for Gaussian filtering is 0.143. This demonstrates that the proposed scheme is totally insensitive to sharpening but is not sufficiently robust to Gaussian filtering.

6.3. Computation time evaluation

In this section, we evaluate the computation time of the proposed scheme. As described in Section 5, the proposed scheme consists of message embedding and extraction processes. As the computation time for the message embedding process is much less than that for the message extraction process, we mainly examine the computation time for message extraction.

The message extraction process includes two stages, namely template matching and message recovery. As the former stage takes up most of the computation time of the message extraction process, the analysis below focuses on the computation time of template matching. According to Section 5.2, there are a total of N_ϕ = 360/Δ_ϕ candidate rotation angles and N_σ = (σ_2 − σ_1)/Δ_σ candidate scaling factors. For each candidate rotation angle, all candidate scaling factors must be searched. Therefore, the computational complexity of template matching, in units of N_t-dimensional correlation calculations, is O(N_ϕ N_σ).

To illustrate the computation time of template matching, we perform the following experimental simulation. As set in Section 6.1, we adopt Δ_ϕ = 0.5, σ_1 = 0.5, σ_2 = 2.0, Δ_σ = 0.01, and N_t = 105. Under these settings, we evaluate the computation time of template matching by executing the Matlab code on an Intel personal computer (Intel, Santa Clara, CA, USA) with a 2.2-GHz Core 2 Duo CPU and 2 GB of memory. The computation time averaged over 10 runs is 4.34 s. This implies that although a brute-force searching approach is employed for template matching, its computation time is not as high as intuition might suggest, because only a small number of template points are involved in the matching.
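As a worked count under the settings just stated, the search visits

$$N_\phi = \frac{360}{0.5} = 720, \qquad N_\sigma = \frac{2.0 - 0.5}{0.01} = 150, \qquad N_\phi N_\sigma = 108{,}000$$

candidate parameter pairs, each requiring one correlation of length N_t = 105, which is consistent with the measured running time of a few seconds.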

6.4. Comparison to related schemes

To further evaluate the proposed watermarking scheme, we compare it with those in [21], [38], and [14], which are denoted as PP-AFFINE, WNZH-DMST, and NIKO-SFND for notational convenience, respectively. PP-AFFINE is, as surveyed in [1], a typical template-based watermarking algorithm with high robustness against affine attacks. WNZH-DMST is another template-based watermarking approach incorporating the deformable multi-scale transform (DMST) that is somewhat similar to the DPT. NIKO-SFND is a kind of salient-feature and normalization-based watermarking scheme with excellent performance against both local and global distortions.

We start with the comparison to PP-AFFINE. In this scheme, three 512 × 512-grey images, namely Baboon, Lena, and Boat, are adopted for performance assessment. Both a 60-bit message and a 14-point template are inserted in the Fourier domain, and the resulting watermarked images have PSNRs no greater than 38 dB. To ensure a fair comparison, we also embed a 60-bit message via the proposed scheme and adjust the embedding strength adaptively to make the PSNRs close to 38 dB. We then employ the same evaluation as [21] for performance comparison.

Table 2 summarizes the simulation results for PP-AFFINE and the proposed scheme. It can be observed that both schemes have the same high robustness against enhancement, row/column removal, rotation with auto cropping and scaling, and random geometrical distortions. It is interesting to find that the proposed scheme obtains a significant improvement in performance against JPEG compression, where PP-AFFINE takes the Fourier magnitude of the test image as the cover signal and the proposed scheme inputs the Fourier magnitude of the test image to the DPT to generate DPT subbands as the cover signal. This implies that DPT subbands facilitate the improvement in watermarking robustness. Nevertheless, the proposed scheme is worse than PP-AFFINE in counteracting scaling, cropping, and shearing, which is explained as follows. Compared with PP-AFFINE, the proposed scheme cannot resist against a scaling attack with factor 2, because of the interpolation approximation in Eq. (36). The proposed scheme fails to deal with cropping ratios of 50% and 75%, because at least 75% of the image information has been lost in these two situations. The weakness of the proposed scheme to shearing is to be expected, as it is designed to counteract RST rather than affine transforms. In this sense, we may claim that the proposed scheme is comparable to PP-AFFINE in terms of its performance against RST and common signal processing attacks.

Table 2 Performance comparison between the proposed scheme and PP-AFFINE[21] against the attacks in StirMark 3

We proceed to compare the proposed scheme with WNZH-DMST [38]. For fair comparison, we adopt the same settings as WNZH-DMST. In particular, we test the same five images (i.e., Tank, Globe, Lena, Man, and Zelda) as those in WNZH-DMST. We also insert a 60-bit message sequence in each test image, and let PSNRs of watermarked images be equal to those in WNZH-DMST by slightly adjusting the embedding strength. We then perform common signal processing attacks and geometrical attacks on watermarked images via StirMark 4.1 [36, 37]. The performance comparison between the proposed scheme (denoted as DPT) and WNZH-DMST is summarized in Tables 3 and 4. It is found that the performance of the proposed scheme is comparable to or better than that of WNZH-DMST.

Table 3 Performance against common processing attacks in StirMark 4.1
Table 4 Performances against global geometrical attacks in StirMark 4.1

According to [38], the DMST only has analysis filters and thus is not a pyramid transform. For this reason, WNZH-DMST uses the SPT for template/watermark insertion and message extraction, and adopts the DMST to estimate the rotation angle and scaling factor. As the DMST is similar to the DPT's analysis filters, the template matching of WNZH-DMST is similar to that in the proposed scheme. Recalling the above performance comparison, it is reasonable to claim that the proposed scheme is a promising way to achieve better performance.

In comparison to NIKO-SFND [14], we adopt the same settings as NIKO-SFND for impartial evaluation. In [14], ten 512 × 512 grey images, namely Airplane, Boat, House, Peppers, Splash, Baboon, Couple, Lena, Elaine, and Lake, were used for performance examination. For each image, a 50-bit message was inserted and the PSNR was held at 40 dB. The watermarked images were then polluted with both local and global geometrical attacks and common signal processing attacks. The performance was evaluated by comparing the NIKO-SFND scheme to state-of-the-art approaches belonging to the same category, and NIKO-SFND demonstrated comparable or even better performance than the schemes it was compared with. To ensure a fair comparison, these settings are similarly applied to our scheme, and the performance comparison is then carried out accordingly. In the simulation, we compare with three state-of-the-art schemes, i.e., the SIFT-based NIKO-SFND and the schemes presented by Dong et al. [39] and Tian et al. [40]. The SIFT-based NIKO-SFND is one of the best of its class among versions using different salient features.

1. Local geometrical attacks: According to [14], this type of attack includes row/column line removal, jitter, and cropping. The performance of the proposed scheme against these attacks is summarized in Figures 10, 11, and 12, respectively, where Nikolaidis denotes the NIKO-SFND scheme of [14], as in all other figures below. The BERs shown in the figures are averaged over the ten test images, and the search space (ϕ, σx, σy) (see Section 5.2) is used when evaluating the row/column line removal attack (a crude sketch of this attack is given after Figure 12). It can be seen that the proposed scheme significantly outperforms the three compared approaches in counteracting row/column line removal and cropping, whereas it is remarkably weaker against jitter. This weakness is expected, as jitter attacks are outside the scope of the proposed scheme.

Figure 10

Performance comparison against a row/column line removal attack.

Figure 11

Performance comparison against jitter attacks.

Figure 12

Performance comparison against a band cropping attack.
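
As referenced above, a crude stand-in for the row/column line removal attack is to delete a few randomly chosen rows and columns. The sketch below is an assumption about the attack's effect rather than StirMark's actual implementation; it also illustrates why the receiver must search over (ϕ, σx, σy), since the removal slightly and unevenly shrinks the two image dimensions.

```python
import numpy as np

def remove_lines(img, rows, cols, seed=0):
    """Delete `rows` random rows and `cols` random columns from a grayscale image."""
    rng = np.random.default_rng(seed)
    r = rng.choice(img.shape[0], size=rows, replace=False)
    c = rng.choice(img.shape[1], size=cols, replace=False)
    out = np.delete(img, r, axis=0)
    return np.delete(out, c, axis=1)

attacked = remove_lines(np.zeros((512, 512)), rows=17, cols=5)
print(attacked.shape)   # (495, 507): per-axis scale changes slightly
```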

2. Global geometrical attacks: In [14], the author evaluated the performance of NIKO-SFND against global geometrical attacks such as rotation, scaling, downsampling followed by upsampling, shearing, and general affine transforms. Performance comparisons are given in Figures 13, 14, 15, 16, and 17, respectively (a minimal sketch of the rotation and scaling correction step is given after the discussion of Figure 17). It can be observed from Figure 13 that the proposed scheme successfully estimates and corrects all of the tested rotation angles, thereby outperforming the method of Dong et al., achieving a remarkable improvement over NIKO-SFND, and showing superiority over the approach of Tian et al.

Figure 13

Performance comparison against rotation with auto-cropping.

Figure 14

Performance comparison against scaling.

Figure 15

Performance comparison against downsampling followed by upsampling.

Figure 16

Performance against shearing.

Figure 17

Performance against general affine transform.

Based on the performance against scaling shown in Figure 14, the proposed scheme exhibits a considerable improvement over the three compared schemes for scaling factors of 0.75, 0.9, 1.1, and 1.5. However, it is much worse for scaling factors of 0.5 and 2.

Figure 15 shows that the proposed scheme has high robustness to the downsampling and upsampling pairs (0.5, 1.5), (1.5, 0.5), (0.7, 1.3), and (1.3, 0.7), outperforming the approaches of Dong et al., Tian et al., and Nikolaidis. Nevertheless, these three approaches outperform the proposed scheme for other cases with a high ratio of information loss.

Figures 16 and 17 show that the proposed scheme is vulnerable to shearing and general affine transforms. This is because the proposed scheme is designed to counteract RST attacks rather than general affine transforms. The proposed scheme and the approach of Tian et al. exhibit similarly poor performance here, and both are significantly worse than NIKO-SFND and Dong et al.
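
To make the correction step behind Figures 13, 14, and 15 concrete, the following sketch (assuming SciPy and generic cubic interpolation, not the authors' code) undoes an estimated rotation angle and scaling factor before watermark extraction; the interpolation error incurred here is also one reason extreme scaling factors such as 0.5 and 2 remain difficult.

```python
import numpy as np
from scipy import ndimage

def correct_rst(img, est_angle_deg, est_scale):
    """Undo an estimated rotation and scaling prior to extraction."""
    derotated = ndimage.rotate(img, -est_angle_deg, reshape=False, order=3)
    return ndimage.zoom(derotated, 1.0 / est_scale, order=3)

restored = correct_rst(np.ones((256, 256)), est_angle_deg=15.0, est_scale=1.5)
print(restored.shape)   # roughly (171, 171)
```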

3. Signal processing attacks: The signal processing attacks considered in [14] are JPEG compression, H.264 intra-frame compression, Gaussian noise addition, and low-pass filtering. The performance against these attacks is illustrated in Figures 18, 19, 20, and 21, respectively.

Figure 18

Performance comparison against JPEG compression.

Figure 19

Performance against H.264 intra-frame compression.

Figure 20

Performance comparison against Gaussian noise addition.

Figure 21

Performance comparison against low-pass filtering.

It is found from Figure 18 that the proposed scheme has better performance than the compared schemes for JPEG QFs from 20 to 50 but is weaker for other cases. Because situations with QF < 20 seldom occur in practice, the proposed scheme is more favorable than the three compared approaches in counteracting JPEG compression.
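
For completeness, the JPEG attack of Figure 18 can be approximated by simply re-encoding the watermarked image at the tested quality factors. The sketch below assumes Pillow is available; its quality scale is not guaranteed to coincide exactly with StirMark's.

```python
import io
import numpy as np
from PIL import Image

def jpeg_attack(img_u8, quality):
    """Re-encode a grayscale uint8 image at a given JPEG quality factor."""
    buf = io.BytesIO()
    Image.fromarray(img_u8).save(buf, format="JPEG", quality=int(quality))
    return np.asarray(Image.open(buf))

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
attacked = {qf: jpeg_attack(img, qf) for qf in (20, 30, 40, 50)}
```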

Figure 19 presents the performance comparison against H.264 intra-frame compression. The proposed scheme exhibits robustness comparable to that of the three compared approaches for quality factor values below 25 and significantly better performance for the other cases.

As shown in Figures 20 and 21, the performance against Gaussian noise addition and low-pass filtering is somewhat similar. According to Figure 20, the proposed scheme has significant superiority over the schemes of Tian et al. and Nikolaidis. It is similar to the scheme of Dong et al. for noise variances from 0.001 to 0.005 but is slightly better for a noise variance of 0.006. The low-pass filtering results shown in Figure 21 demonstrate that the robustness of the proposed scheme is higher than that of the schemes presented by Tian et al. and Nikolaidis, while it is equivalent to that of Dong et al.'s scheme.

7. Conclusions

In this paper, we have presented a DPT-based robust image watermarking scheme resilient to rotation, scaling, and translation. We first constructed a DPT with shift-invariance, steerability, and scalability by extending an SPT represented in a closed and polar-separable form. The radial component of the SPT's basis filters was taken as the kernel for designing the scalable basis filters. These were further combined with the steerable basis filters corresponding to the angular components of the SPT's basis filters, resulting in joint scalability and steerability. The shift-invariance was inherited from the SPT by retaining undecimated high-pass and band-pass basis subbands. We also derived interpolation functions for steerability and scalability. These allow the interpolation of any filter (response) at an arbitrary orientation and scale via a linear combination of the DPT's basis filters (responses). By exploiting the characteristics of shift-invariance, steerability, and scalability, we further derived the theoretical synchronization mechanisms for translation, rotation, and scaling.

Based on the constructed DPT with these preferable characteristics, we developed a robust image watermarking scheme that is resilient to translation, rotation, and scaling. Translation invariance is achieved by taking the Fourier magnitude of the cover image as the DPT input. Resilience to rotation and scaling is obtained via the corresponding synchronization mechanisms. At the transmitter, the template and watermark are inserted in the first level of the DPT pyramid and in the other two levels, respectively. At the receiver, the rotation angle and scaling factor are estimated via an efficient template-matching algorithm and then used to correct the rotation and scaling of the received image, after which the watermark is extracted from the corrected image. Extensive simulations show that the proposed scheme is highly robust to geometrical attacks, such as rotation, scaling, translation, cropping, and row/column line removal, as well as to common signal processing attacks such as JPEG compression, AWGN, median filtering, and convolution filtering. In addition, the comparison with several closely related schemes demonstrates that the proposed scheme achieves comparable performance against rotation, scaling, translation, cropping, and row/column line removal attacks, while generally achieving higher robustness to JPEG compression, AWGN, and low-pass filtering.

References

  1. Zheng D, Liu Y, Zhao J, Saddik AE: A survey of RST invariant image watermarking algorithms. ACM Comput. Surv. 2007, 39(2, Article 5):1-91.

  2. Kumar A, Santhi V: A review on geometric invariant digital image water-marking techniques. Int. J. Comp. Appl. 2011, 12(9):31-36.

  3. Bas P, Chassery JM, Macq B: Geometrically invariant watermarking using feature points. IEEE Trans. Image Processing 2002, 11(9):1014-1028. 10.1109/TIP.2002.801587

  4. Wang X, Wang C, Yang Y, Niu P: A robust blind color image watermarking in quaternion Fourier transform domain. J. Syst. Softw. 2013, 86: 255-277. 10.1016/j.jss.2012.08.015

  5. Wang Y, Doherty JF, Van Dyck RE: A rotation, scaling and translation resilient image watermarking algorithm using circular Gaussian filters. In Proc. of the IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing. Baltimore, MD; 2001.

  6. Lichtenauer J, Setyawana I, Kalker T, Lagendijka R: Exhaustive geometrical search and the false positive watermark detection probability. In Proc. of the SPIE-Security and Watermarking of Multimedia Contents V. Volume 5020. Santa Clara, CA; 2003:203-214. 10.1117/12.503186

  7. Barni M: Effectiveness of exhaustive search and template matching against watermark desynchronization. IEEE Trans. Signal Proc. Letters 2005, 12(2):158-161.

  8. O’Ruanaidh JJK, Pun T: Rotation, scale and translation invariant spread spectrum digital image watermarking. Signal Process. 1998, 66(3):303-317. 10.1016/S0165-1684(98)00012-7

  9. Kim HS, Lee H-K: Invariant image watermark using Zernike moments. IEEE Trans. Circuit Syst. Video Technol. 2003, 13(8):766-775. 10.1109/TCSVT.2003.815955

  10. Tang C-W, Hang H-M: A featured-based robust digital image watermarking scheme. IEEE Trans. Signal Processing 2003, 51(4):1123-1129.

  11. Teague M: Image analysis via the general theory of moment. J. Opt. Soc. Am. 1980, 70(8):920-930. 10.1364/JOSA.70.000920

  12. Zhang H, Shu H, Coatrieux G, Zhu J, Wu J, et al.: Affine Legendre moment invariants for image watermarking robust to geometrical distortions. IEEE Trans. Image Process. 2011, PP(99):1055-1068.

  13. Xiang S, Kim H-J, Huang J: Invariant image watermarking based on statistical features in the low-frequency domain. IEEE Trans. Circuit Syst. Video Technol. 2008, 18(6):777-790.

  14. Nikolaidis A: Local distortion resistant image watermarking relying on salient feature extraction. EURASIP J. Adv. Signal Processing 2012, 2012: 97. 10.1186/1687-6180-2012-97

  15. Kutter M: Watermarking resistance to translation, rotation, and scaling. In Proc. of the International Society for Optical Engineering (SPIE): Multimedia Systems Applications. Volume 3528. Boston, MA; 1998:423.

  16. Voloshynovskiy S, Deguillaume F, Pun T: Content adaptive watermarking based on a stochastic multiresolution image modeling. In Proc. of the 10th European Signal Processing Conference (EUSIPCO’2000). Tampere, Finland; 2000:5-8.

  17. Voloshynovskiy S, Deguillaume F, Pun T: Multibit digital watermarking robust against local nonlinear geometrical distortions. In Proc. of the International Conference on Image Processing. Volume 3. Thessaloniki, Greece; 2001:999.

  18. Zheng Z, Wang S, Zhao J: RST invariant image watermarking algorithm with mathematical modeling and analysis of the watermarking processes. IEEE Trans. Image Process. 2009, 18(5):1055-1068.

  19. Tsai J, Huang W, Kuo Y: On the Selection of optimal feature region set for robust digital image watermarking. IEEE Trans. Image Process. 2011, 20(3):735-743.

  20. Tsai J-S, Huang W-B, Kuo Y-H, Horng M-F: Joint robustness and security enhancement for feature-based image watermarking using invariant feature regions. Signal Process. 2012, 92: 1431-1445. 10.1016/j.sigpro.2011.11.033

  21. Pereira S, Pun T: Robust template matching for affine resistant image watermarks. IEEE Trans. Image Processing 2000, 9(6):1123-1129. 10.1109/83.846253

  22. Kang X, Huang J, Shi YQ, Lin Y: A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression. IEEE Trans. Circuit Syst Video Technol 2003, 13(8):776-786. 10.1109/TCSVT.2003.815957

  23. Ni J, Wang C, Huang J: A RST-Invariant robust DWT-HMM watermarking algorithm incorporating Zernike moments and template. In KES2005: Knowledge-Based Intelligent Information & Engineering Systems. Volume 3681. Edited by: Khosla R, Howlett R, Jain LC. Heidelberg: Springer; 2005:1233-1239. 10.1007/11552413_176

  24. Ni J, Zhang R, Huang J, Wang C, Li Q: A rotation-Invariant secure image watermarking algorithm incorporating steerable pyramid transform. In IWDW2006: Digital Watermarking. Volume 4283. 5th edition. Edited by: Shi YQ, Jeon B. Heidelberg: Springer; 2006:446-460. Int’l Workshop on Digital Watermarking, Lecture Notes Computer Science 10.1007/11922841_36

  25. Bogumi D: An asymmetric image watermarking scheme resistant against geometrical distortions. Signal Processing: Image Communication 2006, 21: 59-66. 10.1016/j.image.2005.06.005

  26. Fu YG, Shen R, Lu H: Watermarking scheme based on support vector machine for color images. IEE Electronics Letters 2004, 40(16):986-7. 10.1049/el:20040600

  27. Tsai HH, Sun DW: Color image watermark extraction based on support vector machines. Inf. Sci. 2007, 177(2):550-69. 10.1016/j.ins.2006.05.002

  28. Peng H, Wang J, Wang W: Image watermarking method in multi-wavelet domain based on support vector machines. J. Syst. Softw. 2010, 83(8):1470-7. 10.1016/j.jss.2010.03.006

  29. Simoncelli EP, Freeman WT, Adelson EH, Heeger DJ: Shiftable multi-scale transform. IEEE Trans. Information Theory 1992, 38(2):587-607. 10.1109/18.119725

  30. Freeman WT, Adelson EH: The design and use of steerable filters. IEEE Trans. PAMI 1991, 13(9):891-906. 10.1109/34.93808

  31. Perona P: Deformable kernels for early vision. In Proc. of the third Int. Conf. on Computer Vision and Pattern Recognition (CVPR). Lahaina, Maui; 1991:222-227.

  32. Perona P: Deformable kernels for early vision. IEEE Trans. Pattern Anal. Machine Intelligence 1995, 17(5):488-499. 10.1109/34.391394

  33. Karasaridis A, Simoncelli EP: A filter design technique for steerable pyramid transform. In Proc. of the 21th International Conference on Acoustics, Speech, and Signal Processing. Volume 4. Atlanta, GA; 1996:2387-2390.

  34. Portilla J, Strela V, Wainwright MJ, Simoncelli EP: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing 2003, 12(11):1338-1351. 10.1109/TIP.2003.818640

  35. Divsalar D, Jin H, Mceliece RJ: Coding theorems for turbo-like codes. In Proc of the 36th Annual Allerton Conf. on Communication, Control and Comp. Monticello, IL; 1998:525-539.

  36. Petitcolas FAP: Watermarking schemes evaluation. IEEE Trans. Signal Processing 2000, 17(5):58-64. 10.1109/79.879339

  37. Petitcolas FAP: StirMark. http://www.cl.cam.ac.uk/~fapp2/watermarking/stirmark/. Accessed 16 Feb. 2013

  38. Wang C, Ni J, Zhuo H, Huang J: A geometrically resilient robust image watermarking scheme using deformable multi-scale transform. In Proc. of the Intl. Conf. on Image Processing 2010. Hong Kong; 2010:3677-3680.

  39. Dong P, Brankov JG, Galatsanos NP, Yang Y, Davoine F: Digital Watermarking Robust to Geometric Distortions. IEEE Trans. Image Process. 2005, 14(12):2140-2150.

  40. Tian H, Zhao Y, Ni R, Pan J-S: Spread spectrum-based image watermarking resistant to rotation and scaling using radon transform. In Proc of the Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2010). Darmstadt; 2010:442-445.

Acknowledgments

This work is supported by NSFC (nos. 61202467 and 61100170), the National Research Foundation for the Doctoral Program of Higher Education of China (no. 20120171110037), the Key Program of Natural Science Foundation of Guangdong (no. S2012020011114), and the Scientific Research Foundation for Returned Overseas Chinese Scholars (State Education Ministry).

Author information

Corresponding author

Correspondence to Chuntao Wang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Wang, C., Ni, J. & Zhang, D. Counteracting geometrical attacks on robust image watermarking by constructing a deformable pyramid transform. EURASIP J. Adv. Signal Process. 2013, 119 (2013). https://doi.org/10.1186/1687-6180-2013-119

  • DOI: https://doi.org/10.1186/1687-6180-2013-119

Keywords