
1 Introduction

For a given image, the core task of image compression is to store a small amount of data from which the original can be reconstructed with high accuracy. Homogeneous and biharmonic partial differential equations (PDEs) can restore images with high quality from a small fraction of prescribed image points if their positions and values are carefully optimised [1, 2, 6, 15]. This suggests the viability of linear, parameter-free PDEs for image compression. Unfortunately, all these data optimisation efforts are in vain as long as one cannot store the data efficiently.

Currently, PDE-based compression is dominated by nonlinear models that require parameter optimisation, e.g. edge-enhancing anisotropic diffusion (EED) [25]. In particular, the general purpose codec R-EED by Schmaltz et al. [23] can beat popular transform-based coders like JPEG [18] and JPEG2000 [24]. In contrast, homogeneous diffusion has only been applied to specialised tasks (e.g. cartoon compression in [16]) that do not employ the sophisticated data optimisation methods mentioned above. Moreover, there is no compression codec based on biharmonic inpainting at all. This shows the lack of general purpose codecs with linear inpainting PDEs.

Our Contribution. To fill this gap, we investigate how to embed powerful data optimisation strategies for harmonic and biharmonic inpainting into two codecs: (1) A compression scheme that combines free choice of known pixels by optimal control [2, 6] with tailor-made entropy coding. (2) A stochastic method that restricts pixel selection to a locally adaptive grid, but allows these positions to be stored efficiently as a binary tree. We evaluate how individual restrictions and lossy compression steps of both frameworks affect the performance of harmonic and biharmonic inpainting. In addition, we compare our best methods against the state of the art in PDE-based compression and the quasi-standards in transform-based compression.

Related Work. Homogeneous diffusion has been applied to the compression of specific classes of images. In particular, Mainberger et al. [14] have proposed a highly efficient codec for cartoon images, and there are several successful coders for depth maps [5, 7, 12]. However, unlike our approach, these methods rely primarily on semantic image features such as edges. This choice is motivated by the theoretical results of Belhachmi et al. [1], which suggest choosing known data at locations with large Laplacian magnitude. Köstler et al. [11] apply homogeneous diffusion for real-time video playback on a Playstation 3.

General purpose codecs with PDEs are mainly based on edge-enhancing anisotropic diffusion (EED) [25] and efficient representations of data locations by binary trees. Initially, this class of methods was proposed by Galić et al. [4], while the current state-of-the-art is the R-EED codec by Schmaltz et al. [23]. Modifications and extensions of R-EED include colour codecs [19], 3-D data compression [23], and progressive modes [22].

In addition, there are several works that are closely related to compression but do not consider actual encoding [1, 2, 6, 15]. Instead, they deal with optimal reconstruction from small fractions of given data. We directly use results from the optimal control scheme for harmonic PDEs by Hoeltgen et al. [6] and its biharmonic extension. Our densification approach on restricted point sets is inspired by the stochastic sparsification on unrestricted point sets of Mainberger et al. [15].

Organisation of the Paper. We begin with a short introduction to PDE-based inpainting and the optimisation of spatial and tonal data in Sect. 2. Section 3 proposes solutions to general challenges of PDE-based compression algorithms. These form the foundation for our codecs. In Sect. 4 we describe a new general compression approach for exact masks, and we apply it to the optimal control method of Hoeltgen et al. [6]. Section 5 introduces a stochastic, tree-based method that imposes restrictions on the location of known data. We evaluate both compression frameworks in Sect. 6. Section 7 concludes our paper with a summary and outlook on future work.

2 PDE-Based Inpainting

Image Reconstruction. In PDE-based image compression, we want to store only a small fraction of the original image data and reconstruct the missing image parts. To this end, consider the original image \(f: \varOmega \rightarrow \mathbb {R}\). It maps each point from the rectangular image domain \(\varOmega \) to its grey value. Let us assume that the set of known locations \(K \subset \varOmega \), the inpainting mask, is already given and we want to reconstruct the missing data in the inpainting domain \(\varOmega \setminus K\).

For a suitable differential operator L, we obtain the missing image parts u as the steady state for \(t \rightarrow \infty \) of the evolution that is described by the PDE

$$\begin{aligned} \partial _{t} u = L u \quad \text {on} \ \, \varOmega \setminus K. \end{aligned}$$
(1)

Here, we impose reflecting boundary conditions at the image boundary \(\partial \varOmega \). In addition, the known data is fixed on K, creating Dirichlet boundary conditions \(u=f\). In our work, we consider two different parameter-free choices for the differential operator L. In the simplest case, we apply homogeneous diffusion [9]: \(Lu = \varDelta u = \mathrm {div}(\varvec{\nabla } u)\). Since experiments suggest that the biharmonic operator \(Lu = - \varDelta ^2 u\) may give better reconstructions [2, 4, 23], we consider it as well.

Both operators propagate known information equally in all directions and behave consistently throughout the whole image evolution. In contrast, more complex models in compression are nonlinear and anisotropic: edge-enhancing diffusion (EED) [25] inhibits diffusion across image edges. In principle, this allows EED to obtain more accurate reconstructions from the same amount of known data [4, 23]. However, the price for this increase in quality is a higher computational complexity and the need for parameter optimisation. Note that the compression frameworks that we propose in the following sections work in a discrete setting. To this end, we consider finite difference approximations of the inpainting equation in the same way as in [15].
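To make the discrete setting concrete, the following minimal sketch solves the discrete homogeneous diffusion inpainting problem directly as a sparse linear system with reflecting boundary conditions. It is our own illustration, not the implementation used in [15]; the use of Python/SciPy and all function names are assumptions. The biharmonic case is analogous, with the discrete Laplacian replaced by its square.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def harmonic_inpainting(f, mask):
    """Reconstruct an image from the pixels where mask is True by solving the
    discrete homogeneous diffusion (Laplace) equation with reflecting
    boundary conditions. f: 2-D float array, mask: boolean array."""
    h, w = f.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, vals = [], [], []
    b = np.zeros(h * w)
    for i in range(h):
        for j in range(w):
            k = idx[i, j]
            if mask[i, j]:
                # Dirichlet condition: u = f on the inpainting mask K.
                rows.append(k); cols.append(k); vals.append(1.0)
                b[k] = f[i, j]
            else:
                # Discrete Laplacian = 0; neighbours outside the image are
                # dropped, which realises the reflecting boundary conditions.
                nbs = []
                if i > 0:     nbs.append(idx[i - 1, j])
                if i < h - 1: nbs.append(idx[i + 1, j])
                if j > 0:     nbs.append(idx[i, j - 1])
                if j < w - 1: nbs.append(idx[i, j + 1])
                rows.append(k); cols.append(k); vals.append(float(len(nbs)))
                for m in nbs:
                    rows.append(k); cols.append(m); vals.append(-1.0)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(h * w, h * w))
    return spla.spsolve(A, b).reshape(h, w)
```

The direct sparse solve is only practical for small images; iterative or multigrid solvers are preferable in practice, but the sketch shows the structure of the discrete problem.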

3 From Inpainting to Compression

Spatial and Tonal Optimisation. For a pure inpainting problem, predetermined missing image regions are reconstructed from known data. This is fundamentally different in compression, where the whole image is known and the encoder actively chooses known pixels. Many research results have confirmed that the choice of known data influences the reconstruction significantly [1, 2, 6, 15]. At the same mask density, i.e. the same amount of known data, choosing optimal positions and pixel values is vital for a good reconstruction. In our paper we consider a separate optimisation process of both types of data. First we optimise the locations of data and then perform a grey value optimisation (GVO) step.


Efficient Storage. In PDE-based image compression, the reconstruction capability of the inpainting operator is only one of two important factors for the success of a codec. The other key element is the efficient storage of known data. There is a straightforward trade-off between a desired sparsity of stored points and reconstruction quality. In addition, a codec can influence the final file size by combining multiple other lossless and lossy compression steps. In the following, we discuss different possibilities for such compression steps and how they can be combined with the aforementioned spatial and tonal optimisation.

Entropy Coding is an essential concept of compression that is employed in most successful codecs. It aims at removing redundancy from data, thereby storing it losslessly but with a reduced file size. Huffman coding [8], adaptive arithmetic coding [21], and PAQ [13] have been successfully used in PDE-based compression [23]. In our setting, there are always two different kinds of known data that must be stored: grey values and locations on the pixel grid. In addition, some header information such as file sizes or parameters needs to be stored. Therefore, we choose PAQ for our codecs, since it is a context mixing scheme that can locally adapt to different types of data. PAQ combines models that predict the next bit in a file from a history of already compressed bits. With neural networks, it adapts the weighting of these models to changing content types throughout a single file. Thus, it can be directly applied as an efficient container format for both positional and brightness data.

Quantisation. For brightness data, the performance of the aforementioned entropy coders can be improved by a coarser quantisation. Instead of storing grey values with float precision, we only consider a finite number \(q \le 256\) of different discrete brightness values. Since this introduces an error to the known data, it is a lossy preprocessing step to the lossless entropy coding. In a PDE-based setting, the benefits of grey value optimisation (GVO) can be diminished if such a quantisation is applied afterwards. Therefore, grey values should be optimised under the constraint of the coarse quantisation (see [23]).

To this end, we propose quantisation-aware GVO in Algorithm 1. Since both PDEs we consider are linear, we can use so-called inpainting echoes [15] to speed up the optimisation. For a given mask \(\varvec{c}\) and corresponding grey values \(\varvec{g}\), let \(r(\varvec{c},\varvec{g})\) denote the inpainting result according to Sect. 2. Then, an echo is computed as \(r(\varvec{c},\varvec{e}(i))\), where in the image \(\varvec{e}(i)\) the ith known pixel is set to 1 and all other known data is set to zero. Thereby, each echo represents the influence of a single known data point. As long as the mask \(\varvec{c}\) is constant, we can compute the reconstruction for arbitrary grey values in \(\varvec{g}\) as \(r(\varvec{c}, \varvec{g})=\sum _{i \in K} g_i \, r(\varvec{c}, \varvec{e}(i))\). Note that the echoes only need to be computed once.

During optimisation, a Gauss-Seidel scheme successively updates the grey values at mask positions one by one. The crucial difference to [15] is that we directly quantise the grey values after every update. In our experiments, we observe that this algorithm already converges after only a few iterations over all mask points. Since the inpainting mask remains constant, the echoes can be reused for arbitrary quantisation parameters q. Thus, in contrast to the nonlinear inpainting case in R-EED [23], we are able to optimise q thoroughly and efficiently with the help of a simple grid search.
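To illustrate the procedure, the following sketch gives one possible reading of quantisation-aware GVO with inpainting echoes. It is not the authors' implementation of Algorithm 1; the uniform quantisation levels, the fixed number of sweeps, and all names are our own assumptions. The inpaint argument can be any linear inpainting routine, e.g. the harmonic_inpainting sketch above.

```python
import numpy as np

def quantise(value, q):
    """Round a grey value in [0, 255] to the nearest of q uniformly spaced levels."""
    levels = np.linspace(0.0, 255.0, q)
    return levels[np.argmin(np.abs(levels - value))]

def quantisation_aware_gvo(f, mask, inpaint, q, sweeps=3):
    """Gauss-Seidel grey value optimisation under a q-level quantisation
    constraint, using inpainting echoes (computed only once per mask)."""
    pos = list(zip(*np.nonzero(mask)))
    echoes = []
    for (i, j) in pos:
        e = np.zeros_like(f, dtype=float)
        e[i, j] = 1.0                        # e(i): ith mask pixel set to 1
        echoes.append(inpaint(e, mask))      # echo r(c, e(i))
    g = np.array([quantise(f[i, j], q) for (i, j) in pos])
    u = sum(gk * ek for gk, ek in zip(g, echoes))   # r(c, g) by superposition
    for _ in range(sweeps):
        for k in range(len(pos)):
            rest = u - g[k] * echoes[k]               # reconstruction without pixel k
            opt = np.sum(echoes[k] * (f - rest)) / np.sum(echoes[k] ** 2)
            g_new = quantise(opt, q)                  # quantise directly after update
            u += (g_new - g[k]) * echoes[k]
            g[k] = g_new
    return g, u
```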

The number q of quantised grey values also influences the overall file size, since the entropy coding of the grey values becomes more efficient for smaller numbers of different grey values. Therefore, the file size generally decreases with smaller q. Simultaneously, the error increases, since the constraints on the GVO become stricter. For a given mask, the best trade-off between file size and reconstruction quality must be found, i.e. the inpainting error and the file size have to be minimised simultaneously. Let \(s: \{0,...,255\} \rightarrow \mathbb {N}\) denote the file size in bytes and \(e: \{0,...,255\} \rightarrow \mathbb {R}\) the corresponding mean square error as functions of the quantisation parameter q. By normalising both quantities to the range [0,1] and combining them additively, we define the trade-off coefficient \(\mu \):

$$\begin{aligned} \mu := \frac{s(q)}{s(255)} + \frac{e(q)}{e(255)}. \end{aligned}$$

The smaller this coefficient, the better the trade-off for a given q. Our goal is to find the best q for a given mask. In our algorithms, we minimise \(\mu \) with respect to q in combination with quantisation-aware GVO.
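Once the echoes are available, the grid search over q is inexpensive. The sketch below illustrates it under the assumption that the file size and error for every candidate q have already been measured, e.g. by running the GVO sketch above and the entropy coder for each q; the function name is hypothetical.

```python
def best_quantisation(sizes, errors):
    """Return the q minimising mu = s(q)/s(255) + e(q)/e(255).
    sizes[q]: file size in bytes, errors[q]: mean square error after
    quantisation-aware GVO with q grey levels."""
    s_ref, e_ref = sizes[255], errors[255]
    mu = {q: sizes[q] / s_ref + errors[q] / e_ref for q in sizes}
    return min(mu, key=mu.get)
```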

Efficient Representations of Positional Data. In the previous paragraphs we have covered how compression steps influence tonal optimisation. In the following sections we propose two different approaches to perform the spatial optimisation: in Sect. 4, we allow free choice of point positions on the pixel grid and in Sect. 5 we restrict ourselves to a coarser, locally adaptive mesh. For both cases we discuss efficient storage. Note that for both codecs, decompression follows the same pattern: first we extract the entropy-coded mask data, then we apply inpainting to reconstruct the image.

4 Encoding with Exact Masks

Finding optimal positions on the pixel grid for a fixed amount of mask points is nontrivial. Besides the greedy approaches in [15], the algorithms in [2, 6] find masks by formulating an optimisation problem. As proposed in these papers, a primal-dual scheme can be employed for finding a solution. While [6] focuses on homogeneous diffusion, biharmonic inpainting is considered in [2]. Thus, we use both PDEs to find optimised masks for our compression scheme. Note that we use the variants of both methods that produce binary masks.


An inpainting mask acquired from the aforementioned algorithms is essentially a binary image of the same size as the original image. This suggests using compression schemes like JBIG [10] to store the mask, in particular since it has been successfully applied in edge-based compression [16]. However, JBIG is optimised for binary images with connected structures, since it was primarily intended for use in fax machines. Sparse inpainting masks do not have this property, which makes them hard to compress with JBIG.

Therefore, we have also evaluated other compression methods for binary image data. While block coding schemes [3, 17, 26, 27] and coordinate coding [17] are not competitive with JBIG on their own, they can act as a preprocessing step for entropy coders such as Huffman coding [8], arithmetic coding [21], and PAQ [13]. We found experimentally that a simple block coding scheme [27] in combination with PAQ is a good choice: it reduces the file size by up to \(10\,\%\) in comparison to JBIG (a minimal sketch of such a block coding step is given after the list below). Together with the grey value and quantisation optimisation from Sect. 3, we obtain the following three-step compression for exact masks:

1. Select \(p\,\%\) of total pixels with the optimal control approach [6].
2. Perform GVO with quantisation optimisation (Algorithm 1).
3. Optimise block size for optimal compression with PAQ.
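The following sketch illustrates one simple block coding variant as a preprocessing step before entropy coding. It is our own illustration and is not meant to reproduce the exact scheme of [27]; the block size b is the parameter that would be optimised in step 3.

```python
import numpy as np

def block_code(mask, b=4):
    """Encode a binary inpainting mask as a bit list: one flag bit per b x b
    block (0 = block contains no mask points), followed by the raw bits of
    every non-empty block. For sparse masks, most blocks collapse to a
    single zero bit. The bit list is then passed to the entropy coder."""
    h, w = mask.shape
    bits = []
    for i in range(0, h, b):
        for j in range(0, w, b):
            block = mask[i:i + b, j:j + b]
            if block.any():
                bits.append(1)
                bits.extend(int(v) for v in block.flatten())
            else:
                bits.append(0)
    return bits
```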

5 Encoding with Stochastic Tree-Building

In this section, we pursue an approach that imposes restrictions on the spatial optimisation. To define an adaptive regular grid, we start by storing a fixed point pattern for a given image: its four corner points and its midpoint. We can refine this grid by splitting the image in half along its largest dimension and adding the same point pattern to the two resulting subimages. Each such split partitions the original image into smaller subimages and thereby refines the grid locally.


Compared to the free choice of positions in Sect. 4, this restriction reduces the size of the search space for optimal known data at the potential cost of reconstruction quality. In addition, it offers a lower coding cost for the locations by using binary tree representations. We represent each subimage of a given partition as a leaf node of the tree. For a tree T consisting of nodes \(t_0,\ldots ,t_n\), the root node \(t_0\) stands for the original image and the root's children for the halves of the original. By adding more nodes to the tree, the image corresponding to their parent node is split further. We can consider leaf nodes as indicators for termination, i.e. the subimage corresponding to a leaf node is not split further. This allows us to represent the tree as a bit sequence (0 for leaf nodes, 1 for other nodes). In [23], such an efficient representation of the point positions is obtained with a heuristic method. We propose to use a more powerful stochastic mask selection approach inspired by Mainberger et al. [15] instead.
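The bit-sequence representation itself is straightforward. The following sketch (our own illustration with hypothetical names) serialises and restores such a binary splitting tree in preorder:

```python
class Node:
    """Node of the splitting tree: a leaf has no children, an inner node has
    exactly two children corresponding to the two halves of its subimage."""
    def __init__(self):
        self.children = []

def tree_to_bits(node):
    """Preorder traversal: 1 for an inner (split) node, 0 for a leaf."""
    if not node.children:
        return [0]
    return [1] + tree_to_bits(node.children[0]) + tree_to_bits(node.children[1])

def bits_to_tree(bits):
    """Inverse of tree_to_bits; consumes the bit list from the front."""
    node = Node()
    if bits.pop(0) == 1:
        node.children = [bits_to_tree(bits), bits_to_tree(bits)]
    return node
```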

The original idea in [15] for unrestricted point sets is to start with a full mask that contains all image points. From this mask, we remove a fixed percentage \(\alpha \) of known data. After inpainting with the smaller mask, we add a fraction \(\beta \) of the removed pixels with the highest reconstruction error back to the mask. This sparsification algorithm iterates the aforementioned steps until the target mask density is reached.

If we transfer this concept to a binary tree representation, there are some key differences: We have experimentally determined that densification is more efficient for tree structures than sparsification. Therefore, we start with a small amount of data and iteratively add more points at locations with large error until the target density is reached. In addition, we add nodes to the tree instead of dealing with mask points directly. The tree structure dictates that only subimages corresponding to leaf nodes may be split. Such a split is equivalent to adding two child nodes to the former leaf node, each corresponding to one new subimage. Note that several mask points might be added by a single subdivision and that these mask points can also be shared with neighbouring subimages. For our probabilistic method, we also need an appropriate error computation. In order to avoid a distortion of the influence of each node, we do not consider the mean square error in each subimage, but the sum \(e(t_k)\) of unnormalised squared differences

$$\begin{aligned} e(t_k) = \sum _{(i,j) \in \varOmega _k} (f_{i,j} - u_{i,j})^2 \end{aligned}$$
(2)

where \(\varOmega _k\) denotes the image domain of the subimage corresponding to the tree node \(t_k\). Without this unnormalised error measure, the same per-pixel error in small subimages would be weighted higher than in large subimages. Taking all of these differences into account, we define stochastic tree densification in Algorithm 2. For a target density d, it produces an optimised tree T with a corresponding pixel mask \(C(T) \subset \varOmega \).
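The sketch below gives one possible reading of this densification strategy. It is not the authors' implementation of Algorithm 2; the candidate-set size, the point-pattern helper, and all names are our own assumptions. It reuses a linear inpainting routine such as the harmonic_inpainting sketch from Sect. 2.

```python
import random
import numpy as np

class TreeNode:
    """Splitting-tree node covering the image region [i0, i1) x [j0, j1)."""
    def __init__(self, i0, i1, j0, j1):
        self.i0, self.i1, self.j0, self.j1 = i0, i1, j0, j1
        self.children = []

    def split(self):
        """Halve the region along its larger dimension (adds two children)."""
        if self.i1 - self.i0 >= self.j1 - self.j0:
            m = (self.i0 + self.i1) // 2
            self.children = [TreeNode(self.i0, m, self.j0, self.j1),
                             TreeNode(m, self.i1, self.j0, self.j1)]
        else:
            m = (self.j0 + self.j1) // 2
            self.children = [TreeNode(self.i0, self.i1, self.j0, m),
                             TreeNode(self.i0, self.i1, m, self.j1)]

def leaves(node):
    return [node] if not node.children else [l for c in node.children for l in leaves(c)]

def mask_from_tree(root, shape):
    """Known pixels: four corners and midpoint of every leaf subimage."""
    mask = np.zeros(shape, dtype=bool)
    for t in leaves(root):
        for (i, j) in [(t.i0, t.j0), (t.i0, t.j1 - 1), (t.i1 - 1, t.j0),
                       (t.i1 - 1, t.j1 - 1),
                       ((t.i0 + t.i1) // 2, (t.j0 + t.j1) // 2)]:
            mask[i, j] = True
    return mask

def densify(f, root, inpaint, target_density, n_candidates=30):
    """Split the candidate leaf with the largest error sum e(t_k) (Eq. 2)
    until the mask C(T) reaches the target density d."""
    while mask_from_tree(root, f.shape).mean() < target_density:
        mask = mask_from_tree(root, f.shape)
        sq = (f - inpaint(f, mask)) ** 2            # squared reconstruction error
        splittable = [t for t in leaves(root) if max(t.i1 - t.i0, t.j1 - t.j0) >= 2]
        cand = random.sample(splittable, min(n_candidates, len(splittable)))
        worst = max(cand, key=lambda t: sq[t.i0:t.i1, t.j0:t.j1].sum())
        worst.split()
    return root
```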

Just as with the original sparsification algorithm, there is a risk that Algorithm 2 gets caught in a local minimum. To avoid this problem, we propose an adapted version of the nonlocal pixel exchange from [15] in Algorithm 3. The method from [15] first generates a candidate set containing m non-mask pixels. Afterwards, the n pixels with the largest reconstruction error are exchanged with n randomly chosen mask pixels. If the new mask yields a better reconstruction, it is kept; otherwise, the change is reverted.

In our case, we have to adapt the node selection to respect the tree structure in order to define a nonlocal node exchange. In particular, the candidate set that is removed from the tree can only consist of nodes that are split exactly once, which is the case if and only if both children are leaf nodes. We call these nodes single split nodes; reverting their associated split amounts to removing their children, which turns the single split node back into a leaf node. These modifications lead to Algorithm 3.
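In the same spirit, the following sketch shows how such an exchange step can look. It is a hedged illustration only, building on the TreeNode helpers from the previous sketch; the exact selection rules of Algorithm 3 and the depth constraints of the codec are not reproduced.

```python
def all_nodes(node):
    return [node] + [n for c in node.children for n in all_nodes(c)]

def node_exchange(f, root, inpaint, m=30, n=5):
    """One exchange iteration: revert n randomly chosen single split nodes,
    split the n highest-error leaves from a random candidate set of m leaves,
    and keep the change only if the reconstruction error decreases."""
    def mse():
        u = inpaint(f, mask_from_tree(root, f.shape))
        return float(((f - u) ** 2).mean())
    old_error = mse()
    # Single split nodes: both children are leaves, so the split is revertible.
    single = [t for t in all_nodes(root)
              if t.children and all(not c.children for c in t.children)]
    reverted = random.sample(single, min(n, len(single)))
    for t in reverted:
        t.children = []                               # revert the split
    u = inpaint(f, mask_from_tree(root, f.shape))
    sq = (f - u) ** 2
    splittable = [t for t in leaves(root) if max(t.i1 - t.i0, t.j1 - t.j0) >= 2]
    cand = random.sample(splittable, min(m, len(splittable)))
    added = sorted(cand, key=lambda t: sq[t.i0:t.i1, t.j0:t.j1].sum())[-n:]
    for t in added:
        t.split()
    if mse() > old_error:                             # no improvement: undo
        for t in added:
            t.children = []
        for t in reverted:
            t.split()
    return root
```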

The binary trees obtained from densification and nonlocal node exchange can finally be stored as a sequence of bits. As in [23], we store a maximum and a minimum tree depth and only save the node structure explicitly in between. As header data, only the image size and the number of quantised grey values have to be stored. In total, this leads to the following encoding procedure:

1. Select a fraction p of total pixels with tree densification (Algorithm 2).
2. Optimise the splitting tree with nonlocal node exchange (Algorithm 3).
3. Perform GVO with quantisation optimisation (Algorithm 1).
4. Concatenate header, positional, and grey value data and apply PAQ.

6 Experiments

In the following, we evaluate the capabilities of harmonic and biharmonic inpainting for the two compression methods from the previous sections. Our experiments rely on a set of widely used test images. We start with a pure comparison of the inpainting operators on the test image peppers: We optimise exact and restricted masks with different densities, perform GVO, and compare the mean square error (MSE) at the same mask density. The results in Fig. 1(a) show that, in general, biharmonic inpainting performs better than harmonic inpainting given the same amount of known data. This is consistent with previous results [2]. The restriction of the mask to an adaptive grid has a significant negative impact on the quality. This affects harmonic inpainting more than its biharmonic counterpart.

Fig. 1. Comparisons for the \(256\,\times \,256\) test image peppers. The top row compares harmonic and biharmonic versions of our codecs, the bottom row compares our best methods to transform-based coders and R-EED. (a) Top left: comparison at the same mask density. (b) Top right: comparison at the same compression ratio. (c) Bottom left: low to medium compression ratios. (d) Bottom right: high compression ratios.

Fig. 2. Compression results for peppers (\(256\,\times \,256\) pixels) with compression ratio \(\approx 8\):1.

Table 1. MSE comparison on several test images. For the compression rate of 15:1 we use exact masks with homogeneous inpainting. The high compression rate of 60:1 is obtained with the biharmonic tree codec.

Interestingly, an evaluation of the actual compression performance with the codecs from Sects. 4 and 5 in Fig. 1(b) shows a significantly different ranking than the density comparison. For exact masks, harmonic inpainting can even surpass its biharmonic counterpart. The coding cost for the known data is similar in both cases, but since harmonic inpainting is less sensitive to a coarse quantisation of the grey values, it performs better overall than biharmonic inpainting. The drawbacks of the restrictions in the tree-based approach are attenuated by the reduced positional coding cost. After a break-even point around a ratio of 20:1, the biharmonic tree-based method outperforms both exact approaches.

In relation to transform-based coders, the tree-based method performs consistently better than JPEG and, for compression ratios larger than 35:1, in many cases also outperforms JPEG2000. Nevertheless, R-EED copes better with the restriction to the adaptive grid and remains the preferable choice at high compression ratios. In comparison to other PDE-based methods, linear diffusion performs best in the range of low to medium compression ratios (up to 15:1). Figs. 1, 2 and Table 1 show that it can beat R-EED, mostly outperforms JPEG, and comes close to the quality of JPEG2000. Only on images that contain partially smooth data with high-contrast edges is R-EED significantly better (e.g. on trui and walter). This demonstrates how powerful simple PDEs can be.

7 Conclusion

We have shown that codecs with parameter-free linear inpainting PDEs can beat both the quasi-standard JPEG2000 of transform-based compression and the state of the art in PDE-based compression. Comparing the different inpainting operators yields a valuable general insight: The performance of PDEs for compression can only be evaluated in the context of actual codecs. Comparisons that do not consider all compression steps can lead to false rankings of inpainting operators that do not reflect their real compression capabilities. In particular, the sensitivity of the biharmonic operator to coarsely quantised known data makes the simpler harmonic diffusion the preferable choice for compression.

Currently, the algorithms for mask selection are not competitive with JPEG2000 in terms of runtime. In our ongoing research, we focus on speeding up our mask selection algorithms, such that PDE-based codecs are not only fast for decompression [11], but also for compression. Moreover, we will consider combinations of linear PDEs with patch-based inpainting for highly textured images, since this concept has already proven successful for nonlinear PDEs [20].