Image super-resolution via 2D tensor regression learning

https://doi.org/10.1016/j.cviu.2014.11.005

Highlights

  • Presenting a novel framework based on a 2D tensor regression learning model.

  • Proposing three types of regularization to assist the model learning.

  • Implementing the optimization algorithms for each specific model.

  • Conducting comprehensive experiments on testing images.

  • Demonstrating the high performance & efficiency of reconstruction.

Abstract

Among example-based learning methods for image super-resolution (SR), the mapping function between a high-resolution (HR) image and its low-resolution (LR) version plays a critical role in the SR process. This paper presents a novel framework based on a 2D tensor regression learning model for single image SR reconstruction. From a statistical point of view, the matching relationship between an HR image patch and its LR counterpart can be efficiently represented in tensor spaces. Specifically, we define a generalized 2D tensor regression framework between HR and LR image patch pairs to learn a set of tensor coefficients that capture the statistical dependency between HR and LR patches. Imposing different constraint terms on the framework yields an interesting interpretation of the linear mapping function relating the LR and HR image patch spaces. Finally, the HR image is synthesized from a set of patches of one LR input image under the learned tensor regression model. Experimental results show that our algorithm generates HR images that are competitive with or even superior to images produced by other similar SR methods in both PSNR (peak signal-to-noise ratio) and visual quality.

Introduction

As a software-based resolution enhancement technique, image super-resolution (SR) aims to recover high-resolution (HR) images from low-resolution (LR) input images [27]. SR processing is often desirable for low-cost imaging devices with resolution limitations [13], in applications such as mobile phones, satellite imaging, video surveillance, microscopy, and digital mosaicing. With the assistance of SR, we can provide high-quality images that match the growing capability of modern HR displays. The basic idea behind SR is to fuse a single LR image, or a sequence of noisy, blurred LR images, to produce an HR image or sequence. In many cases, multiple LR images of the same scene offer more information for recovering an HR image of the scene. In practice, however, only a few LR images, or even a single one, may be available, so an SR algorithm that recovers the HR image from a single LR input is more practical [25], [39]. In this paper, we only consider the case where the input is a single LR image.

In general, the SR task is cast as the inverse problem [7] of recovering the original HR image by fusing the observed LR images. The inverse problem is formulated under the following generic model,

    y = Hx,

where y is the observed LR image (vectorized) and x is the unknown HR image (vectorized). The matrix H represents the imaging system, consisting of several processes such as blurring and down-sampling. However, finding x from y based on the above model is severely ill-posed: the LR images carry insufficient information, so the solution satisfying the reconstruction constraint is not unique.
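The ill-posedness of this model can be illustrated numerically. The following is a minimal sketch, assuming a 3 × 3 box blur and 2× decimation as stand-ins for the unspecified blurring and down-sampling operators in H; the function names `box_blur` and `degrade` are hypothetical. It builds H explicitly for a tiny image and shows that H has far fewer rows than columns, so many HR images map to the same LR observation.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Blur with a k x k box filter using edge-replicated padding (assumed blur)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def degrade(hr: np.ndarray, scale: int = 2, k: int = 3) -> np.ndarray:
    """Apply y = Hx: blur the HR image, then keep every `scale`-th pixel."""
    return box_blur(hr, k)[::scale, ::scale]

rng = np.random.default_rng(0)
hr = rng.random((8, 8))          # toy HR image x
lr = degrade(hr, scale=2)        # observed LR image y

# Because `degrade` is linear, applying it to each unit image yields the
# columns of H (row-major vectorization of both x and y).
n = hr.size
H = np.stack(
    [degrade(e.reshape(hr.shape), scale=2).ravel() for e in np.eye(n)],
    axis=1,
)

print(lr.shape)   # (4, 4)
print(H.shape)    # (16, 64): 16 equations for 64 unknowns
print(np.allclose(H @ hr.ravel(), lr.ravel()))  # True
```

Since H here has 16 rows and 64 columns, its null space is at least 48-dimensional, which is why priors or learned mappings are needed to pick one HR image among the many consistent with y.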

Generally speaking, existing super-resolution methods can be roughly categorized into three types: interpolation-based [5], reconstruction-based [22], and learning-based [3] approaches. Interpolation-based techniques have their roots in sampling theory, and the HR image is recovered directly by interpolating the LR input. These approaches tend to blur high-frequency details, resulting in noticeably smooth images with ringing and jagged artifacts, particularly along edges; however, they remain popular due to their computational simplicity. In reconstruction-based approaches, the SR problem is cast as an inverse problem [7] of recovering an HR image based on a reasonable observation model that maps the HR image to the LR image(s), together with prior knowledge [6] about HR images. In this process, many kinds of regularization [24] are incorporated into the model as prior knowledge to stabilize the inversion of this ill-posed problem; see [11], [30]. However, the performance of these approaches is only acceptable for small upscaling factors, which has led to the development of example-based learning approaches [12] that aim to learn the co-occurrence prior between local HR and LR image structures from an external training database [3], [29].

As the contemporary approach to super-resolution, learning-based methods have produced promising results in reported experiments. In [3], Chang et al. adopted the philosophy of Locally Linear Embedding (LLE) from manifold learning to recover the high-resolution image given its low-resolution counterpart as input. Yang et al. [37] proposed a learning-based super-resolution scheme based on sparse representation, and Gao et al. [13] implemented sparse dictionary learning for image super-resolution using the state-of-the-art Restricted Boltzmann Machine (RBM). Fundamentally, all of these approaches rely on an assumed linear (possibly locally linear) relationship between the LR and HR pairs. To better exploit the underlying context information of an image, Yang et al. [39] modeled the relationship between the HR and LR patches by learning similar textural context for image sparse representation. To enhance the performance of image restoration (IR), Dong et al. [10] introduced two adaptive regularization terms, i.e., a piecewise autoregressive (AR) model and non-local (NL) self-similarity, into the adaptive sparse representation framework. To further improve the capability of sparse-representation-based IR, the authors [9] proposed the concept of sparse coding noise and recast the IR goal as suppressing the sparse coding noise.

On the other hand, other researchers have proposed SR techniques that exploit nonlinear relations. For example, Gaussian process (GP) based regression models have been used to learn such nonlinear mappings at the pixel level [16]. Although the resulting HR image is pleasing, the computational overhead is large because all the Gaussian models have to be recalculated for each pixel of each input image. Extending their work in [37], the authors proposed a bilevel optimization model for coupled dictionary training under the sparse representation framework [36]. For image SR, they considered the case where the mapping function may take nonlinear forms. The experimental results showed that the new learning method outperforms their earlier joint dictionary training method, both quantitatively and qualitatively.

In the above techniques for learning the mapping function, the HR/LR image patches are all vectorized. Thus some important spatial information among pixels tends to be lost in the vectorization process. To exploit such spatial information effectively, an appropriate feature representation for image patches is desired. A 2D tensor [1] is an effective representation for images that does not damage pixel spatial relationships. Jia et al. [18] proposed a Bayesian framework to perform face image super-resolution for recognition in tensor space. Furthermore, to explore the spatial local information effectively and avoid the curse of dimensionality, Wu et al. [34] proposed a regression model in the tensorPCA subspace for face super-resolution reconstruction. They learn the tensor subspaces for the high-resolution images and their low-resolution counterparts separately; however, it is more desirable to learn the matching relations between LR and HR images.
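The difference between the vectorized and the 2D tensor view can be made concrete with a standard identity. The sketch below assumes one common bilinear form, Y = U X Vᵀ, purely for illustration (the patch sizes and the names `U`, `V` are assumptions, not the paper's exact model). It checks that acting on the rows and columns of a patch is equivalent to a Kronecker-structured linear map on the vectorized patch, and compares parameter counts against an unstructured linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 3   # LR patch size (assumed)
p, q = 6, 6   # HR patch size (assumed)

X = rng.standard_normal((m, n))   # LR patch kept as a 2D array
U = rng.standard_normal((p, m))   # acts on the rows of X
V = rng.standard_normal((q, n))   # acts on the columns of X

def vec(A: np.ndarray) -> np.ndarray:
    """Column-stacking vectorization."""
    return A.reshape(-1, order="F")

# Bilinear (2D tensor) mapping applied directly to the patch.
Y_bilinear = U @ X @ V.T

# Same mapping on the vectorized patch: vec(U X V^T) = (V kron U) vec(X).
Y_vectorized = np.kron(V, U) @ vec(X)

same = np.allclose(vec(Y_bilinear), Y_vectorized)
params_bilinear = U.size + V.size       # p*m + q*n
params_full = (p * q) * (m * n)         # unstructured map on vectors

print(same)                             # True
print(params_bilinear, params_full)     # 36 324
```

The structured map needs 36 parameters here where a general linear map between the vectorized patches needs 324, which is one way to see how a 2D representation can ease the dimensionality problem.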

Currently, tensor learning methods are drawing considerable attention [15], [41]. Motivated by this popular tool, in this paper we propose a generalized 2D tensor regression learning framework for single image super-resolution that learns tensor coefficients for HR and LR image patch pairs simultaneously. The motivation follows from the importance of the mapping function in learning-based SR: the SR quality largely depends on whether the mapping function can represent the underlying relation between HR and LR pairs well. Meanwhile, it is well known that some patches in a natural image recur many times, not only within the same scale but also across different scales. This observation motivates us to better exploit the relationship between HR/LR patch pairs for the image SR task. Unlike existing vectorization-based methods, the pixel spatial information is preserved when image patches are represented as tensorial data. By imposing different constraint terms, we obtain three kinds of 2D tensor regression learning tasks for image SR.

Our main contributions are summarized as follows. First, by taking advantage of the tensorial representation, we propose a general 2D tensor regression learning framework to learn a mapping function for image super-resolution. Second, to stabilize the solution of the 2D tensor learning task, we further impose three different regularization terms on the generic model, i.e., an orthogonal constraint, a squared 2-norm constraint, and a non-negative constraint, and leverage the power of this combination for image super-resolution.

The remainder of the paper is organized as follows. In Section 2, a generalized 2D tensor regression learning framework is proposed to learn a mapping function for the single image SR problem, followed by a detailed description of the algorithm in Section 3. In Section 4, we discuss how to apply the learned mapping function to single image SR. Extensive experimental results for image SR and their analysis are reported in Section 5, where our results show that the proposed methods are quantitatively and qualitatively competitive with or even superior to existing interpolation-based and learning-based SR approaches. Finally, conclusions are drawn in Section 6.

Section snippets

2D tensor regression learning framework

To review some related learning-based SR techniques and introduce our proposed model, we first introduce some notation for tensor algebra that will be used throughout this paper. Matrices are denoted by capital letters, e.g., X; vectors by boldface lowercase letters, e.g., x; and scalars by lowercase letters, e.g., x. Tensors are denoted by Euler script calligraphic letters, e.g., X.

In [15], Guo et al. considered the following linear regression

2D tensor learning algorithm description

In this section, we detail how to solve the 2D tensor-learning SR optimization problems along with different constraints on the parameter matrices U and V.
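As a rough illustration of how two parameter matrices U and V can be estimated jointly, the sketch below alternates between two ridge-regularized least-squares problems for the assumed model Y_i ≈ U X_i Vᵀ. This is not the authors' algorithm: the model form, the Frobenius-norm penalty (standing in for a squared 2-norm regularizer), the function name `fit_bilinear`, and all parameter values are assumptions made for illustration, and the orthogonal and non-negative variants are not covered.

```python
import numpy as np

def fit_bilinear(Xs, Ys, lam=1e-3, n_iter=30, seed=0):
    """Alternating ridge least squares for Y_i ~ U @ X_i @ V.T (assumed model).

    Xs: (N, m, n) array of LR patches; Ys: (N, p, q) array of HR patches.
    """
    _, m, n = Xs.shape
    _, p, q = Ys.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((p, m))
    V = rng.standard_normal((q, n))
    for _ in range(n_iter):
        # Fix V and solve for U: Y_i ~ U @ A_i with A_i = X_i @ V.T.
        A = Xs @ V.T                                   # (N, m, q)
        YA = np.einsum("ipq,imq->pm", Ys, A)           # sum_i Y_i A_i^T
        AA = np.einsum("imq,ikq->mk", A, A)            # sum_i A_i A_i^T
        U = np.linalg.solve(AA + lam * np.eye(m), YA.T).T
        # Fix U and solve for V: Y_i^T ~ V @ B_i^T with B_i = U @ X_i.
        B = U @ Xs                                     # (N, p, n)
        YB = np.einsum("ipq,ipn->qn", Ys, B)           # sum_i Y_i^T B_i
        BB = np.einsum("ipn,ipk->nk", B, B)            # sum_i B_i^T B_i
        V = np.linalg.solve(BB + lam * np.eye(n), YB.T).T
    return U, V

# Synthetic check: generate patch pairs from known U0, V0 and refit.
rng = np.random.default_rng(1)
U0 = rng.standard_normal((6, 3))
V0 = rng.standard_normal((6, 3))
Xs = rng.standard_normal((200, 3, 3))
Ys = U0 @ Xs @ V0.T

U, V = fit_bilinear(Xs, Ys)
rel_err = np.linalg.norm(U @ Xs @ V.T - Ys) / np.linalg.norm(Ys)
print(rel_err)
```

Each sub-problem is convex with a closed-form solution, which is the usual reason for alternating over U and V rather than optimizing both at once. Note that U and V are only identifiable up to a reciprocal scale factor, so the fitted product, not the individual matrices, should be compared with the ground truth.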

Image SR via learned mapping function

In this section, we discuss how to perform the patchwise SR recovery with the learned mapping function. At the training stage, we collect a large number of HR and LR image pairs to learn the relationship between them efficiently. To represent the mapping accurately, we first assign each training patch pair to a cluster. Specifically, we apply a high-pass filter to each HR patch to produce the feature used for clustering, similar to [9], [10]. Given {X_i, Y_i}_{i=1}^{N} patches, we
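The clustering step described above can be sketched as follows. The specific high-pass filter (patch minus its 3 × 3 local mean), the use of plain k-means, the number of clusters, and the helper names `high_pass` and `kmeans` are all assumptions made for illustration; the source does not specify them in this excerpt.

```python
import numpy as np

def high_pass(patch: np.ndarray) -> np.ndarray:
    """Subtract a 3x3 local mean from the patch (an assumed high-pass filter)."""
    h, w = patch.shape
    padded = np.pad(patch, 1, mode="edge")
    local_mean = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return patch - local_mean

def kmeans(feats: np.ndarray, k: int, n_iter: int = 20, seed: int = 0):
    """Plain Lloyd's k-means; returns the last assignments and the centers."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every feature to every center.
        d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members):                 # keep the old center if empty
                centers[c] = members.mean(axis=0)
    return labels, centers

rng = np.random.default_rng(0)
hr_patches = rng.random((100, 6, 6))         # toy HR training patches
feats = np.stack([high_pass(pt).ravel() for pt in hr_patches])
labels, centers = kmeans(feats, k=4)

print(feats.shape)     # (100, 36)
print(centers.shape)   # (4, 36)
```

With the patch pairs grouped this way, a separate mapping can be learned per cluster, so that each mapping only has to model patches with similar high-frequency structure.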

Experimental results

To investigate the performance of the proposed SR schemes based on our 2D tensor regression learning model, we conducted several single image super-resolution experiments using the proposed method and other existing methods. Except for Parthenon, the test images are all 256 × 256, a size that is popular for validating image super-resolution performance in the literature [7], [24], [37]. An observed LR image is synthesized by simply down-sampling an HR image at a certain scaling
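Quantitative comparisons of this kind are reported in PSNR. A minimal sketch of the standard PSNR computation follows, assuming 8-bit images with a peak value of 255; the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    diff = reference.astype(float) - estimate.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

ref = np.zeros((4, 4))
off_by_one = ref + 1.0        # every pixel differs by exactly 1, so MSE = 1

print(psnr(ref, ref))                     # inf
print(round(psnr(ref, off_by_one), 2))    # 48.13
```

Higher values indicate a reconstruction closer to the ground-truth HR image; because PSNR is a pixelwise measure, it is usually reported alongside visual comparisons.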

Conclusion

In this paper, we proposed a generalized 2D tensor regression learning framework for single image super-resolution reconstruction, which can efficiently preserve 2D pixel spatial information between HR and LR images. Instead of using a linear regression model to represent the relationship between vectorized HR/LR image patch pairs, we directly regress the HR patch on the LR patch using a multiple 2D tensor regression learning model subject to different constraints. Once the appropriate model

Acknowledgments

Junbin Gao’s work is supported by the Australian Research Council (ARC) through Grant DP130100364. The work of the first and third authors is supported by NSF China under Grant No. 61201392.

References (41)

  • Michal Irani et al., Improving resolution by image registration, CVGIP: Graph. Models Image Process. (1991)
  • Deng Cai, Xiaofei He, Jiawei Han, Subspace learning based on tensor analysis, Technical report, Computer Science...
  • Deng Cai et al., Graph regularized non-negative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: IEEE Conference on Computer Vision...
  • P.L. Combettes et al., Fixed-Point Algorithms for Inverse Problems in Science and Engineering (2011)
  • S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, Soft edge smoothness prior for alpha channel super resolution, in: IEEE...
  • Shengyang Dai et al., Softcuts: a soft edge smoothness prior for color image super-resolution, IEEE Trans. Image Process. (2009)
  • I. Daubechies et al., An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. (2004)
  • Chris H.Q. Ding et al., Convex and seminonnegative matrix factorizations, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • Weisheng Dong et al., Nonlocal centralized sparse representation for image restoration, IEEE Trans. Image Process. (2013)
  • Weisheng Dong et al., Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. (2011)
  • S. Farsiu et al., Fast and robust multiframe super-resolution, IEEE Trans. Image Process. (2004)
  • William T. Freeman et al., Example-based super-resolution, IEEE Comput. Graph. Appl. (2002)
  • Junbin Gao, Yi Guo, Ming Yin, Restricted Boltzmann machine approach to couple dictionary training for image...
  • Daniel Glasner, Shai Bagon, Michal Irani, Super-resolution from a single image, in: ICCV, 2009, pp....
  • W. Guo et al., Tensor learning for regression, IEEE Trans. Image Process. (2012)
  • He He, Wan-Chi Siu, Single image super-resolution using Gaussian process regression, in: Proceedings of IEEE Conference...
  • Kui Jia, Shaogang Gong, Multi-modal tensor face for simultaneous super-resolution and recognition, in: Proceedings of...
  • Hyunsoo Kim et al., Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method, SIAM J. Matrix Anal. Appl. (2008)
  • Daniel D. Lee et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)

    This paper has been recommended for acceptance by C.V. Jawahar.
