Improving feature matching strategies for efficient image retrieval

https://doi.org/10.1016/j.image.2017.02.006

Highlights

  • Twin Feature is studied further and encoded as binary code to reduce memory requirements.

  • Dynamic normalization is designed to adjust the norm of an image when calculating similarity scores.

  • An effective image retrieval framework is designed on top of SMK, with the proposed methods integrated.

Abstract

A number of state-of-the-art image retrieval systems are built upon non-aggregated techniques such as Hamming Embedding (HE) and Selective Match Kernel (SMK). However, the retrieval performance of these techniques depends directly on the quality of feature matching during the search process. In general, undesirable matches arise mainly from three sources: (1) the locality of local features, (2) quantization errors and (3) the phenomenon of burstiness. In this paper, starting from the framework of SMK, the integration of Twin Feature (TF) and Similarity Maximal Matching (SMM) is studied in depth. Specifically, two modifications based on TF and SMM are proposed to further improve the quality of feature matching. First, the original float vectors of TF are replaced with binary signatures, which achieve relatively high efficiency with comparable retrieval accuracy. Second, Dynamic Normalization (DN) is designed to control the impact of the penalization generated by SMM and improve performance at almost no extra cost. Finally, an efficient image retrieval system is designed and implemented on a cloud-based heterogeneous computing framework, using Apache Spark and multiple GPUs, to handle large-scale tasks. Experimental results demonstrate that the proposed system can greatly refine the visual matching process and improve image retrieval results.

Introduction

Due to the popularity of social networks, images and videos can be captured and shared easily, leading to an explosive growth of multimedia content. As a result, a number of related applications and techniques have been developed to store and process this immense volume of multimedia data. Among these, content-based image retrieval is one of the most challenging tasks [1]. Given a query image, the aim of such systems is to return a series of relevant images from a huge image repository.

Recent years have witnessed promising progress in image retrieval. Most image retrieval frameworks are built upon local-feature-based models such as Bag-of-Words (BoW) [2] and Selective Match Kernel (SMK) [3] due to their effectiveness and efficiency. In such models, local features are first detected from small patches of an image. These patches are of different sizes and contain abundant information about the image. Then, robust algorithms are applied to mine and encode the structure of the patches into feature descriptors. Usually, a large number of descriptors are used to generate a visual vocabulary, and an image can then be converted into several feature vectors by quantizing its descriptors to the visual vocabulary. Thanks to the inverted file indexing structure [4], fast retrieval can be achieved even over a large number of images.
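The quantize-then-index pipeline described above can be illustrated with a minimal sketch. It is not the system studied in this paper: the toy 2-D descriptors, the three-word vocabulary, the match-counting score and the function names `nearest_word`, `build_inverted_file` and `search` are all assumptions made for illustration.

```python
from collections import defaultdict


def nearest_word(descriptor, vocabulary):
    """Return the index of the closest visual word (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))


def build_inverted_file(images, vocabulary):
    """Map each visual word to the ids of images holding a descriptor quantized to it."""
    inverted = defaultdict(list)
    for image_id, descriptors in images.items():
        for d in descriptors:
            inverted[nearest_word(d, vocabulary)].append(image_id)
    return inverted


def search(query_descriptors, inverted, vocabulary):
    """Score images by the number of postings sharing a visual word with the query."""
    scores = defaultdict(int)
    for d in query_descriptors:
        # Only the posting list of the query descriptor's visual word is visited.
        for image_id in inverted.get(nearest_word(d, vocabulary), []):
            scores[image_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: three visual words and two indexed images.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
images = {"a": [(0.1, 0.0), (0.9, 1.1)], "b": [(5.1, 4.9)]}
inverted = build_inverted_file(images, vocabulary)
print(search([(0.0, 0.1), (1.0, 0.9)], inverted, vocabulary))  # [('a', 2)]
```

Because only the posting lists of the query's visual words are visited, image "b" is never touched in this query, which is the property that makes inverted files fast on large collections.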

One of the steps that most affects retrieval performance is feature matching within inverted files, which directly determines the matching score between two images. Unlike words in text, visual features are usually represented by numerical vectors and do not carry particular meanings such as bird, flower or tower, which makes them hard to match as precisely as text words. False positive matches appear inevitably, even when advanced algorithms are applied, mainly due to the following issues.

The first is the locality of features. Local patches with different semantics may be described similarly and assigned to the same visual word, which dramatically reduces visual matching accuracy. Fig. 1 shows a typical example of a false match between two images, where the corresponding pair of features is so similar that even a human observer is likely to be confused if only the limited local patches (within circles) are considered. The second issue is quantization error. Irrelevant descriptors may be assigned to the same visual word when the vocabulary size k is small. Conversely, similar descriptors may be assigned to different visual words when k is very large. Both cases reduce the discriminative power of local descriptors and thus degrade retrieval accuracy. The third is the burstiness phenomenon [5]. Due to the properties of local features as well as the retrieval model, a visual element may appear more than once in an image, which causes multiple matches and distorts the ranking scores of candidate images.
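The effect of burstiness on a raw match count can be seen in a small sketch. This is a generic illustration and not the SMM method of [6]: the visual word ids, the two scoring functions and the one-to-one capping rule are assumptions chosen only to show how a repeated visual element inflates a score.

```python
from collections import Counter


def raw_match_count(query_words, db_words):
    """Count every (query feature, database feature) pair that shares a visual word."""
    db_counts = Counter(db_words)
    return sum(db_counts[w] for w in query_words)


def one_to_one_match_count(query_words, db_words):
    """Let each feature take part in at most one match per visual word."""
    q_counts, db_counts = Counter(query_words), Counter(db_words)
    return sum(min(q_counts[w], db_counts[w]) for w in q_counts)


# Hypothetical visual word ids: word 7 is a repeated ("bursty") element.
query = [7, 7, 7, 3]
bursty_image = [7, 7, 7, 7, 9]
plain_image = [7, 3, 1, 2]

print(raw_match_count(query, bursty_image))         # 12
print(raw_match_count(query, plain_image))          # 4
print(one_to_one_match_count(query, bursty_image))  # 3
print(one_to_one_match_count(query, plain_image))   # 2
```

With raw counting, the bursty image scores three times higher than the plain one although it shares only a single distinct visual word with the query. Capping matches narrows that gap, which is the kind of correction burstiness-aware matching strategies aim for.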

This work investigates how to enhance the performance of image retrieval by refining the process of feature matching. It makes three major contributions:

  • The technique of Twin Feature (TF) [6] is reviewed and made more efficient with binary signatures, which retain the favorable structure of inverted files and provide higher retrieval efficiency.

  • A technique named Dynamic Normalization (DN) is designed to control the impact of the penalization generated by Similarity Maximal Matching (SMM) [6] when resolving burstiness, and thus improve the final retrieval accuracy.

  • An effective image retrieval system is designed on the state-of-the-art SMK [3], with TF, SMM and DN integrated.
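To show why binary signatures are cheap to store and compare, the following sketch thresholds a float vector into a bit string and matches two signatures by Hamming distance, in the general style of Hamming Embedding. How the binary TF signatures are actually constructed is not described here, so the thresholds, the 4-bit length and the names `binarize`, `hamming` and `is_match` are all assumptions.

```python
def binarize(values, thresholds):
    """Pack one bit per dimension into an int: 1 if the value exceeds its threshold."""
    bits = 0
    for value, threshold in zip(values, thresholds):
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return bin(sig_a ^ sig_b).count("1")


def is_match(sig_a, sig_b, max_distance):
    """Accept a pair of features only if their signatures are close enough."""
    return hamming(sig_a, sig_b) <= max_distance


# Hypothetical 4-dimensional vectors and per-dimension thresholds.
thresholds = [0.5, 0.5, 0.5, 0.5]
sig_a = binarize([0.9, 0.1, 0.7, 0.2], thresholds)  # bits 1010
sig_b = binarize([0.8, 0.4, 0.2, 0.1], thresholds)  # bits 1000

print(hamming(sig_a, sig_b))      # 1
print(is_match(sig_a, sig_b, 1))  # True
```

Each signature occupies one bit per dimension instead of one float, and the comparison reduces to an XOR followed by a bit count.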

Generally speaking, this work aims to further improve feature matching for image retrieval and can be considered an extension of our previous work [6]. Both TF and SMM are further discussed and improved. Moreover, to meet escalating computational demands, the cloud-based heterogeneous computing framework for large-scale image retrieval [7] is employed to carry out the proposed methods. Experimental results on two benchmark image retrieval datasets, Oxford5k and Paris6k, demonstrate the effectiveness of the proposed methods.

The rest of this paper is organized as follows. Section 2 gives an overview of related works. The details of the proposed image retrieval system are presented in Section 3. In Section 4, the experimental results are presented and analyzed. Finally, Section 5 concludes this paper.

Related work

Recently, a number of works have sought to improve image retrieval by extracting more discriminative features, optimizing the quantization process, further investigating the attributes of features and designing better indexing or matching strategies. Popular local feature detectors, including Difference-of-Gaussian (DoG) [8] and Hessian-Affine [9], have been proposed to find local image patches containing abundant structural information. These patches are often described by local

Proposed image retrieval system

In this section, the proposed image retrieval system is introduced. The hardware architecture of our system is built upon a GPU-based heterogeneous cluster to accelerate large-scale multimedia data mining, as discussed in [7]. Fig. 2 illustrates the workflow of the proposed image retrieval system, in which the Twin Feature (TF) and its binary version TFb, Similarity Maximal Matching (SMM) and Dynamic Normalization (DN) are integrated which significantly improve the final

Experimental results

In this section, we first briefly introduce our implementation details for image retrieval. We then present a number of evaluations to analyze and compare the performance achieved by the proposed methods.

Conclusion

In this paper, an in-depth study of the integration of Twin Feature and Similarity Maximal Matching on the state-of-the-art retrieval model SMK is presented and discussed. Two novel methods based on them are then proposed to further improve the performance of image retrieval. The first is the Twin Feature with binary signatures, which retains the favorable structure of inverted files and provides high retrieval efficiency. The second is called Dynamic Normalization, which is designed

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61472281 and 61622115, and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. GZ2015005).

References (42)

  • G. Salton et al., Term-weighting approaches in automatic text retrieval, Inf. Process. Manag. (1988)
  • Y. Chen et al., Ranking consistency for image matching and object retrieval, Pattern Recognit. (2014)
  • R. Datta, J. Li, J.-Z. Wang, Content-based image retrieval: approaches and trends of the new age, in: 7th ACM SIGMM...
  • J. Sivic, A. Zisserman, Video google: a text retrieval approach to object matching in videos, in: Proceedings of...
  • G. Tolias, Y. Avrithis, H. Jégou, To aggregate or not to aggregate: selective match kernels for image search, in:...
  • H. Jégou, M. Douze, C. Schmid, On the burstiness of visual elements, in: Proceedings of CVPR’09, 2009, pp....
  • L. Wang, H. Wang, Twin feature and similarity maximal matching for image retrieval, in: Proceedings of ICMR’15, 2015,...
  • H. Wang et al., CHCF: a cloud-based heterogeneous computing framework for large-scale image retrieval, IEEE Trans. Circuits Syst. Video Technol. (2015)
  • D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • K. Mikolajczyk et al., Scale & affine invariant interest point detectors, Int. J. Comput. Vis. (2004)
  • H. Bay, T. Tuytelaars, L.V. Gool, SURF: speeded up robust features, in: Proceedings of ECCV’06, 2007, pp....
  • J. Heinly, E. Dunn, J.-M. Frahm, Comparative evaluation of binary features, in: Proceedings of ECCV’12, 2012, pp....
  • W. Zhou, H. Li, M. Wang, Q. Lu, Q. Tian, Binary SIFT: Towards efficient feature matching verification for image search,...
  • S. Zhang et al., USB: ultrashort binary descriptor for fast visual matching and retrieval, IEEE Trans. Image Process. (2014)
  • H. Jégou, M. Douze, C. Schmid, Hamming embedding and weak geometric consistency for large scale image search, in:...
  • Y. Cao, C. Wang, Z. Li, L. Zhang, L. Zhang, Spatial-bag-of-features, in: Proceedings of CVPR’10, 2010, pp....
  • Y. Avrithis et al., Hough pyramid matching: speeded-up geometry re-ranking for large scale image retrieval, Int. J. Comput. Vis. (2014)
  • P. Xu et al., Nested-SIFT for efficient image matching and retrieval, IEEE Multimed. (2013)
  • E. Zhang, M. Mayo, Improving bag-of-words model with spatial information, in: Proceedings of IVCNZ’10, 2010, pp....
  • E. Roman-Rangel, S. Marchand-Maillet, Bag-of-visual-phrases via local contexts, in: Proceedings of ACPR’13, 2013, pp....
  • D. Nister, H. Stewenius, Scalable recognition with a vocabulary tree, in: Proceedings of CVPR’06, 2006, pp....