Improving feature matching strategies for efficient image retrieval

doi:10.1016/j.image.2017.02.006

Signal Processing: Image Communication

Volume 53, April 2017, Pages 86-94

https://doi.org/10.1016/j.image.2017.02.006

Highlights

• Twin Feature is further discussed and optimized into binary code to reduce the memory requirement.
• Dynamic normalization is designed to adjust the norm of an image when calculating similarity scores.
• An effective image retrieval framework based on SMK, with the proposed methods integrated, is designed.

Abstract

A number of state-of-the-art image retrieval systems have been built upon non-aggregated techniques such as Hamming Embedding (HE) and Selective Match Kernel (SMK). However, the retrieval performance of these techniques is directly affected by the quality of feature matching during the search process. In general, undesirable matches arise mainly from three sources: (1) the locality of local features, (2) quantization errors and (3) the phenomenon of burstiness. In this paper, starting from the framework of SMK, the integration of Twin Feature (TF) and Similarity Maximal Matching (SMM) is studied in depth. Specifically, two modifications based on TF and SMM are proposed to further improve the quality of feature matching. On the one hand, the original float vectors of TF are replaced with binary signatures, which improve efficiency while retaining comparable retrieval accuracy. On the other hand, Dynamic Normalization (DN) is designed to control the impact of the penalization generated by SMM, improving performance at almost no extra cost. Finally, an efficient image retrieval system is designed and implemented on a cloud-based heterogeneous computing framework, using Apache Spark and multiple GPUs, to deal with large-scale tasks. Experimental results demonstrate that the proposed system can greatly refine the visual matching process and improve image retrieval results.

Introduction

Due to the popularity of social networks, images and videos can be captured and shared easily, leading to an explosive growth of multimedia content. As a result, a number of related applications and techniques have been developed to store and process this immense volume of multimedia data. Among these, content-based image retrieval is one of the most challenging tasks [1]. Given a query image, the aim of such systems is to return a series of relevant images from a huge image repository.

Recent years have witnessed promising progress in image retrieval. Most image retrieval frameworks are built upon local-feature-based models such as Bag-of-Words (BoW) [2] and Selective Match Kernel (SMK) [3] due to their effectiveness and efficiency. In such models, local features are first detected from small patches of an image. These patches are of different sizes and contain abundant information about the image. Then, robust algorithms are applied to mine and encode the structure of the patches into feature descriptors. Usually, a large number of descriptors are used to generate a visual vocabulary, and the image can then be converted into several feature vectors by quantizing its descriptors to the visual vocabulary. Thanks to the inverted file indexing structure [4], fast retrieval can be achieved even when dealing with a large number of images.
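The quantization and inverted-file steps described above can be illustrated with a minimal sketch. It is not the system used in this paper: the toy two-dimensional descriptors, the three-word vocabulary and the function names (`nearest_word`, `build_inverted_file`, `query`) are all assumptions chosen for illustration, and real systems use high-dimensional descriptors and vocabularies learned by clustering.

```python
from collections import defaultdict


def nearest_word(descriptor, vocabulary):
    """Return the index of the closest centroid (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(vocabulary):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def build_inverted_file(images, vocabulary):
    """Map each visual word to the list of images containing a descriptor of that word."""
    inverted = defaultdict(list)
    for image_id, descriptors in images.items():
        for descriptor in descriptors:
            inverted[nearest_word(descriptor, vocabulary)].append(image_id)
    return inverted


def query(descriptors, vocabulary, inverted):
    """Vote for every database image that shares a visual word with the query."""
    votes = defaultdict(int)
    for descriptor in descriptors:
        for image_id in inverted.get(nearest_word(descriptor, vocabulary), []):
            votes[image_id] += 1
    # Highest vote count first; ties broken by image id for determinism.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: three visual words and two database images.
VOCABULARY = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
IMAGES = {
    "a": [(0.1, 0.0), (0.9, 1.1)],
    "b": [(5.2, 4.9), (4.8, 5.1)],
}
INVERTED = build_inverted_file(IMAGES, VOCABULARY)
RANKING = query([(0.0, 0.1), (1.0, 0.9)], VOCABULARY, INVERTED)
```

Only the posting lists of the visual words present in the query are visited, which is why the inverted file keeps retrieval fast as the repository grows.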

One of the most crucial steps affecting retrieval performance is feature matching within inverted files, which directly determines the matching score between two images. Unlike words in text, visual features are usually represented by numerical vectors and do not carry particular meanings such as bird, flower or tower, which makes them difficult to match as precisely as text words. Even when advanced algorithms are applied, false positive matches inevitably appear, mainly due to the following issues.

The first issue is the locality of features. Local patches with different semantics may be described similarly and assigned to the same visual word, which dramatically affects the visual matching accuracy. Fig. 1 shows a typical example of a false match between two images, where the corresponding pair of features is so similar that even human beings are likely to be confused if only the limited local patches (within circles) are considered. The second issue is quantization error. Irrelevant descriptors may be assigned to the same visual word when the vocabulary size k is small. Conversely, similar descriptors may be assigned to different visual words when k is very large. Both cases reduce the discriminative power of local descriptors and thus degrade the retrieval accuracy. The third issue is the burstiness [5] phenomenon. Due to the properties of local features as well as the retrieval model, a visual element may appear more than once in an image, which causes multiple matches and distorts the ranking scores of candidate images.
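The effect of burstiness on a plain visual-word vote can be made concrete with a small worked example. The sketch below is a generic illustration under stated assumptions, not the SMM or DN method of this paper: the word ids and the two helper functions (`raw_match_count`, `one_to_one_match_count`) are hypothetical. It compares the raw count, in which every query occurrence of a word matches every database occurrence, with a count capped at one match per feature.

```python
from collections import Counter


def raw_match_count(query_words, database_words):
    """Every query occurrence of a word matches every database occurrence."""
    q, d = Counter(query_words), Counter(database_words)
    return sum(q[word] * d[word] for word in q)


def one_to_one_match_count(query_words, database_words):
    """Each feature may take part in at most one match per visual word."""
    q, d = Counter(query_words), Counter(database_words)
    return sum(min(q[word], d[word]) for word in q)


# Hypothetical visual-word ids. Word 7 is a repeated element (e.g. windows).
QUERY = [7, 7, 7, 12, 40]
DISTRACTOR = [7, 7, 7, 7, 3]   # shares only the bursty word 7
RELEVANT = [7, 12, 40, 55]     # shares three distinct words

RAW_SCORES = (raw_match_count(QUERY, DISTRACTOR), raw_match_count(QUERY, RELEVANT))
CAPPED_SCORES = (
    one_to_one_match_count(QUERY, DISTRACTOR),
    one_to_one_match_count(QUERY, RELEVANT),
)
```

With these toy inputs the raw count favours the distractor (12 against 5) purely because one repeated word produces 3 × 4 matches, whereas the capped count scores both images equally (3 against 3).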

Accordingly, this work investigates how to enhance the performance of image retrieval by refining the process of feature matching. Three major contributions are made:

• The technique of Twin Feature (TF) [6] is reviewed and made more efficient with binary signatures, which retain the favorable structure of inverted files and provide higher retrieval efficiency.
• A technique named Dynamic Normalization (DN) is designed to control the impact of the penalization generated by Similarity Maximal Matching (SMM) [6], which is used to resolve burstiness, and thus to improve the final retrieval accuracy.
• An effective image retrieval system based on the state-of-the-art SMK [3], with TF, SMM and DN integrated, is designed.
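To show why binary signatures are cheaper to store and compare than float vectors, the following sketch binarizes a vector by per-dimension thresholds and compares signatures by Hamming distance, in the general spirit of Hamming Embedding. It is an assumption-laden illustration and not the construction of the binary Twin Feature proposed here: the thresholds, the four-dimensional vectors and the function names are hypothetical.

```python
def binarize(vector, thresholds):
    """Pack one bit per dimension into an int: the bit is 1 where value > threshold."""
    signature = 0
    for value, threshold in zip(vector, thresholds):
        signature = (signature << 1) | (1 if value > threshold else 0)
    return signature


def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def is_match(a, b, max_distance):
    """Accept a candidate match only if the signatures are close enough."""
    return hamming(a, b) <= max_distance


# Hypothetical thresholds (e.g. per-dimension medians learned offline).
THRESHOLDS = [0.5, 0.5, 0.5, 0.5]
SIG_A = binarize([0.9, 0.1, 0.8, 0.2], THRESHOLDS)  # bits 1010
SIG_B = binarize([0.7, 0.2, 0.6, 0.9], THRESHOLDS)  # bits 1011
SIG_C = binarize([0.1, 0.9, 0.3, 0.8], THRESHOLDS)  # bits 0101
```

Each signature occupies one bit per dimension instead of one float, and the comparison reduces to an XOR followed by a bit count, so signatures can be stored alongside the entries of an inverted file at small cost.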

In short, this work aims to further improve feature matching for image retrieval and can be considered an extension of our previous work [6]. Both TF and SMM are further discussed and improved. Moreover, to meet escalating computational demands, the cloud-based heterogeneous computing framework for large-scale image retrieval [7] is employed to carry out the proposed methods. Experimental results on two benchmark image retrieval datasets, Oxford5k¹ and Paris6k², demonstrate the effectiveness of the proposed methods.

The rest of this paper is organized as follows. Section 2 gives an overview of related works. The details of the proposed image retrieval system are presented in Section 3. In Section 4, the experimental results are presented and analyzed. Finally, Section 5 concludes this paper.

Section snippets

Related work

Recently, a number of efforts have been made to improve image retrieval by extracting more discriminative features, optimizing the quantization process, further investigating the attributes of features and designing better indexing or matching strategies. Popular local feature detectors, including Difference-of-Gaussian (DoG) [8] and Hessian-Affine [9], have been proposed to find local image patches containing abundant structural information. These patches are often described by local

Proposed image retrieval system

In this section, the proposed image retrieval system is introduced. The hardware architecture of our system is built upon a GPU-based heterogeneous cluster to accelerate large-scale multimedia data mining, as discussed in [7]. Fig. 2 illustrates the workflow of the proposed image retrieval system, in which the Twin Feature (TF) and its binary version TFb, Similarity Maximal Matching (SMM) and Dynamic Normalization (DN) are integrated, which significantly improve the final

Experimental results

In this section, we first briefly introduce the implementation details of our image retrieval system. Then, a number of evaluations are presented to analyze and compare the performance achieved by the proposed methods.

Conclusion

In this paper, an in-depth study of the integration of Twin Feature and Similarity Maximal Matching into the state-of-the-art retrieval model SMK is presented and discussed. Two novel methods based on them are then proposed to further improve the performance of image retrieval. The first is the Twin Feature with binary signatures, which retains the favorable structure of inverted files and provides high retrieval efficiency. The second is called Dynamic Normalization, which is designed

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61472281 and 61622115, and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. GZ2015005).

References (42)

G. Salton et al., Term-weighting approaches in automatic text retrieval, Inf. Process. Manag. (1988)
Y. Chen et al., Ranking consistency for image matching and object retrieval, Pattern Recognit. (2014)
R. Datta, J. Li, J.-Z. Wang, Content-based image retrieval: approaches and trends of the new age, in: 7th ACM SIGMM...
J. Sivic, A. Zisserman, Video google: a text retrieval approach to object matching in videos, in: Proceedings of...
G. Tolias, Y. Avrithis, H. Jégou, To aggregate or not to aggregate: selective match kernels for image search, in:...
H. Jégou, M. Douze, C. Schmid, On the burstiness of visual elements, in: Proceedings of CVPR’09, 2009, pp....
L. Wang, H. Wang, Twin feature and similarity maximal matching for image retrieval, in: Proceedings of ICMR’15, 2015,...
H. Wang et al., CHCF: a cloud-based heterogeneous computing framework for large-scale image retrieval, IEEE Trans. Circuits Syst. Video Technol. (2015)
D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
K. Mikolajczyk et al., Scale & affine invariant interest point detectors, Int. J. Comput. Vis. (2004)
H. Bay, T. Tuytelaars, L.V. Gool, SURF: speeded up robust features, in: Proceedings of ECCV’06, 2007, pp....
J. Heinly, E. Dunn, J.-M. Frahm, Comparative evaluation of binary features, in: Proceedings of ECCV’12, 2012, pp....
W. Zhou, H. Li, M. Wang, Q. Lu, Q. Tian, Binary SIFT: Towards efficient feature matching verification for image search,...
S. Zhang et al., USB: ultrashort binary descriptor for fast visual matching and retrieval, IEEE Trans. Image Process. (2014)
H. Jégou, M. Douze, C. Schmid, Hamming embedding and weak geometric consistency for large scale image search, in:...
Y. Cao, C. Wang, Z. Li, L. Zhang, L. Zhang, Spatial-bag-of-features, in: Proceedings of CVPR’10, 2010, pp....
Y. Avrithis et al., Hough pyramid matching: speeded-up geometry re-ranking for large scale image retrieval, Int. J. Comput. Vis. (2014)
P. Xu et al., Nested-SIFT for efficient image matching and retrieval, IEEE Multimed. (2013)
E. Zhang, M. Mayo, Improving bag-of-words model with spatial information, in: Proceedings of IVCNZ’10, 2010, pp....
E. Roman-Rangel, S. Marchand-Maillet, Bag-of-visual-phrases via local contexts, in: Proceedings of ACPR’13, 2013, pp....
D. Nister, H. Stewenius, Scalable recognition with a vocabulary tree, in: Proceedings of CVPR’06, 2006, pp....

