Improving feature matching strategies for efficient image retrieval
Introduction
Owing to the popularity of social networks, images and videos can be captured and shared easily, leading to explosive growth of multimedia content. As a result, many applications and techniques have been developed to store and process this immense volume of multimedia data. Among them, content-based image retrieval is one of the most challenging tasks [1]: given a query image, such a system aims to return a ranked list of relevant images from a huge image repository.
Recent years have witnessed promising progress in image retrieval. Most image retrieval frameworks are built upon local-feature-based models such as Bag-of-Words (BoW) [2] and the Selective Match Kernel (SMK) [3] because of their effectiveness and efficiency. In such models, local features are first detected from small patches of an image; these patches vary in size and carry rich information about the image. Robust algorithms are then applied to encode the structure of each patch into a feature descriptor. Usually, a large number of descriptors are used to generate a visual vocabulary, and an image can then be converted into feature vectors by quantizing its descriptors against that vocabulary. Thanks to the inverted file indexing structure [4], retrieval remains fast even over a large number of images.
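The quantize-then-index pipeline described above can be illustrated with a toy sketch. It assumes a visual vocabulary has already been learned (e.g. by k-means) and scores images with a plain, unnormalized tf-idf dot product in the spirit of BoW [2]; the function names and data are hypothetical, and practical systems use approximate nearest-neighbor assignment over far larger vocabularies.

```python
import numpy as np
from collections import defaultdict

def quantize(descriptors, vocabulary):
    """Assign each descriptor (n x d) to its nearest visual word (k x d); returns n word ids."""
    # Squared Euclidean distance between every descriptor and every centroid.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_inverted_file(image_descriptors, vocabulary):
    """Map visual word id -> posting list of (image id, term frequency)."""
    inverted = defaultdict(list)
    for image_id, desc in image_descriptors.items():
        words, counts = np.unique(quantize(desc, vocabulary), return_counts=True)
        for w, c in zip(words, counts):
            inverted[int(w)].append((image_id, int(c)))
    return inverted

def query(query_desc, vocabulary, inverted, n_images):
    """Rank database images; only the posting lists of the query's words are visited."""
    scores = defaultdict(float)
    words, q_counts = np.unique(quantize(query_desc, vocabulary), return_counts=True)
    for w, qc in zip(words, q_counts):
        postings = inverted.get(int(w), [])
        if not postings:
            continue
        idf = np.log(n_images / len(postings))  # rarer words weigh more
        for image_id, c in postings:
            scores[image_id] += qc * c * idf ** 2
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Images that share no visual word with the query never appear in the result, which is where the efficiency of the inverted file comes from.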
One of the most crucial steps affecting retrieval performance is feature matching within inverted files, since it directly determines the matching score between two images. Unlike words in text, visual features are represented by numerical vectors and carry no particular meaning such as bird, flower or tower, which makes them hard to match as precisely as text words. Even with advanced algorithms, false positive matches are inevitable, mainly because of the following three issues.
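To make concrete how feature matching within inverted files determines an image-level score, the following is a minimal sketch of the selective match kernel of SMK [3], on which the proposed system is built. Each descriptor is represented by its L2-normalized residual to its visual word, only descriptor pairs sharing a visual word are compared, a selective function σ_α(u) = sign(u)|u|^α (zero unless u > τ) suppresses weak matches, and the sum is normalized by each image's self-similarity. The defaults α = 3 and τ = 0 are illustrative assumptions, and TF, SMM and DN are not modeled here.

```python
import numpy as np

def selective(u, alpha=3.0, tau=0.0):
    """Selective function: sign(u)|u|^alpha where u > tau, otherwise 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u > tau, np.sign(u) * np.abs(u) ** alpha, 0.0)

def residuals(desc, words, vocabulary):
    """L2-normalized residuals r(x) = (x - q(x)) / ||x - q(x)||."""
    r = desc - vocabulary[words]
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    return r / np.maximum(norms, 1e-12)  # guard against zero residuals

def smk_raw(words_x, res_x, words_y, res_y, alpha=3.0, tau=0.0):
    """Unnormalized kernel: sum of selective similarities over pairs sharing a word."""
    total = 0.0
    for c in np.intersect1d(words_x, words_y):
        rx = res_x[words_x == c]
        ry = res_y[words_y == c]
        total += selective(rx @ ry.T, alpha, tau).sum()
    return total

def smk(words_x, res_x, words_y, res_y, alpha=3.0, tau=0.0):
    """Normalized kernel: raw score divided by sqrt(K(X, X) * K(Y, Y))."""
    kxx = smk_raw(words_x, res_x, words_x, res_x, alpha, tau)
    kyy = smk_raw(words_y, res_y, words_y, res_y, alpha, tau)
    return smk_raw(words_x, res_x, words_y, res_y, alpha, tau) / np.sqrt(kxx * kyy)
```

Raising τ discards more weak pairs, trading some true matches for fewer false positives.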
The first issue is the locality of features. Local patches with different semantics may be described similarly and assigned to the same visual word, which can severely degrade matching accuracy. Fig. 1 shows a typical false match between two images: the corresponding pair of features is so similar that even a human observer could be confused if only the local patches (circled) were considered. The second issue is quantization error. When the vocabulary size k is small, unrelated descriptors may be assigned to the same visual word; when k is very large, similar descriptors may be split across different visual words. Both cases reduce the discriminative power of local descriptors and thus degrade retrieval accuracy. The third issue is the burstiness phenomenon [5]. Owing to the properties of local features and of the retrieval model, a visual element may appear more than once in an image, producing multiple matches that distort the ranking scores of candidate images.
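Burstiness can be illustrated with toy term-frequency histograms. The sketch below compares cosine similarities with and without power-law (square-root) damping of the counts, a simple and commonly used remedy; it is not the SMM or DN method proposed in this paper, and [5] studies more refined match-level weightings. The histograms are hypothetical.

```python
import numpy as np

def bow_similarity(h1, h2, power=1.0):
    """Cosine similarity of two term-frequency histograms after raising counts to `power`.

    power=1.0 keeps raw counts; power=0.5 (square root) damps repeated visual words.
    """
    v1 = np.power(np.asarray(h1, dtype=float), power)
    v2 = np.power(np.asarray(h2, dtype=float), power)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Hypothetical 4-word histograms. The query repeats word 0 (e.g. a window pattern);
# the relevant image shares words 0-2, while the irrelevant "bursty" image only
# repeats word 0.
query = [6, 1, 1, 1]
relevant = [1, 1, 1, 0]
bursty = [6, 0, 0, 0]

print(bow_similarity(query, relevant), bow_similarity(query, bursty))            # ~0.74, ~0.96: burst wins
print(bow_similarity(query, relevant, 0.5), bow_similarity(query, bursty, 0.5))  # ~0.86, ~0.82: relevant wins
```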
This work investigates how to enhance image retrieval performance by refining the feature matching process. Its three major contributions are as follows:
- The Twin Feature (TF) technique [6] is revisited and made more efficient with binary signatures; this binary version retains the favorable structure of inverted files and provides higher retrieval efficiency.
- A technique named Dynamic Normalization (DN) is designed to control the impact of the penalization generated by Similarity Maximal Matching (SMM) [6] when resolving burstiness, thereby improving the final retrieval accuracy.
- An effective image retrieval system is built on the state-of-the-art SMK [3], integrating TF, SMM and DN.
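The first contribution relies on binary signatures, which let matching inside each posting list reduce to cheap Hamming comparisons. This excerpt does not specify how TFb builds its signatures, so the following sketch assumes a generic scheme, similar in spirit to the binarized SMK variants of [3]: each residual component is binarized by its sign, and the Hamming distance h over B bits is mapped to a similarity u = 1 - 2h/B in [-1, 1]. The function names are hypothetical.

```python
import numpy as np

def binary_signature(residual):
    """Binarize a residual vector by component sign and pack the bits into bytes."""
    bits = (np.asarray(residual) > 0).astype(np.uint8)
    return np.packbits(bits)  # zero-padded to a multiple of 8 bits

def hamming(sig_a, sig_b):
    """Number of differing bits between two packed signatures of equal length."""
    return int(np.unpackbits(np.bitwise_xor(sig_a, sig_b)).sum())

def binary_similarity(sig_a, sig_b, n_bits):
    """Map Hamming distance h to a similarity in [-1, 1]: u = 1 - 2h / n_bits."""
    return 1.0 - 2.0 * hamming(sig_a, sig_b) / n_bits
```

For example, two 8-bit signatures that differ in 2 bits give u = 1 - 4/8 = 0.5. Storing a few bytes per feature instead of a real-valued residual is what keeps the posting lists compact.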
Generally speaking, this work aims to further improve feature matching for image retrieval and can be considered an extension of our previous work [6]: both TF and SMM are further discussed and improved here. Moreover, to meet escalating computational demands, the proposed methods are run on the cloud-based heterogeneous computing framework for large-scale image retrieval [7]. Experimental results on two benchmark image retrieval datasets, Oxford5k and Paris6k, demonstrate the effectiveness of the proposed methods.
The rest of this paper is organized as follows. Section 2 gives an overview of related work. The details of the proposed image retrieval system are presented in Section 3. In Section 4, the experimental results are presented and analyzed. Finally, Section 5 concludes this paper.
Related work
Recently, many works have sought to improve image retrieval by extracting more discriminative features, optimizing the quantization process, further investigating the attributes of features, and designing better indexing or matching strategies. Popular local feature detectors, including Difference-of-Gaussian (DoG) [8] and Hessian-Affine [9], have been proposed to find local image patches that contain rich structural information. These patches are often described by local
Proposed image retrieval system
In this section, the proposed image retrieval system is introduced. Its hardware architecture is built upon a GPU-based heterogeneous cluster that accelerates large-scale multimedia data mining, as discussed in [7]. Fig. 2 illustrates the workflow of the proposed image retrieval system, in which the Twin Feature (TF) and its binary version TFb, Similarity Maximal Matching (SMM) and Dynamic Normalization (DN) are integrated, which significantly improve the final
Experimental results
In this section, we first briefly introduce the implementation details of our image retrieval system. Then, a number of evaluations are presented to analyze and compare the performance achieved by the proposed methods.
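Results on Oxford5k and Paris6k are conventionally reported as mean average precision (mAP). The paper's exact evaluation script is not part of this excerpt, so the following is a simplified, non-interpolated AP sketch under that assumption. It does not handle the "junk" images that the official Oxford protocol excludes from the ranking, and the official evaluation code may compute the precision-recall area slightly differently.

```python
def average_precision(ranked_ids, positives):
    """Non-interpolated AP: mean of the precision at the rank of each relevant image."""
    positives = set(positives)
    if not positives:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)

def mean_average_precision(rankings, ground_truth):
    """mAP over queries; both arguments map a query id to a list of image ids."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in ground_truth]
    return sum(aps) / len(aps)
```

For a ranking ["a", "x", "b", "y"] with relevant images {"a", "b"}, precision is 1 at rank 1 and 2/3 at rank 3, so AP = (1 + 2/3) / 2 = 5/6.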
Conclusion
In this paper, an in-depth study of integrating Twin Feature and similarity maximal matching into the state-of-the-art retrieval model SMK is presented and discussed. Then, two novel methods based on them are proposed to further improve the performance of image retrieval. The first is the Twin Feature with binary signatures, which retains the favorable structure of inverted files and provides high retrieval efficiency. The second, called dynamic normalization, is designed
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61472281 and 61622115, and the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. GZ2015005).
References (42)
- et al., Term-weighting approaches in automatic text retrieval, Inf. Process. Manag. (1988).
- et al., Ranking consistency for image matching and object retrieval, Pattern Recognit. (2014).
- R. Datta, J. Li, J.-Z. Wang, Content-based image retrieval: approaches and trends of the new age, in: 7th ACM SIGMM...
- J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos, in: Proceedings of...
- G. Tolias, Y. Avrithis, H. Jégou, To aggregate or not to aggregate: selective match kernels for image search, in:...
- H. Jégou, M. Douze, C. Schmid, On the burstiness of visual elements, in: Proceedings of CVPR’09, 2009, pp....
- L. Wang, H. Wang, Twin feature and similarity maximal matching for image retrieval, in: Proceedings of ICMR’15, 2015,...
- et al., CHCF: a cloud-based heterogeneous computing framework for large-scale image retrieval, IEEE Trans. Circuits Syst. Video Technol. (2015).
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004).
- et al., Scale & affine invariant interest point detectors, Int. J. Comput. Vis. (2004).
- USB: ultrashort binary descriptor for fast visual matching and retrieval, IEEE Trans. Image Process.
- Hough pyramid matching: speeded-up geometry re-ranking for large scale image retrieval, Int. J. Comput. Vis.
- Nested-SIFT for efficient image matching and retrieval, IEEE Multimed.