Discriminative Supervised Hashing for Cross-Modal Similarity Search☆
Introduction
The recent explosion of multimedia data, such as images, text, video, and audio, has increased the demand for retrieval applications that are efficient, effective, and low in storage cost. Hashing has received much attention in information retrieval and related areas because of its fast retrieval speed. Among the many hashing methods [[1], [2], [3], [4], [5], [6], [7], [8]], minimal loss hashing (MLH) [2] is a framework based on the latent structural SVM, while kernel-based supervised hashing (KSH) [3], supervised discrete hashing with point-wise labels (SDH) [1], and scalable discrete hashing with pairwise supervision (COSDISH) [7] have been shown to deliver reasonable retrieval performance. However, the above methods were designed for the unimodal setting and are not directly applicable to cross-modal retrieval.
Cross-modal retrieval is an appealing scenario: given an image, for example, one can retrieve semantically relevant texts from the database. However, it is hard to directly measure the similarity between different modalities. To tackle this problem, most existing methods [[9], [10], [11], [12], [13], [14]] focus on finding a common subspace in which the heterogeneous data can be compared. For instance, the main idea of Inter-Media Hashing (IMH) [10] is that two points from the same neighborhood should be as close as possible in the common subspace. Semi-Paired Discrete Hashing (SPDH) [13] explores the common latent subspace by constructing a cross-view similarity graph. Fusion Similarity Hashing (FSH) [9] learns the hashing function by preserving the fusion similarity. However, the hashing codes learned by these unsupervised methods have weak discrimination ability. Benefiting from the discriminative information provided by category labels, supervised hashing methods [[15], [16], [17], [18]] often improve retrieval accuracy. Cross View Hashing (CVH) [17] aims to minimize the Hamming distance between data objects belonging to the same class in a common Hamming space. Semantic Correlation Maximization (SCM) [15] learns discriminative binary codes based on the cosine similarity between semantic label vectors. Supervised Matrix Factorization Hashing (SMFH) [18] integrates graph regularization into the hashing learning framework. However, these methods tend to learn hashing by preserving inter-modal and intra-modal similarities and cannot ensure that the learned hashing codes are semantically discriminative. In fact, it is very important for cross-modal similarity search that samples with the same label have similar binary codes. Moreover, the computational cost of the inter-modal and intra-modal similarities is relatively high.
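As the discussion above notes, once both modalities are embedded in a common Hamming space, cross-modal retrieval reduces to ranking database codes by Hamming distance to the query code. The following toy sketch illustrates this; the code length, database size, and random codes are invented for illustration and are not part of any method in this paper.

```python
import numpy as np

# Toy illustration: binary codes for 5 database items (e.g. images) and one
# query from the other modality (e.g. a text), all in a shared Hamming space.
rng = np.random.default_rng(0)
image_codes = rng.integers(0, 2, size=(5, 16))  # 5 database items, 16-bit codes
text_query = rng.integers(0, 2, size=16)        # query code from the other modality

# Hamming distance = number of differing bits.
dists = np.sum(image_codes != text_query, axis=1)
ranking = np.argsort(dists)  # indices of database items, nearest first
print(ranking)
```

In practice the codes are packed into machine words and the distance is computed with XOR and popcount, which is what makes Hamming ranking so fast.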
To tackle this problem, we propose a DSH model which integrates classifier learning and label-consistent matrix factorization into the hashing learning framework. Furthermore, kernelized hash functions are learned for out-of-sample extension. Fig. 1 illustrates the overall framework of the proposed DSH. Compared with Ref. [19], our framework exploits the shared structure of each category. The main contributions of the DSH method are as follows:
1. To learn more discriminative binary codes, DSH learns unified binary codes by combining classifier learning with label-consistent matrix factorization.
2. DSH learns hashing functions for each modality by employing the kernel method, which can capture the nonlinear structural information of the objects.
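The second contribution, kernelized hash functions, can be sketched as h(x) = sign(Pᵀk(x)), where k(x) collects kernel values between x and a set of anchor points. The sketch below uses an RBF kernel; the anchors, projection P, and bandwidth are placeholders and are not learned as in DSH.

```python
import numpy as np

def kernel_hash(X, anchors, P, sigma=1.0):
    """Map samples X to binary codes via RBF kernel features and a projection P.

    k_ij = exp(-||x_i - a_j||^2 / (2 sigma^2)); codes = sign(K @ P).
    """
    sq_dists = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # (n, m)
    K = np.exp(-sq_dists / (2 * sigma ** 2))                         # kernel features
    return np.sign(K @ P)                                            # values in {-1, 0, +1}

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))      # 6 samples, 8-dim features (invented data)
anchors = X[:3]                  # anchors sampled from the data
P = rng.normal(size=(3, 4))      # projection to 4-bit codes (would be learned)
codes = kernel_hash(X, anchors, P)
print(codes.shape)
```

The kernel features make the hash function nonlinear in the input while the learning problem stays linear in P, which is the usual appeal of the kernel trick in hashing.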
The rest of this paper is organized as follows. Our model and optimization algorithm are presented in Section 2. Section 3 reports experimental results on three publicly available datasets. Finally, conclusions are drawn in Section 4. The source code of the proposed DSH is available.
Notation and problem statement
Suppose that O = [o1, o2, …, on] is a set of n training instances, each consisting of m paired modalities. X(m) = [x1(m), x2(m), …, xn(m)] ∈ Rdm×n denotes the m-th modality, where xi(m) is the i-th sample of X(m) with dimension dm. L = [l1, l2, …, ln] ∈ Rc×n is a label matrix, where c denotes the number of categories. lik is the k-th element of li; lik = 1 if the i-th instance belongs to the k-th category and lik = 0 otherwise. Here, an instance oi can be classified into multiple categories. Without loss of generality, data
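The label matrix L ∈ Rc×n defined above stores one indicator column per instance, with multiple ones in a column when an instance has several labels. A minimal sketch, with invented category assignments:

```python
import numpy as np

# Column i of L is the multi-hot indicator vector of instance i.
labels = [[0], [2], [0, 1]]   # instance 2 belongs to two categories
c, n = 3, len(labels)

L = np.zeros((c, n))
for i, cats in enumerate(labels):
    for k in cats:
        L[k, i] = 1           # l_ik = 1 iff instance i is in category k

print(L)
```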
Datasets
Wiki [22] contains 2866 multimedia documents harvested from Wikipedia. Every document consists of an image paired with a text description, and every pair is assigned to one of 10 categories. We take a subset of the pairs to form the training set and use the rest as the test set.
The MirFlickr25k dataset [23] is collected from the Flickr website. It consists of 25,000 image-text pairs, and each pair is assigned to one or more of 20 categories. We keep 20,015 pairs which have at least 20
Conclusion
In this paper, we propose a new model (DSH) which integrates subspace learning, classifier learning and the basis matrix learning into a joint framework to learn the unified hashing features that both retain discrimination ability and preserve class-specific content by using the label matrix. In contrast to previous works, a nonlinear method is introduced to learn a common subspace. We adopt the efficient DCC algorithm to optimize the problem with discrete constraint. We evaluate our method on
Declaration of Competing Interest
The authors declare no conflict of interest.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant Nos. 61373055, 61672265), UK EPSRC Grant EP/N007743/1, MURI/EPSRC/DSTL Grant EP/R018456/1, and the 111 Project of the Ministry of Education of China (Grant No. B12018).
References (25)
- et al., Supervised discrete hashing
- et al., Minimal loss hashing for compact binary codes
- et al., Supervised hashing with kernels
- et al., Discrete graph hashing
- et al., Towards optimal binary code learning via ordinal embedding
- et al., Ordinal constrained binary code learning for nearest neighbor search
- et al., Column sampling based discrete supervised hashing
- et al., Toward optimal manifold hashing via discrete locally linear embedding, IEEE Trans. Image Process. (2017)
- et al., Cross-modality binary code learning via fusion similarity hashing
- et al., Inter-media hashing for large-scale retrieval from heterogeneous data sources
- Robust and flexible discrete hashing for cross-modal similarity search, IEEE Trans. Circuits Syst. Video Technol.
- Robust cross-view hashing for multimedia retrieval, IEEE Signal Process. Lett.
☆ This paper has been recommended for acceptance by Sinisa Todorovic.