Abstract
Finding near-duplicate images is a common task in Multimedia Information Retrieval (MIR). To address it, we propose a novel idea that bridges two seemingly unrelated fields: MIR and Biology. Specifically, we propose using BLAST, a popular gene sequence alignment algorithm from Biology, to detect near-duplicate images. Within this framework, we study how various image features and gene sequence generation methods (which encode images with gene alphabets such as the DNA bases A, C, G, and T) affect the accuracy and performance of near-duplicate image detection. Our proposal, termed BLASTed Image Linkage (BASIL), is validated empirically on various real data sets. This work can be viewed as a “first” step toward bridging the MIR and Biology fields in the well-studied near-duplicate image detection problem.
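The abstract does not specify how BASIL turns an image into a gene sequence or how alignment scores become a duplicate decision, so the following is only a minimal sketch under stated assumptions. It assumes each image is already summarized as a fixed-length feature vector with values in a known range, quantizes every value into one of four equal-width bins mapped to A, C, G, and T (a hypothetical encoding), and scores two sequences with Smith-Waterman local alignment, the exact dynamic program that BLAST approximates with a faster seed-and-extend heuristic. The helper names, scoring parameters, and the 0.8 threshold are illustrative assumptions, not values from the paper.

```python
from typing import Sequence

# Hypothetical 4-letter alphabet, mirroring the DNA bases named in the abstract.
ALPHABET = "ACGT"

# Illustrative Smith-Waterman scoring parameters (assumptions, not BASIL's).
MATCH, MISMATCH, GAP = 2, -1, -2


def features_to_sequence(features: Sequence[float], lo: float, hi: float) -> str:
    """Encode a feature vector as a gene-like string (hypothetical scheme).

    Each value in [lo, hi] falls into one of four equal-width bins;
    bin i becomes ALPHABET[i]. Out-of-range values are clamped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / len(ALPHABET)
    letters = []
    for v in features:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), len(ALPHABET) - 1)  # clamp, e.g. v == hi -> last bin
        letters.append(ALPHABET[idx])
    return "".join(letters)


def local_alignment_score(a: str, b: str) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    BLAST approximates this exact dynamic program heuristically.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below 0 (alignment restarts).
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def is_near_duplicate(fa: Sequence[float], fb: Sequence[float],
                      lo: float, hi: float, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: alignment score relative to a perfect match."""
    sa = features_to_sequence(fa, lo, hi)
    sb = features_to_sequence(fb, lo, hi)
    perfect = MATCH * min(len(sa), len(sb))
    if perfect == 0:
        return False
    return local_alignment_score(sa, sb) / perfect >= threshold


if __name__ == "__main__":
    original = [0.10, 0.30, 0.60, 0.90, 0.20, 0.40, 0.70, 0.95]
    edited = [0.12, 0.28, 0.61, 0.88, 0.22, 0.41, 0.69, 0.97]  # slight perturbation
    unrelated = [0.90, 0.80, 0.05, 0.10, 0.55, 0.90, 0.30, 0.00]
    print(features_to_sequence(original, 0.0, 1.0))  # ACGTACGT
    print(is_near_duplicate(original, edited, 0.0, 1.0))  # True
    print(is_near_duplicate(original, unrelated, 0.0, 1.0))  # False
```

In this toy run the lightly perturbed copy encodes to the same sequence as the original and passes the threshold, while the unrelated vector does not. A practical system would more likely run BLAST itself (for example, NCBI BLAST+) against a database of encoded images rather than align every pair exhaustively; the quadratic-time pairwise scorer is used here only because it is short and self-contained.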
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, Hs., Chang, HW., Lee, J., Lee, D. (2010). BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_22
DOI: https://doi.org/10.1007/978-3-642-12275-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science; Computer Science (R0)