Efficient Media Retrieval from Non-Cooperative Queries

Shih, Kevin; Di, Wei; Jagadeesh, Vignesh; Piramuthu, Robinson

doi:10.1007/978-3-319-20904-3_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9163)

Included in the following conference series:

International Conference on Computer Vision Systems

Abstract

Text is ubiquitous in the artificial world and easily attainable when it comes to book titles and author names. Using the images from the book cover set of the Stanford Mobile Visual Search dataset, together with additional book covers and metadata from openlibrary.org, we construct a large-scale book cover retrieval dataset, complete with 100K distractor covers and title and author strings for each.

Because our query images are poorly conditioned for clean text extraction, we propose a method for extracting noisy and erroneous OCR readings and matching them against clean author and book title strings in a standard document look-up problem setup. Finally, we demonstrate how to use this text matching as a feature in conjunction with popular retrieval features such as VLAD, using a simple learning setup, to achieve significant improvements in retrieval accuracy over that of either VLAD or the text alone.
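
As a rough illustration of the document look-up idea described above, the sketch below scores a noisy OCR string against clean title and author strings using TF-IDF weighted character trigrams and cosine similarity. This is an assumed stand-in, not the method from the chapter: the names `char_ngrams`, `NgramIndex`, and `fuse`, the trigram choice, and the fixed fusion weights are all hypothetical, and the chapter's own matching and learning setup may differ.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase, keep only letters/digits/spaces, return padded character n-grams."""
    cleaned = "".join(c for c in text.lower() if c.isalnum() or c == " ")
    padded = " " + " ".join(cleaned.split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramIndex:
    """TF-IDF index over character n-grams of clean title/author strings.

    Hypothetical helper for illustration; character n-grams are used because
    they tolerate the single-character errors typical of OCR output.
    """

    def __init__(self, documents, n=3):
        self.n = n
        counts = [Counter(char_ngrams(d, n)) for d in documents]
        doc_freq = Counter(g for c in counts for g in c)
        total = len(documents)
        # Smoothed IDF so every indexed n-gram keeps a positive weight.
        self.idf = {g: math.log((1 + total) / (1 + k)) + 1.0
                    for g, k in doc_freq.items()}
        self.vectors = [self._normalize(c) for c in counts]

    def _normalize(self, counts):
        # n-grams never seen in the catalog are dropped from the vector.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def scores(self, query):
        """Cosine similarity between the query and every indexed string."""
        q = self._normalize(Counter(char_ngrams(query, self.n)))
        return [sum(w * d.get(g, 0.0) for g, w in q.items())
                for d in self.vectors]


def fuse(text_score, visual_score, w_text=0.5, w_visual=0.5):
    """Weighted sum of a text score and a visual (e.g. VLAD) similarity.

    The weights are hand-set placeholders; in a learned setup they would be
    fitted on training data instead.
    """
    return w_text * text_score + w_visual * visual_score


if __name__ == "__main__":
    catalog = [
        "The Great Gatsby F. Scott Fitzgerald",
        "Moby Dick Herman Melville",
        "Great Expectations Charles Dickens",
    ]
    index = NgramIndex(catalog)
    noisy_ocr = "Tne Gre4t Gatsbv F Sc0tt Fitzgera1d"
    ranked = sorted(zip(index.scores(noisy_ocr), catalog), reverse=True)
    for score, entry in ranked:
        print(f"{score:.3f}  {entry}")
```

Because the similarity is computed over overlapping character fragments rather than whole words, a reading with several misrecognized characters can still share most of its trigrams with the correct catalog entry.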

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Kevin Shih
EBay Research Labs, 2065 Hamilton Ave., San Jose, CA, USA
Wei Di, Vignesh Jagadeesh & Robinson Piramuthu

Corresponding author

Correspondence to Kevin Shih.

Editor information

Editors and Affiliations

Aalborg University, Copenhagen, Denmark
Lazaros Nalpantidis
Aalborg University, Copenhagen, Denmark
Volker Krüger
Royal Institute of Technology - KTH, Stockholm, Sweden
Jan-Olof Eklundh
Democritus University of Thrace, Xanthi, Greece
Antonios Gasteratos

Cite this paper

Shih, K., Di, W., Jagadeesh, V., Piramuthu, R. (2015). Efficient Media Retrieval from Non-Cooperative Queries. In: Nalpantidis, L., Krüger, V., Eklundh, JO., Gasteratos, A. (eds) Computer Vision Systems. ICVS 2015. Lecture Notes in Computer Science, vol 9163. Springer, Cham. https://doi.org/10.1007/978-3-319-20904-3_35

DOI: https://doi.org/10.1007/978-3-319-20904-3_35
Published: 19 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20903-6
Online ISBN: 978-3-319-20904-3
eBook Packages: Computer Science, Computer Science (R0)
