Efficient Media Retrieval from Non-Cooperative Queries

  • Conference paper
Computer Vision Systems (ICVS 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9163)

Abstract

Text is ubiquitous in the artificial world and is easy to obtain in the form of book titles and author names. Using the book cover images from the Stanford Mobile Visual Search dataset, together with additional book covers and metadata from openlibrary.org, we construct a large-scale book cover retrieval dataset, complete with 100K distractor covers and title and author strings for each.

Because our query images are poorly conditioned for clean text extraction, we propose a method for extracting noisy, error-prone OCR readings and matching them against clean author and book title strings in a standard document look-up problem setup. Finally, we demonstrate how to use this text matching as a feature in conjunction with popular retrieval features such as VLAD, using a simple learning setup, to achieve significant improvements in retrieval accuracy over that of either VLAD or the text alone.
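The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character-trigram TF-IDF with cosine similarity as a stand-in for the document look-up step, and a fixed weighted sum as a stand-in for the learned combination with a visual score such as one derived from VLAD. All function names, the toy catalog, and the fusion weights are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase, collapse whitespace, pad with spaces, and return character n-grams."""
    cleaned = " ".join(text.lower().split())
    padded = f" {cleaned} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 norm; empty vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else vec


def build_index(documents, n=3):
    """Build unit-norm TF-IDF vectors over character n-grams for the clean strings."""
    grams_per_doc = [Counter(char_ngrams(d, n)) for d in documents]
    doc_freq = Counter(g for grams in grams_per_doc for g in grams)
    num_docs = len(documents)
    # Smoothed IDF (an assumption; any standard IDF variant would do here).
    idf = {g: math.log((1 + num_docs) / (1 + c)) + 1.0 for g, c in doc_freq.items()}
    vectors = [normalise({g: tf * idf[g] for g, tf in grams.items()})
               for grams in grams_per_doc]
    return idf, vectors


def text_scores(noisy_query, idf, vectors, n=3):
    """Cosine similarity between a noisy OCR string and every indexed clean string."""
    counts = Counter(char_ngrams(noisy_query, n))
    # N-grams never seen in the catalog (often OCR errors) are simply ignored.
    query = normalise({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    return [sum(w * doc.get(g, 0.0) for g, w in query.items()) for doc in vectors]


def fuse(text_score, visual_score, w_text=0.5, w_visual=0.5):
    """Placeholder late fusion; in practice the weights would be learned, not fixed."""
    return w_text * text_score + w_visual * visual_score


# Toy catalog of clean "title author" strings (illustrative only).
catalog = [
    "Introduction to Information Retrieval Manning Raghavan Schutze",
    "Pride and Prejudice Jane Austen",
    "Moby Dick Herman Melville",
]
idf, vectors = build_index(catalog)

# Simulated OCR output with typical character confusions (I/l, o/0, l/1).
ocr_reading = "lntroducti0n to lnformation Retrieva1 Mannlng"
scores = text_scores(ocr_reading, idf, vectors)
best = max(range(len(catalog)), key=scores.__getitem__)
print(catalog[best], round(scores[best], 3))
```

Character n-grams are used here because a single misread character corrupts only a few n-grams, whereas it would invalidate an entire word-level token.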

Author information

Corresponding author

Correspondence to Kevin Shih.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Shih, K., Di, W., Jagadeesh, V., Piramuthu, R. (2015). Efficient Media Retrieval from Non-Cooperative Queries. In: Nalpantidis, L., Krüger, V., Eklundh, J.-O., Gasteratos, A. (eds) Computer Vision Systems. ICVS 2015. Lecture Notes in Computer Science, vol 9163. Springer, Cham. https://doi.org/10.1007/978-3-319-20904-3_35

  • DOI: https://doi.org/10.1007/978-3-319-20904-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20903-6

  • Online ISBN: 978-3-319-20904-3

  • eBook Packages: Computer Science, Computer Science (R0)
