Content vs. Context: Visual and Geographic Information Use in Video Landmark Retrieval

Published: 05 February 2015

Abstract

Due to the ubiquity of sensor-equipped smartphones, it has become increasingly feasible for users to capture videos together with associated geographic metadata, such as the location and orientation of the camera. Such contextual information creates new opportunities for the organization and retrieval of geo-referenced videos. In this study we explore the task of landmark retrieval through the analysis of two types of state-of-the-art techniques, namely media-content-based and geocontext-based retrieval. For the content-based method, we choose the Spatial Pyramid Matching (SPM) approach combined with two advanced coding methods: Sparse Coding (SC) and Locality-Constrained Linear Coding (LLC). For the geo-based method, we present the Geo Landmark Visibility Determination (GeoLVD) approach, which computes the visibility of a landmark based on intersections of a camera's field-of-view (FOV) and the landmark's geometric information available from Geographic Information Systems (GIS) and services. We first compare the retrieval results of the two methods and discuss the strengths and weaknesses of each approach in terms of precision, recall, and execution time. Next we analyze the factors that affect the effectiveness of the content-based and the geo-based methods, respectively. Finally we propose a hybrid retrieval method based on the integration of the visual (content) and geographic (context) information, which is shown to achieve significant improvements in our experiments. We believe that the results and observations in this work will inform the design of future geo-referenced video retrieval systems, improve our understanding of how to select the most appropriate visual features for indexing and searching, and help in selecting the most suitable retrieval method under different conditions.
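To make the geometric idea behind an FOV-based visibility test concrete, the following is a minimal sketch and not the GeoLVD algorithm itself, whose details are not given in the abstract. It assumes a simplified two-dimensional FOV modeled as a circular sector (camera position, compass heading, viewable angle, maximum visible distance) and treats a landmark as visible when any vertex of its footprint falls inside that sector. All function names, dictionary keys, and numeric values are hypothetical, and occlusion by other buildings is ignored.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Equirectangular approximation: metres east (x) and north (y) of a reference point."""
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def point_in_fov(camera, lat, lon):
    """Return True if (lat, lon) lies inside the camera's sector-shaped FOV.

    `camera` is a dict with hypothetical keys: lat, lon, heading_deg
    (clockwise from north), view_angle_deg, and max_dist_m.
    """
    x, y = to_local_xy(lat, lon, camera["lat"], camera["lon"])
    dist = math.hypot(x, y)
    if dist > camera["max_dist_m"]:
        return False
    if dist == 0.0:
        return True  # the point coincides with the camera position
    # Compass bearing from the camera to the point, clockwise from north.
    bearing = math.degrees(math.atan2(x, y)) % 360.0
    # Smallest absolute angle between the bearing and the heading (handles wrap-around).
    diff = abs((bearing - camera["heading_deg"] + 180.0) % 360.0 - 180.0)
    return diff <= camera["view_angle_deg"] / 2.0


def landmark_visible(camera, footprint):
    """Simplified test: visible if any footprint vertex is inside the FOV (no occlusion)."""
    return any(point_in_fov(camera, lat, lon) for lat, lon in footprint)


# Illustrative values only: a camera at the origin facing north.
camera = {"lat": 0.0, "lon": 0.0, "heading_deg": 0.0,
          "view_angle_deg": 60.0, "max_dist_m": 500.0}
footprint = [(0.0010, 0.0005), (0.0010, 0.0010)]
print(landmark_visible(camera, footprint))
```

Testing only the footprint vertices keeps the sketch short; a fuller treatment would intersect the sector with the footprint polygon edges and account for landmarks that hide one another.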

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 3
      January 2015
      173 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/2733235

      Copyright © 2015 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 February 2015
      • Accepted: 1 September 2014
      • Revised: 1 November 2013
      • Received: 1 June 2013
      Qualifiers

      • research-article
      • Research
      • Refereed
