Content vs. Context: Visual and Geographic Information Use in Video Landmark Retrieval

Authors:
Yifang Yin, National University of Singapore, Singapore
Beomjoo Seo, Hongik University
Roger Zimmermann, National University of Singapore

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 3, Article No. 39, pp. 1–21. https://doi.org/10.1145/2700287

Published: 05 February 2015

Abstract

Due to the ubiquity of sensor-equipped smartphones, it has become increasingly feasible for users to capture videos together with associated geographic metadata, such as the location and orientation of the camera. Such contextual information creates new opportunities for the organization and retrieval of geo-referenced videos. In this study, we explore the task of landmark retrieval by analyzing two types of state-of-the-art techniques: media-content-based and geocontext-based retrieval. For the content-based method, we choose the Spatial Pyramid Matching (SPM) approach combined with two advanced coding methods: Sparse Coding (SC) and Locality-Constrained Linear Coding (LLC). For the geo-based method, we present the Geo Landmark Visibility Determination (GeoLVD) approach, which computes the visibility of a landmark from the intersections of a camera's field-of-view (FOV) with the landmark's geometric information available from Geographic Information Systems (GIS) and services. We first compare the retrieval results of the two methods and discuss the strengths and weaknesses of each in terms of precision, recall, and execution time. Next, we analyze the factors that affect the effectiveness of the content-based and geo-based methods, respectively. Finally, we propose a hybrid retrieval method that integrates visual (content) and geographic (context) information, which is shown to achieve significant improvements in our experiments. We believe that the results and observations in this work will inform the design of future geo-referenced video retrieval systems, improve our understanding of how to select the most appropriate visual features for indexing and searching, and help in choosing the most suitable retrieval method under different conditions.
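
The geo-based idea of testing a landmark against a camera's field-of-view can be illustrated with a minimal sketch. This is not the GeoLVD algorithm itself, whose details are not given in the abstract. It assumes a simplified 2D sector-shaped FOV in a local planar coordinate frame (metres east and north of an arbitrary origin), ignores occlusion, and treats a landmark as visible if any vertex of its footprint polygon falls inside the sector. The names `FOV`, `point_visible`, and `landmark_visible` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FOV:
    """Assumed 2D sector model of a camera's field-of-view (illustrative only)."""
    x: float            # camera position, metres east of a local origin
    y: float            # camera position, metres north of a local origin
    heading_deg: float  # compass direction of the optical axis, clockwise from north
    angle_deg: float    # horizontal viewable angle of the lens
    max_dist: float     # assumed maximum visible distance, metres


def point_visible(fov: FOV, px: float, py: float) -> bool:
    """Return True if the point (px, py) lies inside the FOV sector."""
    dx, dy = px - fov.x, py - fov.y
    dist = math.hypot(dx, dy)
    if dist > fov.max_dist:
        return False
    if dist == 0.0:
        return True  # point coincides with the camera position
    # atan2(east, north) yields a compass bearing in degrees.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Signed angular difference wrapped to [-180, 180).
    diff = (bearing - fov.heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov.angle_deg / 2.0


def landmark_visible(fov: FOV, footprint: list[tuple[float, float]]) -> bool:
    """Approximate test: visible if any footprint vertex is inside the sector.

    This misses the case where a footprint edge crosses the sector while all
    vertices lie outside it; a full polygon-sector intersection would be needed
    to handle that.
    """
    return any(point_visible(fov, px, py) for px, py in footprint)
```

For example, with a camera at the origin facing east (`heading_deg=90`) with a 60-degree viewable angle and a 100 m range, a footprint with a vertex 50 m due east is reported as visible, while one lying entirely to the north is not. In a video setting, such a test would be evaluated per recorded FOV sample along the video's sensor trace.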

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 11, Issue 3
January 2015
173 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2733235
Editor: Ralf Steinmetz, Technische Universität Darmstadt, Germany
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 February 2015
- Accepted: 1 September 2014
- Revised: 1 November 2013
- Received: 1 June 2013
Author Tags
Content-based analysis
geo-referenced videos
landmark retrieval
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 16
- Total Downloads: 321
- Downloads (Last 12 months): 10
- Downloads (Last 6 weeks): 1