Abstract
Due to the ubiquity of sensor-equipped smartphones, it has become increasingly feasible for users to capture videos together with associated geographic metadata, for example the location and the orientation of the camera. Such contextual information creates new opportunities for the organization and retrieval of geo-referenced videos. In this study we explore the task of landmark retrieval through the analysis of two types of state-of-the-art techniques, namely media-content-based and geocontext-based retrieval. For the content-based method, we choose the Spatial Pyramid Matching (SPM) approach combined with two advanced coding methods: Sparse Coding (SC) and Locality-Constrained Linear Coding (LLC). For the geo-based method, we present the Geo Landmark Visibility Determination (GeoLVD) approach, which computes the visibility of a landmark based on intersections of a camera's field-of-view (FOV) and the landmark's geometric information available from Geographic Information Systems (GIS) and services. We first compare the retrieval results of the two methods and discuss the strengths and weaknesses of each approach in terms of precision, recall, and execution time. Next, we analyze the factors that affect the effectiveness of the content-based and the geo-based methods, respectively. Finally, we propose a hybrid retrieval method based on the integration of the visual (content) and geographic (context) information, which is shown to achieve significant improvements in our experiments. We believe that the results and observations in this work will inform the design of future geo-referenced video retrieval systems, improve our understanding of how to select the most appropriate visual features for indexing and searching, and help in selecting the most suitable retrieval method under different conditions.
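To make the geo-based idea concrete, the following is a minimal sketch of a visibility test between a camera's FOV and a landmark footprint. It is not the GeoLVD implementation described in the paper. It assumes, purely for illustration, that the FOV is a circular sector defined by a camera position, a compass heading, a viewable angle, and a maximum visible distance; that coordinates are planar (meters, x pointing east and y pointing north); and that a landmark counts as visible when at least one footprint vertex falls inside the sector. Occlusion by other buildings is ignored, and a footprint edge that crosses the sector without any vertex inside it would be missed. The names `in_fov` and `landmark_visible` are hypothetical.

```python
import math


def in_fov(camera_xy, heading_deg, view_angle_deg, max_dist, point_xy):
    """Return True if point_xy lies inside the sector-shaped FOV (assumed model)."""
    dx = point_xy[0] - camera_xy[0]
    dy = point_xy[1] - camera_xy[1]
    dist = math.hypot(dx, dy)
    if dist > max_dist:
        return False
    if dist == 0:
        # The point coincides with the camera position.
        return True
    # Compass bearing of the point as seen from the camera:
    # 0 degrees = north (+y), 90 degrees = east (+x).
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Signed angular difference to the heading, wrapped into [-180, 180).
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= view_angle_deg / 2.0


def landmark_visible(camera_xy, heading_deg, view_angle_deg, max_dist, footprint):
    """Simplified test: visible if any footprint vertex is inside the FOV."""
    return any(
        in_fov(camera_xy, heading_deg, view_angle_deg, max_dist, vertex)
        for vertex in footprint
    )


if __name__ == "__main__":
    camera = (0.0, 0.0)
    # Hypothetical landmark footprint given as polygon vertices in meters.
    footprint = [(40.0, 10.0), (10.0, 40.0)]
    print(landmark_visible(camera, 0.0, 60.0, 100.0, footprint))
```

In a video setting, a test of this kind would be evaluated per frame (or per sampled FOV record), so that a video can be matched to a landmark query without inspecting pixel content. A fuller treatment would intersect the sector with the footprint polygon itself and account for occluding objects.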