Abstract
With Internet delivery of video content surging to an unprecedented level, video recommendation, which suggests relevant videos to targeted users according to their historical and current viewing or preferences, has become one of the most pervasive online video services. This article presents a novel contextual video recommendation system, called VideoReach, based on multimodal content relevance and user feedback. We observe that an online video usually consists of different modalities (i.e., visual and audio tracks, as well as associated text such as queries, keywords, and surrounding text). Therefore, the recommended videos should be relevant to the current viewing in terms of multimodal relevance. We also observe that different parts of a video hold different degrees of interest for a user, and that different features and modalities contribute differently to the overall relevance. As a result, the recommended videos should also be relevant to the current user in terms of user feedback (i.e., user click-through). We then design a unified framework for VideoReach that seamlessly integrates multimodal relevance and user feedback through relevance feedback and attention fusion. VideoReach represents one of the first attempts toward contextual recommendation driven by video content and user click-through, without assuming that a sufficient collection of user profiles is available. We conducted experiments on a large-scale real-world video dataset and report the effectiveness of VideoReach.
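The idea of combining per-modality relevance with click-through feedback can be illustrated with a minimal sketch. It is not the formulation used in VideoReach: a plain linear weighted sum stands in for the attention fusion mentioned in the abstract, and the modality names, the `rate` parameter, and the functions `fuse`, `update_weights`, and `recommend` are all hypothetical names chosen for illustration.

```python
# Illustrative sketch only: linear fusion plus a simple click-driven
# weight update. All names and the update rule are assumptions,
# not the method described in the article.

MODALITIES = ("textual", "visual", "aural")


def fuse(scores, weights):
    """Weighted linear fusion of per-modality relevance scores in [0, 1]."""
    total = sum(weights[m] for m in MODALITIES)
    return sum(weights[m] * scores[m] for m in MODALITIES) / total


def update_weights(weights, clicked_scores, rate=0.5):
    """Shift weight toward modalities that scored the clicked videos highly.

    clicked_scores is a list of per-modality score dicts, one per clicked
    recommendation. The result is normalized to sum to 1.
    """
    if not clicked_scores:
        return dict(weights)
    updated = {}
    for m in MODALITIES:
        mean_score = sum(s[m] for s in clicked_scores) / len(clicked_scores)
        updated[m] = (1 - rate) * weights[m] + rate * mean_score
    total = sum(updated.values())
    return {m: w / total for m, w in updated.items()}


def recommend(candidates, weights, k=2):
    """Return the ids of the k candidates with the highest fused relevance."""
    ranked = sorted(
        candidates,
        key=lambda vid: fuse(candidates[vid], weights),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, a click on a video whose textual score dominates raises the textual weight, so later rankings favor textually similar candidates. No user profile is needed, only the scores of the clicked items.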
Index Terms
- Contextual Video Recommendation by Multimodal Relevance and User Feedback