
Classifying YouTube channels: a practical system

Author:
Vincent Simonet

Google, Paris, France


WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web, May 2013, Pages 1295–1304, https://doi.org/10.1145/2487788.2488164

Published: 13 May 2013


ABSTRACT

This paper presents a framework for categorizing channels of videos in a thematic taxonomy with high precision and coverage. The proposed approach consists of three main steps. First, videos are annotated with semantic entities describing their central topics. Second, semantic entities are mapped to categories using a combination of classifiers. Last, the categorization of channels is obtained by combining the results of both previous steps.
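The three steps above can be illustrated with a minimal sketch. It is not the author's system: the abstract does not say how the classifiers are combined or how video results are aggregated per channel, so the following are all assumptions chosen only for illustration: averaging classifier confidences, weighting each entity's categories by a per-video topicality score, capping per-video category scores at 1, averaging over the channel's videos, and the 0.5 `threshold`. The function names (`combine_classifiers`, `categorize_video`, `categorize_channel`) and the toy entity and category names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical data shapes, for illustration only (not taken from the paper):
#   entity scores:   {entity: topicality in [0, 1]} for one video (step 1 output)
#   category scores: {category: confidence in [0, 1]}
Scores = Dict[str, float]


def combine_classifiers(outputs: List[Dict[str, Scores]]) -> Dict[str, Scores]:
    """Step 2, assumed form: average the confidences that several
    entity-to-category classifiers assign. A classifier that says nothing
    about an (entity, category) pair counts as confidence 0."""
    if not outputs:
        return {}
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for output in outputs:
        for entity, categories in output.items():
            for category, confidence in categories.items():
                sums[entity][category] += confidence
    n = len(outputs)
    return {e: {c: s / n for c, s in cats.items()} for e, cats in sums.items()}


def categorize_video(entities: Scores, entity_categories: Dict[str, Scores]) -> Scores:
    """Propagate entity topicality to categories, capping each score at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for entity, topicality in entities.items():
        for category, confidence in entity_categories.get(entity, {}).items():
            scores[category] += topicality * confidence
    return {c: min(s, 1.0) for c, s in scores.items()}


def categorize_channel(videos: List[Scores],
                       entity_categories: Dict[str, Scores],
                       threshold: float = 0.5) -> Scores:
    """Step 3, assumed form: average per-video category scores over the
    channel's videos and keep the categories reaching `threshold`."""
    if not videos:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for entities in videos:
        for category, score in categorize_video(entities, entity_categories).items():
            totals[category] += score
    means = {c: s / len(videos) for c, s in totals.items()}
    return {c: m for c, m in means.items() if m >= threshold}


if __name__ == "__main__":
    # Toy inputs with made-up entity and category names.
    classifier_a = {"guitar": {"Music": 0.9}, "football": {"Sports": 0.8}}
    classifier_b = {"guitar": {"Music": 0.7}, "football": {"Sports": 1.0, "Games": 0.4}}
    entity_categories = combine_classifiers([classifier_a, classifier_b])
    channel = [{"guitar": 1.0}, {"guitar": 0.5, "football": 0.2}, {"guitar": 0.9}]
    print(categorize_channel(channel, entity_categories))
```

With these toy inputs, the channel keeps only Music, with a mean score of about 0.64. Sports and Games appear in a single video with low topicality, so their channel-level means stay well below the threshold. Averaging over the channel is one simple way to favor categories that recur across many videos over ones that appear only once.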

This framework has been deployed on the whole corpus of YouTube, in 8 languages, and used to build several user-facing products. Beyond the description of the framework, this paper offers insight into practical aspects and lessons learned: the rationale leading from product requirements to the choice of solution, spam filtering, human-based evaluations of the quality of the results, and metrics measured on the live site.


Index Terms

Applied computing

Recommendations

Editorial: Narrative-based taxonomy distillation for effective indexing of text collections

Taxonomies embody formalized knowledge and define aggregations between concepts/categories in a given domain, facilitating the organization of the data and making the contents easily accessible to the users. Since taxonomies have significant roles in ...
Balloon Synopsis: A Modern Node-Centric RDF Viewer and Browser for the Web
The Semantic Web: ESWC 2014 Satellite Events
Nowadays, the RDF data model is a crucial part of the Semantic Web. Especially web developers favour RDF serialization formats like RDFa and JSON-LD. However, the visualization of large portions of RDF data in an appealing way is still a ...
Music on YouTube

We present the first study of YouTube's most popular content genre, music videos. Our analysis of popular music videos identified three main types and 12 subtypes. User-appropriated videos emerged as the most important new category of videos. Derivative ...


Published in
WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
May 2013
1636 pages
ISBN: 9781450320382
DOI: 10.1145/2487788
General Chairs:
Daniel Schwabe, PUC-Rio, Brazil
Virgílio Almeida, UFMG, Brazil
Hartmut Glaser, CGI.br, Brazil
Program Chairs:
Ricardo Baeza-Yates, Yahoo! Labs, Spain & Chile
Sue Moon, KAIST, South Korea
Copyright © 2013 held by the International World Wide Web Conference Committee (IW3C2).
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
semantic entity
taxonomy classification
video
Qualifiers
- research-article

Acceptance Rates
WWW '13 Companion paper acceptance rate: 831 of 1,250 submissions, 66%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total citations: 8
- Total downloads: 322
- Downloads (last 12 months): 43
- Downloads (last 6 weeks): 10
