DOI: 10.1145/2487788.2488164
research-article

Classifying YouTube channels: a practical system

Published: 13 May 2013

ABSTRACT

This paper presents a framework for categorizing channels of videos in a thematic taxonomy with high precision and coverage. The proposed approach consists of three main steps. First, videos are annotated by semantic entities describing their central topics. Second, semantic entities are mapped to categories using a combination of classifiers. Last, the categorization of channels is obtained by combining the results of both previous steps.
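As an illustration only, the three steps can be sketched in a few lines of Python. The abstract does not specify how the results of the first two steps are combined, so the weighted sum and the `min_share` cutoff below are assumptions, as are all entity names, category names, and scores. The sketch takes the outputs of steps one and two as given dictionaries and shows one plausible way to implement step three.

```python
from collections import defaultdict

# Hypothetical data: none of these names or scores come from the paper.
# Assumed output of step 1: entities annotating each video, with a topicality weight.
VIDEO_ENTITIES = {
    "video_a": {"guitar": 0.9, "rock_music": 0.6},
    "video_b": {"guitar": 0.7, "cooking": 0.2},
    "video_c": {"rock_music": 0.8},
}

# Assumed output of step 2: taxonomy categories for each entity, with a confidence.
ENTITY_CATEGORIES = {
    "guitar": {"Music": 0.95},
    "rock_music": {"Music": 0.9},
    "cooking": {"Food": 0.85},
}


def classify_channel(video_ids, min_share=0.2):
    """Step 3 (assumed combination rule): sum weight * confidence per category,
    normalize to shares, and keep categories whose share reaches min_share."""
    totals = defaultdict(float)
    for video_id in video_ids:
        for entity, weight in VIDEO_ENTITIES.get(video_id, {}).items():
            for category, confidence in ENTITY_CATEGORIES.get(entity, {}).items():
                totals[category] += weight * confidence
    mass = sum(totals.values())
    if mass == 0:
        return {}  # no annotated videos: leave the channel uncategorized
    return {
        category: score / mass
        for category, score in totals.items()
        if score / mass >= min_share
    }


result = classify_channel(["video_a", "video_b", "video_c"])
print(result)
```

Dropping low-share categories is one simple way to favor precision over coverage; raising or lowering `min_share` moves that trade-off in either direction.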

This framework has been deployed on the whole corpus of YouTube, in 8 languages, and used to build several user-facing products. Beyond describing the framework, this paper gives insight into practical aspects and experience: the rationale leading from product requirements to the choice of solution, spam filtering, human-based evaluations of the quality of the results, and metrics measured on the live site.


Published in

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
May 2013
1636 pages
ISBN: 9781450320382
DOI: 10.1145/2487788

Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

WWW '13 Companion paper acceptance rate: 831 of 1,250 submissions, 66%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
