ABSTRACT
Collaborative tagging describes the process by which many users add metadata, in the form of unstructured keywords, to shared content. The recent practical success of web services with a tagging component, such as Flickr and del.icio.us, has produced a plethora of user-supplied metadata about web content for everyone to leverage.
In this paper, we conduct a quantitative and qualitative analysis of metadata and information provided by the authors and publishers of web documents compared with metadata supplied by end users for the same content. Our study is based on a random sample of 100,000 web documents from the Open Directory, for which we examined the original documents from the World Wide Web in addition to data retrieved from the social bookmarking service del.icio.us, the content rating system ICRA, and the search engine Google. To the best of our knowledge, this is the first study to compare user tags with the metadata and actual content of documents in the WWW on a larger scale and to integrate document popularity information in the observations. The data set of our experiments is freely available for research.
Index Terms
- Authors vs. readers: a comparative study of document metadata and content in the www