DOI: 10.1145/1135777.1135871
Article

A comparison of implicit and explicit links for web page classification

Published: 23 May 2006

ABSTRACT

It is well known that Web page classification can be enhanced by using hyperlinks, which provide linkages between Web pages. However, hyperlinks on the Web are usually sparse and noisy, so in many situations they provide only limited help in classification. In this paper, we extend the concept of linkages from explicit hyperlinks to implicit links built between Web pages. Observing that people who search the Web with the same queries often click on different but related documents, we draw implicit links between Web pages that are clicked after the same queries. We provide an approach for automatically building these implicit links from Web query logs, together with a thorough comparison of implicit and explicit links for Web page classification. Our experimental results on a large dataset confirm that implicit links yield better classification performance than explicit links, with an increase of more than 10.5% in Macro-F1.
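The link-building idea in the abstract, together with the Macro-F1 measure it reports, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the query log is assumed to be a list of `(query, clicked_url)` pairs, queries are normalized only by lowercasing and trimming whitespace, and each implicit link is weighted by the number of distinct queries after which both pages were clicked (an assumed weighting). The function names `build_implicit_links` and `macro_f1` are hypothetical. Macro-F1 follows its standard definition: the unweighted mean of per-class F1 scores, so small categories count as much as large ones.

```python
from collections import defaultdict
from itertools import combinations


def build_implicit_links(click_log):
    """Return {(page_a, page_b): weight} for pages clicked after the same query.

    click_log: iterable of (query, clicked_url) pairs (hypothetical log format).
    weight: number of distinct normalized queries after which both pages were
    clicked (an assumed weighting, for illustration only).
    """
    clicks_by_query = defaultdict(set)
    for query, url in click_log:
        # Assumed normalization: case-fold and trim whitespace only.
        clicks_by_query[query.strip().lower()].add(url)

    links = defaultdict(int)
    for urls in clicks_by_query.values():
        # Every unordered pair of pages co-clicked for this query is linked;
        # sorting makes each pair's key order deterministic.
        for a, b in combinations(sorted(urls), 2):
            links[(a, b)] += 1
    return dict(links)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


if __name__ == "__main__":
    # Toy log: three pages clicked after "jaguar speed", two after "big cats".
    log = [
        ("jaguar speed", "a.example"),
        ("jaguar speed", "b.example"),
        ("Jaguar Speed ", "c.example"),
        ("big cats", "b.example"),
        ("big cats", "c.example"),
    ]
    print(build_implicit_links(log))
    # {('a.example', 'b.example'): 1, ('a.example', 'c.example'): 1, ('b.example', 'c.example'): 2}

    y_true = ["sports", "sports", "news", "news"]
    y_pred = ["sports", "news", "news", "news"]
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7333 (mean of 2/3 and 4/5)
```

Treating every pair of pages co-clicked for a query as linked is the simplest reading of the abstract; a practical system would likely need to filter very broad queries, since a query with k clicked pages produces k(k-1)/2 pairs.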


      • Published in

        WWW '06: Proceedings of the 15th international conference on World Wide Web
        May 2006
        1102 pages
        ISBN: 1595933239
        DOI: 10.1145/1135777

        Copyright © 2006 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
