
SALSA: the stochastic approach for link-structure analysis

Published: 01 April 2001

Abstract

Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents match the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in-degree analysis of the link-structure of WWW subgraphs, making it computationally more efficient than the Mutual Reinforcement approach. We compare the results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological phenomenon called the TKC (Tightly-Knit Community) effect which, in certain cases, prevents the Mutual Reinforcement approach from identifying meaningful authorities.
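
The contrast the abstract draws, between SALSA's random walk (which reduces to a weighted in-degree count) and Kleinberg's Mutual Reinforcement iteration, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the link structure is given as a list of (source, target) pairs, and every function name (build_adjacency, salsa_authority_walk, salsa_authority_closed_form, hits_authorities) is hypothetical. SALSA's authority chain is modeled here as a two-step walk: follow a random in-link backward to a hub, then a random out-link of that hub forward, started from the uniform distribution over pages with at least one in-link. The closed form it is checked against (in-degree normalized within each connected component of the authority graph, scaled by that component's share of all authorities) is our reading of the "weighted in-degree" result, and the toy graph is invented for illustration.

```python
from collections import defaultdict
import math

# Illustrative sketch only: the edge-list input format and all function names
# are assumptions made for this example, not taken from the paper.


def build_adjacency(edges):
    """Map each page to the pages it links to, and to the pages linking to it."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links


def salsa_authority_walk(edges, iterations=500):
    """Power-iterate the SALSA authority chain from a uniform start over authorities."""
    out_links, in_links = build_adjacency(edges)
    authorities = sorted(in_links)  # pages with at least one in-link
    if not authorities:
        return {}
    p = {a: 1.0 / len(authorities) for a in authorities}
    for _ in range(iterations):
        nxt = {a: 0.0 for a in authorities}
        for i, mass in p.items():
            back = mass / len(in_links[i])  # step back to a random hub linking to i
            for h in in_links[i]:
                fwd = back / len(out_links[h])  # then forward along a random link of h
                for j in out_links[h]:
                    nxt[j] += fwd
        p = nxt
    return p


def salsa_authority_closed_form(edges):
    """Weighted in-degree: in-degree share within a component times component size share."""
    out_links, in_links = build_adjacency(edges)
    authorities = sorted(in_links)
    parent = {a: a for a in authorities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two authorities share a component when some hub links to both.
    for targets in out_links.values():
        first, *rest = targets
        for t in rest:
            parent[find(t)] = find(first)

    size = defaultdict(int)   # authorities per component
    indeg = defaultdict(int)  # links landing in each component
    for a in authorities:
        root = find(a)
        size[root] += 1
        indeg[root] += len(in_links[a])
    n = len(authorities)
    return {a: (size[find(a)] / n) * (len(in_links[a]) / indeg[find(a)])
            for a in authorities}


def hits_authorities(edges, iterations=100):
    """Kleinberg's Mutual Reinforcement (HITS) iteration; returns authority scores."""
    out_links, in_links = build_adjacency(edges)
    nodes = sorted(set(out_links) | set(in_links))
    hub = {v: 1.0 for v in nodes}
    auth = {}
    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores pointing at it.
        auth = {v: sum(hub[h] for h in in_links.get(v, ())) for v in nodes}
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # A page's hub score is the sum of the authority scores it points to.
        hub = {v: sum(auth[a] for a in out_links.get(v, ())) for v in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {v: x / norm for v, x in hub.items()}
    return auth


# Toy graph (invented): hubs h1-h3 form a tightly knit community around a1-a3,
# hubs g1-g5 point only at b, and hub x links a1 and b.
edges = (
    [(h, a) for h in ("h1", "h2", "h3") for a in ("a1", "a2", "a3")]
    + [(g, "b") for g in ("g1", "g2", "g3", "g4", "g5")]
    + [("x", "a1"), ("x", "b")]
)

if __name__ == "__main__":
    salsa = salsa_authority_closed_form(edges)
    hits = hits_authorities(edges)
    print(salsa)  # {'a1': 0.25, 'a2': 0.1875, 'a3': 0.1875, 'b': 0.375}
    print({a: round(hits[a], 3) for a in salsa})
```

On this toy graph, page b has the most in-links (six) and SALSA ranks it first, matching the random walk's stationary distribution. The HITS-style iteration instead concentrates weight on the densely interlinked a1, a2 and a3 and gives b the smallest authority score of the four. This is a small instance of the TKC effect described above.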
