Abstract
Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents match the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in-degree analysis of the link-structure of WWW subgraphs, making it computationally more efficient than the Mutual Reinforcement approach. We compare the results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological phenomenon called the TKC (Tightly-Knit Community) effect which, in certain cases, prevents the Mutual Reinforcement approach from identifying meaningful authorities.
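The contrast drawn above can be sketched on a toy graph. The Python below is a minimal sketch, not the authors' implementation, and every name in it (`LINKS`, `hits_authorities`, `salsa_authorities`, `salsa_by_in_degree`) is hypothetical. It runs Kleinberg-style mutual reinforcement by power iteration, using sum-to-one normalization (an assumption made so the scores are comparable; it does not change the ranking). It then computes SALSA's authority scores two ways: as the stationary distribution of the authority random walk (back along a random in-link to a hub, then forward along a random out-link of that hub), and as in-degree divided by the total number of links. The second form assumes the authority side of the graph is a single connected component; with several components the in-degrees are additionally weighted per component, which this sketch does not handle.

```python
from collections import defaultdict

# Hypothetical toy link graph as (source, destination) pairs.
# Hubs h1..h3 all point to a1..a3 (a tightly knit community);
# g1..g5 and h1 also point to a single page "b".
LINKS = [(h, a) for h in ("h1", "h2", "h3") for a in ("a1", "a2", "a3")]
LINKS += [(g, "b") for g in ("g1", "g2", "g3", "g4", "g5")]
LINKS += [("h1", "b")]


def hits_authorities(links, iterations=100):
    """Mutual reinforcement (HITS) by power iteration: a = A^T h, then h = A a."""
    hubs = sorted({s for s, _ in links})
    auths = sorted({d for _, d in links})
    h = {p: 1.0 for p in hubs}
    a = {}
    for _ in range(iterations):
        a = {p: 0.0 for p in auths}
        for s, d in links:
            a[d] += h[s]  # a good authority is pointed at by good hubs
        h = {p: 0.0 for p in hubs}
        for s, d in links:
            h[s] += a[d]  # a good hub points to good authorities
        # Sum-to-one normalization (an assumption; it does not affect the ranking).
        za, zh = sum(a.values()), sum(h.values())
        a = {p: w / za for p, w in a.items()}
        h = {p: w / zh for p, w in h.items()}
    return a


def salsa_authorities(links, iterations=200):
    """Stationary distribution of SALSA's authority random walk, by power iteration.

    One step from authority j: follow a uniformly random in-link of j back to a
    hub k, then a uniformly random out-link of k forward to an authority.
    """
    in_links = defaultdict(list)   # authority -> hubs that link to it
    out_links = defaultdict(list)  # hub -> authorities it links to
    for s, d in links:
        in_links[d].append(s)
        out_links[s].append(d)
    auths = sorted(in_links)
    pi = {p: 1.0 / len(auths) for p in auths}
    for _ in range(iterations):
        nxt = {p: 0.0 for p in auths}
        for j, mass in pi.items():
            for k in in_links[j]:
                share = mass / len(in_links[j]) / len(out_links[k])
                for i in out_links[k]:
                    nxt[i] += share
        pi = nxt
    return pi


def salsa_by_in_degree(links):
    """Closed form assuming one connected authority component: in-degree / |links|."""
    indeg = defaultdict(int)
    for _, d in links:
        indeg[d] += 1
    return {p: n / len(links) for p, n in indeg.items()}


if __name__ == "__main__":
    hits = hits_authorities(LINKS)
    salsa = salsa_authorities(LINKS)
    closed = salsa_by_in_degree(LINKS)
    for p in sorted(salsa):
        print(f"{p}: HITS={hits[p]:.3f}  SALSA={salsa[p]:.3f}  "
              f"in-degree/|E|={closed[p]:.3f}")
```

On this graph, HITS gives each of a1..a3 about 0.264 and b about 0.209, even though b has twice their in-degree; SALSA gives b 0.4 and each of a1..a3 0.2, and the random-walk computation agrees with the in-degree formula. This is only a small illustration in the spirit of the TKC effect, not an example taken from the paper.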
References
- AUGUSTSON, J. G. AND MINKER, J. 1970. An analysis of some graph theoretical clustering techniques. J. ACM 17, 4 (Oct.), 571-588.
- BHARAT, K. AND HENZINGER, M. R. 1998. Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98, Melbourne, Australia, Aug. 24-28), W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, Chairs. ACM Press, New York, NY, 104-111.
- BRIN, S. AND PAGE, L. 1998. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th International Conference on WWW.
- BOTAFOGO, R. A., RIVLIN, E., AND SHNEIDERMAN, B. 1992. Structural analysis of hypertexts: Identifying hierarchies and useful metrics. ACM Trans. Inf. Syst. 10, 2 (Apr.), 142-180.
- CHAKRABARTI, S., DOM, B., GIBSON, D., KUMAR, S. R., RAGHAVAN, P., RAJAGOPALAN, S., AND TOMKINS, A. 1998a. Spectral filtering for resource discovery. In Proceedings of the ACM SIGIR Workshop on Hypertext Information Retrieval on the Web (Melbourne, Australia). ACM Press, New York, NY.
- CHAKRABARTI, S., DOM, B., GIBSON, D., KLEINBERG, J. M., RAGHAVAN, P., AND RAJAGOPALAN, S. 1998b. Automatic resource list compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International WWW Conference.
- CHAKRABARTI, S., DOM, B., GIBSON, D., KLEINBERG, J., KUMAR, S. R., RAGHAVAN, P., RAJAGOPALAN, S., AND TOMKINS, A. 1999a. Hypersearching the web. Sci. Am. (June).
- CHAKRABARTI, S., DOM, B., GIBSON, D., KLEINBERG, J., KUMAR, S. R., RAGHAVAN, P., RAJAGOPALAN, S., AND TOMKINS, A. 1999b. Mining the link structure of the WWW. IEEE Computer (Aug.).
- CARRIÈRE, J. AND KAZMAN, R. 1997. Webquery: Searching and visualizing the web through connectivity. In Proceedings of the 6th International Conference on WWW.
- FRISSE, M. E. 1988. Searching for information in a hypertext medical handbook. Commun. ACM 31, 7 (July), 880-886.
- FÜRNKRANZ, J. 1998. Using links for classifying Web-pages. Tech. Rep. TR-OEFAI-98-29. Austrian Research Institute for Artificial Intelligence.
- GALLAGER, R. G. 1996. Discrete Stochastic Processes. Kluwer Academic Publishers, Hingham, MA.
- GARFIELD, E. 1972. Citation analysis as a tool in journal evaluation. Science 178, 471-479.
- GIBSON, D., KLEINBERG, J., AND RAGHAVAN, P. 1998. Inferring Web communities from link topology. In Proceedings of the 9th ACM Conference on Hypertext and Hypermedia: Links, Objects, Time and Space-Structure in Hypermedia Systems (HYPERTEXT '98, Pittsburgh, PA, June 20-24), R. Akscyn, Chair. ACM Press, New York, NY, 225-234.
- KESSLER, M. M. 1963. Bibliographic coupling between scientific papers. Am. Doc. 14, 10-25.
- KLEINBERG, J. M. 1998. Authoritative sources in a hyperlinked environment. In Proceedings of the 1998 ACM-SIAM Symposium on Discrete Algorithms (San Francisco, CA, Jan.). ACM Press, New York, NY.
- KLEINBERG, J. M., KUMAR, R., RAGHAVAN, P., RAJAGOPALAN, S., AND TOMKINS, A. S. 1999. The web as a graph: Measurements, models and methods. In Proceedings of the Fifth International Conference on Computing and Combinatorics.
- LAW, K., TONG, T., AND WONG, A. 1999. Automatic categorization based on link structure. http://www.stanford.edu/tomtong/cs349/web.htm.
- LEMPEL, R. AND MORAN, S. 2000. The stochastic approach for link-structure analysis (SALSA) and the TKC effect. Tech. Rep. CS-2000-06. Electrical Engineering Department, Technion Israel Institute of Technology, Haifa, Israel.
- MARCHIORI, M. 1997. The quest for correct information on the Web: Hyper search engines. In Proceedings of the 6th International Conference on WWW.
- PAPADIMITRIOU, C. H., TAMAKI, H., RAGHAVAN, P., AND VEMPALA, S. 1998. Latent semantic indexing: A probabilistic analysis. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '98, Seattle, WA, June 1-3), A. Mendelson and J. Paredaens, Chairs. ACM Press, New York, NY, 159-168.
- PIROLLI, P., PITKOW, J., AND RAO, R. 1996. Silk from a sow's ear: Extracting usable structures from the Web. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '96, Vancouver, B.C., Apr. 13-18), M. J. Tauber, Ed. ACM Press, New York, NY, 118-125.
- SMALL, H. 1973. Co-citation in the scientific literature: A new measure of the relationship between two documents. J. Am. Soc. Inf. Sci. 24, 265-269.
- VAN RIJSBERGEN, C. J. 1979. Information Retrieval. 2nd ed. Butterworths, London, UK.
- WEISS, R., VÉLEZ, B., SHELDON, M. A., NANPREMPRE, C., SZILAGYI, P., DUDA, P., AND GIFFORD, D. 1996. HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering. In Proceedings of the Seventh ACM Conference on Hypertext '96 (Washington, D.C., Mar. 16-20), D. Stotts, Chair. ACM Press, New York, NY, 180-193.
Index Terms
- SALSA: the stochastic approach for link-structure analysis