Abstract
Distributed Information Retrieval (DIR) is a broad research area that brings together techniques, such as resource selection and results aggregation, for dealing with data that, for organizational or technical reasons, cannot be managed centrally. Existing and potential applications of DIR methods range from blog retrieval to aggregated search and from multimedia and multilingual retrieval to distributed Web search. In this tutorial we briefly discuss the main DIR phases, namely resource description, resource selection, results merging and results presentation. The main focus is on applications of DIR techniques: blog, expert and desktop search, aggregated search and personal meta-search, and multimedia and multilingual retrieval. We also discuss a number of potential applications of DIR techniques, such as distributed Web search, enterprise search and aggregated mobile search.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crestani, F., Markov, I. (2013). Distributed Information Retrieval and Applications. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_104
DOI: https://doi.org/10.1007/978-3-642-36973-5_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science, Computer Science (R0)