The Linear Combination Data Fusion Method in Information Retrieval

  • Conference paper
Database and Expert Systems Applications (DEXA 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6861)

Abstract

In information retrieval, data fusion has been investigated by many researchers. Previous investigation and experimentation demonstrate that the linear combination method is an effective data fusion method for combining multiple information retrieval results. One advantage is its flexibility since different weights can be assigned to different component systems so as to obtain better fusion results. However, how to obtain suitable weights for all the component retrieval systems is still an open problem.

In this paper, we use multiple linear regression to obtain optimum weights for all the component systems involved. The weights are optimum in the least squares sense: they minimize the difference between the scores estimated for all documents by linear combination and the judged scores of those documents. Our experiments with four groups of runs submitted to TREC show that the linear combination method with such weights consistently outperforms, by large margins, the best component system and other major data fusion methods such as CombSum, CombMNZ, and the linear combination method with performance-level and performance-square weighting schemes.
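
The least squares weighting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_fusion_weights`, `fuse`), the toy scores, the binary judged scores, and the inclusion of an intercept term are all assumptions made for illustration. Each row of `train_scores` holds one document's (already normalized) scores from three hypothetical component systems.

```python
import numpy as np

def fit_fusion_weights(scores, judged):
    """Fit linear combination weights by ordinary least squares.

    scores: (n_docs, n_systems) normalized scores from the component systems.
    judged: (n_docs,) judged relevance scores for the same documents.
    Returns (intercept, weights). The intercept is an assumption of this
    sketch; it shifts all fused scores equally and does not affect ranking.
    """
    scores = np.asarray(scores, dtype=float)
    judged = np.asarray(judged, dtype=float)
    # Prepend a column of ones so the regression can fit an intercept.
    design = np.column_stack([np.ones(scores.shape[0]), scores])
    coef, *_ = np.linalg.lstsq(design, judged, rcond=None)
    return coef[0], coef[1:]

def fuse(scores, intercept, weights):
    """Estimated score of each document: intercept + weighted sum of scores."""
    return intercept + np.asarray(scores, dtype=float) @ weights

# Hypothetical training data: 6 documents scored by 3 component systems.
train_scores = np.array([
    [0.9, 0.8, 0.3],
    [0.7, 0.2, 0.6],
    [0.4, 0.9, 0.1],
    [0.2, 0.1, 0.8],
    [0.1, 0.3, 0.2],
    [0.8, 0.6, 0.7],
])
train_judged = np.array([1, 0, 1, 0, 0, 1])  # 1 = relevant, 0 = not relevant

intercept, weights = fit_fusion_weights(train_scores, train_judged)
fused = fuse(train_scores, intercept, weights)
ranking = np.argsort(-fused)  # document indices, best estimated score first

print("weights:", weights)
print("ranking:", ranking)
```

In practice the weights would be fitted on queries with relevance judgments and then applied to the result lists of new queries; the sketch fits and fuses on the same toy data only to stay short.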

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, S., Bi, Y., Zeng, X. (2011). The Linear Combination Data Fusion Method in Information Retrieval. In: Hameurlain, A., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6861. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23091-2_20

  • DOI: https://doi.org/10.1007/978-3-642-23091-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23090-5

  • Online ISBN: 978-3-642-23091-2

  • eBook Packages: Computer Science, Computer Science (R0)
