DOI: 10.1145/2245276.2245304
research-article

A model for mining relevant and non-redundant information

Published: 26 March 2012

ABSTRACT

We propose a relatively simple yet powerful model for choosing relevant and non-redundant pieces of information. The model addresses data mining or information retrieval settings where relevance is measured with respect to a set of key or query objects, either specified by the user or obtained by a data mining step. The problem addressed is not only to identify other relevant objects, but also to ensure that they are not related to possible negative query objects and that they are not redundant with respect to each other.

The model proposed here only assumes a similarity or distance function for the objects. It has simple parameterization to allow for different behaviors with respect to query objects. We analyze the model and give two efficient, approximate methods. We illustrate and evaluate the proposed model on different applications: linguistics and social networks. The results indicate that the model and methods are useful in finding a relevant and non-redundant set of results.
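
As a rough illustration of the kind of selection the abstract describes (not the paper's actual model, whose objective and parameters are not given here), the following Python sketch greedily picks objects that are similar to positive query objects, dissimilar to negative ones, and not too similar to objects already chosen. The function name `greedy_select`, the weights `alpha` and `beta`, the average and max aggregation, and the toy similarity `sim_1d` are all assumptions made for this example. The greedy scheme is in the spirit of MMR-style reranking and is not claimed to be either of the two methods the authors propose.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_select(
    candidates: Sequence[T],
    positives: Sequence[T],
    negatives: Sequence[T],
    sim: Callable[[T, T], float],
    k: int,
    alpha: float = 1.0,  # hypothetical weight on similarity to negative queries
    beta: float = 1.0,   # hypothetical weight on redundancy with chosen objects
) -> List[T]:
    """Greedily pick up to k candidates; only a similarity function is assumed."""

    def avg_sim(obj: T, group: Sequence[T]) -> float:
        # Average similarity of obj to a group; 0.0 for an empty group.
        return sum(sim(obj, g) for g in group) / len(group) if group else 0.0

    selected: List[T] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(c: T) -> float:
            relevance = avg_sim(c, positives) - alpha * avg_sim(c, negatives)
            # Redundancy: similarity to the closest already-selected object.
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return relevance - beta * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def sim_1d(a: float, b: float) -> float:
    # Toy similarity on numbers: 1 when equal, decaying with distance.
    return 1.0 / (1.0 + abs(a - b))


# Toy data: one positive query at 0.0, one negative query at 10.0.
points = [1.0, 1.1, 4.0, 9.0]
diverse = greedy_select(points, [0.0], [10.0], sim_1d, k=2)
no_penalty = greedy_select(points, [0.0], [10.0], sim_1d, k=2, beta=0.0)
print(diverse)     # [1.0, 4.0]
print(no_penalty)  # [1.0, 1.1]
```

With the redundancy penalty switched off (`beta=0.0`), the two near-duplicate points 1.0 and 1.1 are both returned; with it on, the second pick moves to 4.0, which is less relevant but not redundant.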

While this area has been a popular topic of research, our contribution is a simple, generic model that covers several related approaches and systematically accounts for positive and negative query objects as well as non-redundancy of the output.

          Published in

          SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
          March 2012
          2179 pages
          ISBN: 9781450308571
          DOI: 10.1145/2245276
          Conference Chairs: Sascha Ossowski, Paola Lecca

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          SAC '12 paper acceptance rate: 270 of 1,056 submissions, 26%
          Overall acceptance rate: 1,650 of 6,669 submissions, 25%
