research-article

A model for mining relevant and non-redundant information

Authors:
Laura Langohr, University of Helsinki, Finland
Hannu Toivonen, University of Helsinki, Finland

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing, March 2012, Pages 132–137, https://doi.org/10.1145/2245276.2245304

Published: 26 March 2012

ABSTRACT

We propose a relatively simple yet powerful model for choosing relevant and non-redundant pieces of information. The model addresses data mining or information retrieval settings where relevance is measured with respect to a set of key or query objects, either specified by the user or obtained by a data mining step. The problem is not only to identify other relevant objects, but also to ensure that they are not related to possible negative query objects and that they are not redundant with respect to each other.

The proposed model assumes only a similarity or distance function for the objects. It has a simple parameterization that allows for different behaviors with respect to query objects. We analyze the model and give two efficient, approximate methods. We illustrate and evaluate the proposed model on applications in linguistics and social networks. The results indicate that the model and methods are useful in finding a relevant and non-redundant set of results.

While this area has been a popular topic of research, our contribution is a simple, generic model that covers several related approaches and provides a systematic way to take into account positive and negative query objects as well as non-redundancy of the output.
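The abstract does not state the model's objective function, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical greedy, MMR-style score: similarity to the closest positive query object, minus weighted similarity to the closest negative query object, minus weighted similarity to the closest already selected result. The names `greedy_select`, `alpha`, `beta`, and `inverse_distance` are assumptions made for this example. As in the abstract, the sketch relies only on a similarity function over the objects.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_select(
    candidates: Sequence[T],
    positives: Sequence[T],
    negatives: Sequence[T],
    sim: Callable[[T, T], float],
    k: int,
    alpha: float = 1.0,  # hypothetical weight for the negative-query penalty
    beta: float = 1.0,   # hypothetical weight for the redundancy penalty
) -> List[T]:
    """Greedily pick up to k candidates that are close to the positive
    query objects, far from the negative ones, and dissimilar to each other.

    This is an assumed scoring scheme for illustration only.
    """

    def score(c: T, chosen: List[T]) -> float:
        pos = max((sim(c, q) for q in positives), default=0.0)
        neg = max((sim(c, q) for q in negatives), default=0.0)
        red = max((sim(c, s) for s in chosen), default=0.0)
        return pos - alpha * neg - beta * red

    chosen: List[T] = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        # Re-score every remaining candidate against the current selection.
        best = max(remaining, key=lambda c: score(c, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def inverse_distance(a: float, b: float) -> float:
    """Toy similarity on numbers: 1 for identical points, decaying with distance."""
    return 1.0 / (1.0 + abs(a - b))


# Toy data: one positive query at 0.0, one negative query at 10.0.
result = greedy_select(
    candidates=[1.0, 1.2, 4.0, 9.0],
    positives=[0.0],
    negatives=[10.0],
    sim=inverse_distance,
    k=2,
)
print(result)  # [1.0, 4.0]
```

In this toy run, 1.2 is skipped because it is nearly a duplicate of the already selected 1.0, and 9.0 is skipped because it lies next to the negative query object. Setting `beta=0.0` disables the redundancy penalty and returns `[1.0, 1.2]` instead.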

Index Terms

A model for mining relevant and non-redundant information
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Relevance assessment
    2. Retrieval models and ranking
  2. Information systems applications
    1. Data mining

Published in
SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs:
Sascha Ossowski, University Rey Juan Carlos, Spain
Paola Lecca, The Microsoft Research - University of Trento COSBI, Italy
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SAC '12 Paper Acceptance Rate: 270 of 1,056 submissions, 26%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%