ABSTRACT
With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. Because a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly.
In this paper, we study the effectiveness and efficiency of answering top-k keyword queries in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of a virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perceptions. We also study efficient query processing methods for the new ranking method, and propose algorithms that make minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.
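The virtual-document idea can be illustrated with a minimal sketch. It assumes that a candidate result is a set of joined tuples whose text is concatenated into one document and scored as a whole. The scoring function below is a generic TF-IDF stand-in, not the ranking formula proposed in the paper, and all names (`virtual_document`, `score`, `top_k`) and the toy data are hypothetical. The sketch also scores every candidate exhaustively, whereas the paper's algorithms aim to keep database accesses minimal.

```python
import heapq
import math
from collections import Counter


def virtual_document(joined_tuples):
    """Concatenate the text of all tuples in one joined result into a token list."""
    tokens = []
    for text in joined_tuples:
        tokens.extend(text.lower().split())
    return tokens


def score(tokens, query, doc_freq, num_docs):
    """Generic TF-IDF score with length dampening (an assumption, not the paper's formula)."""
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    total = 0.0
    for term in query:
        if tf[term] == 0:
            continue
        idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        total += (1.0 + math.log(tf[term])) * idf
    return total / (1.0 + math.log(len(tokens)))


def top_k(results, query, k):
    """Return the k best (score, result_index) pairs, best first."""
    docs = [virtual_document(r) for r in results]
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))  # count each term once per virtual document
    scored = [(score(d, query, doc_freq, len(docs)), i) for i, d in enumerate(docs)]
    return heapq.nlargest(k, scored)


# Hypothetical joined results: each inner list holds the text of the joined tuples.
results = [
    ["Alice Smith", "keyword search in databases"],
    ["Bob Jones", "image compression"],
    ["Carol", "top-k keyword query processing keyword"],
]
ranked = top_k(results, ["keyword", "search"], 2)
```

Here `ranked` holds the two highest-scoring virtual documents. Scoring the concatenated text, rather than each tuple separately, is what lets a result spanning several tables be ranked as one unit.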
Index Terms
- Spark: top-k keyword query in relational databases