DOI: 10.1145/1247480.1247495

Spark: top-k keyword query in relational databases

Published: 11 June 2007

ABSTRACT

With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. Because a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly.

In this paper, we study the effectiveness and efficiency of answering top-k keyword queries in relational database systems. We propose a new ranking formula that adapts existing IR techniques, based on a natural notion of a virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perception. We also study efficient query processing methods for the new ranking method, and propose algorithms that have minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.
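To make the idea of ranking a result assembled from several tables concrete, here is a minimal sketch. It is not the paper's ranking formula or algorithm: it assumes a hypothetical two-table schema (`authors`, `papers`), treats the concatenated text of each joined result as one "virtual document", and scores it with a plain TF-IDF weighting chosen only for illustration. All names and data below are made up.

```python
import heapq
import math
from collections import Counter

# Hypothetical toy data: two tables linked by a foreign key (illustrative only).
authors = {1: "jennifer widom", 2: "jim gray"}
papers = [
    (10, 1, "indexing relational database content for keyword search"),
    (11, 2, "transaction processing concepts and techniques"),
    (12, 1, "stream query processing"),
]

def virtual_documents():
    """Yield (paper_id, tokens); tokens concatenate the text of the joined tuples."""
    for paper_id, author_id, title in papers:
        yield paper_id, (authors[author_id] + " " + title).split()

def top_k(query, k):
    """Return the ids of the k best-scoring joined results for a keyword query."""
    docs = list(virtual_documents())
    n = len(docs)
    # Document frequency of each term, counted over virtual documents.
    df = Counter(term for _, tokens in docs for term in set(tokens))
    scored = []
    for doc_id, tokens in docs:
        tf = Counter(tokens)
        # Assumed scoring: log-scaled TF times a smoothed IDF (not the paper's formula).
        score = sum(
            (1 + math.log(tf[t])) * math.log((n + 1) / df[t])
            for t in query.split()
            if tf[t] > 0
        )
        if score > 0:
            scored.append((score, doc_id))
    # Keep only the k highest-scoring results.
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

print(top_k("widom keyword", 2))
```

This naive baseline materializes and scores every joined result, which is the kind of cost that algorithms with minimal database accesses aim to avoid.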


Published in

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007, 1210 pages
ISBN: 9781595936868
DOI: 10.1145/1247480
General Chairs: Lizhu Zhou, Tok Wang Ling
Program Chair: Beng Chin Ooi

Copyright © 2007 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 785 of 4,003 submissions (20%)
