DOI: 10.1145/2063576.2063792
research-article

Index structures and top-k join algorithms for native keyword search databases

Published: 24 October 2011

ABSTRACT

To support keyword search on structured data, current solutions require large indexes that redundantly store subgraphs called neighborhoods. Further, exploring keyword search results requires large graphs to be loaded into memory. We propose a solution that employs much more compact index structures for neighborhood lookups. Using these indexes, we reduce keyword search result exploration to the traditional database problem of top-k join processing, enabling results to be computed efficiently. In particular, this computation can be performed on data streams successively loaded from disk (i.e., it does not require the entire input to be loaded into memory at once). To support this, we propose a top-k procedure based on the rank join operator, which not only computes the k best results but also selects query plans in a top-k fashion during the process. In experiments using large real-world datasets, our solution reduced storage requirements and also outperformed the state of the art in terms of performance and scalability.
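
To illustrate the general idea of top-k join processing over sorted streams, the following is a minimal sketch of a two-input rank join in the style of the hash rank join (HRJN) operator. It is not the procedure proposed in the paper, which additionally selects query plans in a top-k fashion. The function name `rank_join`, the `(key, item, score)` tuple layout, the use of score summation as the ranking function, and the sample data are all assumptions made for this example.

```python
import heapq
from collections import defaultdict
from itertools import count


def rank_join(left, right, k):
    """Return up to k best join results as (score, (left_item, right_item)).

    Assumptions of this sketch: each input yields (key, item, score) tuples
    sorted by score in descending order, tuples join on equal keys, and the
    combined score is the sum of the two input scores.
    """
    streams = (iter(left), iter(right))
    seen = (defaultdict(list), defaultdict(list))  # hash tables of tuples read so far
    top = [None, None]    # highest score read from each input
    last = [None, None]   # most recent (lowest) score read from each input
    exhausted = [False, False]
    heap = []             # candidate results, max-heap via negated scores
    tie = count()         # tie-breaker so the heap never compares payloads
    results = []
    side = 0

    def threshold():
        # Upper bound on the score of any join result not yet formed.
        if all(exhausted):
            return float("-inf")
        if top[0] is None or top[1] is None:
            return float("inf")
        bounds = []
        if not exhausted[0]:
            bounds.append(last[0] + top[1])  # unseen left tuple, best right tuple
        if not exhausted[1]:
            bounds.append(top[0] + last[1])  # best left tuple, unseen right tuple
        return max(bounds)

    while len(results) < k:
        # Emit every candidate that no future join result can beat.
        t = threshold()
        while heap and len(results) < k and -heap[0][0] >= t:
            neg_score, _, pair = heapq.heappop(heap)
            results.append((-neg_score, pair))
        if len(results) >= k or all(exhausted):
            break
        if exhausted[side]:
            side = 1 - side
        row = next(streams[side], None)
        if row is None:
            exhausted[side] = True
            side = 1 - side
            continue
        key, item, score = row
        if top[side] is None:
            top[side] = score
        last[side] = score
        seen[side][key].append((item, score))
        # Probe the other input's hash table for join partners.
        for other_item, other_score in seen[1 - side].get(key, ()):
            pair = (item, other_item) if side == 0 else (other_item, item)
            heapq.heappush(heap, (-(score + other_score), next(tie), pair))
        side = 1 - side  # alternate between the two inputs
    return results


# Hypothetical sample inputs, sorted by descending score.
left = [("a", "L1", 9), ("b", "L2", 8), ("c", "L3", 1)]
right = [("b", "R1", 10), ("a", "R2", 7), ("c", "R3", 6)]
print(rank_join(left, right, 2))  # [(18, ('L2', 'R1')), (16, ('L1', 'R2'))]
```

In this example the two best results are reported before the last tuple of `right` is read, which reflects the point made in the abstract: the inputs can be consumed as streams, and the computation can stop as soon as the threshold guarantees that no unseen combination can outrank the results already found.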

        Published in

        CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
        October 2011
        2712 pages
        ISBN: 9781450307178
        DOI: 10.1145/2063576

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
