DOI: 10.1145/1807167.1807281
research-article

Pay-as-you-go: an adaptive approach to provide full context-aware text search over document content

Published: 06 June 2010

ABSTRACT

An RDBMS provides the best performance for querying structured data that starts out with a well-defined schema. However, such a 'schema first, data later' approach does not work for unstructured data or data with little structure. An RDBMS therefore typically stores such data without any schema in LOB columns (for example, Character Large Object (CLOB) or Binary Large Object (BLOB) columns) and provides Information-Retrieval (IR) style, keyword-based search over these LOB columns. More recently, XML has been introduced as a native datatype (XMLType) in the RDBMS via the SQL/XML standard. Semi-structured data, with or without a schema, can be stored in XMLType columns and queried with XQuery. In particular, the XQuery Full Text specification provides the capability of searching for keywords within document context. Such full context-aware text search is more powerful than pure keyword search, since the user can specify the fine-grained context in which the keywords should occur. However, XML with XQuery full-text search requires that the user first convert her text data into XML and store it in an XMLType column. Such massive physical data migration, with its possible loss of document fidelity and its potential impact on existing production environments, is often expensive enough that users are reluctant to adopt the XML/XQuery approach.
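The difference between pure keyword search and context-aware search can be illustrated with a minimal Python sketch. It is not taken from the paper: the sample document, the function names `keyword_match` and `context_match`, and the use of simple substring matching are all assumptions made for illustration, and the standard-library `xml.etree.ElementTree` stands in for an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
DOC = """<article>
  <title>Indexing XML in a relational database</title>
  <body>This section is about query rewrite, not indexing.</body>
</article>"""


def keyword_match(text, keyword):
    """IR-style search: the keyword may occur anywhere in the raw text."""
    return keyword.lower() in text.lower()


def context_match(text, path, keyword):
    """Context-aware search: the keyword must occur inside an element selected by path."""
    root = ET.fromstring(text)
    return any(
        keyword.lower() in "".join(node.itertext()).lower()
        for node in root.findall(path)
    )


print(keyword_match(DOC, "rewrite"))           # True: found somewhere in the document
print(context_match(DOC, "title", "rewrite"))  # False: not in the title
print(context_match(DOC, "body", "rewrite"))   # True: found in the body
```

The keyword search cannot distinguish a hit in the title from a hit in the body, whereas the context-aware variant lets the caller name the element in which the keyword must appear.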

In this paper, we propose a pay-as-you-go architecture that provides an XML text view over LOB columns, so that users can take advantage of context-aware full-text search adaptively. This adaptive architecture includes a novel XML text index that can be created over the LOB column where the content is stored. The XML text index supports an XML text view over the LOB data, on top of which XQuery full-text search becomes feasible. Such an adaptive index/view approach is minimally intrusive to existing data, as it requires no physical data migration. We describe the design and challenges of building such an adaptive XML text index. Furthermore, we argue that the pay-as-you-go approach provides the integration bridge between the structured relational world and the text-oriented document world, and fulfills the primary motivation of XML in the database.
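The general idea of indexing content in place, without migrating it, can be sketched as follows. This is a toy illustration under stated assumptions, not the index design described in the paper: an in-memory SQLite `TEXT` column stands in for a CLOB column, and the side table `xml_text_index`, its (document, path, token) layout, and the helpers `walk`, `build_index`, and `search` are all hypothetical.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Existing table: documents stay in a plain text column (standing in for a CLOB).
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO docs (id, content) VALUES (?, ?)",
    [
        (1, "<memo><subject>budget review</subject><body>see the schedule</body></memo>"),
        (2, "<memo><subject>schedule change</subject><body>budget is unchanged</body></memo>"),
    ],
)

# Side structure (hypothetical layout): one row per (document, element path, token).
conn.execute("CREATE TABLE xml_text_index (doc_id INTEGER, path TEXT, token TEXT)")


def walk(node, prefix=""):
    """Yield (path, token) pairs for the leading text of each element.

    Simplification: text that follows a child element (tail text) is ignored.
    """
    path = f"{prefix}/{node.tag}"
    for token in re.findall(r"\w+", (node.text or "").lower()):
        yield path, token
    for child in node:
        yield from walk(child, path)


def build_index():
    """Populate the side index by reading, not rewriting, the stored documents."""
    rows = conn.execute("SELECT id, content FROM docs").fetchall()
    for doc_id, content in rows:
        for path, token in walk(ET.fromstring(content)):
            conn.execute(
                "INSERT INTO xml_text_index VALUES (?, ?, ?)", (doc_id, path, token)
            )


def search(path, keyword):
    """Return ids of documents where keyword occurs under the given element path."""
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM xml_text_index "
        "WHERE path = ? AND token = ? ORDER BY doc_id",
        (path, keyword.lower()),
    )
    return [row[0] for row in cur]


build_index()
print(search("/memo/subject", "budget"))  # [1]
print(search("/memo/body", "budget"))     # [2]
```

The `docs` table is never altered: the index is an additional structure that can be created when context-aware search is wanted and dropped again without touching the stored content.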


Published in

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
