ABSTRACT
An RDBMS provides the best performance for querying structured data that starts out with a well-defined schema. However, such a 'schema first, data later' approach does not work for unstructured data or data with little structure. An RDBMS therefore typically stores such data without any schema in LOB columns (for example, Character Large Object (CLOB) or Binary Large Object (BLOB) columns) and provides Information Retrieval (IR) style, keyword-based search over these LOB columns. More recently, the SQL/XML standard has introduced XML as a native datatype (XMLType) in RDBMSs. Semi-structured data, with or without a schema, can be stored in XMLType columns, and XQuery provides query capability over them. In particular, the XQuery Full Text specification provides the capability of searching for keywords within a document context. Such context-aware full-text search is more powerful than pure keyword search, because the user can specify the fine-grained context in which the keywords must occur. However, using XML with XQuery full-text search requires that the user first convert her text data into XML and store it in an XMLType column. Such massive physical data migration, with its possible loss of document fidelity and its potential impact on existing production environments, is often expensive enough that users are reluctant to adopt the XML/XQuery approach.
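To make the difference between pure keyword search and context-aware search concrete, the following Python sketch (an illustration only, not taken from the paper) runs both over two small hypothetical XML documents. In XQuery Full Text the context-aware query would be written with `contains text`, for example `/book[title contains text "index"]`; here the element context is simply an argument to a hand-written function, and tokenization is naive whitespace splitting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; in the setting described above these
# would be XML text held in a CLOB column.
DOCS = {
    1: "<book><title>Database Systems</title><review>poor index design</review></book>",
    2: "<book><title>Index Structures</title><review>a solid survey</review></book>",
}


def keyword_search(docs, term):
    """IR-style search: match the term anywhere in the document's text."""
    term = term.lower()
    hits = []
    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        # Join all text nodes with spaces so words from adjacent elements stay separate.
        all_words = " ".join(root.itertext()).lower().split()
        if term in all_words:
            hits.append(doc_id)
    return hits


def context_search(docs, term, element):
    """Context-aware search: match the term only inside the given element."""
    term = term.lower()
    hits = []
    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        for node in root.iter(element):
            if term in " ".join(node.itertext()).lower().split():
                hits.append(doc_id)
                break  # one matching element is enough for this document
    return hits


print(keyword_search(DOCS, "index"))           # [1, 2]
print(context_search(DOCS, "index", "title"))  # [2]
```

The keyword query cannot tell a book titled "Index Structures" from a review that mentions "index" in passing; supplying the element context is what separates the two.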
In this paper, we propose a pay-as-you-go architecture that provides an XML text view over LOB columns, so that users can take advantage of context-aware full-text search adaptively. This adaptive architecture includes a novel XML text index that can be created over the LOB column in which the content is stored. The XML text index supports an XML text view over the LOB data, on top of which XQuery full-text search becomes feasible. Because it requires no physical data migration, this adaptive index/view approach is minimally intrusive on existing data. We describe the design of such an adaptive XML text index and the challenges in building it. Furthermore, we argue that the pay-as-you-go approach provides an integration bridge between the structured relational world and the text-oriented document world, and thereby fulfills the primary motivation for XML in the database.
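The idea of an index created over the LOB column, rather than a migration of its contents, can be sketched as a path-aware inverted index kept beside the base rows. The Python below is a hypothetical, simplified illustration and not the paper's index design: the class `XmlTextIndex`, the dict standing in for a CLOB column, and the containment rule in `search` are all assumptions. It also assumes each LOB value already holds well-formed XML text. For every token it records the row and the element path under which the token occurs, and it only reads the stored LOB values.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class XmlTextIndex:
    """Hypothetical path-aware inverted index over XML text held in LOB rows.

    The base rows are read but never rewritten: the index is a separate
    structure that can be built (or dropped) when a user needs it.
    """

    def __init__(self):
        # term -> set of (rowid, element path) postings
        self.postings = defaultdict(set)

    def index_row(self, rowid, lob_text):
        root = ET.fromstring(lob_text)
        self._walk(rowid, root, "/" + root.tag)

    def _walk(self, rowid, node, path):
        # Text directly inside this element is posted under this element's path.
        for tok in TOKEN.findall(node.text or ""):
            self.postings[tok.lower()].add((rowid, path))
        for child in node:
            self._walk(rowid, child, f"{path}/{child.tag}")
            # Tail text after a child belongs to the enclosing element.
            for tok in TOKEN.findall(child.tail or ""):
                self.postings[tok.lower()].add((rowid, path))

    def search(self, term, element=None):
        """Return sorted rowids containing term, optionally inside `element`.

        A posting matches the element context if `element` appears anywhere
        on its path, i.e. the term occurs within that element at any depth.
        """
        rows = set()
        for rowid, path in self.postings.get(term.lower(), ()):
            if element is None or element in path.split("/"):
                rows.add(rowid)
        return sorted(rows)


# Hypothetical CLOB column: rowid -> stored XML text (left unchanged).
LOB_COLUMN = {
    101: "<memo><to>Ann</to><body>Quarterly index rebuild</body></memo>",
    102: "<memo><to>Index team</to><body>Budget review</body></memo>",
}

idx = XmlTextIndex()
for rowid, text in LOB_COLUMN.items():
    idx.index_row(rowid, text)

print(idx.search("index"))          # [101, 102]
print(idx.search("index", "body"))  # [101]
```

Because the postings carry element paths, the same structure answers both plain keyword queries and queries restricted to an element context, and dropping it leaves the LOB column exactly as it was.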
Index Terms
- Pay-as-you-go: an adaptive approach to provide full context-aware text search over document content