ABSTRACT
There has been much recent interest in XML publish/subscribe systems. Some systems scale to thousands of concurrent queries but support only a limited query language (usually a fragment of XPath 1.0). Other systems support more expressive languages but do not scale well with the number of concurrent queries. In this paper, we propose a set of novel query processing techniques, referred to as Massively Multi-Query Join Processing techniques, for processing a large number of XML stream queries involving value joins over multiple XML streams and documents. These techniques enable the sharing of representations of inputs to multiple joins, and the sharing of join computation. Our techniques are also applicable to relational event processing systems and publish/subscribe systems that support join queries. We present experimental results to demonstrate the effectiveness of our techniques. We are able to process thousands of XML messages with hundreds of thousands of join queries on real RSS feed streams. Our techniques achieve a speedup of more than two orders of magnitude over the naive approach of evaluating such join queries.
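The general idea of sharing join inputs and join computation across many queries can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes, purely for illustration, that messages are flat Python dictionaries with an `id` key, that each query is a single equality join between one field of a "left" message and one field of a "right" message, and that the names `naive_join` and `shared_join` are hypothetical. The shared version builds one hash index per distinct right-hand field and evaluates each distinct join predicate once, then fans the matches out to every query that uses that predicate.

```python
from collections import defaultdict


def naive_join(queries, left_msgs, right_msgs):
    """Evaluate every query independently with a nested loop (baseline)."""
    results = defaultdict(list)
    for qid, (left_field, right_field) in queries.items():
        for left in left_msgs:
            for right in right_msgs:
                if (left_field in left and right_field in right
                        and left[left_field] == right[right_field]):
                    results[qid].append((left["id"], right["id"]))
    return dict(results)


def shared_join(queries, left_msgs, right_msgs):
    """Evaluate each distinct join predicate once and share the result."""
    # Group query ids by predicate: identical predicates are computed once.
    by_predicate = defaultdict(list)
    for qid, predicate in queries.items():
        by_predicate[predicate].append(qid)

    # Shared input representation: one hash index per distinct right field,
    # mapping a field value to the ids of the right messages that carry it.
    right_fields = {right_field for _, right_field in by_predicate}
    index = {field: defaultdict(list) for field in right_fields}
    for right in right_msgs:
        for field in right_fields:
            if field in right:
                index[field][right[field]].append(right["id"])

    # Shared join computation: probe the index once per distinct predicate.
    results = defaultdict(list)
    for (left_field, right_field), qids in by_predicate.items():
        matches = [
            (left["id"], right_id)
            for left in left_msgs
            if left_field in left
            for right_id in index[right_field].get(left[left_field], [])
        ]
        if matches:
            for qid in qids:
                results[qid].extend(matches)
    return dict(results)
```

In this sketch the work grows with the number of distinct predicates rather than with the number of queries, which is the intuition behind sharing. Actual speedups depend on the workload and on the techniques described in the paper itself.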
Index Terms
- Massively multi-query join processing in publish/subscribe systems