Abstract
We focus on stream join optimization by exploiting constraints that are dynamically embedded in data streams to signal the end of transmission of certain attribute values. These constraints are called punctuations. Our stream join operator, PJoin, uses punctuations to remove no-longer-useful data from its state in a timely manner, thereby reducing memory overhead and improving probing efficiency. We equip PJoin with several alternative strategies for purging the state and for propagating punctuations to benefit downstream operators. We also present an extensive experimental study that explores the performance gains achieved by purging state, as well as the trade-offs among different purge strategies. Our experiments comparing PJoin with XJoin, a stream join operator without a constraint-exploiting mechanism, show that PJoin significantly outperforms XJoin in both memory overhead and throughput.
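To make the idea of punctuation-driven purging concrete, the following is a minimal sketch, not the PJoin implementation itself. It assumes a symmetric hash join over two streams labeled "A" and "B", a single equality join key, and punctuations that each close exactly one key value. The class name `PunctuatedJoin`, its methods, and the eager (purge-on-arrival) policy are all hypothetical choices made for illustration; the abstract only states that several purge and propagation strategies exist.

```python
from collections import defaultdict


class PunctuatedJoin:
    """Toy symmetric hash join on one key with eager, punctuation-driven purging.

    Hypothetical illustration only; names and policy are assumptions.
    """

    def __init__(self):
        # Per-stream join state: key -> list of stored payloads.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}
        # Per-stream set of key values that the stream has punctuated.
        self.closed = {"A": set(), "B": set()}

    @staticmethod
    def _other(side):
        return "B" if side == "A" else "A"

    def on_tuple(self, side, key, payload):
        """Probe the opposite state, then store the tuple only if still useful."""
        other = self._other(side)
        matches = self.state[other].get(key, [])
        if side == "A":
            results = [(key, payload, m) for m in matches]
        else:
            results = [(key, m, payload) for m in matches]
        # If the opposite stream already punctuated this key, no future tuple
        # can match this one, so it never needs to enter the state.
        if key not in self.closed[other]:
            self.state[side][key].append(payload)
        return results

    def on_punctuation(self, side, key):
        """Stream `side` promises that no more tuples with `key` will arrive.

        Returns True when the key is closed on both inputs, meaning a
        punctuation for `key` could be propagated downstream.
        """
        self.closed[side].add(key)
        # Tuples with this key held for the opposite stream can no longer
        # find new partners, so purge them immediately (eager purge).
        self.state[self._other(side)].pop(key, None)
        return key in self.closed["A"] and key in self.closed["B"]

    def state_size(self):
        """Number of payloads currently held across both states."""
        return sum(len(v) for s in self.state.values() for v in s.values())
```

In this sketch, a punctuation from one stream purges the matching entries from the opposite stream's state, because those entries were only being kept in case more partners arrived. A lazy variant could instead batch punctuations and purge periodically, trading memory for less purge work; that choice is the kind of trade-off the abstract refers to.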
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, L., Mehta, N., Rundensteiner, E.A., Heineman, G.T. (2004). Joining Punctuated Streams. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_34
DOI: https://doi.org/10.1007/978-3-540-24741-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21200-3
Online ISBN: 978-3-540-24741-8