
Processing Exact Results for Windowed Stream Joins in a Memory-Limited System: A Disk-Based, Adaptive Approach


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 7720)

Abstract

We consider the problem of processing exact results for sliding window joins over data streams with limited memory. Existing approaches either (1) deal with memory limitations by shedding load, and therefore cannot provide exact or even highly accurate results for sliding window joins over data streams with time-varying arrival rates, or (2) suffer from large I/O overhead due to random disk flushes and disk-to-disk stages within a stream join, which makes them inefficient for sliding window joins. We provide an Adaptive, Hash-partitioned Exact Window Join (AH-EWJ) algorithm that incorporates disk storage as an archive. Our algorithm spills window data onto the disk periodically, refines the output result by retrieving the disk-resident data as needed, maximizes the output rate through techniques for managing the memory blocks, and continuously adjusts the memory allocated to the stream windows. The problem of managing the window blocks in memory, which is similar in nature to caching, captures both the temporal and the frequency-related properties of the stream arrivals. We also present a baseline algorithm called Rate-based Progressive Window Joins (RPWJ), which extends an existing algorithm to tune performance by reducing disk I/O overhead while processing sliding window joins. We provide experimental results demonstrating the performance and effectiveness of the proposed algorithm.
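The abstract describes these mechanisms only at a high level. The minimal Python sketch below illustrates the general idea of keeping a memory-limited window join exact: a symmetric, hash-partitioned window join that probes only memory-resident partitions, spills partitions to a simulated disk when a tuple budget is exceeded, and recovers the pairs it missed in a final refinement pass. It is not the authors' AH-EWJ or RPWJ algorithm. The names `Tup`, `SpillingWindowJoin`, `insert`, `refine` and `brute_force` are hypothetical, and so are several simplifications: the largest-partition victim policy stands in for the paper's adaptive block management, spilling is triggered by a budget rather than periodically, Python lists stand in for the disk and the archive, and refinement runs once at the end of the stream.

```python
from collections import defaultdict
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Tup:
    side: str  # "R" or "S": which stream the tuple belongs to
    key: int   # join attribute
    ts: int    # arrival time; assumed strictly increasing across both streams


class SpillingWindowJoin:
    """Hypothetical sketch (not AH-EWJ): symmetric hash window join that
    spills partitions to a simulated disk and recovers missed pairs later."""

    def __init__(self, window, num_partitions, memory_budget):
        self.window = window
        self.p = num_partitions
        self.budget = memory_budget                          # max resident tuples
        self.mem = {"R": defaultdict(list), "S": defaultdict(list)}
        self.disk = {"R": [], "S": []}                       # (tuple, spill_time)
        self.archive = {"R": [], "S": []}                    # every arrival
        self.output = []                                     # (r.ts, s.ts) pairs

    def _pid(self, key):
        return hash(key) % self.p

    @staticmethod
    def _pair(a, b):
        return (a.ts, b.ts) if a.side == "R" else (b.ts, a.ts)

    def _expire(self, now):
        # Drop memory-resident tuples that fell out of the time window.
        for parts in self.mem.values():
            for pid in list(parts):
                parts[pid] = [t for t in parts[pid] if now - t.ts <= self.window]

    def _spill(self, now):
        # Placeholder victim policy: flush the largest partition until under budget.
        while sum(len(b) for parts in self.mem.values() for b in parts.values()) > self.budget:
            side, pid = max(((s, p) for s in self.mem for p in self.mem[s]),
                            key=lambda sp: len(self.mem[sp[0]][sp[1]]))
            self.disk[side].extend((t, now) for t in self.mem[side][pid])
            self.mem[side][pid] = []

    def insert(self, t):
        other = "S" if t.side == "R" else "R"
        self._expire(t.ts)
        # Probe only the memory-resident partition of the opposite stream.
        for u in self.mem[other][self._pid(t.key)]:
            if u.key == t.key and t.ts - u.ts <= self.window:
                self.output.append(self._pair(t, u))
        self.mem[t.side][self._pid(t.key)].append(t)
        self.archive[t.side].append(t)
        self._spill(t.ts)

    def refine(self):
        # Call once after the stream ends. A pair was missed exactly when its
        # earlier tuple was on disk when the later tuple arrived, i.e. when
        # later.ts > spill_time; each such pair is emitted here exactly once.
        for side, spilled in self.disk.items():
            other = "S" if side == "R" else "R"
            for early, spill_time in spilled:
                for later in self.archive[other]:
                    if (later.key == early.key and later.ts > spill_time
                            and later.ts - early.ts <= self.window):
                        self.output.append(self._pair(early, later))


def brute_force(tuples, window):
    """Exact window join by nested loops, used as a reference."""
    rs = [t for t in tuples if t.side == "R"]
    ss = [t for t in tuples if t.side == "S"]
    return sorted((r.ts, s.ts) for r in rs for s in ss
                  if r.key == s.key and abs(r.ts - s.ts) <= window)


random.seed(7)
stream = [Tup(random.choice("RS"), random.randint(0, 4), ts) for ts in range(200)]
j = SpillingWindowJoin(window=10, num_partitions=4, memory_budget=6)
for t in stream:
    j.insert(t)
j.refine()
print(sorted(j.output) == brute_force(stream, 10))  # → True
```

What keeps the result exact in this sketch is recording each tuple's spill time: a pair is absent from the in-memory output precisely when its earlier tuple was already on disk when the later one arrived. The refinement pass can therefore emit exactly those pairs without duplicating any pair produced during probing.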




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chakraborty, A., Singh, A. (2012). Processing Exact Results for Windowed Stream Joins in a Memory-Limited System: A Disk-Based, Adaptive Approach. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems VII. Lecture Notes in Computer Science, vol 7720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35332-1_2


  • DOI: https://doi.org/10.1007/978-3-642-35332-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35331-4

  • Online ISBN: 978-3-642-35332-1

  • eBook Packages: Computer Science, Computer Science (R0)
