Artifacts Available / v1.1

Out-of-Order Sliding-Window Aggregation with Efficient Bulk Evictions and Insertions

Published: 01 July 2023

Abstract

Sliding-window aggregation is a foundational stream processing primitive that efficiently summarizes recent data. The state-of-the-art algorithms for sliding-window aggregation are highly efficient when stream data items are evicted or inserted one at a time, even when some of the insertions occur out-of-order. However, real-world streams are often not only out-of-order but also bursty, causing data items to be evicted or inserted in larger bulks. This paper introduces a new algorithm for sliding-window aggregation with bulk eviction and bulk insertion. For the special case of single insert and evict, our algorithm matches the theoretical complexity of the best previous out-of-order algorithms. For the case of bulk evict, our algorithm improves upon the theoretical complexity of the best previous algorithm for that case and also outperforms it in practice. For the case of bulk insert, there are no prior algorithms, and our algorithm improves upon the naive approach of emulating bulk insert with a loop over single inserts, both in theory and in practice. Overall, this paper makes high-performance algorithms for sliding-window aggregation more broadly applicable by efficiently handling the ubiquitous cases of out-of-order data and bursts.
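
To make the operations named in the abstract concrete, the following is a minimal Python sketch of the naive baseline only: a timestamp-sorted window in which bulk insert is emulated with a loop over single inserts. It is not the algorithm introduced in the paper, and the names used here (`NaiveWindow`, `insert`, `bulk_insert`, `bulk_evict`, `query`) are hypothetical, chosen for illustration. The sketch assumes that the aggregation operator is associative and has an identity element, and that eviction removes every item with a timestamp at or before a given cutoff.

```python
from bisect import bisect_right
from functools import reduce


class NaiveWindow:
    """Baseline out-of-order sliding window (illustrative, not the paper's algorithm)."""

    def __init__(self, combine, identity):
        self.combine = combine    # assumed associative binary operator
        self.identity = identity  # assumed neutral element for combine
        self.times = []           # timestamps, kept in ascending order
        self.values = []          # values, parallel to self.times

    def insert(self, t, v):
        # Out-of-order insert: find the sorted position, then shift the tail.
        i = bisect_right(self.times, t)
        self.times.insert(i, t)
        self.values.insert(i, v)

    def bulk_insert(self, pairs):
        # Naive emulation: one single insert per item in the burst.
        for t, v in pairs:
            self.insert(t, v)

    def bulk_evict(self, cutoff):
        # Drop every item whose timestamp is <= cutoff in one slice deletion.
        i = bisect_right(self.times, cutoff)
        del self.times[:i]
        del self.values[:i]

    def query(self):
        # Recompute the aggregate from scratch over the current window.
        return reduce(self.combine, self.values, self.identity)


w = NaiveWindow(max, float("-inf"))
w.bulk_insert([(5, 2), (1, 9), (3, 4)])  # a burst that arrives out of order
print(w.query())                         # maximum over the whole window
w.bulk_evict(1)                          # evict everything up to timestamp 1
print(w.query())
```

In this baseline every query rescans the window and every single insert may shift a list tail, so a burst of m items costs m separate insertions. That per-item cost is the overhead of the naive approach that the abstract contrasts with dedicated bulk operations.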
