Abstract
Current pattern-detection proposals for streaming data recognize the need to move beyond a simple regular-expression model over strictly ordered input. We continue in this direction, relaxing restrictions present in some models, removing the requirement for ordered input, and permitting stream revisions (modification of prior events). Further, recognizing that patterns of interest in modern applications may change frequently over the lifetime of a query, we support updating a pattern specification without blocking input or restarting the operator. Our new pattern operator (called AFA) is a streaming adaptation of a non-deterministic finite automaton (NFA) in which additional schema-based user-defined information, called a register, is accessible to NFA transitions during execution. AFAs support dynamic patterns, where the pattern itself can change over time. We propose clean order-agnostic pattern-detection semantics for AFAs, with new algorithms that allow a very efficient implementation while retaining significant expressiveness and supporting native handling of out-of-order input, stream revisions, dynamic patterns, and several optimizations. Experiments on Microsoft StreamInsight show that we achieve event rates of more than 200K events/sec (up to 5x better than simpler schemes). Our dynamic patterns give up to orders of magnitude higher throughput than alternatives such as operator restart, and our other optimizations are very effective, incurring low memory and latency overhead.
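To make the idea of an NFA whose transitions can read and update a register more concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes in-order input only and leaves out disorder handling, revisions, and dynamic pattern updates. The names `Arc`, `fence`, `transfer`, and `run_register_nfa` are hypothetical and chosen for illustration. The example pattern looks for at least two consecutive price drops followed by a rise; the register holds the last price seen and the number of drops so far.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Set, Tuple


@dataclass(frozen=True)
class Arc:
    """One transition of the illustrative register-augmented NFA."""
    src: int
    dst: int
    fence: Callable[[Any, Any], bool]    # may the arc fire for (event, register)?
    transfer: Callable[[Any, Any], Any]  # register value after the arc fires


def run_register_nfa(arcs: List[Arc], start: int, finals: Set[int],
                     initial_register: Any,
                     events: List[Any]) -> List[Tuple[int, Any]]:
    """Return (event_index, register) for every partial match reaching a final state.

    Assumption: events arrive in order; a new match attempt may start at any event.
    """
    matches: List[Tuple[int, Any]] = []
    active: List[Tuple[int, Any]] = []  # (state, register) of live partial matches
    for i, event in enumerate(events):
        candidates = active + [(start, initial_register)]
        next_active: List[Tuple[int, Any]] = []
        for state, reg in candidates:
            for arc in arcs:
                if arc.src == state and arc.fence(event, reg):
                    new_reg = arc.transfer(event, reg)
                    next_active.append((arc.dst, new_reg))
                    if arc.dst in finals:
                        matches.append((i, new_reg))
        active = next_active
    return matches


# Register layout (an assumption of this sketch): (last_price, number_of_drops).
arcs = [
    # Start: accept any price and remember it.
    Arc(0, 1, lambda e, r: True, lambda e, r: (e, 0)),
    # Stay in state 1 while the price keeps falling; count the drop.
    Arc(1, 1, lambda e, r: e < r[0], lambda e, r: (e, r[1] + 1)),
    # Finish when the price rises after at least two drops.
    Arc(1, 2, lambda e, r: e > r[0] and r[1] >= 2, lambda e, r: (e, r[1])),
]

prices = [10, 9, 8, 7, 9, 5]
matches = run_register_nfa(arcs, start=0, finals={2},
                           initial_register=(None, 0), events=prices)
print(matches)  # [(4, (9, 3)), (4, (9, 2))]
```

Both matches end at the rise to 9 (index 4): one started at price 10 and saw three drops, the other started at price 9 and saw two. Keeping the drop count in the register rather than in extra automaton states is what lets a three-state automaton express "at least k drops" for any k.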