research-article

Open Access

Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics

Authors:
Wei Cao (Zhejiang University & Alibaba Group, Hangzhou, China)
Yusong Gao (Alibaba Group, Hangzhou, China)
Feifei Li (Alibaba Group, Hangzhou, China)
Sheng Wang (Alibaba Group, Hangzhou, China)
Bingchen Lin (Alibaba Group, Hangzhou, China)
Ke Xu (Alibaba Group, Hangzhou, China)
Xiaojie Feng (Alibaba Group, Hangzhou, China)
Yucong Wang (Alibaba Group, Hangzhou, China)
Zhenjun Liu (Alibaba Group, Hangzhou, China)
Gejin Zhang (Alibaba Group, Hangzhou, China)

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, June 2020, Pages 739–753
https://doi.org/10.1145/3318464.3386136

Published: 31 May 2020

ABSTRACT

With the increasing demand for real-time system monitoring and tracking in various contexts, the amount of timestamped event data grows at an astonishing rate. Analytics on timestamped events must run in real time, and the aggregated results need to be accurate even when data arrives out of order. Unfortunately, frequent out-of-order data significantly slows processing and causes large delays in query responses. Timon is a timestamped event database that aims to support aggregations and handle late arrivals both correctly (i.e., upholding exactly-once semantics) and efficiently. Our insight is that a broad range of applications can be implemented with data structures and corresponding operators that satisfy the associative and commutative properties. Records arriving after the low watermark are appended to Timon directly, allowing aggregations to be performed lazily. To improve query efficiency, Timon maintains a TS-LSM-Tree, which keeps the most recent data in memory and contains a time-partitioning tree on disk for high-volume data accumulated over a long time span. In addition, Timon supports materialized aggregation views and correlation analysis across multiple streams. Timon has been successfully deployed at Alibaba Cloud and is a critical building block for Alibaba Cloud's continuous monitoring and anomaly analysis infrastructure.
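The idea of appending late records and aggregating them lazily can be illustrated with a minimal Python sketch. It is not Timon's implementation: the names `Agg`, `WindowStore`, `append`, `advance_watermark`, and `query` are hypothetical, the fixed-size windows are an assumption, and "arriving after the low watermark" is interpreted here as a record whose timestamp is below the current watermark. The sketch only shows why an associative and commutative merge lets late values be folded in at query time without changing the result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Agg:
    """Hypothetical mergeable summary: count, sum, min, and max."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    @staticmethod
    def of(value: float) -> "Agg":
        return Agg(1, value, value, value)

    def merge(self, other: "Agg") -> "Agg":
        # Associative and commutative (up to floating-point rounding of
        # the sum), so the order in which records are merged is irrelevant.
        return Agg(self.count + other.count,
                   self.total + other.total,
                   min(self.lo, other.lo),
                   max(self.hi, other.hi))


class WindowStore:
    """Hypothetical store with fixed-size time windows (an assumption)."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.low_watermark = 0
        self.folded = defaultdict(Agg)   # window start -> merged aggregate
        self.late = defaultdict(list)    # window start -> unmerged late values

    def _window(self, ts: int) -> int:
        return ts - ts % self.window_size

    def advance_watermark(self, ts: int) -> None:
        self.low_watermark = max(self.low_watermark, ts)

    def append(self, ts: int, value: float) -> None:
        w = self._window(ts)
        if ts < self.low_watermark:
            # Late record: append only, defer the aggregation work.
            self.late[w].append(value)
        else:
            self.folded[w] = self.folded[w].merge(Agg.of(value))

    def query(self, ts: int) -> Agg:
        # Lazy step: fold pending late values into the window on read.
        w = self._window(ts)
        agg = self.folded[w]
        for value in self.late.pop(w, []):
            agg = agg.merge(Agg.of(value))
        self.folded[w] = agg
        return agg


store = WindowStore(window_size=10)
store.append(1, 5)            # on time, merged immediately
store.advance_watermark(10)
store.append(3, 2)            # late for window [0, 10), only appended
result = store.query(0)       # late value merged here
print(result)
```

Because each late value is merged exactly once and then removed from the pending list, repeated queries over the same window return the same aggregate in this sketch.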


Index Terms

Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics
1. Information systems
  1. Data management systems
2. Networks
  1. Network services
    1. Cloud computing


Published in
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)
Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)
Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)
Copyright © 2020 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cloud computing
data processing system
out-of-order events
real-time data analytics
time series database
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%