column

Data Series Management: The Road to Big Sequence Analytics

Author:
Themis Palpanas

Paris Descartes University


ACM SIGMOD Record, Volume 44, Issue 2, June 2015, pp. 47–52. https://doi.org/10.1145/2814710.2814719

Published: 12 August 2015

Abstract

Massive data series collections are becoming a reality for virtually every scientific and social domain, and they have to be processed and analyzed in order to extract useful knowledge. Current data series management solutions are ad hoc, requiring huge investments in time and effort, and duplication of effort across different teams. Systems like relational databases, column stores, and array databases are not a suitable solution either, since none of these systems offers native support for data series. Our vision is to design and develop a general-purpose Data Series Management System, able to cope with big data series, that is, very large and fast-changing collections of data series, which can be heterogeneous (i.e., originate from disparate domains and thus exhibit very different characteristics), and which can have uncertainty in their values (e.g., due to inherent errors in the measurements). Just as databases abstracted the relational data management problem and offered a black-box solution that is now omnipresent, the proposed system will allow analysts who are not experts in data series management, as well as common users, to tap into the goldmine of the massive and ever-growing data series collections they (already) have.

Published in: ACM SIGMOD Record, Volume 44, Issue 2, June 2015, 56 pages
ISSN: 0163-5808
DOI: 10.1145/2814710
Editors: Yanlei Diao (University of Massachusetts Amherst), Pablo Barceló (Universidad de Chile), Vanessa Braganholo (Universidade Federal Fluminense), Marco Brambilla (Politecnico di Milano), Chee Yong Chan (National University of Singapore), Rada Chirkova (North Carolina State University), Anastasios Kementsietsidis (Google Research), Olga Papaemmanouil (Brandeis University), Aditya Parameswaran (University of Illinois), Anish Das Sarma (Google Research), Alkis Simitsis (HP Labs), Nesime Tatbul (ETH Zurich), Marianne Winslett (University of Illinois), Jun Yang (Duke University)
Copyright © 2015. Copyright is held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher: Association for Computing Machinery, New York, NY, United States