Abstract
Massive data series collections are becoming a reality for virtually every scientific and social domain, and they must be processed and analyzed in order to extract useful knowledge. Current data series management solutions are ad hoc, requiring huge investments in time and effort and leading to duplication of effort across different teams. Systems like relational databases, column stores, and array databases are not a suitable solution either, since none of these systems offers native support for data series. Our vision is to design and develop a general-purpose Data Series Management System, able to cope with big data series, that is, very large and fast-changing collections of data series, which can be heterogeneous (i.e., originate from disparate domains and thus exhibit very different characteristics), and which can have uncertainty in their values (e.g., due to inherent errors in the measurements). Just as databases abstracted the relational data management problem and offered a black-box solution that is now omnipresent, the proposed system will allow analysts who are not experts in data series management, as well as common users, to tap into the goldmine of the massive and ever-growing data series collections they (already) have.
Index Terms
- Data Series Management: The Road to Big Sequence Analytics