column

Data Series Management: The Road to Big Sequence Analytics

Published: 12 August 2015

Abstract

Massive data series collections are becoming a reality for virtually every scientific and social domain, and they have to be processed and analyzed in order to extract useful knowledge. Current data series management solutions are ad hoc: they require huge investments of time and effort, and that effort is duplicated across different teams. Systems such as relational databases, column stores, and array databases are not a suitable solution either, since none of them offers native support for data series. Our vision is to design and develop a general-purpose Data Series Management System able to cope with big data series, that is, very large and fast-changing collections of data series, which can be heterogeneous (i.e., originate from disparate domains and thus exhibit very different characteristics) and which can have uncertainty in their values (e.g., due to inherent errors in the measurements). Just as databases abstracted the relational data management problem and offered a black-box solution that is now omnipresent, the proposed system will allow analysts who are not experts in data series management, as well as common users, to tap into the goldmine of the massive and ever-growing data series collections they (already) have.



      Published in

      ACM SIGMOD Record, Volume 44, Issue 2
      June 2015
      56 pages
      ISSN: 0163-5808
      DOI: 10.1145/2814710

      Copyright © 2015 held by the owner/author(s).

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
