DOI: 10.1145/3589132.3625623
research-article
Open Access

TrajParquet: A Trajectory-Oriented Column File Format for Mobility Data Lakes

Published: 22 December 2023

ABSTRACT

Columnar data formats, such as Apache Parquet, are increasingly popular for scalable storage and querying of data lakes, thanks to compressed storage and efficient data access via data skipping. However, when applied to spatial or spatio-temporal data, more advanced solutions are required that go beyond pruning on single attributes towards multidimensional pruning. Although solutions exist for geospatial data, such as GeoParquet and SpatialParquet, they fall short when applied to trajectory data (sequences of spatio-temporal positions). In this paper, we propose TrajParquet, a highly efficient and scalable format for columnar storage of trajectory data. We also present a query processing algorithm that supports spatio-temporal range queries over TrajParquet. We evaluate TrajParquet on real-world data sets against extensions of GeoParquet and SpatialParquet adapted to handle spatio-temporal data.
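The idea of multidimensional data skipping can be illustrated with a minimal sketch, assuming each row group stores one spatio-temporal bounding box (min/max in x, y and t) computed over all trajectory points it holds. The names `Box`, `RowGroup`, `bounding_box` and `range_query` are hypothetical; this is not TrajParquet's actual layout or the paper's query algorithm. It only shows how a reader can skip whole row groups from their statistics before inspecting individual points, and it assumes a trajectory matches a range query when at least one of its sampled points falls inside the query box.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical point layout: (x, y, t), e.g. longitude, latitude, timestamp.
Point = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned spatio-temporal box: min/max in x, y and t."""
    xmin: float
    ymin: float
    tmin: float
    xmax: float
    ymax: float
    tmax: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap only if they overlap in every dimension.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax
                and self.tmin <= other.tmax and other.tmin <= self.tmax)

    def contains(self, p: Point) -> bool:
        x, y, t = p
        return (self.xmin <= x <= self.xmax
                and self.ymin <= y <= self.ymax
                and self.tmin <= t <= self.tmax)


def bounding_box(points: List[Point]) -> Box:
    # Assumes a non-empty list of points.
    xs, ys, ts = zip(*points)
    return Box(min(xs), min(ys), min(ts), max(xs), max(ys), max(ts))


@dataclass
class RowGroup:
    """Hypothetical row group: a batch of trajectories plus one 3-D summary box."""
    trajectories: List[List[Point]]
    stats: Box

    @classmethod
    def build(cls, trajectories: List[List[Point]]) -> "RowGroup":
        # The statistics cover every point of every trajectory in the group.
        all_points = [p for traj in trajectories for p in traj]
        return cls(trajectories, bounding_box(all_points))


def range_query(row_groups: List[RowGroup], query: Box):
    """Return ((group, trajectory) ids with a point in `query`, groups scanned)."""
    hits = []
    scanned = 0
    for g_idx, group in enumerate(row_groups):
        if not group.stats.intersects(query):
            continue  # data skipping: no point in this group can match
        scanned += 1
        for t_idx, traj in enumerate(group.trajectories):
            if any(query.contains(p) for p in traj):
                hits.append((g_idx, t_idx))
    return hits, scanned


# Usage: two row groups; the second overlaps the query in time but not in space.
groups = [
    RowGroup.build([[(0, 0, 0), (1, 1, 10)], [(2, 2, 5), (3, 3, 15)]]),
    RowGroup.build([[(50, 50, 5), (51, 51, 8)]]),
]
q = Box(0.5, 0.5, 0, 2.5, 2.5, 12)
hits, scanned = range_query(groups, q)
print(hits, scanned)  # [(0, 0), (0, 1)] 1
```

In this example the second row group overlaps the query in time, so pruning on the timestamp alone would still have to read it; testing the box in all three dimensions at once lets the reader skip it entirely.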


Published in: SIGSPATIAL '23: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, November 2023, 686 pages
ISBN: 9798400701689
DOI: 10.1145/3589132

Copyright © 2023 Owner/Author(s)

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 220 of 1,116 submissions (20%)
