Nested spatial data structures for optimal indexing of LiDAR data

https://doi.org/10.1016/j.isprsjprs.2022.11.018

Highlights

  • Flexible spatial data structure for indexing huge LiDAR datasets.

  • Optimal spatial indexing through ad-hoc nesting of data structures.

  • Can combine grids, quadtrees, octrees, kd-trees and other structures.

  • Suitable for mixing data from ALS, TLS and other sources in the same dataset.

Abstract

In this paper we present a flexible framework for creating spatial data structures to manage LiDAR point clouds in the context of spatial big data. Standard approaches typically index a point cloud with a single data structure, and some use a hybrid two-tier solution optimized for a specific purpose such as storage or rendering. In this article we introduce a meta-structure of unlimited depth that nests a custom, user-defined combination of structures, such as grids, quadtrees, octrees, or kd-trees. With our approach, the out-of-core indexing of point clouds can be adapted to different types of datasets, taking into account the spatial distribution of the data. The most suitable spatial indexing can therefore be achieved for any type of dataset, from small TLS-based scenes to planetary-scale ALS-based scenes. This approach also allows us to work with overlapping datasets of different resolutions from different acquisition technologies in the same structure.

Introduction

LiDAR scanning is one of the most powerful tools in fields such as civil engineering, surveying, archaeology and environmental engineering. As LiDAR sensors have evolved, spread and fallen in price, large datasets have become very common, especially in the context of so-called Geospatial Big Data. Handling this amount of data raises a number of problems related to its storage, transmission, organization, visualization, editing and analysis. The situation worsens when managing data obtained from different sources with different precision levels, such as ALS (Airborne Laser Scanning) and TLS (Terrestrial Laser Scanning), data acquired at different times, or data with different layers of information. The data structure used to index all this spatial data is therefore paramount for efficient access to the desired information in any processing task.

Typically, LiDAR data is stored as files in a common format, such as LAS/LAZ. The usual way to define the work area is by means of a polygon or a bounding box. As data is usually distributed in tiles (Boehm, 2014), it may be necessary to process much more data than is actually affected by the operation. Furthermore, datasets eventually take up terabytes of information scattered across diverse storage media in hundreds of files, which makes spatial and temporal analyses and data editing difficult. To alleviate this, it is common practice to keep several levels of detail of the dataset, obtained by sub-sampling. In contrast, some approaches use a hierarchical data structure to distribute the data more efficiently in space (Deibe et al., 2019, Huang et al., 2020, Lu et al., 2019, Schuetz, 2016, Schütz et al., 2020), which provides levels of detail (LODs) intrinsically. However, some problems remain, especially when combining multiple datasets generated with different scanning precision or different attributes. This is common in the context of Geospatial Big Data, and specific approaches and techniques must therefore be applied (Deng et al., 2019, Evans et al., 2014, Lee and Kang, 2015, Pääkkönen and Pakkala, 2015).
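The cost of tile-based distribution can be illustrated with a small calculation. The following is only a sketch under stated assumptions: square tiles aligned to the coordinate origin, a uniform point density, and hypothetical helper names (`tiles_for_bbox`, `overfetch_ratio`) that do not come from the paper.

```python
import math

def tiles_for_bbox(xmin, ymin, xmax, ymax, tile_size):
    """Return (col, row) indices of every square tile touched by a query box.

    Assumption: tiles are axis-aligned squares whose corners sit on
    multiples of `tile_size`.
    """
    c0, c1 = math.floor(xmin / tile_size), math.floor(xmax / tile_size)
    r0, r1 = math.floor(ymin / tile_size), math.floor(ymax / tile_size)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def overfetch_ratio(xmin, ymin, xmax, ymax, tile_size):
    """Area of the tiles that must be read divided by the area queried.

    With uniform density this approximates how many points are loaded
    for each point that the operation actually needs.
    """
    n_tiles = len(tiles_for_bbox(xmin, ymin, xmax, ymax, tile_size))
    return n_tiles * tile_size**2 / ((xmax - xmin) * (ymax - ymin))

# A 100 m x 100 m work area that straddles the corner shared by four 1 km tiles.
query = (950.0, 950.0, 1050.0, 1050.0)
print(tiles_for_bbox(*query, tile_size=1000.0))
print(overfetch_ratio(*query, tile_size=1000.0))
```

In this hypothetical case the query touches four tiles, so roughly 400 times more area is read than the work area covers. A finer spatial index reduces that ratio by letting the query descend to smaller cells.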

The Open Geospatial Consortium (OGC) Testbed-14, in the Point Cloud Data Handling Engineering Report (Boehler et al., 2018), identifies some applications and tasks that need spatially accelerated access, such as modeling and simulation, measurement, feature extraction, change detection and bathymetric exploration, among others. Data storage is therefore not the only important issue; efficient data search, retrieval and editing matter as well. The main problem with standard spatial indexing solutions is their poor adaptation to some datasets, due to the spatial distribution of the data. For example, indexing large terrains scanned with ALS using an octree will produce an excessive depth of the structure, due to its inadequate adaptation to mostly flat geometry. On the other hand, a quadtree will perform poorly when the dataset contains a large number of vertical structures (see Fig. 1), as happens with urban scenarios where data from the interior of buildings is included through TLS. Unfortunately, past experience has shown that the appropriate spatial structure is highly dependent on the type of dataset (Poux, 2019). The difficulty is greatest when working with data coming from various sources (ALS, TLS, mobile mapping, etc.), with multiple accuracy levels and, especially, with several types of spatial distribution, all in the same dataset and eventually in the same applications.
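One facet of the mismatch between an octree and flat terrain can be shown with synthetic data. The sketch below does not reproduce the paper's experiments: the point lattice, the root size and the `occupied_cells` helper are all assumptions chosen for illustration. It counts, level by level, how many cells contain points when the root is halved along two axes (quadtree) or three axes (octree).

```python
def occupied_cells(points, side, level, dims):
    """Distinct cells that contain points at a given subdivision level.

    The root has edge length `side` and is halved `level` times along the
    first `dims` axes (dims=2 behaves like a quadtree, dims=3 like an octree).
    """
    cell = side / 2**level
    return {tuple(int(p[a] // cell) for a in range(dims)) for p in points}

# Hypothetical flat terrain: a 64 x 64 lattice of points over a 1024 m square,
# with every elevation inside a 1 m band at the bottom of the root cube.
side = 1024.0
points = [(i * 16.0 + 8.0, j * 16.0 + 8.0, ((i + j) % 4) * 0.25)
          for i in range(64) for j in range(64)]

for level in range(1, 7):
    quad = len(occupied_cells(points, side, level, dims=2))
    octo = len(occupied_cells(points, side, level, dims=3))
    # Last column: fraction of the octree's 8**level possible cells in use.
    print(level, quad, octo, octo / 8**level)
```

For this synthetic terrain the octree occupies exactly as many cells as the quadtree at every level, because the vertical split never separates any points. Each octree node therefore reserves eight children while at most four receive data, which is overhead a 2D structure avoids on mostly flat geometry.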

The objective of this work is to define a flexible spatial meta-structure for managing LiDAR point clouds in the context of Big Data at a variable scale, ranging from small TLS-based scenes to planet-scale ALS-based datasets. This structure serves as a spatial indexer, and is used to partition the data in a way that speeds up spatial queries and data access. It allows a user-defined combination of nested structures of unlimited depth, such as grids, quadtrees, octrees, kd-trees and others. With this approach, the out-of-core indexing of point clouds can be optimized for a given dataset by adapting the spatial partition to the morphology and distribution of points in space. For example, large portions of land can be treated efficiently with 2D-based spatial indexing, while vertical structures such as buildings can be optimally partitioned and indexed in 3D.
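The idea of a user-defined nesting can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class names `Grid` and `Tree`, the `locate` method and the `index_path` function are hypothetical, the trees use a fixed depth instead of splitting adaptively, and kd-trees are left out because their splits depend on the data. Each level maps a point to a key and hands the resulting sub-box to the next level.

```python
from dataclasses import dataclass

# A box is a pair (min_corner, max_corner) of 3D tuples.

@dataclass
class Grid:
    """Uniform grid with `cells` divisions along each of the first `dims` axes."""
    cells: int
    dims: int  # 2 leaves z unsplit (2D grid), 3 splits all axes (3D grid)

    def locate(self, p, box):
        lo, hi = box
        key, new_lo, new_hi = [], list(lo), list(hi)
        for a in range(self.dims):
            size = (hi[a] - lo[a]) / self.cells
            # Clamp so that a point on the maximum face falls in the last cell.
            i = min(int((p[a] - lo[a]) / size), self.cells - 1)
            key.append(i)
            new_lo[a], new_hi[a] = lo[a] + i * size, lo[a] + (i + 1) * size
        return tuple(key), (tuple(new_lo), tuple(new_hi))

@dataclass
class Tree:
    """Fixed-depth quadtree (dims=2) or octree (dims=3), built by repeated halving."""
    depth: int
    dims: int

    def locate(self, p, box):
        halving = Grid(cells=2, dims=self.dims)
        path = []
        for _ in range(self.depth):
            key, box = halving.locate(p, box)
            path.append(key)
        return tuple(path), box

def index_path(p, root_box, levels):
    """Walk the nested levels in order; return one key per level and the leaf box."""
    path, box = [], root_box
    for level in levels:
        key, box = level.locate(p, box)
        path.append(key)
    return path, box

# Hypothetical nesting: a 2D grid of quadtrees of 3D grids of octrees.
levels = [Grid(cells=4, dims=2), Tree(depth=2, dims=2),
          Grid(cells=2, dims=3), Tree(depth=1, dims=3)]
root = ((0.0, 0.0, 0.0), (4000.0, 4000.0, 400.0))
path, leaf_box = index_path((1250.0, 3100.0, 30.0), root, levels)
print(path)
print(leaf_box)
```

Because the nesting is just an ordered list, swapping the combination for a different dataset means changing that list only. The first two levels never split the vertical axis, which matches the idea of indexing terrain in 2D and switching to 3D partitioning deeper in the hierarchy.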

A significant advantage of mixing multiple data structures is that levels of detail can be adapted to a specific purpose, for example by reserving one type of spatial structure to index only the densest zones (e.g. buildings). With this approach, it is easier to adjust the level of detail of the environment, which improves all dataset processing tasks, including editing, data analysis and visualization (Poux, 2019). This also improves the integration of data obtained from TLS scans with the sparser data obtained with ALS.

The rest of the paper is organized as follows. Section 2 reviews the most relevant work related to point cloud storage and spatial indexing. Section 3 presents the description of the proposed approach of nested spatial data structures, which is tested and analyzed in Section 4. Finally, Section 5 presents the conclusions and outlines future work.

Previous works

Point cloud models are a widely used resource in all types of decision-making processes (Poux, 2019). With the rapid evolution of 3D data capture hardware, larger and larger datasets are becoming available, which pose a major challenge for efficient data management and processing (Poux, 2019). In addition to the increase in the volume of spatial information captured, there is also a great variety in its scale, from small objects to large areas of the territory (Bräunl, 2020). This allows us to

Nested spatial data structures schema

Our proposal consists of the creation of ad-hoc combinations of nested data structures to take full advantage of the spatial distribution of each point cloud dataset. The nesting of data structures is a problem that has a certain complexity. In this section we present a framework for achieving that goal. The particularities of each type of spatial structure, such as the grid, the octree or the kd-tree, are outside the scope of this work. Here we will focus on how to adapt them to make a

Results and discussion

In this section we present a comparison of several spatial data structures built with our framework. Two of them, the quadtree and the octree, are well-known basic structures (see Fig. 2), and have been selected to highlight the differences in terms of performance and storage footprint compared to standard solutions. The other structure follows the same schema presented in the example of Fig. 3, a 2D grid of quadtrees of 3D grids of octrees, which is well suited for urban scenes. The 2D grid

Conclusions and future work

In this paper we have presented a new approach for defining an ad-hoc nested spatial data structure for indexing LiDAR data. Several basic spatial data structures can be combined and nested in order to adapt the indexing of the data to its spatial distribution, ranging from full ALS datasets to high-precision small-scale TLS datasets, or the mixing of both. We have also presented some guidelines for choosing the correct combination of data structures, depending on the nature of the data and the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (44)

  • Baert, J. et al., 2014. Out-of-core construction of sparse voxel octrees. Comput. Graph. Forum.
  • Boehler, W. et al., 2018. OGC Testbed-14: Point Cloud Data Handling Engineering Report. Technical Report.
  • Boehm, J., 2014. File-centric Organization of large LiDAR Point Clouds in a Big Data context. In: Workshop on...
  • Boehm, J. et al., 2016. Sideloading ingestion of large point clouds into the Apache Spark big data engine. ISPRS - Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci.
  • Bräunl, T. Lidar sensors.
  • Cao, C. et al. 3D point cloud compression: A survey.
  • City of New York, 2017. Topobathymetric LiDAR data.
  • Deibe, D. et al. Big data storage technologies: a case study for web-based LiDAR visualization.
  • Deibe, D. et al., 2019. Supporting multi-resolution out-of-core rendering of massive LiDAR point clouds through non-redundant data structures. Int. J. Geogr. Inf. Sci.
  • Deng, X. et al., 2019. Geospatial big data: New paradigm of remote sensing applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
  • Evans, M.R. et al. Spatial big data. Case studies on volume, velocity, and variety.
  • Gong, J. et al., 2012. An efficient point cloud management method based on a 3D R-tree. Photogramm. Eng. Remote Sens.

This result is part of the research project RTI2018-099638-B-I00 funded by MCIN/AEI/10.13039/501100011033/ and ERDF funds “A way of doing Europe”. Also, the work has been funded by the Spanish Ministry of Science, Innovation and Universities via a doctoral grant to the second author (FPU19/00100), and the University of Jaén (via ERDF funds) through the research project 1265116/2020.