High-dimensional image descriptor matching using highly parallel KD-tree construction and approximate nearest neighbor search

https://doi.org/10.1016/j.jpdc.2019.06.003

Highlights

  • High-dimensional datasets: KD-tree construction and P-ANNS are parallelized at a fine grain.

  • KD-tree structure: to enable parallel search on the GPU, all data points are stored in the leaf nodes.

  • Hybrid search: linear search within a leaf node is traded off against nonlinear search (tree backtracking).

  • Priority queue: replacing the normal queue with a priority queue avoids visiting unproductive nodes.

  • Sliding window: a sliding-window parallel buffer speeds up P-ANNS.
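
The priority-queue and hybrid-search highlights can be illustrated with a minimal, sequential Python sketch of best-bin-first style search. It is an illustration under stated assumptions, not the authors' GPU implementation: `build`, `bbf_search`, `leaf_size` and the leaf budget `max_leaves` are hypothetical names. Unvisited subtrees wait in a priority queue ordered by a lower bound on their squared distance to the query, and every leaf that is reached is scanned linearly.

```python
import heapq
import math
import random


def build(points, idx, leaf_size=4, depth=0):
    """Median-split KD-tree in which every data point lives in a leaf bucket.

    Internal node: ('node', dim, split_value, left, right)
    Leaf node:     ('leaf', [point indices])
    """
    if len(idx) <= leaf_size:
        return ('leaf', idx)
    dim = depth % len(points[0])  # cycle through the dimensions
    idx = sorted(idx, key=lambda i: points[i][dim])
    mid = len(idx) // 2
    return ('node', dim, points[idx[mid]][dim],
            build(points, idx[:mid], leaf_size, depth + 1),
            build(points, idx[mid:], leaf_size, depth + 1))


def bbf_search(tree, points, q, max_leaves=2):
    """Approximate 1-NN: subtrees are kept in a priority queue keyed by a lower
    bound on their squared distance to q; the search stops after max_leaves
    leaf scans (hypothetical budget parameter) or when no bound beats the best."""
    best_d, best_i = math.inf, -1
    heap = [(0.0, 0, tree)]  # (lower bound, unique tie-breaker, subtree)
    pushed, leaves = 1, 0
    while heap and leaves < max_leaves:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # every remaining subtree is at least this far away
        while node[0] == 'node':  # descend, queueing each skipped sibling
            _, dim, split, left, right = node
            diff = q[dim] - split
            near, far = (left, right) if diff < 0 else (right, left)
            heapq.heappush(heap, (max(bound, diff * diff), pushed, far))
            pushed += 1
            node = near
        leaves += 1
        for i in node[1]:  # linear scan inside the leaf (the "hybrid" step)
            d = sum((a - b) ** 2 for a, b in zip(points[i], q))
            if d < best_d:
                best_d, best_i = d, i
    return best_i, best_d


random.seed(0)
pts = [[random.random() for _ in range(8)] for _ in range(200)]
tree = build(pts, list(range(len(pts))))
query = [0.5] * 8
approx = bbf_search(tree, pts, query, max_leaves=3)
exact = bbf_search(tree, pts, query, max_leaves=math.inf)
print(approx[0], exact[0])
```

With `max_leaves=math.inf` the search stops only when the queue is empty or no queued subtree can beat the current best, so it returns the exact neighbor; smaller budgets trade accuracy for fewer leaf scans.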

Abstract

To overcome the high computational cost of high-dimensional digital image descriptor matching, this paper presents a set of integrated parallel algorithms for K-dimensional tree (KD-tree) construction and P approximate nearest neighbor search (P-ANNS) on modern massively parallel architectures (MPA). To improve the runtime performance of P-ANNS, we propose an efficient sliding window for parallel buffered P-ANNS on the KD-tree that mitigates the high cost of global memory accesses. When applied to high-dimensional real-world image descriptor datasets, the proposed KD-tree construction and buffered P-ANNS algorithms achieve matching quality comparable to their traditional sequential counterparts on the CPU, while outperforming them by speedup factors of up to 17 and 163, respectively. We also study the factors that affect the algorithms' performance to obtain optimal runtime configurations for various datasets. Moreover, we verify the features of the parallel algorithms on typical 3D image matching scenarios. With datasets of the classical SHOT (signature of histograms of orientations) local image descriptor, the parallel KD-tree construction and image descriptor matching achieve up to 11-fold and 138-fold speedups, respectively.
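
The abstract does not detail how the sliding window is organized, so the following Python sketch only illustrates the general buffering idea behind buffered KD-tree search, under stated assumptions: queries are taken a window at a time, grouped by the leaf they fall into, and each leaf's reference points are then read once per window and reused by every buffered query. The function `windowed_leaf_scan`, the `leaf_of` routing callback and the toy data are hypothetical, and the sketch scans only each query's own leaf (no backtracking).

```python
from collections import defaultdict


def windowed_leaf_scan(queries, leaf_of, leaf_points, points, window=4):
    """Buffered leaf search over a sliding window of queries (hypothetical).

    Within each window, queries are grouped by leaf so that every reference
    point of a leaf is loaded once per window instead of once per query.
    Returns (nearest point index per query, number of reference-point loads).
    """
    best = [None] * len(queries)
    loads = 0
    for start in range(0, len(queries), window):
        buffers = defaultdict(list)  # leaf id -> ids of buffered queries
        for qi in range(start, min(start + window, len(queries))):
            buffers[leaf_of(queries[qi])].append(qi)
        for leaf, qids in buffers.items():
            for pi in leaf_points[leaf]:
                p = points[pi]  # stands in for one global-memory load
                loads += 1
                for qi in qids:  # the loaded point is reused by all buffered queries
                    d = sum((a - b) ** 2 for a, b in zip(p, queries[qi]))
                    if best[qi] is None or d < best[qi][0]:
                        best[qi] = (d, pi)
    return [b[1] if b else None for b in best], loads


# Toy 1-D example: two leaves split at x = 5 (hypothetical data).
points = [[0], [1], [2], [3], [10], [11], [12], [13]]
leaf_points = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
queries = [[0.9], [2.2], [11.4], [12.6], [0.1], [3.0]]
nearest, loads = windowed_leaf_scan(queries, lambda q: 0 if q[0] < 5 else 1,
                                    leaf_points, points, window=4)
print(nearest, loads)  # [1, 2, 5, 7, 0, 3] 12
```

The load counter stands in for global memory traffic: with `window=1` the same six queries read 24 reference points instead of 12.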

Introduction

Point descriptors have become popular for establishing image-to-image correspondences in 3D reconstruction and object recognition. Searching for the image point descriptors that are similar to a query descriptor is one of the core techniques in object recognition and surface registration. To increase their descriptiveness, image descriptors typically require high dimensionality [9], [16], [23], [25], [27], [35], [36], [37]. However, feature matching in high dimensions imposes an extremely high computational workload.

There has been a large body of research on image descriptor matching, exploring efficient indexing and search algorithms for nearest neighbor search (NNS), which finds the closest reference point descriptors for a given set of query point descriptors. A brute-force NNS compares each query point with all N points in the reference set, which results in a time complexity of O(N^2) when the number of query points is on the order of N [1]. However, NNS can be made more efficient by using spatial data structures such as the R-tree, B-tree, quad-tree, binary space partitioning (BSP) tree, K-means tree and K-dimensional tree (KD-tree). These data structures recursively subdivide the space containing all the points into smaller regions, imposing a hierarchy on those regions. NNS on such a hierarchical spatial data structure is generally more efficient because it can prune a large portion of the target dataset. The KD-tree has been shown to be one of the most efficient structures for high-dimensional image feature matching when the descriptors are correlated [26], [27]. In 2D/3D point cloud object recognition and perception, both the indexing of image descriptors and NNS must be fast [33], [41]. In these applications, the captured scene changes dynamically, so indexing the descriptors of each new scene through KD-tree construction becomes time-critical. Moreover, unlike typical applications with a single query point [19], NNS in these point cloud applications involves large batches of query points that are matched against the points of the model object.
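
For reference, the brute-force baseline described above can be sketched in a few lines of Python (the function name `brute_force_nns` is illustrative). Each of the M queries is compared with all N reference points, which is where the quadratic cost comes from.

```python
def brute_force_nns(refs, queries):
    """Exact 1-NN for a batch of queries by exhaustive comparison.

    Performs M * N distance evaluations of O(K) each for M queries, N reference
    points and K dimensions, i.e. O(N^2) evaluations when M is on the order of N.
    """
    result = []
    for q in queries:
        best_i, best_d = -1, float("inf")
        for i, p in enumerate(refs):
            d = sum((a - b) ** 2 for a, b in zip(p, q))  # squared Euclidean distance
            if d < best_d:
                best_i, best_d = i, d
        result.append(best_i)
    return result


refs = [[0, 0], [5, 5], [10, 0]]
queries = [[1, 1], [9, 1], [4, 6]]
print(brute_force_nns(refs, queries))  # [0, 2, 1]
```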

Current trends favor the flexibility of the heterogeneous computing model, which combines multicore CPUs with many-core graphics processing units (GPUs). As a typical massively parallel architecture (MPA) that complements the CPU, the GPU is finding its way beyond graphics processing into general-purpose computing, as exemplified by the Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) standards [39], [40]. GPUs have been widely employed for fast and real-time implementations of 3D image processing algorithms [14], [15], [16], [20], [31], [34]. The inherent massive parallelism in KD-tree construction and NNS algorithms can be exploited on any computing platform that supports fine-grain parallelism.

To mitigate the computational workload associated with high-dimensional digital image descriptor matching, in this paper we propose two highly parallel GPU algorithms that accelerate KD-tree construction and P approximate NNS (P-ANNS), respectively. The paper is organized as follows. Section 2 presents the basic concepts of KD-tree construction, NNS/ANNS, and the GPU programming model. Section 3 briefly outlines related work and highlights our innovations. Section 4 presents the design and implementation details of our massive parallelization on the GPU. Section 5 describes performance optimization considerations. Section 6 presents experimental results. Section 7 concludes the paper.

Section snippets

KD-tree

The KD-tree is a hierarchical spatial partitioning data structure that organizes elements (points) in K-dimensional space R^K for NNS, with average and best time complexities of O(N log N) and O(N), respectively, and a space complexity of O(N) [4]. The KD-tree partitions the points in the dataset into axis-aligned cells in a hierarchical fashion, with each cell represented by a node in the tree. Starting from the root, the points are split into two halves by a cutting hyperplane orthogonal
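
A minimal sequential Python sketch of a median-split KD-tree follows, assuming Euclidean distance, leaf buckets that hold all data points, and a split dimension chosen by largest spread; these choices and the names `build_kdtree` and `nn_search` are illustrative, not taken from the paper. The search descends to the leaf containing the query and backtracks into a sibling cell only when the distance from the query to the cutting hyperplane is smaller than the best distance found so far.

```python
import math
import random


def build_kdtree(points, idx=None, leaf_size=8):
    """Median-split KD-tree with all points stored in leaf buckets.

    Sorting at every node costs O(N log^2 N) overall; selecting the median in
    linear time instead would give O(N log N).
    """
    if idx is None:
        idx = list(range(len(points)))
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    # Cut orthogonal to the dimension with the largest spread (an assumption;
    # cycling through the dimensions is another common choice).
    dim = max(range(len(points[0])),
              key=lambda d: max(points[i][d] for i in idx) - min(points[i][d] for i in idx))
    idx = sorted(idx, key=lambda i: points[i][dim])
    mid = len(idx) // 2
    return {"dim": dim, "split": points[idx[mid]][dim],
            "left": build_kdtree(points, idx[:mid], leaf_size),
            "right": build_kdtree(points, idx[mid:], leaf_size)}


def nn_search(tree, points, q):
    """Exact 1-NN by depth-first descent plus backtracking."""
    best = [math.inf, -1]  # [squared distance, point index]

    def visit(node):
        if "leaf" in node:
            for i in node["leaf"]:
                d = sum((a - b) ** 2 for a, b in zip(points[i], q))
                if d < best[0]:
                    best[0], best[1] = d, i
            return
        diff = q[node["dim"]] - node["split"]
        near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
        visit(near)
        if diff * diff < best[0]:  # far cell may still hold a closer point
            visit(far)

    visit(tree)
    return best[1], best[0]


random.seed(1)
pts = [[random.random() for _ in range(16)] for _ in range(300)]
tree = build_kdtree(pts)
q = [random.random() for _ in range(16)]
print(nn_search(tree, pts, q))
```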

Related work and proposed innovations

As discussed, the KD-tree is one of the most efficient structures for high-dimensional image descriptor matching. Therefore, in this paper we primarily address the massive parallelization of KD-tree construction and ANNS for high-dimensional image descriptor matching.

Highly parallel implementation

This section describes a scalable, highly parallel technique to construct a KD-tree from the N points in set S and to perform P-ANNS for all M query points in the query set Q. We exploit the hierarchical structure of the GPU's streaming multiprocessors to achieve high speedups. To develop the KD-tree construction with minimal programming effort, we use general parallel algorithms and data structures from the Thrust library, which provides high-level abstract interfaces.
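
The section does not list the exact Thrust calls used, so the Python sketch below is only an assumption about how KD-tree construction can be phrased with bulk data-parallel primitives: each tree level is built from a per-node count, one global sort keyed by (node id, coordinate), a rank within each node, and a per-point child assignment. In Thrust these steps would plausibly map to `thrust::reduce_by_key`, `thrust::sort_by_key` (or `thrust::stable_sort_by_key`), `thrust::exclusive_scan_by_key` and `thrust::transform`; `build_levelwise` and its heap-style node numbering are hypothetical.

```python
import random
from collections import Counter


def build_levelwise(points, leaf_size=2):
    """Level-synchronous KD-tree construction from bulk primitives (hypothetical).

    Nodes use heap numbering (children of v are 2v+1 and 2v+2). Each iteration
    builds one tree level; requires leaf_size >= 1.
    Returns (leaf node id of each point, {internal node id: (dim, split)}).
    """
    n, k = len(points), len(points[0])
    node_of = [0] * n  # every point starts in the root
    splits = {}
    level = 0
    while True:
        sizes = Counter(node_of)  # points per node (like a reduce-by-key)
        active = {v for v, s in sizes.items() if s > leaf_size}
        if not active:
            return node_of, splits
        dim = level % k  # all active nodes sit at depth == level
        # One global sort keyed by (node id, coordinate along dim).
        order = sorted(range(n), key=lambda i: (node_of[i], points[i][dim]))
        rank, seen = [0] * n, Counter()
        for i in order:  # position inside its node's segment (a segmented scan)
            rank[i] = seen[node_of[i]]
            seen[node_of[i]] += 1
        for i in order:  # per-point child assignment (a transform)
            v = node_of[i]
            if v in active:
                half = sizes[v] // 2
                if rank[i] == half:
                    splits[v] = (dim, points[i][dim])  # median becomes the cut
                node_of[i] = 2 * v + 1 if rank[i] < half else 2 * v + 2
        level += 1


random.seed(2)
pts = [[random.random(), random.random()] for _ in range(50)]
leaf_of, splits = build_levelwise(pts, leaf_size=2)
print(len(splits), max(Counter(leaf_of).values()))
```

Because every step processes all points at once, the number of bulk passes grows with tree depth (about log2(N / leaf_size)) rather than with N, which is what makes this formulation attractive on a GPU.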

Performance optimization on GPU

The performance of KD-tree construction and NNS algorithms on the GPU is influenced by several factors, including global memory access coalescing, shared memory bank conflicts, branch divergence, local and global synchronization overhead, and the organization of thread blocks [28].
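
As an illustration of the bank-conflict factor, the Python sketch below counts how many distinct words a warp requests from the same shared-memory bank, assuming 32 banks of 4-byte words (the configuration of recent NVIDIA GPUs) and a hypothetical layout in which each of 32 threads reads the same dimension of its own 128-float descriptor. Padding each row by one word removes the conflict; the function name and sizes are illustrative.

```python
def bank_conflict_degree(word_indices, num_banks=32):
    """Worst-case serialization of one warp-wide shared-memory access.

    word_indices[t] is the 4-byte word index read by thread t. Threads reading
    the same word are served by a broadcast, so only distinct words that map
    to the same bank conflict.
    """
    per_bank = {}
    for w in set(word_indices):
        per_bank.setdefault(w % num_banks, set()).add(w)
    return max(len(ws) for ws in per_bank.values())


K = 128  # hypothetical descriptor length, one float per dimension
d = 0    # dimension that all 32 threads of a warp read at the same time
unpadded = [t * K + d for t in range(32)]      # row stride K = 128 words
padded = [t * (K + 1) + d for t in range(32)]  # row stride K + 1 = 129 words
print(bank_conflict_degree(unpadded), bank_conflict_degree(padded))  # 32 1
```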

First, as mentioned before, the KD-tree is constructed using a linear structure-of-arrays (SOA) layout. The GPU SPMT architecture can process such vectors more efficiently than nonlinear data structures such as trees. Second, to ensure that global
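
Why a linear, dimension-major layout helps global-memory coalescing can be sketched by counting the aligned memory segments one warp touches, assuming 32 threads that handle consecutive points and read the same dimension, 4-byte floats, and 128-byte segments. The layout names, sizes and helper functions are hypothetical and not taken from the paper.

```python
def warp_addresses(layout, n_points, k, d, warp=32, elem_bytes=4):
    """Byte addresses read when thread t handles point t and loads dimension d."""
    if layout == "aos":  # point-major: element (p, d) at p * k + d
        return [(p * k + d) * elem_bytes for p in range(warp)]
    if layout == "soa":  # dimension-major: element (p, d) at d * n_points + p
        return [(d * n_points + p) * elem_bytes for p in range(warp)]
    raise ValueError("layout must be 'aos' or 'soa'")


def segments_touched(addresses, segment_bytes=128):
    """Number of distinct aligned memory segments a warp request spans."""
    return len({a // segment_bytes for a in addresses})


n, k = 4096, 128  # hypothetical dataset: 4096 descriptors of 128 floats
print(segments_touched(warp_addresses("aos", n, k, d=5)),
      segments_touched(warp_addresses("soa", n, k, d=5)))  # 32 1
```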

Experiments and results

In this section, we provide experimental performance validation of our GPU-accelerated parallel KD-tree construction and P-ANNS algorithms. We adopted real-world image descriptor datasets with a wide range of sizes and dimensions from the Winder and Brown datasets [5], [38], as well as datasets from high-dimensional
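
Matching quality of an approximate search is commonly summarized as recall: the fraction of queries whose approximate nearest neighbor coincides with the exact one. The snippet below, with the hypothetical name `recall_at_1`, shows this one common measure; it is not necessarily the quality metric used in the experiments.

```python
def recall_at_1(approx_ids, exact_ids):
    """Fraction of queries whose approximate nearest neighbor is the exact one."""
    if len(approx_ids) != len(exact_ids) or not exact_ids:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(a == e for a, e in zip(approx_ids, exact_ids))
    return hits / len(exact_ids)


print(recall_at_1([3, 7, 7, 1], [3, 7, 2, 1]))  # 0.75
```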

Conclusion

This paper presented the design of high-performance parallel KD-tree construction and P-ANNS on the GPU for high-dimensional image descriptor matching. The proposed algorithms are of comparable quality to their traditional sequential counterparts on the CPU, while delivering high speedups across a wide range of dimensions. The massively parallel algorithms presented in this paper were tested on real-world image descriptors with varying dimensionality, as well as classical point cloud descriptors in real

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.jpdc.2019.06.003.


References (41)

  • L. Hu et al., G-SHOT: GPU accelerated 3D local descriptor for surface matching, J. Vis. Commun. Image Represent. (2015).
  • N.S. Altman, An introduction to kernel and nearest neighbors nonparametric regression, Amer. Statist. (1992).
  • S. Arya, D.M. Mount, Algorithms for fast vector quantization, in: Data Compression Conference, DCC, Snowbird, Utah,...
  • J.S. Beis, D.G. Lowe, Shape indexing using approximate nearest-neighbour search in high-dimensional spaces, in: IEEE...
  • J.L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM (1975).
  • M. Brown et al., Discriminative learning of local image descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2011).
  • B. Bustos, O. Deussen, S. Hiller, D. Keim, A Graphics Hardware Accelerated Algorithm for Nearest Neighbor Search, in:...
  • L. Cayton, Accelerating nearest neighbor search on manycore systems, in: IEEE International Parallel and Distributed...
  • T.M. Chan, Approximate nearest neighbor queries revisited, in: Proceedings of the Thirteenth Annual Symposium on...
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: IEEE...
  • F. Gieseke, J. Heinermann, C. Oancea, C. Igel, Buffer k-d trees: Processing massive nearest neighbor queries on GPUs,...
  • T. Foley, J. Sugerman, KD-tree acceleration structures for a GPU raytracer, in: Proceedings of the ACM...
  • V. Garcia, E. Debreuve, M. Barlaud, Fast K nearest neighbor search using GPU, in: Computer Vision and Pattern...
  • Handbook of Discrete and Computational Geometry (2004).
  • L. Hu, S. Nooshabadi, A 3D local descriptor SHOT on massively parallel processors, in: IEEE International Conference on...
  • L. Hu, S. Nooshabadi, Massively parallel KD-tree construction and nearest neighbor search algorithms, in: IEEE...
  • L. Hu, S. Nooshabadi, Parallel randomized KD-tree forest on GPU cluster for image descriptor matching, in: IEEE...
  • C. Kim, J. Chhugani, N. Satish, E. Sedlar, A.D. Nguyen, T. Kaldewey, V.W. Lee, S.A. Brandt, P. Dubey, FAST: fast...
  • J. Kim et al., Exploiting massive parallelism for indexing multi-dimensional datasets on the GPU, IEEE Trans. Parallel Distrib. Syst. (2015).
  • Y. Kitaaki, H. Okuda, H. Kage, K. Sumi, High speed 3D registration using GPU, in: International Conference on...

Linjia Hu received the B.E. in Electronic Engineering from Central South University (CSU), Changsha, China, in 2006, and the M.E. in Software Engineering from the University of Science and Technology of China (USTC), Hefei, China, in 2009. He was with ZTE Corporation, Shenzhen R&D Center, China, from 2007 to 2009. Currently, he is a Ph.D. candidate in the Department of Computer Science, Michigan Technological University. His research interests include parallel and heterogeneous computing, distributed systems, embedded system design, digital image processing, multimedia data mining, channel coding and wireless communication.

Saeid Nooshabadi received the M.Tech. and Ph.D. degrees in electrical engineering from the Indian Institute of Technology, Delhi, India, in 1986 and 1992, respectively. Currently, he is Professor of Computer Systems Engineering, with a joint appointment in the Departments of Electrical & Computer Engineering and Computer Science, Michigan Technological University, Michigan. Prior to his current appointment, he held multiple academic and research positions. His last two appointments were with the Gwangju Institute of Science and Technology, Republic of Korea (2007 to 2010), and with the University of New South Wales, Sydney, Australia (2000 to 2007). His research interests include VLSI information processing and low-power embedded processors for wireless network and biomedical applications.
