BIM-Tracker: A model-based visual tracking approach for indoor localisation using a 3D building model

https://doi.org/10.1016/j.isprsjprs.2019.02.014

Abstract

This article presents an accurate and robust visual indoor localisation approach that is not only infrastructure-free but also avoids error accumulation, by taking advantage of (1) the widespread ubiquity of mobile devices with cameras and (2) the availability of 3D building models for most modern buildings. Localisation is performed by matching image sequences captured by a camera with a 3D model of the building in a model-based visual tracking framework. A comprehensive evaluation of the approach with a photo-realistic synthetic dataset shows the robustness of the localisation approach under challenging conditions. Additionally, the approach is tested and evaluated on real data captured by a smartphone. The results of the experiments indicate that a localisation accuracy better than 10 cm can be achieved with this approach. Since localisation errors do not accumulate, the proposed approach is suitable for long-duration indoor localisation tasks and augmented reality applications, without requiring any local infrastructure. A MATLAB implementation can be found at https://github.com/debaditya-unimelb/BIM-Tracker.

Introduction

Indoor location information is a key enabler of a range of applications including navigation guidance, location-based services, emergency response, guiding vulnerable people and augmented reality. Indoor environments present a challenge for localisation due to the strong attenuation of Global Navigation Satellite System (GNSS) signals (Mautz, 2012) compared to outdoor environments. Many different approaches for indoor localisation have been proposed in the literature. However, the performance of indoor localisation still lags behind that of outdoor localisation (Lymberopoulos and Liu, 2017), and many applications are still waiting for an acceptable solution.

Present indoor localisation approaches are either infrastructure-dependent, in which case installation and maintenance are costly and not always feasible, or infrastructure-free, in which case they are not accurate enough for mass-market applications (Alarifi et al., 2016, Mautz, 2012). By infrastructure, we mean a dedicated network of sensors, transmitters or beacons installed in the indoor environment. Consequently, infrastructure-free indoor localisation has become a focus of research and development during the past decade, and improvements in indoor localisation systems are likely to open up better business prospects. Among the various infrastructure-free approaches, those based on digital television signals, FM radio signals, magnetic fields, ambient sound levels and barometers provide metre-level accuracy, which is not sufficient for many indoor location-based applications (Xie et al., 2014, Muralidharan et al., 2014, Ye et al., 2014, Tarzia et al., 2011, Serant et al., 2011).

Other infrastructure-free methods, such as pedestrian dead reckoning (PDR), visual odometry, and simultaneous localisation and mapping (SLAM), suffer from the accumulation of localisation errors, resulting in drift of the estimated trajectory (Khoshelham and Ramezani, 2017, Scaramuzza and Fraundorfer, 2011, Caron et al., 2014). The reported accuracy of PDR using inertial measurement units (IMUs) combined with recalibration from other sources such as Wi-Fi is approximately 1 m (Lymberopoulos and Liu, 2017) or 1% of the trajectory length (Mautz, 2012). A drift of 1% is acceptable for short distances but not for long-distance indoor localisation applications; for example, after a 500 m trajectory it corresponds to a 5 m position error. Moreover, visual odometry and SLAM methods are susceptible to failure in poorly textured indoor environments, such as corridors, due to the lack of image features. Additionally, SLAM relies on loop closure, which is not practical for navigation applications, e.g. in a long tunnel, as the user cannot be forced to make loops.

Model-based visual tracking methods overcome the above challenges by using a 3D model to correct the drift (Lepetit and Fua, 2005). Furthermore, methods based on model-based visual tracking (such as the work of Drummond and Cipolla, 2002) eliminate the requirement for textured indoor environments and are computationally inexpensive (Lepetit and Fua, 2005). However, these model-based tracking approaches have been designed for tracking small objects in small spaces, such as a single room, and are therefore unsuitable for continuous localisation in large indoor environments. Moreover, existing works that use model-based visual tracking lack a comprehensive evaluation of the achievable accuracy, robustness and quality of the estimated trajectory.

In this paper we present BIM-Tracker: a model-based tracking approach to indoor localisation that matches images captured by a mobile device with the corresponding view of a building information model (BIM). The edges in the images are matched with the edges derived from the BIM to estimate the location of the camera in the BIM coordinate system through model-based visual tracking. The advantage of performing localisation in a BIM coordinate system is that the estimated locations are not prone to drift, in contrast to incremental tracking methods that perform localisation by local motion estimation (Khoshelham and Ramezani, 2017). Because most smartphones and smart glasses are equipped with a camera, and a low level-of-detail 3D model of the environment is usually available or can be easily generated, the present research proposes a model-based visual tracking approach for accurate and drift-free localisation in indoor environments without any local infrastructure. The following are the main contributions of the article:

  • 1.

    We formulate an MSAC (M-estimator sample consensus) framework that uses two hypotheses, one on either side of a back-projected model line, to search for the corresponding image edges (see the sketch after this list). This strategy balances the robustness gained from multiple hypotheses against the higher computational cost of tracking many of them.

  • 2.

    We provide experimental insight into the optimal camera configurations and the factors that contribute to errors for a model-based visual tracking approach in an indoor environment. A detailed analysis of the estimated trajectory is performed using a photo-realistic synthetic dataset under several configurations, including different image resolutions, camera fields of view (FOV), motion blur, clutter and occlusions.

  • 3.

    We demonstrate the ability of BIM-Tracker to perform drift-free localisation using real images, which makes it suitable for navigation and augmented reality applications.
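
To make contribution 1 concrete, the following is a minimal MATLAB sketch of a two-hypothesis edge search around a back-projected model line (MATLAB is chosen only to match the language of the released implementation; this code is not taken from it). For each control point sampled on the line, the strongest edge response is searched on either side along the line normal, and the line placement is scored with an MSAC-style truncated cost. The synthetic frame, the line endpoints and all thresholds are illustrative assumptions.

% Minimal sketch (not the authors' code): two edge hypotheses per control
% point, one on each side of a back-projected model line, scored with an
% MSAC-style truncated cost.  Frame, line and thresholds are synthetic.
I = zeros(240, 480);                     % synthetic frame: step edge at row 90
I(90:end, :) = 1;
[Gx, Gy] = gradient(I);                  % simple gradient as the edge response
Gmag = hypot(Gx, Gy);

p1 = [120; 80];  p2 = [400; 95];         % back-projected model line (pixels)
n  = [-(p2(2) - p1(2)); p2(1) - p1(1)];
n  = n / norm(n);                        % unit normal of the projected line

numCtrl    = 10;    % control points sampled along the model line
searchLen  = 15;    % 1-D search range along the normal (pixels)
gradThresh = 0.2;   % minimum edge response to accept a hypothesis
T          = 4;     % MSAC truncation threshold (pixels)

cost = 0;
for k = 1:numCtrl
    c = p1 + (k - 0.5)/numCtrl * (p2 - p1);      % control point on the line
    r = inf;                                     % residual for this point
    for s = [1, -1]                              % positive and negative side
        best = 0;  hyp = [];
        for d = 1:searchLen
            q = round(c + s*d*n);                % pixel along the normal
            if Gmag(q(2), q(1)) > max(best, gradThresh)
                best = Gmag(q(2), q(1));  hyp = q;
            end
        end
        if ~isempty(hyp)                         % keep the nearer valid hypothesis
            r = min(r, abs(dot(hyp - c, n)));    % perpendicular distance
        end
    end
    cost = cost + min(r^2, T^2);                 % truncated (MSAC) contribution
end
fprintf('MSAC cost of this line placement: %.2f\n', cost);

In the full tracker, a cost of this kind would be evaluated for candidate camera poses and the pose with the lowest truncated cost retained.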

The paper proceeds with a review of visual localisation methods and related work on model-based tracking in Section 2. The theory and methodology for model-based visual tracking using edges are explained in Section 3. The experimental design and the evaluation results are discussed in Section 4, followed by a discussion of limitations in Section 5 and conclusions in Section 6.

Section snippets

Background and related work

Visual methods can be classified as visual odometry, SLAM, model-based tracking, and the integration of IMUs with these methods. Visual odometry (Nister et al., 2004, Scaramuzza and Fraundorfer, 2011) is a local motion estimation approach, where the motion of the camera is used to perform incremental tracking. Consequently, errors accumulate and the estimated locations drift from the true locations. Visual landmarks (Zhu et al., 2007) have been used to reduce drift by recalibrating the
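
As a simple illustration of this accumulation (an illustrative sketch with arbitrary noise levels, not a model of any particular system), the following MATLAB snippet composes noisy relative motion estimates and shows how the position error grows with distance travelled; a model-based tracker that estimates each pose directly in the building coordinate system does not exhibit this growth.

% Illustrative sketch: composing noisy relative motion estimates makes the
% position error grow with trajectory length.  Noise levels are arbitrary.
rng(1);
numSteps = 500;                  % e.g. 500 steps of 1 m each
stepTrue = [1; 0];               % true per-step motion in the world frame
sigmaT   = 0.01;                 % 1 cm translation noise per step
sigmaR   = 0.2*pi/180;           % 0.2 degree heading noise per step

poseTrue = [0; 0];  poseOdom = [0; 0];  heading = 0;
drift = zeros(1, numSteps);
for k = 1:numSteps
    poseTrue = poseTrue + stepTrue;
    heading  = heading + sigmaR*randn;                 % heading error accumulates
    R = [cos(heading) -sin(heading); sin(heading) cos(heading)];
    poseOdom = poseOdom + R*(stepTrue + sigmaT*randn(2,1));
    drift(k) = norm(poseOdom - poseTrue);              % error w.r.t. ground truth
end
fprintf('Position error after %d m travelled: %.2f m\n', numSteps, drift(end));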

Methodology

The design of an infrastructure-free localisation approach that can be universally adopted requires knowledge of the parameters that enable robust performance, considering the limitations of running such an approach on smartphones and wearable devices, and the challenges presented by a dynamic indoor environment. The main hypothesis of the research is that centimetre-level localisation accuracy can be achieved without any drift by integrating image information with a 3D building model. To
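
Since the methodology text above is truncated, the MATLAB sketch below only illustrates the general idea of model-based pose estimation rather than the paper's exact formulation: 3D control points sampled on the building model are projected with the current pose guess and aligned to their matched 2-D image edge points by a Gauss-Newton refinement over the 6-DoF pose. The intrinsics, the points, the use of point-to-point residuals (instead of point-to-line distances) and the numerical Jacobian are simplifying assumptions.

% Minimal sketch (simplifying assumptions, not the paper's formulation):
% refine a 6-DoF camera pose so that projected 3-D control points from the
% building model align with their matched 2-D image edge points.
K = [800 0 320; 0 800 240; 0 0 1];                           % assumed intrinsics

% 3-D control points (BIM frame, metres) and a synthetic "true" pose
X     = [0 0 3; 1 0 3.2; 2 0 2.8; 0 1 3.1; 1 1 2.9; 2 1 3]'; % 3 x N
xTrue = [0.02; -0.01; 0.01; 0.05; -0.03; 0.02];              % [rotvec; t]
uvObs = project(X, xTrue, K);                 % stands in for matched edge points

x = zeros(6, 1);                              % initial pose guess (previous frame)
for it = 1:10                                 % Gauss-Newton iterations
    r = reshape(project(X, x, K) - uvObs, [], 1);     % reprojection residuals
    J = zeros(numel(r), 6);
    for j = 1:6                                        % numerical Jacobian
        dx = zeros(6, 1);  dx(j) = 1e-6;
        J(:, j) = (reshape(project(X, x + dx, K) - uvObs, [], 1) - r) / 1e-6;
    end
    x = x - (J'*J) \ (J'*r);                           % pose update
end
fprintf('Pose parameter error after refinement: %.2e\n', norm(x - xTrue));

function uv = project(X, x, K)
% Pinhole projection of 3-D points X under pose x = [rotation vector; translation].
theta = norm(x(1:3));
if theta < eps
    R = eye(3);
else
    a = x(1:3)/theta;
    A = [0 -a(3) a(2); a(3) 0 -a(1); -a(2) a(1) 0];
    R = eye(3) + sin(theta)*A + (1 - cos(theta))*(A*A);       % Rodrigues formula
end
Xc = R*X + x(4:6);                                            % camera-frame points
uv = K(1:2, 1:2)*(Xc(1:2, :)./Xc(3, :)) + K(1:2, 3);          % perspective projection
end

In an edge-based tracker the residuals would instead be the distances between matched image edges and the projected model lines, weighted robustly, but the structure of the update is the same.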

Experiments and results

During the evaluation of localisation accuracy, the correctness of the ground truth plays a vital role. Ground truth for the evaluation of trajectories in indoor spaces is usually collected by camera-based (Fod et al., 2002), motion capture (Huang et al., 2017), laser tracking (Teulière et al., 2015), surveying (Bürki et al., 2010) or target tracking (Boochs et al., 2010) methods. All of these methods require specific hardware or platforms for the collection of the ground truth and usually have
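
For completeness, the kind of per-epoch accuracy evaluation referred to here can be sketched in a few lines of MATLAB; the two trajectories below are synthetic placeholders standing in for the estimated trajectory and a time-synchronised ground truth in the BIM frame.

% Illustrative sketch: per-epoch position error of an estimated trajectory
% against a time-synchronised ground truth (both synthetic placeholders).
rng(2);
t   = (0:0.1:30)';                              % timestamps (s)
gt  = [2*t, sin(0.3*t), zeros(size(t))];        % ground-truth positions (m)
est = gt + 0.03*randn(size(gt));                % estimate with ~3 cm noise

err = sqrt(sum((est - gt).^2, 2));              % Euclidean error per epoch (m)
fprintf('Mean %.3f m, RMSE %.3f m, max %.3f m\n', ...
        mean(err), sqrt(mean(err.^2)), max(err));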

Discussions

Although BIM-Tracker is quite robust against motion blur, occlusions and illumination variations, it fails under notoriously challenging conditions. Firstly, in the presence of heavy motion blur, the edges extracted by the edge detector may be missed, noisy or displaced by a few pixels. As a result, wrong 3D-2D correspondences are generated, leading to a poor pose estimate. Fig. 17(a) shows one such pose, where image blur caused tracking to fail. Although the
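
One pragmatic way to handle such failures (an assumption on our part; the snippet above does not state how the authors detect them) is a simple tracking-health check: when heavy blur suppresses the edge response, few control points find a valid hypothesis and the MSAC inlier ratio collapses, so the pose update can be rejected and the previous pose kept or re-initialisation triggered.

% Hypothetical tracking-health check (not necessarily the paper's mechanism):
% reject the pose update when too few control points find an edge hypothesis
% or when the MSAC inlier ratio collapses, as happens under heavy motion blur.
numCtrl    = 120;      % control points on visible model lines
numMatched = 31;       % points with an edge response above the threshold
numInliers = 18;       % correspondences within the MSAC threshold

matchRatio  = numMatched/numCtrl;
inlierRatio = numInliers/max(numMatched, 1);
if matchRatio < 0.4 || inlierRatio < 0.6
    fprintf('Tracking unreliable (matched %.0f%%, inliers %.0f%%): reject pose update\n', ...
            100*matchRatio, 100*inlierRatio);
end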

Conclusion

An approach was developed for infrastructure-free indoor localisation based on model-based visual tracking. Evaluation of the approach suggests that a localisation accuracy of 10 cm can be achieved using a low level-of-detail 3D model derived from a BIM. Moreover, there is no accumulation of error, which makes this approach suitable for indoor localisation tasks over long periods. Experiments with photo-realistic synthetic data suggest that a higher resolution of the image and a

Acknowledgements

This research was supported by a Research Engagement Grant from the Melbourne School of Engineering and a Melbourne Research Scholarship. The authors would like to sincerely thank the reviewers for their invaluable and constructive suggestions that helped us to improve the quality of the research.

References (59)

  • B. Bürki et al. Daedalus: a versatile usable digital clip-on measuring system for total stations.
  • C. Cadena et al. Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Trans. Robot. (2016).
  • J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. (1986).
  • C. Choi et al. Real-time 3D model-based tracking using edge and keypoint features for robotic manipulation.
  • A.I. Comport et al. A real-time tracker for markerless augmented reality.
  • P. David et al. SoftPOSIT: simultaneous pose and correspondence determination.
  • P. David et al. Simultaneous pose and correspondence determination using line features.
  • A.J. Davison. Real-time simultaneous localisation and mapping with a single camera.
  • T. Drummond et al. Real-time visual tracking of complex structures. IEEE Trans. Pattern Anal. Mach. Intell. (2002).
  • M.A. Fischler et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM (1981).
  • A. Fod et al. A laser-based people tracker.
  • X.-S. Gao et al. Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. (2003).
  • A.P. Gee et al. Real-time model-based SLAM using line segments.
  • R. Gomez-Ojeda et al. Robust stereo visual odometry through a probabilistic combination of points and line segments.
  • R. Gomez-Ojeda et al. Geometric-based line segment tracking for HDR stereo sequences.
  • R. Gomez-Ojeda et al. PL-SLAM: a stereo SLAM system through the... (2017).
  • P.D. Groves. Principles of GNSS, inertial, and multisensor integrated navigation systems. Artech... (2013).
  • A. Handa et al. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM.
  • M. Hofer et al. Line-based 3D reconstruction of wiry objects.