Machine learning-based augmented reality for improved surgical scene understanding
Introduction
In orthopedic and trauma surgery, the introduction of AR technology such as the camera augmented mobile C-arm (CamC) promises to support surgeons in their understanding of the spatial relationships between anatomy, implants and surgical tools [1], [2]. By using an additional color camera mounted so that its optical center coincides with the X-ray source, the CamC system provides an augmented view created by superimposing X-ray and video images with alpha blending. In other words, the resulting image is a linear combination of the optical and the X-ray image using the same mixing coefficient (alpha) over the whole image domain. While this is a simple and intuitive solution, the superimposed X-ray information can hinder the surgeon's understanding of the scene when the field of view becomes highly cluttered (e.g. by surgical tools): it becomes increasingly difficult to quickly recognize and differentiate structures in the overlaid image. Moreover, the surgeon's depth perception is altered, as the X-ray anatomy appears on top of the scene in the optical image.
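The classical global-alpha fusion described above can be sketched in a few lines. This is an illustrative reimplementation, not the CamC code; image contents and the alpha value are synthetic:

```python
import numpy as np

def alpha_blend(optical, xray, alpha=0.5):
    """Classical CamC-style fusion: one mixing coefficient alpha
    is applied uniformly over the whole image domain."""
    fused = alpha * xray.astype(np.float32) \
        + (1.0 - alpha) * optical.astype(np.float32)
    return fused.astype(np.uint8)

# Synthetic 2x2 grayscale images: uniform optical (200) and X-ray (100).
optical = np.full((2, 2), 200, dtype=np.uint8)
xray = np.full((2, 2), 100, dtype=np.uint8)
print(alpha_blend(optical, xray, alpha=0.5))  # every pixel -> 150
```

Because alpha is constant, every X-ray pixel is blended in with the same weight regardless of whether it shows anatomy, background, or is occluded by a tool, which is exactly the limitation the paper addresses.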
In both the X-ray and the optical image, not all pixels in the image domain are equally relevant for a good perception and understanding of the scene. In the X-ray image, pixels that belong to the patient's bone and soft tissue are highly relevant for surgery, whereas background pixels provide no information. In the optical image, it is crucial to recognize the different objects interacting in the surgical scene, e.g. the background, surgical tools or the surgeon's hands. First, this improves perception by preserving the natural occlusion cues when the surgeon's hands or instruments occlude the augmented scene in the classical CamC view. Second, as a by-product, valuable semantic information can be extracted for characterizing the activity performed by the surgeon or for tracking the positions of the objects present in the scene.
In this paper, we introduce a novel learning-based AR fusion approach that aims at improving surgical scene understanding and depth perception. To this end, we propose to combine a mobile C-arm with a Kinect sensor, adding not only X-ray but also depth information to the augmented scene. Exploiting the fact that structured light still functions through a mirror, the Kinect sensor is integrated with a mirror system on a mobile C-arm, so that both the color and depth cameras as well as the X-ray source share the same viewpoint. In this context of learning-based image fusion, a few attempts have been made in [3], [4] based on color and X-ray information only. In these early works, a Naïve Bayes classifier based on color and radiodensity is applied to recognize the different objects in the color and X-ray images of the CamC system. Depending on the pair of objects it belongs to, each pixel is associated with a mixing value to create a relevance-based fused image. While this approach provided promising first results, recognizing each object from its color distribution alone is very challenging and not robust to changes in illumination. In the present work, we take advantage of additional depth information to provide an improved AR visualization: (i) we define a learning-based strategy based on color and depth information for identifying objects of interest in Kinect data, (ii) we use a state-of-the-art random forest for identifying foreground objects in X-ray images, and (iii) we use an object-specific mixing look-up table to create a pixel-wise alpha map. In 12 simulated surgeries, we show that our fusion approach provides surgeons with a better surgical scene understanding as well as improved depth perception.
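Step (iii), the object-specific mixing look-up table, can be sketched as follows. The label sets, the LUT values, and the function names are illustrative assumptions, not the paper's actual classes or weights; the point is only the mechanism, i.e. that each (optical label, X-ray label) pair indexes its own alpha:

```python
import numpy as np

# Hypothetical object labels for the optical (Kinect) and X-ray images.
BACKGROUND, HAND, TOOL = 0, 1, 2   # optical-image classes
XRAY_BG, XRAY_FG = 0, 1            # X-ray classes (foreground = anatomy)

# Object-pair mixing look-up table: ALPHA_LUT[optical_label, xray_label].
# Illustrative values: hands/tools stay nearly opaque (alpha ~ 0.1) to
# preserve occlusion cues; anatomy over background is shown strongly.
ALPHA_LUT = np.array([
    [0.0, 0.8],   # background vs (X-ray background, X-ray anatomy)
    [0.0, 0.1],   # hand
    [0.0, 0.1],   # tool
], dtype=np.float32)

def fuse(optical, xray, optical_labels, xray_labels):
    """Relevance-based fusion: per-pixel alpha from the object-pair LUT."""
    alpha = ALPHA_LUT[optical_labels, xray_labels]  # H x W alpha map
    return ((1.0 - alpha) * optical + alpha * xray).astype(np.uint8)
```

With per-pixel alpha, an anatomy pixel behind the background is blended strongly, while the same anatomy pixel behind a hand or tool is barely blended, so the instrument correctly occludes the X-ray overlay.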
Section snippets
System setup: Kinect augmented mobile C-arm
In this work, we propose to extend a common intraoperative mobile C-arm by mounting a Kinect sensor, which consists of a depth sensor coupled with a video camera. The optical center of this RGB-D sensor's video camera is mounted so that it coincides with the X-ray projection center. The depth sensor is based on so-called structured light, where infrared light patterns are projected into the scene. Using an infrared camera, the depth is inferred from the deformations of those patterns induced by the
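The depth recovery principle behind such a structured-light sensor reduces to triangulation between the pattern projector and the infrared camera. The sketch below is generic, not Kinect firmware; the focal length and baseline values are merely plausible illustrative numbers:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Generic structured-light/stereo triangulation: Z = f * b / d.
    f_px: focal length in pixels; baseline_m: projector-camera baseline
    in meters; disparity_px: observed pattern shift in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Illustrative values: f = 580 px, baseline = 0.075 m, disparity = 29 px.
print(depth_from_disparity(580, 0.075, 29))  # ~1.5 m
```

A larger pattern shift (disparity) thus corresponds to a closer surface, which is why the deformation of the projected pattern encodes scene depth.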
Experiments and results
In this paper, we demonstrate the potential of our approach using our proof-of-concept system illustrated in Fig. 2 (on the right). We perform 12 different simulated orthopedic surgeries using a surgical phantom and real X-ray shots acquired from different orthopedic surgeries. Note that the X-ray images are manually aligned into the view of our surgical scene before starting our acquisitions. In each sequence, different types of activities involving different surgical tools, e.g. scalpel,
Conclusion
In this paper, we proposed novel strategies and learning approaches for AR visualization to improve surgical scene understanding and depth perception. Our main contributions were to propose the concept of a C-arm combined with a Kinect sensor to obtain color as well as depth information, to define learning-based strategies for identifying objects of interest in Kinect and X-ray data, and to create an object-specific pixel-wise alpha map for improved image fusion. In 12 simulated surgeries, we
References (21)
- et al. Regression forests for efficient anatomy detection and localization in computed tomography scans. Med Image Anal (2013)
- et al. Fusion of C-arm X-ray image on video view to reduce radiation exposure and improve orthopedic surgery planning: first in-vivo evaluation (2010)
- et al. Supervised classification for customized intraoperative augmented reality visualization
- et al. How a surgeon becomes superman by visualization of intelligently fused multi-modalities
- et al. Multi-cue pedestrian classification with partial occlusion handling (2010)
- et al. Depth and appearance for mobile scene analysis (2007)
- et al. Multi-cue pedestrian detection and tracking from a moving vehicle (2007)
- et al. Multi-cue onboard pedestrian detection (2009)
- et al. Depth-encoded hough voting for joint object detection and shape recovery (2010)