Machine learning-based augmented reality for improved surgical scene understanding

https://doi.org/10.1016/j.compmedimag.2014.06.007

Abstract

In orthopedic and trauma surgery, AR technology can support surgeons in the challenging task of understanding the spatial relationships between the anatomy, the implants and their tools. In this context, we propose a novel augmented visualization of the surgical scene that intelligently mixes the different sources of information provided by a mobile C-arm combined with a Kinect RGB-Depth sensor. To this end, we introduce a learning-based paradigm that aims at (1) identifying the relevant objects or anatomy in both Kinect and X-ray data, and (2) creating an object-specific pixel-wise alpha map that permits relevance-based fusion of the video and X-ray images within a single view. In 12 simulated surgeries, we show very promising results, suggesting that our approach provides surgeons with a better understanding of the surgical scene as well as improved depth perception.

Introduction

In orthopedic and trauma surgery, the introduction of AR technology such as the camera augmented mobile C-arm (CamC) promises to support surgeons in their understanding of the spatial relationships between anatomy, implants and surgical tools [1], [2]. By using an additional color camera mounted so that its optical center coincides with the X-ray source, the CamC system provides an augmented view created by superimposing X-ray and video images using alpha blending. In other words, the resulting image is a linear combination of the optical and the X-ray image with the same mixing coefficient (alpha) applied over the whole image domain. While this is a simple and intuitive solution, the superimposed X-ray information can hinder the surgeon's understanding of the scene when the field of view becomes highly cluttered (e.g. by surgical tools): it becomes increasingly difficult to quickly recognize and differentiate structures in the overlaid image. Moreover, the surgeon's depth perception is impaired, as the X-ray anatomy appears on top of the scene in the optical image.
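For illustration, such a uniform overlay can be sketched in a few lines of NumPy. This is a minimal example assuming pre-registered images normalized to [0, 1]; the array names and the default alpha value are chosen purely for illustration and are not taken from the CamC implementation:

```python
import numpy as np

def global_alpha_blend(video_rgb: np.ndarray, xray: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Classical overlay: one global mixing coefficient for every pixel.

    video_rgb : HxWx3 float array in [0, 1], the optical (color) image
    xray      : HxW   float array in [0, 1], the registered X-ray image
    alpha     : single mixing coefficient applied over the whole image domain
    """
    xray_rgb = np.repeat(xray[..., None], 3, axis=2)     # replicate gray X-ray to 3 channels
    return (1.0 - alpha) * video_rgb + alpha * xray_rgb  # same weight everywhere
```

Because alpha is constant, X-ray content is blended onto hands and instruments just as strongly as onto the patient, which is exactly the cluttering and depth-cue problem described above.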

In both the X-ray and the optical image, not all pixels in the image domain have the same relevance for a good perception and understanding of the scene. In the X-ray image, pixels belonging to the patient's bone and soft tissues are highly relevant for surgery, whereas pixels belonging to the background provide no information. In the optical images, it is crucial to recognize the different objects interacting in the surgical scene, e.g. background, surgical tools or the surgeon's hands. First, this permits improving perception by preserving natural occlusion cues when the surgeon's hands or instruments occlude the augmented scene in the classical CamC view. Second, as a by-product, valuable semantic information can be extracted for characterizing the activity performed by the surgeon or for tracking the positions of the different objects present in the scene.

In this paper, we introduce a novel learning-based AR fusion approach aimed at improving surgical scene understanding and depth perception. To this end, we propose to combine a mobile C-arm with a Kinect sensor, adding not only X-ray but also depth information to the augmented scene. Exploiting the fact that structured light can be redirected by a mirror, the Kinect sensor is integrated with a mirror system on a mobile C-arm so that both the color and depth cameras as well as the X-ray source share the same viewpoint. In this context of learning-based image fusion, a few attempts have been made in [3], [4] based on color and X-ray information only. In these early works, a Naïve Bayes classifier based on color and radiodensity is applied to recognize the different objects in the color and X-ray images of the CamC system, respectively. Depending on the pair of objects it belongs to, each pixel is assigned a mixing value to create a relevance-based fused image. While this approach provided promising first results, recognizing each object from its color distribution alone is very challenging and not robust to changes in illumination. In the present work, we propose to take advantage of additional depth information to provide an improved AR visualization: (i) we define a learning-based strategy based on color and depth information for identifying objects of interest in Kinect data, (ii) we use state-of-the-art random forests for identifying foreground objects in X-ray images and (iii) we use an object-specific mixing look-up table to create a pixel-wise alpha map. In 12 simulated surgeries, we show that our fusion approach provides surgeons with a better surgical scene understanding as well as improved depth perception.
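The fusion step (iii) can be sketched as follows. This is a minimal example assuming that per-pixel label maps have already been produced by the Kinect and X-ray classifiers; the label set and the look-up-table values are hypothetical placeholders for illustration, not the paper's trained models or actual mixing values:

```python
import numpy as np

# Hypothetical object labels (illustrative, not the paper's exact label set).
VIDEO_LABELS = {"background": 0, "patient": 1, "hand": 2, "tool": 3}
XRAY_LABELS  = {"background": 0, "anatomy": 1}

# Object-specific mixing look-up table: ALPHA_LUT[video_label, xray_label].
# Example values: X-ray anatomy shines through the patient surface, while hands
# and tools keep the video pixel to preserve natural occlusion cues.
ALPHA_LUT = np.array([
    #  X-ray bg, X-ray anatomy
    [0.0, 0.2],   # video background
    [0.0, 0.8],   # patient surface
    [0.0, 0.0],   # surgeon's hand
    [0.0, 0.0],   # surgical tool
])

def relevance_based_fusion(video_rgb, xray, video_labels, xray_labels):
    """Fuse video and X-ray with a pixel-wise alpha map derived from the label pair."""
    alpha = ALPHA_LUT[video_labels, xray_labels][..., None]  # HxWx1 alpha map
    xray_rgb = np.repeat(xray[..., None], 3, axis=2)         # gray X-ray -> 3 channels
    return (1.0 - alpha) * video_rgb + alpha * xray_rgb
```

Here video_labels and xray_labels would be HxW integer maps, e.g. the per-pixel predictions of the color/depth classifier and of the random-forest X-ray foreground detector; the pixel-wise alpha map then falls directly out of the look-up table.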

Section snippets

System setup: Kinect augmented mobile C-arm

In this work, we propose to extend a common intraoperative mobile C-arm by mounting a Kinect sensor, which consists of a depth sensor coupled with a video camera. The optical center of this RGB-D sensor's video camera is mounted so that it coincides with the X-ray projection center. The depth sensor is based on so-called structured light, where infrared light patterns are projected into the scene. Using an infrared camera, the depth is inferred from the deformations of those patterns induced by the

Experiments and results

In this paper, we demonstrate the potential of our approach using our proof-of-concept system illustrated in Fig. 2 (on the right). We perform 12 different orthopedic surgery simulations using a surgical phantom and real X-ray shots acquired from different orthopedic surgeries. Note that the X-ray images are manually aligned to the view of our surgical scene before starting our acquisitions. In each sequence, different types of activities involving different surgical tools, e.g. scalpel,

Conclusion

In this paper, we proposed novel strategies and learning approaches for AR visualization to improve surgical scene understanding and depth perception. Our main contributions were to propose the concept of a C-arm combined with a Kinect sensor to obtain color as well as depth information, to define learning-based strategies for identifying objects of interest in Kinect and X-ray data, and to create an object-specific pixel-wise alpha map for improved image fusion. In 12 simulated surgeries, we

References (21)

  • A. Criminisi et al., Regression forests for efficient anatomy detection and localization in computed tomography scans, Med Image Anal (2013)
  • S. Nicolau et al., Fusion of C-arm X-ray image on video view to reduce radiation exposure and improve orthopedic surgery planning: first in-vivo evaluation
  • N. Navab et al. (2010)
  • O. Pauly et al., Supervised classification for customized intraoperative augmented reality visualization
  • O. Erat et al., How a surgeon becomes superman by visualization of intelligently fused multi-modalities
  • M. Enzweiler et al., Multi-cue pedestrian classification with partial occlusion handling (2010)
  • A. Ess et al., Depth and appearance for mobile scene analysis (2007)
  • D.M. Gavrila et al., Multi-cue pedestrian detection and tracking from a moving vehicle (2007)
  • C. Wojek et al., Multi-cue onboard pedestrian detection (2009)
  • M. Sun et al., Depth-encoded Hough voting for joint object detection and shape recovery (2010)
There are more references available in the full text version of this article.

Cited by (34)

  • Opportunities and challenges of using augmented reality and heads-up display in orthopaedic surgery: A narrative review

    2021, Journal of Clinical Orthopaedics and Trauma
    Citation Excerpt:

    Post-training questionnaires showed that 11 of 12 participants would have preferred a combination of expert-guided teaching and AR-guided unsupervised learning.13 The application of AR/HUD technology in orthopaedic surgery is still in its infancy and requires further modifications to justify its safety and efficacy for the clinical environment.44,45 Several barriers have hindered the adoption of AR/HUD in trauma and orthopaedic surgery, such as unfamiliarity with technology and a convoluted overhaul of established clinical pathways.46,47

  • The status of augmented reality in laparoscopic surgery as of 2016

    2017, Medical Image Analysis
    Citation Excerpt:

    Nonetheless, this technique primarily concerns depth over the surface, which may be surgically less significant than depth within the tissues. In orthopedic surgery, an interesting approach uses machine learning to combine Kinect and C-arm information to simulate occlusion from the practitioner over the scene (Pauly et al., 2015). In any case, despite being the strongest cue for depth perception, occlusion only informs about the ordering, not about the distance, whether relative or absolute.

  • Detection of stationary foreground objects: A survey

    2016, Computer Vision and Image Understanding
    Citation Excerpt:

    Applications for video analysis and understanding (e.g. video surveillance (Liu et al., 2015), augmented reality (Pauly et al., 2015), or analysis of people behavior (Morozov, 2015)) typically include strategies for separating the moving objects (MOs) in the scene, called foreground (FG), from the static information, called background (BG).

  • Inverse visualization concept for RGB-D augmented C-arms

    2016, Computers in Biology and Medicine
    Citation Excerpt:

    Based on the previous literature [23–26], the ordinary shading techniques used in photorealistic rendering cannot help to solve this problem. However, other methods for enhancing shape perception, such as texture [12–20] and line drawing [21,22], go even further against our expectations, because we want to retain the texture of the scene to help clinicians recognize all the objects present in the surgical scene. For this, we design a non-photorealistic shading method, which is inspired by the presentation modality of depth images.
