Machine learning-based augmented reality for improved surgical scene understanding
Introduction
In orthopedic and trauma surgery, the introduction of AR technology such as the camera augmented mobile C-arm (CamC) promises to support surgeons in their understanding of the spatial relationships between anatomy, implants and surgical tools [1], [2]. By using an additional color camera mounted so that its optical center coincides with the X-ray source, the CamC system provides an augmented view created by superimposing X-ray and video images with alpha blending. In other words, the resulting image is a linear combination of the optical and the X-ray image using the same mixing coefficient (alpha) over the whole image domain. While this is a simple and intuitive solution, the superimposed X-ray information can hinder the surgeon's understanding of the scene when the field of view becomes highly cluttered (e.g. by surgical tools): it becomes increasingly difficult to quickly recognize and differentiate structures in the overlaid image. Moreover, the surgeon's depth perception is altered, as the X-ray anatomy appears on top of the scene in the optical image.
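The classical global-alpha fusion described above can be sketched in a few lines. This is an illustrative reimplementation, not the CamC code; image contents and the alpha value are synthetic:

```python
import numpy as np

def alpha_blend(optical, xray, alpha=0.5):
    """Classical CamC-style fusion: one mixing coefficient alpha
    is applied uniformly over the whole image domain."""
    fused = alpha * xray.astype(np.float32) \
        + (1.0 - alpha) * optical.astype(np.float32)
    return fused.astype(np.uint8)

# Synthetic 2x2 grayscale images: uniform optical (200) and X-ray (100).
optical = np.full((2, 2), 200, dtype=np.uint8)
xray = np.full((2, 2), 100, dtype=np.uint8)
print(alpha_blend(optical, xray, alpha=0.5))  # every pixel -> 150
```

Because alpha is constant, every X-ray pixel is blended in with the same weight regardless of whether it shows anatomy, background, or is occluded by a tool, which is exactly the limitation the paper addresses.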
In both the X-ray and the optical image, not all pixels in the image domain are equally relevant for a good perception and understanding of the scene. In the X-ray image, pixels that belong to the patient's bone and soft tissue are highly relevant for surgery, whereas background pixels provide no information. In the optical image, it is crucial to recognize the different objects interacting in the surgical scene, e.g. the background, surgical tools or the surgeon's hands. First, this improves perception by preserving the natural occlusion cues when the surgeon's hands or instruments occlude the augmented scene in the classical CamC view. Second, as a by-product, valuable semantic information can be extracted for characterizing the activity performed by the surgeon or for tracking the positions of the objects present in the scene.
In this paper, we introduce a novel learning-based AR fusion approach that aims at improving surgical scene understanding and depth perception. To this end, we propose to combine a mobile C-arm with a Kinect sensor, adding not only X-ray but also depth information to the augmented scene. Exploiting the fact that structured light still functions through a mirror, the Kinect sensor is integrated with a mirror system on a mobile C-arm, so that both the color and depth cameras as well as the X-ray source share the same viewpoint. In this context of learning-based image fusion, a few attempts have been made in [3], [4] based on color and X-ray information only. In these early works, a Naïve Bayes classifier based on color and radiodensity is applied to recognize the different objects in the color and X-ray images of the CamC system. Depending on the pair of objects it belongs to, each pixel is associated with a mixing value to create a relevance-based fused image. While this approach provided promising first results, recognizing each object from its color distribution alone is very challenging and not robust to changes in illumination. In the present work, we take advantage of additional depth information to provide an improved AR visualization: (i) we define a learning-based strategy based on color and depth information for identifying objects of interest in Kinect data, (ii) we use a state-of-the-art random forest for identifying foreground objects in X-ray images, and (iii) we use an object-specific mixing look-up table to create a pixel-wise alpha map. In 12 simulated surgeries, we show that our fusion approach provides surgeons with a better surgical scene understanding as well as improved depth perception.
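Step (iii), the object-specific mixing look-up table, can be sketched as follows. The label sets, the LUT values, and the function names are illustrative assumptions, not the paper's actual classes or weights; the point is only the mechanism, i.e. that each (optical label, X-ray label) pair indexes its own alpha:

```python
import numpy as np

# Hypothetical object labels for the optical (Kinect) and X-ray images.
BACKGROUND, HAND, TOOL = 0, 1, 2   # optical-image classes
XRAY_BG, XRAY_FG = 0, 1            # X-ray classes (foreground = anatomy)

# Object-pair mixing look-up table: ALPHA_LUT[optical_label, xray_label].
# Illustrative values: hands/tools stay nearly opaque (alpha ~ 0.1) to
# preserve occlusion cues; anatomy over background is shown strongly.
ALPHA_LUT = np.array([
    [0.0, 0.8],   # background vs (X-ray background, X-ray anatomy)
    [0.0, 0.1],   # hand
    [0.0, 0.1],   # tool
], dtype=np.float32)

def fuse(optical, xray, optical_labels, xray_labels):
    """Relevance-based fusion: per-pixel alpha from the object-pair LUT."""
    alpha = ALPHA_LUT[optical_labels, xray_labels]  # H x W alpha map
    return ((1.0 - alpha) * optical + alpha * xray).astype(np.uint8)
```

With per-pixel alpha, an anatomy pixel behind the background is blended strongly, while the same anatomy pixel behind a hand or tool is barely blended, so the instrument correctly occludes the X-ray overlay.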
Section snippets
System setup: Kinect augmented mobile C-arm
In this work, we propose to extend a common intraoperative mobile C-arm by mounting a Kinect sensor, which consists of a depth sensor coupled with a video camera. The optical center of this RGB-D sensor's video camera is mounted so that it coincides with the X-ray projection center. The depth sensor is based on so-called structured light, where infrared light patterns are projected into the scene. Using an infrared camera, the depth is inferred from the deformations of those patterns induced by the
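The depth recovery principle behind such a structured-light sensor reduces to triangulation between the pattern projector and the infrared camera. The sketch below is generic, not Kinect firmware; the focal length and baseline values are merely plausible illustrative numbers:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Generic structured-light/stereo triangulation: Z = f * b / d.
    f_px: focal length in pixels; baseline_m: projector-camera baseline
    in meters; disparity_px: observed pattern shift in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Illustrative values: f = 580 px, baseline = 0.075 m, disparity = 29 px.
print(depth_from_disparity(580, 0.075, 29))  # ~1.5 m
```

A larger pattern shift (disparity) thus corresponds to a closer surface, which is why the deformation of the projected pattern encodes scene depth.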
Experiments and results
In this paper, we demonstrate the potential of our approach using our proof-of-concept system illustrated in Fig. 2 (on the right). We perform 12 different simulated orthopedic surgeries using a surgical phantom and real X-ray shots acquired from different orthopedic surgeries. Note that the X-ray images are manually aligned into the view of our surgical scene before starting our acquisitions. In each sequence, different types of activities involving different surgical tools, e.g. scalpel,
Conclusion
In this paper, we proposed novel strategies and learning approaches for AR visualization to improve surgical scene understanding and depth perception. Our main contributions were to propose the concept of a C-arm combined with a Kinect sensor to obtain color as well as depth information, to define learning-based strategies for identifying objects of interest in Kinect and X-ray data, and to create an object-specific pixel-wise alpha map for improved image fusion. In 12 simulated surgeries, we
References (21)
- et al. Regression forests for efficient anatomy detection and localization in computed tomography scans. Med Image Anal (2013)
- et al. Fusion of C-arm X-ray image on video view to reduce radiation exposure and improve orthopedic surgery planning: first in-vivo evaluation (2010)
- et al. Supervised classification for customized intraoperative augmented reality visualization
- et al. How a surgeon becomes superman by visualization of intelligently fused multi-modalities
- et al. Multi-cue pedestrian classification with partial occlusion handling (2010)
- et al. Depth and appearance for mobile scene analysis (2007)
- et al. Multi-cue pedestrian detection and tracking from a moving vehicle (2007)
- et al. Multi-cue onboard pedestrian detection (2009)
- et al. Depth-encoded hough voting for joint object detection and shape recovery (2010)