An Unsupervised Learning Approach to Discontinuity-Preserving Image Registration

Ng, Eric; Ebrahimi, Mehran

doi:10.1007/978-3-030-50120-4_15

Conference paper
First Online: 09 June 2020

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12120)

Abstract

Most traditional image registration algorithms aimed at aligning a pair of images impose well-established regularizers to guarantee smoothness of unknown deformation fields. Since these methods assume global smoothness within the image domain, they pose issues for scenarios where local discontinuities are expected, such as the sliding motion between the lungs and the chest wall during the respiratory cycle. Furthermore, an objective function must be optimized for each given pair of images, so registering multiple sets of images becomes very time-consuming and scales poorly to higher-resolution image volumes.

Using recent advances in deep learning, we propose an unsupervised learning-based image registration model. The model is trained over a loss function with a custom regularizer that preserves local discontinuities, while simultaneously respecting the smoothness assumption in homogeneous regions of image volumes. Qualitative and quantitative validations on 3D pairs of lung CT datasets are presented.

1 Introduction

Image registration is an invaluable tool for medical image analysis and has received vast attention in imaging research for the past several decades. Image registration is used as a tool to find meaningful temporal transformations to align images taken at different time frames. Traditionally, registration algorithms assume smooth transformations. This assumption quickly falls apart for many cases, since different organs move, to a certain degree, independently from one another. Image misalignment becomes inevitable if smoothness is assumed at regions where discontinuities are expected, such as organ boundaries [4]. In this paper, we introduce an unsupervised learning model that learns the relationship between image pairs and a corresponding displacement field. We propose a regularizer that accounts for local image discontinuities while simultaneously respecting local homogeneity. This approach drastically decreases registration time, as the registration task is no longer an optimization task, but becomes a simple function evaluation.

2 Related Work

In traditional image registration, the most common approach is to solve an optimization problem, where the objective function is comprised of two terms, an image dissimilarity term and a regularization term to restrict the solution space. Common methods include elastic and diffusion models [16], free-form deformations using B-splines [3], and more recently, kernel methods [11,12,13]. Because all of these methods optimize an energy function for every image pair, large-scale or successive registration tasks become very time-consuming. Specialized algorithms such as Thirion’s Demons [5, 17, 22] allow a significant reduction in computational time by estimating force vectors that act to drive the deformation, followed by Gaussian smoothing during the optimization process. Unfortunately, this algorithm restricts models to diffusion-based models only.

With the rise of deep learning over the past decade, learning-based approaches have become extremely popular. Several models are trained in a supervised manner, which requires ground truth transformations to be available [6, 15, 18]. Although these methods showed promising results, the task of obtaining ground truth transformation fields is cumbersome and highly prone to error. Thus, recent methods have shifted to an unsupervised approach instead, where models are trained based on how transformation fields act on images, rather than strictly on the transformations [1, 9, 25]. For a survey of learning-based image registration methods, refer to the article by Haskins et al. [10].

3 Method

Our model follows a framework popularized by Voxelmorph [2]. Let $I_F$ and $I_M$ denote fixed and moving images. We find a function $g_{\theta }(I_F, I_M)$ that produces the displacement field $\mathbf {u}$, i.e. $\mathbf {u} = g_{\theta }(I_F, I_M)$. The deformation $\phi $ can then be expressed as the mapping $\phi = Id + \mathbf {u}$ where Id is the identity mapping. The deformation field is applied to $I_M$ to produce the warped image $I_M \circ \phi $, such that $I_F(x)$ is similar to $[I_M \circ \phi ](x)$ for all voxel locations $x \in \varOmega $. Since $\phi $ may map the original coordinate system to non-integer valued voxel locations, interpolation is required to warp $I_M$ under $\phi $. For our experiments, we use trilinear interpolation due to its simplicity. An overview of the model is shown in Fig. 1.
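The warping step $I_M \circ \phi $ with $\phi = Id + \mathbf {u}$ can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `warp_trilinear`, the array layout (displacement stored as `(3, D, H, W)` in voxel units along the array axes), and the clamping of sample points at the volume border are all assumptions made for illustration, and every axis is assumed to have at least two voxels.

```python
import numpy as np

def warp_trilinear(moving, u):
    """Warp `moving` (D, H, W) by displacement `u` (3, D, H, W), in voxel units.

    Evaluates [I_M o phi](x) with phi = Id + u using trilinear interpolation.
    Sample points are clamped to the volume (an assumption of this sketch).
    """
    shape = np.array(moving.shape)
    # Identity mapping Id: the voxel coordinates of every grid point.
    grid = np.stack(np.meshgrid(*[np.arange(n) for n in shape], indexing="ij"))
    phi = grid + np.asarray(u, dtype=float)          # phi = Id + u
    for axis in range(3):
        phi[axis] = np.clip(phi[axis], 0, shape[axis] - 1)
    # Lower corner of the cell containing each sample point; keep lo + 1 in range.
    lo = np.clip(np.floor(phi).astype(int), 0, (shape - 2).reshape(3, 1, 1, 1))
    frac = phi - lo                                  # interpolation weights in [0, 1]

    warped = np.zeros(moving.shape, dtype=float)
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                # Weight of this corner: product of per-axis linear weights.
                w = ((frac[0] if dz else 1 - frac[0])
                     * (frac[1] if dy else 1 - frac[1])
                     * (frac[2] if dx else 1 - frac[2]))
                warped += w * moving[lo[0] + dz, lo[1] + dy, lo[2] + dx]
    return warped
```

With a zero displacement the function returns the moving image unchanged, and a constant displacement of half a voxel along the last axis averages neighbouring voxels along that axis. In a training setting the same operation would need to be differentiable with respect to $\mathbf {u}$, which an automatic differentiation framework provides.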

3.1 Network Architecture

The function $g_{\theta }$ is modeled using a convolutional neural network where $\theta $ denotes the network parameters. The neural network follows a modified version of U-Net [19], which contains an encoder and a decoder structure that mirror each other and are connected by skip connections at each layer (Fig. 2). The encoder/decoder architecture is motivated by image pyramid techniques in many computer vision algorithms, where each encoding and decoding layer operates on a different scale, from coarse to fine representations of the input.

The encoder consists of three convolution layers, each applying $3 \times 3 \times 3$ convolutions with stride 2 for downsampling, followed by LeakyReLU with a slope of 0.2. Each convolution layer has 32 output channels except the first layer, which has 16 output channels.
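To make the downsampling concrete, the following sketch traces feature-map shapes through the three encoder layers using the standard convolution output-size formula. The padding of 1 is an assumption (the text does not state it), and `conv_out` and `encoder_shapes` are hypothetical helper names.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1

def encoder_shapes(spatial, channels=(16, 32, 32)):
    """Trace (channels, D, H, W) through the three stride-2 encoder layers."""
    shapes = []
    for c in channels:
        spatial = tuple(conv_out(n) for n in spatial)
        shapes.append((c, *spatial))
    return shapes

# A volume with 96 axial slices of 256 x 256 voxels.
print(encoder_shapes((96, 256, 256)))
# [(16, 48, 128, 128), (32, 24, 64, 64), (32, 12, 32, 32)]
```

Under this padding assumption each layer halves every spatial dimension, so a dimension divisible by 8 passes through the encoder without rounding.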

The decoder has a structure similar to that of the encoder, but in reverse order. In the first decoding layer, we simply use the output of the final encoding layer as the input. In subsequent decoding layers, we first upsample the output of the previous decoding layer. Skip connections are constructed by concatenating layer outputs with that of the mirroring encoding layer. This effectively uses representations of the encoding layers to enforce more precise outputs in the decoding layers. Similar to the encoder, each decoding layer applies $3 \times 3 \times 3$ convolutions followed by LeakyReLU of slope 0.2, but with stride 1 to preserve resolution at each layer. The output of the final decoding layer is passed into an additional convolution layer with 3 output channels, where each output channel contains one coordinate component of the displacement field $\mathbf {u}$.

3.2 Loss Function

We train our model using a loss function in the form

$$\begin{aligned} \mathcal {L}(I_F, I_M, \mathbf {u}) = \lambda _{sim} \mathcal {L}_{sim}(I_F, I_M, \mathbf {u}) + \lambda _{disc} \mathcal {L}_{disc}(\mathbf {u}) + \lambda _{mag} \mathcal {L}_{mag}(\mathbf {u}), \end{aligned}$$

(1)

where $\mathcal {L}_{sim}$ measures image dissimilarity, $\mathcal {L}_{disc}$ is a discontinuity preserving regularizer, and $\mathcal {L}_{mag}$ is a second loss term that manages the (ir)regularities in the magnitude of the displacement fields. $\lambda _{sim}, \lambda _{disc}$, and $\lambda _{mag}$ are corresponding regularization constants.

Similarity Loss. To measure image similarity/dissimilarity, we use a local normalized cross correlation which is defined as

$$\begin{aligned} \mathrm {LNCC}(I_M, I_F) = \sum _{x \in \varOmega } \frac{\left[ \sum _{y \in \mathcal {N}(x)} \left( I_M(y) - \mu _M(x) \right) \left( I_F(y) - \mu _F(x) \right) \right] ^2}{\left[ \sum _{y \in \mathcal {N}(x)} \left( I_M(y) - \mu _M(x) \right) ^2 \right] \left[ \sum _{y \in \mathcal {N}(x)} \left( I_F(y) - \mu _F(x) \right) ^2 \right] } \end{aligned}$$

(2)

where x is any voxel in the image domain $\varOmega $, $y \in \mathcal {N}(x)$ are the neighborhood points around voxel x, and $\mu _M(x)$ and $\mu _F(x)$ are the average local intensities around x in the moving and fixed images, respectively. LNCC is maximized when $I_F = I_M$, so it measures similarity; we therefore define the dissimilarity measure as $\mathcal {L}_{\mathrm {sim}} = 1 - \mathrm {LNCC}$.
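Equation (2) can be sketched in NumPy as follows. This is an illustrative version only: the cubic window size, the use of windows that fit entirely inside the volume (no padding), and the small `eps` added to the denominator are assumptions, and `lncc` is a hypothetical function name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lncc(moving, fixed, window=3, eps=1e-8):
    """Local normalized cross correlation of Eq. (2), summed over all windows.

    Only neighbourhoods that fit entirely inside the volume are used,
    and `eps` guards against constant neighbourhoods.
    """
    win = (window,) * moving.ndim
    m = sliding_window_view(moving.astype(float), win)
    f = sliding_window_view(fixed.astype(float), win)
    axes = tuple(range(-moving.ndim, 0))           # the trailing window axes
    m = m - m.mean(axis=axes, keepdims=True)       # I_M(y) - mu_M(x)
    f = f - f.mean(axis=axes, keepdims=True)       # I_F(y) - mu_F(x)
    cross = (m * f).sum(axis=axes)
    var_m = (m * m).sum(axis=axes)
    var_f = (f * f).sum(axis=axes)
    return float(np.sum(cross ** 2 / (var_m * var_f + eps)))
```

By the Cauchy-Schwarz inequality each summand lies in $[0, 1]$, so dividing the sum by the number of windows would keep the value in $[0, 1]$; whether the authors normalize in this way is not stated in the text.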

Discontinuous Loss. In designing the discontinuous loss, we first assume that there are no topological changes, i.e. no tissue is introduced or destroyed. We then consider the requirements based on these physical scenarios: 1. Homogeneous movement, 2. Movement along rigid structures, and 3. Sliding organs.

These scenarios help us define the requirements for our regularizer. Firstly, the regularizer must preserve smooth deformations that occur locally within organ interiors. Secondly, the regularizer must not penalize large local changes in deformation magnitude as long as the movement is in a similar direction. This is to mimic the movement of soft tissues or organs against rigid structures such as the rib cage or the spinal column. Finally, the regularizer must be able to account for movements in opposite directions along organ boundaries. This final requirement is perhaps the most significant, as there are many scenarios where sliding organs exist. Common examples include the sliding of the lungs against the chest wall during the respiratory cycle, and the movement of organs against one another in the abdominal region. Figure 3 visually summarizes possible desired behaviors of a discontinuous displacement field.

Let $\mathbf {u}$ be represented by a collection of displacement vectors $\{u_i\}_{i = 1, \dots , N}$, where N is the number of voxels in the image. Now consider two arbitrary vectors $u_i$ and $u_j$, respectively corresponding to locations $x_i$ and $x_j$ in the image domain. For fixed magnitudes, the area of the parallelogram spanned by $u_i$ and $u_j$ is maximized when $u_i$ and $u_j$ are orthogonal to one another, and minimized when they are parallel or anti-parallel. Thus the three conditions are encouraged by any regularizer of the form

$$\begin{aligned} \mathcal {L}_{disc} = \sum _{i,j = 1}^N g(\mathcal {P}(u_i, u_j)) \end{aligned}$$

(3)

where $\mathcal {P}$ is the unsigned area of the parallelogram spanned by $u_i$ and $u_j$, and $g: \mathbb {R} \rightarrow \mathbb {R}$ is a strictly increasing function satisfying $g(0) = 0$. $\mathcal {P}$ is computed as

$$\begin{aligned} \mathcal {P} (u_i, u_j) = \Vert u_i \times u_j \Vert _2 \end{aligned}$$

(4)

where $\times $ denotes the cross product. We propose the regularizer

$$\begin{aligned} \mathcal {L}_{disc} = \sum _{i,j = 1}^N \frac{1}{2} \log \left( 1 + \mathcal {P}(u_i, u_j)^2 \right) k(x_i, x_j) \end{aligned}$$

(5)

where $k(x_i, x_j)$ is a decreasing weight function that depends on the proximity between the locations $x_i$ and $x_j$. For our experiments, we choose the $C^4$ Wendland kernel [26] for $k(x_i, x_j)$.
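The regularizer in Eq. (5) can be sketched for a small set of points as below. Several details are assumptions rather than the authors' choices: the Wendland function is taken in its common 3D $C^4$ form $(1 - r)^6_+ (35 r^2 + 18 r + 3)$, divided by 3 so that $k = 1$ at zero distance; the support `radius` is a hypothetical parameter; and the sum runs over all ordered pairs, which costs $O(N^2)$ and is only practical for tiny inputs (the compact support of the kernel means only nearby pairs contribute).

```python
import numpy as np

def wendland_c4(r):
    """C^4 Wendland function (1 - r)^6_+ (35 r^2 + 18 r + 3) / 3, so k(0) = 1."""
    r = np.asarray(r, dtype=float)
    return np.clip(1.0 - r, 0.0, None) ** 6 * (35.0 * r ** 2 + 18.0 * r + 3.0) / 3.0

def disc_loss(u, x, radius=2.0):
    """Eq. (5) for displacements u (N, 3) at locations x (N, 3)."""
    # P(u_i, u_j) = ||u_i x u_j||_2 for every ordered pair (i, j), Eq. (4).
    cross = np.cross(u[:, None, :], u[None, :, :])
    area = np.linalg.norm(cross, axis=-1)
    # Kernel weight from the distance between x_i and x_j, scaled by `radius`.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    k = wendland_c4(dist / radius)
    return float(np.sum(0.5 * np.log1p(area ** 2) * k))

x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
same_direction = np.array([[1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])   # rigid-structure case
sliding = np.array([[1.0, 0.0, 0.0], [-3.0, 0.0, 0.0]])         # opposite directions
orthogonal = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(disc_loss(same_direction, x), disc_loss(sliding, x), disc_loss(orthogonal, x))
```

The first two cases give zero loss because the cross product of parallel or anti-parallel vectors vanishes, which matches the second and third requirements above; only the orthogonal pair is penalized.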

Magnitude Loss. During preliminary stages of our experiments, we noticed that deformations in large dark image regions (background of CT image, for instance) behave erratically. We found that imposing an additional magnitude-based regularizer is needed to suppress this unpredictable behavior. Thus we add the following term to our loss function

$$\begin{aligned} \mathcal {L}_{mag}(u) = \max _i (\Vert u_i \Vert _2). \end{aligned}$$

(6)

This effectively discourages large magnitudes of u. Evidently, this additional term may become problematic for coarse registration, where large-scale movement may be expected. However, since this work is aimed at addressing local discontinuities, it is safe to assume that deformations remain relatively small.
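As a small worked example, Eq. (6) and the weighted sum of Eq. (1) can be written as follows. The default weights are the values reported in Section 4.1; the displacement vectors and the values passed for the similarity and discontinuity terms are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def mag_loss(u):
    """Eq. (6): the largest displacement magnitude, for u of shape (N, 3)."""
    return float(np.max(np.linalg.norm(u, axis=-1)))

def total_loss(l_sim, l_disc, l_mag, lam_sim=100.0, lam_disc=1.0, lam_mag=1.0):
    """Eq. (1): weighted sum of the three loss terms."""
    return lam_sim * l_sim + lam_disc * l_disc + lam_mag * l_mag

u = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [1.0, 0.0, 0.0]])
print(mag_loss(u))                          # 5.0
print(total_loss(0.25, 2.0, mag_loss(u)))   # 32.0
```

Because the maximum depends on a single voxel, only the displacement with the largest magnitude receives a gradient from this term at each step.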

4 Experiments

4.1 Setup

Our model is implemented using PyTorch 1.3.0 and trained using an NVIDIA GeForce GTX 1080Ti with 11 GB of graphics memory. CPU tests are performed on an Intel Xeon E5-1620 at 3.7 GHz. We trained our model using the Adam optimizer [14] with $\lambda _{sim} = 100, \lambda _{disc} = \lambda _{mag} = 1$, and learning rate $10^{-4}$.

The model is evaluated over 4DCT datasets provided by DIR-Lab [7, 8] and the POPI-model [23]. The DIR-Lab Reference 4DCT datasets contain ten sets of image volumes of sizes $256 \times 256$ and $512 \times 512$ with varying numbers of axial slices (averages of 100 and 128 for the two respective resolutions). To account for these variations, we only keep the middle 96 axial slices of the $256 \times 256$ volumes, and the middle 112 axial slices of the $512 \times 512$ volumes. Each set of image volumes is taken over 10 time steps spanning a single respiratory cycle. Since the input is a pair of image volumes, $I_F$ is chosen as the image volume with a randomly chosen case number and time step, and $I_M$ is selected from the same case with a different time step. Choosing eight cases as training data yields $8 \times 10 \times 9 = 720$ training samples and $2 \times 10 \times 9 = 180$ test samples, despite only ten cases being available. The POPI-model contains six image volumes of size $512 \times 512$ with 140 to 190 axial slices. For consistency, we only keep the middle 136 axial slices and use five of the six cases as training data. We follow the same approach as for DIR-Lab in choosing $I_F$ and $I_M$.
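The sample counts above follow from enumerating ordered pairs of distinct time steps within each case, which the short sketch below reproduces. The helper names are hypothetical, the rounding used when the number of discarded slices is odd is an assumption, and the 128-slice volume is only an illustrative input based on the stated average.

```python
from itertools import permutations

def ordered_pairs(n_cases, n_steps=10):
    """All (case, fixed_step, moving_step) triples with distinct time steps."""
    return [(c, f, m) for c in range(n_cases)
            for f, m in permutations(range(n_steps), 2)]

def middle_slices(n_slices, keep):
    """Index range [start, stop) of the central `keep` axial slices."""
    start = (n_slices - keep) // 2
    return start, start + keep

print(len(ordered_pairs(8)), len(ordered_pairs(2)))  # 720 180
print(middle_slices(128, 112))                       # (8, 120)
```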

4.2 Results

We first compare our discontinuity-preserving model with one that assumes global smoothness. As a baseline, we trained a second model using the DIR-Lab dataset with an identical configuration, except that the discontinuous loss $\mathcal {L}_{\mathrm {disc}}$ is replaced with a total variation loss $\mathcal {L}_{\mathrm {TV}}$ defined as

$$\begin{aligned} \mathcal {L}_{\mathrm {TV}} = \sum _i \Vert \nabla u_i \Vert _2 \end{aligned}$$

(7)

where the summation is over all voxels indexed by i. Figure 4 shows a comparison between our model trained using $\mathcal {L}_{\mathrm {TV}}$ and $\mathcal {L}_{\mathrm {disc}}$. One can quickly identify sudden changes in the displacement field near the lung boundaries, especially near the lung/vertebrae interface. Additional registration results are shown in Fig. 5. We compare our results (Table 1) quantitatively to the following methods: Free-Form Deformations (FFD) [20], isotropic parametric Total Variation (pTV) [24], and Sparse Kernel Machines (SKM) [12]. For this comparison, we use frame 1 as the fixed image and register all remaining frames to it. Finally, we compare the time required to register a pair of images using our approach versus a classical registration algorithm based on minimization (Table 2). Classical registration is performed using the AIRLab framework [21] with a diffusion regularizer.
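For reference, the baseline total variation loss of Eq. (7) can be sketched in NumPy as follows. The text does not spell out how $\Vert \nabla u_i \Vert _2$ is discretized; this sketch assumes the Frobenius norm of the $3 \times 3$ Jacobian at each voxel, computed with the finite differences of `np.gradient` at unit voxel spacing, and `tv_loss` is a hypothetical name.

```python
import numpy as np

def tv_loss(u):
    """Eq. (7) for a displacement field u of shape (3, D, H, W)."""
    # jac[c, a] holds the derivative of displacement component c along axis a.
    jac = np.stack([np.stack(np.gradient(u[c]), axis=0) for c in range(3)])
    # Frobenius norm of the 3 x 3 Jacobian at each voxel, summed over voxels.
    return float(np.sum(np.sqrt(np.sum(jac ** 2, axis=(0, 1)))))
```

A constant field gives zero loss, while any spatial change in the field is penalized in proportion to its size, including the abrupt changes that a sliding interface requires. This is the behavior the discontinuity-preserving regularizer is designed to avoid.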

Table 1. Target Registration Error (TRE) in millimeters (mm) against FFD [20], pTV [24], and SKM [12] on the DIR-Lab and POPI 4DCT Model. Baseline model is the same configuration but trained with $\mathcal {L}_{TV}$ in place of $\mathcal {L}_{disc}$.
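For readers unfamiliar with the metric, target registration error is commonly computed as the mean Euclidean distance, in millimeters, between landmark positions after registration and their annotated ground-truth positions. The sketch below follows that common definition; it is not taken from the paper, which does not describe how landmarks are mapped through $\phi $, and the function name, landmark coordinates, and voxel spacing are hypothetical.

```python
import numpy as np

def target_registration_error(pred_points, true_points, spacing_mm):
    """Mean Euclidean landmark distance in mm; points are (N, 3) voxel coordinates."""
    diff = np.asarray(pred_points, dtype=float) - np.asarray(true_points, dtype=float)
    diff_mm = diff * np.asarray(spacing_mm, dtype=float)   # voxel units -> mm
    return float(np.mean(np.linalg.norm(diff_mm, axis=1)))

pred = [[0, 0, 0], [1, 1, 1]]
true = [[0, 0, 0], [1, 1, 3]]
print(target_registration_error(pred, true, spacing_mm=(1.0, 1.0, 2.5)))  # 2.5
```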

Table 2. Comparison of registration time between learning-based model and inverse model. For the learning-based model, we used our proposed model for evaluation. For the inverse model, we perform pairwise registration with diffusion regularizer over 1,000 iterations. The inverse model is evaluated using the AIRLab framework [21]. The CPU time for the classical model over DIR-Lab 512 and POPI Model is not computed, as they were much higher than the corresponding GPU time. Time is measured in seconds.

5 Conclusion and Future Work

We presented an unsupervised learning-based model for discontinuity-preserving image registration. Although the training set was relatively small, our model performed on par with existing methods while being able to handle locations where discontinuities may occur. Furthermore, our model reduced computation time by several orders of magnitude, allowing successive registration to be performed within a relatively short time frame. A drawback of the model is its sensitivity to noise. In particular, since $\mathcal {L}_{disc}$ is computed by comparing local displacement vectors with neighboring displacement vectors individually, there is no mechanism to discourage locally chaotic behavior in the displacement field. A possible remedy is to extend the current model to incorporate additional information, such as segmentation masks and edge information. This would allow image discontinuities to be defined explicitly, rather than relying only on image intensities to predict boundary regions.

References

1. Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: An unsupervised learning model for deformable medical image registration. In: Proceedings of the IEEE Conference on CVPR, pp. 9252–9260 (2018)
2. Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 38, 1788–1800 (2019)
3. Balci, S.K., Golland, P., Shenton, M.E., Wells, W.M.: Free-form B-spline deformation model for groupwise registration. In: MICCAI (2007)
4. Berendsen, F.F., Kotte, A.N., Viergever, M.A., Pluim, J.P.: Registration of organs with sliding interfaces and changing topologies. In: Medical Imaging 2014, vol. 9034, p. 90340E. International Society for Optics and Photonics (2014)
5. Cahill, N.D., Noble, J.A., Hawkes, D.J.: A demons algorithm for image registration with locally adaptive regularization. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C. (eds.) MICCAI 2009. LNCS, vol. 5761, pp. 574–581. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04268-3_71
6. Cao, X., et al.: Deformable image registration based on similarity-steered CNN regression. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 300–308. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_35
7. Castillo, E., Castillo, R., Martinez, J., Shenoy, M., Guerrero, T.: Four-dimensional deformable image registration using trajectory modeling. Phys. Med. Biol. 55(1), 305 (2009)
8. Castillo, R., et al.: A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Phys. Med. Biol. 54(7), 1849 (2009)
9. Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 729–738. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_82
10. Haskins, G., Kruger, U., Yan, P.: Deep learning in medical image registration: a survey. arXiv preprint arXiv:1903.02026 (2019)
11. Jud, C., Möri, N., Bitterli, B., Cattin, P.C.: Bilateral regularization in reproducing kernel Hilbert spaces for discontinuity preserving image registration. In: Wang, L., Adeli, E., Wang, Q., Shi, Y., Suk, H.-I. (eds.) MLMI 2016. LNCS, vol. 10019, pp. 10–17. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47157-0_2
12. Jud, C., Mori, N., Cattin, P.C.: Sparse kernel machines for discontinuous registration and nonstationary regularization. In: Proceedings of the IEEE Conference on CVPR Workshops, pp. 9–16 (2016)
13. Jud, C., Sandkühler, R., Möri, N., Cattin, P.C.: Directional averages for motion segmentation in discontinuity preserving image registration. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 249–256. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_29
14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
15. Krebs, J., et al.: Robust non-rigid registration through agent-based action learning. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 344–352. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_40
16. Modersitzki, J.: FAIR: flexible algorithms for image registration, vol. 6. SIAM (2009)
17. Pennec, X., Cachier, P., Ayache, N.: Understanding the “Demon’s Algorithm”: 3D non-rigid registration by gradient descent. In: Taylor, C., Colchester, A. (eds.) MICCAI 1999. LNCS, vol. 1679, pp. 597–605. Springer, Heidelberg (1999). https://doi.org/10.1007/10704282_64
18. Rohé, M.-M., Datar, M., Heimann, T., Sermesant, M., Pennec, X.: SVF-Net: learning deformable image registration using shape matching. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 266–274. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_31
19. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
20. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. Med. Imaging 18(8), 712–721 (1999)
21. Sandkühler, R., Jud, C., Andermatt, S., Cattin, P.C.: AirLab: autograd image registration laboratory. arXiv preprint arXiv:1806.09907 (2018)
22. Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell’s demons. Med. Image Anal. 2(3), 243–260 (1998)
23. Vandemeulebroucke, J., Rit, S., Kybic, J., Clarysse, P., Sarrut, D.: Spatiotemporal motion estimation for respiratory-correlated imaging of the lungs. Med. Phys. 38(1), 166–178 (2011)
24. Vishnevskiy, V., Gass, T., Szekely, G., Tanner, C., Goksel, O.: Isotropic total variation regularization of displacements in parametric image registration. IEEE Trans. Med. Imaging 36(2), 385–395 (2016)
25. de Vos, B.D., Berendsen, F.F., Viergever, M.A., Staring, M., Išgum, I.: End-to-end unsupervised deformable image registration with a convolutional neural network. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 204–212. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_24
26. Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4(1), 389–396 (1995)

Acknowledgments

This research was supported by an NSERC Discovery Grant for M. Ebrahimi. We acknowledge the support of NVIDIA Corporation for the donation of GPUs used in this research.

Author information

Authors and Affiliations

Imaging Lab, Faculty of Science, Ontario Tech University, 2000 Simcoe Street North, Oshawa, ON, L1H 7K4, Canada
Eric Ng & Mehran Ebrahimi

Corresponding author

Correspondence to Mehran Ebrahimi.

Editor information

Editors and Affiliations

Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
Žiga Špiclin
Centre for Medical Image Computing, University College London, London, UK
Jamie McClelland
Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
Jan Kybic
Computer Vision Lab, ETH Zurich, Zurich, Switzerland
Orcun Goksel

About this paper

Cite this paper

Ng, E., Ebrahimi, M. (2020). An Unsupervised Learning Approach to Discontinuity-Preserving Image Registration. In: Špiclin, Ž., McClelland, J., Kybic, J., Goksel, O. (eds) Biomedical Image Registration. WBIR 2020. Lecture Notes in Computer Science, vol 12120. Springer, Cham. https://doi.org/10.1007/978-3-030-50120-4_15

DOI: https://doi.org/10.1007/978-3-030-50120-4_15
Published: 09 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50119-8
Online ISBN: 978-3-030-50120-4
eBook Packages: Computer Science, Computer Science (R0)
