
A multiple-image-based method to evaluate the performance of deformable image registration in the pelvis


Published 29 July 2016 © 2016 Institute of Physics and Engineering in Medicine
Citation: Ziad Saleh et al 2016 Phys. Med. Biol. 61 6172, DOI 10.1088/0031-9155/61/16/6172


Abstract

Deformable image registration (DIR) is essential for adaptive radiotherapy (RT) for tumor sites subject to motion, changes in tumor volume, and changes in the patient's normal anatomy due to weight loss. Several methods have been published to evaluate DIR-related uncertainties but they are not widely adopted. The aim of this study was, therefore, to evaluate intra-patient DIR for two highly deformable organs, the bladder and the rectum, in prostate cancer RT using a quantitative metric based on multiple image registration, the distance discordance metric (DDM). Voxel-by-voxel DIR uncertainties of the bladder and rectum were evaluated using DDM on weekly CT scans of 38 subjects previously treated with RT for prostate cancer (six scans/subject). The DDM was obtained from group-wise B-spline registration of each patient's collection of repeat CT scans. For each structure, registration uncertainties were derived from DDM-related metrics. In addition, five other quantitative measures, including inverse consistency error (ICE), transitivity error (TE), Dice similarity coefficient (DSC) and volume ratios between corresponding structures from pre- and post-registration images, were computed and compared with the DDM. The DDM varied across subjects and structures; DDMmean of the bladder ranged from 2 to 13 mm and from 1 to 11 mm for the rectum. There was a modest correlation between DDMmean of the bladder and the rectum (Pearson's correlation coefficient, Rp = 0.62). The correlation between DDMmean and the volume ratios post-DIR was stronger (Rp = 0.51; 0.68) than the correlation with the TE (bladder: Rp = 0.46; rectum: Rp = 0.47), or the ICE (bladder: Rp = 0.34; rectum: Rp = 0.37). There was a negative correlation between DSC and DDMmean of both the bladder (Rp = −0.23) and the rectum (Rp = −0.63). The DDM uncertainty metric indicated considerable DIR variability across subjects and structures. The DDM correlated more strongly with the volume ratios and with the DSC than did the ICE and TE. The DDM has the potential to quantitatively identify regions of large DIR uncertainties and consequently identify anatomical/scan outliers. The DDM can, thus, be applied to improve the adaptive RT process for tumor sites subject to motion.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction

Deformable image registration (DIR) is essential to ensure accurate delivery of radiotherapy (RT) for tumor sites subject to considerable motion, changes in tumor volume and normal anatomy due to patient's weight loss (Jaffray et al 2010, Kadoya 2014). Given increased use of in-treatment-room volumetric imaging, DIR has the potential to be used routinely for detection of organ motion and anatomical changes over the course of RT (Lu 2006, Zhang 2007, Samant 2008, Wu et al 2009, Jaffray et al 2010, Kadoya 2014).

DIR is offered by most commercial RT treatment planning systems and is being used clinically for multi-modality image fusion and atlas-based segmentation (Sims 2009, Teguh 2011, Thor et al 2011, Hardcastle 2012, Daisne and Blumhofer 2013, Asman et al 2014). However, DIR-induced uncertainties are challenging to interpret, and their interpretation depends on the viewer. Moreover, the lack of a ground truth, together with registration errors owing to organ motion and anatomical differences, e.g. variable bladder filling and a variable amount of bowel gas (Brock 2010, Thor et al 2011, Varadhan et al 2013, Zambrano 2013, Kadoya 2014), limits the usefulness of DIR for ultimately adapting treatments (Brock 2010, Jaffray et al 2010, Zambrano 2013, Kadoya 2014, Rigaud et al 2015).

The most commonly used approaches to evaluate the performance of DIR are the Dice similarity coefficient (DSC), the Hausdorff distance, the mean surface distance, and the identification of landmarks (Castillo 2009, Kirby et al 2013, Latifi et al 2013, Varadhan et al 2013). However, the former three metrics typically rely on the availability of manually delineated structures (Brock 2010, Varadhan et al 2013), and the landmark technique is limited in regions of soft tissue where robust identification of landmarks is challenging (Dice 1945, Castillo 2009, Li 2013). Other DIR performance metrics include the inverse consistency error (ICE) and transitivity error (TE) (Christensen and Johnson 2001, 2003, Bender and Tomé 2009, Bender et al 2012), mean squared error (MSE), and the Jacobian (Latifi et al 2013, Varadhan et al 2013). The ICE and TE rely on registration between image pairs without considering the other images in the data set. Meanwhile, MSE relies on the underlying image intensities and does not contain any spatial information. The Jacobian, on the other hand, can only provide information about tissue expansion and shrinkage and conveys no information about DIR uncertainties.
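For readers unfamiliar with the DSC, it is commonly defined as twice the overlap of two structures divided by the sum of their sizes. The following is a minimal sketch of that definition, assuming each structure is represented as a set of voxel indices; the function name `dice` and the example masks are illustrative and are not taken from the study.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two voxel sets."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Assumed convention: two empty structures are treated as identical.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical structures of four voxels each, sharing two voxels.
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
deformed = {(0, 0, 1), (0, 1, 1), (0, 2, 0), (0, 2, 1)}
overlap_score = dice(manual, deformed)
```

Because the DSC depends only on which voxels overlap, it needs a delineated structure on both images, which is the limitation noted above.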

In our previous study we introduced the multiple-image-based distance discordance metric (DDM), and showed that the DDM was more strongly correlated with the absolute registration error than ICE and TE when DIR was performed on a digital phantom (Saleh 2014). The current study takes the DDM-based evaluation beyond phantom studies: we explore the performance of the DDM in the context of intra-patient DIR of pelvic organs (the bladder and the rectum) in a series of subjects treated with RT for prostate cancer for whom repeat CT imaging data were acquired.

Materials and method

Imaging data

The imaging data consisted of CT scans from 38 subjects previously treated for prostate cancer at Haukeland University Hospital, Bergen, Norway (Thor 2013, Thor et al 2013). The data were collected within a clinical trial that was approved by the relevant ethics committee (REK Vest). Each subject had a planning CT scan (pCT), and also received weekly repeated CT (wCT) scans over the course of RT. All scans were acquired in the supine position as close to the treatment session as possible, and no filling/emptying protocol was applied to the bladder or the rectum. The bladder and the rectum were manually contoured on all scans under the supervision of the same radiation oncologist to limit inter-observer variability. For this study, the first six weekly scans acquired for each patient (wCTs) were used, resulting in a total of 38 × 6 = 228 scans, each with a resolution of 1 × 1 × 3 mm³. The pCTs were not used in this study due to the systematic use of bladder contrast.

DIR and related uncertainties

Group-wise DIR was performed using a B-spline algorithm with an MSE cost function as implemented in the Plastimatch software (Sharp 2009, Shackleford 2012). Rigid registration was performed first to align the images, followed by deformable registration. The DIR applies a regularization parameter intended to yield accurate deformations. DIR was performed between all pairs of wCTs for each patient, and voxel-by-voxel uncertainties of the generated displacement vector fields (DVFs) were assessed by the DDM (Bender et al 2012). The DDM describes the mean of the distances among a set of voxels as they are registered across different image sets. Suppose that a set of voxels from different images (wCT2, wCT3 ...) is co-registered to the same location on one image (wCT1). When the same images are registered to an arbitrary image (wCTn), these voxels will be distributed at nearby locations. If the registration is reasonable, the distances between these voxels will be small; if the registration is poor, the distances will be larger. Therefore, small DDM values correspond to regions of good registration, whereas large DDM values correspond to regions of poor registration. As the number of images increases, the performance of the DDM improves since it can capture more variation among images (Saleh 2014).
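The idea behind the DDM can be illustrated for a single reference voxel. The sketch below is one plausible reading of the description above, not the authors' implementation: it assumes that, for each target image, the landing positions (in mm) of the co-registered voxels are already known, takes the mean pairwise distance among them, and averages over target images. The function names and the example coordinates are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def mean_pairwise_distance(points):
    """Mean Euclidean distance (mm) over all unordered pairs of 3D points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return mean(dist(p, q) for p, q in pairs)


def ddm_at_voxel(landing_points_per_target):
    """Average the pairwise-distance spread over every target image (assumed form)."""
    return mean(mean_pairwise_distance(pts) for pts in landing_points_per_target)


# Hypothetical landing positions of three co-registered voxels in one target image.
consistent = [[(10.0, 20.0, 30.0), (10.0, 20.0, 30.0), (10.0, 20.0, 30.0)]]
discordant = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]]

ddm_good = ddm_at_voxel(consistent)   # voxels land on the same point
ddm_poor = ddm_at_voxel(discordant)   # one voxel lands 5 mm away from the others
```

In this toy case the consistent set gives a DDM of zero, while the discordant set gives a value of a few millimetres, matching the interpretation that larger values flag less reliable registration.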

The resulting DDM map was overlaid on the first CT (wCT1). Similarly to the DDM, the ICE and TE voxel-wise maps (Christensen and Johnson 2001, 2003) were calculated between all image pairs of each patient and the mean value at each voxel was overlaid on wCT1 for comparison with the DDM.
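For orientation, the ICE and TE are usually written in the following general form. This is a sketch following the common convention in the cited literature rather than the exact expressions used in this study; the symbol $T_{ij}$, denoting the transformation that maps a point in image $i$ to image $j$, is notation introduced here for illustration.

```latex
% Inverse consistency error: map x from image i to image j and back again.
\mathrm{ICE}_{ij}(\mathbf{x}) = \left\lVert T_{ji}\!\left(T_{ij}(\mathbf{x})\right) - \mathbf{x} \right\rVert

% Transitivity error: map x around the loop i -> j -> k -> i.
\mathrm{TE}_{ijk}(\mathbf{x}) = \left\lVert T_{ki}\!\left(T_{jk}\!\left(T_{ij}(\mathbf{x})\right)\right) - \mathbf{x} \right\rVert
```

Both quantities vanish when the composed transformations return every point to its starting position. Each expression involves only two or three images at a time, in contrast to the DDM, which pools all images of a patient.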

For each structure (bladder/rectum) we defined the following two volume ratios:

where VwCTi represents the volume of the manually delineated contour on wCTi, and VdCTi the volume of the contour deformed from wCTi to wCT1. The volume ratio is small when the volume of the deformed structure is comparable to the volume of the manual contour, which indicates a good registration.

Pearson's correlation coefficient (Rp) was calculated between each of (Vpre/Vref), (Vpost/Vref) and the DSC, and each of the DDM, ICE and TE. Weak, modest and high correlations were indicated by Rp ⩽ 0.35, Rp = 0.36–0.67 and Rp = 0.68–1.00, respectively (Deasy et al 2003). All metrics were compared using the Wilcoxon rank-sum test, and significance was defined at a two-sided 5% level. All DIRs were conducted in Plastimatch under Linux, and data extraction and post-processing of the DVFs were performed in MATLAB (R2011a) and in CERR (Taylor 1990).
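The correlation analysis and the three strength categories can be sketched in a few lines. This is an illustrative example only: the study used MATLAB, the function names are hypothetical, and applying the thresholds to the absolute value of Rp (so that negative correlations are also classified) is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_strength(r):
    """Classify Rp as weak (<= 0.35), modest (0.36-0.67) or high (0.68-1.00)."""
    magnitude = round(abs(r), 2)  # assumption: thresholds apply to |Rp| at 2 decimals
    if magnitude <= 0.35:
        return "weak"
    if magnitude <= 0.67:
        return "modest"
    return "high"


# Hypothetical per-subject values, for illustration only.
ddm_mean_mm = [2.0, 4.0, 6.0, 8.0]
volume_ratio = [0.1, 0.2, 0.3, 0.4]
label = correlation_strength(pearson_r(ddm_mean_mm, volume_ratio))
```

Under this classification, a value of 0.62 falls in the modest band and a value of 0.68 in the high band.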

Results

Within the entire DDM map, regions with the highest DDM values were observed near the skin and in the bladder and the rectum (figure 1). The population median (range) DDM was 6.6 (1.5–14) mm and 5.0 (1.1–15) mm for the bladder and rectum, respectively. There was a moderate correlation between DDMmean in the rectum and the bladder (Rp  =  0.62).

Figure 1. Color wash representation of the uncertainty maps: ICE (top), TE (middle), and DDM (bottom) overlaid on an axial and a sagittal view of a reference CT for an example patient. Deformed contours of the rectum and bladder from the weekly CTs are also shown. The largest uncertainties are located near the skin, in the rectum due to bowel gas, and near the top of the bladder due to bladder filling, while bony anatomy has the lowest uncertainties.

The population median (range) values for the bladder and the rectum using ICE were 7.4 (1.5–15) mm, and 5.4 (0.2–11) mm, respectively, whereas the corresponding values using TE were 3.5 (0.8–13) mm, and 6.4 (1.3–18) mm. There was, however, a wide distribution of the DDM, ICE, and the TE values across all subjects (figure 2).

Figure 2. Stacked bar plots showing the distribution of the DDM (blue), TE (red), and ICE (green) for the rectum (top) and bladder (bottom) for the investigated 38 subjects. There are large variations among different subjects. ICE values are relatively smaller than DDM values while TE values are relatively larger.

Strong correlations were observed between these three metrics, with the highest between TE and ICE (Rp = 0.95), followed by DDM and TE (Rp = 0.93), and DDM and ICE (Rp = 0.84; figure 3).

Figure 3. Scatter plots showing the correlation between ICE, TE, and DDM for the rectum (top) and bladder (bottom). The best linear fits are shown as color-coded solid lines. The highest correlation is between TE and ICE, followed by TE versus DDM and DDM versus ICE, as indicated by R².

Subjects with a DDMmean in the rectum above the population median (>5.0 mm) had significantly larger post-DIR volume ratios than subjects with a DDMmean below the median (p = 0.001; table 1). A similar pattern was observed for both ICE (median > 3.5 mm) and TE (median > 6.4 mm). For the bladder, however, the differences in the volume ratios were statistically significant for DDM (median > 6.6 mm, p = 0.04) but did not reach the 5% significance level for ICE (median > 5.4 mm, p = 0.10) or TE (median > 7.4 mm, p = 0.10).
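The median-split comparison described above can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis code used in the study: subjects are split at the population median of an uncertainty metric, and the two groups' volume ratios are compared with a Wilcoxon rank-sum test using the normal approximation, with mid-ranks for ties but no tie correction of the variance. All function names and data values are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(metric, outcome):
    """Return outcomes for subjects above the metric's median, and for the rest."""
    cut = median(metric)
    above = [o for m, o in zip(metric, outcome) if m > cut]
    rest = [o for m, o in zip(metric, outcome) if m <= cut]
    return above, rest


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # tied values share the average of their ranks
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sigma
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical per-subject DDMmean (mm) and post-DIR volume ratios.
ddm = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
ratio = [0.10, 0.12, 0.15, 0.30, 0.35, 0.40]
high_group, low_group = split_by_median(ddm, ratio)
z_stat, p_value = rank_sum_test(high_group, low_group)
```

With samples as small as in this toy example the normal approximation is crude; an exact test or a statistics library would be preferable in practice.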

Table 1. Summary of p-values for the Wilcoxon rank-sum test statistics.

| Metric | Rectum Vpre/Vref | Rectum Vpost/Vref | Bladder Vpre/Vref | Bladder Vpost/Vref |
| --- | --- | --- | --- | --- |
| DDM | 0.90 | 0.001ᵃ | 0.68 | 0.04ᵃ |
| TE | 1.00 | <0.001ᵃ | 0.86 | 0.10 |
| ICE | 0.63 | 0.002ᵃ | 0.74 | 0.10 |

ᵃ Statistically significant (p-value < 0.05).

The correlation between DDMmean and (Vpost/Vref) was modest to high (rectum: Rp = 0.68; bladder: Rp = 0.53), and stronger than the corresponding correlations for ICE and TE (table 2). The population median (range) of the DSC was 0.81 (0.51–0.92) for the bladder and 0.72 (0.62–0.84) for the rectum (figure 4). The correlation of the DSC with DDM, TE and ICE was stronger in the rectum (Rp = −0.63; −0.56; −0.53) than in the bladder (Rp = −0.23; −0.22; −0.18; table 2). The weakest overall correlations were observed with Vpre/Vref (|Rp| < 0.10 for all metrics).

Table 2. Pearson correlations (Rp) between DDM, TE, ICE and the DIR volume metrics.

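| Metric | Rectum Vpre/Vref | Rectum Vpost/Vref | Rectum DSC | Bladder Vpre/Vref | Bladder Vpost/Vref | Bladder DSC |
| --- | --- | --- | --- | --- | --- | --- |
| DDM | 0.03 | 0.68 | −0.63 | −0.08 | 0.53 | −0.23 |
| TE | −0.05 | 0.49 | −0.56 | −0.08 | 0.48 | −0.22 |
| ICE | −0.06 | 0.37 | −0.53 | −0.03 | 0.36 | −0.18 |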

Figure 4. Average DSCs for the bladder (blue) and rectum (red) for the 38 investigated subjects. DSC values of the bladder are generally higher than those of the rectum. The error bars correspond to one standard deviation.

Discussion

Our multiple-image-based DIR uncertainty metric, the DDM, as applied to intra-patient DIR, indicated considerable variability across the two investigated organs and across the 38 investigated subjects. Within the generated DDM map, the most pronounced variations were observed in regions of the bladder and the rectum, which are both subject to motion due to bladder filling or the absence/presence of air/feces in the rectum. The DDM values were slightly higher in the bladder than in the rectum, and the highest values were observed in the superior part of the bladder and in the regions of the rectum occupied by bowel gas. Meanwhile, regions of high contrast such as the bony anatomy showed the lowest variations. These results are consistent with the fact that regions of high contrast, including the bony anatomy, are less challenging to register and therefore yield lower DDM values, whereas parts of the anatomy prone to large registration errors (Castillo 2009, Kirby et al 2013) are associated with higher DDM values. This indicates that the DDM is viable for measuring relative DIR uncertainties.

Both the ICE and TE exhibited similarly large values in areas of poor registration. The extent of our registration uncertainties, as assessed by the DDM, ICE, or TE, is in a similar range to the mean registration errors reported in previous studies (Brock 2010, Kirby et al 2013, Nie et al 2013, Varadhan et al 2013), although values may vary with the choice of DIR algorithm and anatomy. We found a modest correlation (Rp = 0.62) between the DDM values of the bladder and the rectum, which might be an indication of the interplay between motion caused by bladder filling and bowel gas as illustrated by e.g. Nijkamp et al (2008).

In contrast to the DDM, ICE, and TE values, the DSC was slightly higher for the bladder (DSCmean = 0.81) than for the rectum (DSCmean = 0.72). These DSC values are comparable with those from other studies of the same organs (Thor et al 2011, Varadhan et al 2013, Zambrano 2013). The correlation of the DSC with ICE, TE, or DDM was, however, modest for the rectum and weak for the bladder. It should be kept in mind that the DSC includes only volume information, such that a higher DSC value does not necessarily indicate a more accurate registration (Kirby et al 2013). On the other hand, DDM, ICE, and TE correlated more strongly with the ratio of the volumes of the deformed and the manually delineated structures (Vpost/Vref), with the strongest correlation for both structures observed with the DDM (rectum: Rp = 0.68; bladder: Rp = 0.53). The lack of correlation between the DDM, ICE, or TE and the pre-registration volume ratios (Vpre/Vref) or the DSC may indicate that these volumetric metrics do not capture the full extent of the registration uncertainty.

Based on the results from this study, the DDM resulted in higher correlations with the investigated volume ratios and the DSC than did the TE and ICE. Therefore, in the absence of a ground truth, where absolute registration errors cannot be obtained, the DDM can be used to quantify the underlying uncertainties for intra-patient DIR when multiple images (>3) are available. In the absence of repeat intra-patient imaging, another application of the DDM could be to assess population-based uncertainties from inter-patient DIR (Saleh 2014). As such, the generated 'inter-patient DDM ATLAS' could be deformed onto a new subject for whom patient-specific uncertainties cannot be obtained due to a lack of longitudinal images.

Conclusion

Applied to intra-patient DIR for the bladder and the rectum, our automated DIR performance metric, the DDM, was more strongly correlated with post-DIR volume ratios and with the commonly used DSC than were the ICE and the TE. The DDM could, thus, be used to quantitatively evaluate DIR-related uncertainties and to identify regions of poor DIR, both of which are essential for adaptive RT.

Conflict of interest

None

Acknowledgment

The CT data were collected at Haukeland University Hospital, Bergen, Norway and provided by the responsible oncologist Svein Inge Helle and physicist Liv Bolstad Hysing. Lise Bentzen, Aarhus University Hospital, is acknowledged for approving the manually contoured bladder and rectal structures. This research was partially supported by the MSK Cancer Center Support Grant/Core Grant (P30 CA008748). This project was also supported in part through the NIH Grant R01 CA85181.
