
Self-paced DenseNet with boundary constraint for automated multi-organ segmentation on abdominal CT images


Published 3 July 2020 © 2020 Institute of Physics and Engineering in Medicine
Citation: Nuo Tong et al 2020 Phys. Med. Biol. 65 135011. DOI: 10.1088/1361-6560/ab9b57

0031-9155/65/13/135011

Abstract

Automated multi-organ segmentation on abdominal CT images may replace or complement manual segmentation for clinical applications including image-guided radiation therapy. However, the accuracy of auto-segmentation is challenged by low image contrast and large spatial and inter-patient anatomical variations. In this study, we propose an end-to-end segmentation network, termed self-paced DenseNet, for improved multi-organ segmentation performance, especially for the difficult-to-segment organs. Specifically, a learning-based attention mechanism and dense connection blocks are seamlessly integrated into the proposed self-paced DenseNet to improve the learning capability and efficiency of the backbone network. To focus the network on organs showing low soft-tissue contrast and motion artifacts, a boundary constraint is utilized to regularize the network optimization. Additionally, to ease the large learning-pace discrepancies among individual organs, a task-wise self-paced-learning strategy is employed to adaptively control the learning paces of individual organs. The proposed self-paced DenseNet was trained and evaluated on a public abdominal CT data set consisting of 90 subjects with manually labeled ground truths of eight organs (spleen, left kidney, esophagus, gallbladder, stomach, liver, pancreas, and duodenum). For quantitative evaluation, the Dice similarity coefficient (DSC) and average surface distance (ASD) were calculated. An average DSC of 84.46% and ASD of 1.82 mm were achieved on the eight organs, outperforming the state-of-the-art segmentation methods by 2.96% in DSC under the same experimental configuration. Moreover, the proposed segmentation method shows notable improvements on the duodenum and gallbladder, obtaining average DSCs of 69.26% and 80.94% and ASDs of 2.14 mm and 2.24 mm, respectively. These results are markedly superior to the average DSCs of 63.12% and 76.35% and average ASDs of 3.87 mm and 4.33 mm obtained by the vanilla DenseNet for the two organs. We demonstrated the effectiveness of the proposed self-paced DenseNet in automatically segmenting abdominal organs with low boundary conspicuity. The self-paced DenseNet achieved consistently superior segmentation performance on eight abdominal organs with varying segmentation difficulties. Its demonstrated computational efficiency (<2 s per CT) makes it well-suited for online applications.


1. Introduction

Accurate contouring of target organs is essential in applications including disease diagnosis, radiotherapy treatment planning, and delivery (Hall et al 2006). However, abdominal organ segmentation on CT images is a demanding task. Manual organ delineation is not only tedious but also suffers from substantial intra- and inter-observer variabilities (Nelms et al 2012). An accurate and robust automated segmentation technique would be highly desirable to replace or augment the manual process. However, the accuracy of automated segmentation is limited by the morphological complexities of abdominal organs, the large inter-subject variations, and the low CT soft-tissue contrast. Moreover, artifacts from respiratory and peristaltic motion further blur the boundaries of many abdominal organs (Fu et al 2018).

Automated abdominal organ segmentation has been attempted before, and much of the attention has been paid to single-organ segmentation, such as segmentation of the pancreas (Farag et al 2017, Oktay et al 2018, Man et al 2019) and liver (Hu et al 2016, Dou et al 2016, Lu et al 2017), whereas relatively few reports address the multi-organ segmentation problem. Previous multi-organ work is mainly based on statistical shape models or probabilistic atlases. Wolz et al (2013) proposed generating subject-specific atlases through hierarchical atlas registration and a weighting scheme for abdominal organ segmentation. The obtained probabilistic atlases were then used in a graph-cuts model to obtain the final segmentation. Kada et al (2015) introduced conditional shape and location priors and an organ correlation graph for automated segmentation of upper abdominal CT images. Although atlas-based methods have achieved varying degrees of success, their robustness is limited by atlas selection, label fusion, and registration accuracy. They struggle to segment organs with large anatomical variations such as the gallbladder (Larsson et al 2017). Additionally, the limited shape representation capacity of the statistical models and their expensive computational cost further limit their generalization.

Recently, fully convolutional network (FCN) (Shelhamer et al 2017) based methods have achieved state-of-the-art performance on segmentation tasks with efficient computational speed. Roth et al (2018b) proposed a multi-scale pyramid of stacked 3D FCNs for abdominal CT segmentation that utilizes auto-context to perform semantic segmentation at higher resolutions while also considering large contextual information at lower resolution levels. In addition, the Dense V-network proposed by Gibson et al (2018b) employed the dense connection pattern and a multi-scale V-network structure, and achieved superior segmentation performance on eight abdominal organs. Additionally, a two-stage system for abdominal organ segmentation was proposed by Larsson et al (2018), which consists of a robust localization algorithm for finding the region of interest and a convolutional neural network for voxel-wise classification. Furthermore, the attention mechanism has been introduced into segmentation networks to focus them on the discriminative information of target structures. Wang et al (2019b) proposed a two-stage organ-attention network with reverse connections for abdominal CT segmentation. In the organ-attention network, deep features from the first stage are combined with the original image as the input of the second stage, to reduce the interference from the complex background and provide spatial attention in the second stage. More recently, several attention-based segmentation networks have been proposed, including the Densely Squeeze-and-Excitation network (DSENet) proposed by Wu (2019) and the Adaptively Dense Feature Pyramid network (ADFPNet) proposed by Pan et al (2019). DSENet integrates the Squeeze-and-Excitation block with the dense block; however, it neglects spatial attention. ADFPNet utilizes dilated convolution layers with different dilation rates to produce dense features across multiple scales and receptive fields, and then recalibrates the features with a Squeeze-and-Excitation block.

For abdominal multi-organ segmentation, a typical approach (Wang et al 2019a) builds separate networks for individual organs and then combines the individual segmentations into the final result. This approach does not utilize contextual information from other organs, which could otherwise improve segmentation accuracy. Subsequently, multiclass segmentation networks, with output channels corresponding to different organs, have also been proposed for abdominal organ segmentation (Gibson et al 2018b), but new challenges arise from the large differences in organ morphology and image contrast. Organs that are relatively isolated, large, and have conspicuous boundaries (e.g. liver, spleen, and kidneys) are easier to segment, while small, irregularly-shaped organs or organs subject to substantial internal motion (e.g. pancreas and duodenum) are difficult. The following eight organs are ranked in ascending order of their reported segmentation accuracy (Gibson et al 2018b, Larsson et al 2017): duodenum, esophagus, gallbladder, pancreas, stomach, kidneys, spleen, and liver. Evidently, because of the large variation in segmentation difficulties, treating each semantic class equally would limit the discriminability and segmentation performance on small and irregular organs. To balance the optimization and reduce the performance gaps among organs, a weighted loss function is commonly utilized, but setting an optimal fixed weighting parameter for each class is not straightforward. Novikov et al (2018) introduced weighting coefficients for each semantic class according to its size. However, in extremely class-imbalanced tasks, this strategy can easily result in over-fitting and cause convergence issues.

To manage the challenges in abdominal multi-organ segmentation on CT volumes, an end-to-end segmentation network (self-paced DenseNet), which integrates dense connection blocks and a dual-attention mechanism, is introduced in this paper. The rest of the paper is organized as follows. We introduce the proposed self-paced DenseNet in section 2. Then, the materials used in this study, implementation details, and experimental results are presented in section 3. In section 4, we discuss the contributions and limitations of the current study and possible future directions. Finally, this paper is concluded in section 5.

2. Method

The framework of the proposed self-paced DenseNet is shown in figure 1. To address the blurry organ boundaries, an edge operator is employed to impose boundary consistency during training. Additionally, to bridge the learning gap among organs with substantially different segmentation difficulties, a self-paced-learning strategy is introduced to optimize each organ sequentially.


Figure 1. The framework of the proposed segmentation method (the edge maps are overlaid on the CT images).


2.1. Segmentation network

The architecture of the segmentation network, an encoder-decoder structure, is shown in the box in figure 1. Specifically, the encoder and decoder consist of four dual-attention Dense (DA-Dense) blocks for hierarchically extracting and restoring feature maps, respectively. Finally, a deep supervision mechanism is utilized to fuse the multi-scale feature maps and speed up the network convergence.

2.2. Dual-attention dense block

The dense block is adopted as the backbone block in this study due to its desirable properties: it facilitates gradient propagation to preceding layers and efficiently reuses features with fewer parameters. Nonetheless, in the dense connection block, simply stacking feature maps propagates and accumulates a large amount of redundant information, which reduces the importance of the discriminative maps. To highlight the discriminative and task-related features and deemphasize the unrelated ones, the attention mechanism is seamlessly integrated with the dense block. As shown in figure 2, both spatial and channel-wise attention mechanisms are employed to model the global context over the local features generated by the layers in the dense block.


Figure 2. (a) DA-Dense block architecture, (b) detailed structure of the dual-attention layer. BN-batch normalization layer (Ioffe and Szegedy 2015).


For channel-wise attention, a 'Squeeze-and-Excitation' block (Hu et al 2018) is employed to model the channel-wise relationships, which consists of two main components: global information embedding and adaptive feature recalibration. For the input feature map $X$, a 3D global average pooling operation is used to fuse the global information of the feature maps and produce the channel-wise descriptor $U$. Then, two fully connected layers are employed following the global average pooling layer to recalibrate the channel-wise relationships:

$s = \sigma \left( {W_2}\,\delta \left( {W_1}U \right) \right)$    (1)

where $\sigma $ and $\delta $ refer to the sigmoid and ReLU functions, respectively, ${W_1}$ and ${W_2}$ represent the weights of the two fully connected layers, and $s$ denotes the resulting channel-wise weights. Then, the recalibrated feature maps can be described as:

$CR{\left( X \right)_c} = {s_c} \cdot {X_c}$    (2)

where $CR\left( \cdot \right)$ represents the channel-wise attention layer, as shown in the upper part of figure 2(b).

For spatial attention, a convolutional layer with $1 \times 1 \times 1$ kernel in another branch is utilized to model the importance of the spatial locations. Specifically, the convolution layer squeezes the channels and produces the one-channel output $S$ which indicates the importance of each spatial location in the feature map. Therefore, the spatially recalibrated feature maps can be described as follows:

$SR{\left( X \right)_{c,i,j,k}} = {S_{i,j,k}} \cdot {X_{c,i,j,k}}$    (3)

where $SR\left( \cdot \right)$ represents the spatial attention layer, as shown in the lower part of figure 2(b).

The final enhanced feature maps $FR\left( X \right)$ are obtained by fusing the channel-wise and spatial attention feature maps with an element-wise max-out operation:

$FR{\left( X \right)_{c,i,j,k}} = \max \left( {CR{{\left( X \right)}_{c,i,j,k}},\,SR{{\left( X \right)}_{c,i,j,k}}} \right)$    (4)

where $c$ and $i,j,k$ represent the channel index and spatial location in the feature map, respectively.
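
As an illustration, the listing below sketches one possible TensorFlow/Keras implementation of the dual-attention recalibration described by equations (1) to (4). The channel reduction ratio of the two fully connected layers and the sigmoid applied to the spatial map are assumptions made for this sketch and are not specified above.

```python
import tensorflow as tf
from tensorflow.keras import layers

class DualAttention(layers.Layer):
    """Channel-wise (SE) and spatial attention with max-out fusion, cf. equations (1)-(4)."""
    def __init__(self, channels, reduction=4, **kwargs):   # reduction ratio is an assumption
        super().__init__(**kwargs)
        self.fc1 = layers.Dense(max(channels // reduction, 1), activation='relu')   # delta(W1 U)
        self.fc2 = layers.Dense(channels, activation='sigmoid')                     # sigma(W2 .)
        self.spatial = layers.Conv3D(1, kernel_size=1, activation='sigmoid')        # 1x1x1 conv -> S

    def call(self, x):
        c = x.shape[-1]
        # Global information embedding: 3D global average pooling over the spatial axes
        u = tf.reduce_mean(x, axis=[1, 2, 3])                  # (batch, C)
        s = self.fc2(self.fc1(u))                              # channel weights, eq. (1)
        cr = x * tf.reshape(s, [-1, 1, 1, 1, c])               # CR(X), eq. (2)
        sr = x * self.spatial(x)                               # SR(X), eq. (3)
        return tf.maximum(cr, sr)                              # FR(X), max-out fusion, eq. (4)

# Example: recalibrate a feature map of size 72 x 72 x 72 with 80 channels
features = tf.random.normal([1, 72, 72, 72, 80])
enhanced = DualAttention(channels=80)(features)
```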

2.3. Boundary constraint

Many existing deep learning-based segmentation methods focus on optimizing the overlap between the manually labeled ground truths and the predictions, while the boundary information of the target organs has received less attention. However, accurately detecting the boundary and maintaining edge continuity as an isolated task is difficult. Low image contrast, image noise, and organ motion commonly degrade the boundary detectability and continuity of abdominal organs such as the pancreas and duodenum.

To strengthen the discriminability of the segmentation network on organ boundaries, while maintaining edge smoothness and continuity, an edge operator is employed to detect the organ boundaries in the ground truths and predictions during network training. Specifically, an additional boundary loss term, based on the similarity of the edge maps detected from the ground truths and the predictions, is incorporated into the loss function for network optimization. The Sobel operator is employed as the edge detector because of its simplicity and because its derivative can easily be computed for back-propagation.

Three Sobel edge filters are used to convolve the ground truth or network prediction $Y$ and generate three edge maps corresponding to the intensity gradients along the horizontal $i$, vertical $j$, and depth $k$ directions, respectively. Then, a tanh activation is applied, followed by an absolute-value operation, to map the values to $\left[ {0,1} \right]$. Finally, the final edge map is obtained by fusing the three edge maps as follows:

Equation (5)

where ${E_i}$, ${E_j}$, and ${E_k}$ represent the edge maps corresponding to the intensity gradients along horizontal $i$, vertical $j$, and depth $k$ directions, respectively. $E\left( Y \right)$ indicates the final edge map with input $Y$.

In this study, to mitigate the extreme class imbalance (e.g. liver vs. duodenum), the multiclass probability maps produced by the network are fused into a binary probability map (two classes representing background and foreground) before applying the edge detector.
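
The forward computation of this edge extraction can be illustrated with a short NumPy/SciPy sketch. Note that in the actual network the Sobel filters would be applied as fixed 3D convolution kernels so that gradients can propagate, and the fusion rule assumed below (an element-wise maximum) is an illustrative choice, since equation (5) is not reproduced in this excerpt.

```python
import numpy as np
from scipy import ndimage

def soft_edge_map(prob):
    """Edge map of a fused (binary) probability volume, following the steps above."""
    # Directional responses from the three 3D Sobel filters (axes i, j, k)
    responses = [ndimage.sobel(prob, axis=a) for a in range(3)]
    # tanh followed by an absolute value maps each response into [0, 1]
    responses = [np.abs(np.tanh(r)) for r in responses]
    # Fusion of the three directional maps; an element-wise maximum is assumed here,
    # since the exact fusion of equation (5) is not reproduced in this excerpt
    return np.maximum.reduce(responses)
```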

2.4. Task-wise self-paced learning strategy

Unlike single-target segmentation, multi-organ segmentation can exploit contextual information among organs but also brings its own challenges. The relative weighting of the losses for contrasting organs can have unpredictable effects on convergence and final errors. In the case of abdominal organ segmentation, where large volume imbalance is typical, a volume-based weighting strategy leads to a diminished contribution by the large organs, network over-fitting, and segmentation accuracy degradation. To effectively weigh the contribution of each semantic class and improve the discriminability of the segmentation network on difficult-to-segment organs, a task-wise self-paced learning scheme is proposed. The self-paced learning strategy was originally proposed by Kumar et al (2010) to dynamically allocate greater weights to easy samples in the early phase of training, thereby reducing the influence of complicated samples such as noisy and unreliable data. In the original self-paced learning scheme, different weights are allocated to each sample based on its easiness; it is therefore a sample-wise learning strategy. In multi-organ segmentation, by contrast, the difficulty of learning the individual organs can vary substantially due to their different shapes, sizes, and conspicuity. We therefore propose a task-wise self-paced learning strategy based on the segmentability of an organ to adaptively adjust the weight of each class in the loss function. The change from sample-wise to task-wise learning allows sequential improvement in the segmentation performance of individual organs. Specifically, the task-wise self-paced learning strategy is performed in the following steps (a code sketch of one possible weight update is given after the list):

  • Initialize the scale parameter $\lambda $ and increase rate $h$. In this work, $\lambda $ and $h$ are initialized as 0.1 and 0.2, respectively.
  • With the initialized parameters, network training starts. At the end of each epoch, the training loss of each class and the average loss are obtained, and the scale parameter $\lambda $ is updated as:

Equation (6)

Then, for each class i (from 1 to C), update the weight parameter as follows:

Equation (7)

where $G$ and $S\left( X \right)$ represent the ground truths and the network predictions, respectively, and ${l_i}$ represents the segmentation loss of the i-th class. C denotes the number of classes to segment, and ${\alpha _i}$ denotes the weight of the i-th class. Thus, the objective function is as follows:

$\mathop {\min }\limits_{{\theta _S}} \sum\limits_{i = 1}^C {{\alpha _i}\,{l_i}\left( {G,S\left( X \right)} \right)}$    (8)

  • Continue training the network for the next epoch.
  • Repeat steps (2) and (3) until the early stopping criterion is reached.
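
The weight-update step can be illustrated with the hypothetical sketch below. The exact update rules of equations (6) and (7) are not reproduced in this excerpt, so the sketch assumes a simple thresholding scheme (classes whose loss exceeds the current pace parameter receive a reduced weight); the initial values $\lambda = 0.1$ and $h = 0.2$ follow the steps above.

```python
import numpy as np

def update_task_weights(class_losses, lam, h=0.2, min_weight=0.1):
    """Hypothetical task-wise self-paced update; a thresholding rule is assumed here,
    since the exact forms of equations (6) and (7) are not reproduced in this excerpt."""
    # Assumed step: grow the pace parameter by the increase rate h
    lam = lam * (1.0 + h)
    # Assumed step: classes whose loss exceeds the current pace get a small weight,
    # while classes that are already learned well contribute fully
    losses = np.asarray(class_losses, dtype=float)
    alphas = np.where(losses <= lam, 1.0, min_weight)
    return alphas, lam

# Per-class Dice losses after an epoch (illustrative numbers only)
alphas, lam = update_task_weights([0.45, 0.30, 0.12, 0.08], lam=0.1)
```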

2.5. Objective function

For multiclass segmentation, the objective function is commonly defined as:

$\mathop {\min }\limits_{{\theta _S}} \sum\limits_{i = 1}^C {{l_i}\left( {G,S\left( X \right)} \right)}$    (9)

where $X$ and $G$ are the input CT and the corresponding ground truth, respectively. $S\left( X \right)$ and ${l_i}\left( {G,S\left( X \right)} \right)$ represent the prediction from the segmentation network $S$ and the i-th class loss, respectively. ${\theta _S}$ denotes all the trainable parameters in the segmentation network $S$.

In this study, with the aim of balancing the learning paces of individual organs and guiding the boundary detection of the organs, the objective function of the proposed self-paced DenseNet is formulated as:

$\mathop {\min }\limits_{{\theta _S}} \sum\limits_{i = 1}^C {{\alpha _i}\,{l_i}\left( {G,S\left( X \right)} \right)} + {\lambda _{edge}}\,{L_{edge}}\left( {E\left( G \right),E\left( {S\left( X \right)} \right)} \right)$    (10)

where ${\alpha _i}$ represents the learning difficulty of the individual organs, and thereby the weight of each class during training, as described in section 2.4. $E\left( G \right)$ and $E\left( {S\left( X \right)} \right)$ denote the edge maps of the ground truth and the segmentation result, respectively. ${\lambda _{edge}}$ represents the weight of the boundary constraint term.

The segmentation loss ${l_i}\left( {G,S\left( X \right)} \right)$ is formulated as a Dice coefficient loss to maximize the overlap between the predictions and the manual ground truths. To constrain and guide the prediction of the organ boundaries, the boundary constraint term ${L_{edge}}$ is formulated as a cross-entropy loss between the detected edge maps $E\left( G \right)$ and $E\left( {S\left( X \right)} \right)$.
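
The combined objective can be sketched as follows: a soft Dice loss per class, weighted by the self-paced weights, plus a binary cross-entropy between the edge maps. This is an illustrative formulation consistent with the description above rather than the authors' exact code; the default ${\lambda _{edge}} = 0.3$ follows the value found in section 3.8.

```python
import tensorflow as tf

def soft_dice_loss(gt_onehot, pred_prob, eps=1e-6):
    """Per-class soft Dice loss l_i(G, S(X)); inputs have shape (batch, D, H, W, C)."""
    axes = [1, 2, 3]
    intersection = tf.reduce_sum(gt_onehot * pred_prob, axis=axes)
    denominator = tf.reduce_sum(gt_onehot, axis=axes) + tf.reduce_sum(pred_prob, axis=axes)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)        # (batch, C)

def self_paced_edge_loss(gt_onehot, pred_prob, edge_gt, edge_pred,
                         alphas, lambda_edge=0.3):
    """Weighted Dice terms plus the boundary cross-entropy, cf. equation (10).
    alphas is a length-C array of task-wise self-paced weights."""
    seg_term = tf.reduce_sum(alphas * soft_dice_loss(gt_onehot, pred_prob), axis=-1)
    edge_term = tf.keras.losses.binary_crossentropy(edge_gt, edge_pred)
    return tf.reduce_mean(seg_term) + lambda_edge * tf.reduce_mean(edge_term)
```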

3. Experimental results

3.1. Data and preprocessing

In this study, all of the experiments were conducted on a public CT dataset (Gibson et al 2018a), which comprises 90 abdominal CT subjects with detailed manual segmentations of eight organs (duodenum, esophagus, gallbladder, pancreas, stomach, left kidney, spleen, and liver). The public dataset consists of two available data sets: 43 subjects from the Cancer Imaging Archive Pancreas-CT data set (Clark et al 2013, Roth et al 2015, Hu et al 2017) and 47 subjects from the 'Beyond the Cranial Vault' (BTCV) segmentation challenge (Landman et al 2015, Gibson et al 2018a). The Pancreas-CT dataset was collected at the National Institutes of Health Clinical Center from pre-nephrectomy healthy kidney donors or patients with neither major abdominal pathologies nor pancreatic cancer lesions (Hu et al 2017). The BTCV data set was acquired at Vanderbilt University from metastatic liver cancer patients or post-operative ventral hernia patients. The pixel spacing and inter-slice thickness of the 90 CT cases range from 0.6 to 0.9 mm and 0.5 to 5.0 mm, respectively. Segmentations of the pancreas and of multiple organs are included in the original Pancreas-CT and BTCV datasets, respectively. To enable complete training and evaluation of the proposed segmentation method on the eight organs, the missing annotations were obtained from Gibson et al (2018a) (Clark et al 2013, Landman et al 2015); they were manually delineated by an image research fellow under the supervision of a board-certified radiologist. Moreover, to deal with the inconsistency of the segmentations present in the original datasets, additional manual editing was also performed to ensure a consistent segmentation protocol across the dataset (Gibson et al 2018a). It is worth noting that, due to truncation, the whole esophagus cannot be observed in the abdominal CT scans. However, the truncation does not affect the network training and testing for the portion of the organs that was delineated. Therefore, we kept the esophagus as one class for direct comparison with other state-of-the-art methods on the same dataset.

To reduce the input volume size, facilitate the subsequent model training, and enable a fair comparison with the segmentation method proposed by the public dataset provider (Gibson et al 2018a), the same parameters were utilized to crop the CT images transversely to the rib cage and abdominal cavity, and longitudinally to the superior extent of the liver or spleen and the inferior extent of the liver or kidneys; these cropping parameters were also obtained from the dataset provider at https://zenodo.org/record/1169361#.XkTvsi2B1N0. Furthermore, to homogenize the data, all cropped images and the corresponding manual labels were resampled to a uniform volume size of $144 \times 144 \times 144$.
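
A minimal sketch of the resampling step is given below; the interpolation orders (linear for the CT intensities, nearest neighbour for the label maps) are assumptions, as the exact resampling procedure is not detailed above.

```python
import numpy as np
from scipy import ndimage

def resample_to_cube(volume, target=144, is_label=False):
    """Resample a cropped CT volume (or its label map) to target x target x target voxels."""
    factors = [target / float(s) for s in volume.shape]
    # Nearest-neighbour interpolation preserves integer labels; linear for the CT itself
    return ndimage.zoom(volume, zoom=factors, order=0 if is_label else 1)
```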

3.2. Implementation details

To evaluate the performance of the proposed method, 9-fold cross-validation was performed on the 90 CT scans. In each fold, 80 subjects were used for training and the remaining ten subjects for testing. For tuning the parameters in the proposed framework, 15% of the training data (12 subjects) was held out as a validation dataset in each fold. The final segmentation metrics were obtained by the average value on the nine folds.

The proposed self-paced DenseNet was implemented using the TensorFlow framework (v1.9.0), with the NVIDIA CUDA (v8.0) and cuDNN (v6.0) libraries for acceleration. All experiments were performed under the Ubuntu 16.04 operating system with an Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20 GHz and an NVIDIA Tesla GPU (16 GB memory). To fit the limited GPU memory, the mini-batch size was set to 1 with a uniform input volume size of $144 \times 144 \times 144$. In total, the proposed self-paced DenseNet has 2.9 M trainable parameters. For optimization of the model, the Adam optimizer (Kingma and Ba 2014) was employed during training.

In the experiments, the learning rate was initially set to $5 \times {10^{ - 3}}$ and was reduced by half every ten epochs once the validation loss stopped decreasing. Moreover, to improve the generalizability of the network and prevent overfitting, training was stopped early if there was no improvement in the validation loss after 40 epochs. Additionally, data augmentation (e.g. random translation, scaling, and rotation) was performed on the fly to enlarge the training data set during training. With data augmentation, the number of training subjects was increased to 480 cases in each fold.
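
A sketch of one possible on-the-fly augmentation is shown below; the rotation, scaling, and translation ranges are illustrative assumptions and not the authors' settings.

```python
import numpy as np
from scipy import ndimage

def random_affine_augment(image, label, rng=np.random.default_rng()):
    """On-the-fly augmentation sketch (random rotation, scaling, translation).
    All ranges below are illustrative assumptions."""
    angle = np.deg2rad(rng.uniform(-10.0, 10.0))        # in-plane rotation
    scale = rng.uniform(0.9, 1.1)                       # isotropic scaling
    shift = rng.uniform(-5.0, 5.0, size=3)              # translation in voxels
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[1.0, 0.0, 0.0],
                         [0.0,   c,  -s],
                         [0.0,   s,   c]])
    matrix = rotation / scale                           # maps output coords to input coords
    centre = (np.asarray(image.shape) - 1) / 2.0
    offset = centre - matrix @ centre + shift           # rotate/scale about the volume centre
    warp = lambda vol, order: ndimage.affine_transform(vol, matrix, offset=offset, order=order)
    return warp(image, 1), warp(label, 0)               # linear for CT, nearest for labels
```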

3.3. Evaluation metrics

Two metrics were adopted to evaluate the performance:

  • Dice similarity coefficient (DSC):

$DSC = \dfrac{{2\left| {{V_A} \cap {V_B}} \right|}}{{\left| {{V_A}} \right| + \left| {{V_B}} \right|}}$    (11)

  • Average surface distance (ASD):

$ASD = \dfrac{1}{{\left| {{S_A}} \right| + \left| {{S_B}} \right|}}\left( {\sum\limits_{z \in {S_A}} {d\left( {z,{S_B}} \right)} + \sum\limits_{z \in {S_B}} {d\left( {z,{S_A}} \right)} } \right)$    (12)

where ${V_A}$ and ${V_B}$ refer to the voxel sets of the ground truth and the automatic segmentation, respectively. ${S_A}$ and ${S_B}$ represent the voxels on the ground truth surface and on the automatically segmented organ surface, respectively. $d\left( {z,{S_A}} \right)$ is the minimum Euclidean distance of a voxel $z \in {S_B}$ to the voxels in ${S_A}$, and $d\left( {z,{S_B}} \right)$ is the minimum Euclidean distance of a voxel $z \in {S_A}$ to the voxels in ${S_B}$.
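
The two metrics can be computed from binary masks as sketched below (an illustrative NumPy/SciPy implementation of equations (11) and (12); the voxel spacing argument is an assumption used to convert voxel distances to millimetres).

```python
import numpy as np
from scipy import ndimage

def dice_coefficient(gt, seg):
    """DSC between two binary masks, equation (11)."""
    gt, seg = gt.astype(bool), seg.astype(bool)
    return 2.0 * np.logical_and(gt, seg).sum() / (gt.sum() + seg.sum())

def average_surface_distance(gt, seg, spacing=(1.0, 1.0, 1.0)):
    """Symmetric ASD between two binary masks, equation (12)."""
    surface = lambda m: m & ~ndimage.binary_erosion(m)
    s_a, s_b = surface(gt.astype(bool)), surface(seg.astype(bool))
    # Distance of every ground-truth surface voxel to the segmented surface, and vice versa
    d_a = ndimage.distance_transform_edt(~s_b, sampling=spacing)[s_a]
    d_b = ndimage.distance_transform_edt(~s_a, sampling=spacing)[s_b]
    return (d_a.sum() + d_b.sum()) / (s_a.sum() + s_b.sum())
```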

3.4. Comparison algorithms

Since DenseNet is utilized as the backbone in the proposed segmentation network, a comparison between the proposed network and the corresponding vanilla DenseNet was conducted. In addition, to verify the effectiveness of the proposed DA-Dense block in the segmentation framework, we adopted several state-of-the-art attention-based models for the segmentation of the eight abdominal organs: SE-Net (Hu et al 2018), which consists of 'Squeeze-and-Excitation' (SE) blocks, and CSCE-Net (Roy et al 2019), which consists of Concurrent Spatial and Channel 'Squeeze-and-Excitation' (CSCE) blocks. To afford direct and fair comparisons, we fixed the architecture of the segmentation network and only replaced the blocks. It is worth noting that, to enable a fair comparison, SE-Net was additionally equipped with spatial attention and is therefore referred to as SCSA-Net. Specifically, its spatial attention is performed by a convolutional layer with a $1 \times 1 \times 1$ kernel. The detailed architectures of the network and of the blocks in DenseNet, CSCE-Net, and SCSA-Net are shown in figures 3(a), (b), and (c), respectively.


Figure 3. The detailed network architecture (upper part) and block structures of (a) DenseNet, (b) SCSA-Net, and (c) CSCE-Net.


3.5. Qualitative and quantitative evaluation

We segmented eight abdominal organs (duodenum, esophagus, gallbladder, pancreas, stomach, left kidney, spleen, and liver) using the proposed method. The segmentation results of three subjects are shown in figure 4, and their corresponding DSC values are listed in the boxes next to the images. The corresponding CT slices and segmentation results of the three subjects in figure 4 are shown in figure 5. The automated segmentation results are overlaid on the manual segmentation results. For a clear illustration, the ground truths of the eight organs are shown in yellow, and the automated segmentation results are shown in different colors (liver-pink, spleen-red, stomach-purple, esophagus-blue, pancreas-green, gallbladder-brown, left kidney-deep blue). As demonstrated in the fifth columns of figures 4 and 5, the proposed method achieved automated segmentation of the eight abdominal organs with high visual accuracy. Continuous boundaries similar to the manual segmentation were obtained on the duodenum, which is one of the most difficult organs to segment due to its complex morphology and motion blurriness. The quantitative evaluations of the segmentation on volume (DSC) and surface (ASD) measurements also demonstrate consistent improvements, as shown in tables 1 and 2. The corresponding 95% confidence intervals, calculated using bootstrapping, are given in tables 3 and 4. Specifically, for the difficult-to-segment organs, the proposed self-paced DenseNet gives average DSC values of 69.26% and 80.94% on the duodenum and gallbladder, which outperform the average DSCs of 63.12% and 76.35% by the vanilla DenseNet by a large margin. The quantitative evaluation on the surface measurement (duodenum: 2.14 vs. 3.87 mm, gallbladder: 2.24 vs. 4.33 mm) further demonstrates the improvement. Moreover, for the liver, left kidney, and spleen, the proposed method still further improves the segmentation performance.

Table 1. Quantitative evaluation results on DSC measurement (%). Abbreviations: BC-boundary constraint, TSP-task-wise self-paced learning. (The best results are indicated in bold, with asterisks indicating statistical significance (P < 0.05)). 95% confidence intervals are given in table 3.

Methods/Organs Duodenum Esophagus Gallbladder Pancreas Stomach Left Kidney Spleen Liver Average
DenseNet 63.12 ± 13.9* 66.99 ± 13.2* 76.35 ± 20.0* 76.20 ± 8.74* 85.21 ± 10.9* 93.87 ± 5.01* 93.64 ± 9.76* 95.20 ± 1.95* 81.32
SCSA-Net 65.22 ± 12.2* 69.24 ± 11.5* 78.39 ± 16.5* 76.99 ± 9.46* 86.67 ± 9.75* 94.05 ± 3.11* 93.48 ± 7.92* 95.42 ± 1.25* 82.43
CSCE-Net 63.29 ± 13.6* 68.08 ± 10.9* 76.55 ± 19.0* 75.82 ± 9.79* 85.46 ± 8.53* 93.17 ± 4.59* 93.53 ± 6.92* 95.17 ± 1.47* 81.63
DA-DenseNet (backbone network) 66.75 ± 12.5* 69.73 ± 11.0* 78.94 ± 14.7* 77.19 ± 9.01* 87.26 ± 6.92* 93.94 ± 3.46* 94.03 ± 6.92 95.75 ± 1.61* 82.95
DA-DenseNet with BC 68.95 ± 13.5* 70.30 ± 10.3* 79.76 ± 18.1* 78.89 ± 9.17 87.74 ± 9.22* 94.28 ± 2.55 93.96 ± 7.41* 95.85 ± 1.25* 83.72
DA-DenseNet with BC and TSP (proposed) 69.26 ± 12.05 71.64 ± 10.51 80.94 ± 15.49 79.24 ± 8.62 88.66 ± 8.16 94.70 ± 3.35 95.06 ± 6.95 96.20 ± 1.38 84.46

Table 2. Quantitative evaluation results on ASD measurement (mm). (The best results are indicated in bold, with asterisks implying statistical significance (P < 0.05)). 95% confidence intervals are given in table 4.

Methods/Organs Duodenum Esophagus Gallbladder Pancreas Stomach Left Kidney Spleen Liver Average
DenseNet 3.87 ± 3.33* 2.18 ± 1.82* 4.33 ± 9.18* 2.22 ± 1.66* 3.76 ± 4.87* 1.18 ± 1.15* 1.86 ± 4.39* 1.54 ± 0.79* 2.61
SCSA-Net 3.70 ± 3.10* 2.09 ± 1.75* 3.89 ± 8.99* 2.08 ± 1.46* 4.30 ± 6.19* 1.08 ± 1.15* 2.27 ± 5.81* 1.60 ± 1.32* 2.63
CSCE-Net 3.91 ± 2.87* 2.15 ± 1.62 4.46 ± 10.7* 2.15 ± 1.86* 3.37 ± 2.58* 1.12 ± 1.05* 1.67 ± 3.78* 1.53 ± 0.87* 2.54
DA-DenseNet (backbone network) 3.43 ± 2.47* 2.20 ± 2.28* 3.17 ± 6.36* 2.01 ± 1.45* 3.14 ± 4.10* 0.90 ± 0.42* 1.59 ± 3.47* 1.51 ± 1.21* 2.24
DA-DenseNet with BC 2.34 ± 2.33* 2.16 ± 2.25 2.28 ± 6.48* 1.82 ± 1.31 2.84 ± 6.69* 0.89 ± 0.57 1.34 ± 4.49* 1.32 ± 0.88 1.87
DA-DenseNet with BC and TSP (proposed) 2.14 ± 3.09 2.14 ± 6.44 2.24 ± 6.26 1.82 ± 1.23 2.76 ± 3.73 0.86 ± 0.38 1.28 ± 4.40 1.39 ± 0.91 1.82

Table 3. 95% confidence intervals on the change in DSC (%) measurement (positive value when the proposed self-paced DenseNet is better).

Methods/Organs Duodenum Esophagus Gallbladder Pancreas Stomach Left Kidney Spleen Liver
DenseNet 5.92, 6.37 4.55, 5.05 4.16, 5.02 2.91, 3.15 3.29, 3.59 0.85, 1.00 1.29, 1.54 1.02, 1.17
SCSA-Net 3.87, 4.22 1.77, 2.98 2.12, 3.29 2.11, 2.37 1.82, 2.14 0.69, 0.80 1.35, 1.54 0.82, 0.93
CSCE-Net 5.73, 6.20 3.44, 3.91 3.97, 4.80 3.30, 3.53 3.07, 3.31 1.55, 1.68 1.45, 1.60 1.15,1.28
DA-DenseNet (backbone network) 2.60, 2.88 1.86, 2.27 2.07, 3.18 2.19, 2.35 1.19, 1.35 1.03, 1.08 1.05, 1.12 0.54, 0.58
DA-DenseNet with BC 0.12, 0.50 1.27, 1.58 0.74, 1.93 0.26, 0.44 0.80, 1.00 0.46, 0.57 1.05, 1.13 0.39, 0.50

Table 4. 95% confidence intervals on the change in ASD (mm) measurement (negative value when the proposed self-paced DenseNet is better).

Methods/Organs Duodenum Esophagus Gallbladder Pancreas Stomach Left Kidney Spleen Liver
DenseNet −1.82, − 1.65 −0.18, 0.09 −2.26, − 1.92 −0.43, − 0.37 −1.10, − 0.90 −0.35, − 0.30 −0.53, − 0.43 −0.40, − 0.36
SCSA-Net −1.60, − 1.51 −0.08, 0.22 −1.83, − 1.48 −0.28, − 0.23 −1.66, − 1.43 −0.25, − 0.19 −0.97, − 0.80 −0.47, − 0.41
CSCE-Net −1.84,− 1.70 −0.15, 0.13 −2.42, − 2.02 −0.36, − 0.30 −0.70, − 0.53 −0.28, − 0.24 −0.32, − 0.25 −0.38, − 0.35
DA-DenseNet (backbone network) −1.32, − 1.26 −0.21, 0.08 −1.01, − 0.86 −0.21, − 0.17 −0.48, − 0.26 −0.05, − 0.03 −0.22, − 0.14 −0.37, − 0.35
DA-DenseNet with BC −0.26, − 0.15 −0.16, 0.12 −0.18, 0.10 −0.02, 0.02 −0.21, 0.02 −0.04, − 0.02 −0.02, 0.01 −0.17, − 0.15

Figure 4. Examples of the abdominal segmentation results by DenseNet, SCSA-Net, CSCE-Net, and the proposed self-paced DenseNet. The first column shows the ground truths, the second, third, fourth, and fifth columns present the segmentation results by DenseNet, SCSA-Net, CSCE-Net, and the proposed self-paced DenseNet, respectively. The white arrows denote the significant improvements. The box next to each image indicates the quantitative evaluation (DSC%) of the segmentation results by each method. Abbreviations: DU: duodenum, ES: esophagus, GA: gallbladder, PA: pancreas, ST: stomach, KI: left kidney, SP: spleen, LI: liver, AV: average.


Figure 5. 2D views of the abdominal segmentation results by DenseNet, SCSA-Net, CSCE-Net, and the proposed self-paced DenseNet. The first column shows the CT slices, the second, third, fourth, and fifth columns present the segmentation results by DenseNet, SCSA-Net, CSCE-Net, and the proposed self-paced DenseNet, respectively. The automated segmentation results are overlapped with the manual segmentation results. For clear illustration, ground truths of the eight organs are shown in yellow, automated segmentation results are shown in different colors. Liver-pink, spleen-red, stomach-purple, esophagus-blue, pancreas-green, gallbladder-brown, left kidney-deep blue.


Moreover, we performed a paired Student's t-test between the results of each comparison method and those of the proposed method to obtain the corresponding P-values. The significance levels for each organ and comparison method are indicated in tables 1 and 2.
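
For reference, a sketch of such a per-organ comparison is shown below: a paired t-test on the per-subject metric values, and a percentile bootstrap for the 95% confidence interval of the mean difference. The exact bootstrap procedure used for tables 3 and 4 is not detailed above, so this formulation is an assumption.

```python
import numpy as np
from scipy import stats

def compare_per_organ(dsc_proposed, dsc_baseline, n_boot=10000, seed=0):
    """Paired t-test P-value and bootstrap 95% CI of the mean DSC difference
    (proposed minus baseline) for one organ; one value per test subject."""
    proposed = np.asarray(dsc_proposed, dtype=float)
    baseline = np.asarray(dsc_baseline, dtype=float)
    p_value = stats.ttest_rel(proposed, baseline).pvalue
    rng = np.random.default_rng(seed)
    diffs = proposed - baseline
    boot = [rng.choice(diffs, size=diffs.size, replace=True).mean() for _ in range(n_boot)]
    ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
    return p_value, (ci_low, ci_high)
```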

3.6. Impact of the aggregation strategy

There are several strategies for feature map aggregation, including element-wise max-out, element-wise addition, and concatenation (Fu et al 2019). All of these strategies can boost performance; however, considering both model complexity and performance, we excluded the concatenation strategy and investigated the max-out and addition aggregation strategies in this section. The average DSC (%) and ASD (mm) obtained using the max-out and addition aggregation strategies are reported in table 5. The average DSC and ASD of the max-out strategy outperform those of the addition strategy by 0.81% and 0.45 mm, respectively. An intuitive explanation behind the superior performance of max-out based aggregation is its ability to induce element-wise selectivity by making the two excitations compete (Roy et al 2019). In all of the experiments, max-out based aggregation was employed to merge the channel-wise and spatial attention maps.

Table 5. Comparison results of using different aggregation strategies.

  DSC (%) ASD (mm)
Organs Max-out Addition Max-out Addition
Duodenum 66.75 ± 12.5 59.60 ± 13.5 3.43 ± 2.47 5.32 ± 2.33
Esophagus 69.73 ± 11.0 73.07 ± 10.3 2.20 ± 2.28 2.58 ± 2.25
Gallbladder 78.94 ± 14.7 76.26 ± 18.1 3.17 ± 6.36 2.28 ± 6.48
Pancreas 77.19 ± 9.01 75.83 ± 9.17 2.01 ± 1.45 2.82 ± 1.31
Stomach 87.26 ± 6.92 87.74 ± 9.22 3.14 ± 4.10 3.99 ± 6.69
Left Kidney 93.94 ± 3.46 95.42 ± 2.55 0.90 ± 0.42 0.89 ± 0.57
Spleen 94.03 ± 6.92 93.86 ± 7.41 1.59 ± 3.47 1.89 ± 4.49
Liver 95.75 ± 1.61 95.35 ± 1.25 1.51 ± 1.21 1.75 ± 0.88
Average 82.95 82.14 2.24 2.69

3.7. Impact of the network architecture

To evaluate the impact of the network architecture, comparisons between the proposed network and DenseNet, SCSA-Net, and CSCE-Net were conducted. The segmentation results by DenseNet, SCSA-Net, CSCE-Net, and the proposed network are shown in the second, third, fourth, and fifth columns of figure 4, respectively. It can be observed that the proposed self-paced DenseNet consistently outperforms the three networks, with more robust boundary detection performance and notable improvements on the duodenum, esophagus, gallbladder, and pancreas.

Moreover, to eliminate the effects of the boundary constraint and the task-wise self-paced learning strategy and to quantify the contribution of the improved network architecture, the quantitative evaluation of the segmentation results by the proposed DA-DenseNet (without boundary constraint and task-wise self-paced learning) is also presented in the fifth row of tables 1 and 2. It can be observed that DA-DenseNet consistently outperforms the vanilla DenseNet, SCSA-Net, and CSCE-Net on the eight organs. Specifically, compared with the three networks, the proposed segmentation network improves the average DSC value of the eight organs by 1.63%, 0.52%, and 1.32%, respectively, which demonstrates the advantages of the proposed DA-Dense block.

Additionally, to illustrate the improvements brought by the proposed DA-DenseNet, the output feature maps of the eighth block from the vanilla DenseNet and from DA-DenseNet are shown in figure 6. Specifically, the size of the selected feature maps is $72 \times 72 \times 72$, with a channel number of 80. It can be observed that the dual-attention mechanism effectively suppressed the irrelevant regions and enhanced the regions that include the targets to be segmented.


Figure 6. The output feature maps of the eighth block (channel 1–80) from vanilla DenseNet (right upper part) and DA-DenseNet (right lower part). To show the feature maps in 2D, the 28th slices of the CT (left part) and feature maps are selected.


3.8. Impact of boundary constraint

To show the effectiveness of the boundary constraint used in the framework, comparison experiments between the proposed self-paced DenseNet with and without the boundary constraint were conducted. Due to the irregular shape and blurry boundaries of the duodenum, the boundary constraint shows noticeable improvements on this organ. Four examples of the duodenum segmentation results and edge maps produced by the proposed method with and without the boundary constraint are shown in figures 7 and 8, respectively. Comparing the results in the second and third columns of figure 7, the boundary detection accuracy and continuity are noticeably improved. Moreover, the fifth and sixth rows of tables 1 and 2 quantitatively compare the segmentation results with and without the boundary constraint, showing consistent improvements and further demonstrating the effectiveness of the boundary constraint.


Figure 7. Examples of the duodenum segmentation results by the proposed method with and without boundary constraint.


Figure 8. Boundary maps of the ground truths and of the automated segmentation results by the proposed segmentation method with and without the boundary constraint. The first column shows the original CT images. The second, third, and fourth columns show the boundary maps of the ground truths and of the segmentation results produced by the proposed method without and with the boundary constraint, respectively. For a clear illustration, the boundary maps (in green) are overlaid on the CT images.


However, the quality of the extracted boundary map highly depends on the predictions produced by the segmentation network. To provide an effective boundary similarity loss to the objective function, the start epoch and the weight parameter ${\lambda _{edge}}$ are two important parameters that influence the effectiveness of the boundary constraint. We varied the start epoch between 1 and 10 and the weight ${\lambda _{edge}}$ between 0.1 and 1.0 on the abdominal data set. Figure 9 shows the segmentation performance with different start epochs and weights. We note that the segmentation network achieves the optimal performance when the boundary constraint is integrated from the end of the 2nd epoch, and the performance then starts to decrease for later start epochs. This is because the network gradually converges after the 2nd epoch, after which the boundary constraint has minimal effect on the segmentation performance. We also notice that ${\lambda _{edge}}$ = 0.3 leads to the optimal performance.


Figure 9. Boxplot of DSC values of the abdominal CT segmentation with respect to different epoch number (a) and different boundary loss weight (b).


3.9. Impact of task-wise self-paced learning strategy

To illustrate the advantages of the task-wise self-paced learning strategy in the framework, we compared the DSC evaluation curves during training. Figures 10(a), (b), (c), and (d) present the average DSC loss curve and the loss curves for the pancreas, duodenum, and stomach, respectively. Generally, we note that the network with the task-wise self-paced learning strategy shows a lower validation loss than the network without it. More specifically, the learning curves of the network with task-wise self-paced learning decrease slowly at the beginning, mainly because lower weights are assigned to the poorly-segmented organs while the learning of the well-segmented organs is emphasized. With increasing discriminability and convergence on the well-segmented organs, the weighting parameters of the poorly segmented organs start to increase, shifting the learning emphasis to them. The learning curves demonstrate that the task-wise self-paced learning strategy can adaptively control the learning paces of the organs based on their segmentation performance, optimize them sequentially, and improve the segmentation robustness on organs that are difficult to segment. The sixth and seventh rows of tables 1 and 2 show consistent improvements using the task-wise self-paced learning strategy.


Figure 10. Learning curves of the segmentation network. (a), (b), (c), and (d) display the average DSC loss curve and the DSC loss curves of the pancreas, duodenum, and stomach, respectively.


3.10. Comparison with state-of-the-art segmentation methods

We selected four state-of-the-art deep learning abdominal segmentation methods for comparison: the Dense V-network by Gibson et al (2018b), the regional FCN by Larsson et al (2017), the hierarchical 3D FCN by Roth et al (2017), and the shape representation model (SRM) constrained FCN (SRM-FCN) proposed by Tong et al (2018). It is worth noting that we used the same data set provided by Gibson et al (Landman et al 2015, Gibson et al 2018a, Clark et al 2013) and also performed 9-fold cross-validation. To enable a fair comparison, the hierarchical 3D FCN and SRM-FCN were implemented on the same dataset under the same experimental setting. A paired Student's t-test was also conducted between the results of the 3D FCN, SRM-FCN, and the proposed method. As shown in table 6, the proposed method consistently outperforms the other four methods on the eight abdominal organs. Specifically, the proposed method improves the average DSC by 2.96% in comparison with the method in Gibson et al (2018b). Moreover, the proposed method shows greater improvements on difficult-to-segment organs including the gallbladder, pancreas, and duodenum, by 7.94%, 4.24%, and 6.26%, respectively, while maintaining the highest segmentation accuracy on easy-to-segment organs such as the liver, left kidney, and spleen. Additionally, consistent improvements can be observed on the distance metrics (ASD and symmetric 95% Hausdorff distance) in tables 7 and 8. The proposed self-paced DenseNet shows smaller surface distances to the manual ground truth than the other state-of-the-art segmentation methods.

Table 6. Comparison of segmentation performance between the state-of-the-art methods and the proposed method (mean DSC %). (The best results are indicated in bold, with asterisks implying statistical significance (P < 0.05)).

Organs/Methods Gibson et al (2018b) (Dense V-network) Larsson et al (2018) (Regional FCN) Roth et al (2017) (Hierarchical 3D FCN) Tong et al (2018) (SRM-FCN) Self-paced DenseNet (proposed)
Duodenum 63 N/A 62.34* 61.07* 69.26
Esophagus 71 58.8 66.28* 69.42* 71.64
Gallbladder 73 61.3 72.83* 70.58* 80.94
Pancreas 75 64.6 68.13* 76.63* 79.24
Stomach 87 76.4 85.76* 84.53* 88.66
Left Kidney 93 91.1 92.98* 93.40* 94.70
Spleen 95 93.6 91.49* 93.63* 95.06
Liver 95 94.9 94.98 95.51 96.20
Average 81.5 N/A 79.34 80.60 84.46

Table 7. Comparison of segmentation performance between the state-of-the-art methods and the proposed method (median ASD mm). (The best results are indicated in bold, with asterisks implying statistical significance (P < 0.05)).

Organs/Methods Gibson et al (2018b) (Dense V-network) Roth et al (2017) (Hierarchical 3D FCN) Tong et al (2018) (SRM-FCN) Self-paced DenseNet (proposed)
Duodenum 4.1 4.13* 4.28* 2.04
Esophagus 1.7 2.45* 2.02* 1.86
Gallbladder 1.6 1.97* 2.34* 1.72
Pancreas 1.9 2.79* 1.86* 1.65
Stomach 2.5 2.63* 2.89* 2.43
Left Kidney 0.9 0.93* 0.92* 0.81
Spleen 0.8 0.90* 0.89 0.79
Liver 1.6 1.82* 1.75* 1.19
Average 1.89 2.20 2.12 1.57

Table 8. Comparison of segmentation performance between the state-of-the-art methods and the proposed method (median symmetric 95% Hausdorff distance mm). (The best results are indicated in bold, with asterisks implying statistical significance (P < 0.05)).

Organs/Methods Gibson et al (2018b) (Dense V-network) Roth et al (2017) (Hierarchical 3D FCN) Tong et al (2018) (SRM-FCN) Self-paced DenseNet (proposed)
Duodenum 15.0 9.74* 7.82* 4.67
Esophagus 5.6 7.45* 6.12* 5.01
Gallbladder 4.6 4.79* 4.23 3.98
Pancreas 5.9 4.94* 4.68* 3.88
Stomach 9.1 5.36* 4.29* 3.92
Left Kidney 3.1 3.39* 2.79* 2.56
Spleen 2.4 3.01* 2.94* 2.45
Liver 4.9 4.67* 4.25* 3.62
Average 6.33 5.42 4.64 3.76

Beyond segmentation accuracy, the proposed self-paced DenseNet is also more efficient than the other state-of-the-art methods. Specifically, both the regional FCN and the hierarchical FCN consist of coarse and fine segmentation stages, which not only increase the model complexity but also lower the computational efficiency and generalizability. Similar to the boundary constraint proposed in this study, SRM-FCN also includes an additional SRM to constrain the shape of the predictions produced by the FCN. However, unlike the end-to-end self-paced DenseNet in this study, SRM-FCN requires pre-training of the SRM for each dataset and organ set to be segmented, which also limits its computational efficiency and generalizability.

4. Discussions

In this study, a multi-organ segmentation network is presented, which aims to address the main challenges of automated abdominal CT segmentation: the large learning gaps among organs (e.g. duodenum vs. liver), boundaries with low conspicuity, and complex organ morphology. We improved the segmentation network in three different aspects: the network architecture, the loss function, and the optimization relationships among the organs. We discuss their contributions, limitations, and future improvements as follows.

4.1. Network architecture

We built the dual-attention DenseNet (DA-DenseNet) for voxel-wise prediction, which takes advantage of the dense connection mechanism. To deal with the large number of redundant features in the dense block and facilitate the propagation of task-related feature maps, channel-wise and spatial attention mechanisms are seamlessly integrated with the dense block. Unlike the CSCE-Net proposed by Roy et al (2019), which directly places the attention layer after each block, the proposed dual-attention block shows stronger discriminative capacity due to this seamless integration, as shown in tables 1 and 2 and figure 6.

Currently, channel-wise and spatial attention strategies are conducted separately. A single spatial attention map for all channels and a single scalar to weight the importance of each channel may be insufficient. Instead, jointly learning the channel-wise and spatial relationships should be performed, which we leave as future work.

4.2. Boundary constraint

A boundary detector is utilized in this work to supervise the generation of the boundaries, enhance the boundary consistency between the predictions and ground truths, and thus aid the segmentation performance. Generally, boundary refinement is conducted as a post-processing step. However, additional post-processing increases the computational complexity and decreases the algorithmic efficiency, which is one of the main advantages of deep learning segmentation methods. Unlike other post-processing steps and regularization models, e.g. the conditional random field (CRF) and the discriminator in a generative adversarial network (GAN), the boundary detector is a traditional image processing technique and can be effectively incorporated into the network with minimal computational cost. Moreover, the boundary detector is robust to parameter tuning. With the boundary constraint, the boundary continuity has been significantly improved, especially for the duodenum, as shown in tables 1 and 2 and figures 7 and 8.

4.3. Relationships among organs

Individual organs in multi-organ segmentation tasks are either treated equally or weighted by the volume proportion of each organ. With equal weights, the organs that are large, relatively isolated, regularly shaped, and have clear contrast will dominate the learning process, compromising the segmentation of small, low-contrast, and blurry organs. Alternatively, the learning could be weighted proportionally to the organ size. In some tasks, this weighting strategy was able to counter the volume imbalance among organs (e.g. the chest segmentation of hearts, clavicles, and lungs in Novikov et al (2018)). However, in the more extreme class-imbalance case of abdominal organ segmentation (e.g. liver vs. duodenum), such a strategy would result in network over-fitting and segmentation accuracy degradation.

To effectively control and balance the learning pace of each organ, a task-wise self-paced-learning strategy is utilized in this work. The weights of the individual classes are adaptively updated based on their relative performance. At the earlier epochs, before the network matures, the difficult-to-segment organs are assigned small weights to help the network converge steadily. As training proceeds, the discriminative capacity of the network increases, and the weighting parameters of the difficult-to-segment organs gradually increase to divert the network's focus towards these organs. In other words, the eight abdominal organs are brought into focus in sequence. The effectiveness of the proposed task-wise self-paced learning strategy has been shown in tables 1 and 2 and figure 10.

Compared with the network using equal weights for all organs, the loss of the network with the task-wise self-paced learning strategy decreases more slowly at the earlier epochs but decreases more steadily as training proceeds, eventually reaching a lower validation loss. This trend is more obvious in the loss curves of the pancreas, duodenum, and stomach in figures 10(b), (c), and (d). These results demonstrate that the task-wise self-paced learning strategy can effectively control the learning paces of individual organs in a multi-organ segmentation task and improve the network performance on the difficult-to-segment organs.

Although significant improvements were shown in cross-validation, multi-institutional studies and validation on unseen data are desired to further test the method performance, robustness, and generalizability. These additional tests will be pursued in future studies.

5. Conclusion

In this work, we introduced an end-to-end network, i.e. the self-paced DenseNet, for automated multi-organ segmentation on abdominal CT images. The self-paced DenseNet combines the strengths of the attention mechanism and the boundary constraint to combat the specific challenges due to motion artifacts, low image contrast, class imbalance, and morphological complexity. The learning paces of the organs are balanced by the task-wise self-paced learning mechanism, which adaptively and sequentially updates the weighting parameters based on the organ segmentation difficulties. Compared with the state-of-the-art abdominal segmentation methods, the proposed method shows promising boundary localization accuracy and consistently superior segmentation performance on all of the organs segmented.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of Shaanxi Province under Grant No. 2019ZDLGY03-02-02, in part by NIH R44CA183390, R01CA230278 and R01CA188300. N Tong was supported in part by the CSC Chinese Government Scholarship.
