1 Introduction

According to CBTRUS (Central Brain Tumor Registry of the United States), an estimated 78,980 new cases of primary malignant and non-malignant brain and other CNS tumors are expected to be diagnosed in the United States in 2018. Moreover, 16,616 deaths are likely to be attributed to primary malignant brain and other CNS tumors in the US in 2018. Magnetic resonance imaging (MRI) and computed tomography (CT) scans provide high-resolution images of the brain. Depending on the excitation and repetition times, different modalities of MRI images may be obtained, i.e., FLAIR, T1, T2, and T1c.

These modalities prove to be highly useful in detecting different subregions of the brain tumor, namely: edema (whole tumor), non-enhancing solid core (tumor core), necrotic/cystic core, and enhancing core. Manual detection and segmentation of brain tumors for cancer diagnosis, from the large volumes of MRI images generated during clinical routine, is a difficult and time-consuming task. Thus, there is a substantial need for automatic brain tumor segmentation from MRI for diagnosis and radiotherapy.

A fundamental difficulty with segmenting brain tumors automatically is that they can appear anywhere in the brain and vary in shape, size, and structure. Additionally, brain tumors along with their surrounding edema are often diffuse, poorly contrasted, and have extended tentacle-like structures. The Variational Level Set (VLS) method with Active Contours (AC) is widely applied in image segmentation [1] due to its ability to automatically handle such topological changes. Some remarkable works [2,3,4] on brain tumor segmentation that utilized VLS have shown its potential for achieving highly accurate brain tumor segmentation. However, the segmentation accuracy of VLS-based methods drops dramatically when dealing with intervening factors such as lighting, shadows, colors, and backgrounds of large variety or complexity, and MRI images frequently contain such factors.

The limitations of the VLS approaches can be summarized as follows: (1) they are largely handicapped in capturing variations of real-world objects; (2) they fail to memorize and to fully infer target objects; (3) they are limited in segmenting multiple objects with semantic information; (4) the segmentations they generate are quite sensitive to numerous predefined parameters such as the initial contour and the number of iterations. To overcome these limitations of VLS in solving the problem of brain tumor segmentation, our motivation is to answer the following questions: (1) How can LS be incorporated into deep learning so as to inherit the merits of both the LS algorithm and deep learning? (2) Is it possible to replace the softmax function by a LS energy minimization to obtain a better outcome on MRI images? If so, how is curve evolution performed within the forward and backward passes of the deep learning framework? (3) How are VLS iteration processes performed in the deep framework?

To address these issues and promote the classic VLS methods to learnable deep network approaches, we propose a new formulation of VLS integrated in a deep framework, called the Deep Recurrent Level Set (DRLS), which combines the advantages of both fully convolutional networks (FCN) [5] and the LS method [6]. For MRI images, local, global, and contextual information is important for obtaining a high-quality feature map. To achieve this, the proposed DRLS is designed by incorporating LevelSet layers into VGGNet-16 [7], using three types of layers: convolutional layers, deconvolutional layers, and LevelSet layers. The proposed DRLS contains two main parts, corresponding to visual representation and curve evolution. The first part extracts features using a Fully Convolutional Network (FCN) while incorporating skip connections to up-sample the feature map. The second part is composed of a LevelSet layer that drives the contour such that the energy function attains a minimum, as shown in Fig. 1. Notably, our target is to show that it is entirely possible to promote VLS to the higher level of a learnable framework.

2 Literature Review

Over the years, discriminative, generative, and deep learning methods have been used to segment brain tumors from MRI images. The following is a brief description of such methodologies.

Classical Segmentation Methods. Anitha et al. [8] proposed segmentation using adaptive pillar K-means, followed by extracting crucial features from the segmented image using discrete wavelet transforms. The features are put through two-tier classifiers, namely k-Nearest Neighbor (k-NN) and self-organizing maps (SOM). Dimah et al. [9] proposed a level set-based approach for tumor segmentation using histogram-based clustering. The method also provides a local statistical characterization of the image by integrating the probabilistic non-negative matrix factorization (PNMF) framework into the level set formulation. Tustison et al. [10] used asymmetry and first-order statistical features to train concatenated Random Forests (RF), introducing the output of the first RF as an input to the next.

Level-Set Methods. Some of the initial works that utilized VLS for brain tumor segmentation are [2, 3]. [2] combined VLS evolution and global smoothness with the flexibility of topology changes, followed by mathematical morphology, thereby achieving significant advantages over conventional statistical classification. [3] introduced a threshold-based scheme that uses level sets for 3D tumor segmentation (TLS). A global threshold is used to design the speed function, which is defined based on a confidence interval and is iteratively updated throughout the evolution process, requiring different degrees of user involvement. Thapaliya et al. [4] introduced a new signed pressure force (SPF) function that can efficiently stop the contours at weak or blurred edges.

Deep Learning Methods. In 2015, the top finisher of the BRATS 2015 challenge was the first to apply CNNs to brain tumor segmentation [11]. The proposed CNN architecture exploits both local features and more global contextual features simultaneously and was 30 times faster than the then state-of-the-art solutions. Additionally, the architecture uses a convolutional implementation of a fully connected layer, thereby allowing a 40-fold speed-up. Urban et al. [12] proposed a 3D CNN architecture which extracts 3D voxel patches from different brain MRI modalities. The tissue label of the center voxel is predicted by feeding the 3D voxels into a 4-layered CNN architecture. In order to avoid the high computational cost of 3D voxels, Zikic et al. [13] transformed the 4D data into 2D data such that standard 2D-CNN architectures can be used to solve the brain tumor segmentation task.

Recently, [14] evaluated an 11-layered CNN architecture on the BRATS dataset by implementing small \(3 \times 3\) filters in the convolutional layers and reported comparable Dice scores. In order to improve performance and overcome the limitation of training data, CNNs have also been combined with other classification or clustering methods [15]. One state-of-the-art deep learning-based approach for segmenting brain tumors, called DMRes, is an improvement of DeepMedic [16].

3 Proposed Network

The pipeline of the proposed network is illustrated in Fig. 1.

Fig. 1. The proposed DRLS network and algorithm.

3.1 Formulation of Level Sets

Consider a binary image segmentation problem in the 2D space \(\varOmega \). The boundary C of an open set \(\omega \subset \varOmega \) is defined as \(C = \partial \omega \). In the VLS framework, the boundary C can be represented by the zero level set of \(\phi \) as follows:

$$\begin{aligned} \forall (x,y) \in \varOmega \left\{ \begin{array}{l} C = \{(x,y) : \phi (x,y) = 0\}\\ inside(C) = \{(x,y) : \phi (x,y) > 0\}\\ outside(C) = \{(x,y) : \phi (x,y) < 0\} \end{array}\right. \end{aligned}$$
(1)

For image segmentation, \(\varOmega \) denotes the entire domain of an image \(\mathbf I \). The LS function \(\phi \) divides the region \(\varOmega \) into two regions: the region inside \(\omega \) (foreground), denoted as inside(C), and the region outside \(\omega \) (background), denoted as outside(C). The length of the contour C is defined as \(Length(C) = \int _{\varOmega } {|\nabla H(\phi (x,y))|dxdy} = \int _{\varOmega } {\delta (\phi (x,y))|\nabla \phi (x,y)| dxdy}\) and the area inside the contour C is defined as \(Area(C) = \int _{\varOmega } H(\phi (x,y))dxdy\), where \(H\) is the Heaviside function and \(\delta \) its derivative, the Dirac delta function.
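
In practice, \(H\) and \(\delta \) are replaced by smoothed approximations \(H_\epsilon \) and \(\delta _\epsilon \) so that both quantities are differentiable. Below is a minimal NumPy sketch of the discretized \(Length(C)\) and \(Area(C)\), assuming \(\phi \) is stored as a 2D array on a unit-spaced grid; the regularized forms and the value of \(\epsilon \) are standard Chan-Vese choices rather than prescriptions of this paper:

```python
import numpy as np

def heaviside(phi, eps=1.0):
    # Smoothed Heaviside H_eps(phi), a standard Chan-Vese regularization
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def dirac(phi, eps=1.0):
    # Smoothed Dirac delta: the derivative of H_eps w.r.t. phi
    return (eps / np.pi) / (eps**2 + phi**2)

def contour_length(phi, eps=1.0):
    # Length(C) = integral of delta(phi) * |grad phi| over the image domain
    gy, gx = np.gradient(phi)
    return np.sum(dirac(phi, eps) * np.sqrt(gx**2 + gy**2))

def contour_area(phi, eps=1.0):
    # Area(C) = integral of H(phi): area of the region inside the contour
    return np.sum(heaviside(phi, eps))
```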

3.2 Recurrent Fully Convolutional Neural Network (RFCN)

[17] extended classic CNNs to infer and learn from arbitrary-sized inputs. Later, [5] proposed the FCN model, which adapts and extends deep classification architectures to learn efficiently from whole input and whole ground truth images. By casting fully connected layers into convolutional layers with kernels that cover their entire input regions, an FCN can take input of any size and generate spatial outputs in one forward pass. To map the coarse feature map into a pixel-wise prediction of the input image, the FCN up-samples the coarse feature map via a stack of deconvolution layers. Figures 2(a-1, a-2) show the comparison between a classic CNN and an FCN.
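
To illustrate this casting, a fully connected layer that consumes a fixed \(7 \times 7 \times 512\) feature map can be replaced by a convolution whose kernel covers that entire region. The sketch below assumes PyTorch (the paper names no framework) and uses the familiar VGG-16 layer sizes purely for illustration:

```python
import torch
import torch.nn as nn

# FC layer: flattens a fixed 7x7x512 map into 4096 units (fixed input size only)
fc = nn.Linear(512 * 7 * 7, 4096)

# Equivalent convolution: the kernel covers the entire 7x7 input region, so the
# same weights now slide over larger inputs and yield a spatial grid of outputs.
conv = nn.Conv2d(512, 4096, kernel_size=7)
conv.weight.data = fc.weight.data.view(4096, 512, 7, 7)
conv.bias.data = fc.bias.data

x = torch.randn(1, 512, 14, 14)  # a larger-than-training input
y = conv(x)
print(y.shape)                   # torch.Size([1, 4096, 8, 8]): a spatial output
```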

The RFCN is an extension of the FCN architecture and is shown in Fig. 2(a-3). In the proposed RFCN, the output feature map of the current step is the input to the next step, as sketched below.
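
A minimal sketch of this recurrence, again assuming PyTorch, where `fcn` stands in for any encoder-decoder whose output matches its input in spatial size and channel count (both are our assumptions, not details fixed by the paper):

```python
import torch.nn as nn

class RecurrentFCN(nn.Module):
    # Hypothetical wrapper: `fcn` is any fully convolutional encoder-decoder
    # whose output has the same spatial size and channel count as its input
    # (e.g., 4 channels in and out for the four MRI modalities).
    def __init__(self, fcn, steps=3):
        super().__init__()
        self.fcn = fcn
        self.steps = steps

    def forward(self, x):
        y = x
        for _ in range(self.steps):
            # The output feature map of the current step feeds the next step
            y = self.fcn(y)
        return y
```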

Fig. 2. The details of the proposed DRLS.

3.3 Deep Recurrent Level Set (DRLS) - Proposed

Our proposed DRLS network is built on VGG-16 with three types of layers: convolutional, deconvolutional, and LevelSet layers, as shown in Fig. 1.

Convolutional Layer: The output feature map is computed by convolving the input feature map with convolution kernels: \(\mathbf Y ^{(s, \theta )} = f^s(\mathbf X , \theta ) = \mathbf X \,*\,\mathbf W ^s + \mathbf b \), where \(\mathbf X \) is the input feature map, and \(\mathbf W \) and \(\mathbf b \) are the convolution kernel and bias. \(\mathbf W ^s\) indicates convolution at a stride s, and \(\mathbf Y ^{(s, \theta )}\) denotes the output feature map generated by the convolutional layers with a total stride of s, parameterized by \(\theta \). Because of the strides of the conv and pooling layers, the final output feature map \(\mathbf Y ^{(s, \theta )}\) is downsampled by a factor of the total stride s compared to the input feature map.

Deconvolutional Layer: The deconv layer is used to upsample the input feature maps using the stored max-pooling indices from the corresponding conv feature map. Here, a skip connection is introduced to concatenate the output deconv feature map with the corresponding convolutional feature map. Figure 2(b) illustrates the network's architecture.

Let \(g^s(\cdot \,; \tau )\) denote a deconv layer parameterized by \(\tau \) that up-samples the input by a factor of s. The output is then concatenated with the corresponding convolutional layer \(\mathbf Y ^{(s-1,\theta )}\) via a skip connection: \(\hat{\mathbf{Y }}^{s, \tau } = concat\left[ g^s(\mathbf Y ^{(s,\theta )}; \tau ), \mathbf Y ^{(s-1,\theta )}\right] \).
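
A sketch of one such deconvolution-plus-skip step follows; PyTorch is assumed and the channel counts are illustrative rather than taken from the paper:

```python
import torch
import torch.nn as nn

class DeconvSkipBlock(nn.Module):
    # Sketch of g^s(.; tau) followed by the skip concatenation
    # hat{Y}^{s,tau} = concat[g^s(Y^(s,theta)), Y^(s-1,theta)].
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        # kernel_size == stride == scale doubles (etc.) the spatial size
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch,
                                         kernel_size=scale, stride=scale)

    def forward(self, y_s, y_s_minus_1):
        # y_s_minus_1 must have the same spatial size as the upsampled map
        up = self.deconv(y_s)
        return torch.cat([up, y_s_minus_1], dim=1)  # skip connection
```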

LevelSet Layer: The network is trained to minimize the following energy function:

$$\begin{aligned} E(c_1, c_2, \phi )&= \int _{\varOmega }\Big [ \mu {H(\phi )} + \nu {\delta (\phi )|\nabla \phi | } \\&\quad +\,\alpha (H(\phi ) - GT)^2+ \lambda _1 |H(\phi )-c_1|^{2}H(\phi ) + \lambda _2 |H(\phi )-c_2|^{2}(1-H(\phi )) \Big ]\, dxdy \end{aligned}$$
(2)

In Eq. 2, the first term defines the area inside the contour C, whereas the second term defines the length of the contour C. In the third term, GT is the ground truth; minimizing this term with \(\alpha > 0\) supervises the network to learn where a brain tumor occurs in the MRI images. The last two terms correspond to the energy inside and outside of the contour C (a discretized sketch of Eq. 2 and of the alternating updates is given after Eq. 6). To optimize the energy function, the calculus of variations is used. The derivative of the energy function E w.r.t. \(\phi \) is:

$$\begin{aligned} \frac{\partial E}{\partial \phi }&= \delta (\phi )\left[ \mu -\nu \, div\frac{\nabla \phi }{|\nabla \phi |} + 2\alpha (H(\phi )- GT) + \lambda _1(H(\phi )-c_1)^2\right. \\&\quad \left. {} + 2\lambda _1(H(\phi ) - c_1)H(\phi ) - \lambda _2(H(\phi ) - c_2)^2 + 2\lambda _2(H(\phi ) - c_2) (1-H(\phi )) \right] \end{aligned}$$
(3)

The derivatives of the energy function E w.r.t. \(c_1\) and \(c_2\) are:

$$\begin{aligned} \frac{\partial E}{\partial c_1} = -2\lambda _1 \int _{\varOmega } (H(\phi ) - c_1)H(\phi )\, dxdy \qquad \frac{\partial E}{\partial c_2} = -2\lambda _2 \int _{\varOmega } (H(\phi ) - c_2)(1-H(\phi ))\, dxdy \end{aligned}$$
(4)

Fix \(\phi \) and minimize the energy function w.r.t. \(c_1\) and \(c_2\):

$$\begin{aligned} c_1 = \frac{\int _{\varOmega } H(\phi (x,y))\, H(\phi (x,y))\, dxdy}{\int _{\varOmega } H(\phi (x,y))\, dxdy} \ , \quad c_2 = \frac{\int _{\varOmega } H(\phi (x,y))\,(1-H(\phi (x,y)))\, dxdy}{\int _{\varOmega }(1-H(\phi (x,y)))\, dxdy} \end{aligned}$$
(5)

Fix \(c_1\) and \(c_2\), and minimize the energy function w.r.t. \(\phi \):

$$\begin{aligned} \frac{\partial \phi }{\partial t}&= \delta _\epsilon (\phi )\left[ -\mu +\nu \, div\frac{\nabla \phi }{|\nabla \phi |} - 2\alpha (H(\phi ) - GT) -\lambda _1(H(\phi )-c_1)^2 - 2\lambda _1(H(\phi ) - c_1)H(\phi ) \right. \\&\quad \left. {} +\lambda _2(H(\phi ) - c_2)^2 - 2\lambda _2(H(\phi ) - c_2) (1-H(\phi )) \right] \end{aligned}$$
(6)
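
Taken together, Eq. 2 and the alternating updates of Eqs. 5 and 6 can be written as a short NumPy procedure reusing the `heaviside` and `dirac` helpers from Sect. 3.1. This is a sketch only: the step size `dt`, the iteration count, and the small stabilizing constants are illustrative assumptions, not values from the paper:

```python
import numpy as np

def energy(phi, gt, c1, c2, mu=1.0, nu=1.0, alpha=1.0,
           lam1=1.0, lam2=1.0, eps=1.0):
    # Discretized Eq. 2 on a unit-spaced grid (weights are free hyper-parameters)
    H = heaviside(phi, eps)
    gy, gx = np.gradient(phi)
    length = nu * dirac(phi, eps) * np.sqrt(gx**2 + gy**2)  # length of C
    return np.sum(mu * H                                    # area inside C
                  + length
                  + alpha * (H - gt)**2                     # ground-truth term
                  + lam1 * (H - c1)**2 * H                  # energy inside C
                  + lam2 * (H - c2)**2 * (1 - H))           # energy outside C

def evolve(phi, gt, steps=50, dt=0.1, mu=1.0, nu=1.0, alpha=1.0,
           lam1=1.0, lam2=1.0, eps=1.0):
    # Alternating minimization: Eq. 5 for (c1, c2), then a gradient step on phi
    for _ in range(steps):
        H = heaviside(phi, eps)
        c1 = np.sum(H * H) / (np.sum(H) + 1e-8)             # Eq. 5
        c2 = np.sum(H * (1 - H)) / (np.sum(1 - H) + 1e-8)   # Eq. 5
        gy, gx = np.gradient(phi)
        mag = np.sqrt(gx**2 + gy**2) + 1e-8
        # Curvature term div(grad phi / |grad phi|)
        curv = np.gradient(gx / mag)[1] + np.gradient(gy / mag)[0]
        dphi = dirac(phi, eps) * (-mu + nu * curv
                                  - 2 * alpha * (H - gt)
                                  - lam1 * (H - c1)**2
                                  - 2 * lam1 * (H - c1) * H
                                  + lam2 * (H - c2)**2
                                  - 2 * lam2 * (H - c2) * (1 - H))
        phi = phi + dt * dphi                               # Eq. 6
    return phi
```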

4 Experimental Results

Dataset and Measurements: The proposed DRLS method is evaluated on BRATS17; the BRATS13 and BRATS15 datasets are additionally used to compare its performance with state-of-the-art techniques. Each dataset is divided into training (80%) and testing (20%) sets. The network is first trained on the 168 HGG and 60 LGG training volumes of BRATS17 and then fine-tuned on the training sets of BRATS15 and BRATS13, respectively. To evaluate the performance of the proposed method, the standard metrics suggested in the BRATS challenge [18] are used, namely Dice, Sensitivity (Sens), and Specificity (Spec).
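
These metrics have their standard definitions; the following minimal sketch assumes binary NumPy masks for the prediction and the ground truth:

```python
import numpy as np

def dice(pred, gt):
    # Dice = 2|P intersect G| / (|P| + |G|) for boolean masks
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def sensitivity(pred, gt):
    # True positive rate: fraction of tumor voxels correctly detected
    tp = np.logical_and(pred, gt).sum()
    return tp / (gt.sum() + 1e-8)

def specificity(pred, gt):
    # True negative rate: fraction of non-tumor voxels correctly rejected
    tn = np.logical_and(~pred, ~gt).sum()
    return tn / ((~gt).sum() + 1e-8)
```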

Results: The algorithm was tested on 42 HGG and 15 LGG cases from BRATS17, 44 HGG and 11 LGG from BRATS15, and 9 HGG and 7 LGG from BRATS13 for comparison with other methods.

Table 1. Performance of DRLS in comparison with other methods on BRATS13
Table 2. Performance of DRLS in comparison with other methods on BRATS15

Tables 1 and 2 show that the proposed algorithm outperforms the other methods in terms of Dice score and Sensitivity.

Besides metric scores, time consumption is also a key factor. Certain methods, such as Tustison et al. [10], take 100 min per brain to compute predictions, whereas the proposed DRLS, when run on 4 GPUs, has a run time of just 55 s per patient. The algorithm is robust to outliers, runs fast, and consistently improves core tumor segmentation.