
1 Introduction

White matter (WM) fibers are important structures in the brain, connecting its various components, and they are vulnerable to breakdown in a variety of brain diseases. Studying WM fiber bundles brings new insight into disease progression and into the structural network supporting communication in the brain. The brain's neural pathways - or fiber tracts - show complex individual variations in geometry, and they are interspersed with one another, which makes it very difficult to cluster them into anatomically meaningful groups or units for further statistical analysis.

One commonly used clustering method [9] relies on manual ROI delineation on images of fractional anisotropy (FA), a scalar metric derived from diffusion-weighted MRI. These regions are used to seed whole-brain fiber tractography, and the set of resulting curves - or streamlines - is then grouped into white matter bundles using spectral clustering. Another method uses the Hausdorff distance [4] as a distance metric between two fibers, prior to hierarchical clustering based on the inter-fiber distances. Recently, several unsupervised clustering methods [6, 8, 11, 15] have been proposed. Though mathematically elegant, these methods carry a set of assumptions and thus their own limitations.

With the increasing amount of available data, we can build a completely data-driven clustering algorithm using the convolutional neural network (CNN) framework. Although texture-based neural networks were used even in the early years of brain imaging [10], they were not applied to white matter clustering. Recently, CNNs have been used extensively in the computer vision community for object detection, clustering, and segmentation [13]. Deep learning methods offer several attractive properties, the most important being automatic feature selection: a well-designed CNN should extract the most discriminative features for a given task. Other attractive properties of CNNs include re-usability and scalability.

To use the CNN framework, we introduce a volumetric parameterization technique that transforms the brain into a topologically equivalent spherical domain. In computational anatomy, few algorithms have been devoted to surface [5, 14] and volumetric parameterization. In this paper, we develop a novel method that parameterizes the entire volume of the brain and every structure contained in it. We then use the parameterized coordinates of the tracts to cluster the white matter fibers.

2 Harmonic Function

We parameterize the 3D volume using a potential function \(\phi \) with the harmonic property. A harmonic function is a \(C^2\) continuous function that satisfies Laplace's equation. Harmonic functions can be used to establish a bijective mapping between the brain and a topologically equivalent spherical shape. If \({\phi }:U{\rightarrow }R\), where \(U{\subseteq } R^{n}\) is some domain and \(\phi \) is some function defined over U, then \(\phi \) is harmonic if its Laplacian vanishes over U, i.e., \({\nabla }^2\phi =0\). In the Cartesian coordinate system, we can write

$$\begin{aligned} {\nabla }^2{\phi }=\sum _{i=1}^{3}\frac{{\partial }^2\phi }{{\partial }x_i^{2}}=0 \end{aligned}$$
(1)

where \(x_i\) is the \(i^{th}\) Cartesian coordinate.

2.1 Defining a Shape-Center

Ideally, the shape-center should be located at approximately the same anatomical location across subjects. We place this point at (98, 113, 111) in the standard MNI template. The corresponding point in each subject is located using a linear registration process: the B\(_0\) image is rigidly registered to the T1 image, and the T1 image to the MNI template. The two transformations are combined and inverted; applying the inverse transformation to the aforementioned point yields the shape-center in the subject's space.

2.2 Boundary Conditions

We apply Dirichlet boundary conditions on the shape-center and the boundary surface \((\partial {U})\): the function \(\phi \) is set to 1 at all boundary points and to 0 at the shape-center. These values remain fixed throughout the computation. All remaining points inside the brain are assigned random values between 0 and 1 as an initial condition.

2.3 Potential Computation

An iterative finite difference scheme is used to solve Laplace's equation. If \(\phi (x,y,z)\) is a harmonic function, its second derivatives can be approximated using Taylor series expansions; substituting these into Eq. 1 and solving for the central value gives

$$\begin{aligned} \phi (x_i, y_i, z_i)&= \frac{1}{2\left(\frac{1}{h^2}+\frac{1}{k^2}+\frac{1}{l^2}\right)}\Bigg [\frac{{\phi }(x_{i+1}, y_i, z_i) + {\phi }(x_{i-1}, y_i, z_i)}{h^2} \\&\qquad + \frac{{\phi }(x_{i}, y_{i+1}, z_i) + {\phi }(x_{i}, y_{i-1}, z_i)}{k^2} + \frac{{\phi }(x_{i}, y_i, z_{i+1}) + {\phi }(x_{i}, y_i, z_{i-1})}{l^2}\Bigg ] \end{aligned}$$

where \(h\), \(k\) and \(l\) are the grid resolutions in the \(x\), \(y\) and \(z\) directions respectively; for an isotropic unit grid, this reduces to the average of the six face neighbors. The potential values are updated iteratively until the maximum difference between two successive iterations falls below a threshold \(\zeta \) \(({<}\mathrm {10^{-6}})\).
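For concreteness, a minimal NumPy sketch of this relaxation (Jacobi) scheme is given below, assuming an isotropic unit grid (\(h=k=l=1\)) and a brain volume that does not touch the array border; the function and variable names are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def solve_laplace(phi, fixed_mask, zeta=1e-6, max_iter=10000):
    """Relax phi toward a harmonic function inside the brain.

    phi        : 3-D float array, random in (0, 1) inside the brain,
                 1 on the boundary surface and 0 at the shape-center.
    fixed_mask : boolean array, True where phi is held fixed
                 (boundary surface and shape-center).
    """
    for _ in range(max_iter):
        # Average of the six face neighbors (7-point stencil).
        # np.roll wraps at the array edges, which is harmless as long
        # as the brain does not touch the border.
        new_phi = (
            np.roll(phi, 1, axis=0) + np.roll(phi, -1, axis=0) +
            np.roll(phi, 1, axis=1) + np.roll(phi, -1, axis=1) +
            np.roll(phi, 1, axis=2) + np.roll(phi, -1, axis=2)
        ) / 6.0
        new_phi[fixed_mask] = phi[fixed_mask]  # Dirichlet values stay put
        if np.max(np.abs(new_phi - phi)) < zeta:
            return new_phi
        phi = new_phi
    return phi
```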

2.4 Computing Potential-Flow Lines

Streamlines, or potential flow lines, are orthogonal to the equipotential surfaces created in the previous step. Each streamline starts from a boundary point on the brain surface and progresses toward the designated shape-center, approaching it at unique angle(s) that remain constant along the streamline; this property holds by construction. The streamlines are computed by solving the following differential equation,

$$\begin{aligned} \frac{{\partial }X}{{\partial }t}=-{\eta }{\nabla }{\phi }[X(t)] \end{aligned}$$
(2)

where \(X=[x,y,z]^{T}\) is the coordinate vector and \(\eta \) is the normalization constant. MATLAB’s ode23 routine is used to solve the system of differential equations.
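As a sketch, the same integration can be carried out with SciPy's RK23 solver (the counterpart of MATLAB's ode23); the gradient interpolation and the stopping radius below are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import RegularGridInterpolator

def trace_streamline(phi, start, center, t_max=1e4):
    # Interpolators for each component of grad(phi) on the voxel grid.
    axes = tuple(np.arange(n) for n in phi.shape)
    grad = [RegularGridInterpolator(axes, g, bounds_error=False,
                                    fill_value=0.0)
            for g in np.gradient(phi)]

    def rhs(t, X):
        g = np.array([f(X)[0] for f in grad])
        n = np.linalg.norm(g)
        return -g / n if n > 1e-12 else np.zeros(3)  # unit-speed descent

    def near_center(t, X):            # stop ~1 voxel from the shape-center
        return np.linalg.norm(X - center) - 1.0
    near_center.terminal = True

    sol = solve_ivp(rhs, (0.0, t_max), np.asarray(start, float),
                    method='RK23', max_step=0.5, events=near_center)
    return sol.y.T                    # (n_steps, 3) points along the line
```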

2.5 Parameterizing the Brain

Each streamline originating from a boundary point approaches the shape-center at a unique angle of approach, and these angles remain constant along the streamline. For three-dimensional objects, the angle of approach is characterized by the elevation (\(\theta \)) and azimuthal (\(\psi \)) angles. The vector between the shape-center and the end point of the streamline is computed, and the angles are obtained from the Cartesian-to-spherical coordinate transformation

$$\begin{aligned} \psi =\mathrm {atan2}(y,x);\;\;\; \theta = \mathrm {atan2}(z, \sqrt{x^2 + y^2}) \end{aligned}$$
(3)

The streamlines intersect the equipotential surfaces at right angles (see Fig. 1). Each point of intersection generates a tuple \([\phi , \theta , \psi ]^{T}\) for the corresponding Cartesian coordinates \([x,y,z]^T\).
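In code, Eq. 3 amounts to two atan2 calls on the vector from the shape-center to the streamline's end point; a minimal sketch (names are illustrative):

```python
import numpy as np

def approach_angles(streamline, shape_center):
    # Vector from the shape-center to the streamline's end point.
    x, y, z = streamline[-1] - np.asarray(shape_center, float)
    psi = np.arctan2(y, x)                    # azimuthal angle
    theta = np.arctan2(z, np.hypot(x, y))     # elevation angle
    return theta, psi
```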

Fig. 1.

Left: 3D view of different equipotential surfaces, shown in different colors. The streamlines emanating from the surface approach the shape-center at unique polar and azimuthal angles; these angles of approach remain constant along the streamlines, which intersect the surfaces at right angles. Middle: the equivalent spherical shape; the streamlines are radial lines emanating from the surface. Right: location of the shape-center on the MNI template.

3 Mapping the White Matter Fibers

After the whole brain is parameterized as above, each fiber tract is mapped to the new coordinate system, i.e., into the spherical space. At this stage, we have a bijective mapping between the Cartesian coordinates of every voxel in the brain and the newly computed coordinates. A KD-tree is built over the native brain coordinates, storing \(\phi \), \(\theta \) and \(\psi \) for each voxel. For every point on a fiber tract, the algorithm finds the ten nearest voxels and computes a weighted average of their coordinates to obtain the corresponding coordinate in the target domain. This process establishes the mapping of fibers into the target spherical domain.
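A minimal sketch of this lookup, assuming SciPy's cKDTree stands in for the authors' KD-tree and inverse-distance weights for the (unspecified) weighted average:

```python
import numpy as np
from scipy.spatial import cKDTree

def map_fiber(fiber_xyz, voxel_xyz, voxel_ptp, k=10, eps=1e-8):
    """fiber_xyz : (n_points, 3) tract points in native coordinates.
    voxel_xyz : (n_voxels, 3) native coordinates of parameterized voxels.
    voxel_ptp : (n_voxels, 3) corresponding [phi, theta, psi] tuples.
    """
    tree = cKDTree(voxel_xyz)
    dists, idx = tree.query(fiber_xyz, k=k)   # 10 nearest voxels per point
    w = 1.0 / (dists + eps)                   # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of the neighbors' spherical coordinates.
    # Note: a naive average ignores angular wrap-around at +/- pi.
    return np.einsum('nk,nkc->nc', w, voxel_ptp[idx])
```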

4 Data Pre-processing and Fiber Tracking

Each diffusion scan has 46 volumes: 5 T2-weighted B\({}_0\) volumes and 41 diffusion-weighted volumes, with a voxel size of \(1.36\,\times \,1.36\,\times \,2.7\,\mathrm {mm^3}\). The scans were acquired on a GE 3.0T scanner using echo planar imaging with parameters TR/TE = 9050/62 ms. The raw DWI volumes are aligned to the first b\({}_{0}\) image using FSL's eddy_correct [12] to correct for head motion and eddy current distortions. The corresponding T1 image is skull-stripped and a brain mask is calculated. The B\({}_0\) and T1 images are registered using an affine registration, and the resulting inverse transformation is used to transfer the mask to the diffusion image space. Diffusion tensors and whole-brain probabilistic tractography are computed using the Camino toolkit [3]. Voxels with fractional anisotropy (FA) greater than 0.2 are chosen as seed points. The maximum fiber turning angle is set to 60\({}^\circ \)/voxel, and tracing stops at any voxel with FA < 0.2. As a pre-filtering step, any fiber tract shorter than 35 mm is eliminated; we arrived at this value by inspecting our training dataset. Any tract whose endpoints lie in unlikely ROIs is also categorized as noisy.

4.1 Data Augmentation

One challenge in multi-class classification problems like the one presented here is an uneven number of training samples per class. Ideally, the training set should contain an equal number of samples per class to avoid bias against any particular class; however, WM fiber bundles are naturally unbalanced. To even out the distribution of training samples, we create copies of the original dataset by convolving the tracts with three 1-D Gaussian filters along the \(x\), \(y\) and \(z\) axes, producing noisy variants of the tracts in the training dataset (see the sketch below). In addition, the order of points in each tract is flipped, which makes training invariant to the order of points along a tract. This augmentation is applied only to the manually clustered bundles, ensuring that the noisy fibers in the training set are not augmented.
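The augmentation step can be sketched as follows, with an illustrative filter width (the paper does not state the Gaussian's standard deviation); each tract is an (n_points, 3) array:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def augment_tract(tract, sigma=1.0):
    # Filter each coordinate channel along the tract with a 1-D Gaussian
    # to create a perturbed ("noisy") variant of the original tract.
    noisy = gaussian_filter1d(tract, sigma=sigma, axis=0)
    variants = [tract, noisy]
    # Flip the point order so training is invariant to tract orientation.
    variants += [v[::-1].copy() for v in variants]
    return variants
```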

5 Network Architecture

The augmented data is mapped to the spherical domain as described in Sect. 3. We experimented with multiple kernel sizes and layer counts, and present the most successful architecture here. The input is a \(50\,\times \,3\) matrix, i.e., each tract is resampled to 50 points. We initially explored resampling the tracts to 25, 50, 75 and 100 points; more than 50 points showed diminishing returns in accuracy. The network contains two convolutional layers with 32 and 64 feature maps; the number of feature maps is the number of filters in each layer, and the filter size determines the receptive field. A large receptive field can capture higher-order spatial information, at the cost of a larger number of parameters. The non-linearities used are rectified linear units (ReLU) and the hyperbolic tangent function (tanh). To prevent over-fitting we use 80% dropout, which randomly switches off neuronal units in each layer, thereby reducing their influence at any particular iteration during back-propagation. Finally, the last layer is a 'softmax' layer with 17 outputs, one per class. The method is implemented in TensorFlow version r0.11 [1].
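A minimal tf.keras re-creation of this architecture is sketched below (the original used TensorFlow r0.11); where the text does not pin down a detail, such as which layers use ReLU versus tanh or where dropout is applied, the choices here are assumptions.

```python
import tensorflow as tf

def build_fibernet(n_classes=17):
    # Each 50x3 tract is treated as a one-channel 2-D "image".
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), padding='same',
                               activation='relu',     # activation: assumed
                               input_shape=(50, 3, 1)),
        tf.keras.layers.Conv2D(64, (3, 3), padding='same',
                               activation='tanh'),    # activation: assumed
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.8),   # "80% dropout" as stated in the text
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.8),
        tf.keras.layers.Dense(n_classes, activation='softmax'),
    ])
```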

Fig. 2.

Each fiber in the input layer is a \(50\,\times \,3\) matrix. There are two convolutional layers with 32 and 64 feature maps respectively, each using convolutional kernels of size \(3\,\times \,3\). The last convolutional layer is followed by two fully connected (FC) layers of size 128 and 256, and then a softmax (output) layer.

5.1 Training Data and Majority Voting

Out of the 96 subjects, manual segmentation is performed on four randomly selected subjects, which act as the training dataset; each subject's tractography is manually clustered into 17 anatomically relevant fiber bundles. On these four subjects we perform a leave-one-out cross-validation (LOOCV) procedure. To further improve accuracy, we use bootstrap aggregation, or bagging [2]: we sample the data with replacement to obtain 20 sets of 600,000 fibers each, in essence fitting 20 different "weak" classifiers to the dataset. Bagging prevents overfitting, thus accounting for the variability across datasets. For prediction, we take a majority vote over all 20 models (Fig. 2), as sketched below.
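The bagging and voting logic can be sketched as follows, assuming each trained model exposes a Keras-style predict method returning class probabilities:

```python
import numpy as np

def bootstrap_indices(n_total, n_sample=600_000, n_bags=20,
                      rng=np.random.default_rng()):
    # 20 resamples with replacement, one per "weak" classifier.
    return [rng.choice(n_total, size=n_sample, replace=True)
            for _ in range(n_bags)]

def bagged_predict(models, fibers, n_classes=17):
    # Each model votes a class label for every fiber.
    votes = np.stack([m.predict(fibers).argmax(axis=1) for m in models])
    # Majority vote per fiber (ties broken by the lowest class index).
    return np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes).argmax(), 0, votes)
```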

Fig. 3.

The confusion matrices show the classification accuracy of FiberNET. Left: validation set of 4 subjects. Right: the remaining 92 subjects in the dataset, compared against autoMATE [7].

6 Results

The LOOCV results demonstrate the feasibility of our network architecture. We further compare our method to an existing clustering algorithm, autoMATE [7]. Reliability can be assessed using the confusion matrices in Fig. 3, where the diagonal and off-diagonal values represent the true positive and false positive rates of prediction, respectively. On the left, we show the mean prediction accuracy over the four cross-validation subjects, with predicted labels compared against the manual segmentation. On the right, we show the corresponding mean matrix for the 92 remaining subjects, compared against autoMATE. One drawback of autoMATE is that it is very conservative: it rarely mis-classifies tracts, but it is quite likely to miss true positives. The presented method has no such bias. In Fig. 4, the left panel compares FiberNET with manual segmentation, and the right panel compares FiberNET with autoMATE. As can be seen, FiberNET mis-classifies certain tracts on a random test subject from the dataset. However, we would argue that the CNN-based method is much more flexible and gives us the opportunity to search for a better architecture that could address this misclassification. Since there is no linear mapping between the spherical space and the original coordinate system, and the learned features lie in the spherical space, it is difficult at present to comment on the properties of the learned shape features.

Fig. 4.

Comparison of ground truth (top row) versus predicted clusters (bottom row) on a validation subject (left) and a random test subject (right). Six representative tracts illustrate examples of good and average clustering: the right inferior fronto-occipital fasciculus (red), right cortico-spinal tract (orange), right inferior longitudinal fasciculus (green), right cingulum (yellow), right uncinate fasciculus (pink), and right anterior thalamic radiation (blue). Arrows indicate mis-classified tracts.

6.1 Conclusions

In this paper, we presented a volumetric parameterization procedure that defines an intrinsic coordinate system for the brain, and demonstrated its efficacy with an ensembled deep learning approach that clusters WM fibers into anatomically meaningful fiber bundles. To our knowledge, this is the first reliable method for applying deep learning to the fiber clustering problem. Although our method is not 100% accurate across the different sections of the corpus callosum, this is understandable given the very close resemblance between the shapes of these tract bundles. Nonetheless, this is a first step toward solving the clustering problem with a neural network, and we expect the method's reliability to increase as more datasets are added to the model, better capturing the spectrum of variability.