1 Introduction

Brain organization displays high variability across individuals and species. Studying brain connectivity therefore faces the challenge of locating homogeneous regions while accounting for this variability. Different techniques have been proposed to parcellate the brain based on its structural connectivity. However, matching the resulting parcels across different subjects is still an open problem in neuroscience. Even when produced by the same technique, parcellations tend to differ in the number, shape, and spatial localization of parcels across subjects [8]. Current theories hold that long-range structural connectivity, namely, extrinsic connectivity, is strongly related to brain function [14]. Therefore, being able to match parcels with similar connectivity across subjects can help to understand brain function while also enabling the comparisons of cortical areas across different species [9].

Most of the current methods to match parcels across subjects are strongly linked to the technique used to create them. For example, Moreno-Dominguez et al. [11] seek correspondences between dendrograms created by means of Hierarchical Clustering. Parisot et al. [13] impose the consistence of parcels across subjects while creating the parcellation. In recent works Mars et al. propose to use the Manhattan distance, cosine similarity [10] or the Kullback–Leibler (KL) divergence [9] to compare and match connectivity fingerprints, successfully identifying common areas across humans and primates.

In this work, we propose to match parcels based on their extrinsic connectivity fingerprint using Optimal Transportation theory. Optimal Transport (OT) is a technique that seeks the optimal way to transport mass between probability distributions. While KL divergence computes the difference between two distributions, OT computes a matching between them. In particular, our method adopts a discrete regularized version of Optimal Transport (OT), which has been presented in Gayraud et al. [6] and Courty et al. [2] as a solution to the domain adaptation problem.

Fig. 1.
figure 1

From the cortico-cortical structural connectivity matrix of a subject, we can estimate the connectivity fingerprints of each parcel in three different types of parcellations. For each parcellation we compute the amount of correct matches (green lines) that each matching technique produces.

We validate our method with four different experiments. In the first experiment, we test the feasibility of our method by generating parcels with synthetic connectivity fingerprints and matching them. In the second one, we show that our technique is able to match parcels of the same atlas across subjects. We use the anatomical atlas of Desikan [4] as its parcels have high spatial coherence and consistent connectivity profiles across subjects [16]. Finally, we show the capacity of our method to match parcels generated with the same criteria but have some spatial cross-subject variability. We assess this for two different situations. In the first one, we derive the parcels from functional activations [1]. We use responses to motor and visual stimuli since they have been shown to be strongly related to structural connectivity [12, 15]. In the second one, we divide the Lateral Occipital Gyrus in 3 parcels using a structurally-based parcellation technique [5]. We use the Lateral Occipital Gyrus since it has been shown to have a consistent parcellation across subjects [5, 17]. The outline of the last three experiments can be seen in Fig. 1.

In each experiment, we compare our technique against three other ways to match parcels based on the Euclidean distance; the cosine similarity; and the Kullback-Leibler divergence. Our results on real data show that our method based on OT always achieves the highest number of correct matches.

2 Methods

Given two subjects with their respective parcellations, we compute their parcel matching by considering one as the origin and the other one as target. More formally, let \(X^a = \{x^a_i\}_{i=1}^{N_a}\), \(x^a_i \in \varOmega ^a \subset \mathbb {R}^n\) be an origin dataset where \(N_a\) denotes the number of parcels; \(x^a_i\) is the extrinsic connectivity fingerprint of parcel i; and n denotes its dimension. We wish to recover a matching between \(X^a\) and a target dataset \({X^b = \{x^b_i\}_{i=1}^{N_b}}\), \(x^b_i \in \varOmega ^b \subset \mathbb {R}^n\).

In this section, we start by formulating our regularized discrete OT-based method and proceed by presenting three ways of computing this matching that are based on the Euclidean distance; the cosine similarity; and the KL-divergence.

2.1 Discrete Regularized Optimal Transport

Optimal Transport (OT) theory boils down to finding the optimal way to transport or redistribute mass from one probability distribution to another with respect to some cost function. In this work, since the datasets \(X^a\) and \(X^b\) are discrete datasets, we use their empirical probability distributions and apply the discrete formulation of OT [2, 6] to solve the parcel matching problem. A simplified example of how our method proceeds is presented in Fig. 2.

Assume that \(X^a\) and \(X^b\) follow probability distributions \(p_a(x^a)\) and \(p_b(x^b)\), respectively. We suppose that \(X^a\) has undergone a transformation \(\mathbf {T}:\varOmega ^a \rightarrow \varOmega ^b\), such that \(p_b(\mathbf {T}(x^a)) = p_b(x^b)\). We wish to recover \(\mathbf {T}\) and use it to match the parcels of \(X^a\) and \(X^b\). Using discrete regularized OT we compute a transport plan \(\gamma _0\) between these two probability distributions. This transport plan is a doubly stochastic matrix which minimizes a certain transportation cost C over the vectors of \(X^a\) and \(X^b\). In other words, it defines the optimal exchange of mass between the two probability distributions. We use \(\gamma _0\) to compute an estimation \(\hat{\mathbf {T}}\) by selecting the pairs of vectors, i.e., parcels that exchange the most mass.

Since \(p_a(x^a)\) and \(p_b(x^b)\) are not known, we use the corresponding empirical distributions \(\mu _a = \sum _{i=1}^{N^a} p_i^a\delta _{x^a_i}\) and \(\mu _b =\sum _{j=1}^{N^b} p_j^b\delta _{x^b_j}\) instead, where \(p_i^a\) and \(p_j^b\) are the probability masses associated to each sample. However, given that the dimension of our data depends on the number of vertices in the cortical mesh, the curse of dimensionality makes the estimation of \(\mu _a\) and \(\mu _b\) intrinsically difficult. We therefore simply assume a uniform probability distribution over all vectors, \(p_i^a = \frac{1}{N^a}\) and \(p_j^b = \frac{1}{N^b}\). We compute the transport plan \(\gamma _0\) such that, if

$$\begin{aligned} {\mathcal {B} = \big \{ \gamma \in (\mathbb {R}^+)^{N_a\times N_b} \mid \gamma \mathbf {1}_{N_b} = \frac{1}{N^a}\mathbf {1}_{N_a}, \gamma ^{\mathbf {T}} \mathbf {1}_{N_a} = \frac{1}{N^b}\mathbf {1}_{N_b} \big \}} \end{aligned}$$
(1)

denotes the set of all doubly stochastic matrices whose marginals are the probability measures \(\mu _a\) and \(\mu _b\), where \(\mathbf {1}_N\) is an N-dimensional vector of ones, then \(\gamma _0 \in \mathcal {B}\) is the output of the following minimization problem.

$$\begin{aligned} \gamma _0 = \mathop {{{\mathrm{arg\,min}}}}\limits _{\gamma \in \mathcal {B}} ~ \langle \gamma ,C \rangle _F + \lambda \sum _{i,j} \gamma (i,j) ~log~ \gamma (i,j) \end{aligned}$$
(2)

The matrix C, where \(C(i,j) = \Vert x^a_i - x^b_j \Vert ^2_2\), represents the cost of moving probability mass from location \(x^a_j\) to location \(x^b_i\), in terms of their squared Euclidean distance. The rightmost term is a regularization term based on the negative entropy of \(\gamma \) allows us to solve this optimization problem using the Sinkhorn-Knopp algorithm [3] which improves the computation time.

Fig. 2.
figure 2

A 2-d example of using OT to compute the matching between two different datasets. On the left we show the original and target datasets. The real matchings are displayed as green dashed edges. In the middle, the edge densities represent the values of the computed coupling \(\gamma _0\), which denote the amount of mass that is exchanged between vectors \(x_i^a\) and \(x_j^b\). On the right, we see the recovered matching. The blue edges represent the correct matchings, while the red dotted edges represent the incorrect ones.

Matrix \(\gamma _0\) contains information about the exchange of probability mass between the vectors of \(X^a\) and \(X^b\). By construction, this exchange depends on the selected cost function. The choice of the squared euclidean distance is motivated both by the fact that it renders the optimization problem convex and because it will allow the parcels to be matched according to the vicinity of their feature vectors. Hence, the origin feature vectors will distribute their corresponding probability mass to the target feature vectors that are closest to them. Consequently, we define \(\hat{\mathbf {T}}:\varOmega ^a \rightarrow \varOmega ^b\) as \(\hat{\mathbf {T}}(x^a_i) = x^b_{\hat{j}}\) where \(\hat{j} = {{\mathrm{arg\,max}}}_{j} \gamma _0(i,j)\). Therefore, i will be matched to the parcel \(\hat{j}\) that it sent the most mass to.

2.2 Matching Parcels Based on Dissimilarity Between Features

Let \(d(x^a_i,x^b_j)\) be some dissimilarity measure between the elements of \(X^a\) and \(X^b\). Then, we say that parcel i matches parcel j if \({{\mathrm{arg\,min}}}_k d(x^a_i,x^b_k) = j\). We compare three dissimilarity measures against our method. First, we use the Euclidean distance, which can be interpreted as matching the parcel i to the parcel j whose feature vector \(x^b_j\) is the closest to \(x^a_i\). Then, we use the cosine similarity, which is minimized when two feature vectors are colinear. Lastly, we use the Kullback-Leibler divergence, which measures the difference between two probability distributions in terms of their relative entropy. Note that we need to convert our vectors into probability vectors in order to evaluate \(d_{KL}\).

3 Experiments and Results

3.1 Data and Preprocessing

For this work we randomly selected 20 subjects from the S500 group of the Human Connectome Project (HCP), all preprocessed with the HCP minimum pipeline [7]. Fiber orientation distributions functions where computed using spherical constrained deconvolution with a spherical harmonic order of 8. Probabilistic tractography was then performed using 1000 seeds per vertex of the cortical mesh provided with the HCP data. For each subject, we computed a connectivity matrix by counting the number of streamlines that connect each pair of vertices of the cortical mesh. Each row in the matrix is a vertex connectivity vector, representing the probability that a connection exists between a surface vertex and the rest of the surface’s vertices.

Given a whole brain cortical parcellation, we compute the connectivity fingerprint of each parcel by averaging the connectivity fingerprint of its vertices. Because the mesh’s vertices are coregistered across subjects [7], we are able to compare the connectivity fingerprints across subjects. The criterion to compute the parcel matching between two subjects is the similarity between connectivity fingerprints. That is, we match two parcels if they are connected to the rest of the brain in a similar manner. Due to the distance bias that occurs in tractography, a parcel tends to be highly connected to the vertices that compose it. To prevent the matching to be influenced by this bias, we disconnect each parcel from its own vertices.

3.2 Matching Parcels

In this section we evaluate the performance of our method by comparing it to the methods presented in Sect. 2.2. For each experiment we compute parcel matchings between all possible pairs of connectivity matrices. To quantify the result of each technique, we compute the accuracy in terms of percentage of correctly matched parcels per pairwise matching.

Matching Parcels with Synthetic Fingerprints. In this first experiment, we test the feasibility of our method by generating parcels with synthetic connectivity fingerprints and matching them. We start by generating a connectivity matrix M using probabilistic Constrained Spherical Deconvolution based tractography to use as ground truth. Our ground truth matrix is a square matrix that represents the connectivity between the 64 parcels of the Desikan atlas in one subject of the HCP dataset. Each coefficient \(M(i,j) = \theta _{ij}\) is the parameter of a random variable that follows a Bernoulli distribution \(X_{ij} ~ B(\theta _{ij})\). This variable \(X_{ij}\) represents the probability of a connection existing between the parcels i and j. Using M, we generate 20 synthetic matrices in such a way that the coefficients of each synthetic connectivity matrix are random variables that follow a binomial distribution \(X(i,j) \sim B(p=M(i,j),n)\). By doing this we simulate doing tractography for various values of the number n of particles. Figure 3a shows the performance of each method as a function of n.

Fig. 3.
figure 3

Proportion of parcels correctly matched by each method (see Sect. 2.2) when matching: (a) synthetic connectivity fingerprints and (b) connectivity fingerprints of a cortical parcellation, for three different parcellations (as described in Sect. 3.2). OT always performs significantly better.

Matching Parcels of the Desikan Atlas. For each subject, we compute the connectivity fingerprint of each parcel in their Desikan atlas as explained in Sect. 3.1. When matching parcels across subjects, Fig. 3b shows that on average OT achieves an accuracy of \(98\%\pm 2\%\), followed by cosine similarity \((94\%\pm 3\%)\), KL divergence \((87\%\pm 4\%)\), and finally Euclidean distance \((77\%\pm 11\%)\).

Matching Parcels Created Using Functional criteria. Each subject in the HCP dataset possesses z-score maps representing responses to different stimuli obtained with functional MRI (fMRI) [1]. We derive parcels for each subject from the responses to motor (hand, foot and tongue movement) and visual stimuli (faces vs shape recognition). We do so by keeping only the vertices whose z-score is in the top 35%. Figure 3b shows that OT performs best with an average of \(98\%\pm 6\%\). The cosine similarity, KL divergence, and Euclidean distance achieve average accuracies of \(97\%\pm 6\%\), \(92\%\pm 10\%\), and \(90\%\pm 13\%\) respectively.

Matching Parcels Created Using Structural criteria. For each subject, we first mask their Lateral Occipital Gyrus using the Desikan atlas. Then, we divide it into 3 parcels using the structural based parcellation technique of Gallardo et al. [5]. Once more, we can see on Fig. 3b that optimal transport has the highest average accuracy, equal to \(92\%\pm 16\%\). It is followed by the cosine similarity, the KL divergence, and the Euclidean distance, whose average accuracies equal \(85\%\pm 17\%\), \(84\%\pm 17\%\), and \(75\%\pm 17\%\)

4 Discussion

In this work we proposed a method to match parcels across subjects based on the connectivity fingerprint of a parcel.

We tested our method with four different experiments. In the first experiment our technique correctly matched connectivity fingerprints created in a synthetic way. Specifically, each entry in a fingerprint was sampled from a Binomial distribution, whose parameter was chosen as the corresponding value of a ground truth connectivity matrix. This can be thought as a simulation of the process of tracking in tractography with different number of streamlines.

Our second experiment shows that we can correctly match parcels of the Desikan atlas across subjects with a 98% of correct matches. The parcels of the Desikan atlas are known to have high spatial coherence and consistent connectivity profiles across subjects [16]. We therefore use this experiment as a reference point to benchmark our technique. The last two experiments show that our technique can match parcels generated with a same criteria, even when they have some spatial variability across-subjects. The first experiment uses parcels created from the functional response to specific motor and visual stimuli, known to be strongly linked to functional connectivity [12, 15]. The second one, parcels created from the structural parcellation of the Lateral Occipital Gyrus, a structure documented to have a consistent structural division [5, 17].

It’s important to notice that our technique achieved more than a 90% of correct matches in every experiment with real data. Given that we used 20 subjects, this represents a total of \(20\times 19=380\) cross-subject matches. In the case of the Desikan atlas, which possesses 64 parcels, this translates into a total of 24320 matches, from which 98% where correctly matched. Furthermore, when tested with a paired t-test to compare the number of correct matches, our method always performs significantly better than the other three (\(p<10^{-256}\)).

5 Conclusion

Matching structural parcels across different subjects is an open problem in neuroscience. In this work, we proposed a novel parcel matching method based on Optimal Transport. We tested its performance with four different experiments, always obtaining the highest number of correctly matched parcels, which is an improvement over the results of the currently used techniques. Our technique could have major implications in the study of brain connectivity and its relationship with brain function, allowing for the location of parcels with similar connectivity but not high spatial coherence. Also, it could help to understand the link between different brain atlases, and improve the comparisons of cortical areas between higher primates.