
Sliced and Radon Wasserstein Barycenters of Measures


Abstract

This article details two approaches to compute barycenters of measures using 1-D Wasserstein distances along radial projections of the input measures. The first method makes use of the Radon transform of the measures; the second is the solution of a convex optimization problem over the space of measures. We establish several properties of these barycenters and explain their relationship. We present numerical approximation schemes based on a discrete Radon transform and on the resolution of a non-convex optimization problem. We explore the respective merits and drawbacks of each approach on applications to two image processing problems: color transfer and texture mixing.


Notes

  1. https://github.com/gpeyre/2014-JMIV-SlicedTransport

References

  1. Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011)

  2. Averbuch, A., Coifman, R., Donoho, D., Israeli, M., Shkolnisky, Y., Sedelnikov, I.: A framework for discrete integral transformations: II. The 2D discrete Radon transform. SIAM J. Sci. Comput. 30(2), 785–803 (2008)

  3. Benamou, J.D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)

  4. Benamou, J.D., Froese, B.D., Oberman, A.M.: A viscosity solution approach to the Monge–Ampère formulation of the Optimal Transportation Problem. arXiv:1208.4873v2 (2013, unpublished)

  5. Bertsekas, D.: The auction algorithm: a distributed relaxation method for the assignment problem. Ann. Operat. Res. 14, 105–123 (1988)

  6. Bigot, J., Klein, T.: Consistent estimation of a population barycenter in the Wasserstein space. Preprint arXiv:1212.2562v3 (2014)

  7. Boman, J., Lindskog, F.: Support theorems for the Radon transform and Cramér–Wold theorems. J. Theor. Prob. 22(3), 683–710 (2009)

  8. Bonneel, N., van de Panne, M., Paris, S., Heidrich, W.: Displacement interpolation using Lagrangian mass transport. ACM Trans. Graph. (SIGGRAPH ASIA'11) 30(6), 1–12 (2011)

  9. Brady, M.L.: A fast discrete approximation algorithm for the Radon transform. SIAM J. Comput. 27(1), 107–119 (1998)

  10. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. arXiv:1310.4375v1 (2013, unpublished)

  11. Dellacherie, C., Meyer, P.A.: Probabilities and Potential. Math. Stud. 29. North-Holland, Amsterdam (1978)

  12. Delon, J.: Movie and video scale-time equalization application to flicker reduction. IEEE Trans. Image Process. 15(1), 241–248 (2006)

  13. Desolneux, A., Moisan, L., Ronsin, S.: A compact representation of random phase and Gaussian textures. In: Proc. the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1381–1384 (2012)

  14. Digne, J., Cohen-Steiner, D., Alliez, P., de Goes, F., Desbrun, M.: Feature-preserving surface reconstruction and simplification from defect-laden point sets. J. Math. Imaging Vis. 48(2), 369–382 (2013)

  15. Ferradans, S., Xia, G.S., Peyré, G., Aujol, J.F.: Optimal transport mixing of Gaussian texture models. In: Proc. SSVM'13 (2013)

  16. Galerne, B., Gousseau, Y., Morel, J.M.: Random phase textures: theory and synthesis. IEEE Trans. Image Process. 20(1), 257–267 (2011)

  17. Galerne, B., Lagae, A., Lefebvre, S., Drettakis, G.: Gabor noise by example. ACM Trans. Graph. (Proceedings of ACM SIGGRAPH 2012) 31(4), 73.1–73.9 (2012)

  18. Haker, S., Zhu, L., Tannenbaum, A., Angenent, S.: Optimal mass transport for registration and warping. Int. J. Comput. Vis. 60(3), 225–240 (2004)

  19. Helgason, S.: The Radon Transform. Birkhäuser, Boston (1980)

  20. Kantorovich, L.: On the transfer of masses. Doklady Akademii Nauk 37(2), 227–229 (1942). (in Russian)

  21. Kuhn, H.W.: The Hungarian method of solving the assignment problem. Naval Res. Logist. Quart. 2, 83–97 (1955)

  22. Matusik, W., Zwicker, M., Durand, F.: Texture design using a simplicial complex of morphable textures. ACM Trans. Graph. 24(3), 787–794 (2005)

  23. McCann, R.J.: A convexity principle for interacting gases. Adv. Math. 128(1), 153–179 (1997)

  24. Mérigot, Q.: A multiscale approach to optimal transport. Comput. Graph. Forum 30(5), 1583–1592 (2011)

  25. Papadakis, N., Peyré, G., Oudet, E.: Optimal transport with proximal splitting. SIAM J. Imaging Sci. 7(1), 212–238 (2014)

  26. Pitié, F., Kokaram, A.C., Dahyot, R.: N-Dimensional probability density function transfer and its application to color transfer. In: Proc. Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 2, pp. 1434–1439 (2005)

  27. Rabin, J., Delon, J., Gousseau, Y.: Removing artefacts from color and contrast modifications. IEEE Trans. Image Process. 20(11), 3073–3085 (2011)

  28. Rabin, J., Peyré, G., Delon, J., Bernot, M.: Wasserstein barycenter and its application to texture mixing. In: Scale Space and Variational Methods in Computer Vision (SSVM’11), vol. 6667, pp. 435–446 (2011).

  29. Reinhard, E., Pouli, T.: Colour spaces for colour transfer. In: Proceedings of the Third International Conference on Computational Color Imaging (CCIW'11), pp. 1–15. Springer, Berlin (2011)

  30. Rubner, Y., Tomasi, C., Guibas, L.: A metric for distributions with applications to image databases. In: IEEE International Conference on Computer Vision (ICCV’98), pp. 59–66 (1998)

  31. Solodov, M.: Incremental gradient algorithms with stepsizes bounded away from zero. Comput. Optim. Appl. 11(1), 23–35 (1998)

  32. Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics Series. American Mathematical Society (2003)

Acknowledgments

We thank Marco Cuturi for applying his method to our dataset and for sharing his results. We thank Thouis R. Jones for useful feedback on our draft, and anonymous reviewers for their help in improving this paper. We also thank the authors of all the images used to demonstrate our color transfers. This work has been partially supported by NSF CGV-1111415. Gabriel Peyré acknowledges support from the European Research Council (ERC project SIGMA-Vision).

Corresponding author

Correspondence to Gabriel Peyré.

Appendices

Appendix 1: Proofs of Section 2

Proof of Proposition 1

From the definition (5), one verifies that

$$\begin{aligned} \text {W}_{\mathbb {R}^d}(\varphi _{s,u} \sharp \mu _1,\varphi _{s,u} \sharp \mu _2) = s \text {W}_{\mathbb {R}^d}(\mu _1,\mu _2), \end{aligned}$$
(49)

so that

$$\begin{aligned} \fancyscript{E}_{s,u}(\mu )&= \sum _{i \in I} \lambda _i \text {W}_{\mathbb {R}^d}( \varphi _{s,u} \sharp \mu _i, \mu )^2\\&= s^2 \sum _{i \in I} \lambda _i \text {W}_{\mathbb {R}^d}( \mu _i, \varphi _{s,u}^{-1} \sharp \mu )^2 = s^2 \fancyscript{E}_{1,0}(\tilde{\mu }), \end{aligned}$$

where we have introduced the following change of variable

$$\begin{aligned} \mu = \varphi _{s,u} \sharp \tilde{\mu } \quad \Longleftrightarrow \quad \tilde{\mu } = \varphi _{s,u}^{-1} \sharp \mu , \end{aligned}$$

(note that \(\varphi _{s,u}^{-1} = \varphi _{s^{-1},-s^{-1}u}\)). One thus has

$$\begin{aligned} \underset{ \mu }{{{\mathrm{argmin}}}}\; \fancyscript{E}_{s,u}(\mu )&= \varphi _{s,u} \sharp \underset{ \tilde{\mu }}{{{\mathrm{argmin}}}}\; \fancyscript{E}_{1,0}(\tilde{\mu }) \end{aligned}$$

which proves (7). Property (8) is proved similarly. Properties (9) and (11) directly follow from  (8). \(\square \)

Proof of Proposition 2

We aim to determine \((s^\star ,u^\star )\) such that

$$\begin{aligned} \mu ^\star \in \text {Bar}_{\mathbb {R}^d}^W(\mu _i,\lambda _i)_{i \in I} \quad \text {where} \quad \left\{ \begin{array}{l} \mu ^\star = \varphi ^\star \sharp \mu , \\ \mu _i = \varphi _i \sharp \mu , \end{array} \right. \end{aligned}$$

and where for simplicity we have denoted \(\varphi _i = \varphi _{s_i,u_i}\) and \(\varphi ^\star = \varphi _{s^\star ,u^\star }\). First, let us notice that

$$\begin{aligned} \varphi _{s,u}(x) = \nabla \left( \frac{s}{2}|| x+u/s ||^2 \right) , \end{aligned}$$

so that the set \(\fancyscript{T}\) of maps of the form \(\varphi _{s,u}\) is a subset of gradients of convex functions. This point is important since optimal maps between \(\mu _i\) and \(\mu ^\star \) are characterized as gradients of convex functions that push forward \(\mu _i\) onto \(\mu ^\star \); see [32]. Following [1], we thus only need to show that

$$\begin{aligned}&\sum _{i \in I} \lambda _i T_i = \mathrm {Id}_{\mathbb {R}^d} \quad \text {where} \quad T_i = \varphi ^\star \circ \varphi _i^{-1} = \varphi _{ \tilde{s}_i, \tilde{u}_i }\\&\quad \text {where} \quad \left\{ \begin{array}{l} \tilde{s}_i = s^\star s_i^{-1} \\ \tilde{u}_i = u^{\star }-s^\star s_i^{-1} u_i \end{array} \right. \end{aligned}$$

since \(T_i \sharp \mu _i = \mu ^\star \) and each \(T_i \in \fancyscript{T}\) is the gradient of a convex function. It follows that \(\mu ^\star \) is a barycenter if and only if

$$\begin{aligned} \sum _{i \in I} \lambda _i T_i&= \sum _{i \in I} \lambda _i \varphi _{\tilde{s}_i, \tilde{u}_i} \\&= \varphi _{\sum _{i \in I} \lambda _i \tilde{s}_i, \sum _{i \in I} \lambda _i \tilde{u}_i } = \mathrm {Id}_{\mathbb {R}^d} = \varphi _{1,0}. \end{aligned}$$

This in turn is equivalent to the relationships

$$\begin{aligned} \sum _{i \in I} \lambda _i \tilde{s}_i = 1 \quad \text {and} \quad \sum _{i \in I} \lambda _i \tilde{u}_i = 0, \end{aligned}$$

which corresponds to (14). \(\square \)
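As an illustration (not part of the original paper), the two conditions (14) can be solved in closed form for \((s^\star ,u^\star )\). The following Python sketch, with hypothetical helper names of our own, does exactly that and verifies the two relations numerically:

```python
import numpy as np

# Solve sum_i lam_i * (s*/s_i) = 1 and sum_i lam_i * (u* - (s*/s_i) u_i) = 0,
# i.e. the conditions (14) written with s~_i = s*/s_i, u~_i = u* - (s*/s_i) u_i.
def barycenter_of_similitudes(lam, s, u):
    lam, s, u = map(np.asarray, (lam, s, u))
    s_star = 1.0 / np.sum(lam / s)
    u_star = s_star * np.sum((lam / s)[:, None] * u, axis=0)
    return s_star, u_star

lam, s = np.array([0.5, 0.3, 0.2]), np.array([1.0, 2.0, 4.0])
u = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
s_star, u_star = barycenter_of_similitudes(lam, s, u)
s_t = s_star / s                      # the s~_i of the proof
u_t = u_star - s_t[:, None] * u       # the u~_i of the proof
assert np.isclose(lam @ s_t, 1.0) and np.allclose(lam @ u_t, 0.0)
```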

Proof of Proposition 3

The proof is given in [1] for \(\mu = \mu _j\) for some \(j \in I\), which is assumed to be absolutely continuous. It extends to an arbitrary measure \(\mu \). \(\square \)

Proof of Corollary 1

With the notation of Proposition 3, when \(\mu \) is the uniform (normalized) measure on \([0,1]\), one has \(T_i = C_{\mu _i}^+\). This is indeed a classical result for 1-D optimal transport; see for instance [1], Section 6.1. One then recognizes that formula (18) is the same as formula (15). \(\square \)
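In the discrete setting where each \(\mu _i\) is an empirical measure with \(N\) uniform atoms, the maps \(C_{\mu _i}^+\) reduce to sorting, and the barycenter is a weighted average of sorted samples. A minimal Python sketch of this construction (our illustration, under these assumptions):

```python
import numpy as np

def wasserstein_barycenter_1d(point_sets, lam):
    # Each 1-D empirical measure is given by its N samples; sorting evaluates
    # the inverse cumulative functions, which are then averaged with weights lam.
    return sum(l * np.sort(x) for l, x in zip(lam, point_sets))

x1 = np.random.randn(100)                # samples of mu_1
x2 = 3.0 + 2.0 * np.random.randn(100)    # samples of mu_2
bar = wasserstein_barycenter_1d([x1, x2], [0.5, 0.5])  # samples of the barycenter
```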

Proof of Proposition 4

One has

$$\begin{aligned} \nu ^\star&\in \underset{\nu \in \bar{\fancyscript{M}}_1^+(\varOmega ^d)}{{{\mathrm{argmin}}}}\; \sum _{i \in I} \lambda _i \text {W}_{\varOmega ^d}(\nu _i,\nu )^2 \\&= \underset{\nu \in \bar{\fancyscript{M}}_1^+(\varOmega ^d)}{{{\mathrm{argmin}}}}\; \int _{\mathbb {S}^{d-1}} \sum _{i \in I} \lambda _i \text {W}_{\mathbb {R}}( \nu _i^\theta , \nu ^\theta )^2 \mathrm {d}\theta . \end{aligned}$$

This is equivalent to the fact that for almost all \(\theta \in \mathbb {S}^{d-1}\), one has

$$\begin{aligned} \nu ^{\star , \theta } \in \text {Bar}_{\mathbb {R}}^W(\nu _i^\theta ,\lambda _i)_{i \in I}. \end{aligned}$$

\(\square \)
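Numerically, Proposition 4 means that once the measures on \(\varOmega ^d\) are discretized with \(N\) atoms per direction, each fiber \(\theta \) can be treated independently with the 1-D sorting construction above. A sketch under those assumptions (the array layout is ours):

```python
import numpy as np

def fiberwise_barycenter(nu, lam):
    # nu has shape (n_measures, n_dirs, N): nu[i, t] holds the N atoms of the
    # 1-D measure nu_i^theta_t.  Each fiber theta_t is handled on its own.
    sorted_nu = np.sort(nu, axis=-1)          # 1-D optimal coupling = sorting
    lam = np.asarray(lam)[:, None, None]
    return np.sum(lam * sorted_nu, axis=0)    # shape (n_dirs, N)
```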

Proof of Proposition 5

Proof of (20). Similarly to the proof of (7), the proof of (20) is obtained by using the following invariance of the Wasserstein distance on \(\varOmega ^d\)

$$\begin{aligned} \text {W}_{\varOmega ^d}( \psi _{s,u} \sharp \nu _1, \psi _{s,u} \sharp \nu _2 ) = s \text {W}_{\varOmega ^d}( \nu _1, \nu _2 ). \end{aligned}$$
(50)

Proof of (21). The statement \(\nu ^\star \in \text {Bar}_{\varOmega ^d}^W( \psi _{s_i,u_i} \sharp \nu , \lambda _i )_{i \in I}\) is equivalent to

$$\begin{aligned} \text {for almost all}\, \theta \in \mathbb {S}^{d-1}, \,\, (\nu ^\star )^\theta \in \text {Bar}_{\mathbb {R}}^W( \varphi _{s_i,\langle u_i,\,\theta \rangle } \sharp \nu ^\theta , \lambda _i )_{i \in I}. \end{aligned}$$

Using Proposition 2 for \(d=1\), one obtains that

$$\begin{aligned} \text {Bar}_{\mathbb {R}}^W( \varphi _{s_i,\langle u_i,\,\theta \rangle } \sharp \nu ^\theta , \lambda _i )_{i \in I} \;\ni \; \varphi _{s^\star ,\langle u^\star ,\,\theta \rangle } \sharp \nu ^\theta , \end{aligned}$$

which gives the desired result. \(\square \)

Appendix 2: Proofs of Section 3

Proof of Proposition 6

For all \(g \in \fancyscript{C}_0(\varOmega ^d)\), one has

$$\begin{aligned}&\int _{\mathbb {S}^{d-1}}\int _{\mathbb {R}} g(t, \theta ) \mathrm {d}(R(\mu )^\theta )(t) \mathrm {d}\theta = \int _{\varOmega ^d}g(t, \theta )\mathrm {d}(R(\mu ))(t, \theta ) \\&\qquad = \int _{\mathbb {R}^d}(R^*g)(x)\mathrm {d}\mu (x) \\&\qquad = \int _{\mathbb {R}^d}\int _{\mathbb {S}^{d-1}} g(P_\theta (x),\theta ) \mathrm {d}\theta \mathrm {d}\mu (x)\\&\qquad = \int _{\mathbb {S}^{d-1}} \int _{\mathbb {R}} g(y,\theta ) \mathrm {d}(P_\theta \sharp \mu )(y) \mathrm {d}\theta . \end{aligned}$$

\(\square \)
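For an empirical measure \(\mu = \frac{1}{N}\sum _k \delta _{X_k}\), Proposition 6 says the fiber \(R(\mu )^\theta = P_\theta \sharp \mu \) is simply the point set of dot products \(\langle X_k,\,\theta \rangle \). A small Python sketch of this (ours), for \(d=2\):

```python
import numpy as np

# Fibers of the Radon transform of an empirical measure: column t of the
# result carries the atoms of P_theta_t # mu, i.e. the projections <X_k, theta_t>.
N, n_dirs = 500, 64
X = np.random.randn(N, 2)
angles = np.pi * np.arange(n_dirs) / n_dirs
thetas = np.stack([np.cos(angles), np.sin(angles)], axis=1)
fibers = X @ thetas.T        # shape (N, n_dirs)
```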

Proof of Lemma 1

Proof of (26): For all \(g \in \fancyscript{C}_0(\varOmega ^d)\), one has

$$\begin{aligned} \int _{\varOmega ^d} g \,\mathrm {d}[ R (\varphi _{s,u} \sharp \mu )]&= \int _{\mathbb {R}^d} R^*(g) \,\mathrm {d}[ \varphi _{s,u} \sharp \mu ] \\&= \int _{\mathbb {R}^d} \int _{\mathbb {S}^{d-1}} g(\langle sx+u,\,\theta \rangle ,\theta ) \,\mathrm {d}\theta \,\mathrm {d}\mu (x) \\&= \int _{\mathbb {R}^d} \int _{\mathbb {S}^{d-1}} (g \circ \psi _{s,u})(\langle x,\,\theta \rangle ,\theta ) \,\mathrm {d}\theta \,\mathrm {d}\mu (x) \\&= \int _{\varOmega ^d} (g \circ \psi _{s,u}) \,\mathrm {d}[R(\mu )] \\&= \int _{\varOmega ^d} g \,\mathrm {d}[ \psi _{s,u} \sharp R(\mu )]. \end{aligned}$$

Proof of (27): First we notice, using (22), that

$$\begin{aligned}&R( f \circ \varphi _{s,u} )(t,\theta ) = \int _{{\mathbb R}^{d-1} } f\left( s(t\theta + U_\theta \gamma ) + u \right) \mathrm {d}\gamma \\&\qquad = \int _{{\mathbb R}^{d-1} } f\left( st\theta + U_\theta s\gamma + \langle u,\,\theta \rangle \theta + U_\theta (U_\theta )^{T}u \right) \mathrm {d}\gamma \\&\qquad = \int _{{\mathbb R}^{d-1} } f\left( (st+\langle u,\,\theta \rangle )\theta + U_\theta (s\gamma + (U_\theta )^{T}u) \right) \mathrm {d}\gamma \\&\qquad = s^{1-d} \int _{{\mathbb R}^{d-1} } f\left( \psi _{s,u} (t,\theta ) \theta + U_\theta \gamma ' \right) \mathrm {d}\gamma ' \end{aligned}$$

which proves

$$\begin{aligned} R( f \circ \varphi _{s,u} ) = s^{1-d} R(f) \circ \psi _{s,u}. \end{aligned}$$
(51)

We write \(H = (R^*R)^{-1}\) for the filtering operator with kernel \(h^+\). For smooth functions \(f \in \fancyscript{S}(\mathbb {R}^d)\), denoting \(\fancyscript{F}(f)=\hat{f}\), one has

$$\begin{aligned} \fancyscript{F}( H(f \circ \varphi _{s,u}) )&= c^{-1}|| \omega ||^{1-d} \hat{f}(s\omega ) e^{-\mathrm {i}\langle \omega ,\,u\rangle }, \\ \fancyscript{F}( H(f) \circ \varphi _{s,u} )&= c^{-1}|| s\omega ||^{1-d} \hat{f}(s\omega ) e^{-\mathrm {i}\langle \omega ,\,u\rangle }, \end{aligned}$$

and hence

$$\begin{aligned} H(f) \circ \varphi _{s,u} = s^{1-d} H(f \circ \varphi _{s,u}). \end{aligned}$$
(52)

Using (51) and (52), this shows that for all \(f \in \fancyscript{D}(\mathbb {R}^d)\),

$$\begin{aligned} \int _{\mathbb {R}^d} f \,\mathrm {d}[ R^+( \psi _{s,u} \sharp \nu ) ]&= \int _{\varOmega ^d} (RHf) \circ \psi _{s,u} \,\mathrm {d}\nu \\&= s^{d-1} \int _{\varOmega ^d} R(H(f) \circ \varphi _{s,u}) \,\mathrm {d}\nu \\&= \int _{\varOmega ^d} RH(f \circ \varphi _{s,u}) \,\mathrm {d}\nu \\&= \int _{\mathbb {R}^d} f \,\mathrm {d}[ \varphi _{s,u} \sharp R^+(\nu ) ]. \end{aligned}$$

Proof of (28): the proof is similar to that of (26). \(\square \)
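Property (26) can also be checked empirically: projecting \(\varphi _{s,u}(x) = sx+u\) onto \(\theta \) gives \(s\langle x,\,\theta \rangle + \langle u,\,\theta \rangle \), i.e. \(\psi _{s,u}\) acting on the fiber \(R(\mu )^\theta \). A quick numerical sanity check (our sketch, not from the paper's code):

```python
import numpy as np

s, u = 2.0, np.array([1.0, -0.5])
X = np.random.randn(200, 2)                  # atoms of an empirical measure mu
theta = np.array([np.cos(0.3), np.sin(0.3)])
lhs = (s * X + u) @ theta                    # fiber of R(phi_{s,u} # mu) at theta
rhs = s * (X @ theta) + u @ theta            # psi_{s,u} applied to the fiber of R(mu)
assert np.allclose(lhs, rhs)
```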

Proof of Proposition 8

Using Lemma 1, one has

$$\begin{aligned} \text {Bar}_{\mathbb {R}^d}^R(\varphi _{s,u} \sharp \mu _i,\lambda _i)_{i \in I}&= R^+ \text {Bar}_{\varOmega ^d}^W(R(\varphi _{s,u}\sharp \mu _i), \lambda _i)_{i \in I}\\&= R^+ \text {Bar}_{\varOmega ^d}^W(\psi _{s,u}\sharp (R(\mu _i)), \lambda _i)_{i \in I}\\&= R^+ \psi _{s,u} \sharp \text {Bar}_{\varOmega ^d}^W(R(\mu _i), \lambda _i)_{i \in I}\\&= \varphi _{s,u} \sharp R^+ \text {Bar}_{\varOmega ^d}^W(R(\mu _i), \lambda _i)_{i \in I}\\&= \varphi _{s,u} \sharp \text {Bar}_{\mathbb {R}^d}^R(\mu _i,\lambda _i)_{i \in I}, \end{aligned}$$

which proves (7) for \(\text {Bar}_{\mathbb {R}^d}^R\). Property (8) for \(\text {Bar}_{\mathbb {R}^d}^R\) is proved similarly using (28). \(\square \)

Proof of Proposition 9

One has

$$\begin{aligned} \text {Bar}_{\mathbb {R}^d}^R(\varphi _{s_i,u_i} \sharp \mu ,\lambda _i)_{i \in I}&= R^+ \text {Bar}_{\varOmega ^d}^W(R(\varphi _{s_i,u_i}\sharp \mu ), \lambda _i)_{i \in I}\\&= R^+ \text {Bar}_{\varOmega ^d}^W( \psi _{s_i,u_i}\sharp R(\mu ), \lambda _i)_{i \in I}\\&= R^+ \psi _{s^\star ,u^\star } \sharp \text {Bar}_{\varOmega ^d}^W(R(\mu ), \lambda _i)_{i \in I}\\&= \varphi _{s^\star ,u^\star } \sharp R^+ \text {Bar}_{\varOmega ^d}^W(R(\mu ), \lambda _i)_{i \in I}\\&= \varphi _{s^\star ,u^\star } \sharp \text {Bar}_{\mathbb {R}^d}^R(\mu ,\lambda _i)_{i \in I}, \end{aligned}$$

which proves (13) for \(\text {Bar}_{\mathbb {R}^d}^R\). \(\square \)

Appendix 3: Proofs of Section 4

Proof of Proposition 10

Property (34) is a re-statement of property (19). Property (35) corresponds to the change of variable \(\nu = R\mu \in \mathrm {Im}(R)\) in (32), which is a bijection thanks to the injectivity of \(R\); see Proposition 7. \(\square \)

Proof of Proposition 11

The proof is the same as that of Proposition 1, replacing the invariance (49) by

$$\begin{aligned} \text {SW}_{\mathbb {R}^d}(\varphi _{s,u} \sharp \mu _1,\varphi _{s,u} \sharp \mu _2)&= \text {W}_{\varOmega ^d}( R( \varphi _{s,u} \sharp \mu _1 ), R( \varphi _{s,u} \sharp \mu _2 ) ) \\&= \text {W}_{\varOmega ^d}( \psi _{s,u} \sharp R( \mu _1 ), \psi _{s,u} \sharp R( \mu _2 ) ) \\&= s \, \text {W}_{\varOmega ^d}( R( \mu _1 ), R( \mu _2 ) ) \\&= s \, \text {SW}_{\mathbb {R}^d}( \mu _1, \mu _2), \end{aligned}$$

where we have used the invariance (50) of the Wasserstein distance on \(\varOmega ^d\). \(\square \)

Proof of Proposition 12

One has,

$$\begin{aligned} \forall \,\theta \in \mathbb {S}^{d-1}, \quad P_\theta \sharp \varphi _{s,u} \sharp \mu = \varphi _{s,\langle u,\,\theta \rangle } \sharp P_\theta \sharp \mu . \end{aligned}$$

Thus, for an arbitrary \(\tilde{\mu } \in \fancyscript{M}_1^+(\mathbb {R}^d)\), one has

$$\begin{aligned}&\sum _{i\in I} \lambda _i \text {W}_{\mathbb {R}}( P_\theta \sharp (\varphi _{s_i,u_i} \sharp \mu ), P_\theta \sharp \tilde{\mu } )^2\\&\qquad = \sum _{i\in I} \lambda _i \text {W}_{\mathbb {R}}( \varphi _{s_i,\langle u_i,\,\theta \rangle } \sharp (P_\theta \sharp \mu ), P_\theta \sharp \tilde{\mu } )^2 \\&\qquad \geqslant \sum _{i\in I} \lambda _i \text {W}_{\mathbb {R}}( \varphi _{s_i,\langle u_i,\,\theta \rangle } \sharp (P_\theta \sharp \mu ), \varphi _{s^\star ,\langle u^\star ,\,\theta \rangle } \sharp (P_\theta \sharp \mu ) )^2 \\&\qquad = \sum _{i\in I} \lambda _i \text {W}_{\mathbb {R}}( P_\theta \sharp (\varphi _{s_i,u_i} \sharp \mu ), P_\theta \sharp (\varphi _{s^\star ,u^\star } \sharp \mu ) )^2 \end{aligned}$$

where the inequality follows from Proposition 2 applied with \(d=1\): \(\varphi _{s^\star ,\langle u^\star ,\,\theta \rangle } \sharp (P_\theta \sharp \mu )\) is a 1-D Wasserstein barycenter of the measures \(\varphi _{s_i,\langle u_i,\,\theta \rangle } \sharp (P_\theta \sharp \mu )\), and hence minimizes the weighted sum of squared 1-D Wasserstein distances. Integrating the resulting inequality with respect to \(\theta \in \mathbb {S}^{d-1}\) gives

$$\begin{aligned}&\sum _i \lambda _i \text {SW}_{\mathbb {R}^d}( \varphi _{s_i,u_i} \sharp \mu , \tilde{\mu } )^2 \\&\quad \geqslant \sum _i \lambda _i \text {SW}_{\mathbb {R}^d}( \varphi _{s_i,u_i} \sharp \mu , \varphi _{s^\star ,u^\star } \sharp \mu )^2. \end{aligned}$$

This inequality is an equality if and only if, for almost all \(\theta \in \mathbb {S}^{d-1}\), one has

$$\begin{aligned} P_\theta \sharp \tilde{\mu } = P_\theta \sharp (\varphi _{s^\star ,u^\star } \sharp \mu ) \end{aligned}$$

so that, using Proposition 7, this corresponds to \(\tilde{\mu } = \varphi _{s^\star ,u^\star } \sharp \mu \). Since the measure \(\tilde{\mu }\) is arbitrary, this gives the desired result. This proves (13) in the case \(\text {Bar}_{\mathbb {R}^d}^S\). \(\square \)

Appendix 4: Proof of Theorem 1

Notations. Without loss of generality, for a fixed \(Y \in \mathbb {R}^{d \times N}\), we study the smoothness of

$$\begin{aligned}&\forall \,X \in \mathbb {R}^{d\times N}, \quad \fancyscript{E}(X) = \frac{1}{2} \text {SW}_{\mathbb {R}^d}(\mu _X,\mu _Y)^2\\&\quad = \int _{\mathbb {S}^{d-1}} \fancyscript{E}_\theta (X) \,\mathrm {d}\theta \\&\quad \text {where} \quad \fancyscript{E}_\theta (X) = \frac{1}{2} \fancyscript{W}(X_\theta ,Y_\theta )^2. \end{aligned}$$

We have used, for \(x, y \in \mathbb {R}^N\), the shorthand notation

$$\begin{aligned} \fancyscript{W}(x,y) = \text {W}_{\mathbb {R}}(\mu _x,\mu _y). \end{aligned}$$

The result of Theorem 1 then follows by summation of such functionals.

We define \(\mathbb {U}(N,d)\) to be the set of point configurations in \(\mathbb {R}^{d \times N}\) with pairwise distinct points:

$$\begin{aligned}&\mathbb {U}(N,d) \nonumber \\&\quad = \left\{ X = (X_1, \ldots , X_N) \in {\mathbb R}^{d \times N} \;;\; \forall \,i \not =j,\, X_i \not = X_j \right\} . \end{aligned}$$

The hypothesis is that \(X \in \mathbb {U}(N,d)\). One has

$$\begin{aligned} \fancyscript{E}_\theta (X) = \frac{1}{2} || X_\theta - Y_\theta \circ \sigma _\theta ||^2 \quad \text {where} \quad \sigma _\theta = \sigma _X^\theta \circ (\sigma _Y^{\theta })^{-1} \end{aligned}$$

is a permutation depending on both \(X\) and \(Y\). Note that the permutations involved are not necessarily unique, and are taken to be arbitrary valid sorting permutations.

For \(X \in \mathbb {R}^{N \times d}\) and \(\varepsilon >0\) we introduce

$$\begin{aligned}&\Theta _\varepsilon (X)\\&\quad = \left\{ \theta \in \mathbb {S}^{d-1} \;;\; \forall \,|| \delta ||_{\mathbb {R}^{N \times d}} \leqslant \varepsilon , \quad X_\theta + \delta _\theta \in \mathbb {U}(N,1) \right\} . \end{aligned}$$

This is the set of directions for which any perturbation of \(X\) of amplitude smaller than \(\varepsilon \) has a projection with pairwise distinct points.

Overview of the proof. In the following, we thus aim to prove that \(\fancyscript{E}\) is \(C^1\), that

$$\begin{aligned}&\tilde{\nabla } \fancyscript{E}(X) = \int _{\mathbb {S}^{d-1}} \tilde{\nabla } \fancyscript{E}_\theta (X) \mathrm {d}\theta \\&\quad \quad \text {where} \quad \tilde{\nabla } \fancyscript{E}_\theta (X) = (X_\theta - Y_\theta \circ \sigma _\theta ) \theta \end{aligned}$$

is indeed equal to \(\nabla \fancyscript{E}(X)\), and that this gradient is Lipschitz continuous.

The general strategy of the proof is to split the integration between the directions \(\theta \in \Theta _\varepsilon (X)\), for which we can locally assume that the permutations \(\sigma _\theta \) are constant (see Lemma 2), which in turn defines a smooth quadratic energy, and the remaining directions in \(\Theta _\varepsilon (X)^c\), which are shown to have a negligible contribution to the energy and to the derivative (see Lemma 3).
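For concreteness, here is a Monte Carlo sketch (ours, not the paper's released code) of \(\fancyscript{E}(X)\) and the candidate gradient \(\tilde{\nabla } \fancyscript{E}(X)\), using the fact that the optimal 1-D assignment along each direction is obtained by sorting. Points are stored as the rows of X and Y (shape (N, d)), transposing the paper's \(d \times N\) convention, and the normalization of the sphere integral is absorbed into the average over sampled directions:

```python
import numpy as np

def sliced_w2_energy_grad(X, Y, n_dirs=128, seed=None):
    """Estimate E(X) = (1/2) SW2(mu_X, mu_Y)^2 and grad E_theta(X) =
    (X_theta - Y_theta o sigma_theta) theta, averaged over random directions."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    thetas = rng.standard_normal((n_dirs, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # points on S^{d-1}
    energy, grad = 0.0, np.zeros_like(X)
    for theta in thetas:
        x_t, y_t = X @ theta, Y @ theta
        sx, sy = np.argsort(x_t), np.argsort(y_t)
        diff = np.zeros(N)
        diff[sx] = x_t[sx] - y_t[sy]        # X_theta - Y_theta o sigma_theta
        energy += 0.5 * np.sum(diff ** 2)   # E_theta(X)
        grad += np.outer(diff, theta)       # grad E_theta(X)
    return energy / n_dirs, grad / n_dirs
```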

Preparatory results. The following lemma shows that if \(\theta \in \Theta _\varepsilon (X)\), the permutation \(\sigma _X^\theta \) is stable under small perturbations of \(X\).

Lemma 2

Let \(X \in \mathbb {U}(N,d)\). For all \(\theta \in \Theta _\varepsilon (X)\), for all \(\delta \) with \(|| \delta ||_{\mathbb {R}^{N \times d}} \leqslant \varepsilon \), the permutation \(\sigma _{X+\delta }^\theta \) that sorts \(( \langle X_i+\delta _i,\,\theta \rangle )_i\) is uniquely defined and satisfies \(\sigma _{X+\delta }^\theta = \sigma _X^\theta \).

Proof

If one has \(\sigma _{X+\delta }^\theta \ne \sigma _X^\theta \), then necessarily there exists some \(t \in [0,1]\) such that \(\sigma _{X+t\delta }^\theta \) is not uniquely defined, which is equivalent to \(X_\theta +t\delta _\theta \) not being in \(\mathbb {U}(N,1)\). Since \(|| t \delta ||_{\mathbb {R}^{N \times d}} \leqslant \varepsilon \), this shows that \(\theta \notin \Theta _\varepsilon (X)\). \(\square \)

In order to prove Theorem 1, we need the following lemma.

Lemma 3

For \(X \in \mathbb {U}(N, d)\), one has

$$\begin{aligned} \text {Vol}(\Theta _\varepsilon (X)^c) = \int _{\Theta _\varepsilon (X)^c} \mathrm {d}\theta = O( \varepsilon ). \end{aligned}$$
(53)

Proof

One has \(X_\theta + \delta _\theta \notin \mathbb {U}(N,1)\) if and only if there exists a pair of points \(u=X_i+\delta _i\) and \(v=X_j+\delta _j\) with \(i \ne j\) such that

$$\begin{aligned}&\theta \in A(u,v)\\&\quad \quad \text {where} \quad A(u,v) = \left\{ \xi \in \mathbb {S}^{d-1} \;;\; \langle \xi ,\,u-v\rangle =0 \right\} \end{aligned}$$

Note that \(A(u,v)\) is a great circle of the sphere \(\mathbb {S}^{d-1}\).

One can thus cover \(\Theta _\varepsilon (X)^c\) using the union of all such circles \(A(u,v)\), which shows

$$\begin{aligned}&\Theta _\varepsilon (X)^c \subset \bigcup _{i \ne j} A_\varepsilon (X_i,X_j) \quad \text {where} \quad A_\varepsilon (x,y) \\&\quad = \bigcup _{ {\begin{matrix} || u-x || \leqslant \varepsilon \\ || v-y || \leqslant \varepsilon \end{matrix}} } A(u,v) \end{aligned}$$

Note that the geodesic distance \(d\) on the sphere \(\mathbb {S}^{d-1}\) between two such circles is equal to the angle between the normals to the planes of the circles

$$\begin{aligned} d(A(u,v),A(x,y))&= \text {Angle}(u-v,x-y) \\&= \text {Angle}(x-y + \varepsilon w,x-y) \end{aligned}$$

where \(|| w ||\leqslant 2\). As \(\varepsilon \rightarrow 0\), after some computations, one has the following asymptotic decay of the angle

$$\begin{aligned} \text {Angle}(x-y + \varepsilon w,x-y) = O(\varepsilon /|| x-y ||) \end{aligned}$$

and thus \(d(A(u,v),A(x,y)) \leqslant C \varepsilon \) for some constant \(C\). This proves that \(\forall \,u,v\), one has

$$\begin{aligned} \left\{ \begin{array}{l} || u-x || \leqslant \varepsilon \\ || v-y || \leqslant \varepsilon \end{array} \right. \quad \Longrightarrow \quad A(u,v) \subset B_{C\varepsilon }(x,y) \end{aligned}$$

for some constant \(C>0\), where

$$\begin{aligned} B_{\varepsilon }(x,y) = \left\{ \xi \in \mathbb {S}^{d-1} \;;\; d(\xi ,A(x,y)) \leqslant \varepsilon \right\} \end{aligned}$$

One thus has

$$\begin{aligned} A_{\varepsilon }(x,y) \subset B_{C\varepsilon }(x,y). \end{aligned}$$

The volume of the spherical band \(B_{C\varepsilon }(x,y)\) of width \(C\varepsilon \) is proportional to \(\varepsilon \), and thus Vol\((A_\varepsilon (x,y)) = O(\varepsilon )\). Since \(\Theta _\varepsilon (X)^c\) is contained in a finite union of such sets, one obtains the result. \(\square \)

Proof of continuity. For each \(\theta \), the function \(\fancyscript{E}_\theta \) is continuous as a minimum of continuous functions. Since \(\fancyscript{E}\) is the integral of \(\fancyscript{E}_\theta \) over the compact set \(\mathbb {S}^{d-1}\), it is continuous as well.

Proof of differentiability. Let \(\delta \in \mathbb {R}^{N \times d}\) and \(\varepsilon = || \delta ||_{\mathbb {R}^{N \times d}}\). The definition of the Wasserstein distance reads

$$\begin{aligned} \fancyscript{W}((X+\delta )_\theta ,Y_\theta )^2 = || (X_\theta + \delta _\theta ) \circ \sigma _{X+\delta }^\theta - Y_\theta \circ \sigma _Y^\theta ||^2. \end{aligned}$$

For all \(\theta \in \Theta _\varepsilon (X)\), thanks to Lemma 2, \(\sigma _{X+\delta }^\theta = \sigma _{X}^\theta \). One can thus compute the variation of the 1-D Wasserstein distance with respect to \(\delta \) as

$$\begin{aligned}&\fancyscript{W}((X+\delta )_\theta ,Y_\theta )^2 = || X_\theta +\delta _\theta - Y_\theta \circ \sigma _\theta ||^2 \end{aligned}$$
(54)
$$\begin{aligned}&\quad = \fancyscript{W}(X_\theta ,Y_\theta )^2 + 2 \langle \tilde{\nabla } \fancyscript{E}_\theta (X) ,\, \delta \rangle _{\mathbb {R}^{N \times d}} + || \delta _\theta ||^2. \end{aligned}$$
(55)

Note that the fact that \(\sigma _Y^{\theta }\) might not be uniquely defined has no impact on the value of (55). One thus has

$$\begin{aligned}&\fancyscript{E}(X+\delta )-\fancyscript{E}(X) - \langle \tilde{\nabla } \fancyscript{E}(X),\,\delta \rangle _{\mathbb {R}^{N \times d}}\\&\quad = A(\delta ) + B(\delta ) + O(|| \delta ||_{\mathbb {R}^{N \times d}}^2) \end{aligned}$$

where

$$\begin{aligned}&A(\delta ) = \frac{1}{2} \int _{\Theta _\varepsilon (X)^c} \left( \fancyscript{W}(X_\theta +\delta _\theta ,Y_\theta )^2 - \fancyscript{W}(X_\theta ,Y_\theta )^2 \right) \mathrm {d}\theta \\&\quad \text {and} \quad B(\delta ) = -\int _{\Theta _\varepsilon (X)^c} \langle \tilde{\nabla } \fancyscript{E}_\theta (X) ,\, \delta \rangle _{\mathbb {R}^{N \times d}} \mathrm {d}\theta . \end{aligned}$$

Note that in the expression of \(B(\delta )\) the permutation \(\sigma _\theta \) involved in \(\tilde{\nabla } \fancyscript{E}_\theta (X)\) is not necessarily unique, and can be chosen arbitrarily.

One has,

$$\begin{aligned} |\langle \tilde{\nabla } \fancyscript{E}_\theta (X) ,\, \delta \rangle _{\mathbb {R}^{N \times d}}| \leqslant || X-Y\circ \sigma _\theta ||_{\mathbb {R}^{N \times d}} || \delta ||_{\mathbb {R}^{N \times d}} \end{aligned}$$

which implies, using Lemma 3

$$\begin{aligned}&|B(\delta )| \leqslant O( \text {Vol}(\Theta _\varepsilon (X)^c) || \delta ||_{\mathbb {R}^{N \times d}} )\nonumber \\&\quad = O(|| \delta ||_{\mathbb {R}^{N \times d}}^2) = o(|| \delta ||_{\mathbb {R}^{N \times d}}). \end{aligned}$$
(56)

The map \((\theta , \tilde{X}) \mapsto \fancyscript{E}_\theta (\tilde{X})\) is continuous, hence uniformly continuous on the compact set \(\mathbb {S}^{d-1} \times \bar{B}(X,\varepsilon )\), and thus

$$\begin{aligned} |\fancyscript{W}(X_\theta +\delta _\theta ,Y_\theta )^2 - \fancyscript{W}(X_\theta ,Y_\theta )^2| \leqslant C(\delta ) \end{aligned}$$

where \(C(\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\). This shows that

$$\begin{aligned} |A(\delta )| \leqslant \text {Vol}(\Theta _\varepsilon (X)^c) C(\delta ) = o(|| \delta ||_{\mathbb {R}^{N \times d}}). \end{aligned}$$
(57)

Putting together (56) and (57) leads to

$$\begin{aligned} |\fancyscript{E}(X+\delta )-\fancyscript{E}(X) - \langle \tilde{\nabla } \fancyscript{E}(X),\,\delta \rangle | = o(|| \delta ||_{\mathbb {R}^{N \times d}}) \end{aligned}$$

which shows that \(\fancyscript{E}\) is differentiable with \(\nabla \fancyscript{E}= \tilde{\nabla } \fancyscript{E}\).
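The expansion (55) can be probed numerically with a finite-difference test along a fixed direction \(\theta \) (our sketch; the perturbation is taken small enough that the sorting permutations do not change):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X, Y = rng.standard_normal((N, d)), rng.standard_normal((N, d))
theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)

def w2_sq(x, y):                       # squared 1-D Wasserstein distance
    return np.sum((np.sort(x) - np.sort(y)) ** 2)

delta = 1e-6 * rng.standard_normal((N, d))
lhs = w2_sq((X + delta) @ theta, Y @ theta) - w2_sq(X @ theta, Y @ theta)
sx, sy = np.argsort(X @ theta), np.argsort(Y @ theta)
diff = np.zeros(N); diff[sx] = (X @ theta)[sx] - (Y @ theta)[sy]
rhs = 2 * diff @ (delta @ theta)       # linear term of (55); the ||delta_theta||^2
assert np.isclose(lhs, rhs, atol=1e-9) # term is of second order and negligible here
```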

Proof of Lipschitz continuity of the gradient. For all \(\theta \in \Theta _0(X)\), \(\nabla \fancyscript{E}_\theta (X)\) is continuous and uniformly bounded, and thus \(\nabla \fancyscript{E}\) is continuous. One has, for \(\delta \in \mathbb {R}^{N \times d}\), and denoting \(\varepsilon =|| \delta ||\),

$$\begin{aligned}&\nabla \fancyscript{E}(X+\delta ) - \nabla \fancyscript{E}(X) = M( \Theta _\varepsilon (X) ) + M( \Theta _\varepsilon (X)^c )\\&\quad \text {where} \quad M(U) = \int _U ( \nabla \fancyscript{E}_\theta (X+\delta ) - \nabla \fancyscript{E}_\theta (X) ) \mathrm {d}\theta . \end{aligned}$$

One has

$$\begin{aligned} M( \Theta _\varepsilon (X) ) = \int _{ \Theta _\varepsilon (X) } \delta _\theta \theta \,\mathrm {d}\theta \end{aligned}$$

whereas

$$\begin{aligned} M( \Theta _\varepsilon (X)^c )&= \int _{ \Theta _\varepsilon (X)^c } \delta _\theta \theta \,\mathrm {d}\theta \\&\quad + \int _{ \Theta _\varepsilon (X)^c } ( Y_\theta \circ \sigma _\theta - Y_\theta \circ \tilde{\sigma }_\theta ) \theta \,\mathrm {d}\theta \end{aligned}$$

where \(\tilde{\sigma }_\theta = \sigma _{X+\delta }^\theta \circ (\sigma _Y^{\theta })^{-1}\). Using Lemma 3, one has \(\text {Vol}(\Theta _\varepsilon (X)^c) \leqslant C || \delta ||_{\mathbb {R}^{N \times d}}\) for some constant \(C>0\), and hence

$$\begin{aligned}&|| \nabla \fancyscript{E}(X+\delta ) - \nabla \fancyscript{E}(X) ||_{\mathbb {R}^{N \times d}} \\&\quad \leqslant (1 + 2 C || Y ||_{\mathbb {R}^{N \times d}})|| \delta ||_{\mathbb {R}^{N \times d}} \end{aligned}$$

which shows that \(\nabla \fancyscript{E}\) is \((1 + 2 C || Y ||_{\mathbb {R}^{N \times d}})\)-Lipschitz continuous.
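Since \(\nabla \fancyscript{E}\) is Lipschitz, a constant-step gradient descent on \(\fancyscript{E}\) behaves well; the following self-contained sketch (ours; the step size and iteration counts are heuristics, not tuned constants) flows a point cloud \(X\) toward \(Y\):

```python
import numpy as np

# Sliced-Wasserstein gradient flow of mu_X toward mu_Y; each iteration averages
# grad E_theta(X) = (X_theta - Y_theta o sigma_theta) theta over random directions.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
Y = rng.standard_normal((300, 2)) + np.array([4.0, 0.0])
for _ in range(200):
    thetas = rng.standard_normal((32, 2))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    grad = np.zeros_like(X)
    for theta in thetas:
        sx, sy = np.argsort(X @ theta), np.argsort(Y @ theta)
        diff = np.zeros(len(X))
        diff[sx] = (X @ theta)[sx] - (Y @ theta)[sy]
        grad += np.outer(diff, theta)
    X -= grad / len(thetas)      # unit step on the averaged gradient
# mu_X is now close to mu_Y in sliced Wasserstein distance
```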

Cite this article

Bonneel, N., Rabin, J., Peyré, G. et al. Sliced and Radon Wasserstein Barycenters of Measures. J Math Imaging Vis 51, 22–45 (2015). https://doi.org/10.1007/s10851-014-0506-3
