Abstract
We describe a novel algorithm called \(k\)-Maximum Likelihood Estimator (\(k\)-MLE) for learning finite statistical mixtures of exponential families, relying on Hartigan’s \(k\)-means swap clustering method. To illustrate this versatile Hartigan \(k\)-MLE technique, we consider the exponential family of Wishart distributions and show how to learn their mixtures. First, given a set of symmetric positive definite observation matrices, we provide an iterative algorithm to estimate the parameters of the underlying Wishart distribution that is guaranteed to converge to the MLE. Second, two initialization methods for \(k\)-MLE are proposed and compared. Finally, we propose to use the Cauchy-Schwarz statistical divergence as a dissimilarity measure between two Wishart mixture models and sketch a general methodology for building a motion retrieval system.
Notes
1. Otherwise, convergence to a pointwise estimate of the parameters would be replaced by convergence in distribution of a Markov chain.
2. The product \(\hat{\theta }_n^{(t)}\hat{\theta }_S^{(t)}\) remains constant across iterations.
3. For translation invariance, the \(\mathbb {X}_{i}\) are column-centered beforehand.
4. Since \(|2S|=2^d|S|\), the factor \(2^{\frac{nd}{2}} |S|^{\frac{n}{2}}\) can be rewritten as \(|2S|^{\frac{n}{2}}\).
References
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley Series in Probability and Statistics. Wiley-Interscience, New York (2008)
Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
Nielsen, F.: \(k\)-MLE: a fast algorithm for learning statistical mixture models. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 869–872 (2012). Long version as arXiv:1203.5181
Jain, A.K.: Data clustering: 50 years beyond \(K\)-means. Pattern Recogn. Lett. 31, 651–666 (2010)
Wishart, J.: The generalised product moment distribution in samples from a Normal multivariate population. Biometrika 20(1/2), 32–52 (1928)
Tsai, M.-T.: Maximum likelihood estimation of Wishart mean matrices under Löwner order restrictions. J. Multivar. Anal. 98(5), 932–944 (2007)
Formont, P., Pascal, F., Vasile, G., Ovarlez, J.-P., Ferro-Famil, L.: Statistical classification for heterogeneous polarimetric SAR images. IEEE J. Sel. Top. Sign. Proces. 5(3), 567–576 (2011)
Jian, B., Vemuri, B.: Multi-fiber reconstruction from diffusion MRI using mixture of Wisharts and sparse deconvolution. In: Information Processing in Medical Imaging, pp. 384–395. Springer, Berlin (2007)
Cherian, A., Morellas, V., Papanikolopoulos, N., Bedros, S.: Dirichlet process mixture models on symmetric positive definite matrices for appearance clustering in video surveillance applications. In: Computer Vision and Pattern Recognition (CVPR), pp. 3417–3424 (2011)
Nielsen, F., Garcia, V.: Statistical exponential families: a digest with flash cards. http://arxiv.org/abs/0911.4863. Accessed Nov 2009
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1997)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B (Methodological) 39(1), 1–38 (1977)
Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Comput. Stat. Data Anal. 14(3), 315–332 (1992)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A \(k\)-means clustering algorithm. J. Roy. Stat. Soc. C (Applied Statistics) 28(1), 100–108 (1979)
Telgarsky, M., Vattani, A.: Hartigan’s method: \(k\)-means clustering without Voronoi. In: Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 820–827 (2010)
Nielsen, F., Boissonnat, J.D., Nock, R.: On Bregman Voronoi diagrams. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 746–755 (2007)
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Networks 16(3), 645–678 (2005)
Kulis, B., Jordan, M.I.: Revisiting \(k\)-means: new algorithms via Bayesian nonparametrics. In: International Conference on Machine Learning (ICML) (2012)
Ackermann, M.R.: Algorithms for the Bregman \(K\)-median problem. PhD thesis. Paderborn University (2009)
Arthur, D., Vassilvitskii, S.: \(k\)-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)
Ji, S., Krishnapuram, B., Carin, L.: Variational Bayes for continuous hidden Markov models and its application to active learning. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 522–532 (2006)
Hidot, S., Saint-Jean, C.: An Expectation-Maximization algorithm for the Wishart mixture model: application to movement clustering. Pattern Recogn. Lett. 31(14), 2318–2324 (2010)
Brent, R.P.: Algorithms for Minimization Without Derivatives. Courier Dover Publications, Mineola (1973)
Bezdek, J.C., Hathaway, R.J., Howard, R.E., Wilson, C.A., Windham, M.P.: Local convergence analysis of a grouped variable version of coordinate descent. J. Optim. Theory Appl. 54(3), 471–477 (1987)
Bogdan, K., Bogdan, M.: On existence of maximum likelihood estimators in exponential families. Statistics 34(2), 137–149 (2000)
Ciuperca, G., Ridolfi, A., Idier, J.: Penalized maximum likelihood estimator for normal mixtures. Scand. J. Stat. 30(1), 45–59 (2003)
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)
Nielsen, F.: Closed-form information-theoretic divergences for statistical mixtures. In: International Conference on Pattern Recognition (ICPR), pp. 1723–1726 (2012)
Haff, L.R., Kim, P.T., Koo, J.-Y., Richards, D.: Minimax estimation for mixtures of Wishart distributions. Ann. Stat. 39(6), 3417–3440 (2011)
Jebara, T., Kondor, R., Howard, A.: Probability product kernels. J. Mach. Learn. Res. 5, 819–844 (2004)
Moreno, P.J., Ho, P., Vasconcelos, N.: A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In: Advances in Neural Information Processing Systems (2003)
Petersen, K.B., Pedersen, M.S.: The matrix cookbook. http://www2.imm.dtu.dk/pubdb/p.php?3274. Accessed Nov 2012
Appendix A
This appendix details some calculations for the distributions \(\mathcal {W}_{d}\), \(\mathcal {W}_{d,\underline{n}}\), and \(\mathcal {W}_{d,\underline{S}}\).
11.1.1 Wishart Distribution \(\mathcal {W}_{d}\)
Letting \((\theta _{n},\theta _{S})=(\frac{n-d-1}{2},S^{-1}) \longleftrightarrow (n,S) = (2\theta _{n}+d+1,\theta _{S}^{-1})\), the log-normalizer of the family is
\[
F(\theta _{n},\theta _{S}) = \left(\theta _{n}+\frac{d+1}{2}\right)\left(d\log 2 - \log |\theta _{S}|\right) + \log \varGamma _{d}\!\left(\theta _{n}+\frac{d+1}{2}\right),
\]
with gradient
\[
\nabla F(\theta _{n},\theta _{S}) = \left(d\log 2 - \log |\theta _{S}| + \varPsi _{d}\!\left(\theta _{n}+\tfrac{d+1}{2}\right),\; -\left(\theta _{n}+\tfrac{d+1}{2}\right)\theta _{S}^{-1}\right),
\]
where \(\varPsi _{d}\) is the multivariate digamma function (or multivariate polygamma of order 0).
The dissimilarity \(\varDelta (\theta ,\theta ')\) between the natural parameters \(\theta =(\theta _{n},\theta _{S})\) and \(\theta '=(\theta '_{n},\theta '_{S})\) satisfies \(\varDelta (\theta ,\theta ) \ne 0\); the same quantity can equivalently be expressed with the source parameters \(\lambda =(n,S)\) and \(\lambda '=(n',S')\).
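Under the assumption (ours, not a definition stated above) that \(\varDelta\) denotes the log-product integral underlying the Cauchy-Schwarz divergence, a form consistent with the remark \(\varDelta (\theta ,\theta ) \ne 0\) is
\[
\varDelta (\theta ,\theta ') = F(\theta +\theta ') - F(\theta ) - F(\theta ') = \log \int p_{\theta }(X)\,p_{\theta '}(X)\,\mathrm{d}X,
\]
so that the Cauchy-Schwarz divergence would read \(\mathrm{CS}(p_{\theta },p_{\theta '}) = \frac{1}{2}\varDelta (\theta ,\theta ) + \frac{1}{2}\varDelta (\theta ',\theta ') - \varDelta (\theta ,\theta ')\).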
11.1.2 Distribution \(\mathcal {W}_{d,\underline{n}}\)
Letting \(\theta _{S}=S^{-1}\), the log-normalizer of this sub-family reduces to
\[
F_{\underline{n}}(\theta _{S}) = -\frac{\underline{n}}{2}\log |\theta _{S}| + \frac{\underline{n}d}{2}\log 2 + \log \varGamma _{d}\!\left(\frac{\underline{n}}{2}\right).
\]
Using the rule \(\frac{\partial \log |X|}{\partial X} = {}^{t}(X^{-1})\) [33] and the symmetry of \(\theta _{S}\), we get
\[
\nabla F_{\underline{n}}(\theta _{S}) = -\frac{\underline{n}}{2}\,\theta _{S}^{-1}.
\]
The correspondence between the natural parameter \(\theta _{S}\) and the expectation parameter \(\eta _{S}\) is
\[
\eta _{S} = -\frac{\underline{n}}{2}\,\theta _{S}^{-1} \longleftrightarrow \theta _{S} = -\frac{\underline{n}}{2}\,\eta _{S}^{-1}.
\]
Finally, we obtain the MLE for \(\theta _{S}\) in this sub-family:
\[
\hat{\theta }_{S} = -\frac{\underline{n}}{2}\,\hat{\eta }_{S}^{-1} = \underline{n}\left(\frac{1}{N}\sum _{i=1}^{N}X_{i}\right)^{-1}, \qquad \hat{\eta }_{S} = -\frac{1}{2N}\sum _{i=1}^{N}X_{i}.
\]
The same formulation with the source parameter \(S\):
\[
\hat{S} = \hat{\theta }_{S}^{-1} = \frac{1}{\underline{n}N}\sum _{i=1}^{N}X_{i}.
\]
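As a quick numerical sanity check of this closed-form estimator, here is a minimal sketch assuming SciPy's `scipy.stats.wishart` sampler; the values of `d`, `n_fixed`, `N`, and `S_true` are illustrative choices, not taken from the chapter.

```python
import numpy as np
from scipy.stats import wishart

# Illustrative ground truth (assumed values, not from the chapter).
d, n_fixed, N = 3, 10, 5000
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
S_true = A @ A.T + d * np.eye(d)  # a random SPD scale matrix

# Draw N observation matrices X_i ~ W_d(n_fixed, S_true).
X = wishart(df=n_fixed, scale=S_true).rvs(size=N, random_state=rng)

# Closed-form MLE for fixed degrees of freedom: S_hat = mean(X_i) / n.
S_hat = X.mean(axis=0) / n_fixed
print(np.allclose(S_hat, S_true, rtol=0.1, atol=0.1))  # True for large N
```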
The dual log-normalizer \(F_{\underline{n}}^{*}\) for \(\mathcal {W}_{d,\underline{n}}\) is
\[
F_{\underline{n}}^{*}(\eta _{S}) = \langle \theta _{S}(\eta _{S}),\eta _{S}\rangle - F_{\underline{n}}(\theta _{S}(\eta _{S})) = -\frac{\underline{n}}{2}\log |-\eta _{S}| + \frac{\underline{n}d}{2}\left(\log \frac{\underline{n}}{4}-1\right) - \log \varGamma _{d}\!\left(\frac{\underline{n}}{2}\right),
\]
also with the source parameter
\[
F_{\underline{n}}^{*}(\eta _{S}(S)) = -\frac{\underline{n}}{2}\log |S| - \frac{\underline{n}d}{2}\left(1+\log 2\right) - \log \varGamma _{d}\!\left(\frac{\underline{n}}{2}\right).
\]
Remark that the KL divergence now depends on \(\underline{n}\).
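To make this remark concrete, expanding the Bregman divergence of \(F_{\underline{n}}\) gives, as a sketch under the parametrization above,
\[
\mathrm{KL}\!\left(\mathcal {W}_{d,\underline{n}}(S_{1})\,\|\,\mathcal {W}_{d,\underline{n}}(S_{2})\right) = \frac{\underline{n}}{2}\left(\log \frac{|S_{2}|}{|S_{1}|} + \mathrm{tr}\!\left(S_{2}^{-1}S_{1}\right) - d\right),
\]
which scales linearly with \(\underline{n}\).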
11.1.3 Distribution \(\mathcal {W}_{d,\underline{S}}\)
For fixed \(\underline{S}\), the p.d.f. of \(\mathcal {W}_{d,\underline{S}}\) can be rewritten (see Note 4) as
\[
p(X;n) = \exp \left(\frac{n-d-1}{2}\log |X| - \frac{1}{2}\mathrm{tr}(\underline{S}^{-1}X) - \frac{n}{2}\log |2\underline{S}| - \log \varGamma _{d}\!\left(\frac{n}{2}\right)\right).
\]
Letting \(\theta _{n}=\frac{n-d-1}{2}\) (\(n=2\theta _{n}+d+1\)), the log-normalizer of this sub-family is
\[
F_{\underline{S}}(\theta _{n}) = \left(\theta _{n}+\frac{d+1}{2}\right)\log |2\underline{S}| + \log \varGamma _{d}\!\left(\theta _{n}+\frac{d+1}{2}\right).
\]
The correspondence between the natural parameter \(\theta _{n}\) and the expectation parameter \(\eta _{n}\) is
\[
\eta _{n} = \log |2\underline{S}| + \varPsi _{d}\!\left(\theta _{n}+\tfrac{d+1}{2}\right) \longleftrightarrow \theta _{n} = \varPsi _{d}^{-1}\!\left(\eta _{n} - \log |2\underline{S}|\right) - \frac{d+1}{2}.
\]
Finally, we obtain the MLE for \(\theta _{n}\) in this sub-family:
\[
\hat{\theta }_{n} = \varPsi _{d}^{-1}\!\left(\frac{1}{N}\sum _{i=1}^{N}\log |X_{i}| - \log |2\underline{S}|\right) - \frac{d+1}{2}.
\]
The same formulation with the source parameter \(n\):
\[
\hat{n} = 2\,\varPsi _{d}^{-1}\!\left(\frac{1}{N}\sum _{i=1}^{N}\log |X_{i}| - \log |2\underline{S}|\right).
\]
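Since \(\varPsi _{d}^{-1}\) has no closed form, the estimate must be computed numerically, e.g. with Brent's root finder [24]. Below is a minimal Python sketch; the function names and the bracketing interval \([d, 10^{6}]\) are illustrative choices of ours, not the chapter's.

```python
import numpy as np
from scipy.special import psi
from scipy.optimize import brentq

def multivariate_digamma(a: float, d: int) -> float:
    """Psi_d(a) = sum_{i=1}^{d} psi(a + (1 - i) / 2)."""
    return sum(psi(a + (1.0 - i) / 2.0) for i in range(1, d + 1))

def mle_dof_fixed_scale(X: np.ndarray, S_bar: np.ndarray) -> float:
    """MLE of n for fixed S_bar: solves Psi_d(n/2) = mean_i log|X_i| - log|2 S_bar|.

    X has shape (N, d, d); the bracket [d, 1e6] is a heuristic choice."""
    N, d, _ = X.shape
    target = np.mean([np.linalg.slogdet(Xi)[1] for Xi in X])
    target -= np.linalg.slogdet(2.0 * S_bar)[1]
    return brentq(lambda n: multivariate_digamma(n / 2.0, d) - target, d, 1e6)
```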
The dual log-normalizer \(F_{\underline{S}}^{*}\) for \(\mathcal {W}_{d,\underline{S}}\) is
\[
F_{\underline{S}}^{*}(\eta _{n}) = \theta _{n}(\eta _{n})\,\eta _{n} - F_{\underline{S}}(\theta _{n}(\eta _{n})), \qquad \theta _{n}(\eta _{n}) = \varPsi _{d}^{-1}\!\left(\eta _{n} - \log |2\underline{S}|\right) - \frac{d+1}{2},
\]
also with the source parameter \(n\):
\[
F_{\underline{S}}^{*}(\eta _{n}(n)) = \frac{n-d-1}{2}\,\varPsi _{d}\!\left(\frac{n}{2}\right) - \frac{d+1}{2}\log |2\underline{S}| - \log \varGamma _{d}\!\left(\frac{n}{2}\right).
\]
Remark that the resulting KL divergence (the Bregman divergence of \(F_{\underline{S}}\), in which the \(\log |2\underline{S}|\) terms cancel) does not depend on \(\underline{S}\).
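As a sketch, expanding the Bregman divergence of \(F_{\underline{S}}\) makes this cancellation explicit:
\[
\mathrm{KL}\!\left(\mathcal {W}_{d,\underline{S}}(n_{1})\,\|\,\mathcal {W}_{d,\underline{S}}(n_{2})\right) = \log \frac{\varGamma _{d}(n_{2}/2)}{\varGamma _{d}(n_{1}/2)} - \frac{n_{2}-n_{1}}{2}\,\varPsi _{d}\!\left(\frac{n_{1}}{2}\right),
\]
which indeed does not involve \(\underline{S}\).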
© 2014 Springer International Publishing Switzerland
Cite this chapter
Saint-Jean, C., Nielsen, F. (2014). Hartigan’s Method for \(k\)-MLE: Mixture Modeling with Wishart Distributions and Its Application to Motion Retrieval. In: Nielsen, F. (ed.) Geometric Theory of Information. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-05317-2_11