
Cycle-Based Cluster Variational Method for Direct and Inverse Inference

Journal of Statistical Physics

Abstract

Large-scale inference problems of practical interest can often be addressed with the help of Markov random fields (MRF). This requires, in principle, solving two related problems: the first is to find offline the parameters of the MRF from empirical data (inverse problem); the second (direct problem) is to set up the inference algorithm so as to make it as precise, robust and efficient as possible. In this work we address both the direct and the inverse problem with mean-field methods of statistical physics, going beyond the Bethe approximation and the associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified so as to avoid feedback loops as much as possible, by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, in which a belief propagation algorithm is run alternately at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem to be solved per region. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular, in the random Ising context we propose two complementary methods, based respectively on fixed-point equations and on a one-parameter log-likelihood function minimization. Numerical experiments confirm the effectiveness of this approach for both the direct and the inverse MRF inference. Heterogeneous problems of size up to \(10^5\) are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.


Notes

  1. This ensures that these new cycles are independent of one another and of \(B_{t+1}\).

  2. Thresholds are comparable after dividing our \(\beta \) by \(\sqrt{3}\), so as to compare random models with identical variance of the couplings.


Corresponding author

Correspondence to Cyril Furtlehner.

Appendices

Appendix 1: Proof of Proposition 3.1

If \(\mathcal {G}^\star \) is acyclic, we can build a junction tree using each cycle as a clique, so the form (3.1) is correct, except possibly for the specific form chosen for \(p_c\). The leaf nodes of \(\mathcal {G}^\star \) correspond either to dangling trees or to cycle regions of the primal graph \(\mathcal {G}\). By the hypothesis on \(\mathcal {G}\), these components are connected to the rest of the primal graph \(\mathcal {G}\) either via a single node or via a link. Summing over all variables contained in each of these regions, except the contact node or link, therefore yields a subgraph of \(\mathcal {G}\) whose dual is still acyclic, with a modified factor corresponding to the contact link or vertex. By induction, \(\mathcal {G}\) can be reduced until a single arbitrary loop region remains, which still corresponds to a subgraph of \(\mathcal {G}\). This therefore results in a marginal probability \(p_c\) of pairwise form, with factor graph corresponding to the cycle c.
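The reduction argument can be checked numerically on a small example. The sketch below is illustrative and not from the paper: the graph (two triangles sharing a single node, so the dual graph is acyclic) and the couplings are arbitrary choices. It verifies by brute force that the joint pairwise measure factorizes over the two cycle regions in junction-tree form, with the shared node as separator:

```python
import itertools
import math
import random

# Two triangles (0,1,2) and (0,3,4) sharing the single node 0:
# the dual graph of this primal graph is acyclic.
edges = [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (0, 4)]
n = 5
random.seed(0)
J = {e: random.uniform(-1.0, 1.0) for e in edges}

def weight(s):
    # unnormalized pairwise measure
    return math.exp(sum(J[e] * s[e[0]] * s[e[1]] for e in edges))

states = list(itertools.product([-1, 1], repeat=n))
Z = sum(weight(s) for s in states)

def marginal(idx):
    # brute-force marginal over the variable subset idx
    p = {}
    for s in states:
        key = tuple(s[i] for i in idx)
        p[key] = p.get(key, 0.0) + weight(s) / Z
    return p

p_c1 = marginal((0, 1, 2))
p_c2 = marginal((0, 3, 4))
p_0 = marginal((0,))

# junction-tree form: p(x) = p_c1 * p_c2 / p_0 (cycles as cliques, node 0 as separator)
for s in states:
    lhs = weight(s) / Z
    rhs = p_c1[(s[0], s[1], s[2])] * p_c2[(s[0], s[3], s[4])] / p_0[(s[0],)]
    assert abs(lhs - rhs) < 1e-12
print("cycle-region factorization verified")
```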

Appendix 2: Dual Loop-Based Instabilities

Let us consider an Ising model on the single dual loop graph of Fig. 7, with uniform external field h and coupling J. We give the label 0 to the central node, with counting number \(\kappa _0 = 1\), and labels \(\{1,2,3\}\) to the peripheral ones, which have \(\kappa _v=0\). The links with non-vanishing counting numbers (\(\kappa _\ell = -1\)) are \(\ell \in \{01,02,03\}\), and the cycles are labelled \(\{012,023,031\}\). Using the corresponding minimal factor graph, we attach the only v-node, indexed by 0, arbitrarily to \(\ell =01\). The following exponential parameterization of the messages is adopted:

$$\begin{aligned} m_{c\rightarrow \ell }({\>s}_\ell )&=e^{w_{c\rightarrow \ell }+h_{c\rightarrow \ell }^1s_{\ell _1}+h_{c\rightarrow \ell }^2s_{\ell _2}+J_{c\rightarrow \ell }s_{\ell _1}s_{\ell _2}}\\ m_{\ell \rightarrow 0}(s_0)&= e^{w_{\ell \rightarrow 0}+h_{\ell \rightarrow 0}s_0}. \end{aligned}$$

From the update rules (3.8, 3.9) we get in particular for \((i,j)\in \{(1,2),(2,3),(3,1)\}\)

$$\begin{aligned} m_{0ij\rightarrow 0i}(s_0) \longleftarrow \sum _{s_j}\exp \left( h_{0kj\rightarrow 0j}^0s_0+\left( h_j+h_{0kj\rightarrow 0j}^j\right) s_j+\left( J_{0j}+J_{0kj\rightarrow 0j}\right) s_0s_j\right) , \end{aligned}$$

and more specifically

$$\begin{aligned} h_{0ij\rightarrow 0i}^0 \longleftarrow h_{0kj\rightarrow 0j}^0 + \frac{1}{4}\log \frac{A_{++}A_{--}}{A_{+-}A_{-+}} \end{aligned}$$

with

$$\begin{aligned} A_{\sigma _1\sigma _2} \mathop {=}\limits ^\mathrm{def}\cosh \Bigl (\sigma _1\bigl (h_j+h_{0kj\rightarrow 0j}^j\bigr )+\sigma _2\bigl (J_{0j}+J_{0kj\rightarrow 0j}\bigr )\Bigr ). \end{aligned}$$

From this we see that these iterative equations are at least marginally unstable, owing to an eigenmode of the Jacobian with eigenvalue 1, corresponding to \(h_{0kj\rightarrow 0j}^0=\mathrm{const},\ \forall kj\). One additional dual loop centered on v-node 0 would actually render this mode unstable.
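The marginal mode can be illustrated numerically. In the sketch below (an illustration, not the paper's code), the three \(h^0\) messages are updated by a map of the additive form derived above, \(h^0_{\text{out}} = h^0_{\text{in}} + g\), where \(g\) is an arbitrary placeholder for the \(h^0\)-independent term; the Jacobian is then a cyclic permutation, whose constant-shift eigenvector has eigenvalue exactly 1:

```python
import numpy as np

# The h^0 components around the dual loop obey h_out = h_in + g, where g
# collects the h^0-independent terms (placeholder values here).
rng = np.random.default_rng(1)
g = rng.normal(size=3)

def update(h):
    # each cycle passes its incoming h^0 one step around the dual loop
    return np.roll(h, 1) + g

h = rng.normal(size=3)
eps = 1e-6
# finite-difference Jacobian of the update map (exact here, the map is affine)
Jac = np.array([(update(h + eps * e) - update(h)) / eps for e in np.eye(3)]).T
rho = max(abs(np.linalg.eigvals(Jac)))
print(rho)  # spectral radius 1: the constant-shift mode is marginally stable
```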

Appendix 3: Proof of Proposition 4.1

By definition of the Lagrange multipliers, when a fixed point is obtained, the corresponding set of beliefs \(\{b_i,b_\ell ,b_c\}\) allows one to factorize the joint measure as (3.1), where for all cycles of the basis, \(b_c(\>x_c)\) is itself in Bethe form

$$\begin{aligned} b_c(\>x_c) = \frac{1}{Z_c}\prod _{i=1}^b\frac{b_{ii+1}^c(x_i,x_{i+1})}{b_i^c(x_i)} \end{aligned}$$

where the \(b_i^c\) and \(b_{ii+1}^c\) are obtained from \(b_c\) by running BP on the cycle, and are in general different from the \(b_i\) and \(b_\ell \) computed globally; the relation between the two corresponds to the loop correction. Let us call an edge \((ij)\) trivial when its factor factorizes as \(\psi _{(ij)}(x_i,x_j) = f(x_i)f(x_j)\). Similarly, we say that a cycle has a trivial belief if it is related to the variable and pairwise beliefs as

$$\begin{aligned} b_c(\>x_c) = \prod _{i=1}^b\frac{b_{ii+1}(x_i,x_{i+1})}{b_i(x_i)}, \end{aligned}$$

i.e. the \(b_i\) and \(b_i^c\) coincide. First we remark that a cycle c containing one such trivial edge, not contained in any other cycle, necessarily has a trivial belief: from the factorization (3.1), for any such edge \(\ell \) we have in that case

$$\begin{aligned} \psi _\ell ^{(0)}(\>x_\ell )&= f(x_i)g(x_j)b_\ell (x_\ell )\prod _{c\ni \ell } \frac{b_\ell ^c(x_\ell )}{b_\ell (x_\ell )},\\&= f(x_i)g(x_j) b_\ell ^c(x_\ell ), \end{aligned}$$

so the pairwise cycle belief has to be of the form \(b_\ell ^c(x_\ell ) = b_i^c(x_i)b_j^c(x_j)\). As a result, the factorized joint measure actually coincides with the same CVM approximation (3.2) on a reduced graph, in which link \(\ell \) has been removed and c is now discarded. From hypothesis (ii) the set of trivial links contained in one single cycle is non-empty. As a result, all these links can be removed and all corresponding cycles discarded. On the reduced graph, again since all cycles have a trivial belief, there is a non-empty subset of trivial links, which can be removed, and so on. The procedure stops after all trivial links have been eliminated and only the underlying dual tree remains. The definition of the counting numbers ensures that we then end up with the Bethe form of the joint measure associated with this dual tree.
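The first step of the argument, that a cycle with a trivial edge has a trivial belief, can be checked on a small case. In the sketch below (illustrative values, not from the paper), one coupling of a 4-spin Ising cycle is set to zero; the effective graph is then a chain, so BP beliefs coincide with the exact marginals used here, the belief on the trivial edge factorizes as \(b_i(x_i)b_j(x_j)\), and plugging it into the Bethe form of the cycle belief recovers the exact joint measure:

```python
import itertools
import math

# 4-spin Ising cycle with edge (3, 0) trivial (zero coupling): the effective
# graph is a chain, so BP beliefs coincide with exact marginals.
Jc = {(0, 1): 0.7, (1, 2): -0.4, (2, 3): 0.9, (3, 0): 0.0}
n = 4
states = list(itertools.product([-1, 1], repeat=n))
w = {s: math.exp(sum(J * s[i] * s[j] for (i, j), J in Jc.items())) for s in states}
Z = sum(w.values())

def pair_marg(i, j):
    p = {}
    for s in states:
        p[(s[i], s[j])] = p.get((s[i], s[j]), 0.0) + w[s] / Z
    return p

def single_marg(i):
    p = {}
    for s in states:
        p[s[i]] = p.get(s[i], 0.0) + w[s] / Z
    return p

pairs = {e: pair_marg(*e) for e in Jc}
singles = [single_marg(i) for i in range(n)]

# Bethe form of the cycle belief, with the trivial edge's belief factorized
for s in states:
    bethe = 1.0
    for (i, j), J in Jc.items():
        if J == 0.0:
            bethe *= singles[i][s[i]] * singles[j][s[j]]  # trivial belief b_i * b_j
        else:
            bethe *= pairs[(i, j)][(s[i], s[j])]
    for i in range(n):
        bethe /= singles[i][s[i]]
    assert abs(w[s] / Z - bethe) < 1e-12
print("trivial edge: cycle belief reduces to the exact chain (Bethe) form")
```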

Appendix 4: Proof of Proposition 5.1

The proof is based on the following factorization of the joint measure on a cycle with help of a belief propagation fixed point:

$$\begin{aligned} P(\>x) = \frac{1}{Z_\text {BP}}\prod _{i=1}^n\frac{b_{i}(x_i,x_{i+1})}{b_i(x_i)b_{i+1}(x_{i+1})}\prod _{i\in \mathcal {V}}b_i(x_i) \end{aligned}$$

with

$$\begin{aligned} \frac{b_{i}(x_i,x_{i+1})}{b_i(x_i)b_{i+1}(x_{i+1})}&= 1 +\frac{b_i(x_i,x_{i+1})-b_i(x_i)b_{i+1}(x_{i+1})}{b_i(x_i)b_{i+1}(x_{i+1})}\\&\mathop {=}\limits ^\mathrm{def}1+\frac{B_{x_ix_{i+1}}^{(i)}}{b_{i+1}(x_{i+1})}, \end{aligned}$$

and then by expanding the factors when taking averages. Let us call bond \(ii+1\) the contribution in which the factor \(\frac{B_{x_ix_{i+1}}^{(i)}}{b_{i+1}(x_{i+1})}\) is selected instead of 1. The point is that one extremity of a bond cannot be left alone in this expansion when the corresponding variable is summed over, because of the following identities:

$$\begin{aligned} \sum _{x_i} b_i(x_i)\frac{B_{x_ix_{i+1}}^{(i)}}{b_{i+1}(x_{i+1})} = \sum _{x_i} B_{x_{i-1}x_i}^{(i-1)} = 0. \end{aligned}$$

For the partition function, for instance, either all or none of the bonds have to be selected, yielding only the two contributions:

$$\begin{aligned} Z_\text {BP}&= \sum _{\>x} \left( \prod _{i=1}^n b_i(x_i)+\prod _{i=1}^n B_{x_ix_{i+1}}^{(i)}\right) ,\\&= 1 + {{\mathrm{\text {Tr}}}}(U). \end{aligned}$$

For the single-variable marginal, say \(p_i(x_i)\), again either none or all of the bonds have to be selected, giving

$$\begin{aligned} p_i(x_i)&= \frac{1}{Z_\text {BP}} \sum _{\>x\backslash x_i} \left( \prod _{j=1}^n b_j(x_j)+\prod _{j=1}^n B_{x_jx_{j+1}}^{(j)}\right) \\&= \frac{b_i(x_i)+U_{x_ix_i}^{(i)}}{Z_\text {BP}}. \end{aligned}$$

For the pairwise marginals \(p_i(x_i,x_{i+1})\) two additional contributions emerge corresponding to selecting only the bond \(ii+1\) or to selecting all the bonds except this one, yielding the announced expression.
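The identity \(Z_\text {BP} = 1 + \text {Tr}(U)\), with \(U\) the ordered product of the \(B^{(i)}\) matrices around the cycle, only uses the normalization and marginal consistency of the beliefs. The sketch below (illustrative, not the paper's code; the exact marginals of a small Ising cycle stand in for a consistent set of beliefs) verifies it by brute force:

```python
import itertools
import math
import numpy as np

# 5-spin Ising cycle; its exact marginals form a consistent set of beliefs.
n = 5
J = [0.8, -0.5, 0.3, 0.9, -0.7]  # coupling on edge (i, i+1 mod n)
states = list(itertools.product([-1, 1], repeat=n))
w = [math.exp(sum(J[i] * s[i] * s[(i + 1) % n] for i in range(n))) for s in states]
Z = sum(w)

idx = {-1: 0, 1: 1}
b1 = np.zeros((n, 2))        # single-variable beliefs b_i
b2 = np.zeros((n, 2, 2))     # pairwise beliefs b_i(x_i, x_{i+1})
for s, wi in zip(states, w):
    for i in range(n):
        b1[i, idx[s[i]]] += wi / Z
        b2[i, idx[s[i]], idx[s[(i + 1) % n]]] += wi / Z

# B^(i)_{x_i x_{i+1}} = (b_i(x_i, x_{i+1}) - b_i(x_i) b_{i+1}(x_{i+1})) / b_i(x_i)
B = [(b2[i] - np.outer(b1[i], b1[(i + 1) % n])) / b1[i][:, None] for i in range(n)]
U = np.linalg.multi_dot(B)

# Z_BP from the belief factorization of the joint measure
Zbp = 0.0
for s in states:
    t = math.prod(b1[i, idx[s[i]]] for i in range(n))
    for i in range(n):
        t *= b2[i, idx[s[i]], idx[s[(i + 1) % n]]] / (
            b1[i, idx[s[i]]] * b1[(i + 1) % n, idx[s[(i + 1) % n]]])
    Zbp += t
assert abs(Zbp - (1.0 + np.trace(U))) < 1e-10
print("Z_BP = 1 + Tr(U) verified")
```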

Appendix 5: Proof of Proposition 6.2

The problem is to bound the largest eigenvalue of the Jacobian in absolute value. Let \(\lambda \) be an eigenvalue of \(J\) and \(\mathbf{v}\) a corresponding eigenvector. Let

$$\begin{aligned} v = \max _j \vert v_j\vert \end{aligned}$$

and let i be an index achieving this maximum, \(\vert v_i\vert = v\). We have

$$\begin{aligned} \vert \lambda \vert&= \Bigl \vert \sum _j J_{ij}\frac{v_j}{v}\Bigr \vert \\&\le \sum _j \vert J_{ij}\vert \\&\le \Bigl \vert \frac{Q}{\Theta _i}A'_i(Q)\Bigr \vert +\sum _{j\ne i} \Bigl \vert \frac{Q}{\Theta _i\Theta _j}\Bigr \vert \\&\le \frac{\vert Q\vert }{\Theta _{min}^{(1)}\Theta _{min}^{(2)}}\left( B(Q)+n-1\right) , \end{aligned}$$

with the definitions of B(Q) and \(\Theta _{min}^{(1,2)}\) given in the text. Imposing \(\vert \lambda \vert \le 1\) leads to the conditions given in the proposition. In particular, when magnetizations are absent, i.e. when \(h_i=0,\forall i\), we have

$$\begin{aligned} A'_i(Q) = \hat{\Theta }_i \end{aligned}$$

so

$$\begin{aligned} B(Q) = \max _i\vert \hat{\Theta }_i\Theta _i\vert \le 1. \end{aligned}$$
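The chain of inequalities above rests on bounding the spectral radius by the maximal absolute row sum of \(J\) (the induced infinity-norm). A quick numerical sanity check of that generic bound, on random matrices unrelated to any specific model:

```python
import numpy as np

# |lambda| <= max_i sum_j |J_ij|: the spectral radius is bounded by the
# induced infinity-norm, as used in the proof.
rng = np.random.default_rng(0)
for _ in range(100):
    J = rng.normal(size=(6, 6))
    rho = max(abs(np.linalg.eigvals(J)))
    bound = np.abs(J).sum(axis=1).max()
    assert rho <= bound + 1e-12
print("spectral radius <= max absolute row sum on all trials")
```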


Cite this article

Furtlehner, C., Decelle, A. Cycle-Based Cluster Variational Method for Direct and Inverse Inference. J Stat Phys 164, 531–574 (2016). https://doi.org/10.1007/s10955-016-1566-0
