Abstract
Large-scale inference problems of practical interest can often be addressed with the help of Markov random fields. This requires, in principle, solving two related problems: the first is to find offline the parameters of the MRF from empirical data (inverse problem); the second (direct problem) is to set up the inference algorithm so that it is as precise, robust and efficient as possible. In this work we address both the direct and the inverse problem with mean-field methods of statistical physics, going beyond the Bethe approximation and the associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified in such a way as to avoid feedback loops as much as possible, by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, where a belief propagation algorithm is run alternately at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem to be solved per region. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular, in the random Ising context we propose two complementary methods, based respectively on fixed-point equations and on a one-parameter log-likelihood minimization. Numerical experiments confirm the effectiveness of this approach for both the direct and the inverse MRF inference. Heterogeneous problems of size up to \(10^5\) are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.
Notes
This ensures the independence of these new cycles among each other and with \(B_{t+1}\).
Thresholds are comparable after dividing our \(\beta \) by \(\sqrt{3}\) to have random models with identical variance of the couplings.
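The first note concerns the incremental construction of a cycle basis. As a point of comparison, a fundamental cycle basis (not the minimal one selected in the paper, for which Horton-type algorithms are needed) can be built from any spanning tree: each non-tree edge closes exactly one cycle through the tree. A minimal pure-Python sketch, with all names illustrative:

```python
from collections import deque

def fundamental_cycle_basis(n, edges):
    """Fundamental cycle basis from a BFS spanning tree: each non-tree
    edge (u, v) closes exactly one cycle through the tree path u -- v."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS spanning tree rooted at node 0 (graph assumed connected)
    parent = {0: None}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    tree = {frozenset((w, p)) for w, p in parent.items() if p is not None}

    def root_path(x):
        path = [x]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    basis = []
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue
        pu, pv = root_path(u), root_path(v)
        common = set(pu) & set(pv)
        # lowest common ancestor = first common node on u's root path
        lca = next(x for x in pu if x in common)
        cycle = pu[:pu.index(lca) + 1] + pv[:pv.index(lca)][::-1]
        basis.append(cycle)
    return basis

# Wheel graph: center 0 linked to 1, 2, 3, plus triangle 1-2-3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (1, 3)]
basis = fundamental_cycle_basis(4, edges)
print(len(basis))  # |E| - |V| + 1 = 6 - 4 + 1 = 3 independent cycles
```

The basis size \(|E|-|V|+1\) is the cyclomatic number; a minimal cycle basis additionally minimizes the total cycle length, which matters for the region-graph construction discussed in the paper.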
Appendices
Appendix 1: Proof of Proposition 3.1
If \(\mathcal {G}^\star \) is acyclic, we can build a junction tree using each cycle as a clique, so the form 3.1 is correct except possibly for the specific form chosen for \(p_c\). The leaf nodes of \(\mathcal {G}^\star \) correspond either to dangling trees or to cycle regions of the primal graph \(\mathcal {G}\). From the hypothesis on \(\mathcal {G}\), these components are connected to the rest of the primal graph \(\mathcal {G}\) either via a single node or via a link. So summing over all variables contained in each of these regions, except the contact node or link, results in a subgraph of \(\mathcal {G}\) whose dual is still acyclic, with a modified factor corresponding to the contact link or vertex. By induction, \(\mathcal {G}\) can be reduced until a single arbitrary loop region remains, which still corresponds to a subgraph of \(\mathcal {G}\). This therefore yields a marginal probability \(p_c\) of pairwise form, with factor graph corresponding to cycle c.
Appendix 2: Dual Loop-Based Instabilities
Let us consider an Ising model on the single dual loop graph of Fig. 7 with uniform external field h and coupling J. We give the label 0 to the central node, with counting number \(\kappa _0 = 1\), and labels \(\{1,2,3\}\) to the peripheral ones, these having \(\kappa _v=0\). The links with non-vanishing counting numbers (\(\kappa _\ell = -1\)) are \(\ell \in \{01,02,03\}\), and the cycles are labelled \(\{012,023,031\}\). Using the corresponding minimal factor graph, we attach arbitrarily the only v-node, indexed by 0, to \(\ell =01\). The following exponential parameterization of the messages is adopted:
From the update rules (3.8, 3.9) we get in particular for \((i,j)\in \{(1,2),(2,3),(3,1)\}\)
and more specifically
with
From this we see that these iterative equations are at least marginally unstable, owing to the presence of an eigenmode of the Jacobian with eigenvalue 1, corresponding to \(h_{0kj\rightarrow 0j}^0=\text{const},\ \forall kj\). One additional dual loop centered on v-node 0 would actually render this mode unstable.
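The stability analysis above concerns the message-passing fixed point on this graph. As a sanity check independent of any message-passing scheme, the exact marginals of this four-spin model can be obtained by brute-force enumeration. A minimal sketch; the values of h and J below are illustrative and not taken from the text:

```python
import itertools
import math

# Ising model on the wheel graph of Appendix 2: center 0 coupled to
# 1, 2, 3, plus the peripheral triangle 1-2-3, uniform field and coupling.
h, J = 0.1, 0.5  # illustrative values
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (1, 3)]

def energy(s):
    """Ising energy E(s) = -h * sum_i s_i - J * sum_(ij) s_i s_j."""
    return -h * sum(s) - J * sum(s[i] * s[j] for i, j in edges)

states = list(itertools.product([-1, 1], repeat=4))
weights = [math.exp(-energy(s)) for s in states]
Z = sum(weights)
mag = [sum(w * s[i] for s, w in zip(states, weights)) / Z for i in range(4)]
print(mag)  # m_1 = m_2 = m_3 by the symmetry of the graph
```

Enumeration is of course only viable on such toy instances; its role here is just to provide exact reference marginals against which a GBP fixed point can be compared.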
Appendix 3: Proof of Proposition 4.1
By definition of the Lagrange multipliers, when a fixed point is obtained, the corresponding set of beliefs \(\{b_i,b_\ell ,b_c\}\) allows one to factorize the joint measure as (3.1), where for all cycles of the basis, \(b_c(x_c)\) is itself in Bethe form
where the \(b_i^c\) and \(b_{ii+1}^c\) are obtained from \(b_c\) by running BP on the cycle and are in general different from the \(b_i\) and \(b_\ell \) computed globally; the relation between the two corresponds to the loop correction. Let us call an edge \((ij)\) trivial if its factor factorizes as \(\psi _{(ij)}(x_i,x_j) = f(x_i)f(x_j)\). Similarly we say that a cycle has a trivial belief if it is related to the variable and pairwise beliefs as
i.e. the \(b_i\) and \(b_i^c\) coincide. First we remark that a cycle c containing one such trivial edge, not contained in any other cycle, necessarily has a trivial belief, because from the factorization (3.1) we have in that case, for any such edge \(\ell \),
so the pairwise cycle belief has to be of the form \(b_\ell ^c(x_\ell ) = b_i^c(x_i)b_j^c(x_j)\). As a result the factorized joint measure actually coincides with the same CVM approximation form (3.2) on a reduced graph, where link \(\ell \) has been removed and the cycle c is discarded. From hypothesis (ii) the set of trivial links contained in a single cycle is non-empty. As a result all these links can be removed and all corresponding cycles discarded. On the reduced graph, again since all cycles have trivial beliefs, there is a non-empty subset of trivial links that can be removed, and so on. The procedure eliminates trivial links in this way until only the underlying dual tree remains. The definition of the counting numbers ensures that we then end up with the Bethe form of the joint measure associated with this dual tree.
Appendix 4: Proof of Proposition 5.1
The proof is based on the following factorization of the joint measure on a cycle with the help of a belief propagation fixed point:
with
and then by expanding the factors when taking averages. Let us call bond \(ii+1\) the contribution where the factor \(\frac{B_{x_ix_{i+1}}^{(i)}}{b_{i+1}(x_{i+1})}\) is selected instead of 1. The point is that, in this expansion, one extremity of a bond cannot be left alone when the corresponding variable is summed over, because of the following identities:
For the partition function, for instance, either all or none of the bonds have to be selected, yielding only the two contributions:
For the single variable marginal, say \(p_i(x_i)\), again either none or all of the bonds have to be selected, giving
For the pairwise marginals \(p_{ii+1}(x_i,x_{i+1})\) two additional contributions emerge, corresponding to selecting only the bond \(ii+1\) or all the bonds except this one, yielding the announced expression.
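The expansion above recovers exact quantities on a single cycle. For an Ising cycle this can be cross-checked against the classical transfer-matrix formula \(Z=\operatorname{tr}\prod _i T_i\). A sketch with arbitrary couplings (the matrices \(T_i\) here are the textbook transfer matrices, not the paper's \(B^{(i)}\)):

```python
import itertools
import numpy as np

# Exact partition function of an Ising cycle via transfer matrices,
# checked against brute-force enumeration.  Couplings are arbitrary.
rng = np.random.default_rng(0)
n = 6
J = rng.normal(size=n)   # coupling on edge (i, i+1 mod n)
h = rng.normal(size=n)   # local fields
s = np.array([-1.0, 1.0])

# T_i[a, b] = exp(J_i s_a s_b + h_i s_a): each field attached once,
# to the left spin of its edge.
T = [np.exp(J[i] * np.outer(s, s) + h[i] * s[:, None]) for i in range(n)]
Z_tm = np.trace(np.linalg.multi_dot(T))

# Brute-force sum over all 2^n configurations.
Z_bf = 0.0
for conf in itertools.product([-1, 1], repeat=n):
    E = sum(J[i] * conf[i] * conf[(i + 1) % n] + h[i] * conf[i]
            for i in range(n))
    Z_bf += np.exp(E)
print(Z_tm, Z_bf)
```

Both routes cost \(O(n)\) and \(O(2^n)\) respectively; the transfer-matrix trace is the standard exact baseline for any single-cycle computation such as the one in Proposition 5.1.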
Appendix 5: Proof of Proposition 6.2
The problem is to bound the largest eigenvalue of the Jacobian in absolute value. Let \(\lambda \) be an eigenvalue and \(\mathbf{v}\) an eigenvector of J. Let
and i the corresponding index, such that \(v_i = v\). We have
with the definition of B(Q) and \(\Theta _{min}^{(1,2)}\) given in the text. Imposing \(\vert \lambda \vert \le 1\) leads to the conditions given in the proposition. In particular when magnetizations are absent, i.e. when \(h_i=0,\forall i\), we have
so
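The device used in this proof, extracting the largest-modulus component of the eigenvector, is the standard row-sum argument giving \(\rho (J)\le \max _i\sum _j\vert J_{ij}\vert \). A quick numerical illustration on a generic random matrix (not the actual Jacobian of the text):

```python
import numpy as np

# Row-sum bound on the spectral radius: if v_i is the largest-modulus
# entry of an eigenvector, |lambda| |v_i| = |sum_j J_ij v_j|
#                                        <= (sum_j |J_ij|) |v_i|,
# hence rho(J) <= max_i sum_j |J_ij| for any square matrix.
rng = np.random.default_rng(1)
J = rng.normal(scale=0.2, size=(8, 8))

rho = np.abs(np.linalg.eigvals(J)).max()
bound = np.abs(J).sum(axis=1).max()
print(rho, bound)  # rho never exceeds bound
```

Imposing \(\text{bound}\le 1\) row by row is exactly the kind of sufficient stability condition derived in Proposition 6.2.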
Furtlehner, C., Decelle, A. Cycle-Based Cluster Variational Method for Direct and Inverse Inference. J Stat Phys 164, 531–574 (2016). https://doi.org/10.1007/s10955-016-1566-0