
A Divide-and-Conquer Algorithm for Generating Markov Bases of Multi-way Tables

Computational Statistics

Summary

We describe a divide-and-conquer technique for generating a Markov basis that connects all tables of counts having a fixed set of marginal totals. This procedure is based on decomposing the independence graph induced by these marginals. We discuss the practical implications of using this method in conjunction with other algorithms for determining Markov bases.


References

  • Agresti, A. (1992), ‘A survey of exact inference for contingency tables (with discussion)’, Statistical Science.

  • Bishop, Y. M. M., Fienberg, S. E. & Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, M.I.T. Press, Cambridge, MA.


  • Blair, J. R. S. & Barry, P. (1993), An introduction to chordal graphs and clique trees, in IMA, ed., ‘Graph Theory and Sparse Matrix Computation’, Vol. 56, Springer-Verlag, New York, pp. 1–30.


  • Cox, D., Little, J. & O’Shea, D. (1992), Ideals, Varieties and Algorithms, Springer-Verlag, New York.


  • Dalenius, T. & Reiss, S. P. (1982), ‘Data-swapping: a technique for disclosure control’, Journal of Statistical Planning and Inference 6, 73–85.


  • De Loera, J. & Sturmfels, B. (2001), Algebraic unimodular counting, Manuscript.

  • Diaconis, P. & Efron, B. (1985), ‘Testing for independence in a two-way table: New interpretations of the chi-square statistic’, The Annals of Statistics 13, 845–874.


  • Diaconis, P. & Gangolli, A. (1995), Rectangular arrays with fixed margins, in ‘Discrete Probability and Algorithms’, Springer-Verlag, New York, pp. 15–41.


  • Diaconis, P. & Sturmfels, B. (1998), ‘Algebraic algorithms for sampling from conditional distributions’, The Annals of Statistics 26, 363–397.


  • Dinwoodie, I. H. (1998), ‘The Diaconis-Sturmfels algorithm and rules of succession’, Bernoulli 4, 401–410.


  • Dobra, A. (2002), Statistical Tools for Disclosure Limitation in Multi-way Contingency Tables, PhD thesis, Department of Statistics, Carnegie Mellon University.

  • Dobra, A. (2003), ‘Markov bases for decomposable graphical models’, Bernoulli, to appear.


  • Dobra, A. & Fienberg, S. E. (2000), ‘Bounds for cell entries in contingency tables given marginal totals and decomposable graphs’, Proceedings of the National Academy of Sciences 97, 11885–11892.


  • Fienberg, S. E., Makov, U. E. & Steele, R. J. (1998), ‘Disclosure limitation using perturbation and related methods for categorical data’, Journal of Official Statistics 14, 485–511.


  • Fienberg, S. E., Makov, U. E., Meyer, M. M. & Steele, R. J. (2001), Computing the exact distribution for a multi-way contingency table conditional on its marginal totals, in A. Saleh, ed., ‘Data Analysis from Statistical Foundations: A Festschrift in Honor of the 75th Birthday of D. A. S. Fraser’, Nova Science Publishers, Huntington, NY, pp. 145–165.


  • Hosten, S. & Sullivant, S. (2002), ‘Groebner bases and polyhedral geometry of reducible and cyclic models’, Journal of Combinatorial Theory: Series A 2, 277–301.


  • Lauritzen, S. L. (1996), Graphical Models, Clarendon Press, Oxford.


  • Leimer, H. G. (1993), ‘Optimal decomposition by clique separators’, Discrete Mathematics 113, 99–123.


  • Madigan, D. & York, J. (1995), ‘Bayesian graphical models for discrete data’, International Statistical Review 63, 215–232.


  • Mehta, C. (1994), ‘The exact analysis of contingency tables in medical research’, Statistical Methods in Medical Research.


  • Mount, J. (1995), Application of Convex Sampling to Optimization and Contingency Table Generation/Counting, PhD thesis, Carnegie Mellon University.

  • Sullivant, S. (2002), Algebraic geometry and combinatorics of hierarchical models, Master’s thesis, San Francisco State University.

  • Takken, A. (1999), Monte Carlo Goodness-of-Fit Tests for Discrete Data, PhD thesis, Stanford University.

  • Tarjan, R. E. (1985), ‘Decomposition by clique separators’, Discrete Mathematics 55, 221–232.


  • Vlach, M. (1986), ‘Conditions for the existence of solutions of the three-dimensional planar transportation problem’, Discrete Applied Mathematics 13, 61–78.


  • Whittaker, J. (1990), Graphical Models in Applied Multivariate Statistics, John Wiley & Sons, New York.


  • Willenborg, L. & de Waal, T. (2000), Elements of Statistical Disclosure Control, Vol. 155, Lecture Notes in Statistics. Springer-Verlag, New York.



Acknowledgments

The work of Adrian Dobra was supported in part by the National Science Foundation under Grant EIA-9876619 to the National Institute of Statistical Sciences. Seth Sullivant was supported in part under a National Science Foundation Graduate Research Fellowship.


Appendix A. Decomposable and Reducible Graphs

A graph \(\mathcal{G}\) is a pair (K, E), where K = {1, 2, …, k} is a finite set of vertices and E ⊆ K × K is a set of edges linking the vertices. For any vertex set A ⊆ K, we define the edge set associated with it as

$$E(A) := \{(u, v) \in E|u, v \in A\}.$$

Let \(\mathcal{G}(A) = (A, E(A))\) denote the subgraph of \(\mathcal{G}\) induced by A. Two vertices u, v ∈ K are adjacent if (u, v) ∈ E. A set of vertices of \(\mathcal{G}\) is independent if no two of its elements are adjacent. An induced subgraph \(\mathcal{G}(A)\) is complete if the vertices in A are pairwise adjacent in \(\mathcal{G}\). We also say that A is complete in \(\mathcal{G}\). A complete vertex set A in \(\mathcal{G}\) that is maximal is a clique.
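The definitions of E(A), completeness and cliques can be illustrated with a minimal Python sketch. The function names and the example graph are hypothetical and chosen only for illustration; edges are treated as undirected pairs.

```python
from itertools import combinations

def induced_edges(edges, A):
    """E(A): the edges of the graph with both end points in A."""
    A = set(A)
    return {(u, v) for (u, v) in edges if u in A and v in A}

def is_complete(edges, A):
    """True if the vertices in A are pairwise adjacent."""
    undirected = {frozenset(e) for e in edges}
    return all(frozenset(pair) in undirected for pair in combinations(set(A), 2))

def is_clique(vertices, edges, A):
    """True if A is complete and no further vertex can be added to it."""
    A = set(A)
    if not is_complete(edges, A):
        return False
    return not any(is_complete(edges, A | {v}) for v in set(vertices) - A)

# Hypothetical example: a triangle on {1, 2, 3} with vertex 4 attached to 3.
K = {1, 2, 3, 4}
E = {(1, 2), (1, 3), (2, 3), (3, 4)}

print(sorted(induced_edges(E, {1, 2, 3})))  # edges of the induced triangle
print(is_clique(K, E, {1, 2, 3}))           # complete and maximal
print(is_clique(K, E, {1, 2}))              # complete but not maximal
```

In this example {1, 2} is complete but not a clique, because vertex 3 can be added without losing completeness.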

Let u, v ∈ K. A path (or chain) from u to v is a sequence \(u = v_0, \ldots, v_n = v\) of distinct vertices such that \((v_{i-1}, v_i) \in E\) for all i = 1, 2, …, n. The path is a cycle if the end points are allowed to be the same, u = v. If there is a path from u to v we say that u and v are connected. The sets A, B ⊆ K are disconnected if u and v are not connected for all u ∈ A, v ∈ B. The connected component of a vertex u ∈ K is the set of all vertices connected with u. A graph is connected if all the pairs of vertices are connected.
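As a rough illustration of connectedness, the connected component of a vertex can be found with a breadth-first search. This is a minimal sketch; the helper names and the example graph are assumptions made for the illustration.

```python
from collections import deque

def adjacency(vertices, edges):
    """Neighbour sets of an undirected graph given as vertex and edge sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def connected_component(vertices, edges, u):
    """All vertices joined to u by a path (u itself included)."""
    adj = adjacency(vertices, edges)
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x] - seen:
            seen.add(y)
            queue.append(y)
    return seen

def is_connected(vertices, edges):
    """True if every pair of vertices is joined by a path."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    return connected_component(vertices, edges, start) == vertices

# Hypothetical example: a path 1-2-3 and a separate edge 4-5.
K = {1, 2, 3, 4, 5}
E = {(1, 2), (2, 3), (4, 5)}

print(connected_component(K, E, 1) == {1, 2, 3})
print(is_connected(K, E))
```

Here the sets {1, 2, 3} and {4, 5} are disconnected in the sense defined above, so the graph is not connected.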

The set C ⊆ K is a uv-separator if all paths from u to v intersect C. The set C ⊆ K separates A from B if it is a uv-separator for every u ∈ A, v ∈ B. C is a separator of \(\mathcal{G}\) if two vertices in the same connected component of \(\mathcal{G}\) are in two distinct connected components of \(\mathcal{G}\backslash{C}\) or, equivalently, if \(\mathcal{G}\backslash{C}\) is disconnected. In addition, C is a minimal separator of \(\mathcal{G}\) if C is a separator and no proper subset of C separates the graph. Unless otherwise stated, the separators we work with will be complete.
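One way to check whether C separates A from B is to delete C from the graph and test whether any vertex of B can still be reached from A. The sketch below follows that idea under the assumption that A and B are disjoint from C; all names and the example graph are hypothetical.

```python
def reachable(vertices, edges, sources, removed):
    """Vertices reachable from `sources` once the set `removed` is deleted."""
    keep = set(vertices) - set(removed)
    adj = {v: set() for v in keep}
    for u, v in edges:
        if u in keep and v in keep:
            adj[u].add(v)
            adj[v].add(u)
    seen = set(sources) & keep
    stack = list(seen)
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen

def separates(vertices, edges, C, A, B):
    """True if every path from a vertex of A to a vertex of B meets C."""
    return not (reachable(vertices, edges, A, C) & (set(B) - set(C)))

# Hypothetical example: the path 1-2-3-4.
K = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 4)}

print(separates(K, E, {2}, {1}, {3, 4}))  # every path from 1 passes through 2
print(separates(K, E, {4}, {1}, {3}))     # the path 1-2-3 avoids vertex 4
```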

Decomposable graphs possess a special property that allows us to “decompose” them into components or subgraphs and work directly with these components. The idea is to decompose the graph \(\mathcal{G}\) into two possibly overlapping subgraphs \(\mathcal{G'}\) and \(\mathcal{G''}\) so that no information about the graph is lost when transforming \(\mathcal{G}\) into \(\mathcal{G'}\) and \(\mathcal{G''}\). Furthermore, by “correctly” decomposing \(\mathcal{G'}\) and \(\mathcal{G''}\), and so on, one ends up with a set of subgraphs of \(\mathcal{G}\) which allow for no further decompositions. A set of subgraphs of \(\mathcal{G}\) generated in this way is called a derived system of \(\mathcal{G}\), while its elements are called atoms (Tarjan 1985). We now define what we mean by a “correct” decomposition.

Definition A.1. The partition \((A_1, S, A_2)\) of K is said to form a decomposition of \(\mathcal{G}\) if S is a minimal separator of \(A_1\) and \(A_2\).

In this case \((A_1, S, A_2)\) decomposes \(\mathcal{G}\) into the components \(\mathcal{G}(A_1 \cup S)\) and \(\mathcal{G}(S \cup A_2)\). The decomposition is proper if \(A_1\) and \(A_2\) are not empty.
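A candidate decomposition can be checked mechanically. The sketch below assumes one reading of Definition A.1: the three sets partition K, S is complete (following the convention on separators stated earlier), S separates the first set from the second, and removing any single vertex from S destroys the separation. The function names and example graphs are hypothetical.

```python
def _adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def _separates(adj, S, A, B):
    """True if no path from A to B survives once S is removed."""
    blocked = set(S)
    seen = set(A) - blocked
    stack = list(seen)
    while stack:
        x = stack.pop()
        for y in adj[x] - blocked - seen:
            seen.add(y)
            stack.append(y)
    return not (seen & set(B))

def is_decomposition(vertices, edges, A1, S, A2):
    """Check (A1, S, A2) against the assumed reading of Definition A.1."""
    K, A1, S, A2 = set(vertices), set(A1), set(S), set(A2)
    adj = _adjacency(K, edges)
    is_partition = (A1 | S | A2 == K) and not (A1 & S or A1 & A2 or S & A2)
    if not is_partition:
        return False
    complete = all(v in adj[u] for u in S for v in S if u != v)
    if not (complete and _separates(adj, S, A1, A2)):
        return False
    # Minimality: dropping any single vertex of S must break the separation.
    return all(not _separates(adj, S - {s}, A1, A2) for s in S)

# Hypothetical examples on the vertex set {1, 2, 3, 4}.
K = {1, 2, 3, 4}
TWO_TRIANGLES = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)}  # share the edge 2-3
FOUR_CYCLE = {(1, 2), (2, 3), (3, 4), (4, 1)}

print(is_decomposition(K, TWO_TRIANGLES, {1}, {2, 3}, {4}))
print(is_decomposition(K, FOUR_CYCLE, {1}, {2, 4}, {3}))
```

In the first graph the shared edge {2, 3} yields the components on {1, 2, 3} and {2, 3, 4}. In the four-cycle, {2, 4} separates 1 from 3 but is not complete, so the check fails.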

Definition A.2. The graph \(\mathcal{G}\) is decomposable if it is complete or if there exists a proper decomposition \((A_1, S, A_2)\) into decomposable graphs \(\mathcal{G}(A_1 \cup S)\) and \(\mathcal{G}(S \cup A_2)\).
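Decomposability can be tested without following the recursive definition directly. The sketch below swaps in a different but equivalent criterion: a graph is decomposable exactly when it is chordal (see, e.g., Lauritzen 1996), and a graph is chordal exactly when its vertices can be removed one at a time, always removing a vertex whose remaining neighbours are pairwise adjacent. The function name and examples are hypothetical, and the code favours clarity over speed.

```python
from itertools import combinations

def is_decomposable(vertices, edges):
    """Chordality test by repeatedly removing a simplicial vertex."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = None
        for v in remaining:
            neighbours = adj[v] & remaining
            # v is simplicial if its remaining neighbours are pairwise adjacent.
            if all(b in adj[a] for a, b in combinations(neighbours, 2)):
                simplicial = v
                break
        if simplicial is None:
            return False  # no simplicial vertex left: a chordless cycle exists
        remaining.remove(simplicial)
    return True

# Hypothetical examples on the vertex set {1, 2, 3, 4}.
K = {1, 2, 3, 4}
TWO_TRIANGLES = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)}
FOUR_CYCLE = {(1, 2), (2, 3), (3, 4), (4, 1)}

print(is_decomposable(K, TWO_TRIANGLES))
print(is_decomposable(K, FOUR_CYCLE))
```

The four-cycle has no chord, so no vertex is simplicial and the test reports that the graph is not decomposable.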

Graphs that are not decomposable, but can still be decomposed into sequences of atoms, are described in Tarjan (1985) and Leimer (1993). In this case, the resulting atoms are not necessarily complete.

Definition A.3. A graph \(\mathcal{G}\) is reducible if \(\mathcal{G}\) admits a proper decomposition, otherwise \(\mathcal{G}\) is a prime graph.
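For small graphs, primality can be checked by brute force. The sketch assumes, in line with the convention that separators are complete, that a graph admits a proper decomposition exactly when deleting some complete vertex set leaves it disconnected. The search is exponential in the number of vertices and is meant only as an illustration; it is not the algorithm of Tarjan (1985) or Leimer (1993). All names and examples are hypothetical.

```python
from itertools import combinations

def _adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def _connected_without(adj, removed):
    """True if the graph stays connected after deleting `removed`."""
    keep = set(adj) - set(removed)
    if not keep:
        return True
    start = next(iter(keep))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in (adj[x] & keep) - seen:
            seen.add(y)
            stack.append(y)
    return seen == keep

def is_prime(vertices, edges):
    """True if no complete vertex set disconnects the graph."""
    adj = _adjacency(vertices, edges)
    verts = list(adj)
    for size in range(len(verts)):
        for S in combinations(verts, size):
            complete = all(b in adj[a] for a, b in combinations(S, 2))
            if complete and not _connected_without(adj, S):
                return False  # S is a complete separator, so the graph is reducible
    return True

# Hypothetical examples.
FOUR_CYCLE = {(1, 2), (2, 3), (3, 4), (4, 1)}
CYCLE_WITH_PENDANT = FOUR_CYCLE | {(4, 5)}

print(is_prime({1, 2, 3, 4}, FOUR_CYCLE))
print(is_prime({1, 2, 3, 4, 5}, CYCLE_WITH_PENDANT))
```

The four-cycle is prime although it is not complete. Attaching vertex 5 to vertex 4 makes the graph reducible, since {4} is a complete separator, and one of the two resulting components is the four-cycle itself, an atom that is not complete.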

Given that every reducible graph \(\mathcal{G}\) might have several derived systems (Tarjan 1985), we would like to be able to isolate one of them which could fully characterize the input graph \(\mathcal{G}\).

Definition A.4. A subgraph \(\mathcal{G}(A)\) is a maximal prime (mp-) subgraph of \(\mathcal{G}\) if \(\mathcal{G}(A)\) is prime and \(\mathcal{G}(B)\) is reducible for all B with A ⊂ B ⊆ K.

The set of mp-subgraphs of \(\mathcal{G}\) is contained in every derived system of \(\mathcal{G}\). Moreover, the set of mp-subgraphs of \(\mathcal{G}\) is always a derived system of \(\mathcal{G}\) (Leimer 1993), and consequently it is the unique minimal derived system. If \(\mathcal{G}\) is decomposable, the mp-subgraphs of \(\mathcal{G}\) are complete, hence the unique minimal derived system of a decomposable graph contains only its cliques (Leimer 1993).


Dobra, A., Sullivant, S. A Divide-and-Conquer Algorithm for Generating Markov Bases of Multi-way Tables. CompStat 19, 347–366 (2004). https://doi.org/10.1007/BF03372101


  • DOI: https://doi.org/10.1007/BF03372101
