A Divide-and-Conquer Algorithm for Generating Markov Bases of Multi-way Tables

Dobra, Adrian; Sullivant, Seth

doi:10.1007/BF03372101

Published: 15 July 2016

Volume 19, pages 347–366, (2004)

Computational Statistics

Summary

We describe a divide-and-conquer technique for generating a Markov basis that connects all tables of counts having a fixed set of marginal totals. This procedure is based on decomposing the independence graph induced by these marginals. We discuss the practical implications of using this method in conjunction with other algorithms for determining Markov bases.

References

Agresti, A. (1992), ‘A survey of exact inference for contingency tables (with discussion)’, Statistical Science.
Bishop, Y. M. M., Fienberg, S. E. & Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, M.I.T. Press, Cambridge, MA.
Blair, J. R. S. & Barry, P. (1993), An introduction to chordal graphs and clique trees, in IMA, ed., ‘Graph Theory and Sparse Matrix Computation’, Vol. 56, Springer-Verlag, New York, pp. 1–30.
Cox, D., Little, J. & O’Shea, D. (1992), Ideals, Varieties and Algorithms, Springer-Verlag, New York.
Dalenius, T. & Reiss, S. P. (1982), ‘Data-swapping: a technique for disclosure control’, Journal of Statistical Planning and Inference 6, 73–85.
De Loera, J. & Sturmfels, B. (2001), Algebraic unimodular counting, Manuscript.
Diaconis, P. & Efron, B. (1985), ‘Testing for independence in a two-way table: New interpretations of the chi-square statistic’, The Annals of Statistics 13, 845–874.
Diaconis, P. & Gangolli, A. (1995), Rectangular arrays with fixed margins, in ‘Discrete Probability and Algorithms’, Springer-Verlag, New York, pp. 15–41.
Diaconis, P. & Sturmfels, B. (1998), ‘Algebraic algorithms for sampling from conditional distributions’, The Annals of Statistics 26, 363–397.
Dinwoodie, I. H. (1998), ‘The Diaconis-Sturmfels algorithm and rules of succession’, Bernoulli 4, 401–410.
Dobra, A. (2002), Statistical Tools for Disclosure Limitation in Multi-way Contingency Tables, PhD thesis, Department of Statistics, Carnegie Mellon University.
Dobra, A. (2003), ‘Markov bases for decomposable graphical models’, Bernoulli to appear.
Dobra, A. & Fienberg, S. E. (2000), ‘Bounds for cell entries in contingency tables given marginal totals and decomposable graphs’, Proceedings of the National Academy of Sciences 97, 11885–11892.
Fienberg, S. E., Makov, U. E. & Steele, R. J. (1998), ‘Disclosure limitation using perturbation and related methods for categorical data’, Journal of Official Statistics 14, 485–511.
Fienberg, S. E., Makov, U. E., Meyer, M. M. & Steele, R. J. (2001), Computing the exact distribution for a multi-way contingency table conditional on its marginal totals, in A. Saleh, ed., ‘Data Analysis from Statistical Foundations: A Festschrift in Honor of the 75th Birthday of D. A. S. Fraser’, Nova Science Publishers, Huntington, NY, pp. 145–165.
Hosten, S. & Sullivant, S. (2002), ‘Groebner bases and polyhedral geometry of reducible and cyclic models’, Journal of Combinatorial Theory: Series A 2, 277–301.
Lauritzen, S. L. (1996), Graphical Models, Clarendon Press, Oxford.
Leimer, H. G. (1993), ‘Optimal decomposition by clique separators’, Discrete Mathematics 113, 99–123.
Madigan, D. & York, J. (1995), ‘Bayesian graphical models for discrete data’, International Statistical Review 63, 215–232.
Mehta, C. (1994), ‘The exact analysis of contingency tables in medical research’, Statistical Methods in Medical Research.
Mount, J. (1995), Application of Convex Sampling to Optimization and Contingency Table Generation/Counting, PhD thesis, Carnegie Mellon University.
Sullivant, S. (2002), Algebraic geometry and combinatorics of hierarchical models, Master’s thesis, San Francisco State University.
Takken, A. (1999), Monte Carlo Goodness-of-Fit Tests for Discrete Data, PhD thesis, Stanford University.
Tarjan, R. E. (1985), ‘Decomposition by clique separators’, Discrete Mathematics 55, 221–232.
Vlach, M. (1986), ‘Conditions for the existence of solutions of the three-dimensional planar transportation problem’, Discrete Applied Mathematics 13, 61–78.
Whittaker, J. (1990), Graphical Models in Applied Multivariate Statistics, John Wiley & Sons, New York.
Willenborg, L. & de Waal, T. (2000), Elements of Statistical Disclosure Control, Vol. 155, Lecture Notes in Statistics. Springer-Verlag, New York.

Acknowledgments

The work of Adrian Dobra was supported in part by the National Science Foundation under Grant EIA-9876619 to the National Institute of Statistical Sciences. Seth Sullivant was supported in part under a National Science Foundation Graduate Research Fellowship.

Author information

Authors and Affiliations

Institute of Statistics and Decision Sciences and Department of Molecular Genetics & Microbiology, Duke University, Durham, NC, 27708, USA
Adrian Dobra
Department of Mathematics, University of California, Berkeley, CA, 94720, USA
Seth Sullivant

Appendix A. Decomposable and Reducible Graphs

A graph $\mathcal{G}$ is a pair (K, E), where K = {1, 2, …, k} is a finite set of vertices and E ⊆ K × K is a set of edges linking the vertices. For any vertex set A ⊆ K, we define the edge set associated with it as

$$E(A) := \{(u, v) \in E \mid u, v \in A\}.$$

Let $\mathcal{G}(A) = (A, E(A))$ denote the subgraph of $\mathcal{G}$ induced by A. Two vertices u, v ∈ K are adjacent if (u, v) ∈ E. A set of vertices of $\mathcal{G}$ is independent if no two of its elements are adjacent. An induced subgraph $\mathcal{G}(A)$ is complete if the vertices in A are pairwise adjacent in $\mathcal{G}$. We also say that A is complete in $\mathcal{G}$. A complete vertex set A in $\mathcal{G}$ that is maximal is a clique.
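As a minimal sketch of these definitions, the following Python functions compute E(A) and test whether a vertex set is complete or a clique. The function names and the small example graph are illustrative assumptions, not part of the original text; edges are treated as unordered pairs.

```python
from itertools import combinations


def induced_edges(edges, A):
    """Return E(A): the edges with both end points in A, as unordered pairs."""
    A = set(A)
    return {frozenset(e) for e in edges if set(e) <= A}


def is_complete(edges, A):
    """True if the vertices in A are pairwise adjacent."""
    E_A = induced_edges(edges, A)
    return all(frozenset(pair) in E_A for pair in combinations(set(A), 2))


def is_clique(vertices, edges, A):
    """True if A is complete and maximal: no outside vertex can be added."""
    A = set(A)
    if not is_complete(edges, A):
        return False
    return not any(is_complete(edges, A | {w}) for w in set(vertices) - A)


# Hypothetical example: a triangle 1-2-3 with a pendant vertex 4.
K = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (1, 3), (3, 4)}
print(is_complete(E, {1, 2}), is_clique(K, E, {1, 2}), is_clique(K, E, {1, 2, 3}))
```

In this example {1, 2} is complete but not a clique, because it can be extended by vertex 3.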

Let u, v ∈ K. A path (or chain) from u to v is a sequence $u = v_0, \ldots, v_n = v$ of distinct vertices such that $(v_{i-1}, v_i) \in E$ for all i = 1, 2, …, n. The path is a cycle if the end points are allowed to be the same, u = v. If there is a path from u to v we say that u and v are connected. The sets A, B ⊂ K are disconnected if u and v are not connected for all u ∈ A, v ∈ B. The connected component of a vertex u ∈ K is the set of all vertices connected with u. A graph is connected if all the pairs of vertices are connected.
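The connectivity notions above can be sketched with a breadth-first search. The helper names below are assumptions introduced for illustration, and the graph is taken to be undirected.

```python
from collections import deque


def adjacency(vertices, edges):
    """Build an undirected adjacency map from a vertex set and an edge list."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_component(vertices, edges, u):
    """Return the set of all vertices connected with u (including u itself)."""
    adj = adjacency(vertices, edges)
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x] - seen:
            seen.add(y)
            queue.append(y)
    return seen


def is_connected(vertices, edges):
    """True if every pair of vertices is joined by a path."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    return connected_component(vertices, edges, start) == vertices
```

For a hypothetical graph on {1, …, 5} with edges 1-2, 2-3 and 4-5, `connected_component` returns {1, 2, 3} for vertex 1 and {4, 5} for vertex 4, so the graph is not connected.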

The set C ⊂ K is a uv-separator if all paths from u to v intersect C. The set C ⊂ K separates A from B if it is a uv-separator for every u ∈ A, v ∈ B. C is a separator of $\mathcal{G}$ if two vertices in the same connected component of $\mathcal{G}$ are in two distinct connected components of $\mathcal{G} \setminus C$ or, equivalently, if $\mathcal{G} \setminus C$ is disconnected. In addition, C is a minimal separator of $\mathcal{G}$ if C is a separator and no proper subset of C separates the graph. Unless otherwise stated, the separators we work with will be complete.
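A separator can be checked by deleting C and searching the remaining graph. The sketch below follows that idea; the function names are hypothetical, and `is_separator` implements the "$\mathcal{G}$ minus C is disconnected" form of the definition.

```python
def reachable(vertices, edges, start, removed=frozenset()):
    """Vertices reachable from `start` once the vertices in `removed` are deleted."""
    kept = set(vertices) - set(removed)
    adj = {v: set() for v in kept}
    for a, b in edges:
        if a in kept and b in kept:
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen


def separates(vertices, edges, C, A, B):
    """True if every path from a vertex of A to a vertex of B intersects C."""
    C = set(C)
    targets = set(B) - C
    return all(
        targets.isdisjoint(reachable(vertices, edges, u, C))
        for u in set(A) - C
    )


def is_separator(vertices, edges, C):
    """True if deleting C leaves a disconnected graph."""
    rest = set(vertices) - set(C)
    if not rest:
        return False
    start = next(iter(rest))
    return reachable(vertices, edges, start, C) != rest
```

For two triangles 1-2-3 and 3-4-5 sharing vertex 3, the set {3} separates {1, 2} from {4, 5}, while the empty set does not.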

Decomposable graphs possess a special property that allows us to “decompose” them into components or subgraphs and work directly with these components. The idea is to decompose the graph $\mathcal{G}$ into two possibly overlapping subgraphs $\mathcal{G'}$ and $\mathcal{G''}$ so that no information about the graph is lost when transforming $\mathcal{G}$ into $\mathcal{G'}$ and $\mathcal{G''}$. Furthermore, by “correctly” decomposing $\mathcal{G'}$ and $\mathcal{G''}$, and so on, one ends up with a set of subgraphs of $\mathcal{G}$ which allow for no further decompositions. A set of subgraphs of $\mathcal{G}$ generated in this way is called a derived system of $\mathcal{G}$, while its elements are called atoms (Tarjan 1985). We now define what we mean by a “correct” decomposition.

Definition A.1. The partition (A₁, S, A₂) of K is said to form a decomposition of $\mathcal{G}$ if S is a minimal separator of A₁ and A₂.

In this case (A₁, S, A₂) decomposes $\mathcal{G}$ into the components $\mathcal{G}(A_1 \cup S)$ and $\mathcal{G}(S \cup A_2)$. The decomposition is proper if A₁ and A₂ are not empty.
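Definition A.1 can be checked mechanically for a candidate triple. The sketch below is one possible reading, under the appendix's convention that separators are complete; the function names are assumptions. Minimality is tested by dropping one vertex of S at a time, which suffices because any superset of a separating set also separates.

```python
from itertools import combinations


def reachable(vertices, edges, start, removed):
    """Vertices reachable from `start` once the vertices in `removed` are deleted."""
    kept = set(vertices) - set(removed)
    adj = {v: set() for v in kept}
    for a, b in edges:
        if a in kept and b in kept:
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen


def is_decomposition(vertices, edges, A1, S, A2):
    """Check Definition A.1 for (A1, S, A2), requiring S to be complete."""
    V, A1, S, A2 = set(vertices), set(A1), set(S), set(A2)
    # (A1, S, A2) must be a partition of the vertex set K.
    if A1 | S | A2 != V or A1 & S or A1 & A2 or S & A2:
        return False
    # Appendix convention (an assumption of this sketch): S is complete.
    edge_set = {frozenset(e) for e in edges}
    if any(frozenset(p) not in edge_set for p in combinations(S, 2)):
        return False

    def sep(C):
        # True if C separates A1 from A2.
        return all(A2.isdisjoint(reachable(V, edges, u, C)) for u in A1)

    # S must separate A1 from A2, and no S minus one vertex may do so.
    return sep(S) and not any(sep(S - {s}) for s in S)


def is_proper(A1, A2):
    """A decomposition is proper if both A1 and A2 are non-empty."""
    return bool(A1) and bool(A2)
```

For the 4-cycle 1-2-3-4 with the chord 2-4, the triple ({1}, {2, 4}, {3}) is a proper decomposition; without the chord, {2, 4} is not complete and the check fails.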

Definition A.2. The graph $\mathcal{G}$ is decomposable if it is complete or if there exists a proper decomposition (A₁, S, A₂) into decomposable graphs $\mathcal{G}(A_1 \cup S)$ and $\mathcal{G}(S \cup A_2)$.
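Rather than applying Definition A.2 recursively, decomposability can be tested through chordality. The sketch below relies on the standard equivalence between decomposable and chordal graphs (see, for example, Lauritzen 1996), which this appendix does not restate. It repeatedly deletes a simplicial vertex, that is, a vertex whose remaining neighbours are pairwise adjacent. The function name is an assumption.

```python
from itertools import combinations


def is_decomposable(vertices, edges):
    """Test decomposability via chordality (simplicial vertex elimination)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(vertices)
    while remaining:
        # Look for a vertex whose remaining neighbours form a complete set.
        simplicial = next(
            (
                v
                for v in remaining
                if all(
                    b in adj[a]
                    for a, b in combinations(adj[v] & remaining, 2)
                )
            ),
            None,
        )
        if simplicial is None:
            # No simplicial vertex: a chordless cycle of length >= 4 remains.
            return False
        remaining.remove(simplicial)
    return True
```

The chordless 4-cycle is the smallest graph for which this test fails; adding a single chord makes it decomposable.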

Graphs that are not decomposable but can still be decomposed into sequences of atoms are described in Tarjan (1985) and Leimer (1993). In this case, the resulting atoms are not necessarily complete.

Definition A.3. A graph $\mathcal{G}$ is reducible if $\mathcal{G}$ admits a proper decomposition, otherwise $\mathcal{G}$ is a prime graph.

Given that every reducible graph $\mathcal{G}$ might have several derived systems (Tarjan 1985), we would like to be able to isolate one of them which could fully characterize the input graph $\mathcal{G}$.

Definition A.4. A subgraph $\mathcal{G}(A)$ is a maximal prime (mp-) subgraph of $\mathcal{G}$ if $\mathcal{G}(A)$ is prime and $\mathcal{G}(B)$ is reducible for all B with A ⊂ B ⊆ K.

The set of mp-subgraphs of $\mathcal{G}$ is contained in every derived system of $\mathcal{G}$. Moreover, the set of mp-subgraphs of $\mathcal{G}$ is always a derived system of $\mathcal{G}$ (Leimer 1993), and consequently it is the unique minimal derived system. If $\mathcal{G}$ is decomposable, the mp-subgraphs of $\mathcal{G}$ are complete, hence the unique minimal derived system of a decomposable graph contains only its cliques (Leimer 1993).
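For a decomposable graph, the last statement means that the minimal derived system can be obtained by listing the cliques. The sketch below does this with a basic Bron–Kerbosch enumeration, a technique chosen here for illustration and not taken from the paper. It does not compute mp-subgraphs of non-decomposable graphs, for which the algorithms of Tarjan (1985) and Leimer (1993) are needed.

```python
def maximal_cliques(vertices, edges):
    """Enumerate the cliques (maximal complete sets) of an undirected graph."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    cliques = []

    def expand(R, P, X):
        # R: current complete set; P: candidates; X: already-processed vertices.
        if not P and not X:
            cliques.append(frozenset(R))
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(vertices), set())
    return cliques


# Hypothetical decomposable graph: two triangles sharing vertex 3.
K = {1, 2, 3, 4, 5}
E = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
print(sorted(sorted(c) for c in maximal_cliques(K, E)))
```

Here the cliques are {1, 2, 3} and {3, 4, 5}, which are the two atoms obtained from the decomposition ({1, 2}, {3}, {4, 5}).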

About this article

Cite this article

Dobra, A., Sullivant, S. A Divide-and-Conquer Algorithm for Generating Markov Bases of Multi-way Tables. CompStat 19, 347–366 (2004). https://doi.org/10.1007/BF03372101

Published: 15 July 2016
Issue Date: September 2004
DOI: https://doi.org/10.1007/BF03372101
