Calibrating covariate informed product partition models

Page, Garritt L.; Quintana, Fernando A.

doi:10.1007/s11222-017-9777-z


Published: 06 October 2017

Statistics and Computing, Volume 28, pages 1009–1031 (2018)


Abstract

Covariate informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values a priori have a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well if the number of covariates is relatively small. However, as the number of covariates increases, their influence on partition probabilities overwhelms any information the response may provide in clustering and often encourages partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. This same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods’ utility using simulation and two publicly available datasets.


Notes

https://archive.ics.uci.edu/ml/machine-learning-databases/00211/CommViolPredUnnormalizedData.txt.

References

Antoniano-Villalobos, I., Walker, S.G.: A nonparametric model for stationary time series. J. Time Ser. Anal. 37(1), 126–142 (2016)
Barcella, W., Iorio, M.D., Baio, G.: A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models (2016). https://arxiv.org/pdf/1508.00129.pdf
Barcella, W., Iorio, M.D., Baio, G., Malone-Lee, J.: Variable selection in covariate dependent random partition models: an application to urinary tract infection. Stat. Med. 35, 1373–1389 (2016)
Barrientos, A.F., Jara, A., Quintana, F.A.: On the support of MacEachern’s dependent Dirichlet processes and extensions. Bayesian Anal. 7, 277–310 (2012)
Blei, D.M., Frazier, P.I.: Distance dependent Chinese restaurant processes. J. Mach. Learn. Res. 12, 2461–2488 (2011)
Christensen, R., Johnson, W., Branscum, A.J., Hanson, T.: Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians. CRC Press, Boca Raton (2011). http://www.ics.uci.edu/~wjohnson/BIDA/BIDABook.html
Chung, Y., Dunson, D.B.: Nonparametric Bayes conditional distribution modeling with variable selection. J. Am. Stat. Assoc. 104, 1646–1660 (2009)
Cook, R.D., Weisberg, S.: Sliced inverse regression for dimension reduction: comment. J. Am. Stat. Assoc. 86, 328–332 (1991)
Dahl, D.B.: Model-based clustering for expression data via a Dirichlet process mixture model. In: Vannucci, M., Do, K.A., Müller, P. (eds.) Bayesian Inference for Gene Expression and Proteomics, pp. 201–218. Cambridge University Press, Cambridge (2006)
Dahl, D.B., Day, R., Tsai, J.W.: Random partition distribution indexed by pairwise information. J. Am. Stat. Assoc. (2016). doi:10.1080/01621459.2016.1165103
De Iorio, M., Müller, P., Rosner, G., MacEachern, S.: An ANOVA model for dependent random measures. J. Am. Stat. Assoc. 99, 205–215 (2004)
Dunson, D.B., Park, J.H.: Kernel stick-breaking processes. Biometrika 95, 307–323 (2008)
Geisser, S., Eddy, W.F.: A predictive approach to model selection. J. Am. Stat. Assoc. 74(365), 153–160 (1979)
Gelfand, A.E., Kottas, A., MacEachern, S.N.: Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 102, 1021–1035 (2005)
Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27, 857–871 (1971)
Griffin, J.E., Steel, M.F.J.: Order-based dependent Dirichlet processes. J. Am. Stat. Assoc. 101, 179–194 (2006)
Guhaniyogi, R., Dunson, D.B.: Bayesian compressed regression. J. Am. Stat. Assoc. 110, 1500–1514 (2015)
Hannah, L., Blei, D., Powell, W.: Dirichlet process mixtures of generalized linear models. J. Mach. Learn. Res. 12, 1923–1953 (2011)
Hartigan, J.A.: Partition models. Commun. Stat. Theory Methods 19, 2745–2756 (1990)
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3, 79–87 (1991)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
MacEachern, S.N.: Dependent Dirichlet processes. Ohio State University, Department of Statistics, Technical report (2000)
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K.: Cluster: Cluster Analysis Basics and Extensions (2016). R package version 2.0.4—For new features, see the ’Changelog’ file (in the package source)
McLachlan, G., Peel, D.: Finite Mixture Models, 1st edn. Wiley Series in Probability and Statistics, New York (2000)
Miller, J.W., Dunson, D.B.: Robust Bayesian inference via coarsening (2015). http://arxiv.org/abs/arXiv:1506.06101
Molitor, J., Papathomas, M., Jerrett, M., Richardson, S.: Random partition models with regression on covariates. Biostatistics 11, 484–498 (2010)
Müller, P., Erkanli, A., West, M.: Bayesian curve fitting using multivariate normal mixtures. Biometrika 83, 67–79 (1996)
Müller, P., Quintana, F.A., Jara, A., Hanson, T.: Bayesian Nonparametric Data Analysis, 1st edn. Springer, Switzerland (2015)
Müller, P., Quintana, F.A., Rosner, G.L.: A product partition model with regression on covariates. J. Comput. Graph. Stat. 20(1), 260–277 (2011)
Müller, P., Quintana, F.A., Rosner, G.L., Maitland, M.L.: Bayesian inference for longitudinal data with non-parametric treatment effects. Biostatistics 15(2), 341–352 (2013)
Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9, 249–265 (2000)
Page, G.L., Bhattacharya, A., Dunson, D.B.: Classification via Bayesian nonparametric learning of affine subspaces. J. Am. Stat. Assoc. 108, 187–201 (2013)
Page, G.L., Quintana, F.A.: Predictions based on the clustering of heterogeneous functions via shape and subject-specific covariates. Bayesian Anal. 10, 379–410 (2015)
Page, G.L., Quintana, F.A.: Spatial product partition models. Bayesian Anal. 11(1), 265–298 (2016)
Papathomas, M., Molitor, J., Hoggart, C., Hastie, D., Richardson, S.: Exploring data from genetic association studies using Bayesian variable selection and the Dirichlet process: application to searching for gene $\times$ gene patterns. Genet. Epidemiol. 36, 663–674 (2012)
Park, J.H., Dunson, D.B.: Bayesian generalized product partition model. Stat. Sin. 20, 1203–1226 (2010)
Quintana, F.A., Müller, P., Papoila, A.L.: Cluster-specific variable selection for product partition models. Scand. J. Stat. 42, 1065–1077 (2015). doi:10.1111/sjos.12151
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2016). https://www.R-project.org/
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)
Rodriguez, A., Dunson, D.B., Gelfand, A.E.: Bayesian nonparametric functional data analysis through density estimation. Biometrika 96, 149–162 (2009)
Wade, S., Dunson, D.B., Petrone, S., Trippa, L.: Improving prediction from Dirichlet process mixtures via enrichment. J. Mach. Learn. Res. 15, 1041–1071 (2014)
Wang, H., Xia, Y.: Sliced regression for dimension reduction. J. Am. Stat. Assoc. 103, 811–821 (2008)

Acknowledgements

The authors would like to thank Peter Müller for helpful comments. The authors also thank all the reviewers for their valuable suggestions that substantially improved presentation. Garritt L. Page gratefully acknowledges the financial support of FONDECYT Grant 11121131 and Fernando A. Quintana was partially funded by Grant FONDECYT 1141057.

Author information

Authors and Affiliations

Department of Statistics, Brigham Young University, Provo, UT, USA
Garritt L. Page
Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile
Fernando A. Quintana


Corresponding author

Correspondence to Garritt L. Page.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 363 KB)

Appendices

Appendix: MCMC algorithm for the calibrated similarity and tempered mixture of experts

Here we provide pertinent computational details for the MCMC algorithm used to fit the tempered mixture of experts (TME) model and the PPMx with calibrated similarity. We focus primarily on updating the cluster labels because, conditional on these, updating the remaining model parameters is straightforward using Gibbs sampler or Metropolis–Hastings steps.

1.1 Calibrated similarity

To update the cluster membership of subject i for the calibrated similarity, cluster weights are created by comparing the unnormalized posterior for the jth cluster when subject i is excluded to that when subject i is included. In addition to weights for existing clusters, Algorithm 8 of Neal (2000) requires calculating weights for p empty clusters whose cluster-specific parameters are auxiliary variables generated from the prior. To make this more concrete, let $S_j^{-i}$ denote the jth cluster and $k^{-i}$ the number of clusters when subject i is not considered. Similarly, $\varvec{x}_j^{\star -i}$ will denote the vector of covariates corresponding to cluster j when subject i has been removed. Then the multinomial weights associated with the $k^{-i}$ existing clusters and one empty cluster are

$$\begin{aligned}&Pr(s_i = j | - ) \propto \end{aligned}$$

(16)

$$\begin{aligned}&{\left\{ \begin{array}{ll} N(y_i ; \mu ^{\star }_{j}, \sigma ^{2\star }_{j}) \displaystyle \frac{c(S_{j}^{-i}\cup \{i\})\tilde{g}(\varvec{x}^{\star -i}_{j}\cup \{\varvec{x}_i\})}{c(S_{j}^{-i})\tilde{g}(\varvec{x}^{\star -i}_{j})}\ \quad \text{ for } \ j = 1, \ldots , k^{-i}\\ N(y_i; \mu ^{\star }_{\mathrm{new}, j}, \sigma ^{2\star }_{\mathrm{new},j}) c(\{i\}) \tilde{g}(\{\varvec{x}_i\}) p^{-1}\ \quad \text{ for } \ j = k^{-i}+1. \end{array}\right. } \end{aligned}$$

(17)

where, as mentioned, $\mu ^{\star }_{\mathrm{new}, j}$ and $\sigma ^{2\star }_{\mathrm{new},j}$ are auxiliary variables that are drawn from their respective prior distributions. Since $\tilde{g}(\{\varvec{x}_i\})$ needs to be accounted for when standardizing the multinomial weights, we employ the following ratios in the MCMC algorithm:

$$\begin{aligned} \tilde{g}(\varvec{x}^{\star -i}_{j} \cup \{\varvec{x}_i\})&= \frac{g(\varvec{x}^{\star -i}_{j} \cup \{\varvec{x}_i\})}{\sum _{\ell }g(\varvec{x}^{\star -i}_{\ell } \cup \{\varvec{x}_i\}) } \\ \tilde{g}(\varvec{x}^{\star -i}_{j})&= \frac{g(\varvec{x}^{\star -i}_{j})}{\sum _{\ell }g(\varvec{x}^{\star -i}_{\ell })+ g(\{\varvec{x}_i\})}\\ \tilde{g}(\{\varvec{x}_i\})&= \frac{g(\{\varvec{x}_i\})}{\sum _{\ell }g(\varvec{x}^{\star -i}_{\ell })+ g(\{\varvec{x}_i\})}. \end{aligned}$$

When $\varvec{x}_i$ is included in the jth cluster, it cannot form its own singleton. However, when it is excluded from the jth cluster, it is entirely plausible that it forms its own singleton cluster. For these reasons the similarity value $g(\{\varvec{x}_i\})$ is only included in the normalizing sums of $\tilde{g}(\varvec{x}^{\star -i}_{j})$ and $\tilde{g}(\{\varvec{x}_i\})$.
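
As an illustration only, the weights above could be assembled as in the following Python sketch. The function and argument names are hypothetical and not taken from the paper: the caller is assumed to supply the unnormalized similarity values $g(\cdot)$, the log cohesion values, and the log normal likelihood terms, and the single empty cluster is scaled by $p^{-1}$ as in the display.

```python
import numpy as np

def calibrated_label_weights(loglik, loglik_new, log_c_with, log_c_without,
                             log_c_single, g_with, g_without, g_single, p=1):
    """Normalized probabilities for s_i over k existing clusters + 1 empty one.

    loglik[j]        : log N(y_i; mu_j, sigma2_j) for existing cluster j
    loglik_new       : log N(y_i; mu_new, sigma2_new) for the auxiliary cluster
    log_c_with[j]    : log c(S_j^{-i} U {i});  log_c_without[j]: log c(S_j^{-i})
    log_c_single     : log c({i})
    g_with[j]        : g(x_j^{-i} U {x_i});    g_without[j]: g(x_j^{-i})
    g_single         : g({x_i})
    All names are illustrative assumptions.
    """
    g_with = np.asarray(g_with, dtype=float)
    g_without = np.asarray(g_without, dtype=float)

    # Calibrated similarities: g({x_i}) enters only the second normalizing sum.
    gt_with = g_with / g_with.sum()
    denom = g_without.sum() + g_single
    gt_without = g_without / denom
    gt_single = g_single / denom

    log_w_existing = (np.asarray(loglik) + np.asarray(log_c_with)
                      - np.asarray(log_c_without)
                      + np.log(gt_with) - np.log(gt_without))
    log_w_new = loglik_new + log_c_single + np.log(gt_single) - np.log(p)

    # Normalize on the log scale for numerical stability.
    log_w = np.append(log_w_existing, log_w_new)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()
```

The returned vector can be passed to a categorical sampler; its last entry is the probability of opening a new cluster.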

1.2 TME with unknown $\varvec{\xi }^{\star }_j$ and fixed J

Upon introducing latent component labels $s_i$ such that $Pr(s_i = j) = w(\varvec{x}_i; \varvec{\xi }^{\star }_j)$, the data model (14) can be written hierarchically as

$$\begin{aligned} p(\varvec{y} | \varvec{x}, \varvec{\mu }^{\star }, \varvec{\sigma }^{2\star }, \varvec{\xi }^{\star }, \varvec{c})&= \prod _{i=1}^m\prod _{\ell =1}^J N(y_i|\mu ^{\star }_{\ell }, \sigma ^{2\star }_{\ell })^{I[s_i = \ell ]}\end{aligned}$$

(18)

$$\begin{aligned} s_i&\sim \sum _{\ell =1}^J \delta _{\ell }w(\varvec{x}_i; \varvec{\xi }^{\star }_{\ell }) \end{aligned}$$

(19)

where $\delta _{\ell }$ is the Dirac measure. With this hierarchical representation, an MCMC algorithm can be constructed by cycling through the following steps:

Update component labels using
$$\begin{aligned} Pr(s_i = h | - )&\propto N(y_i | \mu ^{\star }_h, \sigma ^{2\star }_h) w(\varvec{x}_i ; \varvec{\xi }^{\star }_h) \end{aligned}$$
If $\varvec{x}_i$ comprises continuous and categorical variables, then without loss of generality let $\varvec{x}_i = (x_{i1}, x_{i2})$ where $x_{i1}$ is continuous and $x_{i2}$ is categorical. Further, $\varvec{\xi }_j^{\star } = (\eta ^{\star }_j,v^{2\star }_j, \varvec{\pi }^{\star }_j)$ with $\varvec{\xi }^{\star } = (\varvec{\xi }_1^{\star },\ldots , \varvec{\xi }_J^{\star })$. Then $\varvec{\xi }_j^{\star }$ can be updated within the MCMC algorithm by way of a Metropolis–Hastings step employing
$$\begin{aligned} {[}\varvec{\xi }^{\star }_j | - ]&\propto \prod _{i=1}^m Pr(s_i | \varvec{\xi }^{\star })\prod _{\ell =1}^J p(\varvec{\xi }^{\star }_{\ell }) \\&\propto \prod _{i=1}^m w(\varvec{x}_{i}; \varvec{\xi }^{\star }_1)^{I[s_i=1]} \times \ldots \times w(\varvec{x}_{i}; \varvec{\xi }^{\star }_J)^{I[s_i=J]} p(\varvec{\xi }^{\star }_j) \\&\propto \prod _{i:s_i=j} w(\varvec{x}_{i}; \varvec{\xi }^{\star }_j) p(\varvec{\xi }^{\star }_j)\\&= \prod _{i:s_i=j} \frac{q(x_{i1}|\eta ^{\star }_j, v^{2\star }_j) q(x_{i2}|\varvec{\pi }^{\star }_j)}{\sum _{\ell =1}^Jq (x_{i1}|\eta ^{\star }_{\ell }, v^{2\star }_{\ell })q(x_{i2}|\varvec{\pi }^{\star }_{\ell })}p(\eta ^{\star }_j,v^{2\star }_j, \varvec{\pi }^{\star }_j) \end{aligned}$$
where $q(x_{i1}|\eta ^{\star }_j, v^{2\star }_j)$ is a normal density and $q(x_{i2}|\varvec{\pi }^{\star }_j)$ is a multinomial density. For $\varvec{\pi }^{\star }_j$ an independent Metropolis–Hastings sampler with a uniform (over the simplex) candidate density may be considered. This candidate density cancels in the Metropolis–Hastings ratio (though the sampler may become less efficient as the number of categories in $x_{i2}$ increases). Updating $\eta ^{\star }_j$ and $v^{2\star }_j$ can be accomplished using a random walk Metropolis step with a normal candidate density for both.
Updating the likelihood parameters $\mu _j^{\star }$ and $\sigma _j^{2\star }$ can be carried out using Gibbs steps as their full conditionals have well known closed forms.
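
The component-label update in the first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one continuous covariate with a normal kernel and one categorical covariate coded as integers, so that the multinomial kernel for a single observation reduces to the category probability. All function and argument names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    # Log density of N(mean, var), evaluated elementwise with broadcasting.
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def log_tme_weights(x1, x2, eta, v2, pi):
    """log w(x_i; xi_j): covariate kernels normalized over the J components.

    x1: (m,) continuous covariate; x2: (m,) integer category codes
    eta, v2: (J,) kernel means and variances; pi: (J, C) category probabilities
    """
    log_q = (log_normal_pdf(x1[:, None], eta[None, :], v2[None, :])
             + np.log(pi[:, x2]).T)                      # shape (m, J)
    shift = log_q.max(axis=1, keepdims=True)
    log_norm = shift + np.log(np.exp(log_q - shift).sum(axis=1, keepdims=True))
    return log_q - log_norm

def update_labels(y, x1, x2, mu, sigma2, eta, v2, pi, rng):
    """Draw s_i with Pr(s_i = h | -) proportional to N(y_i | mu_h, sigma2_h) w(x_i; xi_h)."""
    log_post = (log_normal_pdf(y[:, None], mu[None, :], sigma2[None, :])
                + log_tme_weights(x1, x2, eta, v2, pi))
    log_post -= log_post.max(axis=1, keepdims=True)
    prob = np.exp(log_post)
    prob /= prob.sum(axis=1, keepdims=True)
    # Inverse-CDF draw of one label per observation (labels are 0-based).
    u = rng.random(len(y))
    labels = (u[:, None] > np.cumsum(prob, axis=1)).sum(axis=1)
    return np.minimum(labels, prob.shape[1] - 1), prob
```

Working on the log scale avoids underflow when an observation lies far from every component; the labels returned are 0-based, whereas the text indexes components from 1.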

Computing Gower’s dissimilarity

The daisy function in the cluster package (Maechler et al. 2016) for the statistical software R was used to calculate the Gower dissimilarity. The calculated dissimilarity is an “average” of the p individual dissimilarities

$$\begin{aligned} d(\varvec{x}_i, \varvec{x}_j) = \frac{1}{p}\sum _{\ell =1}^p d(x_{i\ell },x_{j\ell }). \end{aligned}$$

For numeric or continuous x’s, $d(x_{i\ell },x_{j\ell }) = |x_{i\ell } - x_{j\ell }|/R_{\ell }$ where $R_{\ell } = \max _h(x_{h\ell }) - \min _h(x_{h\ell })$. For nominal variables

$$\begin{aligned} d(x_{i\ell },x_{j\ell }) =\left\{ \begin{array}{cl} 0 &{} \quad \text{ if } \,\,x_{i\ell }=x_{j\ell } \\ 1 &{} \quad \text{ otherwise }. \end{array} \right. \end{aligned}$$
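
The two formulas above can be illustrated with a short Python sketch. It is a direct transcription of the displayed definitions for complete data and is not a port of the daisy function, whose handling of missing values, weights, and other variable types is not reproduced here. The function name and the split into numeric and nominal arrays are assumptions made for illustration.

```python
import numpy as np

def gower_dissimilarity(numeric, nominal):
    """Pairwise Gower dissimilarity for n units.

    numeric: (n, p_num) array of continuous covariates
    nominal: (n, p_nom) array of category labels
    Returns an (n, n) matrix averaging the p = p_num + p_nom
    per-covariate dissimilarities.
    """
    numeric = np.asarray(numeric, dtype=float)
    nominal = np.asarray(nominal)
    p = numeric.shape[1] + nominal.shape[1]

    # Range R_l of each numeric covariate; a constant column contributes 0.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0

    # |x_il - x_jl| / R_l for numeric covariates.
    d_num = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges
    # 0 if the categories match, 1 otherwise, for nominal covariates.
    d_nom = (nominal[:, None, :] != nominal[None, :, :]).astype(float)

    return (d_num.sum(axis=2) + d_nom.sum(axis=2)) / p
```

For example, with one numeric covariate taking values 0, 5, 10 and one nominal covariate taking values a, a, b, the first two units differ by half the numeric range and share a category, so their dissimilarity is (0.5 + 0) / 2 = 0.25.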

About this article

Cite this article

Page, G.L., Quintana, F.A. Calibrating covariate informed product partition models. Stat Comput 28, 1009–1031 (2018). https://doi.org/10.1007/s11222-017-9777-z

Received: 23 November 2016
Accepted: 22 September 2017
Published: 06 October 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11222-017-9777-z
