Mixed-membership naive Bayes models

Shan, Hanhuai; Banerjee, Arindam

doi:10.1007/s10618-010-0198-2

Published: 29 August 2010

Volume 23, pages 1–62, (2011)

Data Mining and Knowledge Discovery

Hanhuai Shan¹ &
Arindam Banerjee¹

Abstract

In recent years, mixture models have found widespread usage in discovering latent cluster structure from data. A popular special case of finite mixture models is the family of naive Bayes (NB) models, where the probability of a feature vector factorizes over the features for any given component of the mixture. Despite their popularity, naive Bayes models do not allow data points to belong to different component clusters with varying degrees, i.e., mixed memberships, which restricts their modeling ability. In this paper, we propose mixed-membership naive Bayes (MMNB) models. On the one hand, MMNB can be viewed as a generalization of NB by putting a Dirichlet prior on top to allow mixed memberships. On the other hand, MMNB can also be viewed as a generalization of latent Dirichlet allocation (LDA) with the ability to handle heterogeneous feature vectors with different types of features, e.g., real-valued and categorical. We propose two variational inference algorithms to learn MMNB models. The first one is based on ideas originally used in LDA, and the second one uses substantially fewer variational parameters, leading to a significantly faster algorithm. Further, we extend MMNB/LDA to discriminative mixed-membership models for classification by suitably combining MMNB/LDA with multi-class logistic regression. The efficacy of the proposed mixed-membership models is demonstrated by extensive experiments on several datasets, including UCI benchmarks, recommendation systems, and text datasets.
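To make the contrast with plain NB concrete, the following is a minimal sketch of how one data point might be sampled under a mixed-membership model of this kind. It is an illustration, not the authors' implementation. It assumes an LDA-style process in which each data point draws its own mixing weights from a Dirichlet prior and each feature then picks its own component. The function names, the choice of one Gaussian and one categorical feature, and all parameter values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mmnb_point(alpha, gauss_params, cat_params, rng):
    """Sample one two-feature data point under an assumed MMNB-style process.

    alpha        : Dirichlet parameter, one entry per component (hypothetical)
    gauss_params : per-component (mean, std) for the real-valued feature
    cat_params   : per-component probability vector for the categorical feature
    """
    k = len(alpha)
    # Mixed membership: every data point gets its own mixing weights.
    pi = sample_dirichlet(alpha, rng)
    # Each feature picks its own component. In plain NB a single component
    # would generate the whole feature vector.
    z_real = rng.choices(range(k), weights=pi)[0]
    z_cat = rng.choices(range(k), weights=pi)[0]
    # Heterogeneous features: a Gaussian for the real-valued feature and a
    # discrete distribution for the categorical one.
    mean, std = gauss_params[z_real]
    x_real = rng.gauss(mean, std)
    probs = cat_params[z_cat]
    x_cat = rng.choices(range(len(probs)), weights=probs)[0]
    return pi, (z_real, z_cat), (x_real, x_cat)


rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]  # illustrative values only
gauss_params = [(-5.0, 1.0), (0.0, 1.0), (5.0, 1.0)]
cat_params = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
pi, z, x = sample_mmnb_point(alpha, gauss_params, cat_params, rng)
print("mixing weights:", pi)
print("per-feature components:", z)
print("features:", x)
```

Because the two features may be assigned to different components, a single point can mix the characteristics of several clusters, which is the behavior the abstract describes as mixed membership.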

Author information

Authors and Affiliations

Department of Computer Science & Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA
Hanhuai Shan & Arindam Banerjee

Corresponding author

Correspondence to Hanhuai Shan.

Additional information

Responsible editor: Charles Elkan.

About this article

Cite this article

Shan, H., Banerjee, A. Mixed-membership naive Bayes models. Data Min Knowl Disc 23, 1–62 (2011). https://doi.org/10.1007/s10618-010-0198-2

Received: 24 December 2009
Accepted: 03 August 2010
Published: 29 August 2010
Issue Date: July 2011
DOI: https://doi.org/10.1007/s10618-010-0198-2
