ABSTRACT
In data mining applications such as crowdsourcing and privacy-preserving data mining, one may wish to obtain consolidated predictions from multiple models without access to the features of the data. Moreover, because multiple models usually carry complementary predictive information, model combination can potentially yield more robust and accurate predictions by correcting the independent errors of individual models. Various methods have been proposed to combine predictions so that the final predictions are maximally agreed upon by the base models. Although this maximum-consensus principle has proven successful, simply maximizing consensus can produce less discriminative predictions and overfit the noise inevitably introduced by imperfect base models. We argue that proper regularization of model combination approaches is needed to alleviate this overfitting. Specifically, we analyze the hypothesis spaces of several model combination methods and identify the trade-off between model consensus and generalization ability. We propose a novel model, Regularized Consensus Maximization (RCM), formulated as an optimization problem that combines the maximum-consensus and large-margin principles. We show theoretically that RCM has a smaller upper bound on generalization error than its unregularized counterpart. Experiments show that the proposed algorithm outperforms a wide spectrum of state-of-the-art model combination methods on 11 tasks.
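The consensus-plus-regularization idea described above can be illustrated with a minimal sketch. This is a hypothetical toy formulation, not the paper's actual RCM objective: base models' class-probability predictions are combined with learned weights, the weights are updated toward models that agree with the current consensus, and a regularization term `lam` pulls the weights back toward uniform so that a few noisy base models cannot dominate.

```python
import numpy as np

def regularized_consensus(probs, lam=0.1, iters=50):
    """Toy weighted-consensus combiner (illustrative only, not the paper's RCM).

    probs: array of shape (K, N, C) holding K base models' class-probability
           predictions for N items over C classes.
    lam:   regularization strength pulling model weights toward uniform,
           which guards against overfitting noisy base models.
    Returns the consensus class-probability matrix of shape (N, C).
    """
    K = probs.shape[0]
    w = np.full(K, 1.0 / K)  # start from uniform model weights
    for _ in range(iters):
        consensus = np.tensordot(w, probs, axes=1)       # (N, C)
        labels = consensus.argmax(axis=1)                # current consensus labels
        # average probability each model assigns to the consensus labels
        agree = probs[:, np.arange(probs.shape[1]), labels].mean(axis=1)
        # regularized update: trade agreement against uniformity (weights sum to 1)
        w = (agree + lam / K) / (agree.sum() + lam)
    return np.tensordot(w, probs, axes=1)
```

With `lam = 0`, the update reduces to pure agreement maximization, which can collapse all weight onto mutually agreeing models even when they are wrong; a larger `lam` keeps the combination closer to an unweighted average. The paper's RCM instead uses a large-margin regularizer inside an optimization problem, which this sketch does not attempt to reproduce.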