Abstract
In supervised learning, we learn a mapping from input to output values based on labeled examples. Can we learn such a mapping also from groups of unlabeled observations, knowing only, for each group, the proportion of observations with a particular label? Solutions to this problem have real-world applications. Here, we consider groups of steel sticks as samples in quality control. Since the steel sticks cannot be marked individually, it is only known, for each group, how many sticks of high (or low) quality it contains. We want to predict the achieved quality of each stick before it reaches the final production station and quality control, in order to save resources. We define the problem of learning from label proportions and present a solution based on clustering. Empirically, our method shows better prediction performance than recent approaches based on probabilistic SVMs, Kernel k-Means, or conditional exponential models.
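To make the problem setting concrete, the following is a minimal, hypothetical sketch of the learning-from-label-proportions idea using clustering: instances arrive in bags, only the fraction of positive labels per bag is known, and after clustering all instances we pick the cluster-to-label assignment whose implied per-bag proportions best match the known ones. Plain k-means with an exhaustive labeling search is used here purely for illustration; it is not the paper's actual method, which optimizes cluster model selection.

```python
import numpy as np
from itertools import product

def kmeans(X, k, iters=50, seed=0):
    """Basic Lloyd's k-means; returns the cluster index of each instance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance of every instance to every center: shape (n, k).
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)
    return assign

def llp_cluster_labels(X, bag_ids, bag_pos_frac, k=4):
    """Illustrative LLP baseline: cluster all instances (ignoring bags),
    then choose the 0/1 label per cluster so that the resulting per-bag
    positive fractions are closest to the known proportions."""
    assign = kmeans(X, k)
    bags = sorted(set(bag_ids))
    best, best_err = None, np.inf
    for labels in product([0, 1], repeat=k):  # try every cluster labeling
        y = np.array([labels[c] for c in assign])
        err = sum(abs(y[bag_ids == b].mean() - bag_pos_frac[b]) for b in bags)
        if err < best_err:
            best, best_err = y, err
    return best
```

For well-separated data, the labeling that minimizes the proportion error recovers the individual instance labels, which is the quantity the quality-control application needs per stick.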
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Stolpe, M., Morik, K. (2011). Learning from Label Proportions by Optimizing Cluster Model Selection. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23808-6_23
Print ISBN: 978-3-642-23807-9
Online ISBN: 978-3-642-23808-6