Abstract

This paper addresses the repeated acquisition of labels for data items when labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, focusing especially on the improvement of training labels for supervised induction of predictive models. With the outsourcing of small tasks becoming easier, for example via Amazon’s Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling it. We present repeated-labeling strategies of increasing complexity and show several main results. (i) Repeated labeling can improve both label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give a considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a set of robust techniques that combine different notions of uncertainty to select the data points whose quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
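
To make the simplest strategy concrete, here is a minimal sketch (our illustration, not the authors' code) of repeated labeling with majority voting. Assuming each label is independently correct with probability p, the quality of the majority-voted label follows directly from the binomial distribution.

```python
from math import comb

def majority_label_quality(p: float, n: int) -> float:
    """Probability that the majority vote over n independent labels is
    correct, when each label is correct with probability p.
    Ties (possible when n is even) are broken by a fair coin flip."""
    q = 0.0
    for k in range(n + 1):
        prob_k = comb(n, k) * p**k * (1 - p) ** (n - k)  # exactly k labels correct
        if 2 * k > n:        # correct labels hold a strict majority
            q += prob_k
        elif 2 * k == n:     # tie
            q += 0.5 * prob_k
    return q

print(majority_label_quality(0.7, 1))   # 0.70: one noisy label
print(majority_label_quality(0.7, 11))  # ~0.92: eleven labels, majority vote
```

The gain is largest for intermediate labeler quality: as p approaches 0.5 the vote stays noisy, and as p approaches 1 there is little left to fix.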

Notes

  1. This setting is in direct contrast to the setting motivating active learning and semi-supervised learning, where unlabeled points are relatively inexpensive, but labeling is expensive.

  2. http://www.mturk.com

  3. http://www.espgame.org

  4. The test set has perfect quality with zero noise.

  5. We do not assume that the quality is the same across all examples. In fact, LU indirectly relies on the assumption that the labeling quality is different across examples.

  6. As a shorthand, we will simply call this Label Uncertainty (LU); a sketch of such a score appears after these notes.

  7. We do not use selective labeling strategies for this experiment, as we want to keep the labeling allocation strategy constant, and independent of the two uncertainty scoring strategies. The goal is to see which uncertainty score best separates the correctly from the incorrectly labeled examples.

  8. Since the Proposition and proof sketch are mainly meant to give theoretical motivation for MU, let us assume that the induction algorithm is no worse than a standard classification-tree learner. (A sketch of an MU-style score appears after these notes.)

  9. Subsequent to these experiments, we also experimented with other approaches for combining probabilities from multiple sources, following the discussion in Clemen and Winkler (1990). For our experiments, taking the geometric mean was the best-performing and most robust approach for combining the uncertainty scores, even after transforming the uncertainty scores into proper probability estimates. (A one-line sketch of this combination appears after these notes.)

  10. From Provost and Danyluk (1995): “No two experts, of the five experts surveyed, agreed upon diagnoses more than 65% of the time. This might be evidence for the differences that exist between sites, as the experts surveyed had gained their expertise at different locations. If not, however, it raises questions about the correctness of the expert data.”

  11. http://crowdflower.com
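
The next three sketches illustrate the uncertainty scores referenced in notes 6, 8, and 9. First, a minimal sketch of a Label Uncertainty (LU) style score. It is an illustration, not the paper's verbatim formulation: it assumes binary labels and treats an example's unknown chance of being positive as a Beta posterior over its observed label counts, scoring the posterior mass that falls on the minority side of 0.5.

```python
from scipy.stats import beta

def label_uncertainty(pos: int, neg: int) -> float:
    """LU-style score for an example with `pos` positive and `neg`
    negative labels: posterior mass (under a uniform Beta prior) on the
    minority side of 0.5. Higher means the label set is less decisive."""
    tail = beta.cdf(0.5, pos + 1, neg + 1)  # posterior mass below 0.5
    return min(tail, 1.0 - tail)

print(label_uncertainty(5, 5))  # 0.5: a split vote is maximally uncertain
print(label_uncertainty(9, 1))  # small: nine-to-one is nearly decisive
```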
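
Second, a sketch of a Model Uncertainty (MU) style score (note 8). The ensemble size, the bootstrap scheme, and the use of scikit-learn decision trees are illustrative assumptions; the idea is simply to score each example by how unsure learned models are about its label.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def model_uncertainty(X_train, y_train, X, n_models: int = 10, seed: int = 0):
    """MU-style score: average P(positive) over trees trained on bootstrap
    samples, then score each row of X by the distance of that average from
    a confident 0 or 1. Expects numpy arrays, binary labels, and both
    classes present in every bootstrap sample."""
    rng = np.random.default_rng(seed)
    avg_pos = np.zeros(len(X))
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap
        tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
        avg_pos += tree.predict_proba(X)[:, 1]
    avg_pos /= n_models
    return 0.5 - np.abs(avg_pos - 0.5)  # 0.5 = maximally unsure, 0 = certain
```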
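
Finally, note 9 reports that the geometric mean was the most robust way to combine the two scores; examples with the highest combined score are the ones whose labels are worth improving. A one-line sketch:

```python
from math import sqrt

def lmu_score(lu: float, mu: float) -> float:
    """Combine label and model uncertainty by their geometric mean."""
    return sqrt(lu * mu)
```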

References

  • Baram Y, El-Yaniv R, Luz K (2004) Online choice of active learning algorithms. J Mach Learn Res 5:255–291

  • Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html. Accessed 11 Mar 2013

  • Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recognit 37(9):1757–1771

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167

  • Carpenter B (2008) Multilevel Bayesian model of categorical data annotation. http://lingpipe-blog.com/lingpipe-white-papers/. Accessed 11 Mar 2013

  • Clemen RT, Winkler RL (1990) Unanimity and compromise among probability forecasters. Manag Sci 36(7):767–779

  • Cohn DA, Atlas LE, Ladner RE (1994) Improving generalization with active learning. Mach Learn 15(2):201–221

  • Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl Stat 28(1):20–28

  • Domingos P (1999) MetaCost: a general method for making classifiers cost-sensitive. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99). pp 155–164

  • Donmez P, Carbonell JG (2008) Proactive learning: cost-sensitive active learning with multiple imperfect oracles. In: Proceedings of the 17th ACM conference on information and knowledge management (CIKM 2008). pp 619–628

  • Donmez P, Carbonell JG, Schneider J (2009) Efficiently learning the accuracy of labeling sources for selective sampling. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2009). pp 259–268

  • Donmez P, Carbonell JG, Schneider J (2010) A probabilistic framework to learn from multiple annotators with time-varying accuracy. In: Proceedings of the SIAM international conference on data mining (SDM 2010). pp 826–837

  • Elkan C (2001) The foundations of cost-sensitive learning. In: Proceedings of the seventeenth international joint conference on artificial intelligence (IJCAI-01). pp 973–978

  • Gelman A, Carlin JB, Stern HS, Rubin DB (2003) Bayesian data analysis, 2nd edn. Chapman and Hall/CRC, Boca Raton

  • Ipeirotis PG, Provost F, Wang J (2010) Quality management on amazon mechanical turk. In: Proceedings of the ACM SIGKDD workshop on human computation (HCOMP 2010). pp 64–67

  • Jin R, Ghahramani Z (2002) Learning with multiple labels. In: Advances in neural information processing systems 15 (NIPS 2002). pp 897–904

  • Kapoor A, Greiner R (2005) Learning and classifying under hard budgets. In: ECML 2005, 16th European conference on machine learning. pp 170–181

  • Lizotte DJ, Madani O, Greiner R (2003) Budgeted learning of naive-bayes classifiers. In: 19th conference on uncertainty in artificial intelligence (UAI 2003). pp 378–385

  • Lugosi G (1992) Learning with an unreliable teacher. Pattern Recognit 25(1):79–87

  • Margineantu DD (2005) Active cost-sensitive learning. In: Proceedings of the nineteenth international joint conference on artificial intelligence (IJCAI-05). pp 1622–1623

  • Mason W, Watts DJ (2009) Financial incentives and the performance of crowds. In: Proceedings of the human computation workshop (HCOMP 2009). pp 77–85

  • McCallum A (1999) Multi-label text classification with a mixture model trained by EM. In: AAAI’99 workshop on text learning

  • Melville P, Saar-Tsechansky M, Provost FJ, Mooney RJ (2004) Active feature-value acquisition for classifier induction. In: Proceedings of the 4th IEEE international conference on data mining (ICDM 2004). pp 483–486

  • Melville P, Provost FJ, Mooney RJ (2005) An expected utility approach to active feature-value acquisition. In: Proceedings of the 5th IEEE international conference on data mining (ICDM 2005). pp 745–748

  • Morrison CT, Cohen PR (2005) Noisy information value in utility-based decision making. In: Proceedings of the 1st international workshop on utility-based data mining (UBDM’05). pp 34–38

  • Provost F (2005) Toward economic machine learning and utility-based data mining. In: Proceedings of the 1st international workshop on utility-based data mining (UBDM’05). p 1

  • Provost F, Danyluk AP (1995) Learning from bad data. In: Proceedings of the ML-95 workshop on applying machine learning in practice. pp 27–33

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106

  • Quinlan JR (1992) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Mateo

  • Raykar VC, Yu S, Zhao LH, Jerebko A, Florin C, Valadez GH, Bogoni L, Moy L (2009) Supervised learning from multiple experts: whom to trust when everyone lies a bit. In: Proceedings of the 26th annual international conference on machine learning (ICML 2009). pp 889–896

  • Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11(7):1297–1322

  • Rebbapragada U, Brodley CE (2007) Class noise mitigation through instance weighting. In: 18th European conference on machine learning (ECML’07). pp 708–715

  • Saar-Tsechansky M, Provost F (2004) Active sampling for class probability estimation and ranking. Mach Learn 54(2):153–178

  • Saar-Tsechansky M, Melville P, Provost F (2009) Active feature-value acquisition. Manag Sci 55(4):664–684

  • Sheng VS, Provost F, Ipeirotis P (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the fourteenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2008). pp 614–622

  • Silverman BW (1980) Some asymptotic properties of the probabilistic teacher. IEEE Trans Inf Theory 26(2):246–249

  • Smyth P (1995) Learning with probabilistic supervision. In: Petsche T (ed) Computational learning theory and natural learning systems, vol. III: selecting good models. MIT Press, Cambridge

  • Smyth P (1996) Bounds on the mean classification error rate of multiple experts. Pattern Recognit Lett 17(12):1253–1257

  • Smyth P, Burl MC, Fayyad UM, Perona P (1994a) Knowledge discovery in large image databases: dealing with uncertainties in ground truth. In: Knowledge discovery in databases: papers from the 1994 AAAI workshop (KDD-94). pp 109–120

  • Smyth P, Fayyad UM, Burl MC, Perona P, Baldi P (1994b) Inferring ground truth from subjective labelling of Venus images. In: Advances in neural information processing systems 7 (NIPS 1994). pp 1085–1092

  • Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast–but is it good? Evaluating non-expert annotations for natural language tasks. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP’08). pp 254–263

  • Ting KM (2002) An instance-weighting method to induce cost-sensitive trees. IEEE Trans Knowl Data Eng 14(3):659–665

  • Turney PD (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. J Artif Intell Res 2:369–409

  • Turney PD (2000) Types of cost in inductive concept learning. In: Proceedings of the ICML-2000 workshop on cost-sensitive learning. pp 15–21

  • Verbaeten S, Assche AV (2003) Ensemble methods for noise elimination in classification problems. In: Fourth international workshop on multiple classifier systems. Springer, pp 317–325

  • von Ahn L, Dabbish L (2004) Labeling images with a computer game. In: Proceedings of the 2004 conference on human factors in computing systems (CHI 2004). pp 319–326

  • Weiss GM, Provost FJ (2003) Learning when training data are costly: the effect of class distribution on tree induction. J Artif Intell Res 19:315–354

  • Whitehill J, Ruvolo P, Wu T, Bergsma J, Movellan J (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems 22 (NIPS 2009). pp 2035–2043

  • Whittle P (1973) Some general points in the theory of optimal experimental design. J R Stat Soc Ser B 35(1):123–130

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann Publishing, San Francisco

  • Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003). pp 435–442

  • Zheng Z, Padmanabhan B (2006) Selectively acquiring customer information: a new data acquisition problem and an active learning-based solution. Manag Sci 52(5):697–712

  • Zhu X, Wu X (2005) Cost-constrained data acquisition for intelligent data preparation. IEEE Trans Knowl Data Eng 17(11):1542–1556

Acknowledgments

This work was supported by the National Science Foundation under Grant Nos. IIS-0643846 and IIS-1115417, by an NSERC Postdoctoral Fellowship, by an NEC Faculty Fellowship, by a Google Focused Award, and by a George Kellner Fellowship. Thanks to Carla Brodley, John Langford, and Sanjoy Dasgupta for enlightening discussions and comments.

Author information

Correspondence to Panagiotis G. Ipeirotis.

Additional information

Communicated by Johannes Fürnkranz

Cite this article

Ipeirotis, P.G., Provost, F., Sheng, V.S. et al. Repeated labeling using multiple noisy labelers. Data Min Knowl Disc 28, 402–441 (2014). https://doi.org/10.1007/s10618-013-0306-1

