
How many crowdsourced workers should a requester hire?

  • Open access
  • Published: 06 January 2016
  • Volume 78, pages 45–72 (2016)

Annals of Mathematics and Artificial Intelligence

  • Arthur Carvalho,
  • Stanko Dimitrov &
  • Kate Larson

Abstract

Recent years have seen an increased interest in crowdsourcing as a way of obtaining information from a potentially large group of workers at a reduced cost. The crowdsourcing process, as we consider in this paper, is as follows: a requester hires a number of workers to work on a set of similar tasks. After completing the tasks, each worker reports back outputs. The requester then aggregates the reported outputs to obtain aggregate outputs. A crucial question that arises during this process is: how many crowd workers should a requester hire? In this paper, we investigate from an empirical perspective the optimal number of workers a requester should hire when crowdsourcing tasks, with a particular focus on the crowdsourcing platform Amazon Mechanical Turk. Specifically, we report the results of three studies involving different tasks and payment schemes. We find that both the expected error in the aggregate outputs and the risk of a poor combination of workers decrease as the number of workers increases. Surprisingly, we find that the optimal number of workers a requester should hire for each task is around 10 to 11, regardless of the underlying task and payment scheme. To derive such a result, we employ a principled analysis based on bootstrapping and segmented linear regression. Besides the above result, we also find that overall top-performing workers are more consistent across multiple tasks than other workers. Our results thus contribute to a better understanding of, and provide new insights into, how to design more effective crowdsourcing processes.
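
As a rough illustration of the analysis pipeline the abstract describes (bootstrapping worker subsets, then fitting a segmented linear regression to the error-versus-group-size curve to locate the point of diminishing returns), the sketch below uses synthetic worker reports, a simple mean aggregator, and a brute-force two-segment breakpoint search. All of these are illustrative assumptions, not the authors' implementation, which in the paper is applied per task and payment scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for crowdsourced reports (assumption, not the paper's data):
# each of 50 workers gives a noisy estimate of a true value for 20 tasks.
true_values = rng.uniform(0, 100, size=20)
reports = true_values + rng.normal(0, 15, size=(50, 20))   # workers x tasks

def aggregate_error(worker_idx):
    """Mean absolute error of the mean-aggregated outputs for a worker subset."""
    agg = reports[worker_idx].mean(axis=0)
    return np.abs(agg - true_values).mean()

# Bootstrap: for each group size n, resample worker combinations (with
# replacement) and average the resulting aggregate error.
group_sizes = np.arange(1, 31)
boot_err = np.array([
    np.mean([aggregate_error(rng.choice(reports.shape[0], size=n, replace=True))
             for _ in range(500)])
    for n in group_sizes
])

# Segmented (two-piece) linear regression: brute-force the breakpoint that
# minimizes the combined squared error of two independent least-squares fits.
def fit_sse(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    return np.sum((y - (slope * x + intercept)) ** 2)

best_bp, best_sse = None, np.inf
for bp in group_sizes[2:-3]:                     # keep at least 3 points per segment
    left, right = group_sizes <= bp, group_sizes > bp
    sse = fit_sse(group_sizes[left], boot_err[left]) \
        + fit_sse(group_sizes[right], boot_err[right])
    if sse < best_sse:
        best_bp, best_sse = bp, sse

print(f"Estimated breakpoint (diminishing returns) at ~{best_bp} workers")
```

With real task data, the synthetic reports would be replaced by the workers' actual outputs and the mean by a task-appropriate aggregator (for example, majority vote for labeling tasks); the breakpoint search above is a deliberately minimal stand-in for a full segmented-regression procedure.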


Author information

Authors and Affiliations

  1. Rotterdam School of Management, Erasmus University, Burgemeester Oudlaan 50, 3062 PA, Rotterdam, The Netherlands

    Arthur Carvalho

  2. Department of Management Sciences, University of Waterloo, 200 University Ave W., Waterloo, ON, N2L 3G1, Canada

    Stanko Dimitrov

  3. David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave W., Waterloo, ON, N2L 3G1, Canada

    Kate Larson


Corresponding author

Correspondence to Arthur Carvalho.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Carvalho, A., Dimitrov, S. & Larson, K. How many crowdsourced workers should a requester hire? Ann Math Artif Intell 78, 45–72 (2016). https://doi.org/10.1007/s10472-015-9492-4


  • Published: 06 January 2016

  • Issue Date: September 2016

  • DOI: https://doi.org/10.1007/s10472-015-9492-4


Keywords

  • Crowdsourcing
  • Human computation
  • Amazon Mechanical Turk

Mathematics Subject Classification (2010)

  • 68T99
  • 90B99