Abstract
Recent years have seen increased interest in crowdsourcing as a way of obtaining information from a potentially large group of workers at a reduced cost. The crowdsourcing process, as we consider it in this paper, is as follows: a requester hires a number of workers to work on a set of similar tasks. After completing the tasks, each worker reports back his or her outputs. The requester then aggregates the reported outputs to obtain aggregate outputs. A crucial question that arises during this process is: how many crowd workers should a requester hire? In this paper, we investigate, from an empirical perspective, the optimal number of workers a requester should hire when crowdsourcing tasks, with a particular focus on the crowdsourcing platform Amazon Mechanical Turk. Specifically, we report the results of three studies involving different tasks and payment schemes. We find that both the expected error in the aggregate outputs and the risk of a poor combination of workers decrease as the number of workers increases. Surprisingly, we find that the optimal number of workers a requester should hire for each task is around 10 to 11, regardless of the underlying task and payment scheme. To derive this result, we employ a principled analysis based on bootstrapping and segmented linear regression. We also find that top-performing workers are more consistent across multiple tasks than other workers. Our results thus contribute to a better understanding of, and provide new insights into, how to design more effective crowdsourcing processes.
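The abstract describes the analysis only at a high level. The sketch below illustrates, under stated assumptions, how a bootstrapping-plus-segmented-regression procedure of this kind can be set up: it resamples subsets of workers to estimate the expected aggregation error as a function of crowd size, then fits a two-segment linear model to locate the point where hiring additional workers stops paying off. The function and variable names (`bootstrap_error_curve`, `best_breakpoint`, `reports`, `truth`), the averaging aggregator, and the mean-absolute-error measure are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def bootstrap_error_curve(reports, truth, max_k=30, n_boot=1000, seed=0):
    """Expected aggregation error as a function of crowd size,
    estimated by bootstrapping over subsets of workers.

    reports: (n_workers, n_tasks) array of individual worker outputs
    truth:   (n_tasks,) array of reference answers
    """
    rng = np.random.default_rng(seed)
    n_workers = reports.shape[0]
    mean_err = np.empty(max_k)
    for k in range(1, max_k + 1):
        errs = np.empty(n_boot)
        for b in range(n_boot):
            crowd = rng.choice(n_workers, size=k, replace=True)  # resample k workers
            aggregate = reports[crowd].mean(axis=0)              # simple averaging aggregator
            errs[b] = np.mean(np.abs(aggregate - truth))         # mean absolute error
        mean_err[k - 1] = errs.mean()
    return mean_err

def best_breakpoint(err_curve):
    """Fit a two-segment linear model to the error curve and return the
    crowd size at which the slope changes (minimum total squared error)."""
    x = np.arange(1, len(err_curve) + 1, dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(2, len(err_curve) - 1):            # candidate breakpoints
        sse = 0.0
        for xs, ys in ((x[:k], err_curve[:k]), (x[k:], err_curve[k:])):
            line = np.polyfit(xs, ys, 1)              # least-squares line per segment
            sse += np.sum((np.polyval(line, xs) - ys) ** 2)
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k
```

On data shaped like that of the studies reported here, the estimated error curve keeps decreasing as more workers are added, but the fitted breakpoint marks where the decrease flattens; this is the sense in which the abstract reports an optimal crowd size of roughly 10 to 11 workers.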
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Carvalho, A., Dimitrov, S. & Larson, K. How many crowdsourced workers should a requester hire? Ann Math Artif Intell 78, 45–72 (2016). https://doi.org/10.1007/s10472-015-9492-4