DOI: 10.5555/2936924.2937066
Research article, Public Access

Optimal Testing for Crowd Workers

Published: 09 May 2016

Abstract

Requesters on crowdsourcing platforms, such as Amazon Mechanical Turk, routinely insert gold questions to verify that a worker is diligent and is providing high-quality answers. However, there is no clear understanding of when to insert gold questions or how many to use. Typically, requesters mix a flat 10-30% of gold questions into the task stream of every worker. This static policy is arbitrary and wastes valuable budget: the exact percentage is often chosen with little experimentation, and, more importantly, it does not adapt to individual workers, the current mixture of spamming vs. diligent workers, or the number of tasks workers perform before quitting.
We formulate the problem of balancing between (1) testing workers to determine their accuracy and (2) actually getting work performed as a partially observable Markov decision process (POMDP) and apply reinforcement learning to dynamically calculate the best policy. Evaluations on both synthetic data and with real Mechanical Turk workers show that our agent learns adaptive testing policies that produce up to 111% more reward than the non-adaptive policies used by most requesters. Furthermore, our method is fully automated, easy to apply, and runs mostly out of the box.
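The trade-off in the abstract can be made concrete with a small simulation. The Python sketch below is illustrative only, not the paper's implementation: the two hidden worker types, their accuracies, the cost and value constants, and the hand-picked belief thresholds are all assumptions made for the example, whereas the paper learns the policy itself via reinforcement learning over a POMDP. What the sketch does show is the core mechanic: maintain a Bayesian belief that the worker is diligent, spend budget on gold questions while that belief is uncertain, and switch to real work (or drop the worker) once the belief is confident.

    # Illustrative sketch only: all numbers below are assumptions, not values
    # from the paper. A fixed threshold policy over a Bayesian belief stands in
    # for the paper's learned POMDP policy.
    import random

    P_CORRECT = {"diligent": 0.9, "spammer": 0.5}  # assumed accuracy per hidden type
    GOLD_COST, TASK_VALUE = 0.1, 1.0               # assumed test cost / work value

    def update_belief(b_diligent, answered_correctly):
        """Bayes-update P(worker is diligent) after observing a gold answer."""
        p_obs = lambda t: P_CORRECT[t] if answered_correctly else 1.0 - P_CORRECT[t]
        num = b_diligent * p_obs("diligent")
        return num / (num + (1.0 - b_diligent) * p_obs("spammer"))

    def run_worker(true_type, n_tasks=30, b=0.5, test_band=(0.2, 0.9)):
        """Test while the belief b is inside test_band, drop the worker below
        the band, assign real work above it. Returns total reward."""
        low, high = test_band
        reward = 0.0
        for _ in range(n_tasks):
            if b < low:                          # confident spammer: stop using them
                break
            if b < high:                         # uncertain: insert a gold question
                correct = random.random() < P_CORRECT[true_type]
                b = update_belief(b, correct)
                reward -= GOLD_COST
            else:                                # confident diligent: real task
                reward += TASK_VALUE * P_CORRECT[true_type]
        return reward

    random.seed(0)
    print("diligent worker reward:", round(run_worker("diligent"), 2))
    print("spammer reward:        ", round(run_worker("spammer"), 2))

Under these assumed numbers, a diligent worker clears the belief band after a few gold questions and then contributes real work, while a spammer's belief decays and the agent stops paying them; a flat 10-30% gold mix would keep testing the first worker and keep paying the second. The paper's contribution is learning when to make each switch automatically rather than relying on hand-set thresholds like test_band here.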


Information

Published In

AAMAS '16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems
May 2016
1580 pages
ISBN:9781450342391

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Publication History

Published: 09 May 2016

Author Tags

  1. crowdsourcing
  2. reinforcement learning

Qualifiers

  • Research-article

Funding Sources

  • Bloomberg
  • Google
  • NSF
  • ONR

Conference

AAMAS '16

Acceptance Rates

AAMAS '16 Paper Acceptance Rate: 137 of 550 submissions (25%)
Overall Acceptance Rate: 1,155 of 5,036 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 56
  • Downloads (last 6 weeks): 4

Reflects downloads up to 03 Jan 2025

Cited By

  • Hierarchical Entity Resolution using an Oracle. Proceedings of the 2022 International Conference on Management of Data, pp. 414-428, 10 Jun 2022. DOI: 10.1145/3514221.3526147
  • The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game. Proceedings of the 26th International Conference on Intelligent User Interfaces, pp. 423-433, 14 Apr 2021. DOI: 10.1145/3397481.3450647
  • Crowdsourcing with Fairness, Diversity and Budget Constraints. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 297-304, 27 Jan 2019. DOI: 10.1145/3306618.3314282
  • Key Crowdsourcing Technologies for Product Design and Development. International Journal of Automation and Computing 16(1), pp. 1-15, 1 Feb 2019. DOI: 10.1007/s11633-018-1138-7
  • CrowdEval. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1486-1494, 9 Jul 2018. DOI: 10.5555/3237383.3237922
  • Sprout. Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology, pp. 165-176, 11 Oct 2018. DOI: 10.1145/3242587.3242598
  • Robust Entity Resolution using Random Graphs. Proceedings of the 2018 International Conference on Management of Data, pp. 3-18, 27 May 2018. DOI: 10.1145/3183713.3183755
  • Crowd-based Multi-Predicate Screening of Papers in Literature Reviews. Proceedings of the 2018 World Wide Web Conference, pp. 55-64, 10 Apr 2018. DOI: 10.1145/3178876.3186036
