DOI: 10.1145/2487575.2487595

Evaluating the crowd with confidence

Published: 11 August 2013

Abstract

Worker quality control is a crucial aspect of crowdsourcing systems, typically occupying a large fraction of the time and money invested in crowdsourcing. In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. We show that our techniques generate correct confidence intervals on a range of real-world datasets, and we demonstrate their wide applicability by using them to evict poorly performing workers and to provide confidence intervals on the accuracy of the answers.
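
To make the idea concrete, the following is a minimal illustrative sketch in Python, not the paper's actual technique: it assumes a sample of a worker's answers can be graded against known gold labels, computes a Wilson score confidence interval for that worker's error rate, and evicts the worker only when the interval's lower bound exceeds a tolerated error rate. The names wilson_interval and should_evict and the 0.3 threshold are hypothetical; the paper's setting, where true answers are unknown, requires the estimation machinery described in the full text.

    from math import sqrt

    # Minimal sketch (not the paper's method): a Wilson score interval for a
    # worker's error rate, assuming a sample of the worker's answers has been
    # graded against known gold labels.

    def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
        """95% Wilson score interval for an error probability, given
        `errors` wrong answers out of `n` graded answers."""
        if n == 0:
            return (0.0, 1.0)  # no evidence: the error rate could be anything
        p_hat = errors / n
        denom = 1 + z**2 / n
        center = (p_hat + z**2 / (2 * n)) / denom
        margin = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
        return (max(0.0, center - margin), min(1.0, center + margin))

    def should_evict(errors: int, n: int, max_error_rate: float = 0.3) -> bool:
        """Evict only when the lower bound of the interval exceeds the
        threshold, i.e. the worker is confidently worse than tolerated."""
        lower, _ = wilson_interval(errors, n)
        return lower > max_error_rate

    if __name__ == "__main__":
        print(wilson_interval(8, 20))   # point estimate 0.4, but a wide interval
        print(should_evict(8, 20))      # False: not yet confident enough to evict
        print(should_evict(40, 100))    # True: lower bound clears the 0.3 threshold

Using the interval's lower bound, rather than the point estimate, is what distinguishes "this worker looks bad" from "this worker is bad with high confidence", which is the kind of distinction the paper's applications (eviction, answer-accuracy guarantees) rely on.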

Published In

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN:9781450321747
DOI:10.1145/2487575

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2013

Author Tags

  1. confidence
  2. crowdsourcing

Qualifiers

  • Poster

Conference

KDD '13

Acceptance Rates

KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 15 Jan 2025

Cited By

  • (2022) Combining Human and Machine Confidence in Truthfulness Assessment. Journal of Data and Information Quality, 15(1):1-17. DOI: 10.1145/3546916. Online publication date: 28-Dec-2022.
  • (2021) Return-of-Interest Conscious Truth Inference for Crowdsourcing Queries. 2021 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), pages 60-65. DOI: 10.1109/TAAI54685.2021.00020. Online publication date: Nov-2021.
  • (2021) Data Enrichment in the Information Graphs Environment Based on a Specialized Architecture of Information Channels. Procedia Computer Science, 190:492-499. DOI: 10.1016/j.procs.2021.07.001. Online publication date: 2021.
  • (2021) Strong natural language query generation. Information Retrieval Journal. DOI: 10.1007/s10791-021-09395-3. Online publication date: 15-Jul-2021.
  • (2021) A Conceptual Professional Assessment Model Based RDF Data Crowdsourcing. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, pages 35-47. DOI: 10.1007/978-3-030-70665-4_5. Online publication date: 27-Jun-2021.
  • (2020) Fast and three-rious. Proceedings of the 37th International Conference on Machine Learning, pages 3280-3291. DOI: 10.5555/3524938.3525245. Online publication date: 13-Jul-2020.
  • (2019) The ever evolving online labor market. Proceedings of the VLDB Endowment, 12(12):1978-1981. DOI: 10.14778/3352063.3352114. Online publication date: 1-Aug-2019.
  • (2019) Knowledge Enhanced Quality Estimation for Crowdsourcing. IEEE Access, 7:106694-106704. DOI: 10.1109/ACCESS.2019.2932149. Online publication date: 2019.
  • (2019) Quality Management of Workers in an In-House Crowdsourcing-Based Framework for Deduplication of Organizations' Databases. IEEE Access, 7:90715-90730. DOI: 10.1109/ACCESS.2019.2924979. Online publication date: 2019.
  • (2018) Analyzing Payment-Driven Targeted Q&A Systems. ACM Transactions on Social Computing, 1(3):1-21. DOI: 10.1145/3281449. Online publication date: 10-Dec-2018.
