DOI: 10.1145/2441776.2441847

Enhancing reliability using peer consistency evaluation in human computation

Published: 23 February 2013

Abstract

Peer consistency evaluation is often used in games with a purpose (GWAP) to evaluate workers against the outputs of other workers, without relying on gold standard answers. Despite its popularity, the reliability of peer consistency evaluation has never been systematically tested to show whether it can serve as a general evaluation method in human computation systems. We present experimental results showing that human computation systems using peer consistency evaluation can produce outcomes even better than those that evaluate workers against gold standard answers. We also show that, even without any evaluation, simply telling workers that their answers will be used as future evaluation standards significantly enhances their performance. These results have important implications for methods that improve the reliability of human computation systems.
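
To make the mechanism concrete, here is a minimal sketch (not from the paper; the data layout and function names are illustrative assumptions) of how peer consistency scoring works: each worker is scored by agreement with the majority answer of the *other* workers on the same task, so no gold-standard labels are required.

```python
# Hypothetical sketch of peer consistency scoring; not the authors' implementation.
from collections import Counter

def peer_consistency_scores(answers):
    """answers: dict mapping worker id -> {task id: label}.
    Returns worker id -> fraction of tasks where the worker matched the
    leave-one-out majority answer of the other workers."""
    # Group every submitted label by task so we can form peer majorities.
    by_task = {}
    for worker, labels in answers.items():
        for task, label in labels.items():
            by_task.setdefault(task, []).append((worker, label))

    scores = {}
    for worker, labels in answers.items():
        agree = total = 0
        for task, label in labels.items():
            peer_labels = [l for w, l in by_task[task] if w != worker]
            if not peer_labels:
                continue  # no peers answered this task, so it cannot be scored
            majority = Counter(peer_labels).most_common(1)[0][0]
            agree += (label == majority)
            total += 1
        scores[worker] = agree / total if total else 0.0
    return scores

# Example: worker B disagrees with the peer majority on task t2.
demo = {
    "A": {"t1": "cat", "t2": "dog"},
    "B": {"t1": "cat", "t2": "cat"},
    "C": {"t1": "cat", "t2": "dog"},
    "D": {"t1": "cat", "t2": "dog"},
}
print(peer_consistency_scores(demo))
# -> {'A': 1.0, 'B': 0.5, 'C': 1.0, 'D': 1.0}
```

A gold-standard evaluator would instead compare each label against a curated answer key; the sketch above replaces that key with the peers' majority vote, which is the contrast the experiments examine.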

    Published In

    CSCW '13: Proceedings of the 2013 conference on Computer supported cooperative work
    February 2013
    1594 pages
    ISBN: 9781450313315
    DOI: 10.1145/2441776
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crowdsourcing
    2. evaluation
    3. human computation
    4. mechanical turk
    5. user behavior

    Qualifiers

    • Research-article

    Conference

    CSCW '13: Computer Supported Cooperative Work
    February 23 - 27, 2013
    San Antonio, Texas, USA

    Acceptance Rates

    Overall acceptance rate: 2,235 of 8,521 submissions, 26%

    Cited By

    • (2025) Large Scale Anonymous Collusion and its detection in crowdsourcing. Expert Systems with Applications, 259 (125284). DOI: 10.1016/j.eswa.2024.125284. Online publication date: Jan-2025
    • (2023) Quantifying Worker Reliability for Crowdsensing Applications: Robust Feedback Rating and Convergence. IEEE Transactions on Mobile Computing, 22(1), 459-471. DOI: 10.1109/TMC.2021.3072477. Online publication date: 1-Jan-2023
    • (2023) Accurate Label Refinement From Multiannotator of Remote Sensing Data. IEEE Transactions on Geoscience and Remote Sensing, 61, 1-13. DOI: 10.1109/TGRS.2023.3241402. Online publication date: 2023
    • (2021) The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1-26. DOI: 10.1145/3476073. Online publication date: 18-Oct-2021
    • (2020) A Robust Consistency Model of Crowd Workers in Text Labeling Tasks. IEEE Access, 8, 168381-168393. DOI: 10.1109/ACCESS.2020.3022773. Online publication date: 2020
    • (2019) Efficient Elicitation Approaches to Estimate Collective Crowd Answers. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1-25. DOI: 10.1145/3359164. Online publication date: 7-Nov-2019
    • (2019) Overview of the crowdsourcing process. Knowledge and Information Systems, 60(1), 1-24. DOI: 10.1007/s10115-018-1235-5. Online publication date: 1-Jul-2019
    • (2018) CrowdEval. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 1486-1494. DOI: 10.5555/3237383.3237922. Online publication date: 9-Jul-2018
    • (2018) Putting User Reputation on the Map. Proceedings of the 2nd ACM SIGSPATIAL Workshop on Geospatial Humanities, 1-6. DOI: 10.1145/3282933.3282937. Online publication date: 6-Nov-2018
    • (2018) Quality Control in Crowdsourcing. ACM Computing Surveys, 51(1), 1-40. DOI: 10.1145/3148148. Online publication date: 4-Jan-2018
