DOI: 10.1145/2441776.2441847

Enhancing reliability using peer consistency evaluation in human computation

Published: 23 February 2013

Abstract

Peer consistency evaluation is often used in games with a purpose (GWAP) to evaluate workers against the outputs of other workers, without relying on gold standard answers. Despite its popularity, the reliability of peer consistency evaluation has never been systematically tested to show whether it can serve as a general evaluation method in human computation systems. We present experimental results showing that human computation systems using peer consistency evaluation can produce outcomes even better than those that evaluate workers against gold standard answers. We also show that, even without any evaluation, simply telling workers that their answers will be used as future evaluation standards significantly enhances their performance. These results have important implications for methods that improve the reliability of human computation systems.
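
To make the mechanism concrete, here is a minimal sketch (not from the paper; the data layout and function names are illustrative assumptions) of how peer consistency scoring works: each worker is scored by agreement with the majority answer of the *other* workers on the same task, so no gold-standard labels are required.

```python
# Hypothetical sketch of peer consistency scoring; not the authors' implementation.
from collections import Counter

def peer_consistency_scores(answers):
    """answers: dict mapping worker id -> {task id: label}.
    Returns worker id -> fraction of tasks where the worker matched the
    leave-one-out majority answer of the other workers."""
    # Group every submitted label by task so we can form peer majorities.
    by_task = {}
    for worker, labels in answers.items():
        for task, label in labels.items():
            by_task.setdefault(task, []).append((worker, label))

    scores = {}
    for worker, labels in answers.items():
        agree = total = 0
        for task, label in labels.items():
            peer_labels = [l for w, l in by_task[task] if w != worker]
            if not peer_labels:
                continue  # no peers answered this task, so it cannot be scored
            majority = Counter(peer_labels).most_common(1)[0][0]
            agree += (label == majority)
            total += 1
        scores[worker] = agree / total if total else 0.0
    return scores

# Example: worker B disagrees with the peer majority on task t2.
demo = {
    "A": {"t1": "cat", "t2": "dog"},
    "B": {"t1": "cat", "t2": "cat"},
    "C": {"t1": "cat", "t2": "dog"},
    "D": {"t1": "cat", "t2": "dog"},
}
print(peer_consistency_scores(demo))
# -> {'A': 1.0, 'B': 0.5, 'C': 1.0, 'D': 1.0}
```

A gold-standard evaluator would instead compare each label against a curated answer key; the sketch above replaces that key with the peers' majority vote, which is the contrast the experiments examine.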

    Published In

    CSCW '13: Proceedings of the 2013 conference on Computer supported cooperative work
    February 2013
    1594 pages
    ISBN: 9781450313315
    DOI: 10.1145/2441776
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crowdsourcing
    2. evaluation
    3. human computation
    4. mechanical turk
    5. user behavior

    Qualifiers

    • Research-article

    Conference

    CSCW '13: Computer Supported Cooperative Work
    February 23 - 27, 2013
    San Antonio, Texas, USA

    Acceptance Rates

    Overall acceptance rate: 2,235 of 8,521 submissions, 26%

    Cited By

    • (2025) Large Scale Anonymous Collusion and its detection in crowdsourcing. Expert Systems with Applications, 259 (125284). DOI: 10.1016/j.eswa.2024.125284. Online publication date: Jan-2025
    • (2023) Quantifying Worker Reliability for Crowdsensing Applications: Robust Feedback Rating and Convergence. IEEE Transactions on Mobile Computing, 22(1), 459-471. DOI: 10.1109/TMC.2021.3072477. Online publication date: 1-Jan-2023
    • (2023) Accurate Label Refinement From Multiannotator of Remote Sensing Data. IEEE Transactions on Geoscience and Remote Sensing, 61, 1-13. DOI: 10.1109/TGRS.2023.3241402. Online publication date: 2023
    • (2021) The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1-26. DOI: 10.1145/3476073. Online publication date: 18-Oct-2021
    • (2020) A Robust Consistency Model of Crowd Workers in Text Labeling Tasks. IEEE Access, 8, 168381-168393. DOI: 10.1109/ACCESS.2020.3022773. Online publication date: 2020
    • (2019) Efficient Elicitation Approaches to Estimate Collective Crowd Answers. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1-25. DOI: 10.1145/3359164. Online publication date: 7-Nov-2019
    • (2019) Overview of the crowdsourcing process. Knowledge and Information Systems, 60(1), 1-24. DOI: 10.1007/s10115-018-1235-5. Online publication date: 1-Jul-2019
    • (2018) CrowdEval. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 1486-1494. DOI: 10.5555/3237383.3237922. Online publication date: 9-Jul-2018
    • (2018) Putting User Reputation on the Map. Proceedings of the 2nd ACM SIGSPATIAL Workshop on Geospatial Humanities, 1-6. DOI: 10.1145/3282933.3282937. Online publication date: 6-Nov-2018
    • (2018) Quality Control in Crowdsourcing. ACM Computing Surveys, 51(1), 1-40. DOI: 10.1145/3148148. Online publication date: 4-Jan-2018
