CHI Conference Proceedings · Research article · DOI: 10.1145/2470654.2470684

Crowdsourcing performance evaluations of user interfaces

Published: 27 April 2013

Abstract

Online labor markets, such as Amazon's Mechanical Turk (MTurk), provide an attractive platform for conducting human subjects experiments: the relative ease of recruitment, low cost, and diverse pool of potential participants enable larger-scale experimentation and faster experimental revision cycles than lab-based settings allow. However, because the experimenter gives up direct control over the participants' environments and behavior, concerns about the quality of data collected in online settings are pervasive. In this paper, we investigate the feasibility of conducting online performance evaluations of user interfaces with anonymous, unsupervised, paid participants recruited via MTurk. We implemented three performance experiments to re-evaluate three previously well-studied user interface designs, and conducted each experiment both in the lab and online with participants recruited via MTurk. The analysis of our results did not yield any evidence of significant or substantial differences between the data collected in the two settings: all statistically significant differences detected in the lab were also present on MTurk, and the effect sizes were similar. In addition, there were no significant differences between the two settings in raw task completion times, error rates, consistency, or the rates at which participants adopted the novel interaction mechanisms introduced in the experiments. These results suggest that MTurk may be a productive setting for conducting performance evaluations of user interfaces, providing a complementary approach to existing methodologies.
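The comparison the abstract describes (checking whether a lab/MTurk difference in task completion times is statistically significant, and whether effect sizes agree across settings) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the completion-time data, group sizes, and distribution parameters below are hypothetical, and it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical task completion times in seconds; a real analysis would
# load the measured times from the lab and MTurk conditions.
lab = rng.normal(loc=2.4, scale=0.5, size=40)
mturk = rng.normal(loc=2.5, scale=0.6, size=120)

# Welch's t-test: do mean completion times differ between settings?
# (equal_var=False avoids assuming equal variances across groups.)
t, p = stats.ttest_ind(lab, mturk, equal_var=False)

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation, for comparing effect sizes."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

print(f"t = {t:.2f}, p = {p:.3f}, d = {cohens_d(lab, mturk):.2f}")
```

Reporting the effect size alongside the p-value is what allows the paper's kind of conclusion: two settings can both show a significant effect, but only similar d values indicate the effect's magnitude replicated.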


    Published In

    CHI '13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
    April 2013
    3550 pages
    ISBN:9781450318990
    DOI:10.1145/2470654

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crowdsourcing
    2. mechanical turk
    3. user interface evaluation

    Qualifiers

    • Research-article

    Conference

    CHI '13

    Acceptance Rates

    CHI '13 Paper Acceptance Rate 392 of 1,963 submissions, 20%;
    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%


    Article Metrics

• Downloads (last 12 months): 47
• Downloads (last 6 weeks): 8
Reflects downloads up to 01 Jan 2025

    Cited By

    • (2024) Sample-size and Repetition Effects on the Prediction Accuracy of Time and Error-rate Models in Steering Tasks. Journal of Information Processing 32, 247–255. DOI: 10.2197/ipsjjip.32.247
    • (2024) 0.2-mm-Step Verification of the Dual Gaussian Distribution Model with Large Sample Size for Predicting Tap Success Rates. Proceedings of the ACM on Human-Computer Interaction 8 (ISS), 674–693. DOI: 10.1145/3698153. Online publication date: 24-Oct-2024
    • (2024) The Effect of Latency on Movement Time in Path-steering. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–19. DOI: 10.1145/3613904.3642316. Online publication date: 11-May-2024
    • (2024) Behavioral Differences between Tap and Swipe: Observations on Time, Error, Touch-point Distribution, and Trajectory for Tap-and-swipe Enabled Targets. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–12. DOI: 10.1145/3613904.3642272. Online publication date: 11-May-2024
    • (2024) COR Themes for Readability from Iterative Feedback. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–23. DOI: 10.1145/3613904.3642108. Online publication date: 11-May-2024
    • (2024) Aspectual Processing Shifts Visual Event Apprehension. Cognitive Science 48 (6). DOI: 10.1111/cogs.13476. Online publication date: 24-Jun-2024
    • (2024) Deciding to Stop Early or Continue the Experiment After Checking p-Values at Interim Points: Introducing Group Sequential Designs to UI-Based Comparative Studies. International Journal of Human–Computer Interaction, 1–10. DOI: 10.1080/10447318.2024.2407662. Online publication date: 8-Oct-2024
    • (2024) Relative Merits of Nominal and Effective Indexes of Difficulty of Fitts' Law: Effects of Sample Size and the Number of Repetitions on Model Fit. International Journal of Human–Computer Interaction, 1–18. DOI: 10.1080/10447318.2024.2303201. Online publication date: 14-Jan-2024
    • (2024) Creating and validating a scholarly knowledge graph using natural language processing and microtask crowdsourcing. International Journal on Digital Libraries 25 (2), 273–285. DOI: 10.1007/s00799-023-00360-7. Online publication date: 1-Jun-2024
    • (2023) Accuracy and Reliability of At-Home Quantification of Motor Impairments Using a Computer-Based Pointing Task with Children with Ataxia-Telangiectasia. ACM Transactions on Accessible Computing 16 (1), 1–25. DOI: 10.1145/3581790. Online publication date: 28-Mar-2023
