
Leveraging crowd skills and consensus for collaborative web-resource labeling

Published: 01 June 2019

Abstract

In this paper, we propose a three-stage approach called CLabel for carrying out collaborative web-resource labeling as a crowdsourcing process. In CLabel, the results of both crowdsourced and automated tasks are combined into a coherent process flow. CLabel leverages crowd preferences and consensus to capture the different interpretations that can be associated with a considered web resource, in the form of different candidate labels, and to select the most agreed candidate(s) as the final result. CLabel is particularly appropriate for labeling problems and scenarios where human feelings and preferences are decisive in selecting the answers (i.e., labels) supported by the majority of the crowd. Moreover, CLabel provides label variety when multiple labels are required for a suitable resource annotation, thus avoiding duplicate or repetitive labels.
A real case study of collective web-resource labeling in the music domain is presented, in which we discuss the task/consensus configuration and the obtained labels, as well as the results of two specific tests: the first devoted to the analysis of label variety, the second to the comparison of CLabel results against a reference classification system in which music resources are labeled with predefined categories based on a mix of social-based and expert-based recommendations.
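
As a concrete illustration of the consensus idea described above, the following Python sketch selects the most agreed candidate label(s) for a resource from a set of crowd preferences. It is a minimal toy example, not the CLabel implementation: the vote representation, the 0.5 agreement threshold, and the fallback to the top-voted candidate are assumptions introduced only for this illustration.

from collections import Counter

def select_consensus_labels(votes, threshold=0.5):
    # votes: the candidate labels chosen by individual crowd workers (one per worker).
    # Returns the candidate(s) whose support ratio reaches the threshold;
    # if none does, falls back to the top-voted candidate(s).
    if not votes:
        return []
    counts = Counter(votes)
    total = len(votes)
    agreed = [label for label, c in counts.items() if c / total >= threshold]
    if agreed:
        return sorted(agreed, key=lambda label: -counts[label])
    top = max(counts.values())
    return [label for label, c in counts.items() if c == top]

# Example: eight workers expressing preferences over candidate labels
# proposed for a music resource (labels invented for illustration).
print(select_consensus_labels(
    ["progressive rock", "progressive rock", "art rock", "progressive rock",
     "psychedelic", "progressive rock", "art rock", "progressive rock"]))
# -> ['progressive rock']

In a richer setting, plain counts could be replaced by weighted votes or worker-trustworthiness scores; the sketch only conveys the majority-agreement principle.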

Highlights

A three-stage approach called CLabel is proposed for carrying out collaborative web-resource labeling.
CLabel is characterized by the disciplined combination of automatic tools/techniques and crowdsourcing.
CLabel is capable of dealing with complex crowdsourcing processes where multiple task typologies, including create-questions, are required.
CLabel leverages crowd preferences and consensus for capturing the different interpretations that can be associated with a considered web resource (see the sketch after this list).
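
As a complement to the consensus sketch above, the following toy example shows one way the different interpretations gathered from the crowd could be kept distinct, namely by grouping near-duplicate candidate labels before consensus is applied (the label-variety concern discussed in the abstract). This is an illustrative assumption, not the paper's algorithm: it relies on plain string similarity from the Python standard library and an arbitrary 0.8 cut-off.

from difflib import SequenceMatcher

def group_near_duplicates(labels, min_similarity=0.8):
    # Greedily keep one representative per group of near-duplicate candidate labels.
    representatives = []
    for label in labels:
        norm = label.strip().lower()
        for rep in representatives:
            if SequenceMatcher(None, norm, rep.lower()).ratio() >= min_similarity:
                break  # close enough to an existing representative: drop it
        else:
            representatives.append(label.strip())
    return representatives

# Example: candidate labels proposed by the crowd for the same resource.
print(group_near_duplicates(
    ["Progressive Rock", "progressive rock ", "Prog Rock",
     "Art Rock", "art-rock", "Psychedelia"]))
# -> ['Progressive Rock', 'Prog Rock', 'Art Rock', 'Psychedelia']

How aggressively variants are merged depends entirely on the chosen cut-off: "Prog Rock" survives here because its similarity to "progressive rock" falls below 0.8.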


Published In

Future Generation Computer Systems, Volume 95, Issue C
June 2019, 890 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Crowdsourcing
  2. Consensus-based web-resource labeling
  3. Task design

Qualifiers

  • Research-article
