short-paper

On the Use of an Intermediate Class in Boolean Crowdsourced Relevance Annotations for Learning to Rank Comments

Authors: Alberto Barrón-Cedeño, Giovanni Da San Martino, Simone Filice, Alessandro Moschitti

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1209-1212

https://doi.org/10.1145/3077136.3080763

Published: 07 August 2017

Abstract

In many Information Retrieval tasks, the boundary between classes is not well defined, and assigning a document to a specific class may be difficult, even for humans. For instance, a document that is not directly related to the user's query may still contain relevant information. In this scenario, one option is to define an intermediate class that collects ambiguous instances. Yet some natural questions arise: is this annotation strategy convenient, and how should the intermediate class be treated? To answer these questions, we explored two community question answering datasets whose comments were originally annotated with three classes, and we re-annotated a subset of the instances in a binary good vs. bad setting. Our main contribution is to show empirically that including an intermediate class when assessing Boolean relevance is not useful. Moreover, if the data are already annotated with a 3-class strategy, the instances from the intermediate class can be safely removed at training time.
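To make the training-time question concrete, here is a minimal Python sketch of how 3-class comment labels could be collapsed into Boolean training labels. It assumes three hypothetical label strings (`good`, `intermediate`, `bad`) and a hypothetical helper `to_binary`; neither comes from the paper or its datasets. Folding the intermediate class into the positive or the negative class is shown only as a natural alternative, while the `drop` strategy corresponds to removing intermediate instances at training time, the option the abstract reports as safe.

```python
from typing import List, Tuple

# Hypothetical label names for a 3-class relevance annotation (assumption,
# not the label set used in the paper's datasets).
GOOD, INTERMEDIATE, BAD = "good", "intermediate", "bad"


def to_binary(
    examples: List[Tuple[str, str]], strategy: str
) -> List[Tuple[str, int]]:
    """Map (comment, 3-class label) pairs to (comment, 0/1) training pairs.

    strategy:
      "as_good" -> intermediate instances become positives (label 1)
      "as_bad"  -> intermediate instances become negatives (label 0)
      "drop"    -> intermediate instances are removed from the training set
    """
    if strategy not in ("as_good", "as_bad", "drop"):
        raise ValueError(f"unknown strategy: {strategy}")
    out: List[Tuple[str, int]] = []
    for text, label in examples:
        if label == GOOD:
            out.append((text, 1))
        elif label == BAD:
            out.append((text, 0))
        elif label == INTERMEDIATE:
            if strategy == "as_good":
                out.append((text, 1))
            elif strategy == "as_bad":
                out.append((text, 0))
            # strategy == "drop": skip the ambiguous instance entirely
        else:
            raise ValueError(f"unknown label: {label}")
    return out


# Usage: three toy comments, one per class.
data = [("c1", GOOD), ("c2", INTERMEDIATE), ("c3", BAD)]
print(to_binary(data, "drop"))  # [('c1', 1), ('c3', 0)]
```

Only the training set is affected here; the clear-cut good and bad instances keep the same Boolean label under every strategy.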


Cited By

S. Filice and A. Moschitti. 2019. Learning pairwise patterns in Community Question Answering. Intelligenza Artificiale 12(2), 49-65. Online publication date: 29 January 2019. https://doi.org/10.3233/IA-170034

Index Terms

1. Information systems
  1. Information retrieval

Recommendations

Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval

We study the potential of supervised learning to rank for FAQ retrieval. Supervised models offer performance improvements for this task. We explored low-effort paraphrase-based data labeling strategies. Paraphrase-based labeling was effective for the best ...
Named entity disambiguation for questions in community question answering

Named entity disambiguation (NED) refers to the task of mapping entity mentions in running texts to the correct entries in a specific knowledge base (e.g., Wikipedia). Although there has been a lot of work on NED for long and formal texts like Wikipedia ...
Multi-Emotion Estimation in Narratives from Crowdsourced Annotations
JCDL '15: Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries

Emotion annotations are important metadata for narrative texts in digital libraries. Such annotations are necessary for automatic text-to-speech conversion of narratives and affective education support and can be used as training data for machine ...


Information & Contributors

Information

Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 2017

1476 pages

ISBN:9781450350228

DOI:10.1145/3077136

General Chairs: Noriko Kando (National Institute of Informatics), Tetsuya Sakai (Waseda University), Hideo Joho (University of Tsukuba)

Program Chairs: Hang Li (Huawei Noah's Ark Lab), Arjen P. de Vries (Radboud University), Ryen W. White (Microsoft Cortana)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGIR '17

Sponsor:

SIGIR

SIGIR '17: The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 7-11, 2017

Shinjuku, Tokyo, Japan

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total citations: 1
Total downloads: 150
Downloads (last 12 months): 3
Downloads (last 6 weeks): 0

Reflects downloads up to 11 Dec 2024
