
tutorial

Opening the NLP Blackbox - Analysis and Evaluation of NLP Models: Methods, Challenges and Opportunities

Authors:

Sandya Mannarswamy,

Saravanan Chidambaram

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)

Pages 447 - 448

https://doi.org/10.1145/3430984.3431969

Published: 02 January 2021

Abstract

Rapid progress in NLP research has been swiftly translated into real-world commercial deployments. While a number of NLP applications have emerged, failures to translate scientific progress in NLP into real-world software have also been considerable. Evaluation of NLP models is often limited to held-out test-set accuracy on a handful of datasets, and this lack of rigorous evaluation leads to overestimation of a model's generalization performance. A lack of understanding of a model's inner workings results in ‘Clever Hans’ models, which exploit spurious cues in the data and then fail in real-world deployments. Recently there has been considerable research interest in analysis methods for NLP models and in evaluation techniques that go beyond test-set performance metrics. However, this body of work is still not widely disseminated within the NLP community. This tutorial aims to address this gap by providing a detailed overview of NLP model analysis and evaluation methods, discussing their strengths and weaknesses, and pointing toward future research directions in this area.
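The claim that held-out accuracy can overestimate generalization can be illustrated with a deliberately tiny sketch. Nothing below comes from the tutorial: `clever_hans_model` is a hypothetical hand-written classifier rather than a trained model, and the two datasets and the `swap_synonym` perturbation are invented for illustration. Because the model keys on a single token, it looks perfect on a held-out split that shares the shortcut, yet it fails on a small stress-test set and flips its predictions under a synonym swap that should not change the label.

```python
from typing import Callable, List, Tuple

# (text, gold label) pairs; labels are "pos" or "neg".
Example = Tuple[str, str]


def clever_hans_model(text: str) -> str:
    """Hypothetical toy sentiment classifier that has latched onto a shortcut:
    it predicts "pos" whenever the token "great" appears, ignoring context."""
    return "pos" if "great" in text.lower().split() else "neg"


def accuracy(model: Callable[[str], str], data: List[Example]) -> float:
    """Fraction of examples whose prediction matches the gold label."""
    correct = sum(model(text) == gold for text, gold in data)
    return correct / len(data)


def invariance_failures(model: Callable[[str], str], texts: List[str],
                        perturb: Callable[[str], str]) -> List[str]:
    """Inputs whose prediction changes under a perturbation that should not change the label."""
    return [t for t in texts if model(t) != model(perturb(t))]


def swap_synonym(text: str) -> str:
    """Hypothetical label-preserving edit: replace "great" with "excellent"."""
    return " ".join("excellent" if w == "great" else w for w in text.split())


# Invented held-out split from the same distribution: the shortcut always works here.
held_out: List[Example] = [
    ("a great film", "pos"),
    ("great acting and a great score", "pos"),
    ("dull and far too long", "neg"),
    ("the plot was a mess", "neg"),
]

# Invented stress-test set: negation, contrast, and a positive review without the cue.
stress: List[Example] = [
    ("not a great film", "neg"),
    ("great cast but a terrible script", "neg"),
    ("a moving and wonderful film", "pos"),
    ("dull and far too long", "neg"),
]

if __name__ == "__main__":
    print("held-out accuracy:", accuracy(clever_hans_model, held_out))  # 1.0
    print("stress-test accuracy:", accuracy(clever_hans_model, stress))  # 0.25
    flips = invariance_failures(clever_hans_model, [t for t, _ in held_out], swap_synonym)
    print("predictions flipped by synonym swap:", flips)
```

The gap between the two accuracy figures, not either number alone, is what exposes the shortcut; the invariance check surfaces the same weakness without needing any new gold labels.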

Cited By

Sindiramutty S, Tan C, Lau S, Thangaveloo R, Gharib A, Manchuri A, Khan N, Tee W, Muniandy L (2024). Explainable AI for Cybersecurity. Advances in Explainable AI Applications for Smart Cities, pages 31-97. https://doi.org/10.4018/978-1-6684-6361-1.ch002. Online publication date: 18-Jan-2024.

Li W, Zhao J, Qiu Z, Gao W, Peng H, Zhang Q (2024). The foresight methodology for transitional shale gas reservoirs prediction based on a knowledge graph. Geomechanics and Geophysics for Geo-Energy and Geo-Resources, 10(1). https://doi.org/10.1007/s40948-024-00888-1. Online publication date: 14-Oct-2024.

Mello C, Cheema G, Thakkar G (2022). Combining sentiment analysis classifiers to explore multilingual news articles covering London 2012 and Rio 2016 Olympics. International Journal of Digital Humanities, 5(2-3), pages 131-157. https://doi.org/10.1007/s42803-022-00052-9. Online publication date: 16-Nov-2022.

Published In

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)

January 2021

453 pages

ISBN:9781450388177

DOI:10.1145/3430984

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial
Research
Refereed limited

Conference

CODS COMAD 2021

CODS COMAD 2021: 8th ACM IKDD CODS and 26th COMAD

January 2 - 4, 2021

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 2
Total Downloads: 187
Downloads (Last 12 months): 35
Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Dec 2024