DOI: 10.1145/3430984.3431969
tutorial

Opening the NLP Blackbox - Analysis and Evaluation of NLP Models: Methods, Challenges and Opportunities

Published: 02 January 2021

Abstract

Rapid progress in NLP research has seen a swift translation to real-world commercial deployment. While a number of NLP applications have emerged, failures in translating scientific progress in NLP to real-world software have also been considerable. Evaluation of NLP models is often limited to held-out test set accuracy on a handful of datasets. This lack of rigorous evaluation leads to over-estimation of the generalization performance of the built model. A lack of understanding of the inner workings of the model results in ‘Clever Hans’ models, which fail in real-world deployments. Of late there has been considerable research interest in analysis methods for NLP models and in evaluation techniques that go beyond test set performance metrics. However, this area of work is still not widely disseminated through the NLP community. This tutorial aims to address this gap by providing a detailed overview of NLP model analysis and evaluation methods, discussing their strengths and weaknesses, and pointing towards future research directions in this area.

Published In

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)
January 2021
453 pages

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Assessment
  2. Diagnostic Classifiers
  3. Evaluation
  4. Model Analysis
  5. Visualization

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CODS COMAD 2021
CODS COMAD 2021: 8th ACM IKDD CODS and 26th COMAD
January 2 - 4, 2021
Bangalore, India

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%
