SpanPredict: Extraction of Predictive Document Spans with Neural Attention

Vivek Subramanian, Matthew Engelhard, Sam Berchuck, Liqun Chen, Ricardo Henao, Lawrence Carin

Abstract

In many natural language processing applications, identifying predictive text can be as important as the predictions themselves. When predicting medical diagnoses, for example, identifying predictive content in clinical notes not only enhances interpretability, but also allows previously unknown, descriptive (i.e., text-based) risk factors to be identified. Here, we formalize this problem as predictive extraction and address it using a simple mechanism based on linear attention. Our method preserves differentiability, allowing scalable inference via stochastic gradient descent. Further, the model decomposes predictions into a sum of contributions of distinct text spans. Importantly, we require only document labels, not ground-truth spans. Results show that our model identifies semantically cohesive spans and assigns them scores that agree with human ratings, while preserving classification performance.
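
The additive decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hard `[start, end)` span boundaries, and the hand-picked token scores are all assumptions made for illustration. The paper learns spans with a differentiable attention mechanism, which this sketch does not reproduce.

```python
from typing import List, Tuple


def span_contributions(token_scores: List[float],
                       spans: List[Tuple[int, int]]) -> List[float]:
    """Sum the token scores inside each half-open [start, end) span.

    Hypothetical helper: spans are given as fixed index pairs here,
    whereas the paper learns them with attention.
    """
    return [sum(token_scores[start:end]) for start, end in spans]


def document_logit(token_scores: List[float],
                   spans: List[Tuple[int, int]],
                   bias: float = 0.0) -> float:
    """Document-level score as a bias plus the sum of span contributions."""
    return bias + sum(span_contributions(token_scores, spans))


# Illustrative tokens and scores (made up, not taken from the paper).
tokens = ["the", "acting", "was", "superb", "but", "the", "plot", "dragged"]
scores = [0.0, 0.25, 0.0, 1.5, 0.0, 0.0, -0.5, -1.0]
spans = [(1, 4), (6, 8)]  # "acting was superb", "plot dragged"

per_span = span_contributions(scores, spans)
logit = document_logit(scores, spans)
for (start, end), contribution in zip(spans, per_span):
    print(" ".join(tokens[start:end]), contribution)
print("document logit:", logit)
```

Because the document score is a plain sum, each span's contribution can be read directly as its share of the prediction, which is the interpretability property the abstract emphasizes.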

Anthology ID: 2021.naacl-main.413
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Editors: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 5234–5258
URL: https://aclanthology.org/2021.naacl-main.413
DOI: 10.18653/v1/2021.naacl-main.413
Cite (ACL): Vivek Subramanian, Matthew Engelhard, Sam Berchuck, Liqun Chen, Ricardo Henao, and Lawrence Carin. 2021. SpanPredict: Extraction of Predictive Document Spans with Neural Attention. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5234–5258, Online. Association for Computational Linguistics.
Cite (Informal): SpanPredict: Extraction of Predictive Document Spans with Neural Attention (Subramanian et al., NAACL 2021)
PDF: https://aclanthology.org/2021.naacl-main.413.pdf
Optional supplementary code: 2021.naacl-main.413.OptionalSupplementaryCode.zip
Video: https://aclanthology.org/2021.naacl-main.413.mp4
Data: IMDb Movie Reviews
