arXiv:2302.09303 (cs)

[Submitted on 21 Jan 2023]

Title: Stress Test for BERT and Deep Models: Predicting Words from Italian Poetry

Authors: Rodolfo Delmonte, Nicolò Busetto

Abstract: In this paper we present a set of experiments carried out with BERT on a number of Italian sentences taken from the poetry domain. The experiments are organized around the hypothesis that prediction is very difficult at the three levels of linguistic complexity we intend to monitor: lexical, syntactic and semantic. To test this hypothesis, we ran the Italian version of BERT on 80 sentences, for a total of 900 tokens, mostly extracted from Italian poetry of the first half of the last century. We then alternated canonical and noncanonical versions of the same sentence before processing them with the same deep learning (DL) model. We then used sentences from the newswire domain containing similar syntactic structures. The results show that the DL model is highly sensitive to the presence of noncanonical structures. However, DL models are also very sensitive to word frequency and to local non-literal compositional meaning effects. This is also apparent from the preference for predicting function words over content words, and collocates over infrequent word phrases. In the paper, we focused our attention on BERT's use of subword units for out-of-vocabulary words.

Comments:	23 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2302.09303 [cs.CL]
	(or arXiv:2302.09303v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2302.09303
Journal reference:	International Journal on Natural Language Computing (IJNLC) Vol.11, No.6, December 2022
Related DOI:	https://doi.org/10.5121/ijnlc.2022.11602

Submission history

From: Rodolfo Delmonte
[v1] Sat, 21 Jan 2023 09:44:19 UTC (710 KB)
