Computer Science > Machine Learning

arXiv:2410.12704 (cs)

[Submitted on 16 Oct 2024]

Title: Sarcasm Detection in a Less-Resourced Language

Authors: Lazar Đoković, Marko Robnik-Šikonja

Abstract: The sarcasm detection task in natural language processing aims to classify an utterance as sarcastic or not. It is related to sentiment analysis, since sarcasm often inverts surface sentiment. The task is challenging because sarcastic sentences depend heavily on context and are often accompanied by various non-verbal cues. Most related work focuses on high-resourced languages like English. To build a sarcasm detection dataset for a less-resourced language, such as Slovenian, we leverage two modern techniques: a medium-sized transformer model specialized for machine translation, and a very large generative language model. We explore the viability of translated datasets and how the size of a pretrained transformer affects its ability to detect sarcasm. We train ensembles of detection models and evaluate the models' performance. The results show that larger models generally outperform smaller ones and that ensembling can slightly improve sarcasm detection performance. Our best ensemble approach achieves an $\text{F}_1$-score of 0.765, which is close to the annotators' agreement in the source language.

Comments:	4 pages, published in the Slovenian Conference on Artificial Intelligence
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
ACM classes:	I.2.6; I.2.7
Cite as:	arXiv:2410.12704 [cs.LG]
	(or arXiv:2410.12704v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.12704
Journal reference:	Proceedings of the 27th International Multiconference INFORMATION SOCIETY - IS 2024, Volume A, 2024, pages 19-22
Related DOI:	https://doi.org/10.70314/is.2024.scai.4212

Submission history

From: Lazar Đoković
[v1] Wed, 16 Oct 2024 16:10:59 UTC (286 KB)
