Abstract
Irony is a complex linguistic phenomenon that has been extensively studied in computational linguistics across many languages. Existing research has relied heavily on annotated corpora, which are inherently biased by the way they are created. This study addresses the problem of bias in cross-domain and cross-language irony detection. It aims to determine the extent of topic bias in benchmark corpora and how that bias affects model generalization across domains and languages (English, Spanish, and Italian). Our findings offer a first insight into this issue and show that mitigating topic bias in these corpora improves the ability of models to generalize beyond their training data. These results have important implications for developing robust models for the analysis of ironic language.
Acknowledgement
The work of Ortega-Bueno and Rosso was carried out in the framework of the FairTransNLP research project (PID2021-124361OB-C31), funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU, "A way of making Europe".
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ortega-Bueno, R., Rosso, P., Fersini, E. (2023). Cross-Domain and Cross-Language Irony Detection: The Impact of Bias on Models’ Generalization. In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds) Natural Language Processing and Information Systems. NLDB 2023. Lecture Notes in Computer Science, vol 13913. Springer, Cham. https://doi.org/10.1007/978-3-031-35320-8_10
DOI: https://doi.org/10.1007/978-3-031-35320-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35319-2
Online ISBN: 978-3-031-35320-8
eBook Packages: Computer Science, Computer Science (R0)