Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language

Abstract

This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identification. Then, we developed several neural network models, including large language models specifically designed for Guarani, and compared their performance against off-the-shelf multilingual and Spanish pre-trained models on the aforementioned dimensions. Our experiments show that language models incorporating Guarani during pre-training or pre-fine-tuning consistently achieve the best results, despite limited resources (a single 24-GB GPU and only 800K tokens). Notably, even a Guarani BERT model with just two Transformer layers strikes a favorable balance between accuracy and computational cost, likely due to the inherently low-resource nature of the task. We present a comprehensive overview of corpus creation and model development for low-resource languages like Guarani, particularly in the context of its code-switching with Spanish, which results in Jopara. Our findings shed light on the challenges and strategies involved in analyzing affective language in such linguistic contexts.

Data Availability

The tweet IDs of the datasets for text-based affective detection in Guarani/Jopara and Guarani BERT language models can be obtained at https://github.com/mmaguero/guarani-multi-affective-analysis and https://huggingface.co/mmaguero, respectively. For further details, please contact the authors.
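
As a minimal illustration of how the released checkpoints can be reused (a sketch, not the authors' exact pipeline), one of the Guarani BERT models listed in the notes below can be queried as a masked language model via the Hugging Face transformers library; the toy Guarani sentence is our own invention:

    from transformers import pipeline

    # Load one of the released Guarani BERT checkpoints (see Notes 12-15)
    # and query it as a masked language model.
    fill_mask = pipeline("fill-mask", model="mmaguero/gn-bert-base-cased")

    # Toy Guarani sentence with one masked token; the pipeline returns
    # candidate completions ranked by score.
    for pred in fill_mask("Che [MASK] porã."):
        print(pred["token_str"], round(pred["score"], 3))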

Notes

  1. grn is the ISO 639-3 language code for Guarani; its ISO 639-1 code is gn.

  2. Some tokens mix character n-grams from Guarani and Spanish; e.g., for “study”, the Spanish form is “estudio,” the Guarani form is “ñemoarandú,” and the Jopara form is “studiá.”

  3. An open-source, rule-based MT platform, https://github.com/apertium/apertium-grn.

  4. Some examples of keywords used to download the tweets are “cheraa” (“friend,” “bro”), “ojoaju” (“joins,” “unites”), “rejapo” (“do,” “make”), “ningo” (an emphatic linker that joins the subject and predicate of a sentence, marking that the speaker knows the statement to be true), “aguyje” (“thanks,” “thank you”), “haguére” (“because,” “due to”), “irundy” (“four”), “oñembo” (“pose as,” “pretend to be,” “impersonate,” “get hold of sth.”), “pire” (“skin,” “leather,” “bark”), and “epyta” (“stop,” “keep/stand/stay still”).

  5. https://www.nltk.org/_modules/nltk/classify/textcat.html

  6. https://polyglot.readthedocs.io/en/latest/Detection.html

  7. From https://dumps.wikimedia.org/gnwiki/, keeping both main texts and headers.

  8. From https://dumps.wikimedia.org/gnwiktionary/.

  9. https://huggingface.co/mmaguero/beto-gn-base-cased

  10. https://huggingface.co/mmaguero/multilingual-bert-gn-base-cased

  11. An NVIDIA GeForce RTX 3090 with 24 GB of memory.

  12. https://huggingface.co/mmaguero/gn-bert-tiny-cased

  13. https://huggingface.co/mmaguero/gn-bert-small-cased

  14. https://huggingface.co/mmaguero/gn-bert-base-cased

  15. https://huggingface.co/mmaguero/gn-bert-large-cased

  16. For the emotion recognition corpus, we calculated the micro-F1 score over the emotion classes (angry, happy, and sad), excluding the other class, following [62, § 6]. Similarly, for the binary humor detection and offensive language identification corpora, we computed the micro-F1 score on the positive class (fun and off, respectively); a minimal sketch of this computation is given right after these notes.

  17. https://docs.wandb.ai/guides/sweeps
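
To make the metric in Note 16 concrete, the following sketch computes the restricted micro-F1 with scikit-learn; the toy gold and predicted labels are invented for illustration:

    from sklearn.metrics import f1_score

    # Micro-F1 over the emotion classes only (Note 16): the "other"
    # class is simply left out of the label set given to f1_score.
    y_true = ["happy", "other", "sad", "angry", "other", "happy"]
    y_pred = ["happy", "sad", "sad", "other", "other", "angry"]
    emo_f1 = f1_score(y_true, y_pred,
                      labels=["angry", "happy", "sad"],
                      average="micro")

    # For the binary tasks, micro-F1 on the positive class only
    # (fun for humor detection, off for offensive language).
    hum_true = [1, 0, 1, 1, 0]
    hum_pred = [1, 0, 0, 1, 1]
    fun_f1 = f1_score(hum_true, hum_pred, labels=[1], average="micro")

    print(emo_f1, fun_f1)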

References

  1. Mager M, Gutierrez-Vasques X, Sierra G, Meza-Ruiz I. Challenges of language technologies for the indigenous languages of the Americas. In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics. 2018. p. 55–69. https://aclanthology.org/C18-1006.

  2. Mager M, Oncevay A, Ebrahimi A, Ortega J, Rios A, Fan A, et al. Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas. In: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas. Online: Association for Computational Linguistics. 2021. p. 202–217. https://aclanthology.org/2021.americasnlp-1.23.

  3. García Trillo MA, Estrella Gutiérrez A, Gelbukh A, Peña Ortega AP, Reyes Pérez A, Maldonado Sifuentes CE, et al. Procesamiento de lenguaje natural para las lenguas indígenas. 1. Universidad Michoacana de San Nicolás de Hidalgo. 2021. https://isbnmexico.indautor.cerlalc.org/catalogo.php?mode=detalle&nt=334970.

  4. Estigarribia B. Guarani-Spanish Jopara mixing in a Paraguayan novel: does it reflect a third language, a language variety, or true codeswitching? J Lang Contact. 2015;8(2):183–222. https://doi.org/10.1163/19552629-00802002.

  5. Chiruzzo L, Góngora S, Alvarez A, Giménez-Lugo G, Agüero-Torales M, Rodríguez Y. Jojajovai: a parallel Guarani-Spanish corpus for MT benchmarking. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association. 2022. p. 2098–2107. https://aclanthology.org/2022.lrec-1.226.

  6. Boidin C. Jopara: una vertiente sol y sombra del mestizaje. In: Dietrich W, Symeonidis H, editors. Tupí y Guaraní. Estructuras, contactos y desarrollos. vol. 11 of Regionalwissenschaften Lateinamerika. Münster, Germany: LIT-Verlag. 2005. p. 303–331. https://halshs.archives-ouvertes.fr/halshs-00257767.

  7. Bittar Prieto J. A variationist perspective on Spanish-origin verbs in Paraguayan Guarani [Master’s Thesis]. The University of New Mexico. New Mexico. 2016. https://digitalrepository.unm.edu/ling_etds/4.

  8. Bittar Prieto J. A constructionist approach to verbal borrowing: the case of Paraguayan Guarani. The University of New Mexico’s Latin American & Iberian Institute 2020 PhD Fellows. https://www.youtube.com/watch?v=C5XiLqR4onA.

  9. Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002). Association for Computational Linguistics. 2002. p. 79–86. https://aclanthology.org/W02-1011.

  10. Cambria E, Hussain A. Sentic computing. Cogn Comput. 2015;7(2):183–5. https://doi.org/10.1007/s12559-015-9325-0.

  11. Ghosh S, Ekbal A, Bhattacharyya P. A multitask framework to detect depression, sentiment and multi-label emotion from suicide notes. Cogn Comput. 2022;14(1):110–29. https://doi.org/10.1007/s12559-021-09828-7.

  12. Lieberman MD. Affect labeling in the age of social media. Nat Hum Behav. 2019;3(1):20–1. https://doi.org/10.1038/s41562-018-0487-0.

  13. Adwan OY, Al-Tawil M, Huneiti A, Shahin R, Abu Zayed A, Al-Dibsi R. Twitter sentiment analysis approaches: a survey. Int J Emerg Technol Learn (iJET). 2020;15(15):79–93. https://doi.org/10.3991/ijet.v15i15.14467.

  14. Jakobsen AL, Mesa-Lao B. Translation in transition: between cognition, computing and technology, vol 133. John Benjamins Publishing Company. 2017. https://www.jbe-platform.com/content/books/9789027265371.

  15. Jain DK, Boyapati P, Venkatesh J, Prakash M. An intelligent cognitive-inspired computing with big data analytics framework for sentiment analysis and classification. Information Processing & Management. 2022;59(1): 102758. https://doi.org/10.1016/j.ipm.2021.102758.

  16. Green D. Language control in different contexts: the behavioral ecology of bilingual speakers. Front Psychol. 2011;2. https://doi.org/10.3389/fpsyg.2011.00103.

  17. Agüero-Torales MM. Machine learning approaches for topic and sentiment analysis in multilingual opinions and low-resource languages: from English to Guarani [Ph.D. thesis]. University of Granada. Granada. 2022. http://hdl.handle.net/10481/72863.

  18. Hedderich MA, Lange L, Adel H, Strötgen J, Klakow D. A survey on recent approaches for natural language processing in low-resource scenarios. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics. 2021. p. 2545–2568. https://aclanthology.org/2021.naacl-main.201.

  19. Pajupuu H, Altrov R, Pajupuu J. Identifying polarity in different text types. Folklore. 2016;64. https://doi.org/10.7592/FEJF2016.64.polarity.

  20. Afli H, McGuire S, Way A. Sentiment translation for low resourced languages: experiments on Irish general election tweets. In: 18th International Conference on Computational Linguistics and Intelligent Text Processing. 2017. p. 1–10. https://doras.dcu.ie/23370/.

  21. Batra R, Kastrati Z, Imran AS, Daudpota SM, Ghafoor A. A large-scale tweet dataset for Urdu text sentiment analysis. https://www.preprints.org/manuscript/202103.0572/v1.

  22. Kralj Novak P, Smailović J, Sluban B, Mozetič I. Sentiment of emojis. PLoS ONE. 2015;10(12):1–22. https://doi.org/10.1371/journal.pone.0144296.

  23. Khan MY, Nizami MS. Urdu Sentiment Corpus (v1.0): linguistic exploration and visualization of labeled dataset for Urdu sentiment analysis. In: 2020 International Conference on Information Science and Communication Technology (ICISCT). IEEE; 2020. p. 1–15.

  24. Muhammad SH, Adelani DI, Ruder S, Ahmad IS, Abdulmumin I, Bello BS, et al. NaijaSenti: a Nigerian Twitter sentiment corpus for multilingual sentiment analysis. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association. 2022. https://aclanthology.org/2022.lrec-1.63.

  25. Ogueji K, Zhu Y, Lin J. Small data? No problem! Exploring the viability of pretrained multilingual language models for low-resourced languages. In: Proceedings of the 1st Workshop on Multilingual Representation Learning. Punta Cana, Dominican Republic: Association for Computational Linguistics. 2021. p. 116–126. https://aclanthology.org/2021.mrl-1.11.

  26. Devi MD, Saharia N. Exploiting topic modelling to classify sentiment from lyrics. In: Bhattacharjee A, Borgohain SK, Soni B, Verma G, Gao XZ, editors. Machine learning, image processing, network security and data sciences. Singapore: Springer Singapore; 2020. p. 411–23.

  27. Chen Y, Skiena S. Building sentiment Lexicons for all major languages. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Baltimore, Maryland: Association for Computational Linguistics. 2014. p. 383–389. https://aclanthology.org/P14-2063.

  28. Asgari E, Braune F, Roth B, Ringlstetter C, Mofrad M. UniSent: universal adaptable sentiment lexica for 1000+ languages. In: Proceedings of the 12th Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association. 2020. p. 4113–4120. https://aclanthology.org/2020.lrec-1.506.

  29. Duran M. Transformations and paraphrases for Quechua sentiment predicates. In: Bekavac B, Kocijan K, Silberztein M, Šojat K, editors. Formalising natural languages: applications to natural language processing and digital humanities. Cham: Springer International Publishing; 2021. p. 61–73.

  30. Ríos AA, Amarilla PJ, Lugo GAG. Sentiment categorization on a creole language with Lexicon-based and machine learning techniques. In: 2014 Brazilian Conference on Intelligent Systems. IEEE; 2014. p. 37–43.

  31. Borges Y, Mercant F, Chiruzzo L. Using Guarani verbal morphology on Guarani-Spanish machine translation experiments. Procesamiento del Lenguaje Natural. 2021;66:89–98.

  32. Giossa N, Góngora S. Construcción de recursos para traducción automática guaraní-español [Bachelor’s Thesis]. Universidad de la República (Uruguay), Facultad de Ingeniería. 2021. https://hdl.handle.net/20.500.12008/30019.

  33. Kann K, Ebrahimi A, Mager M, Oncevay A, Ortega JE, Rios A, et al. AmericasNLI: machine translation and natural language inference systems for Indigenous languages of the Americas. Front Artif Intell Appl. 2022;5. https://doi.org/10.3389/frai.2022.995667.

  34. Kuznetsova A, Tyers F. A finite-state morphological analyser for Paraguayan Guaraní. In: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas. Online: Association for Computational Linguistics. 2021. p. 81–89. https://aclanthology.org/2021.americasnlp-1.9.

  35. Cordova J, Boidin C, Itier C, Moreaux MA, Nouvel D. Processing Quechua and Guarani historical texts query expansion at character and word level for information retrieval. In: Lossio-Ventura JA, Muñante D, Alatrista-Salas H, editors. Information management and big data. Cham: Springer International Publishing. 2019. p. 198–211. https://doi.org/10.1007/978-3-030-11680-4_20.

  36. Chiruzzo L, Agüero-Torales MM, Alvarez A, Rodríguez Y. Initial experiments for building a Guarani WordNet. In: Proceedings of the 12th International Global Wordnet Conference. Donostia/San Sebastian, Basque Country, Spain. 2023. https://www.hitz.eus/gwc2023/sites/default/files/aurkezpenak/GWC2023_paper_9051.pdf.

  37. Mazumder M, Chitlangia S, Banbury C, Kang Y, Ciro JM, Achorn K, et al. Multilingual spoken words corpus. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2021. https://openreview.net/forum?id=c20jiJ5K2H.

  38. Babu A, Wang C, Tjandra A, Lakhotia K, Xu Q, Goyal N, et al. XLS-R: self-supervised cross-lingual speech representation learning at scale. In: Proceedings of the 23rd InterSpeech Conference. 2022. p. 2278–2282. https://www.isca-speech.org/archive/pdfs/interspeech_2022/babu22_interspeech.pdf.

  39. Baevski A, Zhou Y, Mohamed A, Auli M. wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in neural information processing systems, vol. 33. Curran Associates, Inc. 2020. p. 12449–12460. https://proceedings.neurips.cc/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf.

  40. Xu Q, Baevski A, Likhomanenko T, Tomasello P, Conneau A, Collobert R, et al. Self-training and pre-training are complementary for speech recognition. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2021. p. 3030–3034.

  41. NLLB Team, Costa-jussà MR, Cross J, Çelebi O, Elbayad M, Heafield K, et al. No language left behind: scaling human-centered machine translation. https://arxiv.org/abs/2207.04672.

  42. Yong ZX, Schoelkopf H, Muennighoff N, Aji AF, Adelani DI, Almubarak K, et al. BLOOM+1: adding language support to BLOOM for zero-shot prompting. https://arxiv.org/abs/2212.09535.

  43. Agüero-Torales MM, Vilares D, López-Herrera A. On the logistical difficulties and findings of Jopara Sentiment Analysis. In: Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching. Online: Association for Computational Linguistics. 2021. p. 95–102. https://aclanthology.org/2021.calcs-1.12.

  44. Strapparava C, Mihalcea R. Affect detection in texts. In: The Oxford Handbook of Affective Computing. Oxford Library of Psychology. 2015.

  45. Ekman P. An argument for basic emotions. Cognit Emot. 1992;6(3–4):169–200. https://doi.org/10.1080/02699939208411068.

  46. Plutchik R. The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci. 2001;89(4):344–50.

  47. Mihalcea R, Strapparava C. Learning to laugh (automatically): computational models for humor recognition. Comput Intell. 2006;22(2):126–42. https://doi.org/10.1111/j.1467-8640.2006.00278.x.

  48. Zampieri M, Nakov P, Rosenthal S, Atanasova P, Karadzhov G, Mubarak H, et al. SemEval-2020 Task 12: multilingual offensive language identification in social media (OffensEval 2020). In: Proceedings of the Fourteenth Workshop on Semantic Evaluation. Barcelona (online): International Committee for Computational Linguistics. 2020. p. 1425–1447. https://aclanthology.org/2020.semeval-1.188.

  49. Ranasinghe T, Zampieri M. Multilingual offensive language identification with cross-lingual embeddings. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics. 2020. p. 5838–5844. https://aclanthology.org/2020.emnlp-main.470.

  50. Wang M, Yang H, Qin Y, Sun S, Deng Y. Unified humor detection based on sentence-pair augmentation and transfer learning. In: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation. Lisboa, Portugal: European Association for Machine Translation. 2020. p. 53–59. https://aclanthology.org/2020.eamt-1.7.

  51. Lamprinidis S, Bianchi F, Hardt D, Hovy D. Universal joy a data set and results for classifying emotions across languages. In: Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Online: Association for Computational Linguistics. 2021. p. 62–75. https://aclanthology.org/2021.wassa-1.7.

  52. Pfeiffer J, Vulić I, Gurevych I, Ruder S. MAD-X: an adapter-based framework for multi-task cross-lingual transfer. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics. 2020. p. 7654–7673. https://aclanthology.org/2020.emnlp-main.617.

  53. Estigarribia B. A grammar of Paraguayan Guarani. London: UCL Press; 2020. https://library.oapen.org/handle/20.500.12657/51773.

  54. Abdellaoui H, Zrigui M. Using tweets and emojis to build TEAD: an Arabic dataset for sentiment analysis. Computación y Sistemas. 2018;22:777–786. https://doi.org/10.13053/cys-22-3-3031.

  55. Yue L, Chen W, Li X, Zuo W, Yin M. A survey of sentiment analysis in social media. Knowl Inf Syst. 2019;60(2):617–63. https://doi.org/10.1007/s10115-018-1236-4.

  56. Tejwani R. Two-dimensional sentiment analysis of text. https://arxiv.org/abs/1406.2022.

  57. Yen MF, Huang YP, Yu LC, Chen YL. A two-dimensional sentiment analysis of online public opinion and future financial performance of publicly listed companies. Computational Economics. 2021. p. 1–22. https://doi.org/10.1007/s10614-021-10111-y.

  58. Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Valencia, Spain: Association for Computational Linguistics. 2017. p. 427–431. https://aclanthology.org/E17-2068.

  59. Mamta, Ekbal A, Bhattacharyya P. Exploring multi-lingual, multi-task, and adversarial learning for low-resource sentiment analysis. ACM Trans Asian Low-Resour Lang Inf Process. 2022;21(5). https://doi.org/10.1145/3514498.

  60. Adelani DI, Abbott J, Neubig G, D’souza D, Kreutzer J, Lignos C, et al. MasakhaNER: named entity recognition for African languages. Transactions of the Association for Computational Linguistics. 2021;9:1116–31.

  61. de Marneffe MC, Manning CD, Nivre J, Zeman D. Universal dependencies. Comput Linguist. 2021;47(2):255–308. https://doi.org/10.1162/coli_a_00402.

  62. Chatterjee A, Narahari KN, Joshi M, Agrawal P. SemEval-2019 Task 3: EmoContext contextual emotion detection in text. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Minneapolis, Minnesota, USA: Association for Computational Linguistics. 2019. p. 39–48. https://aclanthology.org/S19-2005.

  63. Artstein R, Poesio M. Inter-coder agreement for computational linguistics. Comput Linguist. 2008;34(4):555–96. https://doi.org/10.1162/coli.07-034-R2.

  64. Chiruzzo L, Castro S, Rosá A. HAHA 2019 dataset: a corpus for humor analysis in Spanish. In: Proceedings of the 12th Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association. 2020. p. 5106–5112. https://aclanthology.org/2020.lrec-1.628.

  65. Hossain N, Krumm J, Gamon M, Kautz H. SemEval-2020 Task 7: assessing humor in edited news headlines. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation. Barcelona (online): International Committee for Computational Linguistics. 2020. p. 746–758. https://aclanthology.org/2020.semeval-1.98.

  66. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.

  67. LeCun Y, Bengio Y, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks. 1995;3361(10):1995.

  68. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics. 2019. p. 4171–4186. https://aclanthology.org/N19-1423.

  69. K K, Wang Z, Mayhew S, Roth D. Cross-lingual ability of multilingual BERT: an empirical study. In: International Conference on Learning Representations. 2020. https://openreview.net/forum?id=HJeT3yrtDr.

  70. Cañete J, Chaperon G, Fuentes R, Ho JH, Kang H, Pérez J. Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020. 2020. https://pml4dc.github.io/iclr2020/program/pml4dc_10.html.

  71. Yang J, Zhang Y. NCRF++: an open-source neural sequence labeling toolkit. In: Proceedings of ACL 2018, System Demonstrations. Melbourne, Australia: Association for Computational Linguistics. 2018. p. 74–79. https://aclanthology.org/P18-4013.

  72. Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2014. p. 655–665.

  73. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, et al. Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR. 2016;abs/1609.08144.

  74. Pires T, Schlinger E, Garrette D. How multilingual is multilingual BERT? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics. 2019. p. 4996–5001. https://aclanthology.org/P19-1493.

  75. Wu S, Dredze M. Beto, Bentz, Becas: the surprising cross-lingual effectiveness of BERT. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics. 2019. p. 833–844. https://aclanthology.org/D19-1077.

  76. Conneau A, Wu S, Li H, Zettlemoyer L, Stoyanov V. Emerging cross-lingual structure in pretrained language models. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics. 2020. p. 6022–6034. https://aclanthology.org/2020.acl-main.536.

  77. Lauscher A, Ravishankar V, Vulić I, Glavaš G. From zero to hero: on the limitations of zero-shot language transfer with multilingual transformers. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics. 2020. p. 4483–4499. https://aclanthology.org/2020.emnlp-main.363.

  78. Winata GI, Madotto A, Lin Z, Liu R, Yosinski J, Fung P. Language models are few-shot multilingual learners. In: Proceedings of the 1st Workshop on Multilingual Representation Learning. Punta Cana, Dominican Republic: Association for Computational Linguistics. 2021. p. 1–15. https://aclanthology.org/2021.mrl-1.1.

  79. Vilares D, Garcia M, Gómez-Rodríguez C. Bertinho: Galician BERT representations. Procesamiento del Lenguaje Natural. 2021;66:13–26.

  80. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: a robustly optimized BERT pretraining approach. https://openreview.net/forum?id=SyxS0T4tvS.

  81. Attardi G. WikiExtractor. GitHub. https://github.com/attardi/wikiextractor.

  82. Agerri R, San Vicente I, Campos JA, Barrena A, Saralegi X, Soroa A, et al. Give your text representation models some love: the case for Basque. In: Proceedings of the 12th Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association. 2020. https://aclanthology.org/2020.lrec-1.588.

  83. Naseem U, Razzak I, Khan SK, Prasad M. A comprehensive survey on word representation models: from classical to state-of-the-art word representation language models. ACM Trans Asian Low-Resour Lang Inf Process. 2021;20(5). https://doi.org/10.1145/3434237.

  84. Zhou K, Yang J, Loy CC, Liu Z. Learning to prompt for vision-language models. Int J Comput Vision. 2022;130(9):2337–48. https://doi.org/10.1007/s11263-022-01653-1.

  85. Kuratov Y, Arkhipov M. Adaptation of deep bidirectional multilingual transformers for Russian language. In: Proceedings of the International Conference “Dialogue 2019”. Moscow, Russia: Computational Linguistics and Intellectual Technologies. 2019. p. 333–339. https://www.dialog-21.ru/media/4606/kuratovyplusarkhipovm-025.pdf.

  86. Souza F, Nogueira R, Lotufo R. BERTimbau: pretrained BERT models for Brazilian Portuguese. In: Cerri R, Prati RC, editors. Intelligent Systems. Cham: Springer International Publishing; 2020. p. 403–17.

  87. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics; 2020. p. 38–45. https://www.aclweb.org/anthology/2020.emnlp-demos.6.

  88. Kann K, Cho K, Bowman SR. Towards realistic practices in low-resource natural language processing: the development set. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics. 2019. p. 3342–3349. https://aclanthology.org/D19-1329.

  89. Plaza-Del-Arco FM, Molina-González MD, Ureña-López LA, Martín-Valdivia MT. A multi-task learning approach to hate speech detection leveraging sentiment analysis. IEEE Access. 2021;9:112478–89. https://doi.org/10.1109/ACCESS.2021.3103697.

  90. Schulz C, Eger S, Daxenberger J, Kahse T, Gurevych I. Multi-Task learning for argumentation mining in low-resource settings. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). New Orleans, Louisiana: Association for Computational Linguistics. 2018. p. 35–41. https://aclanthology.org/N18-2006.

  91. Hu Y, Huang H, Lan T, Wei X, Nie Y, Qi J, et al. Multi-task learning for low-resource second language acquisition modeling. In: Wang X, Zhang R, Lee YK, Sun L, Moon YS, editors., et al., Web and Big Data. Cham: Springer International Publishing. 2020. p. 603–11.

  92. Magooda A, Litman D, Elaraby M. Exploring multitask learning for low-resource abstractive summarization. In: Findings of the association for computational linguistics: EMNLP 2021. Punta Cana, Dominican Republic: Association for Computational Linguistics. 2021. p. 1652–1661. https://aclanthology.org/2021.findings-emnlp.142.

  93. Biewald L. Experiment tracking with Weights & Biases. Software available from https://www.wandb.com/.

Acknowledgements

We thank the annotators for their work. We are also grateful to ExplosionAI for giving us access to their Prodigy annotation tool (https://prodi.gy/) under a research license. We also thank (i) the Visual Information Processing Group of the University of Granada (especially Javier Mateos Delgado) and (ii) the Generalitat Valenciana and the University of Alicante, through the DGX computing platform, for giving us access to the GPU hardware needed to train the language models. Finally, we thank Olga Zamaraeva and Ana Vilar for their valuable assistance with the English writing.

Funding

This work is supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA. It has also received funding from grant SCANNER-UDC (PID2020-113230RB-C21), funded by MCIN/AEI/10.13039/501100011033; from the European Research Council (ERC) under the European Union’s Horizon Europe research and innovation programme (SALSA, grant agreement no. 101100615); from Xunta de Galicia (ED431C 2020/11); and from Centro de Investigación de Galicia “CITIC,” funded by Xunta de Galicia and the European Union (ERDF, Galicia 2014–2020 Program) through grant ED431G 2019/01. Additionally, the research leading to these results received funding from the University of Granada, Generalitat Valenciana, and the University of Alicante (IDIFEDER/2020/003).

Author information

Corresponding author

Correspondence to Marvin M. Agüero-Torales.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Disclaimer

FBBVA accepts no responsibility for the opinions, statements, and contents included in the project and/or the results thereof, which are entirely the responsibility of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Hyperparameter Search and Implementation Details

Table 7 shows the hyperparameters used to train all proposed models: (i) NCRF++ [71] was used to train the sequential classifiers without pre-training, and (ii) the Hugging Face Transformers package [87] was used for the sequential classifiers built on pre-trained language models.

Table 7 Hyperparameters for model training (adapted from [17, p. 228, Table C.2])

For the Transformer-based models, hyperparameter selection was performed on the dev set with a Bayesian search using the W&B platform (Note 17) [93]. This method chooses the next hyperparameter configuration to maximize the probability of improvement, based on the observed relationship between the hyperparameters and the model metric (macro-accuracy in our case). For the CNN and biLSTM models, a batch size of 10 was used and hyperparameters were selected by random search; these models were trained for a maximum of 50 epochs. Early stopping (with the patience set to 3) was used when training the Transformer-based models. Finally, an NVIDIA Tesla T4 GPU (16 GB) was used to train all models.
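
For concreteness, the sketch below shows how such a Bayesian sweep can be set up with the W&B Python API; the hyperparameter names, ranges, project name, and run count are hypothetical and do not reproduce the exact search space used here:

    import wandb

    # Bayesian sweep: W&B proposes configurations expected to improve
    # the tracked metric, here macro-accuracy on the dev set.
    sweep_config = {
        "method": "bayes",
        "metric": {"name": "macro_accuracy", "goal": "maximize"},
        "parameters": {
            "learning_rate": {"min": 1e-5, "max": 5e-5},
            "batch_size": {"values": [8, 16, 32]},
            "epochs": {"values": [3, 5, 10]},
        },
    }

    def train():
        wandb.init()
        cfg = wandb.config
        # ... fine-tune a Transformer with cfg.learning_rate, cfg.batch_size,
        # and cfg.epochs, evaluate on the dev set, then report the metric:
        dev_macro_accuracy = 0.0  # placeholder for the real evaluation
        wandb.log({"macro_accuracy": dev_macro_accuracy})

    sweep_id = wandb.sweep(sweep_config, project="jopara-affective-analysis")
    wandb.agent(sweep_id, function=train, count=20)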

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Agüero-Torales, M.M., López-Herrera, A.G. & Vilares, D. Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language. Cogn Comput 15, 1391–1406 (2023). https://doi.org/10.1007/s12559-023-10165-0
