Abstract
We explore text classification for online cooking recipes in a multitask, multilingual setting. The main objective is to design a solution that achieves high accuracy on the prediction tasks for six European languages, without being constrained to them, while also accounting for cross-lingual transferability. The challenges fall along two main dimensions: (1) data-driven, such as imbalance and noise in the training data, and (2) solution-driven, such as multilingualism and the need to extend the model easily to new languages. We propose a solution built on the XLM-R architecture, fine-tuned jointly on all tasks. We apply self-supervised domain adaptation via additional pre-training and analyze the resulting gains through a zero-shot evaluation on underrepresented languages. Compared to basic language modeling solutions, we obtain increases of 1.32% and 2.42%, respectively, on the two most difficult classification tasks. In the zero-shot setting, the absolute improvements on underrepresented languages are 16.71% and 7.83%, respectively.
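As an illustration of what joint fine-tuning on all tasks can mean in practice, the following minimal sketch sums per-task cross-entropy losses computed from task-specific classification heads that share one encoder. It is not the authors' implementation: the task names, the optional per-task weights, and the plain unweighted sum are assumptions made for illustration, and the logits stand in for the outputs of heads placed on top of a shared XLM-R encoder.

```python
import math
from typing import Dict, List, Optional


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy for one example, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(
    task_logits: Dict[str, List[float]],
    task_targets: Dict[str, int],
    task_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task losses for one example.

    task_logits maps a (hypothetical) task name to the logits produced by
    that task's head; every head is assumed to read the same shared encoding.
    Tasks missing from task_weights get weight 1.0 (an assumption).
    """
    weights = task_weights or {}
    total = 0.0
    for task, logits in task_logits.items():
        total += weights.get(task, 1.0) * softmax_cross_entropy(
            logits, task_targets[task]
        )
    return total


# Hypothetical tasks: a 2-class head and a 4-class head on one recipe.
example_logits = {"task_a": [0.2, 1.5], "task_b": [0.1, 0.3, 2.0, -1.0]}
example_targets = {"task_a": 1, "task_b": 2}
print(joint_loss(example_logits, example_targets))
```

Because a single scalar is back-propagated, gradients from every task update the shared encoder together, which is the property that distinguishes joint fine-tuning from training one model per task.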
Notes
1. Available at: https://github.com/hhursev/recipe-scrapers.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Negru, VA., Lemnaru, C., Potolea, R. (2023). Multitask, Cross-Lingual Recipe Classification Using Joint Fine-Tuning Mechanisms. In: Delir Haghighi, P., et al. Information Integration and Web Intelligence. iiWAS 2023. Lecture Notes in Computer Science, vol 14416. Springer, Cham. https://doi.org/10.1007/978-3-031-48316-5_48
DOI: https://doi.org/10.1007/978-3-031-48316-5_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48315-8
Online ISBN: 978-3-031-48316-5
eBook Packages: Computer Science, Computer Science (R0)