Arabic Gloss WSD Using BERT
Figure 1. Histogram of benchmark senses per word.
Figure 2. Transformer architecture, multi-head attention layer architecture, and scaled dot-product attention unit.
Figure 3. Bidirectional Encoder Representation from Transformers (BERT) training input and BERT training tasks: the first task predicts masked words, and the second makes the next-sentence prediction.
Figure 4. Transfer learning strategies.
Figure 5. Model I architecture.
Figure 6. Model II architecture.
Abstract
1. Introduction
2. Related Work
3. Benchmark
4. Background
4.1. Transformers
4.2. Bidirectional Encoder Representation from Transformers
4.3. Arabic Pretrained BERT Models
4.4. Transfer Learning with BERT
5. Materials and Methods
5.1. Model I
5.2. Model II
6. Experiments
6.1. Evaluation Dataset
6.2. Evaluation Metrics
6.3. Training Configuration
7. Results
- AraBERTv2 as a pretrained BERT model without tokenizing the input sentences.
- AraBERTv2 as a pretrained BERT model with tokenizing the input sentences.
- ARBERT as a pretrained BERT model without tokenizing the input sentences (a sketch of the three configurations follows this list).
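The "with tokenizing" variants segment the input before it reaches BERT. Below is a minimal sketch of the three configurations, assuming the public HuggingFace checkpoints and the arabert package's Farasa-based preprocessor; both the model IDs and the `ArabertPreprocessor` helper are our assumptions, not identifiers given in the paper.

```python
# Hypothetical sketch of the three evaluated configurations.
from arabert.preprocess import ArabertPreprocessor

CONFIGS = [
    ("aubmindlab/bert-base-arabertv2", False),  # AraBERTv2, raw sentences
    ("aubmindlab/bert-base-arabertv2", True),   # AraBERTv2, segmented input
    ("UBC-NLP/ARBERT", False),                  # ARBERT, raw sentences
]

def maybe_segment(model_id: str, text: str, segment: bool) -> str:
    """Apply Farasa-style segmentation when the configuration asks for it."""
    if not segment:
        return text
    return ArabertPreprocessor(model_name=model_id).preprocess(text)
```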
8. Discussion
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Navigli, R. Word Sense Disambiguation: A Survey. ACM Comput. Surv. 2009, 41, 1–69. [Google Scholar] [CrossRef]
- Ide, N.; Véronis, J. Word sense disambiguation: The state of the art. Comput. Linguist. 1998, 24, 1–40. [Google Scholar]
- Debili, F.; Achour, H.; Souissi, E. La langue arabe et l’ordinateur: De l’étiquetage grammatical à la voyellation automatique. Correspondances 2002, 71, 10–28. [Google Scholar]
- Alqahtani, S.; Aldarmaki, H.; Diab, M. Homograph disambiguation through selective diacritic restoration. arXiv 2019, arXiv:1912.04479. [Google Scholar]
- Britton, B.K. Lexical ambiguity of words used in English text. Behav. Res. Methods Inst. 1978, 10, 1–7. [Google Scholar] [CrossRef]
- Farghaly, A.; Shaalan, K. Arabic natural language processing: Challenges and solutions. ACM Trans. Asian Lang. Inf. Process. (TALIP) 2009, 8, 1–22. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, June 2019; pp. 4171–4186. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008. [Google Scholar]
- Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based model for Arabic language understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
- Abdul-Mageed, M.; Elmadany, A.; Nagoudi, E.M.B. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic. arXiv 2020, arXiv:2101.01785. [Google Scholar]
- Jawahar, G.; Sagot, B.; Seddah, D. What does BERT learn about the structure of language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 28 July–2 August 2019. [Google Scholar]
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. arXiv 2018, arXiv:1802.05365. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training; OpenAI Technical Report, 2018. Available online: https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf (accessed on 10 March 2021). [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Elayeb, B. Arabic word sense disambiguation: A review. Artif. Intell. Rev. 2019, 52, 2475–2532. [Google Scholar] [CrossRef]
- Alkhatlan, A.; Kalita, J.; Alhaddad, A. Word sense disambiguation for Arabic exploiting Arabic WordNet and word embedding. Procedia Comput. Sci. 2018, 142, 50–60. [Google Scholar] [CrossRef]
- Hadni, M.; Ouatik, S.E.A.; Lachkar, A. Word sense disambiguation for Arabic text categorization. Int. Arab J. Inf. Technol. 2016, 13, 215–222. [Google Scholar]
- Alian, M.; Awajan, A.; Al-Kouz, A. Arabic word sense disambiguation using Wikipedia. Int. J. Comput. Inf. Sci. 2016, 12, 61. [Google Scholar] [CrossRef]
- Aizawa, A. An information-theoretic perspective of tf–idf measures. Inf. Process. Manag. 2003, 39, 45–65. [Google Scholar] [CrossRef]
- Zouaghi, A.; Merhbene, L.; Zrigui, M. Word sense disambiguation for Arabic language using the variants of the Lesk algorithm. In Proceedings of the International Conference on Artificial Intelligence (ICAI), the Steering Committee of the World Congress in Computer Science, Las Vegas, NV, USA, July 2011. [Google Scholar]
- Zouaghi, A.; Merhbene, L.; Zrigui, M. A hybrid approach for Arabic word sense disambiguation. Int. J. Comput. Process. Lang. 2012, 24, 133–151. [Google Scholar] [CrossRef]
- Laatar, R.; Aloulou, C.; Belguith, L.H. Word2vec for Arabic word sense disambiguation. In International Conference on Applications of Natural Language to Information Systems; Springer: New York, NY, USA, 2018; pp. 308–311. [Google Scholar]
- Bekkali, M.; Lachkar, A. Context-based Arabic Word Sense Disambiguation using Short Text Similarity Measure. In Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications (SITA), Rabat, Morocco, October 2018; pp. 1–6. Available online: https://dl.acm.org/doi/abs/10.1145/3289402.3289521 (accessed on 10 March 2021). [Google Scholar]
- Laatar, R.; Aloulou, C.; Belguith, L.H. Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP) 2020, 19, 1–16. [Google Scholar] [CrossRef]
- Akbik, A.; Blythe, D.; Vollgraf, R. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–25 August 2018; pp. 1638–1649. [Google Scholar]
- Black, W.; Elkateb, S.; Rodriguez, H.; Alkhalifa, M.; Vossen, P.; Pease, A.; Fellbaum, C. Introducing the Arabic wordnet project. In Proceedings of the Third International WordNet Conference (GWC 2006), Jeju Island, Korea, 22–26 January 2006; pp. 295–300. [Google Scholar]
- Omar, A.M. Modern Standard Arabic Dictionary; Alam Al-Kotob: Cairo, Egypt, 2008. [Google Scholar]
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
- Abdelali, A.; Darwish, K.; Durrani, N.; Mubarak, H. Farasa: A fast and furious segmenter for Arabic. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, CA, USA, June 2016; pp. 11–16. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015. [Google Scholar]
- Lachenbruch, P.A. McNemar test. Wiley StatsRef Stat. Ref. Online 2014, 5. [Google Scholar]
- Menai, M.E.B. Word sense disambiguation using evolutionary algorithms–Application to Arabic language. Comput. Hum. Behav. 2014, 41, 92–103. [Google Scholar] [CrossRef]
- Bouhriz, N.; Benabbou, F.; Lahmar, E.B. Word sense disambiguation approach for Arabic text. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 381–385. [Google Scholar] [CrossRef]
No. Words | No. Senses | Average Senses per Word | Max Senses per Word |
---|---|---|---|
5347 | 15,549 | 3 | 40 |
Word | Gloss | Context-Example |
---|---|---|
عرض (show) | عرض الموضوع له بسطه وطرحه ليطلعه عليه (Show the topic to him, simplify the topic to show it) | عرض خطة بحثه (Show his research plan) |
عرض (honor) | ما يمدح ويذم من الإنسان نفسه وحسبه أو فيمن يلزمه أمره (What a person praises and defames in himself or in whom he is obligated to) | طعن في عرض فلان (Insult someone's honor) |
Model | Training Data: Sentences | Training Data: Size | Training Data: No. of Tokens | Vocab: Tokenization | Vocab: Size | Config: No. of Params |
---|---|---|---|---|---|---|
AraBERTv2 | 200 M | 77 GB | 8.6 B | SentencePiece | 60 K | 135 M |
ARBERT | Several (6 sources) | 61 GB | 6.5 B | WordPiece | 100 K | 163 M |
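Both checkpoints are publicly available. A minimal loading sketch, assuming the usual HuggingFace model IDs (aubmindlab/bert-base-arabertv2 and UBC-NLP/ARBERT are our assumptions; the paper only names the models):

```python
# Minimal sketch: load the two pretrained Arabic BERT models compared above.
from transformers import AutoModel, AutoTokenizer

arabert_tok = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")
arabert = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv2")

arbert_tok = AutoTokenizer.from_pretrained("UBC-NLP/ARBERT")
arbert = AutoModel.from_pretrained("UBC-NLP/ARBERT")
```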
Word | Gloss | Context-Example | Label |
---|---|---|---|
عرض (show) | عرض الموضوع له بسطه وطرحه ليطلعه عليه (Show the topic to him, simplify the topic to show it) | عرض خطة بحثه (Show his research plan) | 1 |
عرض (honor) | عرض الموضوع له بسطه وطرحه ليطلعه عليه (Show the topic to him, simplify the topic to show it) | طعن في عرض فلان (Insult someone's honor) | 0
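A sketch of how such labeled pairs can be generated from the benchmark: each sense contributes a positive pair (its gloss with its own context example) and a negative pair (the same context example with the gloss of a different sense). Function and field names here are illustrative, not taken from the paper.

```python
import random

def make_context_gloss_pairs(senses):
    """senses: list of dicts with 'word', 'gloss', and 'example' keys."""
    pairs = []
    for s in senses:
        # Positive pair: a gloss with its own context example.
        pairs.append((s["gloss"], s["example"], 1))
        # Negative pair: the same context example with a gloss of another sense.
        others = [t for t in senses if t["gloss"] != s["gloss"]]
        if others:
            wrong = random.choice(others)
            pairs.append((wrong["gloss"], s["example"], 0))
    return pairs
```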
Context-Definition Pairs of the Target Word | Label | Word |
---|---|---|
[CLS] ’gloss sentence’ [SEP] ’example sentence’ | 1 | the word |
[CLS] ’gloss sentence’ [SEP] ’wrong example sentence’ | 0 | the word |
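Passing the gloss and the context example to a BERT tokenizer as a sentence pair produces exactly this layout. A small sketch, assuming the AraBERTv2 checkpoint named earlier:

```python
from transformers import AutoTokenizer

# Tokenizer ID is an assumption (see the loading sketch above).
tok = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")

enc = tok(
    "عرض الموضوع له بسطه وطرحه ليطلعه عليه",  # gloss sentence
    "عرض خطة بحثه",                           # context example
    truncation=True,
    return_tensors="pt",
)
# input_ids begin with [CLS]; a [SEP] separates the gloss from the example;
# token_type_ids mark which tokens belong to which segment.
```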
No. Words | No. Senses | No. Positive Examples | No. Negative Examples |
---|---|---|---|
3601 | 6246 | 1800 | 1801 |
Model | Optimizer | Learning Rate | Epochs | Layers to Represent Target Word |
---|---|---|---|---|
Model I | Adam | – | 20 | last 4 |
Model II | Adam | – | 2 | – |
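A hedged fine-tuning sketch for Model II as a sentence-pair binary classifier: the optimizer and epoch count follow the table, while the 2e-5 learning rate, the toy batch, and the model ID are assumptions (the table leaves the learning rate blank).

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "aubmindlab/bert-base-arabertv2"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # lr is an assumption

# Toy stand-in for a DataLoader over encoded context-gloss pairs.
batch = tok(["عرض الموضوع له بسطه وطرحه ليطلعه عليه"], ["عرض خطة بحثه"],
            truncation=True, padding=True, return_tensors="pt")
batch["labels"] = torch.tensor([1])
train_loader = [batch]

model.train()
for epoch in range(2):          # 2 epochs, as in the table
    for b in train_loader:
        optimizer.zero_grad()
        out = model(**b)        # loss computed from the binary labels
        out.loss.backward()
        optimizer.step()
```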
Model | BERT Model | Tokenization | Precision | Recall | F1 |
---|---|---|---|---|---|
Model I | AraBERTv2 | No | 69% | 78% | 74% |
Model I | AraBERTv2 | Yes | 67% | 78% | 72% |
Model I | ARBERT | No | 50% | 90% | 64% |
Model II | AraBERTv2 | No | 92% | 87% | 89% |
Model II | AraBERTv2 | Yes | 96% | 67% | 79% |
Model II | ARBERT | No | 89% | 73% | 80%
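For reference, the F1 column is the harmonic mean of precision and recall; e.g., for the best configuration (Model II, AraBERTv2, untokenized):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.92 \times 0.87}{0.92 + 0.87}
    \approx 0.89
```

The same formula gives approximately 0.80 for the Model II ARBERT row (2 × 0.89 × 0.73 / 1.62), matching the comparison table below.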
Model | Embedding Model | Result Source | Precision | Recall | F1 |
---|---|---|---|---|---|
Model I | AraBERTv2 | Our benchmark | 69% | 78% | 74%
Model I | AraBERTv2 (tokenized) | Our benchmark | 67% | 78% | 72%
Model I | ARBERT | Our benchmark | 50% | 90% | 64%
Model II | AraBERTv2 | Our benchmark | 92% | 87% | 89%
Model II | AraBERTv2 (tokenized) | Our benchmark | 96% | 67% | 79%
Model II | ARBERT | Our benchmark | 89% | 73% | 80%
Laatar [25] | FLAIR | Our benchmark | 63% | 63% | 63%
Laatar [25] | FLAIR | Original paper result (100 words) | 67% | 67% | 67%
Laatar [23] | word2vec | Our benchmark | 45% | 45% | 45%
Laatar [23] | word2vec | Original paper result (100 words) | 78% | - | -
Model | Testset | Precision | Recall | F1 |
---|---|---|---|---|
Model II-AraBERTv2 | 3601 ambiguous words | 92% | 87% | 89% |
Model II-AraBERTv2-tokenized | 3601 ambiguous words | 96% | 67% | 79% |
Model II-ARBERT | 3601 ambiguous words | 89% | 73% | 80%
Hadni [18] | AWN | 73% | - | -
Zouaghi [21] | 50 ambiguous words | 59% | - | -
Menai [33] | - | 79% | 63% | 70%
Bekkali [24] | 1217 texts | - | - | 85%
Bouhriz [34] | Manually collected texts | 74% | - | -
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
El-Razzaz, M.; Fakhr, M.W.; Maghraby, F.A. Arabic Gloss WSD Using BERT. Appl. Sci. 2021, 11, 2567. https://doi.org/10.3390/app11062567