2024
Picking Up Where the Linguist Left Off: Mapping Morphology to Phonology through Learning the Residuals
Salam Khalifa | Abdelrahim Qaddoumi | Ellen Broselow | Owen Rambow
Proceedings of The Second Arabic Natural Language Processing Conference
Learning morphophonological mappings between the spoken form of a language and its underlying morphological structures is crucial for enriching resources for morphologically rich languages like Arabic. In this work, we focus on Egyptian Arabic as our case study and explore the integration of linguistic knowledge with a neural transformer model. Our approach involves learning to correct the residual errors of hand-crafted rules when predicting the spoken form from a given underlying morphological representation. We demonstrate that, using a minimal set of rules, we can effectively correct these errors even in very low-resource settings.
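The rules-plus-residuals pipeline the abstract describes can be sketched as follows. Everything here is hypothetical: the toy rule, the example morphological forms, and the correction table are invented for illustration, and the paper's actual second stage is a neural transformer rather than a lookup.

```python
# Hypothetical sketch of a rules-plus-residual-correction pipeline:
# hand-crafted rules produce a first-pass spoken form, and a learned
# component only corrects the residual errors of the rules.

def apply_rules(underlying):
    """Toy hand-crafted rule: join morphemes by dropping '+' boundaries."""
    return underlying.replace("+", "")

# Stand-in for the learned model: corrections it has learned to apply to
# rule outputs (the invented form below is for illustration only; a real
# system predicts character-level edits with a transformer).
LEARNED_RESIDUALS = {"katabtili": "katabtíli"}

def predict_spoken(underlying):
    first_pass = apply_rules(underlying)
    # Forms the rules already get right pass through unchanged.
    return LEARNED_RESIDUALS.get(first_pass, first_pass)

print(predict_spoken("katab+t+il+i"))
```

The design point is that the learned component never sees the full mapping problem, only the (hopefully small) gap between rule output and the attested spoken form, which is what makes very low-resource training feasible.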
2023
Reconstruction Probing
Najoung Kim | Jatin Khilnani | Alex Warstadt | Abdelrahim Qaddoumi
Findings of the Association for Computational Linguistics: ACL 2023
We propose reconstruction probing, a new analysis method for contextualized representations based on reconstruction probabilities in masked language models (MLMs). This method relies on comparing the reconstruction probabilities of tokens in a given sequence when conditioned on the representation of a single token that has been fully contextualized and when conditioned on only the decontextualized lexical prior of the model. This comparison can be understood as quantifying the contribution of contextualization towards reconstruction—the difference in the reconstruction probabilities can only be attributed to the representational change of the single token induced by contextualization. We apply this analysis to three MLMs and find that contextualization boosts reconstructability of tokens that are close to the token being reconstructed in terms of linear and syntactic distance. Furthermore, we extend our analysis to finer-grained decomposition of contextualized representations, and we find that these boosts are largely attributable to static and positional embeddings at the input layer.
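The core comparison in reconstruction probing reduces to a difference of log-probabilities. The sketch below uses made-up probability values purely for illustration; a real analysis would obtain both quantities from an MLM's output distribution over the vocabulary.

```python
import math

# Toy sketch of the reconstruction-probing comparison: the boost for a
# target token is its reconstruction log-probability conditioned on a
# single fully contextualized token representation, minus its
# log-probability under the model's decontextualized lexical prior.

def contextualization_boost(p_contextualized, p_prior):
    """Log-probability gain attributable to contextualization (in nats)."""
    return math.log(p_contextualized) - math.log(p_prior)

# Hypothetical reconstruction probabilities for one target token:
p_ctx = 0.30   # conditioned on one contextualized neighbor token
p_lex = 0.05   # conditioned only on the lexical prior
boost = contextualization_boost(p_ctx, p_lex)
print(f"boost = {boost:.3f} nats")  # positive => contextualization helps
```

Since both probabilities condition on the same single token slot, any nonzero difference can only come from the representational change that contextualization induced, which is exactly the quantity the method isolates.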
Abed at KSAA-RD Shared Task: Enhancing Arabic Word Embedding with Modified BERT Multilingual
Abdelrahim Qaddoumi
Proceedings of ArabicNLP 2023
This paper presents a novel approach to the Arabic Reverse Dictionary Shared Task at WANLP 2023 by leveraging the multilingual BERT model, introducing augmentation modifications, and using a multi-head attention mechanism. The proposed method aims to enhance the model's ability to understand and generate word embeddings for Arabic definitions, in both monolingual and cross-lingual contexts. It achieved good results compared to the benchmark and other models in Subtasks 1 and 2 of the shared task.
2022
The Shared Task on Gender Rewriting
Bashar Alhafni | Nizar Habash | Houda Bouamor | Ossama Obeid | Sultan Alrowili | Daliyah AlZeer | Kawla Mohmad Shnqiti | Ahmed Elbakry | Muhammad ElNokrashy | Mohamed Gabr | Abderrahmane Issam | Abdelrahim Qaddoumi | Vijay Shanker | Mahmoud Zyate
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop. The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., a female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users. In this task, we focus on Arabic, a gender-marking morphologically rich language. A total of five teams from four countries participated in the shared task.
Arabic Sentiment Analysis by Pretrained Ensemble
Abdelrahim Qaddoumi
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
This paper presents team 259's BERT ensemble designed for NADI 2022 Subtask 2 (sentiment analysis) (Abdul-Mageed et al., 2022). Twitter sentiment analysis is a natural language processing (NLP) task that provides a way to understand public perception and emotion around specific topics. The most common research approach obtains a tweet's sentiment by analyzing its lexical and syntactic features. We used multiple pretrained Arabic BERT models with simple average ensembling, chose the best-performing ensemble on the training dataset, and ran it on the development dataset. This system ranked 3rd in Subtask 2 with a Macro-PN-F1-score of 72.49%.
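The simple average ensembling described above can be sketched as follows. The label set and the per-model probability values are hypothetical, for illustration only; in the actual system each probability vector would come from a fine-tuned Arabic BERT model.

```python
# Minimal sketch of simple average ensembling over per-model class
# probabilities, followed by an argmax decision.

LABELS = ["negative", "neutral", "positive"]  # hypothetical label set

def average_ensemble(prob_lists):
    """Average class-probability vectors from several models elementwise."""
    n = len(prob_lists)
    return [sum(p[i] for p in prob_lists) / n
            for i in range(len(prob_lists[0]))]

def predict(prob_lists):
    avg = average_ensemble(prob_lists)
    return LABELS[avg.index(max(avg))]

# Three hypothetical fine-tuned models scoring one tweet:
model_probs = [
    [0.2, 0.1, 0.7],
    [0.1, 0.3, 0.6],
    [0.4, 0.2, 0.4],
]
print(predict(model_probs))  # the averaged probabilities pick "positive"
```

Averaging probabilities ("soft voting") smooths over individual models' miscalibrations, which is why a simple mean often beats any single ensemble member on held-out data.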