Authors:
Salma Abid Azzi
and
Chiraz Ben Othmane Zribi
Affiliation:
National School of Computer Science, Manouba University, Tunisia
Keyword(s):
Natural Language Processing, Multi-label Classification, Deep Learning, Arabic Language, Abusive Texts, Social Media, Convolutional Neural Networks, Recurrent Neural Networks.
Abstract:
Combating abusive text in social networks is gradually becoming a mainstream NLP research topic. However, detection of its specific related forms remains scarce: most automatic solutions cast the problem as a two-class or three-class classification task, overlooking its variety of aspects. In Arabic in particular, one of the most widely spoken languages, social media abusive texts are written in a mix of dialects, which further complicates detection. The goal of this research is to detect eight specific subtasks of abusive language on Arabic social platforms, namely Racism, Sexism, Xenophobia, Violence, Hate, Pornography, Religious Hatred, and LGBTQ Hate. To conduct our experiments, we evaluated the performance of CNN, BiLSTM, and BiGRU deep neural networks with pre-trained Arabic word embeddings (AraVec). We also investigated the recent Bidirectional Encoder Representations from Transformers (BERT) model with its special tokenizer. Results show that the DNN classifiers achieved nearly the same performance, with an overall average precision of 85%. Moreover, although all the deep learning models obtained very close results, BERT slightly outperformed the others with a precision of 90% and a micro-averaged F1 score of 79%.
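To illustrate the multi-label setting the abstract describes (not the authors' actual implementation), the following minimal sketch shows how a classifier's final layer can assign several of the eight abuse categories at once: each category gets its own independent sigmoid score, unlike softmax, which forces a single label. The logit values below are hypothetical.

```python
import math

# The eight abusive-language categories targeted in the paper.
LABELS = ["Racism", "Sexism", "Xenophobia", "Violence",
          "Hate", "Pornography", "Religious Hatred", "LGBTQ Hate"]

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits, threshold=0.5):
    """Return every category whose sigmoid score meets the threshold.

    `logits` holds one raw score per category, as would come from the
    final dense layer of a CNN/BiLSTM/BiGRU or BERT encoder.
    """
    return [lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Hypothetical logits for one post: high only on Racism and Hate.
logits = [2.1, -1.3, -0.4, -2.0, 1.7, -3.2, -0.9, -1.5]
print(predict_labels(logits))  # → ['Racism', 'Hate']
```

Because each output is thresholded independently, a single text can legitimately carry zero, one, or several labels, which is what distinguishes this setup from the two- or three-class formulations criticized in the abstract.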