Empowering Digital Civility with an NLP Approach for Detecting 𝕏 (Formerly Known as Twitter) Cyberbullying through Boosted Ensembles

Authors:

Senthil Prabakaran,

Navaneetha Krishnan Muthunambu,

Nagarajan Jeyaraman

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 12

Article No.: 168, Pages 1 - 31

https://doi.org/10.1145/3695251

Published: 23 November 2024

Abstract

As the number of social networking sites grows, so do cyber dangers. Cyberbullying is harmful behavior that uses technology to intimidate, harass, or harm someone, often on social media platforms such as 𝕏 (formerly known as Twitter). Machine learning is well suited to cyberbullying detection on 𝕏 because it can process large amounts of data, identify patterns of offensive behavior, and automate detection across a corpus of tweets. To identify cyber threats using a trained model, the boosted ensemble (BE) technique is assessed alongside various machine learning algorithms: the convolutional neural network (CNN), long short-term memory (LSTM), naive Bayes (NB), decision tree (DT), support vector machine (SVM), bidirectional LSTM (BILSTM), recurrent neural network LSTM (RNN-LSTM), multi-modal cyberbullying detection (MMCD), and random forest (RF). These classifiers are trained on vectorized data to classify tweets and identify cyberbullying threats. The proposed framework can detect cyberbullying in tweets precisely. The significance of the work lies in detecting and mitigating cyber threats in real time, which enhances the safety and well-being of social media users by reducing instances of cyberbullying and other cyber threats. The comparative analysis uses accuracy, precision, recall, and F1-score, and the results show that the BE technique outperforms the other algorithms in overall performance. The accuracy rates of CNN, LSTM, NB, DT, SVM, RF, BILSTM, and BE are 92.5%, 93.5%, 84.6%, 88%, 89.3%, 92%, 93.75%, and 96%, respectively; the precision rates of CNN, LSTM, NB, DT, SVM, RF, RNN-LSTM, and BE are 90.2%, 91.3%, 88%, 85%, 86%, 91.6%, 92.1%, and 94%; the recall rates of CNN, LSTM, NB, DT, SVM, RF, BILSTM, and BE are 89.8%, 90.7%, 90%, 82%, 88.67%, 89%, 91.04%, and 93.7%; and the F1-scores of CNN, LSTM, NB, DT, SVM, RF, MMCD, and BE are 90.6%, 91.8%, 85%, 84.56%, 87.2%, 90%, 84.6%, and 94.89%.
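The pipeline the abstract describes (vectorize tweets, train a boosted ensemble, score it with accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch. The abstract does not say which boosting algorithm or vectorizer the authors use, so this sketch assumes AdaBoost over one-word decision stumps on a binary bag-of-words. The tweets and labels are invented toy data, and every function name (`build_vocab`, `vectorize`, `fit_adaboost`, `predict`, `evaluate`) is hypothetical. It is not the authors' implementation and does not reproduce their results.

```python
import math

# Invented toy tweets (1 = bullying, -1 = not bullying); NOT data from the paper.
TWEETS = [
    "you are a loser",
    "nobody likes you loser",
    "you are so stupid",
    "have a great day",
    "you are a great friend",
    "thanks for the help",
]
LABELS = [1, 1, 1, -1, -1, -1]

def build_vocab(texts):
    """Sorted list of every distinct lowercase word in the texts."""
    return sorted({w for t in texts for w in t.lower().split()})

def vectorize(texts, vocab):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the text."""
    rows = []
    for t in texts:
        words = set(t.lower().split())
        rows.append([1 if w in words else 0 for w in vocab])
    return rows

def fit_adaboost(X, y, rounds=3):
    """AdaBoost with stumps of the form 'word present -> polarity'."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                preds = [polarity if row[j] else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, polarity))
        # Raise the weight of misclassified tweets, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * t * p)
                   for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    out = []
    for row in X:
        score = sum(a * (p if row[j] else -p) for a, j, p in ensemble)
        out.append(1 if score >= 0 else -1)
    return out

def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

vocab = build_vocab(TWEETS)
X = vectorize(TWEETS, vocab)
model = fit_adaboost(X, LABELS, rounds=3)
print(evaluate(LABELS, predict(model, X)))
```

Scoring on the training tweets, as done here, only checks that the pieces fit together. A meaningful comparison like the one reported in the abstract requires a held-out test set and a far larger labeled corpus.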
