research-article

Empowering Digital Civility with an NLP Approach for Detecting 𝕏 (Formerly Known as Twitter) Cyberbullying through Boosted Ensembles

Published: 23 November 2024

Abstract

As the number of social networking sites grows, so do cyber dangers. Cyberbullying is harmful behavior that uses technology to intimidate, harass, or harm someone, often on social media platforms like 𝕏 (formerly known as Twitter). Machine learning is well suited to cyberbullying detection on 𝕏 because it can process large amounts of data, identify patterns of offensive behavior, and automate detection over a corpus of tweets. To identify cyber threats using a trained model, the boosted ensemble (BE) technique is assessed alongside various machine learning algorithms: the convolutional neural network (CNN), long short-term memory (LSTM), naive Bayes (NB), decision tree (DT), support vector machine (SVM), bidirectional LSTM (BILSTM), recurrent neural network LSTM (RNN-LSTM), multi-modal cyberbullying detection (MMCD), and random forest (RF). These classifiers are trained on vectorized data to classify tweets and identify cyberbullying threats. The proposed framework can detect cyberbullying cases in tweets precisely. The significance of the work lies in detecting and mitigating cyber threats in real time; it enhances the safety and well-being of social media users by reducing instances of cyberbullying and other cyber threats. The comparative analysis uses accuracy, precision, recall, and F1-score, and the results show that the BE technique outperforms the other algorithms in overall performance. The accuracy rates of CNN, LSTM, NB, DT, SVM, RF, BILSTM, and BE are, respectively, 92.5%, 93.5%, 84.6%, 88%, 89.3%, 92%, 93.75%, and 96%; the precision rates of CNN, LSTM, NB, DT, SVM, RF, RNN-LSTM, and BE are 90.2%, 91.3%, 88%, 85%, 86%, 91.6%, 92.1%, and 94%; the recall rates of CNN, LSTM, NB, DT, SVM, RF, BILSTM, and BE are 89.8%, 90.7%, 90%, 82%, 88.67%, 89%, 91.04%, and 93.7%; and the F1-scores of CNN, LSTM, NB, DT, SVM, RF, MMCD, and BE are 90.6%, 91.8%, 85%, 84.56%, 87.2%, 90%, 84.6%, and 94.89%.
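The four metrics named in the abstract can be illustrated with a minimal sketch for the binary case, where the positive class (label 1) stands for a tweet flagged as cyberbullying. The label vectors and the helper names `confusion_counts` and `classification_metrics` below are hypothetical, chosen only for illustration; they are not taken from the article and do not reproduce its experiments or averaging scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; positive (1) means 'cyberbullying'."""
    tp: int  # bullying tweets correctly flagged
    fp: int  # benign tweets wrongly flagged
    fn: int  # bullying tweets missed
    tn: int  # benign tweets correctly passed


def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells from 0/1 label sequences."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def classification_metrics(c):
    """Return accuracy, precision, recall, and F1; empty denominators yield 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for ten tweets (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(confusion_counts(y_true, y_pred))
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting precision and recall next to accuracy matters when bullying tweets are a minority of the corpus, because a classifier that rarely flags anything can still score high accuracy.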



    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 12
    December 2024
    237 pages
    EISSN:2375-4702
    DOI:10.1145/3613720

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 November 2024
    Online AM: 07 October 2024
    Accepted: 02 September 2024
    Revised: 16 August 2024
    Received: 23 January 2024
    Published in TALLIP Volume 23, Issue 12


    Author Tags

    1. Boosted ensembles
    2. machine learning
    3. tweet attack
    4. count vectorization

    Qualifiers

    • Research-article
