research-article

MANDOLA: A Big-Data Processing and Visualization Platform for Monitoring and Detecting Online Hate Speech

Published: 04 March 2020

Abstract

In recent years, the increasing propagation of hate speech in online social networks and the need for effective counter-measures have drawn significant investment from social network companies and researchers. This has resulted in the development of many web platforms and mobile applications for reporting and monitoring online hate speech incidents. In this article, we present MANDOLA, a big-data processing system that monitors, detects, visualizes, and reports the spread and penetration of online hate-related speech. MANDOLA consists of six individual components that intercommunicate to consume, process, store, and visualize statistical information about the spread of hate speech online. We also present a novel ensemble-based classification algorithm for hate speech detection that can significantly improve MANDOLA’s ability to detect hate speech. To demonstrate the functionality and usability of our system, we describe a use case scenario of real-life event annotation and data correlation. As shown by the performance of the individual modules, as well as the usability and functionality of the whole system, MANDOLA is a powerful system for reporting and monitoring online hate speech.

Published In

ACM Transactions on Internet Technology  Volume 20, Issue 2
Special Section on Emotions in Conflictual Social Interactions and Regular Papers
May 2020
256 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3386441
  • Editor: Ling Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2020
Accepted: 01 November 2019
Revised: 01 September 2019
Received: 01 April 2019
Published in TOIT Volume 20, Issue 2

Author Tags

  1. Hate speech
  2. big-data processing platform
  3. deep learning
  4. online social networks
  5. system approach

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Rights Equality and Citizenship (REC) programme of the European Union
