
Toxic Comment Classification Service in Social Network

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Abstract

This paper describes the development of an online tool for moderating content in social network groups. The core of the system is a machine learning classifier. Each message is represented by a feature set that combines content features extracted from the text with word embedding vectors. The authors ran a series of experiments to find the best combination of vector representation, content features, and classification method. Tests on a dataset of 11 thousand Russian-language messages yielded 87% accuracy. The paper also proposes an architecture for a group moderator's web application that can automatically apply classification results to manage users and the display of posts.
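
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the content features chosen here (exclamation count, uppercase ratio, word count), the toy embedding table, and all function names are assumptions made for illustration only. The sketch shows how hand-crafted content features and an averaged word-embedding vector might be concatenated into a single feature vector for a classifier.

```python
from typing import Dict, List

# Hypothetical toy embeddings; a real system would load pretrained word vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "you": [0.1, 0.0],
    "idiot": [0.9, 0.8],
    "thanks": [-0.7, 0.2],
}
EMBEDDING_DIM = 2


def content_features(text: str) -> List[float]:
    """Surface features of a message (illustrative choices, not from the paper)."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [float(text.count("!")), upper_ratio, float(len(text.split()))]


def mean_embedding(text: str) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_feature_vector(text: str) -> List[float]:
    """Concatenate content features with the averaged embedding."""
    return content_features(text) + mean_embedding(text)


if __name__ == "__main__":
    print(build_feature_vector("You IDIOT!"))
```

The resulting vector could be passed to any standard classifier; which representation and classifier perform best is the question the paper's experiments address.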

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dolgushin, M., Ismakova, D., Bidulya, Y., Krupkin, I., Barskaya, G., Lesiv, A. (2021). Toxic Comment Classification Service in Social Network. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science (LNAI), vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_15

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science, Computer Science (R0)
