Abstract
As Internet use grows, people increasingly connect through text messages and social media platforms such as Facebook and Twitter. This has increased the spread of unsolicited messages, known as spam, which are used for marketing, for collecting personal information, or simply to offend people. A robust spam detection architecture that can block such messages is therefore crucial. Spam detection on a noisy platform such as Twitter remains an open problem because the texts are short and the language used on social media is highly variable. In this paper, we propose a novel deep learning architecture based on a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) neural network. The model is supported by semantic information introduced into the word representations with the help of knowledge bases such as WordNet and ConceptNet. These knowledge bases improve performance by providing better semantic vector representations for test words that previously received random values because they were not seen during training. Experimental results on two benchmark datasets show the effectiveness of the proposed approach in terms of accuracy and F1-score.
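The idea of replacing random vectors for unseen test words with knowledge-base-informed ones can be illustrated with a minimal sketch. It is not the authors' implementation: the function `build_embedding`, the toy `trained` vectors, the `synonyms` dictionary standing in for WordNet or ConceptNet lookups, and the choice to average neighbour vectors are all assumptions made for illustration.

```python
import random


def build_embedding(word, trained, synonyms, dim, rng):
    """Return a vector for `word` (hypothetical helper, not from the paper).

    Order of preference:
      1. the vector learned during training;
      2. the mean of the vectors of knowledge-base neighbours seen in training
         (averaging is an assumption of this sketch);
      3. a random vector, the fallback used when no semantic information exists.
    """
    if word in trained:
        return trained[word]
    # Keep only the neighbours that actually have a trained vector.
    known = [trained[s] for s in synonyms.get(word, []) if s in trained]
    if known:
        # Component-wise mean over the neighbour vectors.
        return [sum(vals) / len(known) for vals in zip(*known)]
    return [rng.uniform(-0.25, 0.25) for _ in range(dim)]


# Toy data, invented for illustration only.
trained = {"free": [1.0, 0.0], "prize": [0.0, 1.0], "gift": [0.5, 0.5]}
# Stand-in for a WordNet/ConceptNet lookup of related words.
synonyms = {"award": ["prize", "gift"], "complimentary": ["free"]}
rng = random.Random(0)

award_vec = build_embedding("award", trained, synonyms, 2, rng)
unknown_vec = build_embedding("zzzz", trained, synonyms, 2, rng)
print(award_vec)
```

Here "award" never appears in the toy training vocabulary, yet it receives a vector derived from "prize" and "gift" rather than a random one, while "zzzz" has no related words and still falls back to random initialization.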
Cite this article
Jain, G., Sharma, M. & Agarwal, B. Spam detection in social media using convolutional and long short term memory neural network. Ann Math Artif Intell 85, 21–44 (2019). https://doi.org/10.1007/s10472-018-9612-z
DOI: https://doi.org/10.1007/s10472-018-9612-z