Abstract
Traditional methods treat the search problem as a process of selecting and ranking documents in sequence. These methods have proven effective and are widely used in web search. However, because microblog text is complex and idiosyncratic, the classical methods are rarely applied to microblog searches for specific topics. To address topic-specific search over microblog content, we present a microblog search method for security topics based on deep reinforcement learning, in which the search for a specific topic is modeled as a continuous-state Markov decision process. We also design a novel deep Q network to evaluate the relevance of microblog content to the target topic. Reinforcement learning supplies the search strategy, and deep learning supplies the content relevance evaluation. Experiments conducted on a real-world dataset show that our approach outperforms the selected baseline methods.
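The abstract does not give the state features, action set, reward, or network architecture, so the following is only a minimal sketch of the general idea: a Q-learning update over a continuous state vector. A linear function stands in for the deep Q network, and the two actions (skip or select a post), the feature vector, and the names `q_value`, `td_update`, and `epsilon_greedy` are all assumptions made for illustration, not the authors' design.

```python
import random

# Hypothetical action set: 0 = skip the post, 1 = select it as relevant.
ACTIONS = (0, 1)


def q_value(weights, state, action):
    """Linear stand-in for a deep Q network: Q(s, a) = w_a . s."""
    return sum(w * x for w, x in zip(weights[action], state))


def td_update(weights, state, action, reward, next_state, done,
              alpha=0.1, gamma=0.9):
    """One Q-learning step toward the target r + gamma * max_a' Q(s', a').

    Updates weights[action] in place and returns the temporal-difference error.
    """
    target = reward
    if not done:
        target += gamma * max(q_value(weights, next_state, a) for a in ACTIONS)
    error = target - q_value(weights, state, action)
    # For a linear Q, the gradient with respect to w_action is the state itself.
    weights[action] = [w + alpha * error * x
                       for w, x in zip(weights[action], state)]
    return error


def epsilon_greedy(weights, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(weights, state, a))
```

In use, `state` would be a real-valued feature vector describing a candidate post and the target topic (an assumption here), `weights` a list holding one weight vector per action, and `rng` a `random.Random` instance. A deep Q network replaces the linear `q_value` with a neural network trained toward the same target.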
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61772083, No. 61532006, No. 61877006, and No. 61802028, in part by the Fundamental Research Funds for the Central University (No. 2018RC44), and in part by the Director Foundation of Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia (No. ITSM20180102).
Cite this article
Zhou, N., Du, J., Yao, X. et al. A content search method for security topics in microblog based on deep reinforcement learning. World Wide Web 23, 75–101 (2020). https://doi.org/10.1007/s11280-019-00697-7
DOI: https://doi.org/10.1007/s11280-019-00697-7