Abstract
Collaborative filtering (CF) is a prevailing technique for recommendation systems and has been extensively studied as a way to tackle information overload, particularly in the Big Data context. Traditional CF algorithms perform adequately under many circumstances; nevertheless, they suffer from shortcomings such as cold start and data sparsity. Moreover, a potential breakthrough lies in taking full advantage of the valuable semantic information contained in items. To alleviate these shortcomings, in this paper we propose a two-stage collaborative filtering approach driven by Simhash-based semantic feature analysis: the first stage is Simhash-based semantic feature extraction for items and categories, and the second stage is reinforced CF rating prediction driven by highly compressed category features. In the first stage, Simhash is employed to rapidly extract and compress the rich semantic features of a vast number of items and their categories, and these features are then used to enhance the traditional collaborative filtering process. In addition, to address the problems that arise in the Big Data context, we design a parallel algorithm on Spark to accelerate the time-consuming process of semantic feature extraction for vast numbers of items. Finally, we conduct comprehensive experiments on practical datasets to validate the reinforced CF approach, and the results show that it achieves promising performance compared with traditional CF algorithms.
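To make the fingerprinting idea behind the first stage more concrete, the following is a minimal sketch of a generic Simhash computation over a bag of tokens, together with the Hamming distance commonly used to compare two fingerprints. It is an illustration under stated assumptions and not the implementation used in this paper: the 64-bit fingerprint length, the use of MD5 as the per-token hash, raw term frequency as the token weight, and the function names `token_hash`, `simhash`, and `hamming_distance` are all choices made here for the example.

```python
import hashlib
from collections import Counter


def token_hash(token: str, bits: int = 64) -> int:
    """Map a token to a `bits`-wide integer (assumption: truncated MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(tokens, bits: int = 64) -> int:
    """Compress a bag of tokens into one `bits`-wide fingerprint.

    Each token votes on every bit position with its weight (assumption:
    term frequency): +weight if the token's hash has a 1 at that position,
    -weight otherwise. A fingerprint bit is 1 when its vote sum is positive.
    """
    weights = Counter(tokens)
    votes = [0] * bits
    for token, weight in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical item descriptions, used only to exercise the functions.
    item_a = "wireless noise cancelling headphones with long battery life".split()
    item_b = "wireless noise cancelling earbuds with long battery life".split()
    print(hamming_distance(simhash(item_a), simhash(item_b)))
```

A smaller Hamming distance between two fingerprints is usually read as higher similarity between the underlying texts, which is what allows long item descriptions to be compared through short fixed-length codes.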
Acknowledgements
We would like to thank Professor Youping Li, the director of the Future Network Research Center of SEU, for his enlightening suggestions on enhancing traditional collaborative filtering via Simhash-based category features. This work is supported by the National Natural Science Foundation of China under Grants No. 61472080 and No. 61672155, the Academician Consulting Project of the Chinese Academy of Engineering under Grant No. 2018-XY-07, and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Cite this article
Yang, P., Gu, L. & Liu, X. Collaborative filtering driven by fast semantic feature analysis on Spark. Wireless Netw 28, 1321–1334 (2022). https://doi.org/10.1007/s11276-018-01901-8
DOI: https://doi.org/10.1007/s11276-018-01901-8