Abstract
Citation recommendation is the task of suggesting a list of references for a given manuscript. It matters for academic research because it gives authors an efficient way to find relevant literature. In this paper, we propose a novel probabilistic topic model that automatically recommends citations to researchers. The model considers not only the text content similarity between papers but also the community relevance among authors. To fully utilize the content and the diverse link information in a bibliographic network, we extend LDA with matrix factorization, so that semantic topic learning and community detection reinforce each other during parameter estimation. We also develop a flexible way to generate a family of citation link probability functions, which can substantially increase the model capacity. Experimental results on the ANN and DBLP datasets show that our model outperforms baseline algorithms for citation recommendation and can generate high-quality author communities and topics.
Acknowledgements
The work described in this paper was partially supported by the National Natural Science Foundation of China (Project No. 61373046) and the Natural Science Basic Research Plan in Shaanxi Province of China (Project No. S2015YFJM2129).
Cite this article
Dai, T., Zhu, L., Cai, X. et al. Explore semantic topics and author communities for citation recommendation in bipartite bibliographic network. J Ambient Intell Human Comput 9, 957–975 (2018). https://doi.org/10.1007/s12652-017-0497-1
DOI: https://doi.org/10.1007/s12652-017-0497-1