
Scholar2vec: Vector Representation of Scholars for Lifetime Collaborator Prediction

Published: 21 April 2021

Abstract

While scientific collaboration is critical for a scholar, some collaborators are more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators have a greater influence on a scholar’s academic performance. However, little research has investigated how to predict such special relationships in academic networks. To this end, we propose Scholar2vec, a novel neural network embedding for representing scholar profiles. First, our approach creates each scholar’s research interest vector from textual information, such as demographics, research, and influence. After bridging research interests with a collaboration network, vector representations of scholars are obtained through graph learning. Meanwhile, since scholars are characterized by various attributes, we propose to incorporate four types of scholar attributes for learning scholar vectors. Finally, the early-stage similarity sequence based on Scholar2vec is used to predict lifetime collaborators with machine learning methods. Extensive experiments on two real-world datasets show that Scholar2vec outperforms state-of-the-art methods in lifetime collaborator prediction. Our work presents a new way to measure the similarity between two scholars by vector representation, which connects network embedding with academic relationship mining.
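
The idea of an early-stage similarity sequence can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, as are the per-year scholar vectors, the function names `cosine_similarity` and `similarity_sequence`, and the made-up numbers. This is not the authors' implementation; it only shows how a sequence of pairwise similarities over the first years of a collaboration could be assembled as features.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_sequence(
    yearly_a: Dict[int, Vector],
    yearly_b: Dict[int, Vector],
    early_years: Sequence[int],
) -> List[float]:
    """Return one similarity value per early-stage year, in the order given."""
    return [cosine_similarity(yearly_a[y], yearly_b[y]) for y in early_years]


# Hypothetical per-year embeddings for two scholars (illustrative values only).
scholar_a = {2001: [1.0, 0.0], 2002: [1.0, 1.0], 2003: [0.0, 2.0]}
scholar_b = {2001: [0.0, 1.0], 2002: [2.0, 2.0], 2003: [0.0, 1.0]}

features = similarity_sequence(scholar_a, scholar_b, [2001, 2002, 2003])
print([round(x, 3) for x in features])
```

A sequence like `features` could then serve as the input to any standard classifier that labels a pair of scholars as lifetime collaborators or not; the choice of classifier is left open in the abstract.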


Cited By

  • (2024) RefCit2vec: embedding models considering references and citations for measuring document similarity. Scientometrics 129:8 (4669-4693). DOI: 10.1007/s11192-024-05067-3. Online publication date: 1-Aug-2024
  • (2023) Link prediction in research collaboration: a multi-network representation learning framework with joint training. Multimedia Tools and Applications 82:30 (47215-47233). DOI: 10.1007/s11042-023-15720-3. Online publication date: 1-Dec-2023
  • (2021) Heterogeneous Graph Learning for Explainable Recommendation over Academic Networks. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (29-36). DOI: 10.1145/3498851.3498926. Online publication date: 14-Dec-2021

      Published In

      ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3
      June 2021
      533 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3454120

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 21 April 2021
      Accepted: 01 December 2020
      Revised: 01 December 2020
      Received: 01 September 2019
      Published in TKDD Volume 15, Issue 3


      Author Tags

      1. Network embedding
      2. academic information retrieval
      3. scientific collaboration
      4. graph learning

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China

      Article Metrics

      • Downloads (last 12 months): 56
      • Downloads (last 6 weeks): 4
      Reflects downloads up to 12 Dec 2024
