research-article

Scholar2vec: Vector Representation of Scholars for Lifetime Collaborator Prediction

Authors:

Brian D. Davison

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 15, Issue 3

Article No.: 40, Pages 1 - 19

https://doi.org/10.1145/3442199

Published: 21 April 2021

Abstract

While scientific collaboration is critical for a scholar, some collaborators are more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators have a greater influence on a scholar's academic performance. However, little research has investigated how to predict such special relationships in academic networks. To this end, we propose Scholar2vec, a novel neural network embedding for representing scholar profiles. First, our approach builds each scholar's research interest vector from textual information, such as demographics, research, and influence. After bridging research interests with a collaboration network, vector representations of scholars are learned with graph learning. Meanwhile, since scholars are characterized by various attributes, we propose to incorporate four types of scholar attributes when learning scholar vectors. Finally, the early-stage similarity sequence based on Scholar2vec is used to predict lifetime collaborators with machine learning methods. Extensive experiments on two real-world datasets show that Scholar2vec outperforms state-of-the-art methods in lifetime collaborator prediction. Our work presents a new way to measure the similarity between two scholars through vector representations, bridging network embedding and academic relationship mining.
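The final step described in the abstract, turning scholar vectors into an early-stage similarity sequence that a classifier can consume, might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each scholar has one embedding vector per year, uses cosine similarity as the similarity measure, counts a year with a missing vector as 0.0, and the function names (`cosine_similarity`, `early_stage_similarity_sequence`) are hypothetical. The abstract does not state the exact window length, similarity measure, or classifier.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def early_stage_similarity_sequence(
    vectors_a: Dict[int, Vector],
    vectors_b: Dict[int, Vector],
    first_year: int,
    window: int,
) -> List[float]:
    """Year-by-year similarity of two scholars over a fixed early-stage window.

    vectors_a / vectors_b map a year to that scholar's (hypothetical) per-year
    embedding. first_year is assumed to be the pair's first co-authored year.
    """
    sequence: List[float] = []
    for year in range(first_year, first_year + window):
        if year in vectors_a and year in vectors_b:
            sequence.append(cosine_similarity(vectors_a[year], vectors_b[year]))
        else:
            # Assumption: a year with no vector for either scholar counts as 0.0.
            sequence.append(0.0)
    return sequence


if __name__ == "__main__":
    a = {2010: [1.0, 0.0], 2011: [1.0, 1.0], 2012: [0.0, 1.0]}
    b = {2010: [0.0, 1.0], 2011: [1.0, 1.0], 2012: [0.0, 1.0]}
    # Each pair yields a fixed-length feature vector (here, length 3).
    print(early_stage_similarity_sequence(a, b, first_year=2010, window=3))
```

Because the window is fixed, every scholar pair produces a sequence of the same length, so the sequences can be stacked into a feature matrix and passed, with lifetime/non-lifetime labels, to an off-the-shelf classifier such as XGBoost or a scikit-learn model.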

Cited By

Huang C, Chen K. 2024. RefCit2vec: embedding models considering references and citations for measuring document similarity. Scientometrics 129, 8 (2024), 4669–4693. DOI: 10.1007/s11192-024-05067-3. Online publication date: 1 Aug 2024.
https://dl.acm.org/doi/10.1007/s11192-024-05067-3

Yang C, Wang C, Zheng R, Geng S. 2023. Link prediction in research collaboration: a multi-network representation learning framework with joint training. Multimedia Tools and Applications 82, 30 (2023), 47215–47233. DOI: 10.1007/s11042-023-15720-3. Online publication date: 1 Dec 2023.
https://dl.acm.org/doi/10.1007/s11042-023-15720-3

Chen X, Tang T, Ren J, Lee I, Chen H, Xia F. 2021. Heterogeneous Graph Learning for Explainable Recommendation over Academic Networks. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 29–36. DOI: 10.1145/3498851.3498926. Online publication date: 14 Dec 2021.
https://dl.acm.org/doi/10.1145/3498851.3498926

Index Terms

1. Computing methodologies
  1. Artificial intelligence
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3

June 2021

533 pages

ISSN: 1556-4681

EISSN: 1556-472X

DOI: 10.1145/3454120

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2021

Accepted: 01 December 2020

Revised: 01 December 2020

Received: 01 September 2019

Published in TKDD Volume 15, Issue 3

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China

Bibliometrics

Total citations: 2
Total downloads: 365
Downloads (last 12 months): 56
Downloads (last 6 weeks): 4

Reflects downloads up to 12 Dec 2024
