DOI: 10.1145/3178876.3186152

Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification

Published: 23 April 2018

Abstract

In this paper, we study the problem of author identification in big scholarly data: effectively ranking potential authors for each anonymous paper by using historical data. Most existing de-anonymization approaches predict the relevance score of a paper-author pair via feature engineering, which is not only time- and storage-consuming but also introduces irrelevant and redundant features or misses important attributes. Representation learning can automate the feature generation process by learning node embeddings in the academic network to infer the correlation of a paper-author pair. However, the learned embeddings are often general-purpose (independent of the specific task) or based on network structure only (without considering node content). To address these issues and make further progress on the author identification problem, we propose Camel, a content-aware and meta-path augmented metric learning model. First, the directly correlated paper-author pairs are modeled with distance metric learning by introducing a push loss function. Next, the paper content embedding, encoded by a gated recurrent neural network, is integrated into the distance loss. Moreover, the historical bibliographic data of papers is used to construct an academic heterogeneous network, in which a meta-path guided walk integrative learning module, based on a task-dependent and content-aware Skipgram model, formulates the correlations between each paper and its indirect author neighbors and further augments the model. Extensive experiments demonstrate that Camel outperforms state-of-the-art baselines, achieving an average improvement of 6.3% over the best baseline method.
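
The distance-based push loss mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a hinge-style loss over squared Euclidean distances, a fixed margin of 1.0, and hand-picked 2-d toy embeddings, and it takes the paper vector as given rather than encoding it with a gated recurrent network. The function names `sq_dist`, `push_loss`, and `rank_authors` are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def push_loss(paper, true_author, negative_author, margin=1.0):
    """Hinge-style push loss (assumed form, not taken from the paper).

    The loss is zero once the negative author is at least `margin`
    farther from the paper (in squared distance) than the true author.
    """
    gap = sq_dist(paper, true_author) - sq_dist(paper, negative_author)
    return max(0.0, margin + gap)


def rank_authors(paper, author_embeddings):
    """Return author ids sorted from nearest to farthest from the paper."""
    return sorted(author_embeddings,
                  key=lambda a: sq_dist(paper, author_embeddings[a]))


# Toy 2-d embeddings (hypothetical values, for illustration only).
paper = [0.0, 0.0]
authors = {"a1": [0.1, 0.0], "a2": [1.0, 1.0], "a3": [0.5, 0.0]}

# a2 is far from the paper, so the margin is already satisfied.
loss_far = push_loss(paper, authors["a1"], authors["a2"])
# a3 is close to the paper, so the loss is positive and would push a3 away.
loss_near = push_loss(paper, authors["a1"], authors["a3"])
ranking = rank_authors(paper, authors)
print(loss_far, loss_near, ranking)
```

In a trained model of this kind, minimizing the loss over many (paper, true author, sampled non-author) triples pulls true authors toward their papers and pushes non-authors away, so ranking by distance at test time yields the candidate author list for an anonymous paper.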

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. author identification
  2. deep learning
  3. heterogeneous networks
  4. metric learning
  5. representation learning

Qualifiers

  • Research-article

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 157
  • Downloads (last 6 weeks): 21
Reflects downloads up to 19 Dec 2024

Cited By

  • (2025) Synergistic Multi-Drug Combination Prediction Based on Heterogeneous Network Representation Learning with Contrastive Learning. Tsinghua Science and Technology 30:1 (215-233). DOI: 10.26599/TST.2023.9010149. Online publication date: Feb-2025
  • (2024) A cross-domain transfer learning model for author name disambiguation on heterogeneous graph with pretrained language model. Knowledge-Based Systems 305 (112624). DOI: 10.1016/j.knosys.2024.112624. Online publication date: Dec-2024
  • (2024) Boosting the performance of molecular property prediction via graph-text alignment and multi-granularity representation enhancement. Journal of Molecular Graphics and Modelling (108843). DOI: 10.1016/j.jmgm.2024.108843. Online publication date: Aug-2024
  • (2024) Improving healthy food recommender systems through heterogeneous hypergraph learning. Egyptian Informatics Journal 28 (100570). DOI: 10.1016/j.eij.2024.100570. Online publication date: Dec-2024
  • (2024) Review of heterogeneous graph embedding methods based on deep learning techniques and comparing their efficiency in node classification. Social Network Analysis and Mining 14:1. DOI: 10.1007/s13278-023-01178-6. Online publication date: 3-Jan-2024
  • (2023) Web-Scale Academic Name Disambiguation: The WhoIsWho Benchmark, Leaderboard, and Toolkit. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3817-3828). DOI: 10.1145/3580305.3599930. Online publication date: 6-Aug-2023
  • (2023) A Survey on Heterogeneous Graph Embedding: Methods, Techniques, Applications and Sources. IEEE Transactions on Big Data 9:2 (415-436). DOI: 10.1109/TBDATA.2022.3177455. Online publication date: 1-Apr-2023
  • (2023) RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training. IEEE Transactions on Big Data 9:1 (186-199). DOI: 10.1109/TBDATA.2022.3152386. Online publication date: 1-Feb-2023
  • (2023) E³-MG: End-to-End Expert Linking via Multi-Granularity Representation Learning. Neural Information Processing (268-280). DOI: 10.1007/978-981-99-8178-6_21. Online publication date: 30-Nov-2023
  • (2022) Computational method using heterogeneous graph convolutional network model combined with reinforcement layer for MiRNA–disease association prediction. BMC Bioinformatics 23:1. DOI: 10.1186/s12859-022-04843-3. Online publication date: 25-Jul-2022
