
research-article

Public Access

Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification

Authors:

Xiangliang Zhang,

Nitesh V. Chawla

WWW '18: Proceedings of the 2018 World Wide Web Conference

Pages 709 - 718

https://doi.org/10.1145/3178876.3186152

Published: 23 April 2018


Abstract

In this paper, we study the problem of author identification in big scholarly data, which is to effectively rank potential authors for each anonymous paper by using historical data. Most existing de-anonymization approaches predict the relevance score of a paper-author pair via feature engineering, which is not only time- and storage-consuming but also introduces irrelevant and redundant features or misses important attributes. Representation learning can automate the feature generation process by learning node embeddings in an academic network to infer the correlation of paper-author pairs. However, the learned embeddings are often general-purpose (independent of the specific task) or based on network structure only (without considering node content). To address these issues and make further progress on the author identification problem, we propose Camel, a content-aware and meta-path augmented metric learning model. Specifically, the directly correlated paper-author pairs are first modeled with distance metric learning by introducing a push loss function. Next, the paper content embedding encoded by a gated recurrent neural network is integrated into the distance loss. Moreover, the historical bibliographic data of papers is used to construct an academic heterogeneous network, in which a meta-path guided walk integrative learning module, based on a task-dependent and content-aware Skipgram model, is designed to formulate the correlations between each paper and its indirect author neighbors, further augmenting the model. Extensive experiments demonstrate that Camel outperforms the state-of-the-art baselines, achieving an average improvement of 6.3% over the best baseline method.
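The abstract names a push loss over paper-author distances, a GRU content encoder, and meta-path guided walks feeding a Skipgram objective. The plain-Python sketch below illustrates the metric-learning and walk parts under stated assumptions, not the paper's exact formulation: the push loss is assumed to be a common margin-based hinge over squared Euclidean distances (as in collaborative metric learning), candidate authors are ranked by distance to the paper embedding, and walks follow a symmetric meta-path such as author-paper-author ("APA"). The function names (`push_loss`, `rank_authors`, `metapath_walk`, `context_pairs`) and the toy graph are hypothetical; the GRU encoder and the training loop are omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def push_loss(paper: Vector, true_author: Vector,
              neg_authors: List[Vector], margin: float = 1.0) -> float:
    """Assumed hinge-style push loss: every sampled negative author should sit
    at least `margin` farther from the paper than the true author does."""
    d_pos = sq_dist(paper, true_author)
    return sum(max(0.0, margin + d_pos - sq_dist(paper, neg))
               for neg in neg_authors)


def rank_authors(paper: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Rank candidate authors of an anonymous paper, closest first."""
    return sorted(candidates, key=lambda name: sq_dist(paper, candidates[name]))


def metapath_walk(start: str, neighbors: Dict[str, List[str]],
                  node_type: Dict[str, str], metapath: str,
                  length: int, rng: random.Random) -> List[str]:
    """Random walk whose node types follow a symmetric meta-path, e.g. "APA"."""
    cycle = metapath[:-1]  # "APA" repeats its prefix "AP" cyclically
    if node_type[start] != cycle[0]:
        raise ValueError("start node type must match the meta-path's first type")
    walk = [start]
    while len(walk) < length:
        required = cycle[len(walk) % len(cycle)]
        options = [n for n in neighbors.get(walk[-1], []) if node_type[n] == required]
        if not options:
            break  # dead end: no neighbor of the type the meta-path requires
        walk.append(rng.choice(options))
    return walk


def context_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram style (center, context) pairs within `window` steps of each node."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    paper = [0.0, 0.0]
    authors = {"alice": [3.0, 0.0], "bob": [1.0, 0.0], "carol": [0.0, 2.0]}
    print(rank_authors(paper, authors))  # ['bob', 'carol', 'alice']
    print(push_loss(paper, authors["bob"], [authors["alice"], [1.0, 1.0]], margin=2.0))  # 1.0

    # Hypothetical toy academic network: authors a1, a2 and papers p1, p2.
    types = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
    nbrs = {"a1": ["p1"], "p1": ["a1", "a2"], "a2": ["p1", "p2"], "p2": ["a2"]}
    walk = metapath_walk("a1", nbrs, types, "APA", 5, random.Random(0))
    print(walk, context_pairs(walk, window=1))
```

In this reading, the (paper, author) pairs that `context_pairs` extracts from meta-path walks would stand in for the indirect author neighbors that augment the direct pairs scored by `push_loss`; how Camel actually weights and combines the two objectives is not specified in the abstract.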



Index Terms

Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Information systems
  1. Data management systems
    1. Database design and models
      1. Graph-based database models
        Network data models
  2. Information retrieval
    1. Retrieval models and ranking

Recommendations

SHNE: Representation Learning for Semantic-Associated Heterogeneous Networks
WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

Representation learning in heterogeneous networks faces challenges due to heterogeneous structural information of multiple types of nodes and relations, and also due to the unstructured attribute or content (e.g., text) associated with some types of ...
Task-Guided Pair Embedding in Heterogeneous Network
CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Many real-world tasks solved by heterogeneous network embedding methods can be cast as modeling the likelihood of a pairwise relationship between two nodes. For example, the goal of author identification task is to model the likelihood of a paper being ...
CrossFormer: Cross-Modal Representation Learning via Heterogeneous Graph Transformer
Transformers have been recognized as powerful tools for various cross-modal tasks due to their superior ability to perform representation learning through self-attention. Existing transformer-based cross-modal models can be categorized into single-stream ...


Information & Contributors

Information

Published In

ACM Other conferences

WWW '18: Proceedings of the 2018 World Wide Web Conference

April 2018

2000 pages

ISBN:9781450356398

General Chairs:
Pierre-Antoine Champin, Université Claude Bernard Lyon 1, France
Fabien Gandon, Inria, Université Côte d'Azur, CNRS, I3S, France
Lionel Médini, Université Claude Bernard Lyon 1, France

Program Chairs:
Mounia Lalmas, Spotify, UK
Panagiotis G. Ipeirotis, New York University, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Publication History

Published: 23 April 2018


Qualifiers

Research-article

Funding Sources

King Abdullah University of Science and Technology
National Science Foundation
Army Research Laboratory

Conference

WWW '18

Sponsor:

IW3C2

WWW '18: The Web Conference 2018

April 23 - 27, 2018

Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Bibliometrics & Citations

Article Metrics

Total Citations: 29
Total Downloads: 1,311
Downloads (Last 12 months): 181
Downloads (Last 6 weeks): 25

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Xi X, Yuan J, Lu S, He J (2025). Synergistic Multi-Drug Combination Prediction Based on Heterogeneous Network Representation Learning with Contrastive Learning. Tsinghua Science and Technology 30:1 (215-233). Online publication date: Feb-2025.
https://doi.org/10.26599/TST.2023.9010149
Huang Z, Zhang H, Hao C, Yang H, Wu H (2024). A cross-domain transfer learning model for author name disambiguation on heterogeneous graph with pretrained language model. Knowledge-Based Systems 305 (112624). Online publication date: Dec-2024.
https://doi.org/10.1016/j.knosys.2024.112624
Zhao Z, Zhou Q, Wu C, Su R, Xiong W (2024). Boosting the performance of molecular property prediction via graph-text alignment and multi-granularity representation enhancement. Journal of Molecular Graphics and Modelling (108843). Online publication date: Aug-2024.
https://doi.org/10.1016/j.jmgm.2024.108843
Wang J, Zhou J, Aksoy M, Sharma N, Rahman M, Zain J, Alenazi M, Aminzadeh A (2024). Improving healthy food recommender systems through heterogeneous hypergraph learning. Egyptian Informatics Journal 28 (100570). Online publication date: Dec-2024.
https://doi.org/10.1016/j.eij.2024.100570
Noori A, Balafar M, Bouyer A, Salmani K (2024). Review of heterogeneous graph embedding methods based on deep learning techniques and comparing their efficiency in node classification. Social Network Analysis and Mining 14:1. Online publication date: 3-Jan-2024.
https://doi.org/10.1007/s13278-023-01178-6
Chen B, Zhang J, Zhang F, Han T, Cheng Y, Li X, Dong Y, Tang J, Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J (2023). Web-Scale Academic Name Disambiguation: The WhoIsWho Benchmark, Leaderboard, and Toolkit. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3817-3828). Online publication date: 6-Aug-2023.
https://dl.acm.org/doi/10.1145/3580305.3599930
Wang X, Bo D, Shi C, Fan S, Ye Y, Yu P (2023). A Survey on Heterogeneous Graph Embedding: Methods, Techniques, Applications and Sources. IEEE Transactions on Big Data 9:2 (415-436). Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TBDATA.2022.3177455
Qiao Z, Fu Y, Wang P, Xiao M, Ning Z, Zhang D, Du Y, Zhou Y (2023). RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training. IEEE Transactions on Big Data 9:1 (186-199). Online publication date: 1-Feb-2023.
https://doi.org/10.1109/TBDATA.2022.3152386
Zha Z, Qi P, Bao X, Qin B (2023). E³-MG: End-to-End Expert Linking via Multi-Granularity Representation Learning. Neural Information Processing (268-280). Online publication date: 30-Nov-2023.
https://doi.org/10.1007/978-981-99-8178-6_21
Huang D, An J, Zhang L, Liu B (2022). Computational method using heterogeneous graph convolutional network model combined with reinforcement layer for MiRNA–disease association prediction. BMC Bioinformatics 23:1. Online publication date: 25-Jul-2022.
https://doi.org/10.1186/s12859-022-04843-3
