research-article

LargeEA: aligning entities for large-scale knowledge graphs

Authors:

Baihua Zheng

Proceedings of the VLDB Endowment, Volume 15, Issue 2

Pages 237 - 245

https://doi.org/10.14778/3489496.3489504

Published: 01 October 2021

Abstract

Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). Current EA approaches suffer from scalability issues, limiting their usage in real-world EA scenarios. To tackle this challenge, we propose LargeEA to align entities between large-scale KGs. LargeEA consists of two channels: a structure channel and a name channel. For the structure channel, we present METIS-CPS, a memory-saving mini-batch generation strategy, to partition large KGs into smaller mini-batches. LargeEA, designed as a general tool, can adopt any existing EA approach to learn entities' structural features within each mini-batch independently. For the name channel, we first introduce NFF, a name feature fusion method, to capture rich name features of entities without involving any complex training process; we then exploit name-based data augmentation to generate seed alignment without any human intervention. Such a design fits common real-world scenarios much better, as seed alignment is not always available. Finally, LargeEA derives the EA results by fusing the structural features and name features of entities. Since no widely acknowledged benchmark is available for large-scale EA evaluation, we also develop a large-scale EA benchmark called DBP1M, extracted from real-world KGs. Extensive experiments confirm the superiority of LargeEA against state-of-the-art competitors.
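The final fusion step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the character-trigram Jaccard score is only a stand-in for NFF, the weighted sum is an assumed fusion rule, and the names `char_ngrams`, `name_similarity`, `fuse_and_align`, and the `weight` parameter are hypothetical. The structure-channel scores are taken as a given matrix rather than learned per mini-batch.

```python
from typing import Dict, List, Set


def char_ngrams(name: str, n: int = 3) -> Set[str]:
    """Return the character n-grams of a lower-cased name padded with '#'."""
    padded = f"#{name.lower()}#"
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def name_similarity(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams (assumed stand-in for NFF)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuse_and_align(
    source_names: List[str],
    target_names: List[str],
    structure_sim: List[List[float]],
    weight: float = 0.5,
) -> Dict[str, str]:
    """Map each source entity to the target with the highest fused score.

    structure_sim[i][j] is an assumed, precomputed structural similarity
    between source entity i and target entity j. The fused score is a
    weighted sum of that value and the name similarity (an assumption).
    """
    alignment: Dict[str, str] = {}
    for i, source in enumerate(source_names):
        best_j = max(
            range(len(target_names)),
            key=lambda j: weight * structure_sim[i][j]
            + (1.0 - weight) * name_similarity(source, target_names[j]),
        )
        alignment[source] = target_names[best_j]
    return alignment


if __name__ == "__main__":
    sources = ["Berlin", "Paris"]
    targets = ["Paris (city)", "Berlin"]
    structure = [[0.1, 0.9], [0.8, 0.2]]  # made-up scores for illustration
    print(fuse_and_align(sources, targets, structure))
```

In this sketch either channel can compensate for the other: a pair with weak structural evidence can still be matched through its names, and vice versa. A real system would replace the exhaustive scan over targets with an approximate nearest-neighbour search.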

Cited By

Wang K, Xu Y, Luo S (2024). TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning. Proceedings of the VLDB Endowment 17:10 (2459-2472). DOI: 10.14778/3675034.3675039. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.14778/3675034.3675039
Huo N, Cheng R, Kao B, Ning W, Haldar N, Li X, Li J, Najafi M, Li T, Qu G (2024). ZeroEA: A Zero-Training Entity Alignment Framework via Pre-Trained Language Model. Proceedings of the VLDB Endowment 17:7 (1765-1774). DOI: 10.14778/3654621.3654640. Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.14778/3654621.3654640
Liang W, Meo P, Tang Y, Zhu J (2024). A Survey of Multi-modal Knowledge Graphs: Technologies and Trends. ACM Computing Surveys 56:11 (1-41). DOI: 10.1145/3656579. Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3656579

Index Terms

LargeEA: aligning entities for large-scale knowledge graphs
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
  2. Machine learning
2. Information systems
  1. Information systems applications

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment Volume 15, Issue 2

October 2021

247 pages

ISSN:2150-8097

Editors: Juliana Freire (New York University), Xuemin Lin (University of New South Wales)

Publisher

VLDB Endowment

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 7
Total Downloads: 105
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 5

Reflects downloads up to 20 Jan 2025

Citations

Cited By

Obraczka D, Rahm E (2024). Comparing Symbolic and Embedding-Based Approaches for Relational Blocking. Knowledge Engineering and Knowledge Management (155-173). DOI: 10.1007/978-3-031-77792-9_10. Online publication date: 25-Nov-2024.
https://dl.acm.org/doi/10.1007/978-3-031-77792-9_10
Huang J, Sun Z, Chen Q, Xu X, Ren W, Hu W (2023). Deep Active Alignment of Knowledge Graph Entities and Schemata. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589304. Online publication date: 20-Jun-2023.
https://dl.acm.org/doi/10.1145/3589304
Long W, Li X, Wang L, Zhang F, Lin Z, Lin X (2023). Efficient m-closest entity matching over heterogeneous information networks. Knowledge-Based Systems 263:C. DOI: 10.1016/j.knosys.2023.110299. Online publication date: 5-Mar-2023.
https://dl.acm.org/doi/10.1016/j.knosys.2023.110299
Liu B, Hua W, Zuccon G, Zhao G, Zhang X, Al Hasan M, Xiong L (2022). High-quality Task Division for Large-scale Entity Alignment. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (1258-1268). DOI: 10.1145/3511808.3557352. Online publication date: 17-Oct-2022.
https://dl.acm.org/doi/10.1145/3511808.3557352
