Computer Science > Computation and Language

arXiv:2406.10310 (cs)

[Submitted on 14 Jun 2024 (v1), last revised 25 Nov 2024 (this version, v3)]

Title: TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

Authors: Zhuofeng Li, Zixing Gou, Xiangnan Zhang, Zhongyuan Liu, Sirui Li, Yuntong Hu, Chen Ling, Zheng Zhang, Liang Zhao


Abstract: Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks. The entire TEG-DB project is publicly available as an open-source repository on GitHub.

Comments:	Accepted by NeurIPS 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.10310 [cs.CL]
	(or arXiv:2406.10310v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.10310

Submission history

From: Zhuofeng Li
[v1] Fri, 14 Jun 2024 06:22:47 UTC (177 KB)
[v2] Wed, 20 Nov 2024 11:47:58 UTC (2,677 KB)
[v3] Mon, 25 Nov 2024 13:35:47 UTC (2,677 KB)
