Efficient Graph Representation Learning by Non-Local Information Exchange
Figure 1. The relationship between the expressibility of the original and three re-wired graph topologies and the performance of modern GNNs (node classification accuracy). Different markers represent different datasets; colors denote graph re-wiring methods. Red arrows highlight the improvement brought by our re-wiring method. The red box illustrates that our method produces an easier graph to classify by changing the topology, so that nodes of the same class (enclosed by a colored oval) become better separated from other classes. Note that all re-wiring methods are applied with the same baseline hyperparameters.
Figure 2. Non-local information exchange mechanism (right), where node colors denote the distance (marked by numbers) from the red node, nodes with mixed colors denote node features aggregated by message passing, solid lines are graph edges, and dashed lines denote express connections. The mechanism is reminiscent of the non-local means technique in image processing (left), which captures global information through express connections (red dashed lines), reducing the over-smoothing risk in GNNs. Both ideas integrate information beyond a spatial or topological neighborhood in order to preserve distinctive feature representations.
Figure 3. (left): Illustration of progressive NLE for a simulated graph, from the original adjacency matrix A to the re-wired topology hop(A, k) (k = 1, 2, …). (right): ExM sorts the original graph and the new graphs, either cascaded (C-ExM) or aggregated (A-ExM), as input to any GNN. The green arrow indicates the pipeline of an arbitrary GNN.
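To make the re-wiring concrete, below is a minimal sketch assuming that hop(A, k) connects every pair of nodes whose shortest-path distance in the original graph is exactly k, i.e., the express connections of Figure 2. The name hop follows the caption's notation, but this exact definition is an assumption; the paper's precise construction may differ.

```python
import numpy as np

def hop(A: np.ndarray, k: int) -> np.ndarray:
    """Assumed re-wiring: an edge joins node pairs whose shortest-path
    distance in the original graph is exactly k (self-loops excluded)."""
    n = A.shape[0]
    Ab = (A > 0).astype(int)
    within_prev = np.eye(n, dtype=int)                  # pairs within distance 0
    within_curr = ((within_prev + Ab) > 0).astype(int)  # pairs within distance 1
    for _ in range(k - 1):                              # grow reach one hop at a time
        within_prev = within_curr
        within_curr = ((within_curr + within_curr @ Ab) > 0).astype(int)
    # pairs within distance k but not within distance k-1: exactly k hops apart
    return ((within_curr - within_prev) > 0).astype(A.dtype)

# Toy path graph 0-1-2-3: hop(A, 2) adds express edges (0,2) and (1,3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(hop(A, 2))
```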
Figure 4. Comparison of the re-wiring methods DropEdge, GDC, and our NLE. NLE can mitigate over-smoothing: compared with previous graph re-wiring methods, over-smoothing is delayed when NLE is used. Even though G2GNN and skip connections almost eliminate smoothed node features, using NLE still leads to a larger Dirichlet energy than the original graph topology.
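Figure 4 tracks over-smoothing via Dirichlet energy. As a reference, the sketch below computes the common unnormalized form E(X) = 1/2 · Σ_{(i,j)∈E} ||x_i − x_j||² (some works first rescale x_i by 1/√(1 + deg(i))) and shows it shrinking under repeated neighborhood averaging. The propagation operator and toy graph are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def dirichlet_energy(X: np.ndarray, A: np.ndarray) -> float:
    """E(X) = 1/2 * sum over edges (i,j) of ||x_i - x_j||^2.
    A vanishing energy means node features have collapsed toward
    a constant vector, the over-smoothing signature."""
    diff = X[:, None, :] - X[None, :, :]        # pairwise feature differences
    sq_dist = (diff ** 2).sum(axis=-1)          # squared distances, n x n
    return 0.5 * float((A * sq_dist).sum())     # mask by edges, halve double count

# Repeated neighborhood averaging drives the energy toward zero.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy path graph
X = rng.normal(size=(3, 4))
P = A / A.sum(axis=1, keepdims=True)            # row-stochastic propagation
for step in range(4):
    print(step, round(dirichlet_energy(X, A), 4))
    X = P @ X                                   # one smoothing step
```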
Figure 5. Bar plots of performance with different numbers of layers on real data.
Abstract
1. Introduction
- We propose a non-local information exchange mechanism to efficiently integrate feature representations from non-local neighborhoods with a collection of express connections.
- Most current works optimize feature representations in deep GNN models by mitigating the over-smoothing issue. We address this challenge from the novel perspective of expressibility; that is, the insight of our NLE mechanism is to combine local and global information directly through express links before over-smoothed features undermine the discriminative power of the feature representation.
- We devise our ExM wrapper as a pre-processing step, which generates new topologies in advance and enables various GNN models to attain SOTA performance on both homophilous and heterophilous graph data (a pre-processing sketch follows this list).
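As a hedged illustration of this pre-processing view, the snippet below computes all re-wired topologies once, before training, so that any downstream GNN can consume them unchanged. The name precompute_topologies is hypothetical, not the paper's API; it reuses the hop sketch given after Figure 3.

```python
# Hypothetical pre-processing pass (illustrative name, not the paper's API):
# compute every re-wired topology once so training never re-runs hop().
def precompute_topologies(A, k_max):
    return {k: hop(A, k) for k in range(1, k_max + 1)}
```

Because the topologies depend only on A, this step runs once per dataset, and its cost is amortized over all training epochs.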
2. Preliminaries
2.1. Related Works
2.2. Proxy Measurement for Expressibility
3. Methods
3.1. Graph Re-Wiring by Hierarchical Non-Local Information Exchange
3.2. Express Messenger (ExM) Wrapper
3.2.1. Exhaustive vs. Progressive NLE
Cascaded Express Messenger (C-ExM) Wrapper
Algorithm 1 Cascaded ExM wrapper
Require: range of hopping steps …, hop(·, ·)
while … do
  if l is odd or … then
    …
  else if l is even then
    …
  end if
end while
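The mathematical details of the listing above are not recoverable, so the following is a minimal sketch of one possible reading: odd layers exchange messages over the original topology A, while even layers use progressively re-wired topologies hop(A, k). The layer(X, adjacency) call signature, the loop bound, and the hop-advance rule are all assumptions rather than the paper's exact procedure; hop is the sketch given after Figure 3.

```python
# Hedged sketch of C-ExM: alternate local edges (odd layers) with
# express connections at growing hop distance (even layers).
def c_exm(X, A, gnn_layers, k_max):
    k = 1
    for l, layer in enumerate(gnn_layers, start=1):
        if l % 2 == 1 or k > k_max:        # odd layer, or hop range exhausted
            X = layer(X, A)                # exchange over local edges
        else:                              # even layer: express connections
            X = layer(X, hop(A, k))
            k += 1
    return X
```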
Aggregated Express Messenger (A-ExM) Wrapper
Algorithm 2 Aggregated ExM wrapper
Require: range of hopping steps …, hop(·, ·)
while … do
  while … do
    …
  end while
end while
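Again as a hedged reconstruction: the nested loops in the listing suggest that each layer aggregates message-passing outputs over the original topology and all re-wired ones. Summation as the aggregator and the layer(X, adjacency) signature are assumptions; hop is the sketch given after Figure 3.

```python
# Hedged sketch of A-ExM: every layer aggregates (here: sums) messages
# passed over the original and all re-wired topologies.
def a_exm(X, A, gnn_layers, k_max):
    topologies = [A] + [hop(A, k) for k in range(1, k_max + 1)]
    for layer in gnn_layers:                       # outer loop: GNN layers
        X = sum(layer(X, T) for T in topologies)   # inner loop: topologies
    return X
```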
3.3. Implementation
4. Experiments
4.1. Experimental Setting
4.1.1. Dataset
4.1.2. Experiment Setup
4.2. Results on Synthetic Data
4.3. Results on Real Data
4.3.1. Benchmark Graph Re-Wiring Techniques
4.3.2. Evaluation on Graph Feature Representation Learning
4.3.3. Performance of Deep GNN
4.3.4. Ablation Studies
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Data Statistics
Dataset | h | Node Number | Edge Number | Class Number
---|---|---|---|---
Questions | 0.84 | 48,921 | 153,540 | 2 |
Cora | 0.81 | 2708 | 10,556 | 7 |
Pubmed | 0.80 | 19,717 | 88,648 | 3 |
Citeseer | 0.74 | 3327 | 9104 | 6 |
Minesweeper | 0.68 | 10,000 | 39,402 | 2 |
Tolokers | 0.60 | 11,758 | 519,000 | 2 |
Amazon-ratings | 0.38 | 24,492 | 93,050 | 5 |
Chameleon | 0.24 | 2277 | 31,421 | 5 |
Squirrel | 0.22 | 5201 | 198,493 | 5 |
Wisconsin | 0.20 | 251 | 515 | 5 |
Cornell | 0.13 | 183 | 298 | 5 |
Texas | 0.11 | 183 | 325 | 5 |
Roman-empire | 0.05 | 22,662 | 32,927 | 18 |
Appendix B. Detailed Analysis in Ablation Studies
Method | Roman | Amazon | Mine. | Squirrel | Citeseer
---|---|---|---|---|---
GAT + C-ExM | 83.74 ± 0.55 | 46.44 ± 0.42 | 86.54 ± 0.78 | 61.55 ± 1.33 | 74.25 ± 1.27
 | 76.44 ± 0.87 | 46.83 ± 0.72 | 90.56 ± 0.59 | 61.48 ± 0.99 | 73.58 ± 1.97
 | 76.19 ± 0.82 | 46.76 ± 0.71 | 90.96 ± 0.74 | 62.51 ± 0.81 | 73.84 ± 1.27
 | 75.97 ± 0.70 | 47.00 ± 0.61 | 90.82 ± 0.82 | 61.91 ± 1.32 | 74.25 ± 1.52
SAGE + C-ExM | 90.34 ± 0.42 | 50.25 ± 0.49 | 87.09 ± 0.79 | 43.93 ± 1.21 | 75.30 ± 1.74
 | 85.00 ± 0.52 | 50.44 ± 0.79 | 92.51 ± 0.40 | 43.30 ± 1.86 | 74.57 ± 1.50
 | 85.11 ± 0.50 | 51.30 ± 0.52 | 92.81 ± 0.39 | 44.85 ± 1.60 | 74.64 ± 1.72
 | 84.81 ± 0.50 | 51.59 ± 0.39 | 92.93 ± 0.40 | 44.76 ± 1.79 | 74.35 ± 1.37
Method | Roman | Amazon | Chameleon | Squirrel
---|---|---|---|---
Average degree | 1.45 | 3.80 | 13.80 | 38.16
GAT + C-ExM | 80.77 ± 0.45 | 48.43 ± 0.44 | 70.88 ± 1.60 | 76.00 ± 0.95 |
GAT + A-ExM | 79.99 ± 0.49 | 48.17 ± 0.53 | 73.18 ± 2.08 | 77.52 ± 0.86 |
FSGNN + C-ExM | 72.61 ± 0.57 | 44.56 ± 0.59 | 75.99 ± 1.31 | 66.45 ± 4.17 |
FSGNN + A-ExM | 72.17 ± 0.35 | 45.42 ± 0.45 | 77.76 ± 0.94 | 73.85 ± 2.08 |
References
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Zhang, M.; Chen, Y. Link prediction based on graph neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18), Montreal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Yun, S.; Jeong, M.; Kim, R.; Kang, J.; Kim, H.J. Graph transformer networks. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Kreuzer, D.; Beaini, D.; Hamilton, W.; Létourneau, V.; Tossou, P. Rethinking graph transformers with spectral attention. Adv. Neural Inf. Process. Syst. 2021, 34, 21618–21629. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; Sun, X. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3438–3445. [Google Scholar]
- Kim, J.; Nguyen, D.; Min, S.; Cho, S.; Lee, M.; Lee, H.; Hong, S. Pure transformers are powerful graph learners. Adv. Neural Inf. Process. Syst. 2022, 35, 14582–14595. [Google Scholar]
- Chen, J.; Gao, K.; Li, G.; He, K. NAGphormer: A tokenized graph transformer for node classification in large graphs. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- He, X.; Hooi, B.; Laurent, T.; Perold, A.; Lecun, Y.; Bresson, X. A Generalization of ViT/MLP-Mixer to Graphs. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J., Eds.; Proceedings of Machine Learning Research, PMLR: Birmingham, UK, 2023; Volume 202, pp. 12724–12745. [Google Scholar]
- Cai, S.; Li, L.; Deng, J.; Zhang, B.; Zha, Z.J.; Su, L.; Huang, Q. Rethinking Graph Neural Architecture Search From Message-Passing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6657–6666. [Google Scholar]
- Feng, J.; Chen, Y.; Li, F.; Sarkar, A.; Zhang, M. How powerful are k-hop message-passing graph neural networks. Adv. Neural Inf. Process. Syst. 2022, 35, 4776–4790. [Google Scholar]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Mo, Y.; Peng, L.; Xu, J.; Shi, X.; Zhu, X. Simple unsupervised graph representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 28 February–1 March 2022; Volume 36, pp. 7797–7805. [Google Scholar]
- Zhu, J.; Yan, Y.; Zhao, L.; Heimann, M.; Akoglu, L.; Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. Adv. Neural Inf. Process. Syst. 2020, 33, 7793–7804. [Google Scholar]
- Zheng, X.; Wang, Y.; Liu, Y.; Li, M.; Zhang, M.; Jin, D.; Yu, P.S.; Pan, S. Graph neural networks for graphs with heterophily: A survey. arXiv 2022, arXiv:2202.07082. [Google Scholar]
- Rong, Y.; Huang, W.; Xu, T.; Huang, J. Dropedge: Towards deep graph convolutional networks on node classification. arXiv 2019, arXiv:1907.10903. [Google Scholar]
- Gasteiger, J.; Weißenberger, S.; Günnemann, S. Diffusion improves graph learning. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Battiston, F.; Cencetti, G.; Iacopini, I.; Latora, V.; Lucas, M.; Patania, A.; Young, J.G.; Petri, G. Networks beyond pairwise interactions: Structure and dynamics. Phys. Rep. 2020, 874, 1–92. [Google Scholar]
- Alon, U.; Yahav, E. On the bottleneck of graph neural networks and its practical implications. arXiv 2020, arXiv:2006.05205. [Google Scholar]
- Topping, J.; Di Giovanni, F.; Chamberlain, B.P.; Dong, X.; Bronstein, M.M. Understanding over-squashing and bottlenecks on graphs via curvature. arXiv 2021, arXiv:2111.14522. [Google Scholar]
- Buades, A.; Coll, B.; Morel, J.M. A non-local algorithm for image denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; IEEE: New York, NY, USA, 2005; Volume 2, pp. 60–65. [Google Scholar]
- Coifman, R.R.; Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30. [Google Scholar] [CrossRef]
- Rusch, T.K.; Chamberlain, B.P.; Mahoney, M.W.; Bronstein, M.M.; Mishra, S. Gradient gating for deep multi-rate learning on graphs. arXiv 2022, arXiv:2210.00513. [Google Scholar]
- Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv 2018, arXiv:1806.01261. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Zhang, Z.; Liu, Q.; Hu, Q.; Lee, C.K. Hierarchical graph transformer with adaptive node sampling. Adv. Neural Inf. Process. Syst. 2022, 35, 21171–21183. [Google Scholar]
- Zhang, J.; Zhang, H.; Xia, C.; Sun, L. Graph-bert: Only attention is needed for learning graph representations. arXiv 2020, arXiv:2001.05140. [Google Scholar]
- Morris, C.; Ritzert, M.; Fey, M.; Hamilton, W.L.; Lenssen, J.E.; Rattan, G.; Grohe, M. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4602–4609. [Google Scholar]
- Morris, C.; Fey, M.; Kriege, N.M. The power of the Weisfeiler-Leman algorithm for machine learning with graphs. arXiv 2021, arXiv:2105.05911. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A.; Bengio, Y. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1. [Google Scholar]
- Wei, Z.; Wu, G. Non-local Exchange: Introduce Non-locality via Graph re-wiring to Graph Neural Networks. In Proceedings of the NeurIPS 2024 Workshop on Behavioral Machine Learning, Vancouver, BC, Canada, 14 December 2024. [Google Scholar]
- Platonov, O.; Kuznedelev, D.; Diskin, M.; Babenko, A.; Prokhorenkova, L. A critical look at the evaluation of GNNs under heterophily: Are we really making progress? arXiv 2023, arXiv:2302.11640. [Google Scholar]
- Pei, H.; Wei, B.; Chang, K.C.C.; Lei, Y.; Yang, B. Geom-gcn: Geometric graph convolutional networks. arXiv 2020, arXiv:2002.05287. [Google Scholar]
- Shi, Y.; Huang, Z.; Feng, S.; Zhong, H.; Wang, W.; Sun, Y. Masked label prediction: Unified message-passing model for semi-supervised classification. arXiv 2020, arXiv:2009.03509. [Google Scholar]
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Wang, X.; Zhang, M. How powerful are spectral graph neural networks. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: Birmingham, UK, 2022; pp. 23341–23362. [Google Scholar]
- Maurya, S.K.; Liu, X.; Murata, T. Simplifying approach to node classification in graph neural networks. J. Comput. Sci. 2022, 62, 101695. [Google Scholar] [CrossRef]
- Rusch, T.K.; Chamberlain, B.; Rowbottom, J.; Mishra, S.; Bronstein, M. Graph-coupled oscillator networks. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: Birmingham, UK, 2022; pp. 18888–18909. [Google Scholar]
- Chen, M.; Wei, Z.; Huang, Z.; Ding, B.; Li, Y. Simple and deep graph convolutional networks. In Proceedings of the International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2020; PMLR: Birmingham, UK, 2020; pp. 1725–1735. [Google Scholar]
- Nikolentzos, G.; Dasoulas, G.; Vazirgiannis, M. k-hop graph neural networks. Neural Netw. 2020, 130, 195–205. [Google Scholar] [CrossRef] [PubMed]
- Cauteruccio, F.; Kou, Y. Investigating the emotional experiences in eSports spectatorship: The case of League of Legends. Inf. Process. Manag. 2023, 60, 103516. [Google Scholar] [CrossRef]
Property | GCN | GDC | DropEdge | NAGphormer | NLE (Ours)
---|---|---|---|---|---
Long-range interaction | ✗ | ✗ | ✗ | ✓ | ✓ |
Graph re-wiring | ✗ | ✓ | ✓ | ✗ | ✓ |
Anti-over-smoothing | ✗ | ✗ | ✓ | ✓ | ✓ |
Dataset | h | Original | DropEdge | GDC | NLE (Ours)
---|---|---|---|---|---
Cham. | 0.24 | 26.54 | 24.56 (−1.98) | 33.55 (+7.01) | 35.31 (+8.77) |
Squirrel | 0.22 | 22.96 | 23.44 (+0.48) | 22.96 (−0.00) | 22.96 (−0.00) |
Wiscon. | 0.20 | 47.06 | 41.18 (−5.88) | 41.18 (−5.88) | 50.98 (+3.92) |
Texas | 0.11 | 64.87 | 64.87 (−0.00) | 64.87 (−0.00) | 70.27 (+5.40) |
Roman. | 0.05 | 24.32 | 22.71 (−1.61) | 19.91 (−4.41) | 28.08 (+3.76) |
Method | Roman | Texas | Wisconsin | Squirrel | Chameleon | Amazon | Citeseer | Pubmed | Cora
---|---|---|---|---|---|---|---|---|---
 | h = 0.05 | h = 0.11 | h = 0.20 | h = 0.22 | h = 0.24 | h = 0.38 | h = 0.74 | h = 0.80 | h = 0.81
Baseline | |||||||||
DropEdge | |||||||||
GDC | |||||||||
SDRF | – | – | |||||||
C-ExME | |||||||||
A-ExME | |||||||||
C-ExMP | |||||||||
A-ExMP |
Method | Roman | Texas | Wisconsin | Squirrel | Chameleon | Citeseer | Pubmed | Average
---|---|---|---|---|---|---|---|---
 | h = 0.05 | h = 0.11 | h = 0.20 | h = 0.22 | h = 0.24 | h = 0.74 | h = 0.80 | Imp.
GCN | – | |||||||
GPRGNN | – | – | ||||||
UGT | – | – | – | |||||
GloGNN | – | |||||||
GBKGNN | – | – | – | |||||
GAT | ||||||||
w/ExM | ||||||||
GT | ||||||||
w/ExM | ||||||||
SAGE | ||||||||
w/ExM | ||||||||
NAG | ||||||||
w/ExM | ||||||||
JacobiConv | ||||||||
w/ExM | ||||||||
FSGNN | ||||||||
w/ExM | ||||||||
Average Imp. |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).