[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3626772.3657771acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Scalable Community Search over Large-scale Graphs based on Graph Transformer

Published: 11 July 2024 Publication History

Abstract

Given a graph G and a query node q, community search (CS) aims to find a structurally cohesive subgraph from G that contains q. CS is widely used in many real-world applications, such as online recommendation and expert finding. Recently, the rise of learning-based CS methods has garnered extensive research interests, showcasing the promising potential of neural solutions. However, there remains room for optimization: (1) They initialize node features via classical methods, e.g., one-hot, random, and position encoding, which may fall short in capturing valuable community cohesiveness-related features. (2) The reliance on GCN or GCN-like models poses challenges in scaling to large graphs. (3) Existing methods do not adapt well to dynamic graphs, often requiring retraining from scratch. To handle this, we present CSFormer, a scalable CS based on Graph Transformer. First, we present a novel l-hop neighborhood community vector based on n-order h-index to represent each node's community features, generating a sequence of feature vectors by varying the neighborhood scope l. Then, we build a Transformer backbone to learn a good graph embedding that carries rich community features, based on which we perform a prediction-filtering-based online CS to efficiently return a community of q. We extend CSFormer to dynamic graphs and various community models. Extensive experiments on seven real-world graphs show our solution's superiority on effectiveness, e.g., we attain an average improvement of 20.6% in F1-score compared to the latest competitors.

References

[1]
Esra Akbas and Peixiang Zhao. 2017. Truss-based Community Search: a Truss-equivalence Based Indexing Approach. PVLDB, Vol. 10, 11 (2017), 1298--1309.
[2]
Nicola Barbieri, Francesco Bonchi, Edoardo Galimberti, and Francesco Gullo. 2015. Efficient and Effective Community Search. Data Min. Knowl. Discov., Vol. 29, 5 (2015), 1406--1433.
[3]
Vladimir Batagelj and Matjaz Zaversnik. 2003. An O (m) Algorithm for Cores Decomposition of Networks. arXiv, Vol. cs.DS/0310049 (2003).
[4]
Francesco Bonchi, Arijit Khan, and Lorenzo Severini. 2019. Distance-generalized Core Decomposition. In SIGMOD. 1006--1023.
[5]
Lijun Chang and Lu Qin. 2019. Cohesive Subgraph Computation Over Large Sparse Graphs. In ICDE. 2068--2071.
[6]
Jinsong Chen, Kaiyuan Gao, Gaichao Li, and Kun He. 2023 a. NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs. In ICLR.
[7]
Jiazun Chen, Yikuan Xia, and Jun Gao. 2023 b. CommunityAF: An Example-based Community Search Method via Autoregressive Flow. Proc. VLDB Endow., Vol. 16, 10 (2023), 2565--2577. https://doi.org/10.14778/3603581.3603595
[8]
Lu Chen, Chengfei Liu, Kewen Liao, Jianxin Li, and Rui Zhou. 2019. Contextual Community Search Over Large Social Networks. In ICDE. 88--99.
[9]
Lu Chen, Chengfei Liu, Rui Zhou, Jianxin Li, Xiaochun Yang, and Bin Wang. 2018. Maximum Co-located Community Search in Large Scale Social Networks. PVLDB, Vol. 11, 10 (2018), 1233--1246.
[10]
Yankai Chen, Jie Zhang, Yixiang Fang, Xin Cao, and Irwin King. 2020. Efficient Community Search over Large Directed Graph: An Augmented Index-based Approach. In IJCAI. 3544--3550.
[11]
Zi Chen, Yiwei Zhao, Long Yuan, Xuemin Lin, and Kai Wang. 2023 c. Index-Based Biclique Percolation Communities Search on Bipartite Graphs. In 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3--7, 2023. IEEE, 2699--2712. https://doi.org/10.1109/ICDE55515.2023.00207
[12]
Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. In KDD, Ankur Teredesai, Vipin Kumar, Ying Li, Ró mer Rosales, Evimaria Terzi, and George Karypis (Eds.). 257--266.
[13]
Code and datasets. 2023. Code and datasets. https://anonymous.4open.science/r/LGCS-C7ED/.
[14]
Lizhen Cui, Lingxi Yue, Dong Wen, and Lu Qin. 2018. K-connected cores computation in large dual networks. Data Science and Engineering, Vol. 3 (2018), 293--306.
[15]
Wanyun Cui, Yanghua Xiao, Haixun Wang, Yiqi Lu, and Wei Wang. 2013. Online Search of Overlapping Communities. In SIGMOD. 277--288.
[16]
Joel Dudley, Tarangini Deshpande, and Atul J. Butte. 2011. Exploiting Drug-disease Relationships for Computational Drug Repositioning. Briefings Bioinform., Vol. 12, 4 (2011), 303--311.
[17]
Shuheng Fang, Kangfei Zhao, Guanghua Li, and Jeffrey Xu Yu. 2023. Community Search: A Meta-Learning Approach. In ICDE. 2358--2371.
[18]
Yixiang Fang, Reynold Cheng, Yankai Chen, Siqiang Luo, and Jiafeng Hu. 2017a. Effective and Efficient Attributed Community Search. VLDBJ, Vol. 26, 6 (2017), 803--828.
[19]
Yixiang Fang, Reynold Cheng, Xiaodong Li, Siqiang Luo, and Jiafeng Hu. 2017b. Effective Community Search over Large Spatial Graphs. PVLDB, Vol. 10, 6 (2017), 709--720.
[20]
Yixiang Fang, Reynold Cheng, Siqiang Luo, and Jiafeng Hu. 2016. Effective Community Search for Large Attributed Graphs. PVLDB, Vol. 9, 12 (2016), 1233--1244.
[21]
Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, and Xuemin Lin. 2020a. A Survey of Community Search over Big Graphs. VLDBJ, Vol. 29, 1 (2020), 353--392.
[22]
Yixiang Fang, Zhongran Wang, Reynold Cheng, Hongzhi Wang, and Jiafeng Hu. 2019. Effective and Efficient Community Search Over Large Directed Graphs. IEEE Trans. Knowl. Data Eng., Vol. 31, 11 (2019), 2093--2107.
[23]
Yixiang Fang, Yixing Yang, Wenjie Zhang, Xuemin Lin, and Xin Cao. 2020b. Effective and Efficient Community Search over Large Heterogeneous Information Networks. PVLDB, Vol. 13, 6 (2020), 854--867.
[24]
Jun Gao, Jiazun Chen, Zhao Li, and Ji Zhang. 2021. ICS-GNN: Lightweight Interactive Community Search via Graph Neural Network. PVLDB, Vol. 14, 6 (2021), 1006--1018.
[25]
Xiaoxuan Gou, Xiaoliag Xu, Xiangying Wu, Runhuai Chen, Yuxiang Wang, Tianxing Wu, and Xiangyu Ke. 2023. Effective and Efficient Community Search with Graph Embeddings. In ECAI.
[26]
William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NeurIPS, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 1024--1034.
[27]
Peng Han, Silin Zhou, Jie Yu, Zichen Xu, Lisi Chen, and Shuo Shang. 2023. Personalized Re-ranking for Recommendation with Mask Pretraining. Data Science and Engineering, Vol. 8, 4 (2023), 357--367.
[28]
Farnoosh Hashemi, Ali Behrouz, and Milad Rezaei Hajidehi. 2023. CS-TGN: Community Search via Temporal Graph Neural Networks. In Companion Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, Ying Ding, Jie Tang, Juan F. Sequeda, Lora Aroyo, Carlos Castillo, and Geert-Jan Houben (Eds.). ACM, 1196--1203. https://doi.org/10.1145/3543873.3587654
[29]
Allen L Hu and Keith CC Chan. 2013. Utilizing both topological and attribute information for protein complex identification in PPI networks. IEEE/ACM transactions on computational biology and bioinformatics, Vol. 10, 3 (2013), 780--792.
[30]
Jiafeng Hu, Xiaowei Wu, Reynold Cheng, Siqiang Luo, and Yixiang Fang. 2016. Querying Minimal Steiner Maximum-Connected Subgraphs in Large Graphs. In CIKM. 1241--1250.
[31]
Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying K-truss Community in Large and Dynamic Graphs. In SIGMOD. 1311--1322.
[32]
Xin Huang and Laks V. S. Lakshmanan. 2017. Attribute-Driven Community Search. PVLDB, Vol. 10, 9 (2017), 949--960.
[33]
Xin Huang, Laks V. S. Lakshmanan, Jeffrey Xu Yu, and Hong Cheng. 2015. Approximate Closest Community Search in Networks. PVLDB, Vol. 9, 4 (2015), 276--287.
[34]
Yuli Jiang, Yu Rong, Hong Cheng, Xin Huang, Kangfei Zhao, and Junzhou Huang. 2021. QD-GCN: Query-Driven Graph Convolutional Networks for Attributed Community Search. arXiv, Vol. abs/2104.03583 (2021).
[35]
Yuli Jiang, Yu Rong, Hong Cheng, Xin Huang, Kangfei Zhao, and Junzhou Huang. 2022. Query Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive Attributed. PVLDB, Vol. 15, 6 (2022), 1243--1255.
[36]
Jon M. Kleinberg. 2000. Navigation in A Small World. Nature, Vol. 406 (2000), 845--845.
[37]
Ling Li, Siqiang Luo, Yuhai Zhao, Caihua Shan, Zhengkui Wang, and Lu Qin. 2023. COCLEP: Contrastive Learning-based Semi-Supervised Community Search. In ICDE. 2483--2495.
[38]
Yuqi Li, Guosheng Zang, Chunyao Song, Xiaojie Yuan, and Tingjian Ge. 2024. Leveraging Semantic Information for Enhanced Community Search in Heterogeneous Graphs. Data Science and Engineering (2024), 1--18.
[39]
Qing Liu, Xuankun Liao, Xin Huang, Jianliang Xu, and Yunjun Gao. 2023. Distributed ((α), (β))-Core Decomposition over Bipartite Graphs. In 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3--7, 2023. IEEE, 909--921. https://doi.org/10.1109/ICDE55515.2023.00075
[40]
Qing Liu, Minjun Zhao, Xin Huang, Jianliang Xu, and Yunjun Gao. 2020a. Truss-based Community Search over Large Directed Graphs. In SIGMOD. 2183--2197.
[41]
Qing Liu, Yifan Zhu, Minjun Zhao, Xin Huang, Jianliang Xu, and Yunjun Gao. 2020b. VAC: Vertex-Centric Attributed Community Search. In ICDE. 937--948.
[42]
Linyuan Lü, Tao Zhou, Qian-Ming Zhang, and H Eugene Stanley. 2016. The H-index of A Network Node and Its Relation to Degree and Coreness. Nature communications, Vol. 7, 1 (2016), 10168.
[43]
Wensheng Luo, Xu Zhou, Kenli Li, Yunjun Gao, and Keqin Li. 2023. Efficient Influential Community Search in Large Uncertain Graphs. IEEE Trans. Knowl. Data Eng., Vol. 35, 4 (2023), 3779--3793. https://doi.org/10.1109/TKDE.2021.3131611
[44]
Yury A. Malkov and Dmitry A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 42, 4 (2020), 824--836.
[45]
Xiaoye Miao, Yue Liu, Lu Chen, Yunjun Gao, and Jianwei Yin. 2022. Reliable Community Search on Uncertain Graphs. In ICDE. 1166--1179.
[46]
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning Convolutional Neural Networks for Resource Efficient Inference. In ICLR.
[47]
Galileo Mark Namata, Ben London, Lise Getoor, and Bert Huang. 2012. Query-Driven Active Surveying for Collective Classification. In International Workshop on Mining and Learning with Graphs (MLG).
[48]
Lutz Oettershagen, Nils M. Kriege, and Petra Mutzel. 2023. A Higher-Order Temporal H-Index for Evolving Networks. In KDD, Ambuj K. Singh, Yizhou Sun, Leman Akoglu, Dimitrios Gunopulos, Xifeng Yan, Ravi Kumar, Fatma Ozcan, and Jieping Ye (Eds.). 1770--1782.
[49]
Jeff Z. Pan, Elspeth Edelstein, Patrik Bansky, and Adam Wyner. 2021. A Knowledge Graph Based Approach to Social Science Surveys. Data Intell., Vol. 3, 4 (2021), 477--506. https://doi.org/10.1162/dint_a_00107
[50]
Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI, Blai Bonet and Sven Koenig (Eds.). 4292--4293.
[51]
Benedek Rozemberczki and Rik Sarkar. 2020. Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models. In CIKM. 1325--1334.
[52]
Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. 2008. Collective Classification in Network Data. AI Mag., Vol. 29, 3 (2008), 93--106.
[53]
Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. 2006. BioGRID: A General Repository for Interaction Datasets. Nucleic Acids Res., Vol. 34, Database-Issue (2006), 535--539.
[54]
Longxu Sun, Xin Huang, Ronghua Li, Byron Choi, and Jianliang Xu. 2020. Index-based Intimate-Core Community Search in Large Weighted Graphs. IEEE Trans. Knowl. Data Eng. (2020).
[55]
Charalampos E. Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria A. Tsiarli. 2013. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD. 104--112.
[56]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NeruIPS. 5998--6008.
[57]
Jia Wang and James Cheng. 2012. Truss Decomposition in Massive Networks. PVLDB, Vol. 5, 9 (2012), 812--823.
[58]
Jianwei Wang, Kai Wang, Xuemin Lin, Wenjie Zhang, and Ying Zhang. [n.,d.]. Neural Attributed Community Search at Billion Scale. Proc. ACM Manag. Data, Article 251 ( [n.,d.]), bibinfonumpages25 pages.
[59]
Jianwei Wang, Kai Wang, Xuemin Lin, Wenjie Zhang, and Ying Zhang. 2024. Efficient Unsupervised Community Search with Pre-trained Graph Transformer. arxiv: 2403.18869 [cs.SI]
[60]
Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2022. Navigable proximity graph-driven native hybrid queries with structured and unstructured constraints. arXiv preprint arXiv:2203.13601 (2022).
[61]
Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021b. A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search. PVLDB, Vol. 14, 11 (2021), 1964--1978.
[62]
Yue Wang, Xun Jian, Zhenhua Yang, and Jia Li. 2017. Query optimal k-plex based community in graphs. Data Science and Engineering, Vol. 2 (2017), 257--273.
[63]
Yuxiang Wang, Jun Liu, Xiaoliang Xu, Xiangyu Ke, Tianxing Wu, and Xiaoxuan Gou. 2023 a. Efficient and Effective Academic Expert Finding on Heterogeneous Graphs through (k, P)-Core based Embedding. ACM Trans. Knowl. Discov. Data, Vol. 17, 6 (2023), 85:1--85:35.
[64]
Yuxiang Wang, Xiaoliang Xu, Qifan Hong, Jiahui Jin, and Tianxing Wu. 2021a. Top-k star queries on knowledge graphs through semantic-aware bounding match scores. Knowledge-Based Systems, Vol. 213 (2021), 106655.
[65]
Yuxiang Wang, Yuyang Zhao, Xiaoliang Xu, Yue Wu, Tianxing Wu, and Xiangyu Ke. 2023 b. Random Walk-based Community Key-members Search over Large Graphs. arxiv: 2210.17403 [cs.DB]
[66]
Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf, and Junchi Yan. 2023 a. DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion. In ICLR.
[67]
Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian, and Junchi Yan. 2023 b. Simplifying and Empowering Transformers for Large-Graph Representations. In NeurIPS.
[68]
Yanping Wu, Jun Zhao, Renjie Sun, Chen Chen, and Xiaoyang Wang. 2021. Efficient personalized influential community search in large networks. Data Science and Engineering, Vol. 6, 3 (2021), 310--322.
[69]
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. 2020. On Layer Normalization in the Transformer Architecture. In ICLR, Vol. 119. 10524--10533.
[70]
Xiaoliang Xu, Jun Liu, Yuxiang Wang, and Xiangyu Ke. 2022. Academic Expert Finding via (k, P)-Core based Embedding over Heterogeneous Graphs. In ICDE.
[71]
Zongyu Xu, Yihao Zhang, Long Yuan, Yuwen Qian, Zi Chen, Mingliang Zhou, Qin Mao, and Weibin Pan. 2023. Effective Community Search on Large Attributed Bipartite Graphs. Int. J. Pattern Recognit. Artif. Intell., Vol. 37, 2 (2023), 2359002:1--2359002:25. https://doi.org/10.1142/S0218001423590024
[72]
Yixing Yang, Yixiang Fang, Xuemin Lin, and Wenjie Zhang. 2020. Effective and Efficient Truss Computation over Large Heterogeneous Information Networks. In ICDE. 901--912.
[73]
Kai Yao and Lijun Chang. 2021. Efficient Size-Bounded Community Search over Large Networks. PVLDB, Vol. 14, 8 (2021), 1441--1453.
[74]
Junhao Ye, Yuanyuan Zhu, and Lu Chen. 2023 a. Top-r keyword-based community search in attributed graphs. In ICDE. 1652--1664.
[75]
Junhao Ye, Yuanyuan Zhu, and Lu Chen. 2023 b. Top-r keyword-based community search in attributed graphs. In 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3--7, 2023. IEEE, 1652--1664. https://doi.org/10.1109/ICDE55515.2023.00130
[76]
Long Yuan, Lu Qin, Wenjie Zhang, Lijun Chang, and Jianye Yang. 2018. Index-Based Densest Clique Percolation Community Search in Networks. IEEE Trans. Knowl. Data Eng., Vol. 30, 5 (2018), 922--935.
[77]
Zhiwei Zhang, Xin Huang, Jianliang Xu, Byron Choi, and Zechao Shang. 2019. Keyword-Centric Community Search. In ICDE. 422--433.
[78]
Donglai Zhu, Hengshuai Yao, Bei Jiang, and Peng Yu. 2018. Negative Log Likelihood Ratio Loss for Deep Neural Network Classification. CoRR, Vol. abs/1804.10690 (2018).

Cited By

View all
  • (2024)Conditional Community Search Based on Weight InformationElectronics10.3390/electronics1321432113:21(4321)Online publication date: 4-Nov-2024
  • (2024)Efficient Unsupervised Community Search with Pre-Trained Graph TransformerProceedings of the VLDB Endowment10.14778/3665844.366585317:9(2227-2240)Online publication date: 6-Aug-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024
3164 pages
ISBN:9798400704314
DOI:10.1145/3626772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2024

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. community search
  2. graph transformer

Qualifiers

  • Research-article

Conference

SIGIR 2024
Sponsor:

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)422
  • Downloads (Last 6 weeks)63
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Conditional Community Search Based on Weight InformationElectronics10.3390/electronics1321432113:21(4321)Online publication date: 4-Nov-2024
  • (2024)Efficient Unsupervised Community Search with Pre-Trained Graph TransformerProceedings of the VLDB Endowment10.14778/3665844.366585317:9(2227-2240)Online publication date: 6-Aug-2024

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media