[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access
Just Accepted

GraphService: Topology-aware Constructor for Large-scale Graph Applications

Online AM: 17 August 2024 Publication History

Abstract

Graph-based services are becoming integrated into everyday life through graph applications and graph learning systems. While traditional graph processing approaches boast excellent throughput with millisecond-level processing time, the construction phase before executing kernel graph operators (e.g., BFS, SSSP) can take up to tens of hours, severely impacting the quality of graph service. Is it feasible to develop a fast graph constructor that can complete the construction process within minutes, or even seconds?
This paper aims to answer this question. We present GraphService, a flexible and efficient graph constructor for fast graph applications. To facilitate graph applications with better service, we equip GraphService with a hierarchy-aware graph partitioner based on communication topology, as well as a graph topology-aware compression by exploiting a huge number of identical-degree vertices within graph topology. Our evaluation, performed on a range of graph operations and datasets, shows that GraphService significantly reduces communication cost by three orders of magnitude improvement to construct a graph. Furthermore, we tailor GraphService for downstream graph tasks and deploy it on a production supercomputer using 79,024 computing nodes, achieving a remarkable graph processing throughput that outperforms the top-ranked supercomputer on the latest Graph500 list, with construction time reduced by orders of magnitude.

References

[1]
2021. https://graph500.org/. (2021).
[2]
2022. https://law.di.unimi.it/webdata/twitter-2010/. (2022).
[3]
2022. https://lemurproject.org/clueweb12/. (2022).
[4]
2022. https://www.laitimes.com/en/article/85ga_86m5.html. (2022).
[5]
2024. http://www.diag.uniroma1.it//challenge9/download.shtml. (2024).
[6]
Zainab Abbas, Vasiliki Kalavri, Paris Carbone, and Vladimir Vlassov. 2018. Streaming graph partitioning: an experimental study. Proceedings of the VLDB Endowment 11, 11 (2018), 1590–1603.
[7]
Arvind Arasu, Jasmine Novak, Andrew Tomkins, and John Tomlin. 2002. PageRank computation and the structure of the web: Experiments and algorithms. In Proceedings of the eleventh international World Wide Web conference, poster track. 107–117.
[8]
Nicolas Aspert, Volodymyr Miz, Benjamin Ricaud, and Pierre Vandergheynst. 2019. A graph-structured dataset for Wikipedia research. In Companion Proceedings of The 2019 World Wide Web Conference. 1188–1193.
[9]
Graph500 benchmark BFS. 2023. The Graph 500 List. https://graph500.org/?page_id=1240 Last accessed 03 Nov 2023.
[10]
M. Blocksome, C. Archer, T. Inglett, P. McCarthy, M. Mundy, J. Ratterman, A. Sidelnik, B. Smith, G. Almási, J. Castanos, D. Lieber, J. Moreira, S. Krishnamoorthy, V. Tipparaju, and J. Nieplocha. 2006. Design and Implementation of a One-Sided Communication Interface for the IBM eServer Blue Gene. ACM/IEEE SC 2006 Conference (SC’06)(2006), 54–54.
[11]
Aydin Buluc and John R Gilbert. 2008. On the representation and multiplication of hypersparse matrices. In 2008 IEEE International Symposium on Parallel and Distributed Processing. IEEE, 1–11.
[12]
Huanqi Cao, Yuanwei Wang, Haojie Wang, Heng Lin, Zixuan Ma, Wanwang Yin, and Wenguang Chen. 2022. Scaling graph traversal to 281 trillion edges with 40 million cores. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 234–245.
[13]
Zhenhua Chang, Ding Ding, and Youhao Xia. 2021. A graph-based QoS prediction approach for web service recommendation. Applied Intelligence(2021), 1–15.
[14]
Fabio Checconi and Fabrizio Petrini. 2014. Traversing trillions of edges in real time: Graph exploration on large-scale parallel machines. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE, 425–434.
[15]
R. Chen, J. Shi, Y. Chen, and H. Chen. 2015. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. European Conference on Computer Systems(2015), 1–15.
[16]
Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing (TOPC) 5, 3 (2019), 1–39.
[17]
Xinhai Chen, Peizhen Xie, Lihua Chi, Jie Liu, and Chunye Gong. 2018. An efficient SIMD compression format for sparse matrix-vector multiplication. Concurrency and Computation: Practice and Experience 30, 23(2018), e4800.
[18]
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. 752–768.
[19]
Wenfei Fan, Tao He, Longbin Lai, Xue Li, Yong Li, Zhao Li, Zhengping Qian, Chao Tian, Lei Wang, Jingbo Xu, et al. 2021. GraphScope: a unified engine for big graph processing. Proceedings of the VLDB Endowment 14, 12 (2021), 2879–2892.
[20]
Wenfei Fan, Muyang Liu, Chao Tian, Ruiqi Xu, and Jingren Zhou. 2020. Incrementalization of graph partitioning algorithms. Proceedings of the VLDB Endowment 13, 8 (2020), 1261–1274.
[21]
A. Faraj, S. Kumar, B. Smith, A. Mamidala, and J. Gunnels. 2009. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations. In 2009 17th IEEE Symposium on High Performance Interconnects. 63–72. https://doi.org/10.1109/HOTI.2009.12
[22]
Per Fuchs, Domagoj Margan, and Jana Giceva. 2023. Sortledton: a Universal Graph Data Structure. ACM SIGMOD Record 52, 1 (2023), 17–25.
[23]
Pablo Fuentes, Mariano Benito, Enrique Vallejo, José Luis Bosque, Ramón Beivide, Andreea Anghel, Germán Rodríguez, Mitch Gusat, Cyriel Minkenberg, and Mateo Valero. 2017. A scalable synthetic traffic model of Graph500 for computer networks analysis. Concurrency and Computation: Practice and Experience 29, 24(2017), e4231.
[24]
Xinbiao Gan, Jiaqi Guo, Peilin Guo, Guang Wu, Jiaqi Si, Songzhu Mei, Cong Liu, and Tiejun Li. 2023. GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media Access. In Proceedings of the 31st ACM International Conference on Multimedia. 8984–8993.
[25]
Xinbiao Gan, Guang Wu, Cong Liu, Jiaqi Si, Xuguang Chen, Bo Yang, and Tiejun Li. 2022. TianheQueries: Ultra-Fast and Scalable Graph Queries on Tianhe Supercomputer. In 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys). IEEE, 1153–1158.
[26]
Xinbiao Gan, Guang Wu, Shenghao Qiu, Feng Xiong, Jiaqi Si, Jianbin Fang, Dezun Dong, Chunye Gong, Tiejun Li, and Zheng Wang. 2024. GraphCube: Interconnection Hierarchy-aware Graph Processing. In Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. 160–174.
[27]
Xinbiao Gan, Yiming Zhang, Ruibo Wang, Tiejun Li, Tiaojie Xiao, Ruigeng Zeng, Jie Liu, and Kai Lu. 2021. TianheGraph: Customizing Graph Search for Graph500 on Tianhe Supercomputer. IEEE Transactions on Parallel and Distributed Systems 33, 4 (2021), 941–951.
[28]
Xinbiao Gan, Yiming Zhang, Ruibo Wang, Tiejun Li, Tiaojie Xiao, Ruigeng Zeng, Jie Liu, and Kai Lu. 2021. TianheGraph: Customizing Graph Search for Graph500 on Tianhe Supercomputer. IEEE Transactions on Parallel and Distributed Systems (2021), 1–1. https://doi.org/10.1109/TPDS.2021.31007852
[29]
Xinbiao Gan, Yiming Zhang, Ruigeng Zeng, Jie Liu, Ruibo Wang, Tiejun Li, Li Chen, and Kai Lu. 2022. Xtree: Traversal-based partitioning for extreme-scale graph processing on supercomputers. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2046–2059.
[30]
Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. {PowerGraph}: Distributed {Graph-Parallel} Computation on Natural Graphs. In 10th USENIX symposium on operating systems design and implementation (OSDI 12). 17–30.
[31]
Muhammad Imran, Gábor E Gévay, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl. 2022. Fast datalog evaluation for batch and stream graph processing. World Wide Web 25, 2 (2022), 971–1003.
[32]
George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientific Computing 20, 1 (1998), 359–392.
[33]
George Karypis, Kirk Schloegel, and Vipin Kumar. 1997. Parmetis: Parallel graph partitioning and sparse matrix ordering library. (1997).
[34]
Deyu Kong, Xike Xie, and Zhuoxu Zhang. 2022. Clustering-based partitioning for large web graphs. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 593–606.
[35]
S. Kumar, A. Mamidala, Daniel Faraj, B. Smith, M. Blocksome, B. Cernohous, D. Miller, J. Parker, J. Ratterman, P. Heidelberger, D. Chen, and B. Steinmacher-Burow. 2012. PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer. 2012 IEEE 26th International Parallel and Distributed Processing Symposium (2012), 763–773.
[36]
Mingzhe Li, Xiaoyi Lu, S. Potluri, Khaled Hamidouche, J. Jose, K. Tomko, and D. Panda. 2014. Scalable Graph500 design with MPI-3 RMA. 2014 IEEE International Conference on Cluster Computing (CLUSTER) (2014), 230–238.
[37]
Yishui Li, Peizhen Xie, Xinhai Chen, Jie Liu, Bo Yang, Shengguo Li, Chunye Gong, Xinbiao Gan, and Han Xu. 2020. VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors. The Journal of Supercomputing 76 (2020), 2063–2081.
[38]
Z Li, C Wu, and Y Li. 2021. FEP-based large-scale virtual screening for effective drug discovery against COVID-19. In Int. Conf. High Performance Computing, Networking, Storage, and Analysis.
[39]
Heng Lin, Xiongchao Tang, Bowen Yu, Youwei Zhuo, Wenguang Chen, Jidong Zhai, Wanwang Yin, and Weimin Zheng. 2017. Scalable graph traversal on sunway taihulight with ten million cores. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 635–645.
[40]
Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, et al. 2018. Shentu: processing multi-trillion edge graphs on millions of cores in seconds. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 706–716.
[41]
Weifeng Liu and Brian Vinter. 2015. CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication. In Proceedings of the 29th ACM on International Conference on Supercomputing. 339–350.
[42]
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2009. Pregel: a system for large-scale graph processing.Sigmod (2009), 135–146.
[43]
Ruben Mayer and Hans-Arno Jacobsen. 2021. Hybrid edge partitioner: Partitioning large power-law graphs under memory constraints. In Proceedings of the 2021 International Conference on Management of Data. 1289–1302.
[44]
Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. 2014. Graph structure in the web—revisited: a trick of the heavy tail. In Proceedings of the 23rd international conference on World Wide Web. 427–432.
[45]
Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, and Mitsuhisa Sato. [n.d.]. Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark. ([n. d.]).
[46]
Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, and M. Sato. 2020. Performance Evaluation of Supercomputer Fugaku using Breadth-First Search Benchmark in Graph500. 2020 IEEE International Conference on Cluster Computing (CLUSTER) (2020), 408–409.
[47]
Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, and Mitsuhisa Sato. 2020. Performance Evaluation of Supercomputer Fugaku using Breadth-First Search Benchmark in Graph500. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 408–409.
[48]
Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, and Mitsuhisa Sato. 2021. Performance of the supercomputer fugaku for breadth-first search in graph500 benchmark. In High Performance Computing: 36th International Conference, ISC High Performance 2021, Virtual Event, June 24–July 2, 2021, Proceedings 36. Springer, 372–390.
[49]
Md Nahid Newaz, Hua Ming, Sayan Ghosh, Joshua Suetterlein, and Nathan R Tallent. [n.d.]. Simulating Application Agnostic Process Assignment for Graph Workloads on Dragonfly and Fat Tree topologies. ([n. d.]).
[50]
Joel Nishimura and Johan Ugander. 2013. Restreaming graph partitioning: simple versatile algorithms for advanced balancing. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. 1106–1114.
[51]
Anil Pacaci and M Tamer Özsu. 2019. Experimental analysis of streaming algorithms for graph partitioning. In Proceedings of the 2019 International Conference on Management of Data. 1375–1392.
[52]
G. Shah, J. Nieplocha, J. Mirza, Chulho Kim, R. Harrison, R. Govindaraju, K. Gildea, Paul DiNicola, and C. A. Bender. 1998. Performance and experience with LAPI-a new high-performance communication library for the IBM RS/6000 SP. Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (1998), 260–266.
[53]
Naoyuki Shida, S. Sumimoto, and Atsuya Uno. 2012. MPI Library and Low-Level Communication on the K computer.
[54]
Hyogi Sim, Awais Khan, and Sudharshan S Vazhkudai. 2020. An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers. (2020).
[55]
George M Slota, Cameron Root, Karen Devine, Kamesh Madduri, and Sivasankaran Rajamanickam. 2020. Scalable, multi-constraint, complex-objective graph partitioning. IEEE Transactions on Parallel and Distributed Systems 31, 12 (2020), 2789–2801.
[56]
Stergios Stergiou, Dipen Rughwani, and Kostas Tsioutsiouliklis. 2018. Shortcutting label propagation for distributed connected components. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 540–546.
[57]
Tina Esther Trueman, P Narayanasamy, and J Ashok Kumar. 2022. A graph-based method for ranking of cloud service providers. The Journal of Supercomputing 78, 5 (2022), 7260–7277.
[58]
Sotiris Tsioutsiouliklis, Evaggelia Pitoura, Panayiotis Tsaparas, Ilias Kleftakis, and Nikos Mamoulis. 2021. Fairness-aware pagerank. In Proceedings of the Web Conference 2021. 3815–3826.
[59]
Koji Ueno and Toyotaro Suzumura. 2012. Highly scalable graph search for the graph500 benchmark. In Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing. 149–160.
[60]
Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, and Satoshi Matsuoka. 2016. Extreme scale breadth-first search on supercomputers. In 2016 IEEE International Conference on Big Data (Big Data). IEEE, 1040–1047.
[61]
Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, and Satoshi Matsuoka. 2017. Efficient breadth-first search on massively parallel and distributed-memory machines. Data Science and Engineering 2, 1 (2017), 22–35.
[62]
Ruibo Wang, Kai Lu, Juan Chen, Wenzhe Zhang, Jinwen Li, Yuan Yuan, Pingjing Lu, Libo Huang, Shengguo Li, and Xiaokang Fan. 2020. Brief introduction of TianHe exascale prototype system. Tsinghua Science and Technology 26, 3 (2020), 361–369.
[63]
Yingheng Wang, Yaosen Min, Xin Chen, and Ji Wu. 2021. Multi-view graph contrastive representation learning for drug-drug interaction prediction. In Proceedings of the Web Conference 2021. 2921–2933.
[64]
Wikipedia. 2021. Fugaku (supercomputer). https://en.wikipedia.org/wiki/Fugaku_(supercomputer) Last accessed 20 September 2021.
[65]
Jeremiah J Wilke and Joseph P Kenny. 2020. Opportunities and limitations of Quality-of-Service (QoS) in Message Passing (MPI) applications on adaptively routed Dragonfly and Fat Tree networks.Technical Report. Sandia National Lab.(SNL-CA), Livermore, CA (United States).
[66]
Gan Xinbiao, Tan Wen, and Liu Jie. 2021. Bidirectional-Bitmap Based CSR for Reducing Large-Scale Graph Space. Journal of Computer Research and Development 58, 3(2021), 458.
[67]
Dongjin Yu, Yu Liu, Yueshen Xu, and Yuyu Yin. 2014. Personalized QoS prediction for web services using latent factor models. In 2014 IEEE international conference on services computing. IEEE, 107–114.
[68]
H. Yu, I. Chung, and J. Moreira. 2006. Topology Mapping for Blue Gene/L Supercomputer. ACM/IEEE SC 2006 Conference (SC’06)(2006), 52–52.
[69]
Yiming Zhang, Kai Lu, and Wenguang Chen. 2021. Processing extreme-scale graphs on China’s supercomputers. Commun. ACM 64, 11 (2021), 60–63.
[70]
Yiming Zhang, Haonan Wang, Menghan Jia, Jinyan Wang, Dong sheng Li, Guangtao Xue, and K. Tan. 2020. TopoX: Topology Refactorization for Minimizing Network Communication in Graph Computations. IEEE/ACM Transactions on Networking 28 (2020), 2768–2782.
[71]
Zhuoxu Zhang and Zezhong Ding. 2022. Streaming Graph Clustering for Graph Partition. In 2022 IEEE 5th International Conference on Automation, Electronics and Electrical Engineering (AUTEEE). IEEE, 880–884.
[72]
Li Zhe, Wu Chengkun, Li Yishui, et al. 2021. FEP-Based Large-Scale Virtual Screening for Effective Drug Discovery Against COVID-19 [J/OL]. (2021).
[73]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016, Kimberly Keeton and Timothy Roscoe (Eds.). USENIX Association, 301–316. https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhu

Index Terms

  1. GraphService: Topology-aware Constructor for Large-scale Graph Applications

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Architecture and Code Optimization
    ACM Transactions on Architecture and Code Optimization Just Accepted
    EISSN:1544-3973
    Table of Contents
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 17 August 2024
    Accepted: 08 August 2024
    Revised: 22 July 2024
    Received: 16 May 2024

    Check for updates

    Author Tags

    1. Graph construction
    2. Graph partitioning
    3. Graph representation
    4. Sorted graph
    5. Graph processing

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 133
      Total Downloads
    • Downloads (Last 12 months)133
    • Downloads (Last 6 weeks)43
    Reflects downloads up to 10 Dec 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media