Zhao et al., 2024 - Google Patents

Load Balanced PIM-Based Graph Processing

Zhao et al., 2024

Document ID: 7973059322282298893
Author: Zhao X; Chen S; Kang Y
Publication year: 2024
Publication venue: ACM Transactions on Design Automation of Electronic Systems

External Links

Cited by

Snippet

Graph processing is widely used for many modern applications, such as social networks, recommendation systems, and knowledge graphs. However, processing large-scale graphs on traditional Von Neumann architectures is challenging due to the irregular graph data and …

Continue reading at dl.acm.org (other versions)

238000012545 processing 0 title abstract description 58

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
- G06F3/0601—Dedicated interfaces to storage systems
- G06F3/0628—Dedicated interfaces to storage systems making use of a particular technique
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
- G06F3/0601—Dedicated interfaces to storage systems
- G06F3/0602—Dedicated interfaces to storage systems specifically adapted to achieve a particular effect

Similar Documents

Publication	Publication Date	Title
Chen et al.	2019	Powerlyra: Differentiated graph computation and partitioning on skewed graphs
Zhang et al.	2018	GraphP: Reducing communication for PIM-based graph processing with efficient data partition
Barker et al.	2008	Entering the petaflop era: the architecture and performance of Roadrunner
Sariyüce et al.	2013	Betweenness centrality on GPUs and heterogeneous architectures
Satish et al.	2014	Navigating the maze of graph analytics frameworks using massive graph datasets
Satish et al.	2012	Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing
Chu et al.	2020	Nv-group: link-efficient reduction for distributed deep learning on modern dense gpu systems
Gharaibeh et al.	2013	Efficient large-scale graph processing on hybrid CPU and GPU systems
Kim et al.	2012	CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster
Wang et al.	2021	Grus: Toward unified-memory-efficient high-performance graph processing on gpu
Addisie et al.	2018	Heterogeneous memory subsystem for natural graph analytics
Chen et al.	2022	ThunderGP: Resource-efficient graph processing framework on FPGAs with HLS
Wang et al.	2020	A conflict-free scheduler for high-performance graph processing on multi-pipeline FPGAs
Mohanamuraly et al.	2020	Hardware locality-aware partitioning and dynamic load-balancing of unstructured meshes for large-scale scientific applications
Zhao et al.	2024	Load Balanced PIM-Based Graph Processing
Kalnis et al.	2012	Mizan: Optimizing graph mining in large parallel systems
Mirsadeghi et al.	2016	PTRAM: A parallel topology-and routing-aware mapping framework for large-scale HPC systems
Faraji et al.	2017	Exploiting heterogeneity of communication channels for efficient GPU selection on multi-GPU nodes
Li et al.	2019	Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing
Addisie et al.	2020	Centaur: Hybrid processing in on/off-chip memory architecture for graph analytics
Wang et al.	2019	KLSAT: An application mapping algorithm based on Kernighan–Lin partition and simulated annealing for a specific WK-recursive NoC architecture
Fan et al.	2021	Scalable and efficient graph traversal on high-throughput cluster
Anastasiadis et al.	2023	PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems
Su et al.	2024	GraFlex: Flexible Graph Processing on FPGAs through Customized Scalable Interconnection Network
Liu et al.	2018	Topology‐Aware Strategy for MPI‐IO Operations in Clusters