Load Balanced PIM-Based Graph Processing
Zhao et al., 2024
- Document ID
- 7973059322282298893
- Author
- Zhao X
- Chen S
- Kang Y
- Publication year
- 2024
- Publication venue
- ACM Transactions on Design Automation of Electronic Systems
Snippet
Graph processing is widely used for many modern applications, such as social networks, recommendation systems, and knowledge graphs. However, processing large-scale graphs on traditional Von Neumann architectures is challenging due to the irregular graph data and …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurrence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
- G06F3/0601—Dedicated interfaces to storage systems
- G06F3/0628—Dedicated interfaces to storage systems making use of a particular technique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
- G06F3/0601—Dedicated interfaces to storage systems
- G06F3/0602—Dedicated interfaces to storage systems specifically adapted to achieve a particular effect
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Chen et al. | PowerLyra: Differentiated graph computation and partitioning on skewed graphs | |
Zhang et al. | GraphP: Reducing communication for PIM-based graph processing with efficient data partition | |
Barker et al. | Entering the petaflop era: the architecture and performance of Roadrunner | |
Sariyüce et al. | Betweenness centrality on GPUs and heterogeneous architectures | |
Satish et al. | Navigating the maze of graph analytics frameworks using massive graph datasets | |
Satish et al. | Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing | |
Chu et al. | NV-Group: link-efficient reduction for distributed deep learning on modern dense GPU systems | |
Gharaibeh et al. | Efficient large-scale graph processing on hybrid CPU and GPU systems | |
Kim et al. | CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster | |
Wang et al. | Grus: Toward unified-memory-efficient high-performance graph processing on GPU | |
Addisie et al. | Heterogeneous memory subsystem for natural graph analytics | |
Chen et al. | ThunderGP: Resource-efficient graph processing framework on FPGAs with HLS | |
Wang et al. | A conflict-free scheduler for high-performance graph processing on multi-pipeline FPGAs | |
Mohanamuraly et al. | Hardware locality-aware partitioning and dynamic load-balancing of unstructured meshes for large-scale scientific applications | |
Zhao et al. | Load Balanced PIM-Based Graph Processing | |
Kalnis et al. | Mizan: Optimizing graph mining in large parallel systems | |
Mirsadeghi et al. | PTRAM: A parallel topology-and routing-aware mapping framework for large-scale HPC systems | |
Faraji et al. | Exploiting heterogeneity of communication channels for efficient GPU selection on multi-GPU nodes | |
Li et al. | Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing | |
Addisie et al. | Centaur: Hybrid processing in on/off-chip memory architecture for graph analytics | |
Wang et al. | KLSAT: An application mapping algorithm based on Kernighan–Lin partition and simulated annealing for a specific WK-recursive NoC architecture | |
Fan et al. | Scalable and efficient graph traversal on high-throughput cluster | |
Anastasiadis et al. | PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems | |
Su et al. | GraFlex: Flexible Graph Processing on FPGAs through Customized Scalable Interconnection Network | |
Liu et al. | Topology‐Aware Strategy for MPI‐IO Operations in Clusters |