Huang et al., 2024 - Google Patents
An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression
- Document ID
- 323249764786909653
- Author
- Huang J
- Di S
- Yu X
- Zhai Y
- Zhang Z
- Liu J
- Lu X
- Raffenetti K
- Zhou H
- Zhao K
- Chen Z
- Cappello F
- Guo Y
- Thakur R
- Publication year
- 2024
- Publication venue
- 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Snippet
With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications turns out to be a critical bottleneck in large-scale distributed and parallel processing. The large message size …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurrence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Moreland et al. | An image compositing solution at scale | |
Di et al. | Fast error-bounded lossy HPC data compression with SZ | |
Ma et al. | Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication | |
Zou et al. | FlexAnalytics: a flexible data analytics framework for big data applications with I/O performance improvement | |
US11436065B2 (en) | System for efficient large-scale data distribution in distributed and parallel processing environment | |
Zhou et al. | Designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters | |
Knecht et al. | Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH | |
Peterka et al. | A configurable algorithm for parallel image-compositing applications | |
Huang et al. | An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression | |
Yu et al. | Ultrafast error-bounded lossy compression for scientific datasets | |
JPWO2014061481A1 (en) | Data transfer apparatus and data transfer system using adaptive compression algorithm | |
Cheng et al. | HAFLO: GPU-based acceleration for federated logistic regression | |
Al Sideiri et al. | CUDA implementation of fractal image compression | |
US20190281316A1 (en) | High efficiency video coding method and apparatus, and computer-readable storage medium | |
CN117435855A (en) | Method for performing convolution operation, electronic device, and storage medium | |
Barrett et al. | Reducing the bulk in the bulk synchronous parallel model | |
Zhou et al. | Accelerating broadcast communication with GPU compression for deep learning workloads | |
Wu et al. | Memory-efficient quantum circuit simulation by using lossy data compression | |
Xu et al. | Scaling up data-parallel analytics platforms: Linear algebraic operation cases | |
US10210136B2 (en) | Parallel computer and FFT operation method | |
Huang et al. | POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters | |
Markov et al. | CGX: adaptive system support for communication-efficient deep learning | |
Suresh et al. | Network assisted non-contiguous transfers for GPU-aware MPI libraries | |
KR20220142059A (en) | In-memory Decoding Cache and Its Management Scheme for Accelerating Deep Learning Batching Process | |
Koyama et al. | Scalable data parallel distributed training for graph neural networks |