Huang et al., 2024 - Google Patents

An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression

Document ID: 323249764786909653
Author: Huang J; Di S; Yu X; Zhai Y; Zhang Z; Liu J; Lu X; Raffenetti K; Zhou H; Zhao K; Chen Z; Cappello F; Guo Y; Thakur R
Publication year: 2024
Publication venue: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Snippet

With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications turns out to be a critical bottleneck in large-scale distributed and parallel processing. The large message size …

Full text available at arxiv.org (PDF).

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
            - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
              - G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
                - G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
                  - G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
                - G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
              - G06F9/5061—Partitioning or combining of resources
              - G06F9/5083—Techniques for rebalancing the load in a distributed system
            - G06F9/54—Interprogramme communication; Intertask communication
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
          - G06F15/163—Interprocessor communication
            - G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
              - G06F15/17306—Intercommunication techniques
      - G06F11/00—Error detection; Error correction; Monitoring
        - G06F11/07—Error detection; Error correction; Monitoring responding to the occurrence of a fault, e.g. fault tolerance
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30067—File systems; File servers
            - G06F17/30129—Details of further file system functionalities
              - G06F17/3015—Redundancy elimination performed by the file system
                - G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
        - G06F17/10—Complex mathematical operations
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
      - G06F2209/00—Indexing scheme relating to G06F9/00
        - G06F2209/50—Indexing scheme relating to G06F9/50
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication	Publication Date	Title
Moreland et al.	2011	An image compositing solution at scale
Di et al.	2016	Fast error-bounded lossy HPC data compression with SZ
Ma et al.	2017	Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication
Zou et al.	2014	FlexAnalytics: a flexible data analytics framework for big data applications with I/O performance improvement
US11436065B2 (en)	2022-09-06	System for efficient large-scale data distribution in distributed and parallel processing environment
Zhou et al.	2021	Designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters
Knecht et al.	2010	Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH
Peterka et al.	2009	A configurable algorithm for parallel image-compositing applications
Huang et al.	2023	An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression
Yu et al.	2022	Ultrafast error-bounded lossy compression for scientific datasets
JPWO2014061481A1 (en)	2016-09-05	Data transfer apparatus and data transfer system using adaptive compression algorithm
Cheng et al.	2021	HAFLO: GPU-based acceleration for federated logistic regression
Al Sideiri et al.	2020	CUDA implementation of fractal image compression
US20190281316A1 (en)	2019-09-12	High efficiency video coding method and apparatus, and computer-readable storage medium
CN117435855A (en)	2024-01-23	Method for performing convolution operation, electronic device, and storage medium
Barrett et al.	2013	Reducing the bulk in the bulk synchronous parallel model
Zhou et al.	2022	Accelerating broadcast communication with GPU compression for deep learning workloads
Wu et al.	2018	Memory-efficient quantum circuit simulation by using lossy data compression
Xu et al.	2017	Scaling up data-parallel analytics platforms: Linear algebraic operation cases
US10210136B2 (en)	2019-02-19	Parallel computer and FFT operation method
Huang et al.	2024	POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters
Markov et al.	2022	CGX: adaptive system support for communication-efficient deep learning
Suresh et al.	2022	Network assisted non-contiguous transfers for GPU-aware MPI libraries
KR20220142059A (en)	2022-10-21	In-memory Decoding Cache and Its Management Scheme for Accelerating Deep Learning Batching Process
Koyama et al.	2022	Scalable data parallel distributed training for graph neural networks