Zheng et al., 2022 - Google Patents

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures

Document ID: 13439155535787267700
Author: Zheng Z; Yang X; Zhao P; Long G; Zhu K; Zhu F; Zhao W; Liu X; Yang J; Zhai J; Song S; Lin W
Publication year: 2022
Publication venue: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Snippet

This work reveals that memory-intensive computation is a rising performance-critical factor in recent machine learning models. Due to a unique set of new challenges, existing ML optimizing compilers cannot perform efficient fusion under complex two-level dependencies …

Continue reading at jamesthez.github.io (PDF).

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
- G06F8/70—Software maintenance or management
- G06F8/60—Software deployment
- G06F17/50—Computer-aided design
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition

Similar Documents

Publication	Publication Date	Title
Zheng et al.	2022	AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures
Gao et al.	2020	Estimating GPU memory consumption of deep learning models
Xin et al.	2018	Accelerating human-in-the-loop machine learning: Challenges and opportunities
Jia et al.	2019	TASO: optimizing deep learning computation with automatic generation of graph substitutions
Dai et al.	2022	Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment
Zhang et al.	2020	Retiarii: A deep learning Exploratory-Training framework
Zheng et al.	2020	FusionStitching: boosting memory intensive computations for deep learning workloads
Wang et al.	2016	Deep learning at scale and at ease
Gu et al.	2017	Improving execution concurrency of large-scale matrix multiplication on distributed data-parallel platforms
Phillips et al.	2014	A CUDA implementation of the High Performance Conjugate Gradient benchmark
Totoni et al.	2017	HPAT: high performance analytics with scripting ease-of-use
Liou et al.	2020	GEVO: GPU code optimization using evolutionary computation
Rausch et al.	2022	A data-centric optimization framework for machine learning
de Andrade et al.	2019	Software deployment on heterogeneous platforms: A systematic mapping study
Fang et al.	2014	Aristotle: A performance impact indicator for the OpenCL kernels using local memory
Yu et al.	2021	Lorien: Efficient deep learning workloads delivery
Chen et al.	2024	Slapo: A schedule language for progressive optimization of large deep learning model training
Kim et al.	2019	STRADS-AP: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model
Ivanov et al.	2023	STen: Productive and efficient sparsity in PyTorch
US11573777B2 (en)	2023-02-07	Method and apparatus for enabling autonomous acceleration of dataflow AI applications
Huang et al.	2024	Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms
Kang et al.	2020	PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems
Alemany et al.	2021	Jespipe: a plugin-based, open MPI framework for adversarial machine learning analysis
Remmelg et al.	2020	High-level hardware feature extraction for GPU performance prediction of stencils
Benoit et al.	2016	Using an intermediate representation to map workloads on heterogeneous parallel systems