Zheng et al., 2022 - Google Patents
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architecturesZheng et al., 2022
View PDF- Document ID
- 13439155535787267700
- Author
- Zheng Z
- Yang X
- Zhao P
- Long G
- Zhu K
- Zhu F
- Zhao W
- Liu X
- Yang J
- Zhai J
- Song S
- Lin W
- Publication year
- Publication venue
- Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
External Links
Snippet
This work reveals that memory-intensive computation is a rising performance-critical factor in recent machine learning models. Due to a unique set of new challenges, existing ML optimizing compilers cannot perform efficient fusion under complex two-level dependencies …
- 238000005457 optimization 0 title abstract description 17
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zheng et al. | AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures | |
Gao et al. | Estimating GPU memory consumption of deep learning models | |
Xin et al. | Accelerating human-in-the-loop machine learning: Challenges and opportunities | |
Jia et al. | TASO: optimizing deep learning computation with automatic generation of graph substitutions | |
Dai et al. | Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment | |
Zhang et al. | Retiarii: A deep learning {Exploratory-Training} framework | |
Zheng et al. | Fusionstitching: boosting memory intensive computations for deep learning workloads | |
Wang et al. | Deep learning at scale and at ease | |
Gu et al. | Improving execution concurrency of large-scale matrix multiplication on distributed data-parallel platforms | |
Phillips et al. | A CUDA implementation of the High Performance Conjugate Gradient benchmark | |
Totoni et al. | HPAT: high performance analytics with scripting ease-of-use | |
Liou et al. | GEVO: GPU code optimization using evolutionary computation | |
Rausch et al. | A data-centric optimization framework for machine learning | |
de Andrade et al. | Software deployment on heterogeneous platforms: A systematic mapping study | |
Fang et al. | Aristotle: A performance impact indicator for the OpenCL kernels using local memory | |
Yu et al. | Lorien: Efficient deep learning workloads delivery | |
Chen et al. | Slapo: A schedule language for progressive optimization of large deep learning model training | |
Kim et al. | {STRADS-AP}: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model | |
Ivanov et al. | Sten: Productive and efficient sparsity in pytorch | |
US11573777B2 (en) | Method and apparatus for enabling autonomous acceleration of dataflow AI applications | |
Huang et al. | Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms | |
Kang et al. | PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems | |
Alemany et al. | Jespipe: a plugin-based, open MPI framework for adversarial machine learning analysis | |
Remmelg et al. | High-level hardware feature extraction for GPU performance prediction of stencils | |
Benoit et al. | Using an intermediate representation to map workloads on heterogeneous parallel systems |