[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Zheng et al., 2022 - Google Patents

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures

Zheng et al., 2022

View PDF
Document ID
13439155535787267700
Author
Zheng Z
Yang X
Zhao P
Long G
Zhu K
Zhu F
Zhao W
Liu X
Yang J
Zhai J
Song S
Lin W
Publication year
Publication venue
Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

External Links

Snippet

This work reveals that memory-intensive computation is a rising performance-critical factor in recent machine learning models. Due to a unique set of new challenges, existing ML optimizing compilers cannot perform efficient fusion under complex two-level dependencies …
Continue reading at jamesthez.github.io (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • G06F8/4441Reducing the execution time required by the program code
    • G06F8/4442Reducing the number of cache misses; Data prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Programme initiating; Programme switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456Parallelism detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/44Arrangements for executing specific programmes
    • G06F9/455Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3889Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/31Programming languages or programming paradigms
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50Computer-aided design
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/02Knowledge representation
    • G06N5/022Knowledge engineering, knowledge acquisition

Similar Documents

Publication Publication Date Title
Zheng et al. AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures
Gao et al. Estimating GPU memory consumption of deep learning models
Xin et al. Accelerating human-in-the-loop machine learning: Challenges and opportunities
Jia et al. TASO: optimizing deep learning computation with automatic generation of graph substitutions
Dai et al. Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment
Zhang et al. Retiarii: A deep learning {Exploratory-Training} framework
Zheng et al. Fusionstitching: boosting memory intensive computations for deep learning workloads
Wang et al. Deep learning at scale and at ease
Gu et al. Improving execution concurrency of large-scale matrix multiplication on distributed data-parallel platforms
Phillips et al. A CUDA implementation of the High Performance Conjugate Gradient benchmark
Totoni et al. HPAT: high performance analytics with scripting ease-of-use
Liou et al. GEVO: GPU code optimization using evolutionary computation
Rausch et al. A data-centric optimization framework for machine learning
de Andrade et al. Software deployment on heterogeneous platforms: A systematic mapping study
Fang et al. Aristotle: A performance impact indicator for the OpenCL kernels using local memory
Yu et al. Lorien: Efficient deep learning workloads delivery
Chen et al. Slapo: A schedule language for progressive optimization of large deep learning model training
Kim et al. {STRADS-AP}: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model
Ivanov et al. Sten: Productive and efficient sparsity in pytorch
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
Huang et al. Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms
Kang et al. PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems
Alemany et al. Jespipe: a plugin-based, open MPI framework for adversarial machine learning analysis
Remmelg et al. High-level hardware feature extraction for GPU performance prediction of stencils
Benoit et al. Using an intermediate representation to map workloads on heterogeneous parallel systems