DOI: 10.1145/3649329.3658483

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

Published: 07 November 2024

Abstract

As deep learning empowers various fields, many domain-specific non-neural-network operators have been proposed to improve the accuracy of deep learning models. Researchers often use the imperative programming paradigm (e.g., PyTorch) to express these new operators, leaving their fusion optimization to deep learning compilers. Unfortunately, the inherent side effects introduced by imperative tensor programs, especially tensor-level mutations, often make optimization extremely difficult. Previous works either fail to eliminate the side effects of tensor-level mutations or require programmers to analyze and transform them manually. In this paper, we present a holistic functionalization approach (TensorSSA) to optimizing imperative tensor programs beyond control flow boundaries. We first introduce the TensorSSA intermediate representation (IR), which removes tensor-level mutations and expands the scope and capability of operator fusion. Based on the TensorSSA IR, we propose a TensorSSA conversion algorithm that performs functionalization across control flow boundaries. TensorSSA achieves a speedup of up to 1.79X (1.34X on average) on representative deep learning tasks over state-of-the-art works.
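
For context, the following minimal PyTorch sketch (not taken from the paper; the function names and the specific rewrite are illustrative assumptions) shows the kind of tensor-level mutation under data-dependent control flow that a TensorSSA-style functionalization is meant to eliminate.

import torch

def imperative(x: torch.Tensor) -> torch.Tensor:
    # Assumes a 1-D tensor x. The in-place writes to `out` inside the
    # loop/branch are tensor-level mutations (side effects) that block
    # operator fusion across the control flow boundary.
    out = torch.zeros_like(x)
    for i in range(x.shape[0]):
        if x[i] > 0:
            out[i] = x[i] * 2.0
    return out

def functionalized(x: torch.Tensor) -> torch.Tensor:
    # Pure, single-assignment form: the loop and branch collapse into one
    # whole-tensor select, so a compiler can fuse it with neighboring ops.
    return torch.where(x > 0, x * 2.0, torch.zeros_like(x))

Both functions produce the same result for a 1-D input, but only the second is free of side effects and therefore straightforward to fuse.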


        Published In

        DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
        June 2024
        2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 07 November 2024

        Author Tags

        1. program functionalization
        2. deep learning compiler

        Qualifiers

        • Research-article

        Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

        Acceptance Rates

        Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

