DOI: 10.1145/3350755.3400252
Extended abstract, Public Access

A Computational Model for Tensor Core Units

Published: 09 July 2020

Abstract

To respond to the need for efficient training and inference of deep neural networks, a plethora of domain-specific architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is the design for efficiently computing a dense matrix product of a given small size. In order to broaden the class of algorithms that exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model for designing fast algorithms for several problems, including dense and sparse matrix multiplication and the Discrete Fourier Transform. We finally highlight a relation between the TCU model and the external memory model.
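As a rough illustration of building a larger computation from a native small-matrix product, the sketch below multiplies two n x n matrices using only s x s tile products. It is a minimal schoolbook blocking under stated assumptions, not the algorithm or cost model from the paper: `tile_multiply` is a hypothetical software stand-in for the hardware primitive, and the tile size `s`, the helper names, and the call counter are illustrative choices.

```python
def tile_multiply(a, b):
    """Hypothetical stand-in for the native product of two s x s tiles."""
    s = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def get_tile(mat, r, c, s):
    """Copy the s x s tile of mat whose top-left corner is (r, c)."""
    return [row[c:c + s] for row in mat[r:r + s]]


def blocked_matmul(a, b, s):
    """Multiply two n x n matrices using only s x s tile products.

    Returns the product and the number of tile products issued.
    Assumes n is a multiple of s (an assumption of this sketch).
    """
    n = len(a)
    assert n % s == 0, "n must be a multiple of the tile size"
    c = [[0] * n for _ in range(n)]
    calls = 0
    for i in range(0, n, s):
        for j in range(0, n, s):
            for k in range(0, n, s):
                # One call to the small-matrix primitive per tile triple.
                p = tile_multiply(get_tile(a, i, k, s), get_tile(b, k, j, s))
                calls += 1
                # Accumulate the partial product into the output tile.
                for x in range(s):
                    for y in range(s):
                        c[i + x][j + y] += p[x][y]
    return c, calls
```

With this schedule the primitive is invoked (n/s)^3 times, so a 4 x 4 product with 2 x 2 tiles issues 8 tile products. Counting calls separately from scalar work is only meant to show the kind of accounting a model with a native small-matrix product invites.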


      Published In

      SPAA '20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
      July 2020
      601 pages
      ISBN:9781450369350
      DOI:10.1145/3350755
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. computational model
      2. efficient algorithms
      3. graph problems
      4. hardware accelerators
      5. linear algebra
      6. tensor core

      Qualifiers

      • Extended-abstract

      Funding Sources

      • UniBZ-CRC
      • Università degli Studi di Padova
      • National Science Foundation
      • INdAM-GNCS
      • Ministero dell'Istruzione, dell'Università e della Ricerca

      Conference

      SPAA '20
      Acceptance Rates

      Overall Acceptance Rate 447 of 1,461 submissions, 31%

      Article Metrics

      • Downloads (last 12 months): 253
      • Downloads (last 6 weeks): 42
      Reflects downloads up to 10 Dec 2024

      Cited By

      • (2024) Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly. Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, pages 178-189. DOI: 10.1145/3629526.3653835. Online publication date: 7-May-2024
      • (2023) DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-14. DOI: 10.1145/3581784.3607051. Online publication date: 12-Nov-2023
      • (2023) A Parallel Scan Algorithm in the Tensor Core Unit Model. Euro-Par 2023: Parallel Processing, pages 489-502. DOI: 10.1007/978-3-031-39698-4_33. Online publication date: 28-Aug-2023
      • (2021) Algorithm Design for Tensor Units. Euro-Par 2021: Parallel Processing, pages 353-367. DOI: 10.1007/978-3-030-85665-6_22. Online publication date: 1-Sep-2021
      • (2020) Similarity Search with Tensor Core Units. Similarity Search and Applications, pages 76-84. DOI: 10.1007/978-3-030-60936-8_6. Online publication date: 14-Oct-2020
