A sparse iteration space transformation framework for sparse tensor algebra

Published: 13 November 2020

Abstract

We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations---split, collapse, and reorder---on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between iteration spaces at runtime, since the coordinates in the sparse data structures remain in the original iteration space. We further demonstrate that derived iteration spaces can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense iteration spaces, while the latter tiles sparse iteration spaces into statically load-balanced blocks of nonzeros. Tiling the space of nonzeros lets the generated code efficiently exploit heterogeneous compute resources such as threads, vector units, and GPUs.
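
To make the nonzero-space tiling concrete, here is a minimal hand-written C++ sketch (not compiler output) of the kind of kernel a position-space split yields for CSR sparse matrix-vector multiplication; the function name, signature, and block size are illustrative. The nonzeros are cut into fixed-size blocks regardless of row boundaries, and the row coordinate i is recovered from the nonzero position p at runtime, which is exactly the coordinate mapping between the original and derived iteration spaces described above.

```cpp
// Hypothetical sketch of a position-space-split CSR SpMV kernel (y += A*x).
// Work is divided into equal blocks of nonzeros, so the loop is statically
// load-balanced even when row lengths vary widely.
#include <algorithm>
#include <vector>

void spmv_pos_split(int m, const std::vector<int>& rowptr,
                    const std::vector<int>& col,
                    const std::vector<double>& val,
                    const std::vector<double>& x,
                    std::vector<double>& y, int block) {
  int nnz = rowptr[m];
  for (int b = 0; b < nnz; b += block) {
    int end = std::min(b + block, nnz);
    // Map the block's starting position back to its row coordinate:
    // find i such that rowptr[i] <= b < rowptr[i+1].
    int i = int(std::upper_bound(rowptr.begin(), rowptr.end(), b) -
                rowptr.begin()) - 1;
    for (int p = b; p < end; ++p) {
      while (p >= rowptr[i + 1]) ++i;  // advance past empty/finished rows
      y[i] += val[p] * x[col[p]];
    }
  }
}
```

Because a block can straddle a row boundary, a parallelized version of this loop would need atomic updates or per-block partial results for y; the dense-space analogue (splitting the row loop) needs neither, but inherits whatever load imbalance the row lengths carry.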
We implement these concepts by extending the sparse iteration theory implementation in the TACO system. The associated scheduling API can be used by performance engineers, or it can be the target of an automatic scheduling system. We outline one heuristic autoscheduling system, but other systems are possible. Using the scheduling API, we show how to optimize mixed sparse-dense tensor algebra expressions on CPUs and GPUs. Our results show that the sparse transformations are sufficient to generate code whose performance is competitive with hand-optimized implementations from the literature, while generalizing to all of tensor algebra.
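
As a concrete illustration of the scheduling API, the sketch below expresses such a statically load-balanced SpMV schedule using the transformations named above. It is written against the open-source TACO C++ API as of this paper's publication; the collapse transformation appears as fuse in the implementation, names and signatures may since have changed, and the tensor dimensions and NNZ_PER_BLOCK constant are illustrative.

```cpp
// Sketch of a TACO schedule that tiles the space of nonzeros for SpMV.
#include "taco.h"
using namespace taco;

int main() {
  Format csr({Dense, Sparse});   // CSR: dense rows, compressed columns
  Format dv({Dense});
  Tensor<double> A("A", {1024, 1024}, csr);
  Tensor<double> x("x", {1024}, dv);
  Tensor<double> y("y", {1024}, dv);
  // ... insert and pack values for A and x ...

  IndexVar i("i"), j("j");
  y(i) = A(i, j) * x(j);

  // Collapse i and j into f, switch f to the position (nonzero) space,
  // then split that space into equal blocks and parallelize over them.
  IndexVar f("f"), fpos("fpos"), block("block"), fpos2("fpos2");
  const int NNZ_PER_BLOCK = 256;  // illustrative tuning parameter
  IndexStmt stmt = y.getAssignment().concretize();
  stmt = stmt.fuse(i, j, f)
             .pos(f, fpos, A(i, j))
             .split(fpos, block, fpos2, NNZ_PER_BLOCK)
             .parallelize(block, ParallelUnit::CPUThread,
                          OutputRaceStrategy::Atomics);

  y.compile(stmt);
  y.assemble();
  y.compute();
  return 0;
}
```

The Atomics race strategy is what permits blocks of nonzeros to straddle row boundaries; splitting i directly instead would tile the dense universe of coordinates and need no atomics, at the cost of load balance.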

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p119-p-video.mp4)

Published In

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
November 2020, 3108 pages
EISSN: 2475-1421
DOI: 10.1145/3436718
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 13 November 2020
Published in PACMPL Volume 4, Issue OOPSLA

Author Tags

  1. Optimizing Transformations
  2. Sparse Iteration Spaces
  3. Sparse Tensor Algebra

Qualifiers

  • Research-article

Cited By

  • (2024) Compilation of Shape Operators on Sparse Arrays. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 1162-1188. https://doi.org/10.1145/3689752. Online publication date: 8-Oct-2024.
  • (2024) SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 527-556. https://doi.org/10.1145/3689730. Online publication date: 8-Oct-2024.
  • (2024) DynaSpa: Exploiting Spatial Sparsity for Efficient Dynamic DNN Inference on Devices. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 422-435. https://doi.org/10.1145/3666025.3699348. Online publication date: 4-Nov-2024.
  • (2024) Compilation of Modular and General Sparse Workspaces. Proceedings of the ACM on Programming Languages 8, PLDI, 1213-1238. https://doi.org/10.1145/3656426. Online publication date: 20-Jun-2024.
  • (2024) Compiling Recurrences over Dense and Sparse Arrays. Proceedings of the ACM on Programming Languages 8, OOPSLA1, 250-275. https://doi.org/10.1145/3649820. Online publication date: 29-Apr-2024.
  • (2024) A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 377-389. https://doi.org/10.1145/3627535.3638470. Online publication date: 2-Mar-2024.
  • (2024) A Tensor Algebra Compiler for Sparse Differentiation. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, 1-12. https://doi.org/10.1109/CGO57630.2024.10444787. Online publication date: 2-Mar-2024.
  • (2023) BaCO: A Fast and Portable Bayesian Compiler Optimization Framework. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 19-42. https://doi.org/10.1145/3623278.3624770. Online publication date: 25-Mar-2023.
  • (2023) RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 268-286. https://doi.org/10.1145/3623278.3624761. Online publication date: 25-Mar-2023.
  • (2023) A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1332-1346. https://doi.org/10.1145/3613424.3614284. Online publication date: 28-Oct-2023.
