Research Article · Open Access
DOI: 10.1145/3582016.3582051

The Sparse Abstract Machine

Published: 25 March 2023

Abstract

We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressive enough to incorporate arbitrary iteration orderings and many hardware-specific optimizations. We also present Custard, a compiler from a high-level language to SAM that demonstrates SAM's usefulness as an intermediate representation. We automatically bind from SAM to a streaming dataflow simulator. We evaluate the generality and extensibility of SAM, explore the performance space of sparse tensor algebra optimizations using SAM, and show SAM's ability to represent dataflow hardware.
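
To make the streaming dataflow abstraction concrete, below is a minimal sketch in Python (with hypothetical names; this is not SAM's actual primitive set or stream encoding, which also carries control tokens) of the core idea: sparse operands travel as sorted coordinate streams, and an intersect unit merges two streams so that downstream compute and reduction only ever see coordinates where both operands are nonzero, as in a sparse dot product.

```python
# Illustrative sketch of a SAM-style coordinate-stream pipeline.
# Assumption: a sparse vector is encoded as a stream of (coordinate, value)
# pairs sorted by coordinate; real SAM streams also carry stop/control tokens.

def intersect(a, b):
    """Two-finger merge of two sorted coordinate streams.

    Yields (coord, a_val, b_val) only for coordinates present in both
    streams, mirroring a hardware intersect unit.
    """
    ia, ib = 0, 0
    while ia < len(a) and ib < len(b):
        ca, va = a[ia]
        cb, vb = b[ib]
        if ca == cb:          # coordinate in both operands: emit downstream
            yield ca, va, vb
            ia += 1
            ib += 1
        elif ca < cb:         # coordinate only in a: skip, no work emitted
            ia += 1
        else:                 # coordinate only in b: skip
            ib += 1

# Sparse dot product x . y as a tiny dataflow pipeline:
# intersect -> multiply -> reduce.
x = [(0, 2.0), (3, 4.0), (7, 1.0)]
y = [(3, 0.5), (5, 3.0), (7, 2.0)]
print(sum(va * vb for _, va, vb in intersect(x, y)))  # 4.0*0.5 + 1.0*2.0 = 4.0
```

Because the merge skips mismatched coordinates, work is proportional to the nonzeros actually touched rather than the dense dimension, which is the property such streaming sparse primitives are designed to preserve across whole tensor algebra expressions.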




Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
March 2023
820 pages
ISBN: 9781450399180
DOI: 10.1145/3582016
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. abstract machine
  2. domain-specific
  3. sparse tensor algebra
  4. streams


Conference

ASPLOS '23

Acceptance Rates

Overall acceptance rate: 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 1,024
  • Downloads (last 6 weeks): 124
Reflects downloads up to 16 December 2024.

Cited By
  • Compilation of Modular and General Sparse Workspaces. Proceedings of the ACM on Programming Languages 8(PLDI), 1213–1238 (20 Jun 2024). https://doi.org/10.1145/3656426
  • Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 1200–1217 (27 Apr 2024). https://doi.org/10.1145/3620665.3640427
  • Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(10), 3055–3067 (Oct 2024). https://doi.org/10.1109/TCAD.2024.3390542
  • FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 1458–1473 (2 Nov 2024). https://doi.org/10.1109/MICRO61859.2024.00107
  • Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 1233–1246 (2 Nov 2024). https://doi.org/10.1109/MICRO61859.2024.00092
  • Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 1201–1216 (2 Nov 2024). https://doi.org/10.1109/MICRO61859.2024.00090
  • Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 409–422 (2 Nov 2024). https://doi.org/10.1109/MICRO61859.2024.00038
  • DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity. IEEE Computer Architecture Letters 23(1), 17–20 (1 Jan 2024). https://doi.org/10.1109/LCA.2024.3355178
  • The Dataflow Abstract Machine Simulator Framework. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 532–547 (29 Jun 2024). https://doi.org/10.1109/ISCA59077.2024.00046
  • HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1012–1028 (2 Mar 2024). https://doi.org/10.1109/HPCA57654.2024.00081
