DOI: 10.1145/3447818.3460355
research-article
Open access

Athena: high-performance sparse tensor contraction sequence on heterogeneous memory

Published: 04 June 2021

Abstract

Sparse tensor contraction (SpTC) sequences are widely employed in many fields, such as chemistry and physics. Implementing such sequences efficiently, however, faces multiple challenges: redundant computation and memory operations, massive memory consumption, and inefficient utilization of hardware. To address these challenges, we introduce Athena, a high-performance framework for SpTC sequences. Athena introduces new data structures, leverages the emerging Optane-based heterogeneous memory (HM) architecture, and adopts stage parallelism. In particular, Athena introduces a shared hash-table-represented sparse accumulator to eliminate unnecessary input processing and data migration; it uses a novel data-semantic-guided dynamic migration solution to make the best use of Optane-based HM for high performance; and it co-runs execution phases with different characteristics to achieve high hardware utilization. Evaluated on 12 datasets, Athena delivers 327-7362× speedup over the state-of-the-art SpTC algorithm. With dynamic data placement guided by data semantics, Athena outperforms a state-of-the-art software-based data management solution, a hardware-based data management solution, and a PMM-only configuration on Optane-based HM by 1.58×, 1.82×, and 2.34×, respectively. Athena also demonstrates its effectiveness in quantum chemistry and physics scenarios.
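
To make the core idea concrete, the sketch below implements a hash-table-based sparse accumulator for a single sparse tensor contraction: two COO-format tensors are contracted over one shared mode, and partial products are accumulated in a hash table keyed by the packed output coordinates. This is a minimal illustration of the general technique under simplifying assumptions, not Athena's actual data structure or code; the NnzA/NnzB layouts, the 16-bit index limit, and the 64-bit key packing are hypothetical choices made for brevity.

// Minimal sketch: hash-table sparse accumulator for one SpTC step.
// Computes C(i, j, l, m) = sum_k A(i, j, k) * B(k, l, m).
// Assumes COO input and indices that each fit in 16 bits (illustrative only).
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

struct NnzA { uint16_t i, j, k; double v; };  // one nonzero of A(i, j, k)
struct NnzB { uint16_t k, l, m; double v; };  // one nonzero of B(k, l, m)

// Pack the four output coordinates into a single 64-bit hash key.
static uint64_t key(uint16_t i, uint16_t j, uint16_t l, uint16_t m) {
    return (uint64_t(i) << 48) | (uint64_t(j) << 32) |
           (uint64_t(l) << 16) |  uint64_t(m);
}

std::unordered_map<uint64_t, double>
contract(const std::vector<NnzA>& A, const std::vector<NnzB>& B) {
    // Bucket B's nonzeros by the shared mode k so each A nonzero only
    // visits matching B nonzeros.
    std::unordered_map<uint16_t, std::vector<NnzB>> b_by_k;
    for (const auto& b : B) b_by_k[b.k].push_back(b);

    std::unordered_map<uint64_t, double> acc;  // sparse accumulator
    for (const auto& a : A) {
        auto it = b_by_k.find(a.k);
        if (it == b_by_k.end()) continue;
        for (const auto& b : it->second)
            acc[key(a.i, a.j, b.l, b.m)] += a.v * b.v;  // accumulate partial products
    }
    return acc;
}

int main() {
    std::vector<NnzA> A = {{0, 1, 2, 3.0}, {0, 1, 5, 2.0}};
    std::vector<NnzB> B = {{2, 4, 7, 0.5}, {5, 4, 7, 1.5}};
    for (const auto& [k, v] : contract(A, B))
        std::cout << std::hex << k << " -> " << v << "\n";  // both products hit C(0,1,4,7)
}

In Athena the accumulator is shared, which the abstract credits with eliminating unnecessary input processing and data migration across the sequence; the sketch above covers only one contraction step and uses a plain std::unordered_map rather than a shared, concurrency-aware table.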




Information & Contributors

Published In

ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing
June 2021
506 pages
ISBN:9781450383356
DOI:10.1145/3447818
© 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2021


Author Tags

  1. heterogeneous memory
  2. multi-core CPU
  3. non-volatile memory
  4. sparse tensor contraction sequences
  5. sparse tensor product

Qualifiers

  • Research-article

Funding Sources

  • U.S. Department of Energy, Office of Advanced Scientific Computing Research (ASCR)
  • U.S. National Science Foundation
  • Chameleon Cloud
  • Laboratory Directed Research and Development program at PNNL

Conference

ICS '21

Acceptance Rates

ICS '21 Paper Acceptance Rate: 39 of 157 submissions, 25%
Overall Acceptance Rate: 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 201
  • Downloads (Last 6 weeks): 39
Reflects downloads up to 09 Jan 2025

Cited By

  • (2024) SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring. Proceedings of the ACM on Programming Languages 8(OOPSLA2), 527-556. DOI: 10.1145/3689730. Online publication date: 8-Oct-2024
  • (2024) CoNST: Code Generator for Sparse Tensor Networks. ACM Transactions on Architecture and Code Optimization 21(4), 1-24. DOI: 10.1145/3689342. Online publication date: 20-Nov-2024
  • (2024) POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 457-459. DOI: 10.1145/3627535.3638500. Online publication date: 2-Mar-2024
  • (2024) Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction. IEEE Transactions on Parallel and Distributed Systems 35(6), 1044-1055. DOI: 10.1109/TPDS.2024.3391254. Online publication date: Jun-2024
  • (2023) Data Integration and Harmonisation. Clinical Applications of Artificial Intelligence in Real-World Data, 51-67. DOI: 10.1007/978-3-031-36678-9_4. Online publication date: 5-Nov-2023
  • (2022) SparseLNR. Proceedings of the 36th ACM International Conference on Supercomputing, 1-14. DOI: 10.1145/3524059.3532386. Online publication date: 28-Jun-2022
  • (2022) GSpTC: High-Performance Sparse Tensor Contraction on CPU-GPU Heterogeneous Systems. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 380-387. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00080. Online publication date: Dec-2022
  • (2021) Single-node partitioned-memory for huge graph analytics. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. DOI: 10.1145/3458817.3476156. Online publication date: 14-Nov-2021
