Research article · Public Access · DOI: 10.1145/3295500.3356216

An efficient mixed-mode representation of sparse tensors

Published: 17 November 2019

Abstract

The Compressed Sparse Fiber (CSF) representation for sparse tensors is a generalization of the Compressed Sparse Row (CSR) format for sparse matrices. For a tensor with d modes, typical tensor methods such as CANDECOMP/PARAFAC decomposition (CPD) require a sequence of d tensor computations, where efficient memory access with respect to different modes is required for each of them. The straightforward solution is to use d distinct representations of the tensor, with each one being efficient for one of the d computations. However, a d-fold space overhead is often unacceptable in practice, especially with memory-constrained GPUs. In this paper, we present a mixed-mode tensor representation that partitions the tensor's nonzero elements into disjoint sections, each of which is compressed to create fibers along a different mode. Experimental results demonstrate that better performance can be achieved while utilizing only a small fraction of the space required to keep d distinct CSF representations.
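The core idea in the abstract — partition the nonzeros into disjoint sections and compress each section into fibers along a different mode, rather than keeping d full CSF copies — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: `assign_mode` here is an arbitrary toy heuristic, whereas a real system would choose the partition to optimize memory access for each of the d computations.

```python
# Hypothetical sketch of a mixed-mode layout for a 3-mode sparse tensor:
# each nonzero is assigned to exactly one section, and each section is
# compressed into fibers along its own mode (not the paper's algorithm).
from collections import defaultdict

def build_mixed_mode(coords, vals, assign_mode):
    """coords: list of (i, j, k) index tuples; vals: matching values.
    assign_mode(coord) -> mode in {0, 1, 2}; decides which section
    (and hence which fiber orientation) stores that nonzero."""
    sections = [defaultdict(list) for _ in range(3)]
    for idx, v in zip(coords, vals):
        m = assign_mode(idx)
        # A fiber along mode m is identified by the other two indices;
        # within the fiber we keep (index-along-m, value) pairs.
        fiber_key = tuple(x for d, x in enumerate(idx) if d != m)
        sections[m][fiber_key].append((idx[m], v))
    return sections

# Toy 3-mode tensor with 4 nonzeros; toy rule: section = first index mod 3.
coords = [(0, 0, 0), (0, 1, 2), (1, 0, 2), (2, 2, 2)]
vals = [1.0, 2.0, 3.0, 4.0]
sections = build_mixed_mode(coords, vals, lambda c: c[0] % 3)
# Every nonzero lands in exactly one section, so total storage is one copy
# of the tensor, versus three copies for three distinct CSF representations.
assert sum(len(f) for s in sections for f in s.values()) == 4
```

The space saving follows directly: the sections are disjoint, so the mixed-mode structure stores each nonzero once, while still providing fibers oriented along every mode for the computations that need them.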




    Published In

    SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2019, 1921 pages
    ISBN: 9781450362290
    DOI: 10.1145/3295500

    In-Cooperation: IEEE CS

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. CANDECOMP/PARAFAC decomposition
    2. GPU
    3. MTTKRP
    4. sparse tensors

    Qualifiers

    • Research-article


    Conference

    SC '19
    Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


    Cited By

    • (2024) Efficient Low-Memory Implementation of Sparse CNNs Using Encoded Partitioned Hybrid Sparse Format. ACM Transactions on Embedded Computing Systems 23(6), 1-30. https://doi.org/10.1145/3687239
    • (2024) Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. Proceedings of the 53rd International Conference on Parallel Processing, 107-116. https://doi.org/10.1145/3673038.3673128
    • (2024) Sparse MTTKRP Acceleration for Tensor Decomposition on GPU. Proceedings of the 21st ACM International Conference on Computing Frontiers, 88-96. https://doi.org/10.1145/3649153.3649187
    • (2024) Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 155-168. https://doi.org/10.1145/3626183.3659980
    • (2024) Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction. IEEE Transactions on Parallel and Distributed Systems 35(6), 1044-1055. https://doi.org/10.1109/TPDS.2024.3391254
    • (2024) ScalFrag: Efficient Tiled-MTTKRP with Adaptive Launching on GPUs. 2024 IEEE International Conference on Cluster Computing (CLUSTER), 335-345. https://doi.org/10.1109/CLUSTER59578.2024.00036
    • (2023) A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN. ACM Transactions on Parallel Computing 10(2), 1-23. https://doi.org/10.1145/3584373
    • (2023) Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition. ACM Transactions on Parallel Computing 10(2), 1-27. https://doi.org/10.1145/3580315
    • (2023) Accelerating Sparse MTTKRP for Tensor Decomposition on FPGA. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 259-269. https://doi.org/10.1145/3543622.3573179
    • (2023) A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration. IEEE Transactions on Parallel and Distributed Systems 34(8), 2419-2432. https://doi.org/10.1109/TPDS.2023.3288520
