Research article · Public Access · DOI: 10.1145/3295500.3356216

An efficient mixed-mode representation of sparse tensors

Published: 17 November 2019

Abstract

The Compressed Sparse Fiber (CSF) representation for sparse tensors is a generalization of the Compressed Sparse Row (CSR) format for sparse matrices. For a tensor with d modes, typical tensor methods such as CANDECOMP/PARAFAC decomposition (CPD) require a sequence of d tensor computations, where efficient memory access with respect to different modes is required for each of them. The straightforward solution is to use d distinct representations of the tensor, with each one being efficient for one of the d computations. However, a d-fold space overhead is often unacceptable in practice, especially with memory-constrained GPUs. In this paper, we present a mixed-mode tensor representation that partitions the tensor's nonzero elements into disjoint sections, each of which is compressed to create fibers along a different mode. Experimental results demonstrate that better performance can be achieved while utilizing only a small fraction of the space required to keep d distinct CSF representations.
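The core idea in the abstract — partition the nonzeros into disjoint sections and compress each section into fibers along a different mode, rather than keeping d full CSF copies — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: `assign_mode` here is an arbitrary toy heuristic, whereas a real system would choose the partition to optimize memory access for each of the d computations.

```python
# Hypothetical sketch of a mixed-mode layout for a 3-mode sparse tensor:
# each nonzero is assigned to exactly one section, and each section is
# compressed into fibers along its own mode (not the paper's algorithm).
from collections import defaultdict

def build_mixed_mode(coords, vals, assign_mode):
    """coords: list of (i, j, k) index tuples; vals: matching values.
    assign_mode(coord) -> mode in {0, 1, 2}; decides which section
    (and hence which fiber orientation) stores that nonzero."""
    sections = [defaultdict(list) for _ in range(3)]
    for idx, v in zip(coords, vals):
        m = assign_mode(idx)
        # A fiber along mode m is identified by the other two indices;
        # within the fiber we keep (index-along-m, value) pairs.
        fiber_key = tuple(x for d, x in enumerate(idx) if d != m)
        sections[m][fiber_key].append((idx[m], v))
    return sections

# Toy 3-mode tensor with 4 nonzeros; toy rule: section = first index mod 3.
coords = [(0, 0, 0), (0, 1, 2), (1, 0, 2), (2, 2, 2)]
vals = [1.0, 2.0, 3.0, 4.0]
sections = build_mixed_mode(coords, vals, lambda c: c[0] % 3)
# Every nonzero lands in exactly one section, so total storage is one copy
# of the tensor, versus three copies for three distinct CSF representations.
assert sum(len(f) for s in sections for f in s.values()) == 4
```

The space saving follows directly: the sections are disjoint, so the mixed-mode structure stores each nonzero once, while still providing fibers oriented along every mode for the computations that need them.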




    Published In

    SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2019, 1921 pages
    ISBN: 9781450362290
    DOI: 10.1145/3295500

    In-Cooperation: IEEE CS

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. CANDECOMP/PARAFAC decomposition
    2. GPU
    3. MTTKRP
    4. sparse tensors

    Qualifiers

    • Research-article


    Conference

    SC '19
    Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


    Cited By

    • (2024) Efficient Low-Memory Implementation of Sparse CNNs Using Encoded Partitioned Hybrid Sparse Format. ACM Transactions on Embedded Computing Systems 23(6), 1-30. https://doi.org/10.1145/3687239
    • (2024) Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. Proceedings of the 53rd International Conference on Parallel Processing, 107-116. https://doi.org/10.1145/3673038.3673128
    • (2024) Sparse MTTKRP Acceleration for Tensor Decomposition on GPU. Proceedings of the 21st ACM International Conference on Computing Frontiers, 88-96. https://doi.org/10.1145/3649153.3649187
    • (2024) Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 155-168. https://doi.org/10.1145/3626183.3659980
    • (2024) Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction. IEEE Transactions on Parallel and Distributed Systems 35(6), 1044-1055. https://doi.org/10.1109/TPDS.2024.3391254
    • (2024) ScalFrag: Efficient Tiled-MTTKRP with Adaptive Launching on GPUs. 2024 IEEE International Conference on Cluster Computing (CLUSTER), 335-345. https://doi.org/10.1109/CLUSTER59578.2024.00036
    • (2023) A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN. ACM Transactions on Parallel Computing 10(2), 1-23. https://doi.org/10.1145/3584373
    • (2023) Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition. ACM Transactions on Parallel Computing 10(2), 1-27. https://doi.org/10.1145/3580315
    • (2023) Accelerating Sparse MTTKRP for Tensor Decomposition on FPGA. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 259-269. https://doi.org/10.1145/3543622.3573179
    • (2023) A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration. IEEE Transactions on Parallel and Distributed Systems 34(8), 2419-2432. https://doi.org/10.1109/TPDS.2023.3288520
