Wijeratne et al., 2023 - Google Patents

Dynasor: A dynamic memory layout for accelerating sparse mttkrp for tensor decomposition on multi-core cpu

Wijeratne et al., 2023

Document ID: 10877936377557543025
Author: Wijeratne S; Kannan R; Prasanna V
Publication year: 2023
Publication venue: 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

External Links

Cited by

Snippet

Sparse Matricized Tensor Times Khatri-Rao Prod-uct (spMTTKRP) is the most time- consuming compute kernel in sparse tensor decomposition. In this paper, we introduce a novel algorithm to minimize the execution time of spMTTKRP across all modes of an input …

Continue reading at arxiv.org (PDF) (other versions)

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Cano et al.	2012	Speeding up the evaluation phase of GP classification algorithms on GPUs
Chen et al.	2018	FlinkCL: An OpenCL-based in-memory computing architecture on heterogeneous CPU-GPU clusters for big data
Hu et al.	2021	Accelerating triangle counting on GPU
Elteir et al.	2011	Performance characterization and optimization of atomic operations on amd gpus
Boyer et al.	2011	Dense dynamic programming on multi GPU
Liu	2015	Parallel and scalable sparse basic linear algebra subprograms
Haseeb et al.	2021	Evaluating Performance and Portability of a core bioinformatics kernel on multiple vendor GPUs
Zhang et al.	2020	Optimizing streaming parallelism on heterogeneous many-core architectures
Gajinov et al.	2014	Dash: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models
Boehning et al.	1988	A parallel integer linear programming algorithm
Hadri et al.	2010	Tile QR factorization with parallel panel processing for multicore architectures
Khorasani et al.	2016	Eliminating intra-warp load imbalance in irregular nested patterns via collaborative task engagement
Wan et al.	2015	GPU implementation of a parallel two‐list algorithm for the subset‐sum problem
Wijeratne et al.	2023	Dynasor: A dynamic memory layout for accelerating sparse mttkrp for tensor decomposition on multi-core cpu
Garg et al.	2018	Share-a-GPU: Providing simple and effective time-sharing on GPUs
Man et al.	2011	An efficient parallel sorting compatible with the standard qsort
Papadimitriou et al.	2021	Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes
Fumero et al.	2020	Running parallel bytecode interpreters on heterogeneous hardware
Sui et al.	2016	Hybrid CPU–GPU constraint checking: Towards efficient context consistency
Czarnul et al.	2019	Auto-tuning methodology for configuration and application parameters of hybrid CPU+ GPU parallel systems based on expert knowledge
CN114356526A (en)	2022-04-15	Multi-dimensional function optimization acceleration method based on artificial bee colony algorithm
Li et al.	2015	FreshBreeze: A data flow approach for meeting DDDAS challenges
Biswas et al.	2022	An Efficient Reduced-Memory GPU-based Dynamic Programming Strategy for Bounded Knapsack Problems
Yang et al.	2023	Parallel and Distributed Bayesian Network Structure Learning
Siddiqui et al.	2019	Design space exploration of embedded applications on heterogeneous cpu-gpu platforms