Baransel, 2013 - Google Patents

Strassen's communication-avoiding parallel matrix multiplication algorithm for all-port 2d torus networks

Baransel, 2013

Document ID: 6260160301991756673
Author: Baransel C
Publication year: 2013
Publication venue: Parallel Computing Technologies: 12th International Conference, PaCT 2013, St. Petersburg, Russia, September 30-October 4, 2013. Proceedings 12

External Links

Cited by

Snippet

A parallel implementation of Strassen's matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D all-port torus interconnection networks. The proposed algorithm employs conflict-free routing patterns and operates on completely …

Continue reading at link.springer.com (other versions)

239000011159 matrix material 0 title abstract description 45

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute

Similar Documents

Publication	Publication Date	Title
US8117288B2 (en)	2012-02-14	Optimizing layout of an application on a massively parallel supercomputer
US7730121B2 (en)	2010-06-01	Parallel processing systems and method
Solomonik et al.	2011	Improving communication performance in dense linear algebra via topology aware collectives
Pearce et al.	2014	Faster parallel traversal of scale free graphs at extreme scale with vertex delegates
Barnett et al.	1993	Global combine on mesh architectures with wormhole routing
Lim et al.	1999	Efficient algorithms for block-cyclic redistribution of arrays
US7558248B2 (en)	2009-07-07	Fanning route generation technique for multi-path networks
Schlag et al.	2019	Scalable edge partitioning
EP1502203A2 (en)	2005-02-02	Parallel processing systems and method
Bhatelé et al.	2008	Benefits of topology aware mapping for mesh interconnects
US12040949B2 (en)	2024-07-16	Connecting processors using twisted torus configurations
Guo et al.	2001	A framework for efficient data redistribution on distributed memory multicomputers
Peña et al.	2013	Analysis of topology-dependent MPI performance on Gemini networks
Selvitopi et al.	2016	A recursive hypergraph bipartitioning framework for reducing bandwidth and latency costs simultaneously
US20060268691A1 (en)	2006-11-30	Divide and conquer route generation technique for distributed selection of routes within a multi-path network
Träff et al.	2016	Message-combining algorithms for isomorphic, sparse collective communication
Baransel	2013	Strassen’s communication-avoiding parallel matrix multiplication algorithm for all-port 2d torus networks
Baransel et al.	2012	A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks
Jacquelin et al.	2016	Enhancing scalability and load balancing of parallel selected inversion via tree-based asynchronous communication
Goldman et al.	2006	Exchanging messages of different sizes
Wang et al.	2001	Balancing traffic load for multi-node multicast in a wormhole 2-D torus/mesh
Malik et al.	2014	Topology-aware optimization of communications for parallel matrix multiplication on hierarchical heterogeneous hpc platform
Karlsson et al.	2011	Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures
Zhu et al.	2021	Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs
Al-Kerboly	2020	The non-contiguous allocation strategies performance for k-ary n-cube connected multi computers using smallest job first scheduling strategy