Baransel, 2013 - Google Patents
Strassen's communication-avoiding parallel matrix multiplication algorithm for all-port 2d torus networksBaransel, 2013
- Document ID
- 6260160301991756673
- Author
- Baransel C
- Publication year
- Publication venue
- Parallel Computing Technologies: 12th International Conference, PaCT 2013, St. Petersburg, Russia, September 30-October 4, 2013. Proceedings 12
External Links
Snippet
A parallel implementation of Strassen's matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D all-port torus interconnection networks. The proposed algorithm employs conflict-free routing patterns and operates on completely …
- 239000011159 matrix material 0 title abstract description 45
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8117288B2 (en) | Optimizing layout of an application on a massively parallel supercomputer | |
US7730121B2 (en) | Parallel processing systems and method | |
Solomonik et al. | Improving communication performance in dense linear algebra via topology aware collectives | |
Pearce et al. | Faster parallel traversal of scale free graphs at extreme scale with vertex delegates | |
Barnett et al. | Global combine on mesh architectures with wormhole routing | |
Lim et al. | Efficient algorithms for block-cyclic redistribution of arrays | |
US7558248B2 (en) | Fanning route generation technique for multi-path networks | |
Schlag et al. | Scalable edge partitioning | |
EP1502203A2 (en) | Parallel processing systems and method | |
Bhatelé et al. | Benefits of topology aware mapping for mesh interconnects | |
US12040949B2 (en) | Connecting processors using twisted torus configurations | |
Guo et al. | A framework for efficient data redistribution on distributed memory multicomputers | |
Peña et al. | Analysis of topology-dependent MPI performance on Gemini networks | |
Selvitopi et al. | A recursive hypergraph bipartitioning framework for reducing bandwidth and latency costs simultaneously | |
US20060268691A1 (en) | Divide and conquer route generation technique for distributed selection of routes within a multi-path network | |
Träff et al. | Message-combining algorithms for isomorphic, sparse collective communication | |
Baransel | Strassen’s communication-avoiding parallel matrix multiplication algorithm for all-port 2d torus networks | |
Baransel et al. | A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks | |
Jacquelin et al. | Enhancing scalability and load balancing of parallel selected inversion via tree-based asynchronous communication | |
Goldman et al. | Exchanging messages of different sizes | |
Wang et al. | Balancing traffic load for multi-node multicast in a wormhole 2-D torus/mesh | |
Malik et al. | Topology-aware optimization of communications for parallel matrix multiplication on hierarchical heterogeneous hpc platform | |
Karlsson et al. | Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures | |
Zhu et al. | Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs | |
Al-Kerboly | The non-contiguous allocation strategies performance for k-ary n-cube connected multi computers using smallest job first scheduling strategy |