[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Baransel, 2013 - Google Patents

Strassen's communication-avoiding parallel matrix multiplication algorithm for all-port 2d torus networks

Baransel, 2013

Document ID
6260160301991756673
Author
Baransel C
Publication year
Publication venue
Parallel Computing Technologies: 12th International Conference, PaCT 2013, St. Petersburg, Russia, September 30-October 4, 2013. Proceedings 12

External Links

Snippet

A parallel implementation of Strassen's matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D all-port torus interconnection networks. The proposed algorithm employs conflict-free routing patterns and operates on completely …
Continue reading at link.springer.com (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356Indirect interconnection networks
    • G06F15/17368Indirect interconnection networks non hierarchical topologies
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337Direct connection machines, e.g. completely connected computers, point to point communication networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored programme computers
    • G06F15/80Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3889Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute

Similar Documents

Publication Publication Date Title
US8117288B2 (en) Optimizing layout of an application on a massively parallel supercomputer
US7730121B2 (en) Parallel processing systems and method
Solomonik et al. Improving communication performance in dense linear algebra via topology aware collectives
Pearce et al. Faster parallel traversal of scale free graphs at extreme scale with vertex delegates
Barnett et al. Global combine on mesh architectures with wormhole routing
Lim et al. Efficient algorithms for block-cyclic redistribution of arrays
US7558248B2 (en) Fanning route generation technique for multi-path networks
Schlag et al. Scalable edge partitioning
EP1502203A2 (en) Parallel processing systems and method
Bhatelé et al. Benefits of topology aware mapping for mesh interconnects
US12040949B2 (en) Connecting processors using twisted torus configurations
Guo et al. A framework for efficient data redistribution on distributed memory multicomputers
Peña et al. Analysis of topology-dependent MPI performance on Gemini networks
Selvitopi et al. A recursive hypergraph bipartitioning framework for reducing bandwidth and latency costs simultaneously
US20060268691A1 (en) Divide and conquer route generation technique for distributed selection of routes within a multi-path network
Träff et al. Message-combining algorithms for isomorphic, sparse collective communication
Baransel Strassen’s communication-avoiding parallel matrix multiplication algorithm for all-port 2d torus networks
Baransel et al. A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks
Jacquelin et al. Enhancing scalability and load balancing of parallel selected inversion via tree-based asynchronous communication
Goldman et al. Exchanging messages of different sizes
Wang et al. Balancing traffic load for multi-node multicast in a wormhole 2-D torus/mesh
Malik et al. Topology-aware optimization of communications for parallel matrix multiplication on hierarchical heterogeneous hpc platform
Karlsson et al. Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures
Zhu et al. Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs
Al-Kerboly The non-contiguous allocation strategies performance for k-ary n-cube connected multi computers using smallest job first scheduling strategy