article
Free
OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple spanning buses

This paper presents the architectural design and RISC-based implementation of a prototype supercomputer, namely the Orthogonal MultiProcessor (OMP). The OMP system is constructed with 16 Intel i860 RISC microprocessors and 256 parallel memory modules, ...

article
Free
A basic architecture supporting LGDG computation

In order to combine the benefits of dataflow and control-flow computation while avoiding the pitfalls of both, the authors propose a two-level model of large-grain dataflow computation, called LGDG computation. A formalism has been provided in a ...

article
Free
An efficient caching support for critical sections in large-scale shared-memory multiprocessors

Directory-based and software-assisted schemes are the two main approaches to solving the cache coherence problem in large scale shared-memory multiprocessors. Until now, the emphasis in software-assisted schemes has been on ascertaining consistency ...

article
Free
An improvement of I/O function for auxiliary storage: parallel I/O for a large scale supercomputing

A new I/O technique for external auxiliary storage (magnetic disk units) has been developed to improve I/O performance on HITAC VOS3/ES1 with the usual hardware architecture. Since the I/O technique is based on the idea that the sequence of I/O processes ...

article
Free
Analysis of a variant hypercube topology

Each node of a hypercube system, when fabricated, comes with a fixed number of links designed for a maximum sized construction. Very often, there are links left unused at each node in a real system. In this article, we study the hypercube in which extra ...

article
Free
Parallel ODE solvers

article
Free
Use of parallel level 3 BLAS in LU factorization on three vector multiprocessors the ALLIANT FX/80, the CRAY-2, and the IBM 3090 VF

article
Free
Schur complement preconditioned conjugate gradient methods for spline collocation equations

We are interested in the efficient solution of linear second order Partial Differential Equation (PDE) problems on rectangular domains. The PDE discretisation scheme used is of Finite Element type and is based on quadratic splines and the collocation ...

article
Free
Cost-optimal parallel B-spline interpolations

We show how to transform the B-spline curve and surface fitting problems into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values on a newly proposed mesh-of-unshuffle network. The ...

article
Free
Solving general sparse linear systems using conjugate gradient-type methods

The problem of finding an approximation of x = A†b (where A† is the pseudo-inverse of A ∈ ℝ^(m×n) with m ≥ n and rank(A) = n) is discussed. It is assumed that A is sparse but has neither a special pattern (as bandedness) nor a special property (as ...

article
Free
Dataflow computer development in Japan

This paper describes the research activity on dataflow computing in Japan focusing on dataflow computer development at the Electrotechnical Laboratory (ETL). First, the history of dataflow computer development in Japan is outlined. Some distinguished ...

article
Free
POSC—a partitioning and optimizing SISAL compiler

Single-assignment languages like SISAL offer parallelism at all levels—among arbitrary operations, conditionals, loop iterations, and function calls. All control and data dependencies are local, and can be easily determined from the program. Various ...

article
Free
Loop optimization for horizontal microcoded machines

Long Instruction Word (LIW) architectures exploit parallelism between various functional units. In order to produce efficient code for such an architecture, the microcode compiler will have to expose a relatively large degree of fine grain parallelism ...

article
Free
Compiler techniques for data synchronization in nested parallel loops

The major source of parallelism in ordinary programs is do loops. When loop iterations of parallelized loops are executed on multiprocessors, the cross-iteration data dependencies need to be enforced by synchronization between processors. Existing data ...

article
Free
Compiler techniques for data partitioning of sequentially iterated parallel loops

This paper uses bottom-up, static program partitioning to minimize the execution time of parallel programs by reducing interprocessor communication. Program partitioning is applied to a parallel programming construct known as a sequentially iterated ...

article
Free
On the perfect accuracy of an approximate subscript analysis test

The Banerjee test is commonly considered to be the more accurate of the two major approximate data dependence tests used in automatic vectorization/parallelization of loops, the other being the GCD test. From its derivation, however, there is no simple ...

article
Free
A hardware-based performance monitor for the Intel iPSC/2 hypercube

The complexity of parallel computer systems makes a priori performance prediction difficult and experimental performance analysis crucial. A complete characterization of software and hardware dynamics, needed to understand the performance of high-...

article
Free
Performance degradation due to multiprogramming and system overheads in real workloads: case study on a shared memory multiprocessor

In this paper, performance degradation specifically due to the multiprogramming (MP) overhead in a parallel execution environment is quantified. In addition, total system overhead is also measured. A methodology, which estimates the MP overhead present ...

article
Free
SPARK: a benchmark package for sparse computations

As the diversity of novel architectures expands rapidly, there is a growing interest in studying the behavior of these architectures for computations arising in different applications. There have been significant efforts in evaluating the performance of ...

article
Free
Supercomputer performance evaluation and the Perfect Benchmarks

In the past three years, the Perfect Benchmark™ Suite has evolved from a supercomputer performance evaluation plan, presented by Kuck and Sameh at the 1987 International Conference on Supercomputing, to a vigorous international activity. This paper ...

article
Free
Strategies for large-scale structural problems on high-performance computers

Novel computational strategies are presented for the analysis of large and complex structures. The strategies are based on generating the response of the complex structure using large perturbations from that of a simpler model, associated with a simpler ...

article
Free
Elastodynamics on clustered vector multiprocessors

We present the parallelization of an elastodynamic code on a firmly coupled configuration consisting of two IBM 3090-600 VF, a total of 12 processors, joined with a connection facility. The programming environment used is Clustered FORTRAN which is a ...

article
Free
Implementation of 5-point/9-point multi-level methods on hypercube architectures

Computational complexity of implementing 5/9-point multi-level methods on hypercube architectures is considered. The embedding of the nested red/black structures of these methods is described, and an analysis is made of data distances involved.

article
Free
Supercomputer-based visualization systems used for analyzing output data of a numerical weather prediction model

A comparison of two supercomputer-based visualization systems developed over a half-year period shows that the visualization/animation efficiency is largely dependent upon the efficiencies of individual computers, networking, and memory management.

Using a ...

article
Free
Parallel automated wire-routing with a number of competing processors

The purpose of automated wire routing for VLSI and printed circuit board design is to connect a number of terminal pairs distributed throughout the wiring plane with net paths that do not intersect each other. Although maze running and line search are ...

article
Hierarchical algorithms and architectures for parallel scientific computing

There has been a recent emergence of many interesting and highly efficient hierarchical (multilevel) algorithms (e.g. multigrid, domain decomposition, wavelets, multilevel preconditioning, the fast multipole algorithms, etc.) for solving numerical ...

article
Free
Incremental dependence analysis for interactive parallelization

Incrementally updating dependence information during interactive parallelization is a difficult proposition. We have developed a tool (PAT) that maintains dependence information during incremental transformations to a Fortran program, including loop ...

article
Free
Parallelization of FORTRAN code on distributed-memory parallel processors

This paper presents some preliminary results toward the automatic parallelization of uniprocessor FORTRAN code on distributed-memory parallel processors (DMPPs). The paper introduces Oxygen, a compiler for a DMPP under development at the Laboratory. The ...
