- research-article, January 2025
A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 39, Issue 1, Pages 18–31, https://doi.org/10.1177/10943420241288567
We present the GPU implementation efforts and challenges of the sparse solver package STRUMPACK. The code is made publicly available on github with a permissive BSD license. STRUMPACK implements an approximate multifrontal solver, a sparse LU ...
- research-article, November 2024
Non-smooth Bayesian optimization in tuning scientific applications
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 633–657, https://doi.org/10.1177/10943420241278981
Tuning algorithmic parameters to optimize the performance of large, complicated computational codes is an important problem involving finding the optima and identifying regimes defined by non-smooth boundaries in black-box functions. Within the Bayesian ...
- research-article, November 2024
Batched sparse direct solver design and evaluation in SuperLU_DIST
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 585–598, https://doi.org/10.1177/10943420241268200
Over the course of interactions with various application teams, the need for batched sparse linear algebra functions has emerged in order to make more efficient use of the GPUs for many small and sparse linear algebra problems. In this paper, we present ...
- research-article, April 2024
Then and Now: Improving Software Portability, Productivity, and 100× Performance
Computing in Science and Engineering (IEEECS_CISE-NEW), Volume 26, Issue 1, Pages 61–70, https://doi.org/10.1109/MCSE.2024.3387302
The U.S. Exascale Computing Project (ECP) has succeeded in preparing applications to run efficiently on the first reported exascale supercomputers in the world. To achieve this, it modernized the whole leadership software stack, from libraries to ...
- research-article, March 2022
Resiliency in numerical algorithm design for extreme scale simulations
- Emmanuel Agullo,
- Mirco Altenbernd,
- Hartwig Anzt,
- Leonardo Bautista-Gomez,
- Tommaso Benacchio,
- Luca Bonaventura,
- Hans-Joachim Bungartz,
- Sanjay Chatterjee,
- Florina M Ciorba,
- Nathan DeBardeleben,
- Daniel Drzisga,
- Sebastian Eibl,
- Christian Engelmann,
- Wilfried N Gansterer,
- Luc Giraud,
- Dominik Göddeke,
- Marco Heisig,
- Fabienne Jézéquel,
- Nils Kohl,
- Xiaoye Sherry Li,
- Romain Lion,
- Miriam Mehl,
- Paul Mycek,
- Michael Obersteiner,
- Enrique S Quintana-Ortí,
- Francesco Rizzi,
- Ulrich Rüde,
- Martin Schulz,
- Fred Fung,
- Robert Speck,
- Linda Stals,
- Keita Teranishi,
- Samuel Thibault,
- Dominik Thönnes,
- Andreas Wagner,
- Barbara Wohlmuth
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 36, Issue 2, Pages 251–285, https://doi.org/10.1177/10943420211055188
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high ...
Dr. Top-k: delegate-centric Top-k on GPUs
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 39, Pages 1–14, https://doi.org/10.1145/3458817.3476141
Recent top-k computation efforts explore the possibility of revising various sorting algorithms to answer top-k queries on GPUs. These endeavors, unfortunately, perform significantly more work than needed. This paper introduces Dr. Top-k, a D...
- research-article, July 2021
A survey of numerical linear algebra methods utilizing mixed-precision arithmetic
- Ahmad Abdelfattah,
- Hartwig Anzt,
- Erik G Boman,
- Erin Carson,
- Terry Cojean,
- Jack Dongarra,
- Alyson Fox,
- Mark Gates,
- Nicholas J Higham,
- Xiaoye S Li,
- Jennifer Loe,
- Piotr Luszczek,
- Srikara Pranesh,
- Siva Rajamanickam,
- Tobias Ribizel,
- Barry F Smith,
- Kasia Swirydowicz,
- Stephen Thomas,
- Stanimire Tomov,
- Yaohung M Tsai,
- Ulrike Meier Yang
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 35, Issue 4, Pages 344–369, https://doi.org/10.1177/10943420211003313
The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine ...
- research-article, November 2020
C-SAW: a framework for graph sampling and random walk on GPUs
SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 56, Pages 1–15
Many applications require to learn, mine, analyze and visualize large-scale graphs. These graphs are often too large to be addressed efficiently using conventional graph processing technologies. Fortunately, recent research efforts find out graph ...
- research-article, July 2020
A parallel hierarchical blocked adaptive cross approximation algorithm
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 34, Issue 4, Pages 394–408, https://doi.org/10.1177/1094342020918305
This article presents a low-rank decomposition algorithm based on subsampling of matrix entries. The proposed algorithm first computes rank-revealing decompositions of submatrices with a blocked adaptive cross approximation (BACA) algorithm, and then ...
- research-article, January 2020
A Distributed-Memory Algorithm for Computing a Heavy-Weight Perfect Matching on Bipartite Graphs
SIAM Journal on Scientific Computing (SISC), Volume 42, Issue 4, Pages C143–C168, https://doi.org/10.1137/18M1189348
We design and implement an efficient parallel algorithm for finding a perfect matching in a weighted bipartite graph such that weights on the edges of the matching are large. This problem differs from the maximum weight matching problem, for which scalable ...
- research-article, September 2019
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
Journal of Parallel and Distributed Computing (JPDC), Volume 131, Issue C, Pages 218–234, https://doi.org/10.1016/j.jpdc.2019.03.004
Abstract: We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D algorithm for sparse LU uses a three-dimensional MPI process grid, exploits elimination tree ...
Highlights: The 3D sparse LU factorization algorithm reduces the asymptotic communication complexity by ( log n ) for planar sparse matrices of dimension n and by a ...
- research-article, December 2018
Consensus Ensemble System for Traffic Flow Prediction
IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 19, Issue 12, Pages 3903–3914, https://doi.org/10.1109/TITS.2018.2791505
Traffic flow prediction is a key component of an intelligent transportation system. Accurate traffic flow prediction provides a foundation for other tasks, such as signal coordination and travel time forecasting. There are many known methods in literature ...
- research-article, November 2018
A Unified Software Framework to Enable Solution of Traffic Assignment Problems at Extreme Scale
2018 21st International Conference on Intelligent Transportation Systems (ITSC), Pages 3917–3922, https://doi.org/10.1109/ITSC.2018.8569991
We describe a modular software framework for solving user equilibrium traffic assignment problems. The design is based on the formulation of the problem as a variational inequality. Unlike most existing traffic assignment software which focus on specific ...
- research-article, November 2018
Efficient Online Hyperparameter Learning for Traffic Flow Prediction
2018 21st International Conference on Intelligent Transportation Systems (ITSC), Pages 164–169, https://doi.org/10.1109/ITSC.2018.8569972
Compute efficiency is an important consideration for traffic flow prediction models. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. ...
- Article, September 2014
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations
BRACIS '14: Proceedings of the 2014 Brazilian Conference on Intelligent Systems, Pages 201–210, https://doi.org/10.1109/ICPP.2014.29
In the field of nanoparticle material science, X-ray scattering techniques are widely used for characterization of macromolecules and particle systems (ordered, partially-ordered or custom) based on their structural properties at the micro- and nano-...
- Article, October 2014
Tuning HipGISAXS on Multi and Many Core Supercomputers
High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, Pages 217–238, https://doi.org/10.1007/978-3-319-10214-6_11
Abstract: With the continual development of multi and many-core architectures, there is a constant need for architecture-specific tuning of application-codes in order to realize high computational performance and energy efficiency, closer to the theoretical ...
- Article, April 2002
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium, Page 203
The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we use a set of memory-intensive benchmarks to evaluate a mixed logic and DRAM processor called VIRAM as a ...
- Article, May 2000
Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems
IPDPS '00: Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing, Pages 497–503
Computer simulations of realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. The process of obtaining numerical solutions to the governing PDEs involves solving large sparse ...