DOI:
10.1007/978-3-031-07312-0
High Performance Computing: 37th International Conference, ISC High Performance 2022, Hamburg, Germany, May 29 – June 2, 2022, Proceedings
2022 Proceeding
Publisher:
  • Springer-Verlag
  • Berlin, Heidelberg
Conference:
International Conference on High Performance Computing, Hamburg, Germany, 29 May 2022
ISBN:
978-3-031-07311-3
Published:
29 May 2022

Abstract

No abstract available.

Front Matter
Pages i–xv
Back Matter
Article
Front Matter
Page 1
Article
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters
Abstract

As more High-Performance Computing (HPC) and Deep Learning (DL) applications are adapting to scale using GPUs, the communication of GPU-resident data is becoming vital to end-to-end application performance. Among the available MPI operations in ...

Article
NVIDIA’s Quantum InfiniBand Network Congestion Control Technology and Its Impact on Application Performance
Abstract

Applications running on large scale systems often suffer from degraded performance and lack of reproducible run-times due to network-level congestion, whether caused by the application network traffic itself, or by unrelated background network ...

Article
LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads
Abstract

As emerging workloads exhibit irregular memory access patterns with poor data reuse and locality, they would benefit from a DRAM that achieves low latency without sacrificing bandwidth and energy efficiency. We propose LLM (Low Latency Memory), a ...

Article
SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs
Abstract

SU3_Bench explores performance portability across multiple programming models using a simple but nontrivial mathematical kernel. This kernel has been derived from the lattice quantum chromodynamics (LQCD) code used in applications ...

Article
Front Matter
Page 85
Article
“Hey CAI” - Conversational AI Enabled User Interface for HPC Tools
Abstract

HPC system users depend on profiling and analysis tools to obtain insights into the performance of their applications and tweak them. The complexity of modern HPC systems has necessitated advances in the associated HPC tools making them equally ...

Article
Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters
Abstract

Recent advances in High Performance Computing (HPC) enable Deep Learning (DL) models to achieve state-of-the-art performance by exploiting multiple processors. Data parallelism is a strategy that replicates the DL model on each processor, which is ...

Article
Front Matter
Page 131
Article
Efficient Application of Hanging-Node Constraints for Matrix-Free High-Order FEM Computations on CPU and GPU
Abstract

This contribution presents an efficient algorithm for resolving hanging-node constraints on the fly for high-order finite-element computations on adaptively refined meshes, using matrix-free implementations. We concentrate on unstructured hex-...

Article
Dynamic Task Fusion for a Block-Structured Finite Volume Solver over a Dynamically Adaptive Mesh with Local Time Stepping
Abstract

Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a ...

Article
Accelerating Simulated Quantum Annealing with GPU and Tensor Cores
Abstract

Inspired by quantum annealing, simulated quantum annealing (SQA) mimics quantum tunneling effects on classical computers to perform annealing through a path-integral Monte Carlo simulation, which increases the potential to find the global optima ...

Article
m-Cubes: An Efficient and Portable Implementation of Multi-dimensional Integration for GPUs
Abstract

The task of multi-dimensional numerical integration is frequently encountered in physics and other scientific fields, e.g., in modeling the effects of systematic uncertainties in physical systems and in Bayesian parameter estimation. Multi-...

Article
Front Matter
Page 211
Article
Comparative Evaluation of Call Graph Generation by Profiling Tools
Abstract

Call graphs generated by profiling tools are critical to dissecting the performance of parallel programs. Although many mature and sophisticated profiling tools record call graph data, each tool is different in its runtime overheads, memory ...

Article
MAPredict: Static Analysis Driven Memory Access Prediction Framework for Modern CPUs
Abstract

Application memory access patterns are crucial in deciding how much traffic is served by the cache and forwarded to the dynamic random-access memory (DRAM). However, predicting such memory traffic is difficult because of the interplay of ...

Article
Rapid Execution Time Estimation for Heterogeneous Memory Systems Through Differential Tracing
Abstract

As the complexity of compute nodes in high-performance computing (HPC) keeps increasing, systems equipped with heterogeneous memory devices are becoming paramount. Efficiently utilizing heterogeneous memory-based systems, however, poses ...

Article
Understanding Distributed Deep Learning Performance by Correlating HPC and Machine Learning Measurements
Abstract

Frameworks for Distributed Deep Learning (DDL) have become popular alternatives to distribute training by adding a few lines of code to a single-node script. From a High-Performance Computing (HPC) perspective, traditional profiling tools for ...

Article
A Motivating Case Study on Code Variant Selection by Reinforcement Learning
Abstract

In this paper, we investigate the applicability of reinforcement learning as a possible approach to select code variants. Our approach is based on the observation that code variants are usually convertible between one another by code ...

Article
Front Matter
Page 313
Article
Remote OpenMP Offloading
Abstract

OpenMP has a long and successful history in parallel programming for CPUs. Since the introduction of accelerator offloading, it has evolved into a promising candidate for all intra-node parallel computing needs. While this addition broke with the ...

Article
Hybrid Parallel ILU Preconditioner in Linear Solver Library GaspiLS
Abstract

Krylov subspace solvers such as GMRES and preconditioners such as incomplete LU (ILU) are the most commonly used methods to solve general-purpose, large-scale linear systems in simulations efficiently. Parallel Krylov subspace solvers and ...

Article
A Subset of the CERN Virtual Machine File System: Fast Delivering of Complex Software Stacks for Supercomputing Resources
Abstract

Delivering a reproducible environment along with complex and up-to-date software stacks on thousands of distributed and heterogeneous worker nodes is a critical task. The CernVM-File System (CVMFS) has been designed to help various communities to ...

Contributors
  • University of Twente
  • University of Maryland, College Park
  • Lincoln Laboratory