HiRace: Accurate and Fast Data Race Checking for GPU Programs
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 36, Pages 1–14
https://doi.org/10.1109/SC41406.2024.00042
Data races are egregious concurrency bugs that are especially problematic in performance-oriented GPU codes where large thread counts and multiple shared memory regions tend to exacerbate them. In this work, we present a new dynamic data-race checker ...
- Article, August 2024
Bringing Auto-Tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs
Abstract: Many studies have focused on developing and improving auto-tuning algorithms for Nvidia Graphics Processing Units (GPUs), but the effectiveness and efficiency of these approaches on AMD devices have hardly been studied. This paper aims to address ...
- research-article, November 2023
High-level GPU code: a case study examining JAX and OpenMP.
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1105–1113
https://doi.org/10.1145/3624062.3624186
In recent years, a new scientific software design pattern has emerged, pairing a Python interface with high-performance kernels in lower-level languages. The rise of general-purpose GPUs necessitates the rewriting of many such kernels, which poses ...
- Article, August 2023
TPGen: A Self-stabilizing GPU-Based Method for Test and Prime Paths Generation
Abstract: This paper presents a novel scalable GPU-based method for Test Paths (TPs) and Prime Paths (PPs) Generation, called TPGen, used in structural testing and in test data generation. TPGen outperforms existing methods for PPs and TPs generation in ...
- article, July 2016
Garment Simulation and Collision Detection on a Mobile Device
International Journal of Mobile Computing and Multimedia Communications (IJMCMC-IGI), Volume 7, Issue 3, Pages 1–15
https://doi.org/10.4018/IJMCMC.2016070101
This paper describes several techniques for accelerating a virtual try-on garment simulation on a mobile device (smartphone or tablet) using parallel computing on a multicore CPU, GPU computing or both depending on the mobile hardware. The system exploits ...
- research-article, June 2015
Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach
SCOPES '15: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, Pages 38–47
https://doi.org/10.1145/2764967.2764968
Stream programming based on the synchronous data flow (SDF) model naturally exposes data, task and pipeline parallelism. Statically scheduling stream programs for homogeneous architectures has been an area of extensive research. With graphic processing ...
- article, July 2014
Change detection by probabilistic segmentation from monocular view
Machine Vision and Applications (MVAA), Volume 25, Issue 5, Pages 1175–1195
https://doi.org/10.1007/s00138-013-0564-3
We present a method for foreground/background video segmentation (change detection) in real-time that can be used, in applications such as background subtraction or analysis of surveillance cameras. Our approach implements a probabilistic segmentation ...
- Article, December 2012
A Parallel H.264 Encoder with CUDA: Mapping and Evaluation
ICPADS '12: Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, Pages 276–283
https://doi.org/10.1109/ICPADS.2012.46
Efficient mapping of a real-time HD video application to graphics hardware is challenging. Developers face the challenges of choosing the right parallelism model, balancing thread's process granularity between massive computing resources on the GPU, and ...
- Article, October 2012
Multiagent Systems Modeling Using GPUs -- A Case Study of the Human Immune System
- Oberlan Christo Romao,
- Luis Eduardo de Souza Amorim,
- Ricardo Santos Ferreira,
- Maurilio de Araujo Possi,
- Alcione de Paiva Oliveira
WSCAD-SSC '12: Proceedings of the 2012 13th Symposium on Computing Systems, Pages 234–241
https://doi.org/10.1109/WSCAD-SSC.2012.31
Computer Science development plays an important role to understand natural phenomena. Its advance has impacted on studies results from many areas such as Biology and Medicine. Agent-based Models (ABM) are an alternative to model and to simulate natural ...
- article, July 2012
Cellular Automata and GPGPU: An Application to Lava Flow Modeling
International Journal of Grid and High Performance Computing (IJGHPC-IGI), Volume 4, Issue 3, Pages 30–47
https://doi.org/10.4018/jghpc.2012070102
This paper presents an efficient implementation of the SCIARA Cellular Automata computational model for simulating lava flows using the Compute Unified Device Architecture (CUDA) interface developed by NVIDIA and carried out on Graphical Processing Units ...
- Article, May 2012
Parameterized Verification of GPU Kernel Programs
IPDPSW '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Pages 2450–2459
https://doi.org/10.1109/IPDPSW.2012.302
We present an automated symbolic verifier for checking the functional correctness of GPGPU kernels parametrically, for an arbitrary number of threads. Our tool checks the functional equivalence of a kernel and its optimized versions, helping debug ...
- Article, May 2012
Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library
IPDPSW '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Pages 1858–1865
https://doi.org/10.1109/IPDPSW.2012.229
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches - CUDA and OpenCL - are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL ...
- Article, September 2010
Synthesizing Subdivision Meshes Using Real Time Tessellation
PACIFIC_GRAPHICS '10: Proceedings of the 2010 18th Pacific Conference on Computer Graphics and Applications, Pages 46–53
https://doi.org/10.1109/PacificGraphics.2010.14
We propose a new GPU method for synthesizing subdivision meshes with exact adaptive geometry in real time. Our GPU kernel builds upon precomputed tables of basis functions for subdivision surfaces and is therefore supporting all subdivision schemes, ...
- Article, October 2009
Introduction to GPU Programming with GLSL
SIBGRAPI-TUTORIALS '09: Proceedings of the 2009 Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing, Pages 3–16
https://doi.org/10.1109/SIBGRAPI-Tutorials.2009.9
One of the challenging advents in Computer Science in recent years was the fast evolution of parallel processors, specially the GPU – graphics processing unit. GPUs today play a major role in many computational environments, most notably those regarding ...
- Article, July 2009
A Hardware Accelerated Algorithm for Terrain Visualization
UAHCI '09: Proceedings of the 5th International Conference on Universal Access in Human-Computer Interaction. Part III: Applications and Services, Pages 271–280
https://doi.org/10.1007/978-3-642-02713-0_29
In recent years, rapid development of graphics hardware technology made it possible to render a large scale model in real-time. In this paper, we present a hardware accelerated algorithm for large scale terrain model visualization based on the ROAM (...
- Article, March 2009
Software Pipelined Execution of Stream Programs on GPUs
CGO '09: Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, Pages 200–209
https://doi.org/10.1109/CGO.2009.20
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, ...