Keynotes
A Case Study of Hybrid Dataflow and Shared-Memory Programming Models: Dependency-Based Parallel Game Engine
- Vladimir Gajinov,
- Igor Eric,
- Saša Stojanovic,
- Veljko Milutinovic,
- Osman Unsal,
- Eduard Ayguadé,
- Adrián Cristal
Recently proposed hybrid dataflow and shared memory programming models combine these two underlying models in order to support a wider range of problems naturally. The effectiveness of such hybrid models for parallel implementations of dense and sparse ...
Wait-Free Global Virtual Time Computation in Shared Memory TimeWarp Systems
Global Virtual Time (GVT) is a powerful abstraction used to discriminate what events belong (and what do not belong) to the past history of a parallel/distributed computation. For high performance simulation systems based on the Time Warp ...
Compact Hash Tables for High-Performance Traffic Classification on Multi-core Processors
Traffic classification is one of the kernel applications in network management. Many Machine Learning (ML) traffic classification algorithms are based on decision-trees. While most of the existing implementations of decision-trees are hardware-based, a ...
Flying Memcache: Lessons Learned from Different Acceleration Strategies
Distributed, always-in-memory key-value stores are employed by large and demanding services, such as Facebook and Amazon. It is apparent that generic implementations of such caches cannot meet the needs of every application; therefore, further research ...
Improving an MPI Application-Level Migration Approach through Checkpoint File Splitting
Traditionally used for load balancing, process migration has been gaining popularity in the fault tolerance context. Recently, checkpoint-based migration has been proposed to implement failure avoidance in MPI applications through the proactive ...
HPCG: Preliminary Evaluation and Optimization on Tianhe-2 CPU-only Nodes
HPCG has become a new metric for the design and ranking of HPC. By incorporating a local symmetric Gauss-Seidel preconditioner, HPCG implements the Conjugate Gradient method to solve a sparse linear system. HPCG performs poorly with irregular memory ...
Leveraging Optimization Methods for Dynamically Assisted Control-Flow Integrity Mechanisms
Dynamic Binary Modification (DBM) tools are useful for cross-platform execution of binaries and are powerful run time environments that allow execution optimizations, instrumentation and profiling. These tools have also been used as enablers for control-...
Energy Efficient Seismic Wave Propagation Simulation on a Low-Power Manycore Processor
Large-scale simulation of seismic wave propagation is an active research topic. Its high demand for processing power makes it a good match for High Performance Computing (HPC). Although we have observed a steady increase in the processing capabilities ...
Performance-Aware Task Management and Frequency Scaling in Embedded Systems
Due to the dissemination of smartphones and tablets, a constant complexity growth can be observed for both embedded systems and mobile applications. However, this results in an increase in energy consumption. To guarantee longer battery life cycles, it ...
Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression
One of the greatest challenges in HPC is total system power and energy consumption. Whereas HPC interconnects have traditionally been designed with a focus on bandwidth and latency, there is an increasing interest in minimising the interconnect's energy ...
Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor
Bit-parallel pattern matching encodes calculated values in bit arrays. This approach gains its efficiency by performing multiple updates within a machine word. An important parameter is therefore the machine word size (e.g. 32 or 64 bits). With the ...
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, ...
High-Performance Traffic Classification on GPU
Traffic classification is an essential task in network management. Recently, there has been a new trend in exploring Graphics Processing Unit (GPU) for network applications. These applications typically do not perform floating point operations and ...
Accelerating Curvature Estimate in 3D Seismic Data Using GPGPU
- Leonardo Martins,
- Marco Aurélio Gonçalves da Silva,
- Marcelo Arruda,
- Joner Duarte,
- Pedro Mário Silva,
- Roberto Beauclair Seixas,
- Marcelo Gattass
Seismic interpretation is a vital step in the oil and gas industry. Choosing proper drilling locations is a major challenge for interpreters, since an ultra-deep water oil well located below 2500 meters of water can cost tens of millions of dollars. ...
Leveraging OmpSs to Exploit Hardware Accelerators
CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although ...
Improving Signature Behavior by Irrevocability in Transactional Memory Systems
Signatures have been proposed in Hardware Transactional Memory (HTM) to represent read and write sets of transactions and decouple transaction conflict detection from private caches. Generally, signatures are implemented as Bloom filters that allow ...
Scalability Analysis of Signatures in Transactional Memory Systems
Signatures have been proposed in transactional memory systems to represent read and write sets and to decouple transaction conflict detection from private caches or to accelerate it. Generally, signatures are implemented as Bloom filters that allow ...
Profiling Patterns of Bit Flipping for Software Transactional Memories
Software Transactional Memory (STM) is a synchronization method proposed as an alternative to lock-based synchronization. It provides a higher-level abstraction that is easier to program, and that enables software composition. Transactions are defined by ...
Multi-dimensional Evaluation of Haswell's Transactional Memory Performance
This paper presents an extensive performance study of the implementation of Hardware Transactional Memory (HTM) in the Haswell generation of Intel x86 core processors. This study evaluates the strengths and weaknesses of this new architecture exploring ...
DeTrans: Deterministic and Parallel execution of Transactions
Deterministic execution of a multithreaded application guarantees the same output as long as the application runs with the same input parameters. Determinism helps a programmer to test and debug an application and to provide fault-tolerance in the ...
Design Space Exploration of Memory Model for Heterogeneous Computing
Heterogeneous computing that combines a traditional CPU architecture with an accelerator has become a popular architecture. Memory modelling design decisions affect not only architecture designs but also programming models. Hence, comparing them is very ...
Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs
GPUs have gained tremendous popularity in a broad range of application domains. These applications possess varying grains of parallelism and place high demands on compute resources--many times imposing real-time constraints, requiring flexible work ...
Automatic Generation of Custom Parallel Processors for Morphological Image Processing
Image processing applications are well established in modern society, presenting continuous advances and challenges. One of its fundamental techniques is morphological image processing, a nonlinear branch of image processing, which has high performance ...