TACO: Vol 14, No 4

Volume 14, Issue 4

December 2017

Editor:

Koen De Bosschere
Ghent University

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973

research-article

Open Access

Cooperative Multi-Agent Reinforcement Learning-Based Co-optimization of Cores, Caches, and On-chip Network

Article No.: 32, Pages 1–25, https://doi.org/10.1145/3132170

Modern multi-core systems provide huge computational capabilities, which can be used to run multiple processes concurrently. To achieve the best possible performance within limited power budgets, the various system resources need to be allocated ...

research-article

Open Access

Bringing Parallel Patterns Out of the Corner: The P³ARSEC Benchmark Suite

Article No.: 33, Pages 1–26, https://doi.org/10.1145/3132710

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based ...

research-article

Open Access

Cache Exclusivity and Sharing: Theory and Optimization

Article No.: 34, Pages 1–26, https://doi.org/10.1145/3134437

A problem on multicore systems is cache sharing, where the cache occupancy of a program depends on the cache usage of peer programs. Exclusive cache hierarchy as used on AMD processors is an effective solution to allow processor cores to have a large ...

research-article

Open Access

Energy-Efficient Compilation of Irregular Task-Parallel Loops

Article No.: 35, Pages 1–29, https://doi.org/10.1145/3136063

Energy-efficient compilation is an important problem for multi-core systems. In this context, irregular programs with task-parallel loops present interesting challenges: the threads with lesser work-loads (non-critical-threads) wait at the join-points ...

research-article

Open Access

Compiler-Assisted Loop Hardening Against Fault Attacks

Article No.: 36, Pages 1–25, https://doi.org/10.1145/3141234

Secure elements widely used in smartphones, digital consumer electronics, and payment systems are subject to fault attacks. To thwart such attacks, software protections are manually inserted requiring experts and time. The explosion of the Internet of ...

research-article

Open Access

A Transactional Correctness Tool for Abstract Data Types

Article No.: 37, Pages 1–24, https://doi.org/10.1145/3148964

Transactional memory simplifies multiprocessor programming by providing the guarantee that a sequential block of code in the form of a transaction will exhibit atomicity and isolation. Transactional data structures offer the same guarantee to concurrent ...

research-article

Open Access

Power Consumption Models for Multi-Tenant Server Infrastructures

Article No.: 38, Pages 1–22, https://doi.org/10.1145/3148965

Multi-tenant virtualized infrastructures allow cloud providers to minimize costs through workload consolidation. One of the largest costs is power consumption, which is challenging to understand in heterogeneous environments. We propose a power modeling ...

research-article

Open Access

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance

Article No.: 39, Pages 1–26, https://doi.org/10.1145/3151034

We introduce the Coarse-Grain Out-of-Order (CG-OoO) general-purpose processor designed to achieve close to In-Order (InO) processor energy while maintaining Out-of-Order (OoO) performance. CG-OoO is an energy-performance-proportional architecture. Block-...

research-article

Open Access

ECS: Error-Correcting Strings for Lifetime Improvements in Nonvolatile Memories

Article No.: 40, Pages 1–29, https://doi.org/10.1145/3151083

Emerging nonvolatile memories (NVMs) suffer from low write endurance, resulting in early cell failures (hard errors), which reduce memory lifetime. It was recognized early on that conventional error-correcting codes (ECCs), which are designed for soft ...

research-article

Open Access

SLOOP: QoS-Supervised Loop Execution to Reduce Energy on Heterogeneous Architectures

Article No.: 41, Pages 1–25, https://doi.org/10.1145/3148053

Most systems allocate computational resources to each executing task without any actual knowledge of the application’s Quality-of-Service (QoS) requirements. Such best-effort policies lead to overprovisioning of the resources and increase energy loss. ...

research-article

Open Access

MBZip: Multiblock Data Compression

Article No.: 42, Pages 1–29, https://doi.org/10.1145/3151033

Compression techniques at the last-level cache and the DRAM play an important role in improving system performance by increasing their effective capacities. A compressed block in DRAM also reduces the transfer time over the memory bus to the caches, ...

research-article

Open Access

Fuse: Accurate Multiplexing of Hardware Performance Counters Across Executions

Article No.: 43, Pages 1–26, https://doi.org/10.1145/3148054

Collecting hardware event counts is essential to understanding program execution behavior. Contemporary systems offer few Performance Monitoring Counters (PMCs), thus only a small fraction of hardware events can be monitored simultaneously. We present ...

research-article

Open Access

Could Compression Be of General Use? Evaluating Memory Compression across Domains

Article No.: 44, Pages 1–24, https://doi.org/10.1145/3138805

Recent proposals present compression as a cost-effective technique to increase cache and memory capacity and bandwidth. While these proposals show potentials of compression, there are several open questions to adopt these proposals in real systems ...

research-article

Open Access

Improving the Efficiency of GPGPU Work-Queue Through Data Awareness

Article No.: 45, Pages 1–22, https://doi.org/10.1145/3151035

The architecture and programming model of current GPGPUs are best suited for applications that are dominated by structured control and data flows across large regular datasets. Parallel workloads with irregular control and data structures cannot easily ...

research-article

Open Access

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

Article No.: 46, Pages 1–25, https://doi.org/10.1145/3151032

Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision level of floating-point values in a controlled fashion needs ...

research-article

Open Access

Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach

Article No.: 47, Pages 1–26, https://doi.org/10.1145/3155288

The recent evolution in hardware landscape, aimed at producing high-performance computing systems capable of reaching extreme-scale performance, has reignited the interest in fine-grain multithreading, particularly at the intranode level. Indeed, ...

research-article

Open Access

CAIRO: A Compiler-Assisted Technique for Enabling Instruction-Level Offloading of Processing-In-Memory

Article No.: 48, Pages 1–25, https://doi.org/10.1145/3155287

Three-dimensional (3D)-stacking technology and the memory-wall problem have popularized processing-in-memory (PIM) concepts again, which offer the benefits of bandwidth and energy savings by offloading computations to functional units inside the ...

research-article

Open Access

Triple Engine Processor (TEP): A Heterogeneous Near-Memory Processor for Diverse Kernel Operations

Article No.: 49, Pages 1–25, https://doi.org/10.1145/3155920

The advent of 3D memory stacking technology, which integrates a logic layer and stacked memories, is expected to be one of the most promising memory technologies to mitigate the memory wall problem by leveraging the concept of near-memory processing (...

research-article

Open Access

ReDirect: Reconfigurable Directories for Multicore Architectures

Article No.: 50, Pages 1–23, https://doi.org/10.1145/3162015

As we enter the dark silicon era, architects should not envision designs in which every transistor remains turned on permanently but rather ones in which portions of the chip are judiciously turned on/off depending on the characteristics of a workload. ...

research-article

Open Access

HAShCache: Heterogeneity-Aware Shared DRAMCache for Integrated Heterogeneous Systems

Article No.: 51, Pages 1–26, https://doi.org/10.1145/3158641

Integrated Heterogeneous System (IHS) processors pack throughput-oriented General-Purpose Graphics Processing Units (GPGPUs) alongside latency-oriented Central Processing Units (CPUs) on the same die sharing certain resources, e.g., shared last-level ...

research-article

Open Access

Optimizing Affine Control With Semantic Factorizations

Article No.: 52, Pages 1–22, https://doi.org/10.1145/3162017

Hardware accelerators generated by polyhedral synthesis techniques make extensive use of affine expressions (affine functions and convex polyhedra) in control and steering logic. Since the control is pipelined, these affine objects must be evaluated at ...

research-article

Open Access

Data-Driven Concurrency for High Performance Computing

Article No.: 53, Pages 1–26, https://doi.org/10.1145/3162014

In this work, we utilize dynamic dataflow/data-driven techniques to improve the performance of high performance computing (HPC) systems. The proposed techniques are implemented and evaluated through an efficient, portable, and robust programming ...

research-article

Open Access

SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads

Article No.: 54, Pages 1–25, https://doi.org/10.1145/3158643

Shared memory machines continue to increase in scale by adding more parallelism through additional cores and complex memory hierarchies. Often, executing multiple applications concurrently, dividing among them hardware threads, provides greater ...

research-article

Open Access

Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts

Article No.: 55, Pages 1–19, https://doi.org/10.1145/3162016

Over the past few years, multicore systems have become increasingly powerful and thereby very useful in high-performance computing. However, many applications, such as some linear algebra algorithms, still cannot take full advantage of these systems. ...

ACM Transactions on Architecture and Code Optimization
