Welcome to the 2013 Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES'13)! We are very pleased to continue the CASES tradition of bringing a unique focus on the intersection of compilers and architectures to Embedded Systems Week (ESWEEK). CASES has served this purpose since 2006, when it first became one of the three core conferences that make up ESWEEK.
Proceeding Downloads
Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems
Homogeneous multiprocessor systems with reconfigurable area (also known as Reconfigurable Multiprocessor Systems) are emerging as a popular design choice in current and future technology nodes to meet the heterogeneous computing demand of a multitude of ...
Scrubbing unit repositioning for fast error repair in FPGAs
Field Programmable Gate Arrays (FPGAs) are very successful platforms that rely on large configuration memories to store the circuit functions required by users. Faults affecting such memories are a major dependability threat for these devices, and the ...
Compiled multithreaded data paths on FPGAs for dynamic workloads
Hardware-supported multithreading can mask memory latency by switching execution to ready threads, which is particularly effective for irregular applications. FPGAs provide an opportunity to have multithreaded data paths customized to each individual ...
Automatic extraction of pipeline parallelism for embedded heterogeneous multi-core platforms
Automatic parallelization of sequential applications is key to the efficient use and optimization of current and future embedded multi-core systems. However, existing approaches often fail to achieve efficient balancing of tasks running on ...
Expandable process networks to efficiently specify and explore task, data, and pipeline parallelism
Running each application of a many-core system on an isolated (virtual) guest machine is a widely considered solution for performance and reliability issues. When a new application is started, the guest machine is assigned an amount of computing ...
A novel compilation approach for image processing graphs on a many-core platform with explicitly managed memory
Explicitly managed memory many-cores (EMM) have been a part of the industrial landscape for the last decade. The IBM CELL processor, general-purpose graphics processing units (GP-GPU) and the STHORM embedded many-core of STMicroelectronics are ...
Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches
The problem of finding the most effective set and ordering of optimization phases to generate the best quality code is a fundamental issue in compiler optimization research. Unfortunately, the exorbitantly large phase order search spaces in current ...
Platform-dependent code generation for embedded real-time software
Code generation for embedded systems is challenging, since the generated code (e.g., C code) is expected to run on a heterogeneous set of target platforms with different characteristics, such as hardware/software architectures and programming ...
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering. Such architectures, being both statically scheduled and ...
Hybrid compile and run-time memory management for a 3D-stacked reconfigurable accelerator
This paper presents a hybrid compile and run-time memory management technique for a 3D-stacked reconfigurable accelerator including a memory layer composed of multiple memory units whose parallel access allows a very high bandwidth. The technique ...
Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies
Two key parameters that determine the performance of a DRAM cache based multi-core system are DRAM cache hit latency (HL) and DRAM cache miss rate (MR), as they strongly influence the average DRAM cache access latency. Recently proposed DRAM set mapping ...
Minimizing code size via page selection optimization on partitioned memory architectures
For 8-bit microcontrollers, bank-switching is commonly used to increase memory capacity. The disadvantage of this technique is that bank (page) selection instructions are introduced when switching active data (program) bank. The page selection problem ...
EVA: an efficient vision architecture for mobile systems
The capabilities of mobile devices have been increasing at a momentous rate. As better processors have merged with capable cameras in mobile systems, the number of computer vision applications has grown rapidly. However, the computational and energy ...
Hardware acceleration for programs in SSA form
Register allocation is one of the most time-consuming parts of the compilation process. Depending on the quality of the register allocation, a large amount of shuffle code to move values between registers is generated. In this paper, we propose a ...
Power-performance modeling on asymmetric multi-cores
Asymmetric multi-core architectures have recently emerged as a promising alternative in a power and thermal constrained environment. They typically integrate cores with different power and performance characteristics, which makes mapping of workloads to ...
Bitcoin and the age of bespoke silicon
Recently, the Bitcoin cryptocurrency has been an international sensation. This paper tells the story of Bitcoin hardware: how a group of early-adopters self-organized and financed the creation of an entire new industry, leading to the development of ...
Dynamic hardware specialization: using Moore's Bounty without burning the chip down
The era of faster, smaller, greener (more power-efficient) transistors in every successive generation appears to be over. Due to slowing voltage scaling, power has become a primary design constraint. Using conventional microprocessor techniques does ...
From software to accelerators with LegUp high-level synthesis
- Andrew Canis,
- Jongsok Choi,
- Blair Fort,
- Ruolong Lian,
- Qijing Huang,
- Nazanin Calagar,
- Marcel Gort,
- Jia Jun Qin,
- Mark Aldham,
- Tomasz Czajkowski,
- Stephen Brown,
- Jason Anderson
Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level ...
Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator
Code discovery has been a major challenge for static binary translation, especially when the source ISA (Instruction Set Architecture) has variable-length instructions, such as the x86 architecture. Due to embedded data such as PC-relative data, jump ...
ILPc: a novel approach for scalable timing analysis of synchronous programs
Synchronous programs have been widely used in the design of safety-critical systems such as the flight control system of the Airbus A380. To validate the implementations of synchronous programs, it is necessary to map the program's logical time (measured in ...
SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems
Modern system architectures sometimes include scratch pad memories (SPM) in their memory hierarchy to take advantage of their simpler design, in an attempt to meet the system area, performance, and power budget. These systems employing SPM can be ...
Fault detection and recovery efficiency co-optimization through compile-time analysis and runtime adaptation
Ever-shrinking feature sizes and noise margins keep raising hardware failure rates, requiring the incorporation of fault tolerance into computer systems. One fault tolerance scheme that receives a lot of research attention is redundant execution. ...
Global property violation detection and diagnosis for wireless sensor networks
Run-time error detection and deterministic off-line error replay have received wide attention in recent years as a technique to enhance the programmer's ability to find software errors. To apply this technique to wireless sensor networks (WSN), one must ...
An efficient run-time encryption scheme for non-volatile main memory
Emerging non-volatile memories (NVMs) have been considered promising alternatives to DRAM for future main memory design. NVM main memory has the advantages of low standby power, high density, and good scalability. Its non-volatility, however, induces ...
Index Terms
- Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems