-
Hiroyuki Tomiyama
Article type: Editorial
Subject area: Editorial
2013 Volume 6 Pages 1
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
-
Sangyoung Park, Younghyun Kim, Jaehyun Park, Naehyuck Chang
Article type: Invited Paper
Subject area: System-Level Design
2013 Volume 6 Pages 2-16
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
With semiconductor scaling, individual parts can no longer share the same supply voltage, and some chips even require multiple supply voltage levels. Differences in the input and output voltage specifications of each device also call for multiple supply voltage levels. Various devices such as displays, RF, USB, and SD cards increase the number of supply voltage levels. Moreover, analog devices often cannot share a power supply because of coupling noise. However, these components are commonly powered by a single power source such as a battery. Consequently, power converters such as on- and off-chip switching-mode DC-DC converters, low-dropout linear regulators, and charge pumps are densely populated even on a single circuit board. The efficiency of power converters is commonly assumed to be high enough and is often ignored during power management policy development. However, their actual conversion efficiency varies significantly with device activity and power mode, which sometimes results in substantially lower efficiency than the value provided in datasheets. Moreover, hardware designers generally optimize the power converters for the maximum supply current of the device and even over-design them, while the actual device power consumption at runtime can be far from the energy-optimal operating point. This tutorial paper covers a wide range of topics on power converter-aware design and introduces several design practices: i) power converter basics and conversion efficiency, ii) power converter voltage transition overhead, iii) power converter-aware design of embedded systems, and iv) maximum energy transfer of energy harvesting devices.
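As a rough illustration of why conversion efficiency depends on the load, the sketch below uses a simple assumed loss model: a fixed loss plus a conduction loss that grows with the square of the load current. The model, the function name `converter_efficiency`, and all parameter values are hypothetical and are not taken from the paper.

```python
def converter_efficiency(i_load, v_out=1.8, p_fixed=0.005, r_loss=0.1):
    """Efficiency of a hypothetical DC-DC converter at load current i_load (A).

    Assumed loss model (not from the paper): a fixed controller/switching
    loss p_fixed (W) plus a conduction loss r_loss * i_load**2 (W).
    """
    if i_load <= 0:
        return 0.0
    p_out = v_out * i_load
    p_loss = p_fixed + r_loss * i_load ** 2
    return p_out / (p_out + p_loss)


# Compare efficiency from a light load up to a heavy load.
for i_load in (0.001, 0.01, 0.1, 0.5):
    print(f"{i_load:6.3f} A -> {converter_efficiency(i_load):.3f}")
```

Under these assumed parameters the efficiency is poor at light load, where the fixed loss dominates, and peaks at an intermediate current, which is the kind of load dependence a converter-aware power management policy would need to take into account.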
-
Yoonmyung Lee, Dongmin Yoon, Yejoong Kim, David Blaauw, Dennis Sylvest ...
Article type: Invited Paper
Subject area: Architectural Low-Power Design
2013 Volume 6 Pages 17-26
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
Designing an ultra-low power sensor node requires careful consideration of the system-level energy budget. Depending on the application, various components can dominate total energy. In this paper, we review three different system energy budget scenarios in which the microprocessor, the memory, or the timer of a sensor node dominates the energy budget. The design space and corresponding trade-offs for these three components are explored to suggest guidelines for the design of ultra-low power sensor nodes.
-
Katsuya Fujiwara, Hideo Fujiwara, Hideo Tamamoto
Article type: Regular Paper
Subject area: Testing
2013 Volume 6 Pages 27-33
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
Scan design makes digital circuits easily testable; however, it can also be exploited to hack the chip. We have reported a secure and testable scan design approach that uses extended shift registers called “SR-equivalents”, which are functionally equivalent but not structurally equivalent to shift registers [14][15][16][17][18]. In this paper, to further extend the class of SR-equivalents, we introduce a wider class of circuits called “SR-quasi-equivalents”, which still provide testability and security similar to those of SR-equivalents. To estimate the security level, we clarify the cardinality of each equivalence class in SR-quasi-equivalents for several linear structural circuits, and also present the actual number of SR-quasi-equivalents obtained by the enhanced program SREEP.
-
Xin Jiang, Ran Zhang, Takahiro Watanabe
Article type: Regular Paper
Subject area: System-Level Synthesis
2013 Volume 6 Pages 34-41
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
With the progress of 3D IC integration technologies, 3D Networks-on-Chip (NoCs) have been proposed as a scalable and efficient solution to global communication in interconnect design. In this work, we propose a new procedure for designing application-specific irregular 3D NoC architectures. This procedure not only accommodates the variability of highly customized SoC designs, but also achieves significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. An efficient Genetic Algorithm (GA)-based algorithm is applied to optimize both the topology and the floorplan. Numerical experiments on standard benchmarks first compare the method applied to 3D architectures with 2D designs, and then compare the architecture obtained by our proposed algorithm with both classical topologies and custom topologies. The experimental results show that the architectures produced by our design algorithm achieve greater performance improvement than those of other algorithms, and that the proposed algorithm is also a time-efficient method for exploring the large solution space.
-
Kosuke Mizuno, Yosuke Terachi, Kenta Takagi, Shintaro Izumi, Hiroshi K ...
Article type: Regular Paper
Subject area: Architectural Design
2013 Volume 6 Pages 42-51
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
This paper describes a Histogram of Oriented Gradients (HOG)-based object detection processor. It features a simplified HOG algorithm with cell-based scanning and simultaneous Support Vector Machine (SVM) calculation, a cell-based pipeline architecture, and parallelized modules. To evaluate the effectiveness of our approach, the proposed architecture is implemented on an FPGA prototyping board. Results show that the proposed architecture can generate HOG features and detect objects at 40 MHz for SVGA resolution video (800 × 600 pixels) at 72 frames per second (fps).
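To put the reported figures in perspective, the following back-of-the-envelope calculation estimates the average number of clock cycles available per pixel. The function name `cycles_per_pixel` is an illustrative assumption; only the clock frequency, resolution, and frame rate come from the abstract.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """Average number of clock cycles available per input pixel."""
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second


# Figures quoted in the abstract: 40 MHz, 800 x 600 pixels, 72 fps.
budget = cycles_per_pixel(40e6, 800, 600, 72)
print(f"{budget:.3f} cycles per pixel")
```

With these figures the budget comes to about 1.16 cycles per pixel, which is consistent with the pipelined and parallelized architecture the abstract describes.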
-
Shogo Nakaya, Makoto Miyamura, Noburo Sakimura, Yuichi Nakamura, Tadah ...
Article type: Regular Paper
Subject area: Architectural Low-Power Design
2013 Volume 6 Pages 52-59
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
Energy saving is currently one of the most important issues in the development of battery-powered wireless sensor nodes (WSNs). We have developed a non-volatile reconfigurable offloader for flexible and highly efficient processing on WSNs that uses NanoBridges (NBs), which are novel non-volatile and reprogrammable switching elements. Non-volatility is essential for the intermittent operation of WSNs due to the requirement of power-on without loading configuration data. We implemented a data compression algorithm on the offloader that reduces energy consumption during data transmission. Simulation results showed that the energy consumption on the offloader was 1/21 of that on an ultra-low power CPU.
-
Kazuhito Ito, Kazuhiko Kameda
Article type: Regular Paper
Subject area: Behavioral Synthesis
2013 Volume 6 Pages 60-70
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
In conditional processing, operations are executed conditionally based on the results of condition operations. While the speculative execution of conditional operations achieves higher processing speed, the speculatively executed operations may consume unnecessary energy. In this paper, reduction of the energy consumption of conditional processing is considered for time- and resource-constrained processing. An efficient method to calculate the probability of operation execution is presented. Based on the probabilities of execution, a scheduling exploration with simulated annealing and a heuristic scheduling algorithm are proposed to minimize the energy consumption of conditional processing by reducing unnecessary speculative operations. The experimental results show that the proposed methods can reduce energy by 5% to 10% for the same configuration of resources.
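The trade-off between speculation and energy can be sketched with a simple expected-energy calculation. The abstract does not state the paper's energy model, so the model below is an assumption: a speculated operation always consumes its energy, while a non-speculated one consumes it only with its execution probability. The function `expected_energy` and the operation data are hypothetical.

```python
def expected_energy(ops, speculated):
    """Expected energy of a set of conditional operations.

    ops: dict mapping name -> (energy, probability the operation is needed)
    speculated: set of names executed speculatively (always consume energy)
    """
    total = 0.0
    for name, (energy, prob) in ops.items():
        # A speculated operation runs unconditionally; otherwise it runs
        # only when its condition holds, so its energy is weighted by prob.
        total += energy if name in speculated else energy * prob
    return total


# Hypothetical operations: (energy in arbitrary units, execution probability).
ops = {"add": (1.0, 0.5), "mul": (4.0, 0.25)}
all_spec = expected_energy(ops, {"add", "mul"})
no_spec = expected_energy(ops, set())
print(all_spec, no_spec)
```

A scheduler of the kind described would have to weigh this energy difference against the time and resource constraints, since removing speculation can lengthen the schedule.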
-
Yuta Kato, Kenshu Seto
Article type: Short Paper
Subject area: System-Level Synthesis
2013 Volume 6 Pages 71-75
Published: 2013
Released on J-STAGE: February 15, 2013
JOURNAL
FREE ACCESS
Loop fusion is often necessary before successful application of high-level synthesis (HLS). Although promising loop optimization tools based on the polyhedral model, such as Pluto, have been proposed, they sometimes cannot fuse loops into fully nested loops. This paper proposes an effective loop transformation called Outer Loop Shifting (OLS) that facilitates successful loop fusion. With HLS, we found that OLS generates hardware with 25% fewer execution cycles on average than hardware generated using Pluto alone for four benchmark programs.
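For readers unfamiliar with the transformation, the sketch below shows plain loop fusion on a toy example. It does not implement Outer Loop Shifting or Pluto, and HLS input would normally be written in C rather than Python; the function names and the computation are purely illustrative.

```python
def separate_loops(a):
    """Two loops over the same range, with an intermediate array b."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
    for i in range(n):
        c[i] = b[i] + 1
    return c


def fused_loop(a):
    """The same computation after fusing the two loops into one."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        b_i = a[i] * 2      # the intermediate value stays in a scalar
        c[i] = b_i + 1
    return c


print(separate_loops([1, 2, 3]), fused_loop([1, 2, 3]))
```

Fusion is legal here because iteration i of the second loop depends only on iteration i of the first; when dependences cross iterations, a tool must first realign the loops, which is the kind of situation the abstract says can prevent fusion.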
-
Huang-Chih Kuo, Youn-Long Lin
Article type: Invited Paper
Subject area: Architectural Design
2013 Volume 6 Pages 76-93
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
Intra-frame encoding is useful for many video applications such as security surveillance, digital cinema, and video conferencing because it supports random access to every video frame for easy editing and has low computational complexity that results in low hardware cost. H.264/AVC, which is the most popular video coding standard today, also defines novel intra-coding tools to achieve high compression performance at the expense of significantly increased computational complexity. We present a VLSI design for an H.264/AVC intra-frame encoder. The paper summarizes several novel approaches to alleviate the performance bottleneck caused by the long data dependency loop among 4 × 4 luma blocks, integrate a high-performance hardwired CABAC entropy encoder, and apply a clock-gating technique to reduce power consumption. Synthesized with a TSMC 130 nm CMOS cell library, our design requires 194.1K gates at 108 MHz and consumes 19.8 mW to encode 1080p (1920 × 1088) video sequences at 30 frames per second (fps). It also delivers the same video quality as the H.264/AVC reference software. We suggest a figure of merit called Design Efficiency for fair comparison of different works. Experimental results show that the proposed design is more efficient than prior art.
-
Yiqiang Sheng, Atsushi Takahashi
Article type: Regular Paper
Subject area: Packing
2013 Volume 6 Pages 94-100
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
2D/3D packing optimization faces the major challenge of finding better solutions in less runtime. In this paper, we propose a new variation of adaptive simulated annealing (ASA) to solve the packing problem. In the traditional ASA, the parameters that control temperature scheduling and random step selection are adjusted according to search progress. In the proposed ASA, a guide with adaptive probabilities is used to automatically select moving methods, including crossover, to improve efficiency. Interestingly, the traditional SA with crossover is inefficient, while the proposed ASA with crossover is efficient because of the adaptive guide. In experiments using the MCNC, ami49_X, and ami98_3D benchmarks, the computational performance is considerably improved. In the case of area minimization, the results obtained by the proposed ASA are normally better than the published data for 2D packing. In the case of volume minimization for 3D packing, the results obtained by the proposed ASA are better than those of the traditional ASA and SA.
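The idea of selecting moving methods with adaptive probabilities can be sketched as follows. The abstract does not specify the update rule, so the multiplicative reward and penalty, the `floor` parameter, the move names, and both function names are assumptions made only for illustration.

```python
import random


def select_move(probs, rng=random):
    """Pick a move name with probability proportional to its weight."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


def update_probs(probs, move, improved, rate=0.1, floor=0.05):
    """Reward a move that improved the solution, penalise one that did not,
    then renormalise so that the probabilities sum to 1 (assumed rule)."""
    updated = dict(probs)
    factor = 1.0 + rate if improved else 1.0 - rate
    # Apply the floor before renormalising so no move is starved entirely.
    updated[move] = max(floor, updated[move] * factor)
    total = sum(updated.values())
    return {name: p / total for name, p in updated.items()}


# Hypothetical move set for a packing annealer.
probs = {"swap": 1 / 3, "rotate": 1 / 3, "crossover": 1 / 3}
probs = update_probs(probs, "crossover", improved=True)
print(select_move(probs), probs)
```

In an annealing loop, `select_move` would choose the perturbation at each step and `update_probs` would be called with the outcome, so that moves that tend to help are tried more often as the search progresses.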
-
Hiroyuki Akasaka, Shin-ya Abe, Masao Yanagisawa, Nozomu Togawa
Article type: Regular Paper
Subject area: Behavioral Synthesis
2013 Volume 6 Pages 101-111
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
With the miniaturization of LSIs and their increasing performance, demand for high-functional portable devices has grown significantly. At the same time, battery lifetime and device overheating are becoming major design problems that hamper further LSI integration. In addition, the ratio of interconnection delay to gate delay has continued to increase as device feature size decreases. We therefore have to estimate interconnection delays and reduce energy consumption even at the high-level synthesis stage. In this paper, we propose a high-level synthesis algorithm for huddle-based distributed-register architectures (HDR architectures) with clock gating based on concurrency-oriented scheduling/functional unit binding. We assume coarse-grained clock gating applied to huddles, and we focus on the number of control steps, or gating steps, at which we can apply clock gating to the registers in every huddle. We propose two methods to increase gating steps. The first schedules and binds operations so that they are performed at the same timing. By adjusting the clock gating timings at the high-level synthesis stage, we expect to enhance the effect of clock gating more than by applying clock gating after logic synthesis. The second synthesizes huddles such that each synthesized huddle includes registers that have similar or identical clock gating timings. At this point, we determine the clock gating timings to minimize total energy consumption, including clock tree energy. The experimental results show that our proposed algorithm reduces energy consumption by up to 23.8% compared with several conventional algorithms.
-
Amila Akagic, Hideharu Amano
Article type: Short Paper
Subject area: Architectural Design
2013 Volume 6 Pages 112-121
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
IP-based storage systems often require bandwidth-intensive access to storage devices, and thus exhibit high CPU utilization and low throughput when implemented principally in software. This is especially evident for multi-Gbps networks, where the impact of computational overhead is so pronounced that current state-of-the-art processors cannot take advantage of the capacity of the network. In this paper, we propose a new iSCSI Offload Engine architecture for high data rate storage networking. Based on our analysis of the open-source Open-iSCSI initiator, we offload the most computationally intensive and most frequently executed functions in a common-case scenario, while other functions are implemented in a modified Open-iSCSI initiator on a general-purpose processor. Our architecture overcomes the performance limitations imposed by a single processor that runs at a 15x higher operating frequency than our accelerator. It exhibits very low CPU utilization of approximately 3% on the host CPU, a 10-15x reduction compared with a software implementation. The maximum transmission throughput is 7.81 Gbps and the reception throughput is 7.34 Gbps, a 2x speedup over software. The new architecture also shows performance comparable to the Chelsio T110 ASIC-based HBA, and has more flexibility.
-
Yuko Hara-Azumi, Toshinobu Matsuba, Hiroyuki Tomiyama, Shinya Honda, H ...
Article type: Regular Paper
Subject area: Behavioral Synthesis
2013 Volume 6 Pages 122-126
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
For FPGA-based designs generated through high-level synthesis (HLS), the effects of resource sharing/unsharing on clock frequency, execution time, and area are quantitatively evaluated for several practically large benchmarks on multiple FPGA devices. Through experiments, we obtained five important findings about resource sharing/unsharing, which are contrary to conventional wisdom or have not been sufficiently handled. These five findings will be useful for the further development and advancement of practical HLS technology.
-
Taiga Takata, Masayoshi Yoshimura, Yusuke Matsunaga
Article type: Regular Paper
Subject area: Logic-Level Reliability Analysis
2013 Volume 6 Pages 127-134
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
This paper presents two techniques for accelerating fault simulation to analyze soft error propagation in sequential circuits. One is an exact technique and the other is a heuristic technique. Since these techniques are independent of how the logic functions of circuits are evaluated, they can be combined with other techniques that accelerate the evaluation of logic functions, such as event-driven simulation and single pattern parallel fault propagation (SPPFP). Experimental results show that applying the exact technique makes a fault simulator with event-driven simulation and SPPFP 30-143 times faster. A fault simulator with the exact technique finished for several large-scale circuits in 4.6 hours or less, while a fault simulator without the exact technique could not finish for such circuits in 72 hours. Furthermore, applying the heuristic technique makes a fault simulator with the exact technique about 7-17 times faster with only 0.5-2.2% estimation error.
-
Bernard Schmidt, Carlos Villarraga, Thomas Fehmel, Jörg Bormann, ...
Article type: Regular Paper
Subject area: Special Issue on ASP-DAC 2013
2013 Volume 6 Pages 135-145
Published: 2013
Released on J-STAGE: August 05, 2013
JOURNAL
FREE ACCESS
This paper describes a method to generate a computational model for formal verification of hardware-dependent software in embedded systems. The computational model of the combined HW/SW system is a program netlist (PN) consisting of instruction cells connected in a directed acyclic graph that compactly represents all execution paths of the software. The model can be easily integrated into SAT-based verification environments such as those based on Bounded Model Checking (BMC). The proposed construction of the model allows the SAT solver to reason efficiently over entire execution paths. Program netlists are compositional, and the paper shows how they can be combined to model interrupt-driven systems. We demonstrate the efficiency of our approach by presenting experimental results from the formal verification of an industrial LIN (Local Interconnect Network) bus node, implemented as a software driver on a 32-bit RISC machine.