MICRO 40: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

December 2007

Publisher:

IEEE Computer Society
1730 Massachusetts Ave., NW, Washington, DC
United States

Conference:

Micro-40: The 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 1-5, 2007

ISBN:

978-0-7695-3047-5

Published:

01 December 2007

Sponsors:

SIGMICRO

Abstract

No abstract available.

Article

Message from the General Chairs

Page viii

Article

Message from the Program Chairs

Page ix, https://doi.org/10.1109/MICRO.2007.25

Article

Organizing Committee

Page x, https://doi.org/10.1109/MICRO.2007.31

Article

Reviewers

Page xi, https://doi.org/10.1109/MICRO.2007.34

Article

Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0

Pages 3–14, https://doi.org/10.1109/MICRO.2007.30

A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor performance, power dissipation, and thermal management strategies. There are a number of interconnect ...

Article

Process Variation Tolerant 3T1D-Based Cache Architectures

Pages 15–26, https://doi.org/10.1109/MICRO.2007.33

Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and will become critical with ...

Article

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing

Pages 27–42, https://doi.org/10.1109/MICRO.2007.27

Parameter variation is detrimental to a processor's frequency and leakage power. One proposed technique to mitigate it is Fine-Grain Body Biasing (FGBB), where different parts of the processor chip are given a voltage bias that changes the speed and ...

Article

Optimal versus Heuristic Global Code Scheduling

Sebastian Winkel

Pages 43–55

We present a global instruction scheduler based on integer linear programming (ILP) that was implemented experimentally in the Intel Itanium® product compiler. It features virtually the full scale of known EPIC scheduling optimizations, more than ...

Article

Global Multi-Threaded Instruction Scheduling

Pages 56–68, https://doi.org/10.1109/MICRO.2007.17

Recently, the microprocessor industry has moved toward chip multiprocessor (CMP) designs as a means of utilizing the increasing transistor counts in the face of physical and micro-architectural limitations. Despite this move, CMPs do not directly ...

Article

Revisiting the Sequential Programming Model for Multi-Core

Pages 69–84, https://doi.org/10.1109/MICRO.2007.35

Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased ...

Article

Penelope: The NBTI-Aware Processor

Pages 85–96

Transistors consist of lower number of atoms with every technology generation. Such atoms may be displaced due to the stress caused by high temperature, frequency and current, leading to failures. NBTI (negative bias temperature instability) is one of ...

Article

Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation

Pages 97–108, https://doi.org/10.1109/MICRO.2007.39

As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common. Such defects are bound to hinder the correct operation of future processor systems, unless new online techniques become available to ...

Article

Self-calibrating Online Wearout Detection

Pages 109–122, https://doi.org/10.1109/MICRO.2007.37

Technology scaling, characterized by decreasing feature size, thinning gate oxide, and non-ideal voltage scaling, will become a major hindrance to microprocessor reliability in future technology generations. Physical analysis of device failure ...

Article

Implementing Signatures for Transactional Memory

Pages 123–133

Transactional Memory (TM) systems must track the read and write sets--items read and written during a transaction--to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded ...

Article

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs

Pages 134–145, https://doi.org/10.1109/MICRO.2007.38

DRAMs require periodic refresh for preserving data stored in them. The refresh interval for DRAMs depends on the vendor and the design technology they use. For each refresh in a DRAM row, the stored information in each cell is read out and then ...

Article

Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Pages 146–160, https://doi.org/10.1109/MICRO.2007.40

DRAM memory is a major resource shared among cores in a chip multiprocessor (CMP) system. Memory requests from different threads can interfere with each other. Existing memory access scheduling techniques try to optimize the overall data throughput ...

Article

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Pages 161–171

Since the introduction of the 10GbE standard in 2002, the ability of general purpose processors to efficiently process network traffic with common protocols such as TCP/IP has been revisited and critically evaluated. However, recent commercially available processors such as Intel®...

Article

Flattened Butterfly Topology for On-Chip Networks

Pages 172–182, https://doi.org/10.1109/MICRO.2007.15

With the trend towards increasing number of cores in chip multiprocessors, the on-chip interconnect that connects the cores needs to scale efficiently. In this work, we propose the use of high-radix networks in on-chip interconnection networks and ...

Article

Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly

Pages 183–196, https://doi.org/10.1109/MICRO.2007.44

In today's digital world, computer security issues have become increasingly important. In particular, researchers have proposed designs for secure processors which utilize hardware-based memory encryption and integrity verification to protect the ...

Article

Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding

Pages 197–209

In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single ...

Article

Argus: Low-Cost, Comprehensive Error Detection in Simple Cores

Pages 210–222, https://doi.org/10.1109/MICRO.2007.8

We have developed Argus, a novel approach for providing low-cost, comprehensive error detection for simple cores. The key to Argus is that the operation of a von Neumann core consists of four fundamental tasks--control flow, dataflow, computation, and ...

Article

Leveraging 3D Technology for Improved Reliability

Pages 223–235

Aggressive technology scaling over the years has helped improve processor performance but has caused a reduction in processor reliability. Shrinking transistor sizes and lower supply voltages have increased the vulnerability of computer systems ...

Article

Effective Optimistic-Checker Tandem Core Design through Architectural Pruning

Pages 236–248, https://doi.org/10.1109/MICRO.2007.13

Design complexity is rapidly becoming a limiting factor in the design of modern, high-performance microprocessors. This paper introduces an optimization technique to improve the efficiency of complex processors. Using a new metric (µUtilization)...

Article

FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators

Pages 249–261

This paper describes FAST, a novel simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, (iii) model the entire system running unmodified applications and ...

Article

Microarchitectural Design Space Exploration Using an Architecture-Centric Approach

Pages 262–271, https://doi.org/10.1109/MICRO.2007.26

The microarchitectural design space of a new processor is too large for an architect to evaluate in its entirety. Even with the use of statistical simulation, evaluation of a single configuration can take excessive time due to the need to run a set of ...

Article

Informed Microarchitecture Design Space Exploration Using Workload Dynamics

Pages 274–285, https://doi.org/10.1109/MICRO.2007.21

Program runtime characteristics exhibit significant variation. As microprocessor architectures become more complex, their efficiency depends on the capability of adapting with workload dynamics. Moreover, with the approaching billion-transistor ...

Article

Time Interpolation: So Many Metrics, So Few Registers

Pages 286–300, https://doi.org/10.1109/MICRO.2007.42

The performance of computer systems varies over the course of their execution. A system may perform well during some parts of its execution and poorly during others. To understand why a system behaves in this way performance analysts need to study its ...

Article

Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications

Yuan Chou

Pages 301–313, https://doi.org/10.1109/MICRO.2007.23

The performance of many important commercial workloads, such as on-line transaction processing, is limited by the frequent stalls due to off-chip instruction and data accesses. These applications are characterized by irregular control flow and complex ...

Article

A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy

Pages 314–327, https://doi.org/10.1109/MICRO.2007.5

Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduction and prefetching suggest that further useful patterns emerge with a ...

Article

Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors

Pages 327–342, https://doi.org/10.1109/MICRO.2007.43

Snoopy cache coherence can be implemented in any physical network topology by embedding a logical unidirectional ring in the network. Control messages are forwarded using the ring, while other messages can use any path. While the resulting coherence ...


Acceptance Rates

MICRO 40 Paper Acceptance Rate: 35 of 166 submissions, 21%

Overall Acceptance Rate: 484 of 2,242 submissions, 22%

Year	Submitted	Accepted	Rate
MICRO-48	283	61	22%
MICRO-47	279	53	19%
MICRO-46	239	39	16%
MICRO-41	210	40	19%
MICRO-40	166	35	21%
MICRO-39	174	42	24%
MICRO-38	147	29	20%
MICRO-37	158	29	18%
MICRO-36	134	35	26%
MICRO-33	110	31	28%
MICRO-32	131	27	21%
MICRO-31	108	28	26%
MICRO-30	103	35	34%
Overall	2,242	484	22%
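Each Rate entry is Accepted divided by Submitted, expressed as a whole percent (for MICRO 40, 35/166 is about 21.1%, listed as 21%). The short Python sketch below recomputes that column and the Overall row from the per-edition rows. It assumes half-up rounding to the nearest percent, because the page does not say how it rounds; `ROWS`, `rate_percent`, and `mismatches` are illustrative names only.

```python
# (edition, submitted, accepted, listed rate %) copied from the acceptance-rate table.
ROWS = [
    ("MICRO-48", 283, 61, 22),
    ("MICRO-47", 279, 53, 19),
    ("MICRO-46", 239, 39, 16),
    ("MICRO-41", 210, 40, 19),
    ("MICRO-40", 166, 35, 21),
    ("MICRO-39", 174, 42, 24),
    ("MICRO-38", 147, 29, 20),
    ("MICRO-37", 158, 29, 18),
    ("MICRO-36", 134, 35, 26),
    ("MICRO-33", 110, 31, 28),
    ("MICRO-32", 131, 27, 21),
    ("MICRO-31", 108, 28, 26),
    ("MICRO-30", 103, 35, 34),
]


def rate_percent(submitted: int, accepted: int) -> int:
    """Return accepted/submitted as a whole percent.

    Uses integer arithmetic for floor(100 * accepted / submitted + 0.5),
    i.e. half-up rounding (an assumed convention, not stated by the page).
    """
    return (200 * accepted + submitted) // (2 * submitted)


# Totals for the "Overall" row are plain sums over the editions.
total_submitted = sum(s for _, s, _, _ in ROWS)
total_accepted = sum(a for _, _, a, _ in ROWS)

# Editions whose recomputed rate differs from the listed one (expected: none).
mismatches = [name for name, s, a, listed in ROWS if rate_percent(s, a) != listed]

if __name__ == "__main__":
    print(total_submitted, total_accepted, rate_percent(total_submitted, total_accepted))
    print("mismatches:", mismatches)
```

Under this assumption every listed rate is reproduced, and the sums give the Overall row of 2,242 submitted, 484 accepted, 22%.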
