Frequent loop detection using efficient non-intrusive on-chip hardware
A Gordon-Ross, F Vahid
Proceedings of the 2003 international conference on Compilers, architecture …, 2003 • dl.acm.org
Dynamic software optimization methods are becoming increasingly popular for improving software performance and power. The first step in dynamic optimization is detecting frequently executed code, or "critical regions." Previous critical region detectors have targeted desktop processors. We introduce a critical region detector targeted to embedded processors, with the unique features of being highly size and power efficient and completely non-intrusive to the software's execution, features needed in timing-sensitive embedded systems. Our detector not only finds the critical regions but also determines their relative frequencies, a potentially important feature for selecting among alternative dynamic optimization methods. Our detector uses a tiny cache coupled with a small amount of logic. We provide results of extensive explorations across seventeen embedded system benchmarks. We show that highly accurate results can be achieved with only a 0.02% power overhead and acceptable size overhead. Our detector is currently being used as part of a dynamic hardware/software partitioning approach, but is applicable to a wide variety of situations.
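The abstract says only that the detector pairs a tiny cache with a small amount of logic and that it reports relative loop frequencies. As a rough software illustration of that idea, and not the authors' actual design, the sketch below assumes a direct-mapped table indexed by the address of a taken short backward branch, with saturating counters that are all halved when any one saturates so that their ratios are roughly preserved. The names `LoopDetector`, `observe`, and `relative_frequencies`, the 4-byte instruction size, and every default parameter are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    tag: int    # address of the backward branch that closes the loop
    count: int  # saturating execution counter


class LoopDetector:
    """Software model of a small loop-frequency cache (all parameters assumed)."""

    def __init__(self, num_entries: int = 16, counter_bits: int = 8,
                 max_distance: int = 256) -> None:
        self.entries: List[Optional[Entry]] = [None] * num_entries
        self.max_count = (1 << counter_bits) - 1
        self.max_distance = max_distance  # longest branch offset treated as a loop

    def observe(self, branch_pc: int, target_pc: int, taken: bool) -> None:
        """Record one executed branch; only taken short backward branches count."""
        distance = branch_pc - target_pc
        if not taken or distance <= 0 or distance > self.max_distance:
            return
        # Assumes 4-byte instructions, so the low two address bits are dropped.
        index = (branch_pc >> 2) % len(self.entries)
        entry = self.entries[index]
        if entry is None or entry.tag != branch_pc:
            # Miss: allocate, evicting whatever loop previously held this slot.
            self.entries[index] = Entry(tag=branch_pc, count=1)
            return
        entry.count += 1
        if entry.count >= self.max_count:
            # On saturation, halve every counter to keep ratios roughly intact.
            for e in self.entries:
                if e is not None:
                    e.count >>= 1

    def relative_frequencies(self) -> Dict[int, float]:
        """Return each tracked loop's share of all counted loop iterations."""
        total = sum(e.count for e in self.entries if e is not None)
        if total == 0:
            return {}
        return {e.tag: e.count / total for e in self.entries if e is not None}


if __name__ == "__main__":
    detector = LoopDetector()
    for _ in range(30):
        detector.observe(branch_pc=0x100, target_pc=0x0F0, taken=True)
    for _ in range(10):
        detector.observe(branch_pc=0x204, target_pc=0x1F0, taken=True)
    print(detector.relative_frequencies())
```

In this model the table only watches the branch stream and never alters execution, which mirrors the non-intrusive property the abstract emphasizes. The cache size, counter width, and replacement policy are the kind of parameters the abstract's "extensive explorations" would vary, but the values used here are placeholders.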