research-article

Aérgia: exploiting packet latency slack in on-chip networks

Authors:

Reetuparna Das,

Thomas Moscibroda,

Chita R. Das

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture

Pages 106 - 116

https://doi.org/10.1145/1815961.1815976

Published: 19 June 2010

Abstract

Traditional Network-on-Chips (NoCs) employ simple arbitration strategies, such as round-robin or oldest-first, to decide which packets should be prioritized in the network. This is counter-intuitive since different packets can have very different effects on system performance due to, e.g., different levels of memory-level parallelism (MLP) of applications. Certain packets may be performance-critical because they cause the processor to stall, whereas others may be delayed for a number of cycles with no effect on application-level performance as their latencies are hidden by other outstanding packets' latencies. In this paper, we define slack as a key measure that characterizes the relative importance of a packet. Specifically, the slack of a packet is the number of cycles the packet can be delayed in the network with no effect on execution time. This paper proposes new router prioritization policies that exploit the available slack of interfering packets in order to accelerate performance-critical packets and thus improve overall system performance. When two packets interfere with each other in a router, the packet with the lower slack value is prioritized. We describe mechanisms to estimate slack, prevent starvation, and combine slack-based prioritization with other recently proposed application-aware prioritization mechanisms.
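
The arbitration rule stated in the abstract (the packet with the lower slack wins) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Packet` fields, the `STARVATION_LIMIT` threshold, the age-based starvation guard, and the tie-breaking rule are all assumptions made for illustration, since the abstract does not describe how slack is estimated or how starvation is prevented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    packet_id: int
    slack: int  # estimated cycles the packet can be delayed with no effect on execution time
    age: int    # cycles spent waiting so far (hypothetical field for the starvation guard)


# Hypothetical threshold; the abstract does not give a value or a mechanism.
STARVATION_LIMIT = 100


def arbitrate(contenders: List[Packet]) -> Optional[Packet]:
    """Choose one packet among those competing for the same router output."""
    if not contenders:
        return None
    # Assumed starvation guard: a packet that has waited too long is served
    # first, oldest first, regardless of its slack.
    starved = [p for p in contenders if p.age >= STARVATION_LIMIT]
    if starved:
        return max(starved, key=lambda p: p.age)
    # Slack-based rule from the abstract: lower slack is prioritized.
    # Ties are broken in favor of the older packet (an assumption).
    return min(contenders, key=lambda p: (p.slack, -p.age))
```

For example, given `Packet(1, slack=5, age=0)` and `Packet(2, slack=0, age=0)`, the sketch selects packet 2, because delaying it would affect execution time sooner. A round-robin arbiter would ignore this difference.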

We evaluate slack-based prioritization policies on a 64-core CMP with an 8x8 mesh NoC using a suite of 35 diverse applications. For a representative set of case studies, our proposed policy increases average system throughput by 21.0% over the commonly used round-robin policy. Averaged over 56 randomly generated multiprogrammed workload mixes, the proposed policy improves system throughput by 10.3%, while also reducing application-level unfairness by 30.8%.


Index Terms

Aérgia: exploiting packet latency slack in on-chip networks
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
    2. Parallel architectures
      1. Interconnection architectures

Published In

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture

June 2010

520 pages

ISBN:9781450300537

DOI:10.1145/1815961

General Chair: André Seznec (INRIA Rennes)
Program Chairs: Uri Weiser (Technion), Ronny Ronen (Intel)

ACM SIGARCH Computer Architecture News Volume 38, Issue 3
ISCA '10
June 2010
508 pages
ISSN:0163-5964
DOI:10.1145/1816038

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA '10

Sponsor:

SIGARCH

ISCA '10: The 37th Annual International Symposium on Computer Architecture

June 19 - 23, 2010

Saint-Malo, France

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Article Metrics

Total Citations: 123
Total Downloads: 763
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 4

Reflects downloads up to 22 Dec 2024

Citations

Cited By

Liao Q, Lin J (2024). TacVar: Tackling Variability in Short-Interval Timing Measurements on X86 Processors. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 496-506. Online publication date: 6-May-2024
https://doi.org/10.1109/CCGrid59990.2024.00062
Mutlu O (2021). Intelligent Architectures for Intelligent Computing Systems. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 318-323. Online publication date: 1-Feb-2021
https://doi.org/10.23919/DATE51398.2021.9474073
Gunasekaran J, Thinakaran P, Nachiappan N, Kandemir M, Das C (2020). Fifer. Proceedings of the 21st International Middleware Conference, pp. 280-295. Online publication date: 7-Dec-2020
https://dl.acm.org/doi/10.1145/3423211.3425683
Yin J, Sethumurugan S, Eckert Y, Patel C, Smith A, Morton E, Oskin M, Enright Jerger N, Loh G (2020). Experiences with ML-Driven Design: A NoC Case Study. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 637-648. Online publication date: Feb-2020
https://doi.org/10.1109/HPCA47549.2020.00058
Lin Z, Dai H, Mantor M, Zhou H (2019). Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution. ACM Transactions on Architecture and Code Optimization 16:3, pp. 1-27. Online publication date: 17-Jun-2019
https://dl.acm.org/doi/10.1145/3326124
Das A, Babu S, Jose J, Jose S, Palesi M, Lu Z, Vangal S, Xu J, Bogdan P (2018). Critical packet prioritisation by slack-aware re-routing in on-chip networks. Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, pp. 1-8. Online publication date: 4-Oct-2018
https://dl.acm.org/doi/10.5555/3306619.3306631
Kislal O, Kotra J, Tang X, Kandemir M, Jung M (2018). Enhancing computation-to-core assignment with physical location information. ACM SIGPLAN Notices 53:4, pp. 312-327. Online publication date: 11-Jun-2018
https://dl.acm.org/doi/10.1145/3296979.3192386
Besta M, Hassan S, Yalamanchili S, Ausavarungnirun R, Mutlu O, Hoefler T (2018). Slim NoC. ACM SIGPLAN Notices 53:2, pp. 43-55. Online publication date: 19-Mar-2018
https://dl.acm.org/doi/10.1145/3296957.3177158
Rahmani A, Donyanavard B, Mück T, Moazzemi K, Jantsch A, Mutlu O, Dutt N (2018). SPECTR. ACM SIGPLAN Notices 53:2, pp. 169-183. Online publication date: 19-Mar-2018
https://dl.acm.org/doi/10.1145/3296957.3173199
Ausavarungnirun R, Miller V, Landgraf J, Ghose S, Gandhi J, Jog A, Rossbach C, Mutlu O (2018). MASK. ACM SIGPLAN Notices 53:2, pp. 503-518. Online publication date: 19-Mar-2018
https://dl.acm.org/doi/10.1145/3296957.3173169
