Instead of scaling up the frequency of a single core to increase performance, chip multiprocessors (CMPs) have emerged as the practical alternative, scaling performance by exploiting parallelism to meet the increasing demands of applications. As CMPs scale to larger numbers of processing cores, the on-chip communication fabric must satisfy the growing data and communication requirements of future multithreaded applications. This problem is exacerbated by poor wire scaling, which increases the latency and power consumption of on-chip communication. In response, two alternative interconnects have emerged, both based on electromagnetic wave propagation and both with latency effectively limited by the speed of light: optical interconnect (OI) and RF interconnect (RF-I).
In the first part of this dissertation, we focus on the use of alternative interconnects in future many-core systems to provide performance and power benefits by reducing on-chip access latency. In most conventional networks-on-chip (NoCs), link bandwidth is allocated uniformly so that every link can accommodate varying traffic demands. By studying the communication demands of different applications, we observed that applications exhibit diverse communication patterns. We demonstrate the use of RF-I to adapt to these patterns by flexibly allocating RF-I bandwidth to the critical communication paths. By allocating RF-I bandwidth between components that communicate frequently and using lower bandwidth in other parts of the NoC, we can reduce NoC power without significant loss in performance.
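One possible allocation policy can be sketched under two assumptions. First, an application's traffic profile is available as a matrix of message counts between NoC nodes. Second, RF-I bandwidth is divided into a fixed number of shortcut channels. The function name `allocate_rf_shortcuts` and its greedy quotient rule are hypothetical illustrations, not the policy evaluated in this dissertation. Each channel goes to the node pair with the largest ratio of traffic to (channels already assigned + 1). Heavily communicating pairs can therefore receive several channels, while lightly used pairs receive none.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def allocate_rf_shortcuts(traffic: List[List[int]], num_channels: int) -> Dict[Pair, int]:
    """Greedily assign RF-I channels to the most heavily communicating node pairs.

    Hypothetical policy for illustration only: at each step the channel goes to
    the pair with the largest traffic / (channels already assigned + 1), so a
    pair with much more traffic than others can receive multiple channels.
    """
    n = len(traffic)
    # Every directed source-destination pair with nonzero traffic (no self-traffic).
    pairs = [(src, dst) for src in range(n) for dst in range(n)
             if src != dst and traffic[src][dst] > 0]
    allocation: Dict[Pair, int] = {}
    for _ in range(num_channels):
        if not pairs:
            break  # no communicating pairs, nothing to allocate
        best = max(pairs,
                   key=lambda p: traffic[p[0]][p[1]] / (allocation.get(p, 0) + 1))
        allocation[best] = allocation.get(best, 0) + 1
    return allocation


if __name__ == "__main__":
    # Hypothetical 3-node traffic profile: node 0 talks mostly to node 1,
    # and node 2 talks mostly to node 0.
    traffic = [[0, 90, 10],
               [5, 0, 0],
               [40, 0, 0]]
    print(allocate_rf_shortcuts(traffic, num_channels=3))
    # → {(0, 1): 2, (2, 0): 1}
```

Links between pairs that receive no RF-I channel could then be provisioned at lower bandwidth. That is the source of the power saving the paragraph describes.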
In order to leverage the abundant processing resources available on-chip, future many-core systems will require an effective means of sharing data among collaborating cores. Hence, a power-efficient, scalable, and coherent interconnect fabric is vital to scaling application performance in the many-core era. We propose a scalable architecture that enables snooping-based coherence by introducing a low-latency interconnect structure specialized for store traffic, alongside the regular baseline NoC that carries all other traffic. Separating store requests from the rest of the on-chip traffic prevents stores from increasing load latency or consuming load bandwidth. We demonstrate the performance and power advantages of this snooping-based cache coherence architecture.
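The traffic split can be illustrated with a toy model. It assumes a write-invalidate snooping protocol in which every store is delivered to all cores over the dedicated store fabric, while loads travel on the baseline NoC. The class and method names (`Core`, `DualFabricCMP`, `load`, `store`) are hypothetical. The model only records which fabric each request uses and which cached copies a snooped store invalidates. It does not model timing, bandwidth, or the actual protocol used in this work.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Core:
    cid: int
    cache: Set[int] = field(default_factory=set)  # addresses held in this core's private cache


@dataclass
class DualFabricCMP:
    cores: List[Core]  # assumes cores[i].cid == i
    noc_log: List[Tuple[str, int, int]] = field(default_factory=list)    # baseline NoC traffic
    store_log: List[Tuple[str, int, int]] = field(default_factory=list)  # store-fabric traffic

    def load(self, cid: int, addr: int) -> None:
        # Loads use only the regular NoC, so they never queue behind stores.
        self.noc_log.append(("load", cid, addr))
        self.cores[cid].cache.add(addr)

    def store(self, cid: int, addr: int) -> None:
        # Stores use the separate store fabric, assumed here to reach every
        # core so that each one can snoop the address.
        self.store_log.append(("store", cid, addr))
        for core in self.cores:
            if core.cid != cid:
                core.cache.discard(addr)  # snoop hit: invalidate the stale copy
        self.cores[cid].cache.add(addr)


if __name__ == "__main__":
    cmp = DualFabricCMP([Core(i) for i in range(4)])
    cmp.load(0, 0x40)
    cmp.load(1, 0x40)
    cmp.store(2, 0x40)  # invalidates the copies held by cores 0 and 1
    print([sorted(c.cache) for c in cmp.cores])
    # → [[], [], [64], []]
```

Keeping the two logs apart mirrors the design choice in the paragraph: store traffic and load traffic never share a network, so a burst of stores cannot consume the bandwidth that loads depend on.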
As part of this dissertation, we also study the scalability of the two emerging alternative interconnect technologies by providing a quantitative comparison of OI and RF-I at the same technology generation. Ultimately, we will demonstrate where OI and RF-I are most likely to be used in future designs. Our analysis covers both on-chip and chip-to-chip communication.