Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability

George Kalokerinos¹⁶,
Vassilis Papaefstathiou¹⁶,
George Nikiforos¹⁶,
Stamatis Kavadias¹⁶,
Xiaojun Yang¹⁶,
Dionisios Pnevmatikatos¹⁶ &
Manolis Katevenis¹⁶

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 11225)

Abstract

We present the hardware design and implementation of a local memory system for individual processors inside future chip multiprocessors (CMPs). Our memory system supports both implicit communication via caches and explicit communication via directly accessible local (“scratchpad”) memories and remote DMA (RDMA). The SRAM blocks that lie near each processor are configurable at run time, so that portions of them operate as second-level (local) cache while the rest operate as scratchpad. We also strive to merge the communication subsystems required by the cache and the scratchpad into one integrated Network Interface (NI) and Cache Controller (CC), in order to economize on circuits. The processor interacts with the NI at user level through virtualized command areas in scratchpad; the NI uses a similar access mechanism to provide efficient support for two hardware synchronization primitives: counters and queues. We describe the NI design, the hardware cost, and the latencies of our FPGA-based prototype implementation, which integrates four MicroBlaze processors, each with 64 KBytes of local SRAM, a crossbar NoC, and a DRAM controller. One-way, end-to-end, user-level communication completes within about 20 clock cycles for short transfer sizes.
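The run-time split of local SRAM between cache and scratchpad can be pictured with a small software model. The sketch below is only an illustration, not the authors' hardware design: the 64 KByte capacity comes from the abstract, while the 4 KByte configuration granularity and the names `LocalMemory`, `Mode`, `configure`, and `capacity` are assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum

SRAM_BYTES = 64 * 1024   # from the abstract: 64 KBytes of local SRAM per processor
BLOCK_BYTES = 4 * 1024   # ASSUMPTION: configuration granularity, not stated in the source


class Mode(Enum):
    CACHE = "cache"
    SCRATCHPAD = "scratchpad"


@dataclass
class LocalMemory:
    """Toy model of one processor's local SRAM, split into configurable blocks."""
    # Every block starts out as cache (an arbitrary choice for this sketch).
    modes: list = field(
        default_factory=lambda: [Mode.CACHE] * (SRAM_BYTES // BLOCK_BYTES)
    )

    def configure(self, block: int, mode: Mode) -> None:
        """Switch one block between cache and scratchpad use at run time."""
        if not 0 <= block < len(self.modes):
            raise IndexError(f"block {block} out of range")
        self.modes[block] = mode

    def capacity(self, mode: Mode) -> int:
        """Total bytes currently operating in the given mode."""
        return self.modes.count(mode) * BLOCK_BYTES


mem = LocalMemory()
for b in range(4):                       # hand the first four blocks to scratchpad
    mem.configure(b, Mode.SCRATCHPAD)

print(mem.capacity(Mode.SCRATCHPAD), mem.capacity(Mode.CACHE))
```

The model tracks only which role each block plays. A real design would presumably also have to handle a block's contents when its role changes, which this sketch ignores.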

Notes

1. A write-back policy can also be used, provided that coherence between L1 and L2 is maintained. However, the write-through policy simplifies coherence without any performance loss. The inclusion property assumed here is more intuitive than exclusion, which would require moving locked lines between the cache levels.

Acknowledgments

This work was supported by the European Commission in the context of the projects SARC (FP6 IP #27648) and UNiSIX (Marie-Curie #509595). We also thank the following people for their assistance in designing the architecture and developing the prototype: Dimitris Nikolopoulos, Alex Ramirez, Georgi Gaydadjiev, Spyros Lyberis, Christos Sotiriou, Dimitris Tsaliagos, and Michael Ligerakis.

Author information

Authors and Affiliations

Institute of Computer Science, FORTH, Heraklion, Crete, Greece
George Kalokerinos, Vassilis Papaefstathiou, George Nikiforos, Stamatis Kavadias, Xiaojun Yang, Dionisios Pnevmatikatos & Manolis Katevenis

Corresponding author

Correspondence to Vassilis Papaefstathiou.

Editor information

Editors and Affiliations

Politecnico di Milano, Milan, Italy
Cristina Silvano
Delft University of Technology, Delft, The Netherlands
Koen Bertels
University of Wisconsin–Madison, Madison, WI, USA
Michael Schulte

About this chapter

Cite this chapter

Kalokerinos, G. et al. (2019). Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability. In: Silvano, C., Bertels, K., Schulte, M. (eds) Transactions on High-Performance Embedded Architectures and Compilers V. Lecture Notes in Computer Science, vol 11225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58834-5_6

DOI: https://doi.org/10.1007/978-3-662-58834-5_6
Published: 23 February 2019
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58833-8
Online ISBN: 978-3-662-58834-5
eBook Packages: Computer Science, Computer Science (R0)
