A Configurable Shared Scratchpad Memory for GPU-like Processors

Alessandro Cilardo⁵,
Mirko Gagliardi⁵ &
Ciro Donnarumma⁶

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies ((LNDECT,volume 1))

Included in the following conference series:

International Conference on P2P, Parallel, Grid, Cloud and Internet Computing

1738 Accesses
1 Citations

Abstract

During the last years Field Programmable Gate Arrays and Graphics Processing Units have become increasingly important for high-performance computing. In particular, a number of industrial solutions and academic projects are proposing design frameworks based on FPGA-implemented GPU-like compute units. Existing GPU-like core projects provide limited hardware support for shared scratchpad memory and particularly for the problem of bank conflicts, a major source of performance loss with many parallel kernels. In this paper, we present a configurable, GPU-like oriented scratchpad memory with built-in support for bank remapping. The core is fully synthetizable on FPGA with a contained hardware cost. We also validated the presented architecture with a cycle-accurate event-driven emulator written in C++ as well as an RTL simulator tool. Last, we demonstrated the impact of bank remapping and other parameters available with the proposed configurable shared scratchpad memory by evaluating the performance of two real-world parallelized kernels.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 143.50; Price includes VAT (United Kingdom)

Softcover Book: GBP 179.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Hardware Support for Scratchpad Memory Transactions on GPU Architectures

Accelerated bulk memory operations on heterogeneous multi-core systems

Article 08 September 2018

Observed Memory Bandwidth and Power Usage on FPGA Platforms with OneAPI and Vitis HLS: A Comparison with GPUs

References

The Altera SDK for open computing language (OpenCL). https://www.altera.com/products/design-software/embedded-softwaredevelopers/opencl/overview.html
Nvidia’s next generation cuda compute architecture. NVidia, Santa Clara, Calif, USA (2009)
Google Scholar
An independent analysis of Altera’s FPGA floating-point DSP design flow. Berkeley Design Technology, Inc (2011)
Google Scholar
Al-Dujaili, A., Deragisch, F., Hagiescu, A.,Wong,W.F.: Guppy: A GPU-like soft-core processor. In: Field-Programmable Technology (FPT), 2012 International Conference on, pp. 57–60 (2012)
Google Scholar
Amato, F., Barbareschi, M., Casola, V., Mazzeo, A.: An FPGA-based smart classifier for decision support systems. Studies in Computational Intelligence 511, 289–299 (2014)
Google Scholar
Amato, F., Fasolino, A., Mazzeo, A., Moscato, V., Picariello, A., Romano, S., Tramontana, P.: Ensuring semantic interoperability for e-health applications. In: Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2011, pp. 315–320 (2011)
Google Scholar
Amato, F., Mazzeo, A., Penta, A., Picariello, A.: Building RDF ontologies from semistructured legal documents. pp. 997–1002 (2008)
Google Scholar
Balasubramanian, R., Gangadhar, V., Guo, Z., Ho, C.H., Joseph, C., Menon, J., Drumond, M.P., Paul, R., Prasad, S., Valathol, P., Sankaralingam, K.: Enabling GPGPU low-level hardware explorations with MIAOW: An open-source RTL implementation of a GPGPU. ACM Trans. Archit. Code Optim. 12(2), 21:21:1–21:21:25 (2015)
Google Scholar
Barbareschi, M., Del Prete, S., Gargiulo, F., Mazzeo, A., Sansone, C.: Decision tree-based multiple classifier systems: An FPGA perspective. In: International Workshop on Multiple Classifier Systems, pp. 194–205. Springer (2015)
Google Scholar
Barbareschi, M., Iannucci, F., Mazzeo, A.: Automatic design space exploration of approximate algorithms for big data applications. In: 2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 40–45. IEEE (2016)
Google Scholar
Barbareschi, M., Iannucci, F., Mazzeo, A.: An extendible design exploration tool for supporting approximate computing techniques. In: 2016 International Conference on Design and Technology of Integrated Systems in Nanoscale Era (DTIS), pp. 1–6. IEEE (2016)
Google Scholar
Bush, J., Dexter, P., Miller, T.N.: Nyami: a synthesizable GPU architectural model for generalpurpose and graphics-specific workloads. In: Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on, pp. 173–182 (2015)
Google Scholar
Chatterjee, S., et al.: Generating local addresses and communication sets for data-parallel programs. SIGPLAN Not. 28(7), 149–158 (1993)
Google Scholar
Cilardo, A.: Exploring the potential of threshold logic for cryptography-related operations. IEEE Transactions on Computers 60(4), 452–462 (2011)
Google Scholar
Cilardo, A., De Caro, D., Petra, N., Caserta, F., Mazzocca, N., Napoli, E., Strollo, A.: High speed speculative multipliers based on speculative carry-save tree. IEEE Transactions on Circuits and Systems I: Regular Papers 61(12), 3426–3435 (2014)
Google Scholar
Cilardo, A., Durante, P., Lofiego, C., Mazzeo, A.: Early prediction of hardware complexity in HLL-to-HDL translation. pp. 483–488 (2010)
Google Scholar
Cilardo, A., Gallo, L.: Improving multibank memory access parallelism with lattice-based partitioning. ACM Transactions on Architecture and Code Optimization (TACO) 11(4), 45 (2015)
Google Scholar
Cilardo, A., Gallo, L., Mazzeo, A., Mazzocca, N.: Efficient and scalable OpenMP-based system-level design. pp. 988–991 (2013)
Google Scholar
Coon, B., et al.: Shared memory with parallel access and access conflict resolution mechanism. U.S. Patent No. 8,108,625 (2012)
Google Scholar
Farber, R.: CUDA application design and development. Elsevier (2011)
Google Scholar
Fusella, E., Cilardo, A.: H2ONoC: A hybrid optical-electronic NoC based on hybrid topology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2016)
Google Scholar
Fusella, E., Cilardo, A.: Minimizing power loss in optical networks-on-chip through application-specific mapping. Microprocessors and Microsystems (2016)
Google Scholar
Kingyens, J., Steffan, J.: The potential for a GPU-like overlay architecture for FPGAs. International Journal of Reconfigurable Computing (2011)
Google Scholar
Kuon, I., Rose, J.: Measuring the gap between FPGAs and ASICs. In: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, FPGA ’06, pp. 21–30. ACM, New York, NY, USA (2006)
Google Scholar
Paranjape, K., Hebert, S., Masson, B.: Heterogeneous computing in the cloud: Crunching big data and democratizing HPC access for the life sciences. Intel Corporation (2010)
Google Scholar
Pouchet, L.N.: Polybench: The polyhedral benchmark suite. http://www.cs.ucla.edu/pouchet/software/polybench (2012)
Sarkar, S., et al.: Hardware accelerators for biocomputing: A survey. In: Proceedings of 2010 IEEE International Symposium on Circuits and Systems (2010)
Google Scholar
Snyder, W., Wasson, P., Galbi, D.: Verilator (2007)
Google Scholar
Wang, Y., Li, P., Cong, J.: Theory and algorithm for generalized memory partitioning in highlevel synthesis. In: Proceedings of the 2014 ACM/SIGDA International Symposium on Fieldprogrammable Gate Arrays, FPGA ’14, pp. 199–208. ACM, New York, NY, USA (2014)
Google Scholar
Wirbel, L.: Xilinx SDAccel: a unified development environment for tomorrow’s data center. The Linley Group Inc (2014)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Naples Federico II and Centro Regionale ICT (CeRICT), Naples, Italy
Alessandro Cilardo & Mirko Gagliardi
University of Naples Federico II, Naples, Italy
Ciro Donnarumma

Authors

Alessandro Cilardo
View author publications
You can also search for this author in PubMed Google Scholar
Mirko Gagliardi
View author publications
You can also search for this author in PubMed Google Scholar
Ciro Donnarumma
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mirko Gagliardi .

Editor information

Editors and Affiliations

Campus Nord,Ed. Omega (Room 109), Technical University of Catalonia Campus Nord,Ed. Omega (Room 109), Barcelona, Spain
Fatos Xhafa
Fukuoka Institute of Technology , Fukuoka, Japan
Leonard Barolli
Federico II, Università degli Studi di Napoli Federico II, Napoli, Italy
Flora Amato

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Cilardo, A., Gagliardi, M., Donnarumma, C. (2017). A Configurable Shared Scratchpad Memory for GPU-like Processors. In: Xhafa, F., Barolli, L., Amato, F. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2016. Lecture Notes on Data Engineering and Communications Technologies, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-49109-7_1

Download citation

DOI: https://doi.org/10.1007/978-3-319-49109-7_1
Published: 22 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49108-0
Online ISBN: 978-3-319-49109-7
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

A Configurable Shared Scratchpad Memory for GPU-like Processors

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Hardware Support for Scratchpad Memory Transactions on GPU Architectures

Accelerated bulk memory operations on heterogeneous multi-core systems

Observed Memory Bandwidth and Power Usage on FPGA Platforms with OneAPI and Vitis HLS: A Comparison with GPUs

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Configurable Shared Scratchpad Memory for GPU-like Processors

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Hardware Support for Scratchpad Memory Transactions on GPU Architectures

Accelerated bulk memory operations on heterogeneous multi-core systems

Observed Memory Bandwidth and Power Usage on FPGA Platforms with OneAPI and Vitis HLS: A Comparison with GPUs

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation