A Configurable Shared Scratchpad Memory for GPU-like Processors

  • Conference paper
Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2016)

Abstract

In recent years, Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have become increasingly important for high-performance computing. In particular, a number of industrial solutions and academic projects propose design frameworks based on FPGA-implemented GPU-like compute units. Existing GPU-like core projects provide limited hardware support for shared scratchpad memory, and particularly for bank conflicts, a major source of performance loss in many parallel kernels. In this paper, we present a configurable scratchpad memory for GPU-like processors with built-in support for bank remapping. The core is fully synthesizable on FPGA at a contained hardware cost. We validated the presented architecture with a cycle-accurate event-driven emulator written in C++ as well as an RTL simulation tool. Finally, we demonstrate the impact of bank remapping, and of the other parameters exposed by the proposed configurable shared scratchpad memory, by evaluating the performance of two real-world parallelized kernels.
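To make the bank-conflict problem concrete, the following is a minimal sketch and not the architecture described in the paper. It assumes a scratchpad with 16 single-ported banks, one word per request, and distinct addresses per lane. The names `kBanks`, `modulo_bank`, `xor_bank`, `strided`, and `cycles_needed` are hypothetical, and the XOR-based remapping is only one common illustrative choice of remapping function.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Assumed bank count (a power of two); not taken from the paper.
constexpr uint32_t kBanks = 16;

// Baseline mapping: consecutive words are placed in consecutive banks.
uint32_t modulo_bank(uint32_t word) { return word % kBanks; }

// Hypothetical remapping: XOR the bank index with the row index, so that
// words in the same column of different rows land in different banks.
uint32_t xor_bank(uint32_t word) { return (word ^ (word / kBanks)) % kBanks; }

// Word addresses issued by `lanes` parallel lanes with a fixed stride.
std::vector<uint32_t> strided(uint32_t lanes, uint32_t stride) {
  std::vector<uint32_t> words(lanes);
  for (uint32_t i = 0; i < lanes; ++i) words[i] = i * stride;
  return words;
}

// With single-ported banks, requests to the same bank are serialized, so the
// access takes as many cycles as the most heavily loaded bank has requests.
uint32_t cycles_needed(const std::vector<uint32_t>& words,
                       const std::function<uint32_t(uint32_t)>& bank_of) {
  std::vector<uint32_t> hits(kBanks, 0);
  for (uint32_t w : words) ++hits[bank_of(w)];
  return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, `cycles_needed(strided(16, 16), modulo_bank)` evaluates to 16 because every lane hits bank 0, whereas `cycles_needed(strided(16, 16), xor_bank)` evaluates to 1 because the remapping spreads the same addresses over all 16 banks. Unit-stride access stays conflict-free under both mappings.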

Author information

Corresponding author

Correspondence to Mirko Gagliardi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cilardo, A., Gagliardi, M., Donnarumma, C. (2017). A Configurable Shared Scratchpad Memory for GPU-like Processors. In: Xhafa, F., Barolli, L., Amato, F. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2016. Lecture Notes on Data Engineering and Communications Technologies, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-49109-7_1

  • DOI: https://doi.org/10.1007/978-3-319-49109-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49108-0

  • Online ISBN: 978-3-319-49109-7

  • eBook Packages: Engineering, Engineering (R0)
