
Source-to-Source Optimization for HLS

Chapter in: FPGAs for Software Programmers

Abstract

This chapter describes source code optimization techniques and automation tools for FPGA design with the high-level synthesis (HLS) design flow. HLS has lifted the design abstraction from RTL to C/C++, but in practice extensive source code rewriting is often required to obtain a good design with HLS, especially when the design space is too large to choose the proper design options in advance. Moreover, this rewriting requires not only knowledge of hardware microarchitecture design but also familiarity with the coding styles expected by HLS tools. Automatic source-to-source transformation techniques have long been applied in software compilation and optimization, and they can also greatly benefit FPGA accelerator design in an HLS flow. In general, however, source-to-source optimization for FPGAs is much more complex and challenging than for CPU software, because the design space is far larger: it combines microarchitecture choices with temporal and spatial resource allocation. The goal of source-to-source transformation is to reduce or eliminate the abstraction gap between software/algorithm development and existing HLS design flows. This will enable fully automated FPGA design flows for software developers, which is especially important for deploying FPGAs in data centers, where many software developers should be able to use FPGAs for acceleration efficiently and with minimal effort.
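As a hedged illustration of the kind of source rewriting described above (not code taken from the chapter), the C sketch below applies cyclic array partitioning by hand, a common HLS memory transformation. It assumes a hypothetical kernel whose array `a[]` maps to a single-port on-chip memory, so the two reads per iteration would be serialized; splitting `a[]` into two banks lets an HLS scheduler issue both reads in the same cycle. The size `N` and the functions `pair_sum_original`, `partition_cyclic2`, and `pair_sum_partitioned` are hypothetical names chosen for this example.

```c
#include <assert.h>

/* Hypothetical problem size; must be even for the 2-bank cyclic split. */
#define N 8
static_assert(N % 2 == 0, "N must be even for the 2-bank split");

/* Original kernel: reads a[i] and a[i + 1] in every iteration.
 * Assuming a single-port memory for a[], an HLS tool must serialize these reads. */
int pair_sum_original(const int a[N]) {
    int sum = 0;
    for (int i = 0; i < N; i += 2) {
        sum += a[i] + a[i + 1];
    }
    return sum;
}

/* Hypothetical rewrite, step 1: cyclic partitioning with factor 2,
 * bank0[k] = a[2k] and bank1[k] = a[2k + 1]. In hardware the banks would be
 * separate memories filled as data arrives; here the copy builds them in software. */
void partition_cyclic2(const int a[N], int bank0[N / 2], int bank1[N / 2]) {
    for (int k = 0; k < N / 2; k++) {
        bank0[k] = a[2 * k];
        bank1[k] = a[2 * k + 1];
    }
}

/* Hypothetical rewrite, step 2: the loop body now reads each bank once,
 * so the two loads target different memories and can be scheduled in parallel. */
int pair_sum_partitioned(const int bank0[N / 2], const int bank1[N / 2]) {
    int sum = 0;
    for (int k = 0; k < N / 2; k++) {
        sum += bank0[k] + bank1[k];
    }
    return sum;
}
```

Both versions compute the same result; only the memory layout seen by the HLS scheduler changes. In practice, HLS tools usually let the designer request this partitioning with a directive rather than by rewriting the loop (for example, Vivado HLS provides an `ARRAY_PARTITION` pragma); the aim of a source-to-source tool is to choose and apply such transformations automatically.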

This work was performed while the author J. Cong served as the Chief Scientific Advisor of Falcon Computing Solutions Inc.



Acknowledgements

This work is partially supported by the National Science Foundation Small Business Innovation Research (SBIR) Grant No. 1520449 for the project entitled “Customized Computing for Big Data Applications”.

Author information

Corresponding author: Jason Cong.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Cong, J., Huang, M., Pan, P., Wang, Y., Zhang, P. (2016). Source-to-Source Optimization for HLS. In: Koch, D., Hannig, F., Ziener, D. (eds) FPGAs for Software Programmers. Springer, Cham. https://doi.org/10.1007/978-3-319-26408-0_8


  • DOI: https://doi.org/10.1007/978-3-319-26408-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26406-6

  • Online ISBN: 978-3-319-26408-0

  • eBook Packages: Engineering (R0)
