research-article

Application development with the FlexWAFE real-time stream processing architecture for FPGAs

Published: 29 October 2009

Abstract

The challenges posed by complex real-time digital image processing at high resolutions cannot be met by current state-of-the-art general-purpose or DSP processors, which lack the necessary processing power. On the other hand, large arrays of FPGA-based accelerators are too inefficient to cover the needs of cost-sensitive professional markets. We present a new architecture composed of a network of configurable, flexible, weakly programmable processing elements: the Flexible Weakly programmable Advanced Film Engine (FlexWAFE). This architecture delivers both programmability and high efficiency when implemented on FPGAs. We demonstrate these claims using a professional next-generation noise reducer that achieves more than 170G image operations/s at 80% FPGA area utilization on four Virtex II-Pro FPGAs. This article focuses on the FlexWAFE architecture principle and its implementation on a PCI-Express board.

Published In

ACM Transactions on Embedded Computing Systems  Volume 9, Issue 1
October 2009
184 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1596532
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 October 2009
Accepted: 01 February 2009
Revised: 01 January 2009
Received: 01 June 2008
Published in TECS Volume 9, Issue 1

Author Tags

  1. Communication centric
  2. FPGA
  3. PCI-Express
  4. QoS
  5. SDRAM-controller
  6. communication scheduling
  7. digital film
  8. real-time
  9. reconfigurable
  10. stream-based architecture
  11. weakly-programmable

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2012) A high-performance dense block matching solution for automotive 6D-vision. Proceedings of the Conference on Design, Automation and Test in Europe, 268-271. DOI: 10.5555/2492708.2492775. Online publication date: 12-Mar-2012
  • (2012) A Flexible High-Performance Accelerator Platform for Automotive Sensor Applications. SAE International Journal of Passenger Cars - Electronic and Electrical Systems 5:1, 280-291. DOI: 10.4271/2012-01-0939. Online publication date: 16-Apr-2012
  • (2011) Towards Synthesis-Free JIT Compilation to Commodity FPGAs. Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 202-205. DOI: 10.1109/FCCM.2011.25. Online publication date: 1-May-2011
  • (2010) Back Suction. Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip, 155-162. DOI: 10.1109/NOCS.2010.38. Online publication date: 3-May-2010
  • (2010) A reconfigurable architecture for real-time vision systems on FPGA. 2010 International Conference on Microelectronics, 455-458. DOI: 10.1109/ICM.2010.5696187. Online publication date: Dec-2010
  • (2009) Mapping of a film grain removal algorithm to a heterogeneous reconfigurable architecture. Proceedings of the Conference on Design, Automation and Test in Europe, 27-32. DOI: 10.5555/1874620.1874630. Online publication date: 20-Apr-2009
  • (2009) The Hardware Services. Dynamic System Reconfiguration in Heterogeneous Platforms, 77-91. DOI: 10.1007/978-90-481-2427-5_7. Online publication date: 2009
  • (2009) Real-Time Digital Film Processing. Dynamic System Reconfiguration in Heterogeneous Platforms, 185-193. DOI: 10.1007/978-90-481-2427-5_14. Online publication date: 2009
