research-article

A Comparison Study on Implementing Optical Flow and Digital Communications on FPGAs and GPUs

Published: 01 May 2010

Abstract

FPGA devices have often found use as higher-performance alternatives to programmable processors for implementing computations. Applications successfully implemented on FPGAs typically contain high levels of parallelism and often use simple statically scheduled control and modest arithmetic. Recently introduced computing devices such as coarse-grain reconfigurable arrays, multi-core processors, and graphical processing units promise to significantly change the computational landscape and take advantage of many of the same application characteristics that fit well on FPGAs. One real-time computing task, optical flow, is difficult to apply in robotic vision applications because of its high computational and data rate requirements, and so is a good candidate for implementation on FPGAs and other custom computing architectures. This article reports on a series of experiments mapping a collection of different algorithms onto both an FPGA and a GPU. For two different optical flow algorithms, the GPU had better performance, while for a set of digital communication MIMO computations, the two devices had similar performance. In all cases, the FPGA implementations required 10x the development time. Finally, a discussion of the two technologies' characteristics shows that they achieve high performance in different ways.

Reviews

Vivek Venugopal

Bodily et al. compare applications prototyped on both field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). The authors describe the design effort and performance parameters, such as pipelining and parallelism, when implementing on both platforms.

The first application is an optical flow calculation implemented using two algorithms: one tensor based and one ridge-regression based. The tensor-based algorithm was implemented on a Xilinx XUP V2P board with a Virtex-2 Pro XC2VP30, using the embedded development kit (EDK), which indicates the use of the embedded PowerPC and very-high-speed integrated circuits hardware description language (VHDL). It used 10,288 slices and processed 640x480 images at 64 frames per second (fps) and 320x240 images at 258 fps. The GPU implementation ran on an NVIDIA 8800 GTX, hosted by a 1.86 gigahertz (GHz) Intel Xeon machine with 1 gigabyte (GB) of random access memory (RAM). It was highly optimized for block sizes and reached 238 fps for 640x480 images and 847 fps for 320x240 images. The FPGA provided better estimates for power consumption, memory architecture, and flexibility than the GPU.

The ridge regression algorithm was implemented on a custom FPGA platform built around a Xilinx Virtex-4 FX60 FPGA with two embedded PowerPC processors. The FPGA implementation processed 640x480 images at 15 fps, while the GPU implementation processed them at 158 fps. Both algorithms had better accuracy on the GPU, but the GPU used a single-precision floating-point implementation whereas the FPGA used a fixed-point implementation. Development took longer on the FPGA, requiring 10 to 12 times the design effort of the GPU.

The second application is a performance evaluation of blocks in communication systems, including the Viterbi decoder, the timing and channel estimator, and the pilot detector.
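The optical flow frame rates quoted in the review can be put on a common footing by converting them to pixel throughput and to GPU-to-FPGA ratios. The following is a minimal Python sketch of that arithmetic. The frame rates are the ones stated in the review; the dictionary layout and the function names are illustrative assumptions, not anything defined in the article.

```python
# Frame rates (fps) as quoted in the review.
# Keys are (algorithm, platform, (width, height)). The layout and all
# function names are hypothetical, chosen only for this illustration.
FRAME_RATES = {
    ("tensor", "FPGA", (640, 480)): 64,
    ("tensor", "FPGA", (320, 240)): 258,
    ("tensor", "GPU", (640, 480)): 238,
    ("tensor", "GPU", (320, 240)): 847,
    ("ridge", "FPGA", (640, 480)): 15,
    ("ridge", "GPU", (640, 480)): 158,
}


def megapixels_per_second(resolution, fps):
    """Pixel throughput implied by a frame rate, in millions of pixels per second."""
    width, height = resolution
    return width * height * fps / 1e6


def gpu_speedup(algorithm, resolution):
    """Ratio of GPU fps to FPGA fps for one algorithm and image size."""
    gpu = FRAME_RATES[(algorithm, "GPU", resolution)]
    fpga = FRAME_RATES[(algorithm, "FPGA", resolution)]
    return gpu / fpga


if __name__ == "__main__":
    for (algorithm, platform, resolution), fps in sorted(FRAME_RATES.items()):
        rate = megapixels_per_second(resolution, fps)
        print(f"{algorithm:6s} {platform:4s} {resolution[0]}x{resolution[1]}: "
              f"{fps} fps, {rate:.2f} Mpixel/s")
    print(f"tensor 640x480 GPU/FPGA: {gpu_speedup('tensor', (640, 480)):.2f}")
    print(f"tensor 320x240 GPU/FPGA: {gpu_speedup('tensor', (320, 240)):.2f}")
    print(f"ridge  640x480 GPU/FPGA: {gpu_speedup('ridge', (640, 480)):.2f}")
```

With these figures, the tensor-based FPGA design sustains close to 20 million pixels per second at both image sizes, and the GPU-to-FPGA frame-rate ratio is roughly 3.7 and 3.3 for the tensor-based algorithm and roughly 10.5 for ridge regression. The two algorithms ran on different FPGA devices, so the ratios are best read per algorithm rather than compared across algorithms.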
The FPGA implementation of the Viterbi decoder met the timing requirement of 320 microseconds per frame, using 8,790 slices on a Xilinx Virtex-2 Pro. The GPU implementation of the Viterbi decoder achieved 20 percent higher throughput than the FPGA, at the cost of a 32-fold increase in latency. The estimator used 7,000 slices on a Xilinx Virtex-2 Pro and calculated 70 surface points within the 320-microsecond time frame. The GPU implementation met the 320-microsecond time frame using a combination of coarse and fine search algorithms over the sample sizes. The pilot detector was implemented using seven 1,024-point complex fast Fourier transforms (FFTs), using 16,000 slices on the Xilinx Virtex-2 Pro FPGA. Each FFT on the FPGA took 12 microseconds and used additional logic blocks to keep up with the 41.6M sample data rate. The GPU implementation used the CUFFT library for the FFT processing and took about 60 microseconds for a single 1,024-point FFT.

In summary, FPGAs offer more flexibility for custom input/output (I/O) computation, whereas GPUs offer a better compute-to-I/O ratio. The memory bottleneck is present for both, as data needs to be sent to the onboard memory of both the FPGA and the GPU. By leveraging the pipelining of FPGAs and the parallelism of GPUs, readers can use a combination of these architectures to solve real-time applications.

Online Computing Reviews Service
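The pilot detector timings in the review invite a quick budget check. The Python sketch below is a rough illustration under stated assumptions, not a reconstruction of the authors' analysis. It assumes that the seven FFTs run strictly one after another, that the 320-microsecond frame time quoted for the Viterbi decoder and the estimator also applies to the pilot detector, and that "41.6M" means 41.6 million samples per second. None of these three points is confirmed by the review, and all names are hypothetical.

```python
# Figures quoted in the review.
FFT_COUNT = 7          # seven 1,024-point complex FFTs in the pilot detector
FFT_POINTS = 1024
FPGA_FFT_US = 12.0     # time per FFT on the FPGA, in microseconds
GPU_FFT_US = 60.0      # approximate time per single FFT on the GPU (CUFFT)

# Assumptions made only for this sketch (not stated for the pilot detector):
FRAME_BUDGET_US = 320.0   # assumed to be the same per-frame budget as the other blocks
SAMPLE_RATE_HZ = 41.6e6   # assumed reading of the "41.6M sample data rate"


def serial_fft_time_us(per_fft_us, count=FFT_COUNT):
    """Total time if the FFTs run back to back with no overlap or batching."""
    return per_fft_us * count


def budget_fraction(total_us, budget_us=FRAME_BUDGET_US):
    """Fraction of the assumed frame budget consumed by the FFTs."""
    return total_us / budget_us


def block_arrival_time_us(points=FFT_POINTS, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time for one FFT-sized block of samples to arrive at the assumed rate."""
    return points / sample_rate_hz * 1e6


if __name__ == "__main__":
    for name, per_fft in (("FPGA", FPGA_FFT_US), ("GPU", GPU_FFT_US)):
        total = serial_fft_time_us(per_fft)
        print(f"{name}: {total:.0f} us for {FFT_COUNT} FFTs, "
              f"{budget_fraction(total):.2%} of the assumed budget")
    print(f"One {FFT_POINTS}-sample block arrives in "
          f"{block_arrival_time_us():.1f} us at the assumed sample rate")
```

Under these assumptions, seven serial FFTs take 84 microseconds on the FPGA and 420 microseconds on the GPU, so a strictly serial GPU schedule would exceed a 320-microsecond frame. A batched or overlapped GPU schedule could behave quite differently, and the review does not say which schedule was used.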

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 3, Issue 2
May 2010
141 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/1754386
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2010
Accepted: 01 April 2009
Revised: 01 November 2008
Received: 01 July 2008
Published in TRETS Volume 3, Issue 2

Author Tags

  1. Digital communications
  2. FPGA
  3. GPU
  4. optical flow
  5. reconfigurable computing

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Jan 2025

Cited By

  • (2020) Architecturally truly diverse systems: A review. Future Generation Computer Systems. DOI: 10.1016/j.future.2020.03.061. Online publication date: Apr-2020
  • (2020) A novel framework for UAV returning based on FPGA. The Journal of Supercomputing. DOI: 10.1007/s11227-020-03434-4. Online publication date: 25-Sep-2020
  • (2017) FPGA Implementation of a Dense Optical Flow Algorithm Using Altera OpenCL SDK. ICT Innovations 2017. DOI: 10.1007/978-3-319-67597-8_9 (89-101). Online publication date: 7-Sep-2017
  • (2016) Parallelizing the Chambolle Algorithm for Performance-Optimized Mapping on FPGA Devices. ACM Transactions on Embedded Computing Systems. DOI: 10.1145/2851497. 15:3 (1-27). Online publication date: 7-Mar-2016
  • (2014) Fast and Accurate Optical Flow Estimation using FPGA. ACM SIGARCH Computer Architecture News. DOI: 10.1145/2693714.2693720. 42:4 (27-32). Online publication date: 3-Dec-2014
  • (2014) Vision-Based Egomotion Estimation on FPGA for Unmanned Aerial Vehicle Navigation. IEEE Transactions on Circuits and Systems for Video Technology. DOI: 10.1109/TCSVT.2013.2291356. 24:6 (1070-1083). Online publication date: Jun-2014
  • (2014) A heterogeneous platform with GPU and FPGA for power efficient high performance computing. 2014 International Symposium on Integrated Circuits (ISIC). DOI: 10.1109/ISICIR.2014.7029447 (220-223). Online publication date: Dec-2014
  • (2014) Evaluating latency and throughput bound acceleration of FPGAs and GPUs for adaptive optics algorithms. 2014 IEEE High Performance Extreme Computing Conference (HPEC). DOI: 10.1109/HPEC.2014.7040964 (1-6). Online publication date: Sep-2014
  • (2014) FPGA Implementation of Optical Flow Algorithm Based on Cost Aggregation. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: 10.1109/FCCM.2014.57 (175-175). Online publication date: May-2014
  • (2013) Global Interconnect and Control Synthesis in System Level Architectural Synthesis Framework. Proceedings of the 2013 Euromicro Conference on Digital System Design. DOI: 10.1109/DSD.2013.12 (11-17). Online publication date: 4-Sep-2013
