[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/1413370.1413373acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

High performance discrete Fourier transforms on graphics processors

Published: 15 November 2008 Publication History

Abstract

We present novel algorithms for computing discrete Fourier transforms with high performance on GPUs. We present hierarchical, mixed radix FFT algorithms for both power-of-two and non-power-of-two sizes. Our hierarchical FFT algorithms efficiently exploit shared memory on GPUs using a Stockham formulation. We reduce the memory transpose overheads in hierarchical algorithms by combining the transposes into a block-based multi-FFT algorithm. For non-power-of-two sizes, we use a combination of mixed radix FFTs of small primes and Bluestein's algorithm. We use modular arithmetic in Bluestein's algorithm to improve the accuracy. We implemented our algorithms using the NVIDIA CUDA API and compared their performance with NVIDIA's CUFFT library and an optimized CPU-implementation (Intel's MKL) on a high-end quad-core CPU. On an NVIDIA GPU, we obtained performance of up to 300 GFlops, with typical performance improvements of 2--4x over CUFFT and 8--40x improvement over MKL for large sizes.

References

[1]
"HPC challenge," http://icl.cs.utk.edu/hpcc/, 2007.
[2]
"NAS parallel benchmarks," http://www.nas.nasa.gov/Resources/Software/npb.html. 2007.
[3]
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. Purcell, "A survey of general-purpose computation on graphics hardware," Computer Graphics Forum, vol. 26, no. 1, pp. 80--113, Mar. 2007.
[4]
"ATI CTM Guide," Advanced Micro Devices, Inc., 2006.
[5]
NVIDIA CUDA: Compute Unified Device Architecture, NVIDIA Corp., 2007.
[6]
C. Boyd, "The DirectX 11 compute shader," http://s08.idav.ucdavis.edu/boyd-dx11-compute-shader.pdf, 2008.
[7]
A. Munshi, "OpenCL," http://s08.idav.ucdavis.edu/munshi-opencl.pdf, 2008.
[8]
H. V. Sorensen and C. S. Burrus, Fast Fourier Transform Database. PWS Publishing, 1995.
[9]
C. V. Loan, Computational Frameworks for the Fast Fourier Transform. Society for Industrial Mathematics, 1992.
[10]
K. Moreland and E. Angel, "The FFT on a GPU," in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2003, pp. 112--119.
[11]
J. Spitzer, "Implementing a GPU-efficient FFT," SIGGRAPH Course on Interactive Geometric and Scientific Computations with Graphics Hardware, 2003.
[12]
J. L. Mitchell, M. Y. Ansari, and E. Hart, "Advanced image processing with DirectX 9 pixel shaders," in ShaderX2: Shader Programming Tips and Tricks with DirectX 9.0, W. Engel, Ed. Wordware Publishing, Inc., 2003.
[13]
T. Jansen, B. von Rymon-Lipinski, N. Hanssen, and E. Keeve, "Fourier volume rendering on the GPU using a split-stream-FFT," in Proceedings of the Vision, Modeling, and Visualization Conference 2004, 2004, pp. 395--403.
[14]
T. Sumanaweera and D. Liu, "Medical image reconstruction with the FFT," in GPU Gems 2, M. Pharr, Ed. Addison-Wesley, 2005, pp. 765--784.
[15]
N. K. Govindaraju, S. Larsen, J. Gray, and D. Manocha, "A memory model for scientific algorithms on graphics processors," Supercomputing 2006, pp. 6--6, 2006.
[16]
CUDA CUFFT Library, NVIDIA Corp., 2007.
[17]
V. Volkov and B. Kazian, "Fitting FFT onto the G80 architecture," http:www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf.
[18]
A. C. Chow, G. C. Fossum, and D. A. Brokenshire, "A programming example: Large FFT on the cell broadband engine," Whitepaper, 2005.
[19]
L. Cico, R. Cooper, and J. Greene, "Performance and programmability of the IBM/Sony/Toshiba Cell broadband engine processor," Whitepaper, 2006.
[20]
S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, and K. Yelick, "The potential of the cell processor for scientific computing," in CF '06: Proceedings of the 3rd Conference on Computing Frontiers, 2006, pp. 9--20.
[21]
D. A. Bader and V. Agarwal, "FFTC: fastest Fourier transform for the IBM Cell broadband engine," 14th IEEE International Conference on High Performance Computing (HiPC), pp. 172--184, 2007.
[22]
M. Frigo and S. G. Johnson, "FFTW on the cell processor," http://www.fftw.org/cell/index.html, 2007.
[23]
D. H. Bailey, "FFTs in external or hierarchical memory," Supercomputing, pp. 23--35, 1990.

Cited By

View all
  • (2024)Memory Efficiency Oriented Fine-Grain Representation and Optimization of FFTProceedings of the International Symposium on Memory Systems10.1145/3695794.3695818(245-256)Online publication date: 30-Sep-2024
  • (2023)MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT FrameworkACM Transactions on Architecture and Code Optimization10.1145/360514820:3(1-23)Online publication date: 22-Jul-2023
  • (2021)Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp ShufflesProceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT52795.2021.00032(345-355)Online publication date: 26-Sep-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing
November 2008
739 pages
ISBN:9781424428359

Sponsors

Publisher

IEEE Press

Publication History

Published: 15 November 2008

Check for updates

Qualifiers

  • Research-article

Conference

SC '08
Sponsor:

Acceptance Rates

SC '08 Paper Acceptance Rate 59 of 277 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)0
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Memory Efficiency Oriented Fine-Grain Representation and Optimization of FFTProceedings of the International Symposium on Memory Systems10.1145/3695794.3695818(245-256)Online publication date: 30-Sep-2024
  • (2023)MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT FrameworkACM Transactions on Architecture and Code Optimization10.1145/360514820:3(1-23)Online publication date: 22-Jul-2023
  • (2021)Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp ShufflesProceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT52795.2021.00032(345-355)Online publication date: 26-Sep-2021
  • (2019)PagodaACM Transactions on Parallel Computing10.1145/33656576:4(1-23)Online publication date: 19-Nov-2019
  • (2019)Generating efficient FFT GPU code with LiftProceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing10.1145/3331553.3342613(1-13)Online publication date: 18-Aug-2019
  • (2019)A mixed-radix FFT algorithm implementation based on Petri nets to assist the certification of bio-medical systemsIECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society10.1109/IECON.2019.8927112(2761-2766)Online publication date: 14-Oct-2019
  • (2018)Performance analysis and autotuning setup of the cuFFT libraryProceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems10.1145/3295816.3295817(1-6)Online publication date: 4-Nov-2018
  • (2018)FlashabacusProceedings of the Thirteenth EuroSys Conference10.1145/3190508.3190544(1-15)Online publication date: 23-Apr-2018
  • (2017)HalideCommunications of the ACM10.1145/315021161:1(106-115)Online publication date: 27-Dec-2017
  • (2017)Solving Poissons equation using FFT in a GPU clusterJournal of Parallel and Distributed Computing10.1016/j.jpdc.2016.09.004102:C(28-36)Online publication date: 1-Apr-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media