[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2847263.2847268acmconferencesArticle/Chapter ViewAbstractPublication PagesfpgaConference Proceedingsconference-collections
research-article

Resolve: Generation of High-Performance Sorting Architectures from High-Level Synthesis

Published: 21 February 2016 Publication History

Abstract

Field Programmable Gate Array (FPGA) implementations of sorting algorithms have proven to be efficient, but existing implementations lack portability and maintainability because they are written in low-level hardware description languages that require substantial domain expertise to develop and maintain. To address this problem, we develop a framework that generates sorting architectures for different requirements (speed, area, power, etc.). Our framework provides ten highly optimized basic sorting architectures, easily composes basic architectures to generate hybrid sorting architectures, enables non-hardware experts to quickly design efficient hardware sorters, and facilitates the development of customized heterogeneous FPGA/CPU sorting systems. Experimental results show that our framework generates architectures that perform at least as well as existing RTL implementations for arrays smaller than 16K elements, and are comparable to RTL implementations for sorting larger arrays. We demonstrate a prototype of an end-to-end system using our sorting architectures for large arrays (16K-130K) on a heterogeneous FPGA/CPU system.

References

[1]
S. G. Akl. Parallel sorting algorithms. AP, Inc, 1985.
[2]
O. Arcas-Abella et al. An empirical evaluation of high-level synthesis languages and tools for database acceleration. In International Conference on Field Programmable Logic and Applications. IEEE, 2014.
[3]
M. Bednara et al. Tradeoff analysis and architecture design of a hybrid hardware/software sorter. In International Conference on Application-Specific Systems, Architectures, and Processors, pages 299--308. IEEE, 2000.
[4]
V. Brajovic et al. A sorting image sensor: An example of massively parallel intensity-to-time processing for low-latency computational sensors. In International Conference on Robotics and Automation, volume 2, pages 1638--1643. IEEE, 1996.
[5]
M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. 1994.
[6]
J. Casper et al. Hardware acceleration of database operations. In International symposium on Field-programmable gate arrays, pages 151--160. ACM, 2014.
[7]
R. Chen et al. Energy and memory efficient mapping of bitonic sorting on fpga. In International Symposium on Field-Programmable Gate Arrays, pages 240--249. ACM, 2015.
[8]
J. Chhugani et al. Efficient implementation of sorting on multi-core simd cpu architecture. Proceedings of the VLDB Endowment, 1(2):1313--1324, 2008.
[9]
J. Dean et al. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.
[10]
N. George et al. Hardware system synthesis from domain-specific languages. In Field Programmable Logic and Applications, pages 1--8. IEEE, 2014.
[11]
G. Graefe. Implementing sorting in database systems. ACM Computing Surveys (CSUR), 38(3):10, 2006.
[12]
M. Jacobsen et al. Riffa 2.1: A reusable integration framework for fpga accelerators. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2015.
[13]
D. E. Knuth. The art of computer programming, volume 3: sorting and searching. Addison-Wesley Professional, 1998.
[14]
D. Koch et al. Fpgasort: A high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting. In International symposium on Field programmable gate arrays, pages 45--54. ACM, 2011.
[15]
C. Lauterbach et al. Fast bvh construction on gpus. In Computer Graphics Forum, volume 28, pages 375--384. Wiley Online Library, 2009.
[16]
R. Marcelino et al. Sorting units for fpga-based embedded systems. In Distributed Embedded Systems: Design, Middleware and Resources, pages 11--22. Springer, 2008.
[17]
J. Matai et al. Enabling fpgas for the masses. In First International Workshop on FPGAs for Software Programmers, 2014.
[18]
R. Mueller et al. Data processing on fpgas. Proceedings of the VLDB Endowment, 2(1):910--921, 2009.
[19]
R. Mueller et al. Sorting networks on fpgas. The VLDB Journal?The International Journal on Very Large Data Bases, 21(1):1--23, 2012.
[20]
R. Mueller, J. Teubner, and G. Alonso. Data processing on fpgas. Proceedings of the VLDB Endowment, 2(1):910--921, 2009.
[21]
J. Ortiz et al. A streaming high-throughput linear sorter system with contention buffering. International Journal of Reconfigurable Computing, 2011.
[22]
N. Satish et al. Designing efficient sorting algorithms for manycore gpus. In IPDPS, pages 1--10. IEEE, 2009.
[23]
M. Zuluaga et al. Computer generation of streaming sorting networks. In Design Automation Conference, pages 1245--1253. ACM, 2012.

Cited By

View all
  • (2024)Wavefront Threading Enables Effective High-Level SynthesisProceedings of the ACM on Programming Languages10.1145/36564208:PLDI(1066-1090)Online publication date: 20-Jun-2024
  • (2024)Hardware Implementation of a Particle Filter for GNSS Interference Source Localization2024 22nd IEEE Interregional NEWCAS Conference (NEWCAS)10.1109/NewCAS58973.2024.10666366(70-74)Online publication date: 16-Jun-2024
  • (2023)TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAsIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2022.322857511:2(404-419)Online publication date: 1-Apr-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2016
298 pages
ISBN:9781450338561
DOI:10.1145/2847263
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 February 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. domain-specific framework
  2. domain-specific language
  3. fpga
  4. high-level synthesis
  5. high-performance sorting
  6. sorting

Qualifiers

  • Research-article

Conference

FPGA'16
Sponsor:

Acceptance Rates

FPGA '16 Paper Acceptance Rate 20 of 111 submissions, 18%;
Overall Acceptance Rate 125 of 627 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)2
Reflects downloads up to 31 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Wavefront Threading Enables Effective High-Level SynthesisProceedings of the ACM on Programming Languages10.1145/36564208:PLDI(1066-1090)Online publication date: 20-Jun-2024
  • (2024)Hardware Implementation of a Particle Filter for GNSS Interference Source Localization2024 22nd IEEE Interregional NEWCAS Conference (NEWCAS)10.1109/NewCAS58973.2024.10666366(70-74)Online publication date: 16-Jun-2024
  • (2023)TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAsIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2022.322857511:2(404-419)Online publication date: 1-Apr-2023
  • (2022)FPGA HLS Today: Successes, Challenges, and OpportunitiesACM Transactions on Reconfigurable Technology and Systems10.1145/353077515:4(1-42)Online publication date: 21-Apr-2022
  • (2022)HLS-based Optimization of Tau Triggering Algorithm for LHC: a case study2022 18th Biennial Baltic Electronics Conference (BEC)10.1109/BEC56180.2022.9935599(1-6)Online publication date: 4-Oct-2022
  • (2022)DGCNN on FPGA: Acceleration of the Point Cloud Classifier Using FPGAsCircuits, Systems, and Signal Processing10.1007/s00034-022-02179-042:2(748-779)Online publication date: 6-Oct-2022
  • (2021)A High-Performance Bidirectional Architecture for the Quasi-Comparison-Free Sorting AlgorithmIEEE Transactions on Circuits and Systems I: Regular Papers10.1109/TCSI.2020.304895568:4(1493-1506)Online publication date: Apr-2021
  • (2021)How Many CPU Cores is an FPGA Worth? Lessons Learned from Accelerating String Sorting on a CPU-FPGA SystemJournal of Signal Processing Systems10.1007/s11265-021-01686-8Online publication date: 23-Sep-2021
  • (2020)SnugProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392740(1-13)Online publication date: 29-Jun-2020
  • (2020)An Extended Nonstrict Partially Ordered Set-Based Configurable Linear Sorter on FPGAsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2020.297707439:5(1031-1044)Online publication date: May-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media