More Web Proxy on the site http://driver.im/

research-article

Resolve: Generation of High-Performance Sorting Architectures from High-Level Synthesis

Authors:

Janarbek Matai,

Dustin Richmond,

Ryan KastnerAuthors Info & Claims

FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 195 - 204

https://doi.org/10.1145/2847263.2847268

Published: 21 February 2016 Publication History

Abstract

Field Programmable Gate Array (FPGA) implementations of sorting algorithms have proven to be efficient, but existing implementations lack portability and maintainability because they are written in low-level hardware description languages that require substantial domain expertise to develop and maintain. To address this problem, we develop a framework that generates sorting architectures for different requirements (speed, area, power, etc.). Our framework provides ten highly optimized basic sorting architectures, easily composes basic architectures to generate hybrid sorting architectures, enables non-hardware experts to quickly design efficient hardware sorters, and facilitates the development of customized heterogeneous FPGA/CPU sorting systems. Experimental results show that our framework generates architectures that perform at least as well as existing RTL implementations for arrays smaller than 16K elements, and are comparable to RTL implementations for sorting larger arrays. We demonstrate a prototype of an end-to-end system using our sorting architectures for large arrays (16K-130K) on a heterogeneous FPGA/CPU system.

References

[1]

S. G. Akl. Parallel sorting algorithms. AP, Inc, 1985.

Digital Library

[2]

O. Arcas-Abella et al. An empirical evaluation of high-level synthesis languages and tools for database acceleration. In International Conference on Field Programmable Logic and Applications. IEEE, 2014.

[3]

M. Bednara et al. Tradeoff analysis and architecture design of a hybrid hardware/software sorter. In International Conference on Application-Specific Systems, Architectures, and Processors, pages 299--308. IEEE, 2000.

Digital Library

[4]

V. Brajovic et al. A sorting image sensor: An example of massively parallel intensity-to-time processing for low-latency computational sensors. In International Conference on Robotics and Automation, volume 2, pages 1638--1643. IEEE, 1996.

[5]

M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. 1994.

[6]

J. Casper et al. Hardware acceleration of database operations. In International symposium on Field-programmable gate arrays, pages 151--160. ACM, 2014.

Digital Library

[7]

R. Chen et al. Energy and memory efficient mapping of bitonic sorting on fpga. In International Symposium on Field-Programmable Gate Arrays, pages 240--249. ACM, 2015.

Digital Library

[8]

J. Chhugani et al. Efficient implementation of sorting on multi-core simd cpu architecture. Proceedings of the VLDB Endowment, 1(2):1313--1324, 2008.

Digital Library

[9]

J. Dean et al. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.

Digital Library

[10]

N. George et al. Hardware system synthesis from domain-specific languages. In Field Programmable Logic and Applications, pages 1--8. IEEE, 2014.

[11]

G. Graefe. Implementing sorting in database systems. ACM Computing Surveys (CSUR), 38(3):10, 2006.

Digital Library

[12]

M. Jacobsen et al. Riffa 2.1: A reusable integration framework for fpga accelerators. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2015.

Digital Library

[13]

D. E. Knuth. The art of computer programming, volume 3: sorting and searching. Addison-Wesley Professional, 1998.

Digital Library

[14]

D. Koch et al. Fpgasort: A high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting. In International symposium on Field programmable gate arrays, pages 45--54. ACM, 2011.

Digital Library

[15]

C. Lauterbach et al. Fast bvh construction on gpus. In Computer Graphics Forum, volume 28, pages 375--384. Wiley Online Library, 2009.

[16]

R. Marcelino et al. Sorting units for fpga-based embedded systems. In Distributed Embedded Systems: Design, Middleware and Resources, pages 11--22. Springer, 2008.

[17]

J. Matai et al. Enabling fpgas for the masses. In First International Workshop on FPGAs for Software Programmers, 2014.

[18]

R. Mueller et al. Data processing on fpgas. Proceedings of the VLDB Endowment, 2(1):910--921, 2009.

Digital Library

[19]

R. Mueller et al. Sorting networks on fpgas. The VLDB Journal?The International Journal on Very Large Data Bases, 21(1):1--23, 2012.

Digital Library

[20]

R. Mueller, J. Teubner, and G. Alonso. Data processing on fpgas. Proceedings of the VLDB Endowment, 2(1):910--921, 2009.

Digital Library

[21]

J. Ortiz et al. A streaming high-throughput linear sorter system with contention buffering. International Journal of Reconfigurable Computing, 2011.

[22]

N. Satish et al. Designing efficient sorting algorithms for manycore gpus. In IPDPS, pages 1--10. IEEE, 2009.

Digital Library

[23]

M. Zuluaga et al. Computer generation of streaming sorting networks. In Design Automation Conference, pages 1245--1253. ACM, 2012.

Digital Library

Cited By

Pelton BSapek AEguro KLo DForin AHumphrey MXi JCox DKarandikar Rde Fine Licht JBabin ECaulfield ABurger D(2024)Wavefront Threading Enables Effective High-Level SynthesisProceedings of the ACM on Programming Languages10.1145/36564208:PLDI(1066-1090)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656420
Vu DBiswas SKan ACetin E(2024)Hardware Implementation of a Particle Filter for GNSS Interference Source Localization2024 22nd IEEE Interregional NEWCAS Conference (NEWCAS)10.1109/NewCAS58973.2024.10666366(70-74)Online publication date: 16-Jun-2024
https://doi.org/10.1109/NewCAS58973.2024.10666366
Qiao WGuo LFang ZChang MCong J(2023)TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAsIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2022.322857511:2(404-419)Online publication date: 1-Apr-2023
https://doi.org/10.1109/TETC.2022.3228575
Show More Cited By

Index Terms

Resolve: Generation of High-Performance Sorting Architectures from High-Level Synthesis

Recommendations

FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting
FPGA '11: Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays

This paper analyses different hardware sorting architectures in order to implement a highly scaleable sorter for solving huge problems at high performance up to the GB range in linear time complexity. It will be proven that a combination of a FIFO-based ...
Code generation from a domain-specific language for C-based HLS of hardware accelerators
CODES '14: Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis

As today's computer architectures are becoming more and more heterogeneous, a plethora of options including CPUs, GPUs, DSPs, reconfigurable logic (FPGAs), and other application-specific processors come into consideration for close-to-sensor processing. ...
From software to accelerators with LegUp high-level synthesis
CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 2016

298 pages

ISBN:9781450338561

DOI:10.1145/2847263

General Chair:
Deming Chen
University of Illinois at Urbana-Champaign, USA
,
Program Chair:
Jonathan Greene
Microsemi, USA

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 February 2016

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

FPGA'16

Sponsor:

SIGDA

FPGA'16: The 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 21 - 23, 2016

California, Monterey, USA

Acceptance Rates

FPGA '16 Paper Acceptance Rate 20 of 111 submissions, 18%;

Overall Acceptance Rate 125 of 627 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

15
Total Citations
View Citations
319
Total Downloads

Downloads (Last 12 months)12
Downloads (Last 6 weeks)2

Reflects downloads up to 31 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Pelton BSapek AEguro KLo DForin AHumphrey MXi JCox DKarandikar Rde Fine Licht JBabin ECaulfield ABurger D(2024)Wavefront Threading Enables Effective High-Level SynthesisProceedings of the ACM on Programming Languages10.1145/36564208:PLDI(1066-1090)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656420
Vu DBiswas SKan ACetin E(2024)Hardware Implementation of a Particle Filter for GNSS Interference Source Localization2024 22nd IEEE Interregional NEWCAS Conference (NEWCAS)10.1109/NewCAS58973.2024.10666366(70-74)Online publication date: 16-Jun-2024
https://doi.org/10.1109/NewCAS58973.2024.10666366
Qiao WGuo LFang ZChang MCong J(2023)TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAsIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2022.322857511:2(404-419)Online publication date: 1-Apr-2023
https://doi.org/10.1109/TETC.2022.3228575
Cong JLau JLiu GNeuendorffer SPan PVissers KZhang Z(2022)FPGA HLS Today: Successes, Challenges, and OpportunitiesACM Transactions on Reconfigurable Technology and Systems10.1145/353077515:4(1-42)Online publication date: 21-Apr-2022
https://dl.acm.org/doi/10.1145/3530775
Cherezova NMihhailov DDevadze SJutman A(2022)HLS-based Optimization of Tau Triggering Algorithm for LHC: a case study2022 18th Biennial Baltic Electronics Conference (BEC)10.1109/BEC56180.2022.9935599(1-6)Online publication date: 4-Oct-2022
https://doi.org/10.1109/BEC56180.2022.9935599
Jamali Golzar SKarimian GShoaran MFattahi Sani M(2022)DGCNN on FPGA: Acceleration of the Point Cloud Classifier Using FPGAsCircuits, Systems, and Signal Processing10.1007/s00034-022-02179-042:2(748-779)Online publication date: 6-Oct-2022
https://doi.org/10.1007/s00034-022-02179-0
Chen WChen RChen PHsiao Y(2021)A High-Performance Bidirectional Architecture for the Quasi-Comparison-Free Sorting AlgorithmIEEE Transactions on Circuits and Systems I: Regular Papers10.1109/TCSI.2020.304895568:4(1493-1506)Online publication date: Apr-2021
https://doi.org/10.1109/TCSI.2020.3048955
Asiatici MMaiorano DIenne P(2021)How Many CPU Cores is an FPGA Worth? Lessons Learned from Accelerating String Sorting on a CPU-FPGA SystemJournal of Signal Processing Systems10.1007/s11265-021-01686-8Online publication date: 23-Sep-2021
https://doi.org/10.1007/s11265-021-01686-8
Heidarshenas AGangwani TYesil SMorrison ATorrellas JAyguadé EHwu WBadia RHofstee H(2020)SnugProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392740(1-13)Online publication date: 29-Jun-2020
https://dl.acm.org/doi/10.1145/3392717.3392740
Li DHuang LGao TFeng YTavares AWang K(2020)An Extended Nonstrict Partially Ordered Set-Based Configurable Linear Sorter on FPGAsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2020.297707439:5(1031-1044)Online publication date: May-2020
https://doi.org/10.1109/TCAD.2020.2977074
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents