More Web Proxy on the site http://driver.im/

article

Free access

Sorting networks on FPGAs

Authors:

Gustavo AlonsoAuthors Info & Claims

The VLDB Journal — The International Journal on Very Large Data Bases, Volume 21, Issue 1

Pages 1 - 23

Published: 01 February 2012 Publication History

Abstract

Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. Software-configurable microprocessors and FPGAs add further diversity but also increase complexity. In this paper, we explore the use of sorting networks on field-programmable gate arrays (FPGAs). FPGAs are very versatile in terms of how they can be used and can also be added as additional processing units in standard CPU sockets. Our results indicate that efficient usage of FPGAs involves non-trivial aspects such as having the right computation model (a sorting network in this case); a careful implementation that balances all the design constraints in an FPGA; and the proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation.

References

[1]

Abadi, D. J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120-139 (2003).

[2]

Abadi, D.J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.H., Lindner, W., Maskey, A.S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., Zdonik, S.: The design of the Borealis stream processing engine. In: Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA (2005).

[3]

Ajtai, M., Komlós, J., Szemerédi, E.: An O(n log n) sorting network. In: ACM Symposium on Theory of Computing (STOC), pp. 1-9 (1983).

[4]

Arasu, A., Babu, S., Widom, J.: The cql continuous query language: semantic foundations and query execution. VLDB J. 15(2), 121-142 (2006).

[5]

Batcher, K.E.: Sorting networks and their applications. In: AFIPS Spring Joint Computer Conference, pp. 307-314 (1968).

[6]

Burleson, W.P., Ciesielski, M., Klass, F., Liu, W.: Wave-pipelining: a tutorial and research survey. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 6(3), 464-474.

[7]

Chhugani, J., Nguyen, A. D., Lee, V. W., Macy, W., Hagog, M., Chen, Y. K., Baransi, A., Kumar, S., Dubey, P.: Efficient implementation of sorting on multi-core SIMD CPU architecture. Proc. VLDB Endow. 1(2), 1313-1324 (2008).

[8]

Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge (2001).

[9]

DeWitt, D.J.: DIRECT--a multiprocessor organization for supporting relational database management systems. IEEE Trans. Comput. 28(6) (1979).

[10]

Furtak, T., Amaral, J.N., Niewiadomski, R.: Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms. In: ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 348-357 (2007).

[11]

Gedik, B., Bordawekar, R.R., Yu, P.S.: CellSort: high performance sorting on the cell processor. In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, pp. 1286-1297 (2007).

[12]

Gold, B.T., Ailamaki, A., Huston, L., Falsafi, B.: Accelerating database operators using a network processor. In: International Workshop on Data Management on New Hardware (DaMoN), Baltimore, MD, USA (2005).

[13]

Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of data, Paris, France, pp. 215-226 (2004).

[14]

Govindaraju, N.K., Gray, J., Kumar, R., Manocha, D.: GPUTera-Sort: high performance graphics coprocessor sorting for large database management. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, pp. 325-336 (2006).

[15]

Greaves, D.J., Singh, S.: Kiwi: Synthesis of FPGA circuits from parallel programs. In: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM) (2008).

[16]

Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: Proceedings of the 2005 ACMSIGMOD International Conference on Management of Data, Baltimore, MD, USA (2005).

[17]

Huang, S.S., Hormati, A., Bacon, D.F., Rabbah, R.: Liquid metal: object-oriented programming across the hardware/software boundary. In: European Conference on Object-Oriented Programming, Paphos, Cyprus (2008).

[18]

Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort a new parallel sorting algorithm for multi-core SIMD processors. In: International Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, pp. 189-198 (2007).

[19]

Kickfire: http://www.kickfire.com/ (2009).

[20]

Knuth, D. E.: The Art of Computer Programming, Volume 3: Sorting and Searching. 2nd edn. Addison-Wesley, Reading (1998).

[21]

Manegold, S., Boncz, P. A., Kersten, M. L.: Optimizing database architecture for the new bottleneck: Memory access. VLDB J. 9(3), 231-246 (2000).

[22]

Mitra, A., Vieira, M.R., Bakalov, P., Tsotras, V.J., Najjar, W.: Boosting XML filtering through a scalable FPGA-based architecture. In: Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA (2009).

[23]

Mueller, R.: Data processing on embedded devices. PhD thesis, ETH Zurich, Diss. ETH No. 19163 (2010).

[24]

Mueller, R., Eguro, K.: FPGA-accelerated deserialization of object structures. Technical report MSR-TR-2009-126, Microsoft Research Redmond (2009).

[25]

Mueller, R., Teubner, J., Alonso, G.: Data processing on fpgas. Proc. VLDB Endow. 2(1) (2009a).

[26]

Mueller, R., Teubner, J., Alonso, G.: Streams on wires--a query compiler for FPGAs. Proc. VLDB Endow. 2(1) (2009b).

[27]

Netezza: http://www.netezza.com/ (2009).

[28]

Oflazer, K.: Design and implementation of a single-chip 1-dmedian filter. IEEE Trans. Acoust. Speech Signal Process. 31, 1164-1168 (1983).

[29]

Q6700 datasheet: Intel Core 2 Extreme Quad-Core processor XQ6000 Sequence and Intel Core 2 Quad Processor Q600 Sequence Datasheet. Intel (2007).

[30]

Rabiner, L. R., Sambur, M. R., Schmidt, C. E.: Applications of a nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoust. Speech Signal Process. 23(6), 552-557 (1975).

[31]

Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, (1977).

[32]

Wendt, P.D., Coyle, E.J., Gallagher, N.C., Jr.: Stack filters. IEEE Trans. Acoust. Speech Signal Process. 34(4) (1986).

[33]

Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5) (2007).

[34]

Xilinx: Virtex-5 FGPA Data Sheet: DC and Switching Characteristics. Xilinx Inc., v5.0 edn (2009a).

[35]

Xilinx: Virtex-5 FPGA User Guide. Xilinx Inc., v4.5 edn (2009b).

[36]

XtremeData: http://www.xtremedatainc.com/ (2009).

[37]

Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, WI, USA (2002).

Cited By

Gao THuang LGao SWang K(2024)SSA: A Uniformly Recursive Bidirection-Sequence Systolic Sorter ArrayIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343433235:10(1721-1734)Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1109/TPDS.2024.3434332
Papaphilippou PSano KAdhi BLuk W(2023)Experimental Survey of FPGA-Based Monolithic Switches and a Novel Queue BalancerIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.324458934:5(1621-1634)Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1109/TPDS.2023.3244589
Papaphilippou PMeng JGebara NLuk W(2021)Hipernetch: High-Performance FPGA Network SwitchACM Transactions on Reconfigurable Technology and Systems10.1145/347705415:1(1-31)Online publication date: 30-Nov-2021
https://dl.acm.org/doi/10.1145/3477054
Show More Cited By

Sorting networks on FPGAs
1. Hardware
  1. Very large scale integration design
    1. Application-specific VLSI designs
2. Information systems
  1. Data management systems

Recommendations

Sorting networks on FPGA
TELE-INFO'11/MINO'11/SIP'11: Proceedings of the 10th WSEAS international conference on Telecommunications and informatics and microelectronics, nanoelectronics, optoelectronics, and WSEAS international conference on Signal processing

Speed and efficiency of sorting algorithms are essential for high speed data processing. FPGA based hardware accelerators show better performance than the general purpose processors. Similarly traditional algorithms may not be always efficient on FPGAs. ...
From software to accelerators with LegUp high-level synthesis
CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level ...
Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, the key application characteristics such as frames-per-second (FPS) ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases

The VLDB Journal — The International Journal on Very Large Data Bases Volume 21, Issue 1

February 2012

163 pages

ISSN:1066-8888

Issue’s Table of Contents

Copyright © Copyright © 2012 Springer-Verlag.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 February 2012

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

20
Total Citations
View Citations
613
Total Downloads

Downloads (Last 12 months)26
Downloads (Last 6 weeks)5

Reflects downloads up to 01 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Gao THuang LGao SWang K(2024)SSA: A Uniformly Recursive Bidirection-Sequence Systolic Sorter ArrayIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343433235:10(1721-1734)Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1109/TPDS.2024.3434332
Papaphilippou PSano KAdhi BLuk W(2023)Experimental Survey of FPGA-Based Monolithic Switches and a Novel Queue BalancerIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.324458934:5(1621-1634)Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1109/TPDS.2023.3244589
Papaphilippou PMeng JGebara NLuk W(2021)Hipernetch: High-Performance FPGA Network SwitchACM Transactions on Reconfigurable Technology and Systems10.1145/347705415:1(1-31)Online publication date: 30-Nov-2021
https://dl.acm.org/doi/10.1145/3477054
Papaphilippou PMeng JLuk WNeuendorffer SShannon L(2020)High-Performance FPGA Network Switch ArchitectureProceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays10.1145/3373087.3375299(76-85)Online publication date: 23-Feb-2020
https://dl.acm.org/doi/10.1145/3373087.3375299
Serre FPüschel M(2019)DSL-Based Hardware Generation with ScalaACM Transactions on Reconfigurable Technology and Systems10.1145/335975413:1(1-23)Online publication date: 19-Dec-2019
https://dl.acm.org/doi/10.1145/3359754
Korat UAlimohammad A(2019)A Reconfigurable Hardware Architecture for Principal Component AnalysisCircuits, Systems, and Signal Processing10.1007/s00034-018-0953-y38:5(2097-2113)Online publication date: 1-May-2019
https://dl.acm.org/doi/10.1007/s00034-018-0953-y
Sun YWang LWang CWang Y(2017)Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAsACM Transactions on Embedded Computing Systems10.1145/309295016:4(1-26)Online publication date: 13-Jul-2017
https://dl.acm.org/doi/10.1145/3092950
Zhao QAmagasaki MIida MKuga MSueyoshi T(2017)A Study of Heterogeneous Computing Design Method based on Virtualization TechnologyACM SIGARCH Computer Architecture News10.1145/3039902.303991844:4(86-91)Online publication date: 11-Jan-2017
https://dl.acm.org/doi/10.1145/3039902.3039918
Sklyarov VSkliarova I(2017)Data processing in the firmware systems for logic control based on search networksAutomation and Remote Control10.1134/S000511791701008878:1(100-112)Online publication date: 1-Jan-2017
https://dl.acm.org/doi/10.1134/S0005117917010088
Ehlers T(2017)Merging almost sorted sequences yields a 24-sorterInformation Processing Letters10.1016/j.ipl.2016.08.005118:C(17-20)Online publication date: 1-Feb-2017
https://dl.acm.org/doi/10.1016/j.ipl.2016.08.005
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents