[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Sorting networks on FPGAs

Published: 01 February 2012 Publication History

Abstract

Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. Software-configurable microprocessors and FPGAs add further diversity but also increase complexity. In this paper, we explore the use of sorting networks on field-programmable gate arrays (FPGAs). FPGAs are very versatile in terms of how they can be used and can also be added as additional processing units in standard CPU sockets. Our results indicate that efficient usage of FPGAs involves non-trivial aspects such as having the right computation model (a sorting network in this case); a careful implementation that balances all the design constraints in an FPGA; and the proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation.

References

[1]
Abadi, D. J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120-139 (2003).
[2]
Abadi, D.J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.H., Lindner, W., Maskey, A.S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., Zdonik, S.: The design of the Borealis stream processing engine. In: Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA (2005).
[3]
Ajtai, M., Komlós, J., Szemerédi, E.: An O(n log n) sorting network. In: ACM Symposium on Theory of Computing (STOC), pp. 1-9 (1983).
[4]
Arasu, A., Babu, S., Widom, J.: The cql continuous query language: semantic foundations and query execution. VLDB J. 15(2), 121-142 (2006).
[5]
Batcher, K.E.: Sorting networks and their applications. In: AFIPS Spring Joint Computer Conference, pp. 307-314 (1968).
[6]
Burleson, W.P., Ciesielski, M., Klass, F., Liu, W.: Wave-pipelining: a tutorial and research survey. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 6(3), 464-474.
[7]
Chhugani, J., Nguyen, A. D., Lee, V. W., Macy, W., Hagog, M., Chen, Y. K., Baransi, A., Kumar, S., Dubey, P.: Efficient implementation of sorting on multi-core SIMD CPU architecture. Proc. VLDB Endow. 1(2), 1313-1324 (2008).
[8]
Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge (2001).
[9]
DeWitt, D.J.: DIRECT--a multiprocessor organization for supporting relational database management systems. IEEE Trans. Comput. 28(6) (1979).
[10]
Furtak, T., Amaral, J.N., Niewiadomski, R.: Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms. In: ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 348-357 (2007).
[11]
Gedik, B., Bordawekar, R.R., Yu, P.S.: CellSort: high performance sorting on the cell processor. In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, pp. 1286-1297 (2007).
[12]
Gold, B.T., Ailamaki, A., Huston, L., Falsafi, B.: Accelerating database operators using a network processor. In: International Workshop on Data Management on New Hardware (DaMoN), Baltimore, MD, USA (2005).
[13]
Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of data, Paris, France, pp. 215-226 (2004).
[14]
Govindaraju, N.K., Gray, J., Kumar, R., Manocha, D.: GPUTera-Sort: high performance graphics coprocessor sorting for large database management. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, pp. 325-336 (2006).
[15]
Greaves, D.J., Singh, S.: Kiwi: Synthesis of FPGA circuits from parallel programs. In: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM) (2008).
[16]
Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: Proceedings of the 2005 ACMSIGMOD International Conference on Management of Data, Baltimore, MD, USA (2005).
[17]
Huang, S.S., Hormati, A., Bacon, D.F., Rabbah, R.: Liquid metal: object-oriented programming across the hardware/software boundary. In: European Conference on Object-Oriented Programming, Paphos, Cyprus (2008).
[18]
Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort a new parallel sorting algorithm for multi-core SIMD processors. In: International Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, pp. 189-198 (2007).
[19]
Kickfire: http://www.kickfire.com/ (2009).
[20]
Knuth, D. E.: The Art of Computer Programming, Volume 3: Sorting and Searching. 2nd edn. Addison-Wesley, Reading (1998).
[21]
Manegold, S., Boncz, P. A., Kersten, M. L.: Optimizing database architecture for the new bottleneck: Memory access. VLDB J. 9(3), 231-246 (2000).
[22]
Mitra, A., Vieira, M.R., Bakalov, P., Tsotras, V.J., Najjar, W.: Boosting XML filtering through a scalable FPGA-based architecture. In: Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA (2009).
[23]
Mueller, R.: Data processing on embedded devices. PhD thesis, ETH Zurich, Diss. ETH No. 19163 (2010).
[24]
Mueller, R., Eguro, K.: FPGA-accelerated deserialization of object structures. Technical report MSR-TR-2009-126, Microsoft Research Redmond (2009).
[25]
Mueller, R., Teubner, J., Alonso, G.: Data processing on fpgas. Proc. VLDB Endow. 2(1) (2009a).
[26]
Mueller, R., Teubner, J., Alonso, G.: Streams on wires--a query compiler for FPGAs. Proc. VLDB Endow. 2(1) (2009b).
[27]
Netezza: http://www.netezza.com/ (2009).
[28]
Oflazer, K.: Design and implementation of a single-chip 1-dmedian filter. IEEE Trans. Acoust. Speech Signal Process. 31, 1164-1168 (1983).
[29]
Q6700 datasheet: Intel Core 2 Extreme Quad-Core processor XQ6000 Sequence and Intel Core 2 Quad Processor Q600 Sequence Datasheet. Intel (2007).
[30]
Rabiner, L. R., Sambur, M. R., Schmidt, C. E.: Applications of a nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoust. Speech Signal Process. 23(6), 552-557 (1975).
[31]
Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, (1977).
[32]
Wendt, P.D., Coyle, E.J., Gallagher, N.C., Jr.: Stack filters. IEEE Trans. Acoust. Speech Signal Process. 34(4) (1986).
[33]
Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5) (2007).
[34]
Xilinx: Virtex-5 FGPA Data Sheet: DC and Switching Characteristics. Xilinx Inc., v5.0 edn (2009a).
[35]
Xilinx: Virtex-5 FPGA User Guide. Xilinx Inc., v4.5 edn (2009b).
[36]
XtremeData: http://www.xtremedatainc.com/ (2009).
[37]
Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, WI, USA (2002).

Cited By

View all
  • (2024)SSA: A Uniformly Recursive Bidirection-Sequence Systolic Sorter ArrayIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343433235:10(1721-1734)Online publication date: 1-Oct-2024
  • (2023)Experimental Survey of FPGA-Based Monolithic Switches and a Novel Queue BalancerIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.324458934:5(1621-1634)Online publication date: 1-May-2023
  • (2021)Hipernetch: High-Performance FPGA Network SwitchACM Transactions on Reconfigurable Technology and Systems10.1145/347705415:1(1-31)Online publication date: 30-Nov-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases  Volume 21, Issue 1
February 2012
163 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 February 2012

Author Tags

  1. FPGA
  2. Hardware accelerators
  3. Sorting networks

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)5
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)SSA: A Uniformly Recursive Bidirection-Sequence Systolic Sorter ArrayIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343433235:10(1721-1734)Online publication date: 1-Oct-2024
  • (2023)Experimental Survey of FPGA-Based Monolithic Switches and a Novel Queue BalancerIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.324458934:5(1621-1634)Online publication date: 1-May-2023
  • (2021)Hipernetch: High-Performance FPGA Network SwitchACM Transactions on Reconfigurable Technology and Systems10.1145/347705415:1(1-31)Online publication date: 30-Nov-2021
  • (2020)High-Performance FPGA Network Switch ArchitectureProceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays10.1145/3373087.3375299(76-85)Online publication date: 23-Feb-2020
  • (2019)DSL-Based Hardware Generation with ScalaACM Transactions on Reconfigurable Technology and Systems10.1145/335975413:1(1-23)Online publication date: 19-Dec-2019
  • (2019)A Reconfigurable Hardware Architecture for Principal Component AnalysisCircuits, Systems, and Signal Processing10.1007/s00034-018-0953-y38:5(2097-2113)Online publication date: 1-May-2019
  • (2017)Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAsACM Transactions on Embedded Computing Systems10.1145/309295016:4(1-26)Online publication date: 13-Jul-2017
  • (2017)A Study of Heterogeneous Computing Design Method based on Virtualization TechnologyACM SIGARCH Computer Architecture News10.1145/3039902.303991844:4(86-91)Online publication date: 11-Jan-2017
  • (2017)Data processing in the firmware systems for logic control based on search networksAutomation and Remote Control10.1134/S000511791701008878:1(100-112)Online publication date: 1-Jan-2017
  • (2017)Merging almost sorted sequences yields a 24-sorterInformation Processing Letters10.1016/j.ipl.2016.08.005118:C(17-20)Online publication date: 1-Feb-2017
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media