research-article

Lowering the latency of data processing pipelines through FPGA based hardware acceleration

Authors:

Gustavo Alonso,

Laura Fogliarini,

Anthony Hock-Koon,

Pierre-Etienne Melet

Proceedings of the VLDB Endowment, Volume 13, Issue 1

Pages 71 - 85

https://doi.org/10.14778/3357377.3357383

Published: 01 September 2019

Abstract

Web search engines often involve a complex pipeline of processing stages, including computing, scoring, and ranking potential answers, followed by returning the sorted results. The latency of such pipelines can be improved by minimizing data movement, making stages faster, and merging stages. The throughput is determined by the stage with the smallest capacity, and it can be improved by allocating enough parallel resources to each stage. In this paper we explore the possibility of employing hardware acceleration (an FPGA) as a way to improve the overall performance when computing answers to search queries. With a real use case as a baseline and motivation, we focus on accelerating the scoring function implemented as a decision tree ensemble, a common approach to scoring and classification in search systems. Our solution uses a novel decision tree ensemble implementation on an FPGA to 1) increase the number of entries that can be scored per unit of time, and 2) provide a compact implementation that can be combined with previous stages. The resulting system, tested in Amazon F1 instances, significantly improves the quality of the search results and improves performance by two orders of magnitude over the existing CPU-based solution.
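The throughput and scoring claims above can be made concrete with a small model. The Python sketch below is a hypothetical illustration, not the authors' cost model or their FPGA design: the `Stage` and `Node` types, the stage names and numbers, and the convention of branching left when a feature is below its threshold are all assumptions. It treats the end-to-end latency of a sequential pipeline as the sum of stage latencies, takes throughput as the smallest per-stage capacity after multiplying by the number of parallel replicas, and scores an item with an additive tree ensemble (as in gradient boosting) by summing one leaf value per tree.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


# Hypothetical pipeline model: field names and numbers are illustrative only.
@dataclass
class Stage:
    name: str
    latency_ms: float      # time one query spends in this stage
    items_per_sec: float   # capacity of a single instance of the stage
    replicas: int = 1      # parallel instances allocated to the stage


def pipeline_latency_ms(stages: Sequence[Stage]) -> float:
    """End-to-end latency of a strictly sequential pipeline: sum of stage latencies."""
    return sum(s.latency_ms for s in stages)


def pipeline_throughput(stages: Sequence[Stage]) -> float:
    """Throughput is capped by the stage with the smallest aggregate capacity."""
    return min(s.items_per_sec * s.replicas for s in stages)


# Generic additive tree ensemble (e.g. gradient boosting); not the paper's FPGA layout.
@dataclass
class Node:
    feature: int = -1              # feature index tested here; -1 marks a leaf
    threshold: float = 0.0         # assumed convention: go left if x[feature] < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0             # leaf output, used only when feature == -1


def score_tree(node: Node, x: Sequence[float]) -> float:
    """Walk one tree from the root to a leaf and return the leaf value."""
    while node.feature >= 0:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.value


def score_ensemble(trees: Sequence[Node], x: Sequence[float]) -> float:
    """Score one item by summing the leaf values reached in every tree."""
    return sum(score_tree(t, x) for t in trees)


if __name__ == "__main__":
    stages = [
        Stage("retrieve", latency_ms=2.0, items_per_sec=5000.0),
        Stage("score", latency_ms=10.0, items_per_sec=800.0, replicas=2),
        Stage("rank", latency_ms=1.0, items_per_sec=4000.0),
    ]
    print(pipeline_latency_ms(stages))  # 13.0
    print(pipeline_throughput(stages))  # 1600.0 items/s, bounded by "score"

    t1 = Node(feature=0, threshold=0.5, left=Node(value=1.0), right=Node(value=-1.0))
    t2 = Node(feature=1, threshold=2.0, left=Node(value=0.25), right=Node(value=0.75))
    print(score_ensemble([t1, t2], [0.3, 3.0]))  # 1.75
```

In this toy configuration the scoring stage caps the pipeline at 1600 items per second even with two replicas, so it is the stage worth accelerating. Some ensembles, such as random forests used for regression, average tree outputs instead of summing them.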

Information

Published In

Proceedings of the VLDB Endowment Volume 13, Issue 1

September 2019

85 pages

ISSN:2150-8097

Editors:
Magdalena Balazinska (University of Washington)
Xiaofang Zhou (University of Queensland)

Publisher

VLDB Endowment

Bibliometrics

Total Citations: 14
Total Downloads: 301
Downloads (last 12 months): 18
Downloads (last 6 weeks): 0

Reflects downloads up to 28 Dec 2024

Cited By

Samanta A, Hatai I, Mal A (2024). A Survey on Hardware Accelerator Design of Deep Learning for Edge Devices. Wireless Personal Communications: An International Journal, 137(3):1715-1760. DOI: 10.1007/s11277-024-11443-2. Online publication date: 19-Jul-2024.
https://dl.acm.org/doi/10.1007/s11277-024-11443-2

Habich D, Krause A, Pietrzyk J, Faerber C, Lehner W (2023). Simplicity done right for SIMDified query processing on CPU and FPGA. Proceedings of the 1st Workshop on Simplicity in Management of Data, pages 1-5. DOI: 10.1145/3596225.3596229. Online publication date: 23-Jun-2023.
https://dl.acm.org/doi/10.1145/3596225.3596229

Maschi F, Alonso G (2023). The Difficult Balance Between Modern Hardware and Conventional CPUs. Proceedings of the 19th International Workshop on Data Management on New Hardware, pages 53-62. DOI: 10.1145/3592980.3595314. Online publication date: 18-Jun-2023.
https://dl.acm.org/doi/10.1145/3592980.3595314

Maschi F, Korolija D, Alonso G, Ustiugov D, Bruno R, Fonseca P, Grot B, Barbalace A (2023). Serverless FPGA: Work-In-Progress. Proceedings of the 1st Workshop on SErverless Systems, Applications and MEthodologies, pages 1-4. DOI: 10.1145/3592533.3592804. Online publication date: 8-May-2023.
https://dl.acm.org/doi/10.1145/3592533.3592804

Jiang W, Korolija D, Alonso G, Das S, Pandis I, Selçuk Candan K, Amer-Yahia S (2023). Data Processing with FPGAs on Modern Architectures. Companion of the 2023 International Conference on Management of Data, pages 77-82. DOI: 10.1145/3555041.3589410. Online publication date: 4-Jun-2023.
https://dl.acm.org/doi/10.1145/3555041.3589410

Wrembel R (2023). Data Integration Revitalized: From Data Warehouse Through Data Lake to Data Mesh. Database and Expert Systems Applications, pages 3-18. DOI: 10.1007/978-3-031-39847-6_1. Online publication date: 28-Aug-2023.
https://dl.acm.org/doi/10.1007/978-3-031-39847-6_1

Hu Y, Li Y, Tseng H, Ives Z, Bonifati A, El Abbadi A (2022). TCUDB: Accelerating Database with Tensor Processors. Proceedings of the 2022 International Conference on Management of Data, pages 1360-1374. DOI: 10.1145/3514221.3517869. Online publication date: 10-Jun-2022.
https://dl.acm.org/doi/10.1145/3514221.3517869

Wrembel R (2022). Data Integration, Cleaning, and Deduplication: Research Versus Industrial Projects. Information Integration and Web Intelligence, pages 3-17. DOI: 10.1007/978-3-031-21047-1_1. Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.1007/978-3-031-21047-1_1

Kara K, Alonso G (2020). PipeArch. ACM Transactions on Reconfigurable Technology and Systems, 14(1):1-28. DOI: 10.1145/3418465. Online publication date: 5-Nov-2020.
https://dl.acm.org/doi/10.1145/3418465

Bressana P, Zilberman N, Vucinic D, Soulé R (2020). Trading Latency for Compute in the Network. Proceedings of the Workshop on Network Application Integration/CoDesign, pages 35-40. DOI: 10.1145/3405672.3405807. Online publication date: 14-Aug-2020.
https://dl.acm.org/doi/10.1145/3405672.3405807
