research-article

Lowering the latency of data processing pipelines through FPGA based hardware acceleration

Published: 01 September 2019

Abstract

Web search engines often involve a complex pipeline of processing stages: computing, scoring, and ranking potential answers, then returning the sorted results. The latency of such pipelines can be reduced by minimizing data movement, making individual stages faster, and merging stages. The throughput is determined by the stage with the smallest capacity, and it can be improved by allocating enough parallel resources to each stage. In this paper we explore hardware acceleration (an FPGA) as a way to improve overall performance when computing answers to search queries. With a real use case as baseline and motivation, we focus on accelerating the scoring function, implemented as a decision tree ensemble, a common approach to scoring and classification in search systems. Our solution uses a novel decision tree ensemble implementation on an FPGA to: 1) increase the number of entries that can be scored per unit of time, and 2) provide a compact implementation that can be combined with previous stages. The resulting system, tested on Amazon F1 instances, significantly improves the quality of the search results and improves performance by two orders of magnitude over the existing CPU-based solution.
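The scoring stage and the throughput argument in the abstract can be illustrated with a minimal software sketch. This is not the paper's FPGA design: the `Node` layout, the additive combination of tree outputs, and all function names are assumptions chosen for illustration only. It shows how a decision tree ensemble assigns a score to each candidate, how candidates are then ranked, and why pipeline throughput is bounded by the slowest stage.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical tree node: internal nodes split, leaves carry a value."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0  # only meaningful at a leaf


def score_tree(node: Node, features: Sequence[float]) -> float:
    # Walk from the root to a leaf, going left when feature <= threshold.
    while node.left is not None and node.right is not None:
        if features[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.value


def score_ensemble(trees: Sequence[Node], features: Sequence[float]) -> float:
    # Assumption: the ensemble score is the sum of the individual tree outputs.
    return sum(score_tree(tree, features) for tree in trees)


def rank(trees: Sequence[Node], candidates: Sequence[Sequence[float]]) -> list:
    # Return candidate indices ordered from highest to lowest score.
    return sorted(
        range(len(candidates)),
        key=lambda i: score_ensemble(trees, candidates[i]),
        reverse=True,
    )


def pipeline_throughput(stage_capacities: Sequence[float]) -> float:
    # A pipeline cannot process more items per second than its slowest stage.
    return min(stage_capacities)
```

In this sketch, raising the capacity of any stage other than the slowest one leaves `pipeline_throughput` unchanged, which is why the abstract argues for giving each stage enough parallel resources.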

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 1
September 2019
85 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
