research-article

Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation Offloading

Published: 01 February 2023

Abstract

Hybrid transactional/analytical processing (HTAP) can overload database systems. To alleviate performance interference between transactions and analytics, recent research explores in-storage processing (ISP) using commodity computational storage devices (CSDs). However, in-storage query processing faces technical challenges in HTAP environments. Continuously updated data versions pose two hurdles: (1) data items keep changing, and (2) finding visible data versions incurs excessive data access in CSDs. Such access patterns dominate the cost of query processing and may hinder the broader deployment of CSDs.
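
To make hurdle (2) concrete, the following minimal sketch models a newest-to-oldest version chain under a simplified snapshot-visibility rule: a version is visible if it committed at or before the snapshot and had not been superseded by then. The `Version` layout and the `find_visible` and `build_chain` helpers are hypothetical and do not reproduce PostgreSQL's or MyRocks's on-disk formats; the sketch only shows that an old snapshot must read every newer version before it reaches the one it can see.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# eq/repr disabled: the generated versions would recurse down the whole chain.
@dataclass(eq=False, repr=False)
class Version:
    """One version of a row in a simplified MVCC store (hypothetical layout)."""
    begin_ts: int                       # commit timestamp of the creating transaction
    end_ts: Optional[int]               # commit timestamp of the superseding update; None if live
    value: int
    older: Optional["Version"] = None   # link to the previous (older) version


def is_visible(v: Version, snapshot_ts: int) -> bool:
    # Visible if committed at or before the snapshot and not yet superseded as of the snapshot.
    return v.begin_ts <= snapshot_ts and (v.end_ts is None or v.end_ts > snapshot_ts)


def find_visible(head: Optional[Version], snapshot_ts: int) -> Tuple[Optional[Version], int]:
    """Walk the chain newest to oldest; return the visible version and how many versions were read."""
    reads = 0
    v = head
    while v is not None:
        reads += 1  # each hop is a separate data access
        if is_visible(v, snapshot_ts):
            return v, reads
        v = v.older
    return None, reads


def build_chain(num_updates: int) -> Version:
    """Build a row updated num_updates times; the i-th update commits at timestamp i."""
    head = Version(begin_ts=0, end_ts=None, value=0)
    for ts in range(1, num_updates + 1):
        head.end_ts = ts  # the current head is superseded by this update
        head = Version(begin_ts=ts, end_ts=None, value=ts, older=head)
    return head


# Usage: an old analytic snapshot versus a fresh one on the same hot row.
row = build_chain(1000)
old, old_reads = find_visible(row, snapshot_ts=0)
new, new_reads = find_visible(row, snapshot_ts=1000)
print(old.value, old_reads)  # 0 1001
print(new.value, new_reads)  # 1000 1
```

On a row updated 1,000 times, the old snapshot reads all 1,001 versions while the fresh one reads a single version. If each hop is a separate page access inside a CSD, long-running analytics over frequently updated rows pay this cost row by row.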
This paper addresses these core issues by proposing AIDE, an analytic offload engine that transforms engine-specific query execution logic into vendor-neutral computation through a canonical interface. At the core of AIDE are a canonical representation of vendor-specific data and the separate management of data locators. Together, these allow any CSD to execute vendor-neutral operations on canonical tuples with separate indexes, regardless of the host database. To eliminate excessive data access, AIDE prescreens the indexes on the host before offloading; host-side prescreening can thus obviate costly version searching in CSDs and speed up analytics. We implemented a prototype for PostgreSQL and MyRocks, demonstrating that AIDE supports efficient ISP for both databases using the same FPGA logic. Evaluation results show that AIDE improves query latency by up to 42× on PostgreSQL and 34× on MyRocks.
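
The division of labor described in the abstract can be sketched as follows, under stated assumptions: a fixed-width canonical layout for a three-column table, two hypothetical adapters that convert a row-store tuple and a key-value pair into that layout, and a host-side index whose entries record each version's locator and visibility timestamps, newest first. All names (`CANONICAL`, `from_heap_tuple`, `from_kv_pair`, `prescreen`, `device_sum_qty`) are illustrative; this is not AIDE's interface or its FPGA logic.

```python
import struct
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical canonical layout for a table (id BIGINT, price DOUBLE, qty INT):
# little-endian, fixed width, no padding. Assumed for illustration only.
CANONICAL = struct.Struct("<qdi")

# Index entry: (locator, begin_ts, end_ts); end_ts is None for the live version.
IndexEntry = Tuple[int, int, Optional[int]]


def from_heap_tuple(attrs: Tuple[int, float, int]) -> bytes:
    """Adapter for a row-store engine whose tuple is already decoded into attributes (assumption)."""
    return CANONICAL.pack(*attrs)


def from_kv_pair(key: bytes, value: bytes) -> bytes:
    """Adapter for a key-value engine: big-endian id in the key, price and qty in the value (assumption)."""
    (row_id,) = struct.unpack(">q", key)
    price, qty = struct.unpack(">di", value)
    return CANONICAL.pack(row_id, price, qty)


def prescreen(index: Dict[int, List[IndexEntry]], snapshot_ts: int) -> List[int]:
    """Host side: resolve each key to the locator of its visible version (entries ordered newest first)."""
    locators = []
    for versions in index.values():
        for loc, begin_ts, end_ts in versions:
            if begin_ts <= snapshot_ts and (end_ts is None or end_ts > snapshot_ts):
                locators.append(loc)
                break  # at most one version per key is visible to a snapshot
    return locators


def device_sum_qty(storage: List[bytes], locators: Iterable[int], min_price: float) -> int:
    """Device side: vendor-neutral filter and aggregate over canonical tuples only."""
    total = 0
    for loc in locators:
        _row_id, price, qty = CANONICAL.unpack(storage[loc])
        if price >= min_price:
            total += qty
    return total


# Usage: key 1 has two versions (locators 0 and 1); key 2 came from the key-value engine.
storage = [
    from_heap_tuple((1, 9.5, 3)),   # key 1, committed at ts 1, superseded at ts 5
    from_heap_tuple((1, 12.0, 4)),  # key 1, committed at ts 5
    from_kv_pair(struct.pack(">q", 2), struct.pack(">di", 20.0, 7)),  # key 2, committed at ts 2
]
index = {1: [(1, 5, None), (0, 1, 5)], 2: [(2, 2, None)]}
print(device_sum_qty(storage, prescreen(index, 3), 10.0))  # 7: snapshot 3 sees locators 0 and 2
print(device_sum_qty(storage, prescreen(index, 6), 10.0))  # 11: snapshot 6 sees locators 1 and 2
```

In this sketch the device-side function never inspects timestamps or engine-specific formats, so the same routine serves tuples from both adapters, and version resolution stays on the host, where the index is kept.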

Information

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 6
February 2023
393 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 16, Issue 6

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 313
  • Downloads (Last 12 months): 103
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 02 Mar 2025.
