DOI: 10.1145/3514221.3526055
research-article
Open access

Cloud-Native Transactions and Analytics in SingleStore

Published: 11 June 2022

Abstract

The last decade has seen a remarkable rise in specialized database systems. Systems for transaction processing, data warehousing, time series analysis, full-text search, data lakes, in-memory caching, document storage, queuing, graph processing, and geo-replicated operational workloads are now available to developers. A belief has taken hold that a single general-purpose database is not capable of running varied workloads at a reasonable cost with strong performance, at the level of scale and concurrency people demand today. There is value in specialization, but the complexity and cost of using multiple specialized systems in a single application environment are becoming apparent. This realization is driving developers and IT decision makers to seek databases capable of powering a broader set of use cases when looking to adopt a new database. Hybrid transactional and analytical processing (HTAP) databases have been developed to try to tame some of this chaos.
In this paper we introduce SingleStoreDB (S2DB), formerly called MemSQL, a distributed general-purpose SQL database designed to have the versatility to run both operational and analytical workloads with good performance. It was one of the earliest distributed HTAP databases on the market. It can scale out to efficiently utilize hundreds of hosts, thousands of cores, and tens of terabytes of RAM while still providing a user experience similar to a single-host SQL database such as Oracle or SQL Server. S2DB's unified table storage runs both transactional and analytical workloads efficiently with operations like fast scans, seeks, filters, aggregations, and updates. This is accomplished through a combination of rowstore, columnstore, and vectorization techniques; the ability to seek efficiently into a columnstore using secondary indexes; and in-memory rowstore buffers for recently modified data. It avoids design simplifications (e.g., only supporting batch loading, or limiting the query surface area to particular patterns of queries) that sacrifice the ability to run a broad set of workloads.
Today, after 10 years of development, S2DB runs demanding production workloads for some of the world's largest financial, telecom, high-tech, and energy companies. These customers drove the product towards a database capable of running a breadth of workloads across their organizations, often replacing two or three different databases with S2DB. The design of S2DB's storage, transaction processing, and query processing was developed to maintain this versatility.
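
The unified table storage idea in the abstract (an in-memory rowstore buffer for recent writes, columnar segments for scans, and a secondary index for seeks into the columnstore) can be illustrated with a toy sketch. This is not S2DB's implementation: the class `UnifiedTable`, its methods, and the `flush_threshold` parameter are hypothetical names chosen for illustration, and the sketch omits updates, deletes, compression, vectorization, durability, and distribution.

```python
from collections import defaultdict


class UnifiedTable:
    """Toy table: an in-memory row buffer plus immutable columnar segments."""

    def __init__(self, columns, key, flush_threshold=4):
        self.columns = list(columns)
        self.key = key                          # column covered by the secondary index
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.row_buffer = []                    # recently written rows (rowstore)
        self.segments = []                      # each segment maps column -> list of values
        self.index = defaultdict(list)          # key value -> [(segment_no, offset)]

    def insert(self, row):
        """Append a row to the buffer; flush once the buffer is full."""
        self.row_buffer.append(dict(row))
        if len(self.row_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Convert the buffered rows into one columnar segment and index it."""
        if not self.row_buffer:
            return
        segment = {c: [r[c] for r in self.row_buffer] for c in self.columns}
        seg_no = len(self.segments)
        for offset, value in enumerate(segment[self.key]):
            self.index[value].append((seg_no, offset))
        self.segments.append(segment)
        self.row_buffer = []

    def seek(self, key_value):
        """Point lookup: use the index for segments, then scan the small buffer."""
        rows = [
            {c: self.segments[s][c][o] for c in self.columns}
            for s, o in self.index.get(key_value, [])
        ]
        rows += [dict(r) for r in self.row_buffer if r[self.key] == key_value]
        return rows

    def column_sum(self, column):
        """Analytical scan: read one column from every segment and the buffer."""
        total = sum(sum(seg[column]) for seg in self.segments)
        return total + sum(r[column] for r in self.row_buffer)


table = UnifiedTable(columns=["id", "amount"], key="id", flush_threshold=2)
for i, amount in enumerate([10, 20, 30]):
    table.insert({"id": i, "amount": amount})
```

In this sketch a point lookup such as `table.seek(1)` touches only the indexed offset in one segment, while `table.column_sum("amount")` reads a single column across all segments and the buffer, which is the access-pattern split the abstract attributes to combining rowstore and columnstore techniques.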

    Published In

    SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
    June 2022
    2597 pages
    ISBN:9781450392495
    DOI:10.1145/3514221
    This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. databases
    2. distributed systems
    3. separation of storage and compute
    4. transactions and analytics

    Conference

    SIGMOD/PODS '22
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
