research-article

An Empirical Evaluation of Columnar Storage Formats

Published: 01 October 2023

Abstract

Columnar storage is a core component of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide extensive support for open-source storage formats such as Parquet and ORC to facilitate cross-platform data sharing. But these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both the hardware and workload landscapes have changed.
In this paper, we revisit the most widely adopted open-source columnar storage formats (Parquet and ORC) with a deep dive into their internals. We designed a benchmark to stress-test the formats' performance and space efficiency under different workload configurations. From our comprehensive evaluation of Parquet and ORC, we identify design decisions advantageous with modern hardware and real-world data distributions. These include using dictionary encoding by default, favoring decoding speed over compression ratio for integer encoding algorithms, making block compression optional, and embedding finer-grained auxiliary data structures. We also point out the inefficiencies in the format designs when handling common machine learning workloads and using GPUs for decoding. Our analysis identified important considerations that may guide future formats to better fit modern technology trends.
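
To make the first of those design decisions concrete, here is a minimal sketch of dictionary encoding in plain Python. It is an illustration only, not the Parquet or ORC implementation: the function names (`dictionary_encode`, `dictionary_decode`, `bits_per_code`) and the sample column are assumptions made for this example. The idea is that each distinct value is stored once and each row keeps only a small integer code, which can then be bit-packed.

```python
def dictionary_encode(values):
    """Return (dictionary, codes) for a column.

    `dictionary` holds each distinct value once, in first-seen order;
    `codes` holds one integer per row pointing into the dictionary.
    Hypothetical helper, not a Parquet or ORC API.
    """
    dictionary = []
    index = {}   # value -> code, for O(1) lookup while encoding
    codes = []
    for value in values:
        code = index.get(value)
        if code is None:
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def bits_per_code(dictionary):
    """Smallest fixed bit width that can address every dictionary entry."""
    return max(1, (len(dictionary) - 1).bit_length()) if dictionary else 0


# Sample low-cardinality column (made-up data for illustration).
column = ["US", "DE", "US", "US", "FR", "DE", "US"]
dictionary, codes = dictionary_encode(column)
decoded = dictionary_decode(dictionary, codes)
width = bits_per_code(dictionary)
```

With three distinct values, every code fits in two bits, so the seven rows need 14 bits of codes plus one copy of each string. Real formats layer further integer encodings and optional block compression on top of the codes.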

Information

Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 2
October 2023
185 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 October 2023
Published in PVLDB Volume 17, Issue 2

Cited By

  • (2024) Apache TsFile: An IoT-Native Time Series File Format. Proceedings of the VLDB Endowment 17:12 (4064-4076). DOI: 10.14778/3685800.3685827. Online publication date: 8-Nov-2024
  • (2024) Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAP. Proceedings of the VLDB Endowment 17:11 (3290-3303). DOI: 10.14778/3681954.3682001. Online publication date: 1-Jul-2024
  • (2024) NULLS!: Revisiting Null Representation in Modern Columnar Formats. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-10). DOI: 10.1145/3662010.3663452. Online publication date: 10-Jun-2024
  • (2024) LeCo: Lightweight Compression via Learning Serial Correlations. Proceedings of the ACM on Management of Data 2:1 (1-28). DOI: 10.1145/3639320. Online publication date: 26-Mar-2024
  • (2023) Performance of Null Handling in Array Databases. 2023 IEEE International Conference on Big Data (BigData) (247-254). DOI: 10.1109/BigData59044.2023.10386100. Online publication date: 15-Dec-2023
