[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2463676.2465322acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

BitWeaving: fast scans for main memory data processing

Published: 22 June 2013 Publication History

Abstract

This paper focuses on running scans in a main memory data processing system at "bare metal" speed. Essentially, this means that the system must aim to process data at or near the speed of the processor (the fastest component in most system configurations). Scans are common in main memory data processing environments, and with the state-of-the-art techniques it still takes many cycles per input tuple to apply simple predicates on a single column of a table. In this paper, we propose a technique called BitWeaving that exploits the parallelism available at the bit level in modern processors. BitWeaving operates on multiple bits of data in a single cycle, processing bits from different columns in each cycle. Thus, bits from a batch of tuples are processed in each cycle, allowing BitWeaving to drop the cycles per column to below one in some case. BitWeaving comes in two flavors: BitWeaving/V which looks like a columnar organization but at the bit level, and BitWeaving/H which packs bits horizontally. In this paper we also develop the arithmetic framework that is needed to evaluate predicates using these BitWeaving organizations. Our experimental results show that both these methods produce significant performance benefits over the existing state-of-the-art methods, and in some cases produce over an order of magnitude in performance improvement.

References

[1]
D. J. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, pages 671--682, 2006.
[2]
R. Barber, P. Bendel, M. Czech, O. Draese, F. Ho, N. Hrle, S. Idreos, M.-S. Kim, O. Koeth, J.-G. Lee, T. T. Li, G. M. Lohman, K. Morfonios, R. Müller, K. Murthy, I. Pandis, L. Qiao, V. Raman, R. Sidle, K. Stolze, and S. Szabo. Business analytics in (a) Blink. IEEE Data Eng. Bull., 35(1):9--14, 2012.
[3]
C. Binnig, S. Hildenbrand, and F. Färber. Dictionary-based order-preserving string compression for main memory column stores. In SIGMOD Conference, pages 283--296, 2009.
[4]
W. Fang, B. He, and Q. Luo. Database compression on graphics processors. PVLDB, 3(1):670--680, 2010.
[5]
F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, and J. Dees. The SAP HANA database -- an architecture overview. IEEE Data Eng. Bull., 35(1):28--33, 2012.
[6]
M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudré-Mauroux, and S. Madden. Hyrise - a main memory hybrid storage engine. PVLDB, 4(2):105--116, 2010.
[7]
S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.
[8]
R. Johnson, V. Raman, R. Sidle, and G. Swart. Row-wise parallel predicate evaluation. PVLDB, 1(1):622--634, 2008.
[9]
A. Kemper, T. Neumann, F. Funke, V. Leis, and H. Mühe. Hyper: Adapting columnar main-memory data management for transactional and query processing. IEEE Data Eng. Bull., 35(1):46--51, 2012.
[10]
J. Krüger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. PVLDB, 5(1):61--72, 2011.
[11]
L. Lamport. Multiple byte processing with full-word instructions. Commun. ACM, 18(8):471--475, 1975.
[12]
Y. Li and J. M. Patel. BitWeaving: Fast scans for main memory data processing. http://research.cs.wisc.edu/quickstep/pubs/bitweaving-extended.pdf.
[13]
P. E. O'Neil and D. Quass. Improved query performance with variant indexes. In SIGMOD, pages 38--49, 1997.
[14]
Oracle Exalytics In-Memory Machine: A Brief Introduction, October 2011. Oracle White Paper.
[15]
V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. In ICDE, pages 60--69, 2008.
[16]
D. Rinfret, P. E. O'Neil, and E. J. O'Neil. Bit-sliced index arithmetic. In SIGMOD, pages 47--57, 2001.
[17]
Transaction Processing Performance Council. TPC Benchmark H. Revision 2.14.3. November 2011.
[18]
T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. SIMD-scan: Ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, 2009.
[19]
J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, pages 145--156, 2002.
[20]
M. Zukowski, S. Héman, N. Nes, and P. A. Boncz. Super-scalar RAM-CPU cache compression. In ICDE, page 59, 2006.
[21]
M. Zukowski, M. van de Wiel, and P. A. Boncz. Vectorwise: A vectorized analytical DBMS. In ICDE, pages 1349--1350, 2012.

Cited By

View all
  • (2024)SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation InstructionsProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663439(1-9)Online publication date: 10-Jun-2024
  • (2024)SHERLOCK: Scheduling Efficient and Reliable Bulk Bitwise Operations in NVMsProceedings of the 61st ACM/IEEE Design Automation Conference10.1145/3649329.3658485(1-6)Online publication date: 23-Jun-2024
  • (2024)LeCo: Lightweight Compression via Learning Serial CorrelationsProceedings of the ACM on Management of Data10.1145/36393202:1(1-28)Online publication date: 26-Mar-2024
  • Show More Cited By

Index Terms

  1. BitWeaving: fast scans for main memory data processing

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
    June 2013
    1322 pages
    ISBN:9781450320375
    DOI:10.1145/2463676
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 June 2013

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. analytics
    2. bit-parallel
    3. indexing
    4. intra-cycle parallelism
    5. storage organization

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'13
    Sponsor:

    Acceptance Rates

    SIGMOD '13 Paper Acceptance Rate 76 of 372 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)77
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 01 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation InstructionsProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663439(1-9)Online publication date: 10-Jun-2024
    • (2024)SHERLOCK: Scheduling Efficient and Reliable Bulk Bitwise Operations in NVMsProceedings of the 61st ACM/IEEE Design Automation Conference10.1145/3649329.3658485(1-6)Online publication date: 23-Jun-2024
    • (2024)LeCo: Lightweight Compression via Learning Serial CorrelationsProceedings of the ACM on Management of Data10.1145/36393202:1(1-28)Online publication date: 26-Mar-2024
    • (2024)Cabin: A Compressed Adaptive Binned Scan IndexProceedings of the ACM on Management of Data10.1145/36393122:1(1-26)Online publication date: 26-Mar-2024
    • (2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
    • (2024)Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory ArchitecturesIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.344721643:11(3744-3755)Online publication date: Nov-2024
    • (2024)Search-in-Memory: Reliable, Versatile, and Efficient Data Matching in SSD’s NAND Flash Memory Chip for Data Indexing AccelerationIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.344370243:11(3864-3875)Online publication date: Nov-2024
    • (2024)REGER: Reordering Time Series Data for Regression Encoding2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00100(1242-1254)Online publication date: 13-May-2024
    • (2024)Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN58291.2024.00024(99-114)Online publication date: 24-Jun-2024
    • (2023)SmartLite: A DBMS-Based Serving System for DNN Inference in Resource-Constrained EnvironmentsProceedings of the VLDB Endowment10.14778/3632093.363209517:3(278-291)Online publication date: 1-Nov-2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media