[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2619228.2619234acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Vectorized Bloom filters for advanced SIMD processors

Published: 23 June 2014 Publication History

Abstract

Analytics are at the core of many business intelligence tasks. Efficient query execution is facilitated by advanced hardware features, such as multi-core parallelism, shared-nothing low-latency caches, and SIMD vector instructions. Only recently, the SIMD capabilities of mainstream hardware have been augmented with wider vectors and non-contiguous loads termed gathers. While analytical DBMSs minimize the use of indexes in favor of scans based on sequential memory accesses, some data structures remain crucial. The Bloom filter, one such example, is the most efficient structure for filtering tuples based on their existence in a set and its performance is critical when joining tables with vastly different cardinalities. We introduce a vectorized implementation for probing Bloom filters based on gathers that eliminates conditional control flow and is independent of the SIMD length. Our techniques are generic and can be reused for accelerating other database operations. Our evaluation indicates a significant performance improvement over scalar code that can exceed 3X when the Bloom filter is cache-resident.

References

[1]
P. S. Almeida et al. Scalable Bloom filters. Information Processing Letters, 101(6):255--261, Mar. 2007.
[2]
K. S. Beyer and S. Rajagopalan. System and method for generating a cache-aware Bloom filter, Oct. 2011. US Patent 8,032,732 B2 (Filed: June 5, 2008).
[3]
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, July 1970.
[4]
B. Chazelle et al. The Bloomier filter: An efficient data structure for static support lookup tables. In SODA, pages 30--39, 2004.
[5]
J. Chhugani et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. In VLDB, pages 1313--1324, 2008.
[6]
S. Cohen and Y. Matias. Spectral Bloom filters. In SIGMOD, pages 241--252, 2003.
[7]
F. Deng and D. Rafiei. Approximately detecting duplicates for streaming data using stable Bloom filters. In SIGMOD, pages 25--36, 2006.
[8]
M. Dietzfelbinger et al. A reliable randomized algorithm for the closest-pair problem. J. Algorithms, 25(1), 1997.
[9]
L. Fan et al. Summary cache: A scalable wide-area web cache sharing protocol. Technical report, University of Wisconsin-Madison, 1998.
[10]
H. Inoue et al. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In PACT, pages 189--198, 2007.
[11]
C. Kim et al. Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD, pages 339--350, 2010.
[12]
D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Software: Practice and Experience, May 2013.
[13]
S. Manegold et al. What happens during a join? dissecting cpu and memory optimization effects. In VLDB, pages 339--350, 2000.
[14]
A. Pagh et al. An optimal Bloom filter replacement. In SODA, pages 823--829, 2005.
[15]
O. Polychroniou et al. High throughput heavy hitter aggregation for modern SIMD processors. In DaMoN, 2013.
[16]
O. Polychroniou and K. A. Ross. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, 2014.
[17]
F. Putze et al. Cache-, hash-, and space-efficient Bloom filters. J. Experimental Algorithmics, 14:4.4--18, Jan. 2010.
[18]
V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080--1091, Aug. 2013.
[19]
T. Willhalm et al. SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, Aug. 2009.
[20]
J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, pages 145--156, 2002.

Cited By

View all
  • (2024)SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation InstructionsProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663439(1-9)Online publication date: 10-Jun-2024
  • (2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
  • (2023)A Case for Partitioned Bloom FiltersIEEE Transactions on Computers10.1109/TC.2022.321899572:6(1681-1691)Online publication date: 1-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
DaMoN '14: Proceedings of the Tenth International Workshop on Data Management on New Hardware
June 2014
71 pages
ISBN:9781450329712
DOI:10.1145/2619228
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2014

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14
Sponsor:

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)39
  • Downloads (Last 6 weeks)0
Reflects downloads up to 27 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation InstructionsProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663439(1-9)Online publication date: 10-Jun-2024
  • (2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
  • (2023)A Case for Partitioned Bloom FiltersIEEE Transactions on Computers10.1109/TC.2022.321899572:6(1681-1691)Online publication date: 1-Jun-2023
  • (2022)To use or not to use the SIMD gather instruction?Proceedings of the 18th International Workshop on Data Management on New Hardware10.1145/3533737.3535089(1-5)Online publication date: 12-Jun-2022
  • (2022)Binary Fuse Filters: Fast and Smaller Than Xor FiltersACM Journal of Experimental Algorithmics10.1145/351044927(1-15)Online publication date: 4-Mar-2022
  • (2022)ATrie Group Join: A Parallel Star Group Join and Aggregation for In-Memory Column-StoresIEEE Transactions on Big Data10.1109/TBDATA.2020.30045208:4(1020-1033)Online publication date: 1-Aug-2022
  • (2022)Partition-based SIMD Processing and its Application to Columnar Database SystemsDatenbank-Spektrum10.1007/s13222-022-00431-023:1(53-63)Online publication date: 7-Dec-2022
  • (2022)To share or not to share vector registers?The VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-022-00744-231:6(1215-1236)Online publication date: 28-Apr-2022
  • (2022)Stretching your data with taffy filtersSoftware: Practice and Experience10.1002/spe.312952:11(2349-2367)Online publication date: 2-Aug-2022
  • (2021)A four-dimensional analysis of partitioned approximate filtersProceedings of the VLDB Endowment10.14778/3476249.347628614:11(2355-2368)Online publication date: 1-Jul-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media