[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Cache-, hash-, and space-efficient bloom filters

Published: 05 January 2010 Publication History

Abstract

A Bloom filter is a very compact data structure that supports approximate membership queries on a set, allowing false positives.
We propose several new variants of Bloom filters and replacements with similar functionality. All of them have a better cache-efficiency and need less hash bits than regular Bloom filters. Some use SIMD functionality, while the others provide an even better space efficiency. As a consequence, we get a more flexible trade-off between false-positive rate, space-efficiency, cache-efficiency, hash-efficiency, and computational effort. We analyze the efficiency of Bloom filters and the proposed replacements in detail, in terms of the false-positive rate, the number of expected cache-misses, and the number of required hash bits. We also describe and experimentally evaluate the performance of highly tuned implementations. For many settings, our alternatives perform better than the methods proposed so far.

References

[1]
Bloom, B. H. 1970. Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7.
[2]
Broder, A. and Mitzenmacher, M. 2004. Network applications of bloom filters: A survey. Internet Math. 1, 4.
[3]
Buhrman, H., Miltersen, P. B., Radhakrishnan, J., and Venkatesh, S. 2002. Are bit-vectors optimal? SIAM J. Comput 31, 6, 1723--1744.
[4]
Cleary, J. G. 1984. Compact hash tables using bidirectional linear probing. IEEE Trans. Comput. 33, 9, 828--834.
[5]
Dillinger, P. C. and Manolios, P. 2004a. Bloom filters in probabilistic verification. In Proceedings of the 6th International Formal Methods in Computer-Aided Design (FMCAD'04). Springer, Berlin, 367--381.
[6]
Dillinger, P. C. and Manolios, P. 2004b. Fast and accurate bit-state verification for SPIN. In Proceedings of the 11th International SPIN Workshop. Springer, Berlin, 57--75.
[7]
Fan, L., Cao, P., Almeida, J. M., and Broder, A. Z. 2000. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM TON 8, 3, 281--293.
[8]
Kirsch, A. and Mitzenmacher, M. 2006. Less hashing, same performance: Building a better Bloom filter. In Proceedings of the 14th European Symposium (ESA'06). Springer, Berlin, 456--467.
[9]
Lumetta, S. and Mitzenmacher, M. 2006. Using the power of two choices to improve bloom filters. http://www.eecs.harvard.edu/~michaelm/postscripts/bftwo.ps.
[10]
Manber, U. and Wu, S. 1994. An algorithm for approximate membership checking with application to password security. Inform. Process. Lett. 50, 4 (25), 191--197.
[11]
Mitzenmacher, M. 2001. Compressed Bloom filters. In PODC 2001. 144--150.
[12]
Moffat, A. and Turpin, A. 2002. Compression and Coding Algorithms. Kluwer.
[13]
Pagh, A., Pagh, R., and Rao, S. S. 2005. An optimal Bloom filter replacement. In Proceedings of the 16th Annual Symposium on Discrete Algorithms (SODA'05). ACM, New York, 823--829.
[14]
Patrascu, M. 2008. Succincter. In Proceedings of the 49th Annual IEEE Symposium on Foundation of Computer Science (FOCS'08). IEEE, Los Alamitos, CA, 305--313.
[15]
Sanders, P. and Transier, F. 2007. Intersection in integer inverted indices. In Proceedings of the 9th Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics, Philadephia.

Cited By

View all
  • (2024)Optimizing Collections of Bloom Filters within a Space BudgetProceedings of the VLDB Endowment10.14778/3681954.368202017:11(3551-3564)Online publication date: 1-Jul-2024
  • (2024)Simple, Efficient, and Robust Hash Tables for Join ProcessingProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663442(1-9)Online publication date: 10-Jun-2024
  • (2024)Vertex Encoding for Edge Nonexistence Determination With SIMD AccelerationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.335091936:7(3600-3614)Online publication date: Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Journal of Experimental Algorithmics
ACM Journal of Experimental Algorithmics  Volume 14, Issue
2009
613 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/1498698
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2010
Accepted: 01 July 2009
Received: 01 January 2009
Published in JEA Volume 14

Author Tags

  1. Approximate dictionary
  2. data compression

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)52
  • Downloads (Last 6 weeks)7
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Optimizing Collections of Bloom Filters within a Space BudgetProceedings of the VLDB Endowment10.14778/3681954.368202017:11(3551-3564)Online publication date: 1-Jul-2024
  • (2024)Simple, Efficient, and Robust Hash Tables for Join ProcessingProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663442(1-9)Online publication date: 10-Jun-2024
  • (2024)Vertex Encoding for Edge Nonexistence Determination With SIMD AccelerationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.335091936:7(3600-3614)Online publication date: Jul-2024
  • (2024)On the Privacy of Multi-Versioned Approximate Membership Check FiltersIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2023.329896721:4(1981-1993)Online publication date: Jul-2024
  • (2024)Caching in Forschung und IndustrieSchnelles und skalierbares Cloud-Datenmanagement10.1007/978-3-031-54388-3_5(91-140)Online publication date: 3-May-2024
  • (2023)InfiniFilter: Expanding Filters to Infinity and BeyondProceedings of the ACM on Management of Data10.1145/35892851:2(1-27)Online publication date: 20-Jun-2023
  • (2023)Privacy-Preserving Content-Based Similarity Detection Over in-the-Cloud MiddleboxesIEEE Transactions on Cloud Computing10.1109/TCC.2022.316932911:2(1854-1870)Online publication date: 1-Apr-2023
  • (2023)A Case for Partitioned Bloom FiltersIEEE Transactions on Computers10.1109/TC.2022.321899572:6(1681-1691)Online publication date: 1-Jun-2023
  • (2023)Engineering a Distributed-Memory Triangle Counting Algorithm2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS54959.2023.00076(702-712)Online publication date: May-2023
  • (2023)The LSM Design Space and its Read Optimizations2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00273(3578-3584)Online publication date: Apr-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media