[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

ChainedFilter: Combining Membership Filters by Chain Rule

Published: 12 December 2023 Publication History

Abstract

Membership (membership query/membership testing) is a fundamental problem across databases, networks and security. However, previous research has primarily focused on either approximate solutions, such as Bloom Filters, or exact methods, like perfect hashing and dictionaries, without attempting to develop an integral theory. In this paper, we propose a unified and complete theory, namely chain rule, for general membership problems, which encompasses both approximate and exact membership as extreme cases. Building upon the chain rule, we introduce a straightforward yet versatile algorithm framework, namely ChainedFilter, to combine different elementary filters without losing information. Our evaluation results demonstrate that ChainedFilter improves performance of many applications including static dictionary, lossless data compression, Cuckoo Hashing, LSM-Tree and Learned Filters.

Supplemental Material

MP4 File
Presentation video

References

[1]
Sasu Tarkoma, Christian Esteve Rothenberg, and Eemil Lagerspetz. Theory and practice of bloom filters for distributed systems. IEEE Communications Surveys & Tutorials, 14(1):131--155, 2011.
[2]
Andrei Broder and Michael Mitzenmacher. Network applications of bloom filters: A survey. Internet mathematics, 1(4):485--509, 2004.
[3]
Shahabeddin Geravand and Mahmood Ahmadi. Bloom filter applications in network security: A state-of-the-art survey. Computer Networks, 57(18):4047--4064, 2013.
[4]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):1--26, 2008.
[5]
Niv Dayan, Manos Athanassoulis, and Stratos Idreos. Optimal bloom filters and adaptive merging for lsm-trees. ACM Transactions on Database Systems (TODS), 43(4):1--48, 2018.
[6]
Yoshinori Matsunobu, Siying Dong, and Herman Lee. Myrocks: Lsm-tree database storage engine serving facebook's social graph. Proceedings of the VLDB Endowment, 13(12):3217--3230, 2020.
[7]
Sarang Dharmapurikar, Haoyu Song, Jonathan Turner, and John Lockwood. Fast packet classification using bloom filters. In Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems, pages 61--70, 2006.
[8]
Dan Li, Henggang Cui, Yan Hu, Yong Xia, and Xin Wang. Scalable data center multicast using multi-class bloom filter. In 2011 19th IEEE international conference on network protocols, pages 266--275. IEEE, 2011.
[9]
Pedro Reviriego, Jorge Martínez, David Larrabeiti, and Salvatore Pontarelli. Cuckoo filters and bloom filters: Comparison and application to packet classification. IEEE Transactions on Network and Service Management, 17(4):2690--2701, 2020.
[10]
Michael T Goodrich and Michael Mitzenmacher. Invertible bloom lookup tables. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 792--799. IEEE, 2011.
[11]
A Pinar Ozisik, Gavin Andresen, Brian N Levine, Darren Tapp, George Bissias, and Sunny Katkuri. Graphene: efficient interactive set reconciliation applied to blockchain propagation. In Proceedings of the ACM Special Interest Group on Data Communication, pages 303--317. 2019.
[12]
Muhammad Anas Imtiaz, David Starobinski, Ari Trachtenberg, and Nabeel Younis. Churn in the bitcoin network: Characterization and impact. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 431--439. IEEE, 2019.
[13]
Larry Carter, Robert Floyd, John Gill, George Markowsky, and Mark Wegman. Exact and approximate membership testers. In Proceedings of the tenth annual ACM symposium on Theory of computing, pages 59--65, 1978.
[14]
Bin Fan, Dave G Andersen, Michael Kaminsky, and Michael D Mitzenmacher. Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, pages 75--88, 2014.
[15]
Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. Rocksdb: Evolution of development priorities in a key-value store serving large-scale applications. ACM Transactions on Storage (TOS), 17(4):1--32, 2021.
[16]
Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 international conference on management of data, pages 489--504, 2018.
[17]
Michael Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. Advances in Neural Information Processing Systems, 31, 2018.
[18]
Qiyu Liu, Libin Zheng, Yanyan Shen, and Lei Chen. Stable learned bloom filters for data streams. Proceedings of the VLDB Endowment, 13(12):2355--2367, 2020.
[19]
Zhenwei Dai and Anshumali Shrivastava. Adaptive learned bloom filter (ada-bf): efficient utilization of the classifier with application to real-time information filtering on the web. Advances in Neural Information Processing Systems, 33:11700--11710, 2020.
[20]
The open source code of chainedfilter. https://github.com/ChainedFilter.
[21]
Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The bloomier filter: an efficient data structure for static support lookup tables. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 30--39. Citeseer, 2004.
[22]
Denis Charles and Kumar Chellapilla. Bloomier filters: A second look. In Algorithms-ESA 2008: 16th Annual European Symposium, Karlsruhe, Germany, September 15--17, 2008. Proceedings 16, pages 259--270. Springer, 2008.
[23]
Thomas Mueller Graf and Daniel Lemire. Xor filters: Faster and smaller than bloom and cuckoo filters. Journal of Experimental Algorithmics (JEA), 25:1--16, 2020.
[24]
Thomas Mueller Graf and Daniel Lemire. Binary fuse filters: Fast and smaller than xor filters. Journal of Experimental Algorithmics (JEA), 27(1):1--15, 2022.
[25]
Stefan Walzer. Peeling close to the orientability threshold--spatial coupling in hashing-based data structures. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2194--2211. SIAM, 2021.
[26]
Ye Yu, Djamal Belazzougui, Chen Qian, and Qin Zhang. Memory-efficient and ultra-fast network lookup and forwarding using othello hashing. IEEE/ACM Transactions on Networking, 26(3):1151--1164, 2018.
[27]
Yang Tong, Dongsheng Yang, Jie Jiang, Siang Gao, Bin Cui, Lei Shi, and Xiaoming Li. Coloring embedder: A memory efficient data structure for answering multi-set query. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1142--1153. IEEE, 2019.
[28]
Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.
[29]
Yuriy Arbitman, Moni Naor, and Gil Segev. Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In 2010 IEEE 51st Annual symposium on foundations of computer science, pages 787--796. IEEE, 2010.
[30]
Shachar Lovett and Ely Porat. A space lower bound for dynamic approximate membership data structures. SIAM Journal on Computing, 42(6):2182--2196, 2013.
[31]
Murmur hashing source code. https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp.
[32]
David A Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098--1101, 1952.
[33]
Jarek Duda. Asymmetric numeral systems: entropy coding combining speed of huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013.
[34]
Jóhannes B Hreinsson, Morten Krøyer, and Rasmus Pagh. Storing a compressed function with constant time access. In Algorithms-ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7--9, 2009. Proceedings 17, pages 730--741. Springer, 2009.
[35]
Pedro Reviriego, Alfonso Sánchez-Macián, Stefan Walzer, and Peter C Dillinger. Approximate membership query filters with a false positive free set. arXiv preprint arXiv:2111.06856, 2021.
[36]
Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122--144, 2004.
[37]
Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 12(10):1094--1104, 2001.
[38]
Udi Manber and Sun Wu. An algorithm for approximate membership checking with application to password security. Information Processing Letters, 50(4):191--197, 1994.
[39]
Li Fan, Pei Cao, Jussara Almeida, and Andrei Z Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM transactions on networking, 8(3):281--293, 2000.
[40]
Salvatore Pontarelli, Pedro Reviriego, and Michael Mitzenmacher. Emoma: Exact match in one memory access. IEEE Transactions on Knowledge and Data Engineering, 30(11):2120--2133, 2018.
[41]
Yoav Freund, Robert Schapire, and Naoki Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771--780):1612, 1999.
[42]
The open source code of learned filter. https://github.com/karthikeya20/Learned-Bloom-Filters.
[43]
Yuhan Wu, Jintao He, Shen Yan, Jianyu Wu, Tong Yang, Olivier Ruas, Gong Zhang, and Bin Cui. Elastic bloom filter: deletable and expandable filter using elastic fingerprints. IEEE Transactions on Computers, 71(4):984--991, 2021.
[44]
Zhuochen Fan, Yubo Zhang, Siyuan Dong, Yi Zhou, Fangyi Liu, Tong Yang, Steve Uhlig, and Bin Cui. Hoppingsketch: More accurate temporal membership query and frequency query. IEEE Transactions on Knowledge and Data Engineering, 35(9):9067--9072, 2023.
[45]
Haoyu Li, Qizhi Chen, Yixin Zhang, Tong Yang, and Bin Cui. Stingy sketch: A sketch framework for accurate and fast frequency estimation. Proc. VLDB Endow., 15(7):1426--1438, mar 2022.
[46]
Yuanpeng Li, Feiyu Wang, Xiang Yu, Yilong Yang, Kaicheng Yang, Tong Yang, Zhuo Ma, Bin Cui, and Steve Uhlig. Ladderfilter: Filtering infrequent items with small memory and time overhead. Proceedings of the ACM on Management of Data, 1(1):1--21, 2023.
[47]
Jiarui Guo, Yisen Hong, YuhanWu, Yunfei Liu, Tong Yang, and Bin Cui. Sketchpolymer: Estimate per-item tail quantile using one sketch. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 590--601, 2023.
[48]
Ziwei Wang, Zheng Zhong, Jiarui Guo, Yuhan Wu, Haoyu Li, Tong Yang, Yaofeng Tu, Huanchen Zhang, and Bin Cui. Rencoder: A space-time efficient range filter with local encoder. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2036--2049, 2023.
[49]
Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste. {SketchLib}: Enabling efficient sketch-based monitoring on programmable switches. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 743--759, 2022.
[50]
Xiaodong Li, Zhuochen Fan, Haoyu Li, Zheng Zhong, Jiarui Guo, Sheng Long, Tong Yang, and Bin Cui. Steadysketch: Finding steady flows in data streams. In 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), pages 01--09. IEEE, 2023.
[51]
Zhuochen Fan, Gang Wen, Zhipeng Huang, Yang Zhou, Qiaobin Fu, Tong Yang, Alex X Liu, and Bin Cui. On the evolutionary of bloom filter false positives-an information theoretical approach to optimizing bloom filter parameters. IEEE Transactions on Knowledge and Data Engineering, 2022.
[52]
Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste. Sketchovsky: Enabling ensembles of sketches on programmable switches. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 1273--1292, 2023.
[53]
Zirui Liu, Chaozhe Kong, Kaicheng Yang, Tong Yang, Ruijie Miao, Qizhi Chen, Yikai Zhao, Yaofeng Tu, and Bin Cui. Hypercalm sketch: One-pass mining periodic batches in data streams.
[54]
Martin Dietzfelbinger and Rasmus Pagh. Succinct data structures for retrieval and approximate membership. In Automata, Languages and Programming: 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7--11, 2008, Proceedings, Part I 35, pages 385--396. Springer, 2008.
[55]
Ely Porat. An optimal bloom filter replacement based on matrix solving. In Computer Science-Theory and Applications: Fourth International Computer Science Symposium in Russia, CSR 2009, Novosibirsk, Russia, August 18--23, 2009. Proceedings 4, pages 263--273. Springer, 2009.
[56]
Bohdan S Majewski, Nicholas C Wormald, George Havas, and Zbigniew J Czech. A family of perfect hashing methods. The Computer Journal, 39(6):547--554, 1996.
[57]
Michael L Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with 0 (1) worst case access time. Journal of the ACM (JACM), 31(3):538--544, 1984.
[58]
Andrej Brodnik and J Ian Munro. Membership in constant time and minimum space. In Algorithms-ESA'94: Second Annual European Symposium Utrecht, The Netherlands, September 26--28, 1994 Proceedings 2, pages 72--81. Springer, 1994.
[59]
Andrej Brodnik and J Ian Munro. Membership in constant time and almost-minimum space. SIAM Journal on computing, 28(5):1627--1640, 1999.
[60]
Rasmus Pagh. Low redundancy in static dictionaries with constant query time. SIAM Journal on Computing, 31(2):353--363, 2001.
[61]
Ioana Oriana Bercea and Guy Even. A space-efficient dynamic dictionary for multisets with constant time operations. arXiv preprint arXiv:2005.02143, 2020.
[62]
Anna Pagh, Rasmus Pagh, and S Srinivasa Rao. An optimal bloom filter replacement. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 823--829, 2005.

Cited By

View all
  • (2024)Fight Fire with Fire: Towards Robust Graph Neural Networks on Dynamic Graphs via Actively DefenseProceedings of the VLDB Endowment10.14778/3659437.365945717:8(2050-2063)Online publication date: 31-May-2024
  • (2024)ETC: Efficient Training of Temporal Graph Neural Networks over Large-Scale Dynamic GraphsProceedings of the VLDB Endowment10.14778/3641204.364121517:5(1060-1072)Online publication date: 2-May-2024
  • (2024)SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data PlacementProceedings of the ACM on Management of Data10.1145/36549772:3(1-25)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 4
PACMMOD
December 2023
1317 pages
EISSN:2836-6573
DOI:10.1145/3637468
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2023
Published in PACMMOD Volume 1, Issue 4

Permissions

Request permissions for this article.

Author Tags

  1. bloom filter
  2. membership

Qualifiers

  • Research-article

Funding Sources

  • nsfc

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)202
  • Downloads (Last 6 weeks)13
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Fight Fire with Fire: Towards Robust Graph Neural Networks on Dynamic Graphs via Actively DefenseProceedings of the VLDB Endowment10.14778/3659437.365945717:8(2050-2063)Online publication date: 31-May-2024
  • (2024)ETC: Efficient Training of Temporal Graph Neural Networks over Large-Scale Dynamic GraphsProceedings of the VLDB Endowment10.14778/3641204.364121517:5(1060-1072)Online publication date: 2-May-2024
  • (2024)SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data PlacementProceedings of the ACM on Management of Data10.1145/36549772:3(1-25)Online publication date: 30-May-2024
  • (2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
  • (2024)Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing ApproachProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671848(1701-1712)Online publication date: 25-Aug-2024
  • (2024)An Empirical Study on Noisy Label Learning for Program UnderstandingProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639217(1-12)Online publication date: 20-May-2024
  • (2024)VisionEmbedder: Bit-Level-Compact Key-Value Storage with Constant Lookup, Rapid Updates, and Rare Failure2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00324(4248-4261)Online publication date: 13-May-2024
  • (2024)WavingSketch: an unbiased and generic sketch for finding top-k items in data streamsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00869-633:5(1697-1722)Online publication date: 1-Sep-2024
  • (2024)Performant almost-latch-free data structures using epoch protection in more depthThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00859-833:6(1793-1812)Online publication date: 1-Nov-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media