research-article

SigMatch: fast and scalable multi-pattern matching

Published: 01 September 2010

Abstract

Multi-pattern matching involves matching a data item against a large database of "signature" patterns. Existing multi-pattern matching algorithms do not scale well as the signature database grows. In this paper, we present sigMatch, a fast, versatile, and scalable technique for multi-pattern signature matching. At its heart, sigMatch organizes the signature database into a processor-cache-efficient q-gram index structure called the sigTree. The sigTree groups patterns by common sub-patterns, so that signatures that do not match can be quickly eliminated from the matching process. To further improve performance, the sigTree also uses parallel Bloom filters and a technique for reducing imbalances across groups. Through extensive empirical evaluation across three diverse domains, we show that sigMatch often outperforms existing methods by an order of magnitude or more.
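The abstract describes the sigTree only at a high level. The sketch below is a deliberately simplified, single-level illustration of the filter-then-verify idea it describes, not the paper's sigTree. It assumes, for simplicity, that signatures are grouped by their leading q-gram (the abstract does not say which sub-patterns the sigTree uses, and the real structure is a tree), uses one Bloom filter over those q-grams as a cheap pre-check, and verifies surviving candidates exactly. The names `QGramIndex` and `BloomFilter` and the parameters `q`, `num_bits`, and `num_hashes` are hypothetical. Parallel Bloom filters and the imbalance-reduction technique are not modeled.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Small Bloom filter over byte strings; k hash functions via salted BLAKE2b."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive num_hashes independent bit positions by varying the salt.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # No false negatives; false positives are possible.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class QGramIndex:
    """Hypothetical one-level q-gram index: signatures grouped by leading q-gram."""

    def __init__(self, signatures, q: int = 4):
        self.q = q
        self.groups = defaultdict(list)
        self.bloom = BloomFilter()
        for sig in signatures:
            if len(sig) < q:
                raise ValueError("signatures shorter than q are not supported here")
            key = sig[:q]
            self.groups[key].append(sig)
            self.bloom.add(key)

    def match(self, data: bytes) -> list[tuple[int, bytes]]:
        """Return (offset, signature) for every exact occurrence, by offset."""
        hits = []
        for i in range(len(data) - self.q + 1):
            gram = data[i:i + self.q]
            # Cheap pre-check: most positions are rejected without a group lookup.
            # (In Python a dict lookup alone would suffice; the Bloom filter stands
            # in for a compact, cache-resident check.)
            if not self.bloom.might_contain(gram):
                continue
            # A Bloom false positive finds no group, so no spurious hit results.
            for sig in self.groups.get(gram, ()):
                if data.startswith(sig, i):  # exact verification
                    hits.append((i, sig))
        return hits


if __name__ == "__main__":
    index = QGramIndex([b"virus", b"evil", b"virtual"], q=3)
    print(index.match(b"a virtual evil virus"))
    # prints [(2, b'virtual'), (10, b'evil'), (15, b'virus')]
```

Because a Bloom filter has no false negatives, the pre-check never discards a true match. A false positive costs only a lookup that finds no group. Only signatures sharing the q-gram seen at a position are verified, which is the sense in which non-matching signatures are "quickly eliminated."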



Published In

Proceedings of the VLDB Endowment (PVLDB), Volume 3, Issue 1-2
September 2010
1658 pages

Publisher

VLDB Endowment


