[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph

Published: 01 July 2013 Publication History

Abstract

Pattern matching with wildcards is a challenging topic in many domains, such as bioinformatics and information retrieval. This paper focuses on the problem with gap-length constraints and the one-off condition (The one-off condition means that each character can be used at most once in all occurrences of a pattern in the sequence). It is difficult to achieve the optimal solution. We propose a graph structure WON-Net ( WON-Net is a graph structure. It stands for a net work with the w eighted centralization measure based o n each n ode's centrality-degree. Its details are given in Definition 4.1) to obtain all candidate matching solutions and then design the WOW ( WOW stands for pattern matching with w ildcards based o n WON-Net ) algorithm with the weighted centralization measure based on nodes' centrality-degrees. We also propose an adjustment mechanism to balance the optimal solutions and the running time. We also define a new variant of WOW as WOW - . Theoretical analysis and experiments demonstrate that WOW and WOW - are more effective than their peers. Besides, the algorithms demonstrate an advantage on running time by parallel processing.

References

[1]
Pisanti N, Crochemore M, Grossi R, Sagot M-F (2005) Bases of motifs for generating repeated patterns with wild cards. IEEE/ACM Trans Comput Biol Bioinform 2:40-50.
[2]
On B-W, Lee I (2011) Meta similarity. Appl Intell 35(3):359-374.
[3]
Xiao L, Wissmann D, Brown M, Jablonski S (2004) Information extraction from the web: system and techniques. Appl Intell 21(2):195-224.
[4]
Bille P, Gørtz IL, Vildhøj HW, Wind DK (2010) String matching with variable length gaps. In: String processing and information retrieval--17th international symposium, vol 6393, pp 385-394.
[5]
Zhou B, Pei J (2012) Aggregate keyword search on large relational databases. Knowl Inf Syst 30(2):283-318.
[6]
Hofmann K, Bucher P, Falquet L, Bairoch A (1999) The PROSITE database, its status in 1999. Nucleic Acids Res 27:215-219.
[7]
Bucher P, Bairoch A (1994) A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proceedings of the 2nd international conference on intelligent systems for molecular biology, pp 53-61.
[8]
Navarro G, Raffinot M (2002) Flexible pattern matching in strings--practical on-line search algorithms for texts and biological sequences. Cambridge University Press, Cambridge.
[9]
Cole R, Gottlieb L-A, Lewenstein M (2004) Dictionary matching and indexing with errors and don't cares. In: Proceedings of the 36th ACM symposium on the theory of computing. ACM, New York, pp 91-100.
[10]
Ménard PA, Ratté S (2011) Classifier-based acronym extraction for business documents. Knowl Inf Syst 29(2):305-334.
[11]
Sánchez D, Isern D (2011) Automatic extraction of acronym definitions from the web. Appl Intell 34(2):311-327.
[12]
Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K (2011) HUC-prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34(2):181-198.
[13]
Shie B-E, Yu PS, Tseng VS (2012) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell.
[14]
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proc of ICDE, Taipei, pp 3-14.
[15]
Chen G, Wu X, Zhu X, Arslan AN, He Y (2006) Efficient string matching with wildcards and length constraints. Knowl Inf Syst 10(4):399-419.
[16]
Fischer MJ, Paterson MS (1974) String matching and other products. Technical report, Massachusetts Institute of Technology, Cambridge, MA, USA.
[17]
Zhang M, Kao B, Cheung DW, Yip KY (2005) Mining periodic patterns with gap requirement from sequences. In: Proceedings of ACM SIGMOD, Baltimore, Maryland, USA, pp 623-633.
[18]
Ding B, Lo D, Han J, Khoo S (2009) Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of IEEE 25th international conference on data engineering (ICDE 09), Shanghai, PR China, 2009. IEEE Comput Soc, Los Alamitos, pp 1024-1035.
[19]
Min F, Wu X, Lu Z (2009) Pattern matching with independent wildcard gaps. In: Eighth IEEE international conference on dependable, autonomic and secure computing (DASC-2009), Chengdu, China, pp 194-199.
[20]
Guo D, Hong X, Hu X, Gao J, Liu Y, Wu G, Wu X (2011) A bit-parallel algorithm for sequential pattern matching with wildcards. Cybern Syst 42(6):382-401.
[21]
Wang H, Xie F, Hu X, Li P, Wu X (2010) Pattern matching with flexible wildcards and recurring characters. In: 2010 IEEE international conference on granular computing (GrC 2010), Silicon Valley, USA, 2010. IEEE Comput Soc, Los Alamitos, pp 782-786.
[22]
Wu Y, Wu X, Jiang H, Min F (2011) A heuristic algorithm for MPMGOOC. Chin J Comput 32(8):1452-1462.
[23]
Chang Y-I, Chen J-R, Hsu M-T (2010) A hash trie filter method for approximate string matching in genomic databases. Appl Intell 33(1):21-38.
[24]
He D, Wu X, Zhu X (2007) SAIL-APPROX: an efficient on-line algorithm for approximate pattern matching with wildcards and length constraints. In: Proceedings of the IEEE international conference on bioinformatics and biomedicine (BIBM'07), Silicon Valley, USA, pp 151-158.
[25]
Dorneles CF, Gonçalves R, Mello RS (2011) Approximate data instance matching: a survey. Knowl Inf Syst 27(1):1-21.
[26]
Goethals B, Laurent D, Page WL, Dieng CT (2012) Mining frequent conjunctive queries in relational databases through dependency discovery. Knowl Inf Syst. s10115-012-0526-5.
[27]
Xiong N, Funk P (2008) Concise case indexing of time series in health care by means of key sequence discovery. Appl Intell 28(3):247-260.
[28]
Xie F, Wu X, Hu X, Gao J, Guo D, Fei Y, Hua E (2011) MAIL: mining sequential patterns with wildcards. Int J Data Min Bioinforma. http://www.inderscience.com/coming.php?ji=189&jc= ijdmb&np=9&jn=International%20Journal%20of%20Data%20 Mining%20and%20Bioinformatics.
[29]
Martínez-Trinidad JF, Carrasco-Ochoa JA, Ruiz-Shulcloper J (2011) RP-miner: a relaxed prune algorithm for frequent similar pattern mining. Knowl Inf Syst 27(3):451-471.
[30]
Wu Y, Wu X, Min F, Li Y (2010) A nettree for pattern matching with flexible wildcard constraints. In: Proceedings of the 2010 IEEE international conference on information reuse and integration (IRI 2010), Las Vegas, USA, pp 109-114.
[31]
Liu Y, Wu X, Hu X, Gao J et al (2009) Pattern matching with wild-cards based on key character location. In: Proceedings of the 2009 IEEE international conference on information reuse and integration (IRI 2009), Las Vegas, USA, pp 167-170.
[32]
National Center for Biotechnology Information (2009) GenBank sequences from pandemic (H1N1) 2009 viruses. http://www.ncbi. nlm.nih.gov/genomes/FLU/SwineFlu.html
[33]
Artificial data. http://dmic.hfut.edu.cn/HFUT_DMIC/DanGuo/test

Cited By

View all
  • (2024)Co-occurrence Order-preserving Pattern Mining with Keypoint Alignment for Time SeriesACM Transactions on Management Information Systems10.1145/365845015:2(1-27)Online publication date: 13-Apr-2024
  • (2022)OWSP-Miner: Self-adaptive One-off Weak-gap Strong Pattern MiningACM Transactions on Management Information Systems10.1145/347624713:3(1-23)Online publication date: 4-Feb-2022
  • (2022)NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off conditionApplied Intelligence10.1007/s10489-021-03000-252:11(12155-12174)Online publication date: 1-Sep-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Applied Intelligence
Applied Intelligence  Volume 39, Issue 1
July 2013
216 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 2013

Author Tags

  1. Centrality-degree
  2. Length constraints
  3. One-off condition
  4. Pattern matching
  5. Wildcards

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Co-occurrence Order-preserving Pattern Mining with Keypoint Alignment for Time SeriesACM Transactions on Management Information Systems10.1145/365845015:2(1-27)Online publication date: 13-Apr-2024
  • (2022)OWSP-Miner: Self-adaptive One-off Weak-gap Strong Pattern MiningACM Transactions on Management Information Systems10.1145/347624713:3(1-23)Online publication date: 4-Feb-2022
  • (2022)NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off conditionApplied Intelligence10.1007/s10489-021-03000-252:11(12155-12174)Online publication date: 1-Sep-2022
  • (2022)Self-adaptive nonoverlapping sequential pattern miningApplied Intelligence10.1007/s10489-021-02763-y52:6(6646-6661)Online publication date: 1-Apr-2022
  • (2020)NetDAP: (δ, γ) −approximate pattern matching with length constraintsApplied Intelligence10.1007/s10489-020-01778-150:11(4094-4116)Online publication date: 1-Nov-2020
  • (2018)Efficient pattern matching with periodical wildcards in uncertain sequencesIntelligent Data Analysis10.3233/IDA-17343522:4(829-842)Online publication date: 27-Jun-2018
  • (2016)Frequent and non-frequent pattern detection in big data streamsProceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining10.5555/3192424.3192600(931-938)Online publication date: 18-Aug-2016
  • (2016)Approximate pattern matching with gap constraintsJournal of Information Science10.1177/016555151560328642:5(639-658)Online publication date: 1-Oct-2016
  • (2016)Repeated patterns detection in big data using classification and parallelism on LERP Reduced Suffix ArraysApplied Intelligence10.1007/s10489-016-0766-245:3(567-597)Online publication date: 1-Oct-2016
  • (2015)Strict approximate pattern matching with general gapsApplied Intelligence10.1007/s10489-014-0612-342:3(566-580)Online publication date: 1-Apr-2015
  • Show More Cited By

View Options

View options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media