An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments
<p>The updated database UD which is the combination of <span class="html-italic">SDB</span> and <span class="html-italic">sdb</span>.</p> "> Figure 2
<p>Reticular Sequence Tree.</p> "> Figure 3
<p>The Construction Process.</p> "> Figure 4
<p>The Counted Results.</p> "> Figure 5
<p>Diamond Structure.</p> "> Figure 6
<p>The Monitoring Data Flow.</p> "> Figure 7
<p>The Runtimes on the Dataset of <math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mrow> <mi>U</mi> <mi>D</mi> </mrow> <mo>|</mo> </mrow> <mo>=</mo> <mn>100</mn> <mi>K</mi> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>N</mi> <mo>=</mo> <mn>100</mn> <mi>K</mi> </mrow> </semantics></math>.</p> "> Figure 8
<p>The Runtimes on the Dataset of <math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mrow> <mi>U</mi> <mi>D</mi> </mrow> <mo>|</mo> </mrow> <mo>=</mo> <mn>100</mn> <mi>K</mi> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>N</mi> <mo>=</mo> <mn>10</mn> <mi>K</mi> </mrow> </semantics></math>.</p> "> Figure 9
<p>The Runtimes on the Dataset of <math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mrow> <mi>U</mi> <mi>D</mi> </mrow> <mo>|</mo> </mrow> <mo>=</mo> <mn>200</mn> <mi>K</mi> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>N</mi> <mo>=</mo> <mn>10</mn> <mi>K</mi> </mrow> </semantics></math>.</p> "> Figure 10
<p>The Runtime Comparison with Three Novel SPM Algorithms.</p> "> Figure 11
<p>The Runtime Comparison with the Practical SPM Algorithms.</p> "> Figure 12
<p>The Runtimes on BMSWebView-1 with TIR = 0.1% and PIR = 0.01%.</p> "> Figure 13
<p>The Runtimes on Kosarak with TIR = 0.1% and PIR = 0.01%.</p> "> Figure 14
<p>The Runtimes on the Synthetic Dataset 1 with <math display="inline"><semantics> <mrow> <msub> <mi>R</mi> <mrow> <mi>s</mi> <mi>d</mi> <mi>b</mi> </mrow> </msub> <mo>=</mo> <mn>5</mn> <mo>%</mo> <mo>,</mo> <msub> <mi>R</mi> <mrow> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> <mo>=</mo> <mn>50</mn> <mo>%</mo> </mrow> </semantics></math>.</p> "> Figure 15
<p>The Runtimes on the Synthetic Dataset 2 with <math display="inline"><semantics> <mrow> <msub> <mi>R</mi> <mrow> <mi>s</mi> <mi>d</mi> <mi>b</mi> </mrow> </msub> <mo>=</mo> <mn>5</mn> <mo>%</mo> <mo>,</mo> <msub> <mi>R</mi> <mrow> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> <mo>=</mo> <mn>50</mn> <mo>%</mo> </mrow> </semantics></math>.</p> "> Figure 16
<p>The Runtimes with the Different Settings of <math display="inline"><semantics> <mrow> <msub> <mi>R</mi> <mrow> <mi>s</mi> <mi>d</mi> <mi>b</mi> </mrow> </msub> </mrow> </semantics></math>.</p> "> Figure 17
<p>The Runtimes with the Different Settings of <math display="inline"><semantics> <mrow> <msub> <mi>R</mi> <mrow> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> </mrow> </semantics></math>.</p> "> Figure 18
<p>The Amounts of Potential Frequent Sequences on the Different minsups.</p> "> Figure 19
<p>The Memory Usage on the Dataset of <math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mrow> <mi>U</mi> <mi>D</mi> </mrow> <mo>|</mo> </mrow> <mo>=</mo> <mn>100</mn> <mi>K</mi> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>N</mi> <mo>=</mo> <mn>100</mn> <mi>K</mi> </mrow> </semantics></math>.</p> "> Figure 20
<p>The Memory Usage Comparison with Three Novel SPM Algorithms.</p> ">
Abstract
:1. Introduction
2. Definitions
3. An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in a WSN Environment
3.1. PrefixSpan+
Algorithm 1 The core code of PrefixSpan+ | |
Input: Sequence dataset S, Threshold of support degree min_sup | |
Output: The set of frequent sequential pattern FS | |
1: | Scan S, count the support number of all the items with length = 1, and delete the items from S, of which the support degree is less than min_sup, then obtain S’. The set of length-1 sequential pattern P contains all the items that are satisfied with the required support degree. |
2: | FS = P |
3: | Extension(S’, P) |
4: | For each , construct CMAP for from S’ |
5: | if |
6: | return FS |
7: | else |
8: | if , traverse |
9: | extend all the items in to by i-extension, and obtain |
10: | if , traverse |
11: | extend all the items in to by s-extension, and obtain |
12: | // Property 3 |
13: | , |
14: | Extension(S’, P’) |
3.2. Obtain Potential Frequent Sequence
- 1.
- Frequent sequences in the original SDB;
- 2.
- The sequences in sdb of which the support number are no less than ;
- 3.
- The sequences discovered from the prefix projection database of potential frequent sequence of sdb in DD, and .
- 1.
- The sequences emerged in sdb, including the sequences existing in SDB and the new sequences produced from sdb;
- 2.
- The new sequences produced from the combination of OD and old.
Algorithm 2 GetPFS (OD, DD, FS, old, new) | |
Input: OD, DD, FS, old, new | |
Output: PFS | |
1: | Put FS into PFS |
2: | Set |
3: | Putinto PFS |
4: | Forin PFS |
5: | Construct projection database of DD to the part of that belongs to OD |
6: | Set |
7: | Put into PFS |
3.3. Fast Support Number-Counting Algorithm
- Construct a reticular sequence tree;
- Count the support number on the constructed reticular sequence tree by traversing the updated sequence database;
- Count the real support number and determine the frequent sequences.
4. Experimental Analysis
4.1. Generation of Incremental Database
4.2. Comparisons
4.2.1. Time Cost on Real Monitoring Datasets
4.2.2. Time Cost on Benchmarking Datasets and Synthetic Datasets
4.2.3. Sensitivity of and
4.2.4. Space Cost
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Yang, D.; Xu, B.; Rao, K.Y.; Sheng, W.H. Passive infrared (PIR)-based indoor position tracking for smart homes using accessibility maps and A-Star algorithm. Sensors 2018, 18, 332. [Google Scholar] [CrossRef] [PubMed]
- Cruz-Piris, L.; Rivera, D.; Fernandez, S.; Marsa-Maestre, I. Optimized sensor network and multi-agent decision support for smart traffic light management. Sensors 2018, 18, 435. [Google Scholar] [CrossRef] [PubMed]
- Gu, Y.L.; Hsu, L.T.; Kamijo, S. Passive sensor integration for vehicle self-localization in urban traffic environment. Sensors 2015, 15, 30199–30220. [Google Scholar] [CrossRef] [PubMed]
- Collier-Oxandale, A.; Coffey, E.; Thorson, J.; Johnston, J.; Hannigan, M. Comparing building and neighborhood-scale variability of CO2 and O3 to inform deployment considerations for low-cost sensor system use. Sensors 2018, 18, 1349. [Google Scholar] [CrossRef] [PubMed]
- Meng, X.L.; Nguyen, D.T.; Xie, Y.L.; Owen, J.S.; Psimoulis, P.; Ince, S.; Chen, Q.; Ye, J.; Bhatia, P. Design and implementation of a new system for large bridge monitoring-GeoSHM. Sensors 2018, 18, 775. [Google Scholar] [CrossRef] [PubMed]
- Weekly, K.; Jin, M.; Zou, H.; Hsu, C.; Soyza, C.; Bayen, A.; Spanos, C. Building-in-Briefcase: A rapidly-deployable environmental sensor suite for the smart building. Sensors 2018, 18, 1381. [Google Scholar] [CrossRef] [PubMed]
- Justino, C.L.; Duarte, A.C.; P, T. Rocha-Santos. Recent progress in biosensors for environmental monitoring: A review. Sensors 2017, 17, 918. [Google Scholar] [CrossRef]
- Klein, I.; Diamant, R. Observability analysis of DVL/PS aided INS for a maneuvering AUV. Sensors 2015, 15, 26818–26837. [Google Scholar] [CrossRef]
- Wu, Y.X.; Tong, Y.; Zhu, X.Q.; Wu, X.D. NOSEP: Nonoverlapping sequence pattern mining with gap constraints. IEEE Trans. Cybern. 2017, 48, 2809–2822. [Google Scholar] [CrossRef]
- Philippe, F.V.; Wu, C.W.; Tseng, V.S.; Cao, L.; Nkambou, R. Mining partially-ordered sequential rules common to multiple sequences. IEEE Trans. Knowl. Data Eng. 2015, 27, 2203–2216. [Google Scholar] [CrossRef]
- Agrawal, R.; Srikant, R. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, 6–10 March 1995; pp. 3–14. [Google Scholar] [Green Version]
- Ayres, J.; Flannick, J.; Gehrke, J.; Yiu, T. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 429–435. [Google Scholar]
- Zaki, M.J. SPADE: An efficient algorithm for mining frequent sequences. Mach. Learn. 2001, 42, 31–60. [Google Scholar] [CrossRef]
- Philippe, F.V.; Antonio, G.; Ted, G.; Soltani, A.; Wu, C.W.; Tseng, V.S. SPMF: A java open-source pattern mining library. J. Mach. Learn. Res. 2014, 15, 3389–3393. [Google Scholar]
- Philippe, F.V.; Chun-Wei, L.J.; Antonio, G.; Gueniche, T.; Soltani, A.; Deng, Z.; Lam, H.T. The SPMF open-source data mining library version 2. Mach. Learn. Knowl. Discov. Databases 2016, 9853, 36–40. [Google Scholar] [CrossRef]
- Cheng, H.; Yan, X.F.; Han, J.W. IncSpan: Incremental mining of sequential patterns in large database. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 527–532. [Google Scholar]
- Pei, J.; Han, J.W.; Mortazavi-Asl, B.; Pinto, H.; Chen, Q.; Dayal, U.; Hsu, M.C. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the Seventeenth International Conference on Data Engineering, Heidelberg, Germany, 2–6 April 2001; pp. 215–224. [Google Scholar]
- Nguyen, S.N.; Sun, X.Z.; Orlowska, M.E. Improvements of IncSpan: Incremental mining of sequential patterns in large database. In Proceedings of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hanoi, Vietnam, 18–20 May 2005; pp. 442–451. [Google Scholar]
- Liu, J.X.; Yan, S.T.; Wang, Y.Y.; Ren, J.D. Incremental mining algorithm of sequential patterns based on sequence tree. In Proceedings of the Selected papers from 2012 International Conference on Control Systems: Advances in Intelligent Systems, Hong Kong, China, 1–2 March 2012; pp. 61–67. [Google Scholar]
- Zhang, B.B.; Lin, C.W.; Gan, W.S.; Hong, T.P. Maintaining the discovered sequential patterns for sequence insertion in dynamic databases. Eng. Appl. Artif. Intell. 2014, 35, 131–142. [Google Scholar] [CrossRef]
- Lin, J.C.W.; Hong, T.P.; Gan, W.S.; Chen, H.Y.; Li, S.T. Incrementally updating the discovered sequential patterns based on pre-large concept. Intell. Data Anal. 2015, 19, 1071–1089. [Google Scholar] [CrossRef]
- Lee, J.; Yun, U.; Lee, G.; Yoon, E. Efficient incremental high utility pattern mining based on pre-large concept. Eng. Appl. Artif. Intell. 2018, 72, 111–123. [Google Scholar] [CrossRef]
- Huynh, B.; Trinh, C.; Huynh, H.; Van, T.T.; Vo, B.; Snasel, V. An efficient approach for mining sequential patterns using multiple threads on very large databases. Eng. Appl. Artif. Intell. 2018, 74, 242–251. [Google Scholar] [CrossRef]
- Wang, J.Z.; Huang, J.L. Incremental mining of high utility sequential patterns in incremental databases. In Proceedings of the Twenty-fifth ACM International Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 2341–2346. [Google Scholar]
- Lin, J.C.W.; Zhang, J.X.; Fournier-Viger, P. High-Utility sequential pattern mining with multiple minimum utility thresholds. In Proceedings of the First Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Beijing, China, 7–9 July 2017; pp. 215–229. [Google Scholar]
- Tanbeer, S.K.; Hassan, M.M.; Almogren, A.; Zuair, M.; Jeong, B.S. Scalable regular pattern mining in evolving body sensor data. Future Gener. Comput. Syst. 2017, 75, 172–186. [Google Scholar] [CrossRef]
- Fournier-Viger, P.; Gomariz, A.; Campos, M.; Thomas, R. Fast vertical mining of sequential patterns using co-occurrence information. In Proceedings of the Seventeenth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Tainan, Taiwan, 13–16 May 2014; pp. 40–52. [Google Scholar]
- SPMF. Available online: http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php (accessed on 9 December 2018).
- Lin, M.Y.; Lee, S.Y. Incremental update on sequential patterns in large databases by implicit merging and efficient counting. Inform. Syst. 2004, 29, 385–404. [Google Scholar] [CrossRef]
- Boghey, R.K.; Singh, S. A sequential tree approach for incremental sequential pattern mining. Sadhana Acad. Proc. Eng. Sci. 2016, 41, 1369–1380. [Google Scholar] [CrossRef]
- Van, T.; Vo, B.; Le, B. Mining sequential patterns with itemset constraints. Knowl. Inform. Syst. 2018, 57, 311–330. [Google Scholar] [CrossRef]
- Cao, L.B.; Dong, X.J.; Zheng, Z.G. e-NSP: Efficient negative sequential pattern mining. Artif. Intell. 2016, 235, 156–182. [Google Scholar] [CrossRef]
- Dong, X.J.; Gong, Y.S.; Cao, L.B. F-NSP+: A fast negative sequential patterns mining method with self-adaptive data storage. Pattern Recognit. 2018, 84, 13–27. [Google Scholar] [CrossRef]
- Edman, H. Sequential Pattern Mining on Electronic Medical Records for Finding Optimal Clinical Pathways. Available online: http://www.nada.kth.se/~ann/exjobb/henrik_edman.pdf (accessed on 25 October 2018).
- Liu, D.W.; Cai, S.F.; Guo, X.H. Incremental sequential pattern mining algorithms of Web site access in grid structure database. Neural Comput. Appl. 2017, 28, 575–583. [Google Scholar] [CrossRef]
- Adam, O.; Abdullah, Z.; Ngah, A.; Mokhtar, K.; Ahmad, W.M.A.W.; Herawan, T.; Ahmad, N.; Deris, M.M.; Hamdan, A.R.; Abawajy, J.H. IncSPADE: An incremental sequential pattern mining algorithm based on SPADE property. Adv. Mach. Learn. Signal Process. 2016, 387, 81–92. [Google Scholar] [CrossRef]
- Synthetic Dataset 1. Available online: http://www.philippe-fournier-viger.com/spmf/datasets/data.slen_10.tlen_1.seq.patlen_2.lit.patlen_8.nitems_5000_spmf.txtSynthetic (accessed on 9 December 2018).
- Synthetic Dataset 2. Available online: http://www.philippe-fournier-viger.com/spmf/datasets/data.slen_8.tlen_1.seq.patlen_4.lit.patlen_8.nitems_5000_spmf.txt (accessed on 9 December 2018).
sdb | Increment sequence database |
UD | Updated database |
OD | Sequences contained in dotted line area |
DD | Combination of OD and old |
The number of element in set M | |
Support of in SDB | |
Support number of in SDB | |
Support of in sdb | |
Support number of in sdb | |
Support of in UD | |
Support number of in UD | |
old | The sequences contained in both SDB and sdb |
new | The sequences contained in sdb that are different from SDB |
Support number of in OD | |
Support number of in DD | |
Support number of in old | |
Support number of in new |
SID | Sequence |
---|---|
1 | |
2 | |
3 | |
4 | |
5 | |
6 |
SID | Sequence |
---|---|
7 | |
8 | |
9 |
SID | Sequence |
---|---|
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 |
SID | Sequence |
---|---|
2 | |
4 | |
5 | |
8 | |
10 | |
11 | |
12 |
SID | Sequence |
---|---|
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 | |
11 | |
12 |
SID | Sequence |
---|---|
2 | |
4 | |
5 | |
8 |
SID | Sequence |
---|---|
2 | |
4 | |
5 | |
8 |
SID | Sequence |
---|---|
1 | |
2 | |
3 | |
4 | |
5 |
Prefix | (10) | (20) | (30) | (40) | (50) | (60) | (70) | (80) | (90) |
Support Number | 1 | 1 | 4 | 2 | 1 | 1 | 3 | 2 | 3 |
SID | Sequence |
1 | |
2 | |
3 | |
4 | |
5 |
Pattern | ||
---|---|---|
30 | 70, 80 | 40, 70, 90 |
40 | 70 | - |
70 | 80 | - |
80 | - | - |
90 | - | - |
Pattern | ||
---|---|---|
(30)(40) | 70 | - |
(30, 70) | 80 | - |
Frequent Sequential Pattern | |
---|---|
length-1 sequential pattern | |
length-2 sequential pattern | |
length-3 sequential pattern |
Parameter | Definition |
---|---|
Amount of sequences in SDB, unit: 1000 | |
Average amount of items | |
Average amount of elements | |
Length of potential frequent sequence | |
Number of items in potential frequent itemset | |
Number of different elements in sequence database | |
Number of potential frequent itemsets | |
Number of potential frequent sequences | |
Amount of sequences in UD, unit: 1000 | |
Ratio of incremental database sdb to updated database UD | |
Ratio of new sequences to incremental database sdb |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lyu, X.; Ma, H. An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments. Sensors 2019, 19, 29. https://doi.org/10.3390/s19010029
Lyu X, Ma H. An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments. Sensors. 2019; 19(1):29. https://doi.org/10.3390/s19010029
Chicago/Turabian StyleLyu, Xin, and Hongxu Ma. 2019. "An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments" Sensors 19, no. 1: 29. https://doi.org/10.3390/s19010029
APA StyleLyu, X., & Ma, H. (2019). An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments. Sensors, 19(1), 29. https://doi.org/10.3390/s19010029