Entropy-Based Approach in Selection Exact String-Matching Algorithms
<p>Exact string matching.</p> "> Figure 2
<p>Methodology for building model based on the entropy approach for string search algorithms selection.</p> "> Figure 3
<p>Methodology implementation for the DNA domain.</p> "> Figure 4
<p>Methodology implementation for the natural language domain.</p> "> Figure 5
<p>Algorithms ranking for entropy class 1.</p> "> Figure 6
<p>Patterns used in the validation phase for the entropy class 9 of the DNA domain.</p> "> Figure 7
<p>Patterns used in model building for the entropy class 9 of the DNA domain.</p> "> Figure 8
<p>Pearson’s linear correlation coefficient.</p> ">
Abstract
:1. Introduction
2. String-Matching
3. Methodology
3.1. Methodology Description
3.1.1. Representative Patterns Sample Size
3.1.2. Entropy
3.1.3. Formal Metric Description
3.2. Methodology Implementation
3.2.1. Selection of Representative Texts for Domain Model Building
- DNA sequences of nucleotides for the DNA domain
- ⮚
- Anabarilius graham (Kanglang fish; RJVU01051648.1 Anabarilius grahami isolate AG-KIZ scaffold371_cov124, whole genome shotgun sequence, 14.747.523 bp (base pairs), 14.3 Mb file size) [46]
- ⮚
- Chelonia mydas (green sea turtle; NW_006571126.1 Chelonia mydas unplaced genomic scaffold, CheMyd_1.0 scaffold1, whole genome shotgun sequence, 7.392.783 , 7.1 Mb) [47]
- ⮚
- Escherichia coli (NZ_LN874954.1 Escherichia coli strain LM33 isolate patient, whole genome shotgun sequence, 49.02.341 bp, 4.8 Mb) [48]
- ⮚
- Macaca mulatta (Rhesus macaque monkey; ML143108.1 Macaca mulatta isolate AG07107 chromosome 19 genomic scaffold ScNM3vo_33 × 44 M, whole genome shotgun sequence, 24.310.526 bp, 24.3 Mb) [49]
- English texts for the natural language domain
- ⮚
- Bible (The King James Version, with a size of 4.047.392 bytes) [50]
3.2.2. Selection of Algorithms
3.2.3. Searching Results for Representative Patterns
3.2.4. Patterns Entropy Calculation
3.2.5. Entropies Discretization
4. Classification of Algorithms in the Built Model
5. Methodology Validation and Discussion
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References and Notes
- Xiong, J. Essential Bioinformatics; Cambridge University Press: Cambridge, UK, 2006; ISBN 9780521600828. [Google Scholar]
- Pizzi, C.; Ornamenti, M.; Spangaro, S.; Rombo, S.E.; Parida, L. Efficient algorithms for sequence analysis with entropic profiles. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018, 15, 117–128. [Google Scholar] [CrossRef] [PubMed]
- Faro, S.; Lecroq, T.; Borz, S. The String Matching Algorithms Research Tool. Proc. Prague Stringol. Conf. 2016, 99–113. [Google Scholar]
- Al-Khamaiseh, K.; Alshagarin, S. A Survey of String Matching Algorithms. J. Eng. Res. Appl. 2014, 4, 144–156. [Google Scholar]
- SaiKrishna, V.; Rasool, P.A.; Khare, D.N. String Matching and its Application in Diversified Fields. IJCSI Int. J. Comput. Sci. Issues 2012, 9, 219–226. [Google Scholar]
- Sedgewick, R.; Flajolet, P. An Introduction to the Analysis of Algorithms, 2nd ed.; Addison-Wesley/Pearson Education: Westford, MA, USA, 2013; ISBN 9780321905758. [Google Scholar]
- Michailidis, P.D.; Margaritis, K.G. On-line string matching algorithms: Survey and experimental results. Int. J. Comput. Math. 2001, 76, 411–434. [Google Scholar] [CrossRef]
- Faro, S. Evaluation and improvement of fast algorithms for exact matching on genome sequences. In International Conference on Algorithms for Computational Biology; Springer: Cham, Switzerland, 2016; Volume 9702, pp. 145–157. [Google Scholar] [CrossRef]
- Hume, A.; Sunday, D. Fast string searching. Softw. Pract. Exp. 1991, 21, 1221–1248. [Google Scholar] [CrossRef]
- Navarro, G.; Raffinot, M. Flexible Pattern Matching in Strings: Practical Online Search Algorithms for Texts and Biological Sequences. Computer 2002, 35. [Google Scholar] [CrossRef]
- Hakak, S.I.; Kamsin, A.; Shivakumara, P.; Gilkar, G.A.; Khan, W.Z.; Imran, M. Exact String Matching Algorithms: Survey, Issues, and Future Research Directions. IEEE Access 2019, 7, 69614–69637. [Google Scholar] [CrossRef]
- Gusfield, D. Algorithms on strings, trees, and sequences: Computer science and computational biology. Theory Pract. 1997, 28, 554. [Google Scholar]
- Cormen, T.H.; Cormen, T.H. Introduction to Algorithms; MIT Press: Cambridge, MA, USA, 2001; ISBN 9780262032933. [Google Scholar]
- Jiji, N.; Mahalakshmi, T. Survey of Exact String Matching Algorithm for Detecting Patterns in Protein Sequence. Adv. Comput. Sci. Technol. 2017, 10, 2707–2720. [Google Scholar]
- Singla, N.; Garg, D. String Matching Algorithms and their Applicability in various Applications. Int. J. Soft Comput. Eng. 2012, 1, 2231–2307. [Google Scholar]
- Myatt, G.J.; Johnson, W.P. Making Sense of Data I a Practical Guide to Exploratory Data Analysis and Data Mining, 2nd ed.; John Wiley & Sons, Inc.: Somerset, NJ, USA, 2014; ISBN 9781118407417. [Google Scholar]
- Manikandan, S. Frequency distribution. J. Pharmacol. Pharmacother. 2011, 2, 54. [Google Scholar] [CrossRef] [PubMed]
- Bartlett, J.; Kotrlik, J.; Higgins, C. Organizational research: Determining appropriate sample size in survey research. Inf. Technol. Learn. Perform. J. 2001, 19, 43. [Google Scholar]
- Taherdoost, H. Determining Sample Size; How to Calculate Survey Sample Size. Int. J. Econ. Manag. Syst. 2017, 2, 237–239. [Google Scholar]
- Israel, G.D. Determining Sample Size; University of Florida: Gainesville, FL, USA, 1992. [Google Scholar]
- Mohammed, R. Information Analysis of DNA Sequences. arXiv 2010, arXiv:1010.4205, 1–22. [Google Scholar]
- Schmitt, A.O.; Herzel, H. Estimating the entropy of DNA sequences. J. Theor. Biol. 1997, 188, 369–377. [Google Scholar] [CrossRef]
- Ebeling, W.; Nicolis, G. Word frequency and entropy of symbolic sequences: A dynamical perspective. Chaos Solitons Fractals 1992, 2, 635–650. [Google Scholar] [CrossRef]
- Herzel, H.; Ebeling, W.; Schmitt, A.O. Entropies of biosequences: The role of repeats. Phys. Rev. E 1994, 50, 5061–5071. [Google Scholar] [CrossRef]
- Lesne, A.; Blanc, J.L.; Pezard, L. Entropy estimation of very short symbolic sequences. Phys. Rev. E 2009, 79, 1–10. [Google Scholar] [CrossRef] [Green Version]
- Rhodes, P.C.; Garside, G.R. Use of maximum entropy method as a methodology for probabilistic reasoning. Knowl. Based Syst. 1995, 8, 249–258. [Google Scholar] [CrossRef]
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef] [Green Version]
- Muchnik, A.; Vereshchagin, N. Shannon entropy vs. kolmogorov complexity. In International Computer Science Symposium in Russia; Springer: Berlin/Heidelberg, Germany, 2006; pp. 281–291. [Google Scholar] [CrossRef]
- Grunwald, P.; Vitanyi, P. Shannon Information and Kolmogorov Complexity. 2004. Available online: https://arxiv.org/pdf/cs/0410002.pdf (accessed on 4 May 2020).
- Teixeira, A.; Matos, A.; Souto, A.; Antunes, L. Entropy Measures vs. Kolmogorov Complexity. Entropy 2011, 13, 595–611. [Google Scholar] [CrossRef]
- Goulão, M.; Brito e Abreu, F. Formal definition of metrics upon the CORBA component model. In Quality of Software Architectures and Software Quality; Springer: Berlin/Heidelberg, Germany, 2005; pp. 88–105. [Google Scholar] [CrossRef] [Green Version]
- Barabucci, G.; Ciancarini, P.; Di Iorio, A.; Vitali, F. Measuring the quality of diff algorithms: A formalization. Comput. Stand. Interfaces 2016, 46, 52–65. [Google Scholar] [CrossRef]
- Ivkovic, N.; Jakobovic, D.; Golub, M. Measuring Performance of Optimization Algorithms in Evolutionary Computation. Int. J. Mach. Learn. Comput. 2016, 6, 167–171. [Google Scholar] [CrossRef]
- Aho, A.V.; Hopcroft, J.E.; Ullman, J.D. The Design and Analysis of Computer Algorithms; Addison-Wesley Pub. Co.: Reading, MA, USA, 1974; ISBN 9780201000290. [Google Scholar]
- Hromkovič, J. Theoretical Computer Science: Introduction to Automata, Computability, Complexity, Algorithmics, Randomization, Communication, and Cryptography; Springer: Berlin/Heidelberg, Germany, 2004; ISBN 3540140158. [Google Scholar]
- Jain, P.; Pandey, S. Comparative Study on Text Pattern Matching for Heterogeneous System. Int. J. Comput. Sci. Eng. Technol. 2012, 3, 537–543. [Google Scholar]
- Pandiselvam, P.; Marimuthu, T.; Lawrance, R. A comparative study on string matching algorithms of biological sequences. Int. Conf. Intell. Comput. 2014, 2014, 1–5. [Google Scholar]
- Faro, S.; Lecroq, T. The Exact Online String Matching Problem: A Review of the Most Recent Results. Acm Comput. Surv. 2013, 45, 13. [Google Scholar] [CrossRef]
- Lecroq, T.; Charras, C. Handbook od Exact String Matching; Laboratoire d’Informatique de Rouen Université de Rouen: Rouen, France, 2001. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons: Hoboken, NJ, USA, 2005; ISBN 9780471241959. [Google Scholar]
- Kucak, D.; Djambic, G.; Fulanovic, B. An empirical study of algorithms performance in implementations of set in Java. In Proceedings of the 23rd DAAAM International Symposium on Intelligent Manufacturing and Automation 2012, Zadar, Croatia, 24–27 October 2012; Volume 1, pp. 565–568. [Google Scholar]
- Alhendawi, K.M.; Baharudin, A.S. String Matching Algoritms (SMAs): Survey & Empirical Analysis. J. Comput. Sci. Manag. 2013, 2, 2637–2644. [Google Scholar]
- The Canterbury Corpus. Available online: http://corpus.canterbury.ac.nz/ (accessed on 21 December 2020).
- Compeau, P.; Pevzner, P. Bioinformatics Algorithms: An Active Learning Approach; Active Learning Publishers: La Jolla, CA, USA, 2015; Volume 1, ISBN 0990374602. [Google Scholar]
- Markić, I.; Štula, M.; Jukić, M. Pattern Searching in Genome. Int. J. Adv. Comput. Technol. 2018, 10, 36–46. [Google Scholar]
- Anabarilius grahami isolate AG-KIZ scaffold371_cov124, whole genome sh—Nucleotide—NCBI.
- Chelonia mydas unplaced genomic scaffold, CheMyd_1.0 scaffold1, whole—Nucleotide—NCBI.
- Escherichia coli strain LM33 isolate patient, whole genome shotgun seq—Nucleotide—NCBI.
- Macaca mulatta isolate AG07107 chromosome 19 genomic scaffold ScNM3vo_—Nucleotide—NCBI.
- The Canterbury Corpus—The King James Version of the Bible. Available online: https://corpus.canterbury.ac.nz/descriptions/. (accessed on 13 February 2020).
- Boyer, R.S.; Moore, J.S. A fast string searching algorithm. Commun. ACM 1977, 20, 762–772. [Google Scholar] [CrossRef]
- Knuth, D.E.; Morris, J.H.; Pratt, V.R.; Morris, J.H., Jr.; Pratt, V.R. Fast Pattern Matching in Strings. SIAM J. Comput. 1977, 6, 323–350. [Google Scholar] [CrossRef] [Green Version]
- Apostolico, A.; Crochemore, M. Optimal canonization of all substrings of a string. Inf. Comput. 1991, 95, 76–95. [Google Scholar] [CrossRef] [Green Version]
- Sunday, D.M. A very fast substring search algorithm. Commun. ACM 1990, 33, 132–142. [Google Scholar] [CrossRef]
- Horspool, R.N. Practical fast searching in strings. Softw. Pract. Exp. 1980, 10, 501–506. [Google Scholar] [CrossRef]
- Hakak, S.; Kamsin, A.; Shivakumara, P.; Idris, M.Y.I.; Gilkar, G.A. A new split based searching for exact pattern matching for natural texts. PLoS ONE 2018, 13, e0200912. [Google Scholar] [CrossRef] [Green Version]
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation. Hum. Commun. Sci. SummerFest 2007, 24. Available online: https://csem.flinders.edu.au/research/techreps/SIE07001.pdf (accessed on 22 March 2020).
- National Center for Biotechnology Information. Available online: https://www.ncbi.nlm.nih.gov/ (accessed on 15 August 2019).
- Wheelan, C. Naked Statistics: Stripping the Dread from the Data; WW Norton & Co.: New York, NY, USA, 2013; ISBN 978-0-39307-195-5. [Google Scholar]
- Barrett, P. Euclidean Distance Raw, Normalized, and Double-Scaled Coefficients. 2005. Available online: https://www.pbarrett.net/techpapers/euclid.pdf (accessed on 16 September 2020).
- Anton, H. Elementary Linear Algebra, 11th ed.; Wiley: New York, NY, USA, 2019; ISBN 978-1-119-62569-8. [Google Scholar]
- Rodgers, J.L.; Nicewander, W.A. Thirteen Ways to Look at the Correlation Coefficient. Am. Stat. 1988, 42, 59–66. [Google Scholar] [CrossRef]
- Raw Data for Entropy Based Approach in Selection Exact String Matching Algorithms. Available online: https://www.dropbox.com/t/kXKUZeIIVpw3hU5O (accessed on 3 November 2020).
Pattern | PattEnt | PattEntRound | PattEntClass |
---|---|---|---|
AAAAAAAA | 0.0000000000000 | 0.00 | <0.22222 |
TTTTTTTTCTTTTTTT | 0.3372900666170 | 0.34 | 0.22222–0.44444 |
AACAAAAA | 0.5435644431996 | 0.54 | 0.44444–0.66667 |
AAAAAAACAAACAACA | 0.6962122601251 | 0.70 | 0.66667–0.88889 |
TGGTAAAAAAAAAAAA | 1.0612781244591 | 1.06 | 0.88889–1.11111 |
AAAAAGCG | 1.2987949406954 | 1.30 | 1.11111–1.33333 |
CAAG | 1.5000000000000 | 1.50 | 1.33333–1.55556 |
CCTACTAAACACCGTA | 1.7640976555739 | 1.76 | 1.55556–1.77778 |
GCATACCTTTCGCAGC | 1.9362781244591 | 1.94 | ≥1.77778 |
Class No. | Entropy Class | Number of Patterns |
---|---|---|
1 | <0.22222 | 9 |
2 | 0.22222–0.44444 | 1 |
3 | 0.44444–0.66667 | 23 |
4 | 0.66667–0.88889 | 72 |
5 | 0.88889–1.11111 | 195 |
6 | 1.11111–1.33333 | 278 |
7 | 1.33333–1.55556 | 961 |
8 | 1.55556–1.77778 | 1.451 |
9 | ≥1.77778 | 4.692 |
Total | 7.682 |
Pattern | PattEnt | PattEntRound | PattEntClass |
---|---|---|---|
Mm | 0.0000000000000 | 0.00 | <0.46004 |
We | 1.0000000000000 | 1.00 | 0.92007–1.38011 |
Full | 1.5000000000000 | 1.50 | 1.38011–1.84014 |
off from | 2.1556390622295 | 2.16 | 1.84014–2.30018 |
ine enemies, eve | 2.6556390622295 | 2.66 | 2.30018–2.76021 |
Joseph remembere | 3.0243974703477 | 3.02 | 2.76021–3.22025 |
e: The LORD lift | 3.5778195311147 | 3.58 | 3.22025–3.68028 |
they have, and deliver our lives | 3.7508072359050 | 3.75 | ≥3.68028 |
Class No. | Entropy Class | Number of Patterns |
---|---|---|
1 | <0.46004 | 3 |
2 | 0.46004–0.92007 | 0 |
3 | 0.92007–1.38011 | 151 |
4 | 1.38011–1.84014 | 53 |
5 | 1.84014–2.30018 | 383 |
6 | 2.30018–2.76021 | 393 |
7 | 2.76021–3.22025 | 283 |
8 | 3.22025–3.68028 | 556 |
9 | ≥3.68028 | 221 |
Total | 2.043 |
Entropy Class/Algorithm | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Quartile 1 | |||||||||
AC | 20.59% | 0.00% | 20.00% | 12.61% | 5.74% | 8.99% | 8.51% | 5.43% | 2.53% |
BF | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
BM | 47.92% | 100.00% | 56.67% | 51.82% | 57.20% | 61.11% | 58.26% | 61.95% | 62.83% |
HOR | 38.24% | 0.00% | 46.67% | 49.55% | 49.88% | 51.06% | 49.92% | 53.92% | 54.60% |
KMP | 14.71% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
MP | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
QS | 55.88% | 100.00% | 53.33% | 51.35% | 55.36% | 53.97% | 54.08% | 53.68% | 54.85% |
Quartile 2 | |||||||||
AC | 29.41% | 100.00% | 16.67% | 35.14% | 41.15% | 31.22% | 38.03% | 36.47% | 34.34% |
BF | 23.53% | 0.00% | 13.33% | 22.97% | 20.95% | 18.52% | 21.75% | 21.58% | 26.36% |
BM | 18.75% | 0.00% | 43.33% | 28.38% | 24.28% | 29.63% | 22.92% | 24.38% | 23.21% |
HOR | 20.59% | 100.00% | 30.00% | 14.41% | 20.95% | 29.89% | 22.54% | 24.32% | 18.41% |
KMP | 35.29% | 0.00% | 13.33% | 26.58% | 20.95% | 18.52% | 26.65% | 21.58% | 27.22% |
MP | 23.53% | 0.00% | 13.33% | 22.97% | 20.95% | 18.52% | 21.75% | 21.58% | 27.22% |
QS | 14.71% | 0.00% | 43.33% | 23.42% | 25.94% | 28.57% | 21.63% | 25.08% | 18.25% |
Quartile 3 | |||||||||
AC | 23.53% | 0.00% | 53.33% | 25.23% | 29.18% | 36.24% | 26.54% | 31.67% | 31.41% |
BF | 26.47% | 0.00% | 16.67% | 24.77% | 27.43% | 25.66% | 24.56% | 24.36% | 26.01% |
BM | 25.00% | 0.00% | 0.00% | 19.80% | 16.05% | 9.26% | 18.82% | 13.68% | 13.95% |
HOR | 29.41% | 0.00% | 23.33% | 35.59% | 26.18% | 19.05% | 27.38% | 21.77% | 26.72% |
KMP | 23.53% | 100.00% | 43.33% | 21.17% | 34.66% | 41.53% | 29.63% | 37.77% | 25.15% |
MP | 26.47% | 0.00% | 33.33% | 24.77% | 27.43% | 25.66% | 24.56% | 24.56% | 25.15% |
QS | 29.41% | 0.00% | 3.33% | 25.23% | 15.96% | 17.46% | 24.28% | 21.24% | 26.65% |
Quartile 4 | |||||||||
AC | 26.47% | 0.00% | 10.00% | 27.03% | 23.94% | 23.54% | 26.93% | 26.43% | 31.72% |
BF | 50.00% | 100.00% | 70.00% | 52.25% | 51.62% | 55.82% | 53.69% | 54.06% | 47.63% |
BM | 8.33% | 0.00% | 0.00% | 0.00% | 2.47% | 0.00% | 0.00% | 0.00% | 0.01% |
HOR | 11.76% | 0.00% | 0.00% | 0.45% | 2.99% | 0.00% | 0.17% | 0.00% | 0.27% |
KMP | 26.47% | 0.00% | 43.33% | 52.25% | 44.39% | 39.95% | 43.72% | 40.65% | 47.63% |
MP | 50.00% | 100.00% | 53.33% | 52.25% | 51.62% | 55.82% | 53.69% | 53.87% | 47.63% |
QS | 0.00% | 0.00% | 0.00% | 0.00% | 2.74% | 0.00% | 0.00% | 0.00% | 0.25% |
Entropy Class/Algorithm | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Quartile 1 | |||||||||
AC | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
BF | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
BM | 66.67% | 0.00% | 40.56% | 60.38% | 57.18% | 64.05% | 65.02% | 58.81% | 66.06% |
HOR | 33.33% | 0.00% | 35.06% | 30.19% | 38.72% | 41.01% | 51.24% | 56.47% | 52.04% |
KMP | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
MP | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
QS | 100.00% | 0.00% | 100.00% | 84.91% | 79.23% | 70.13% | 59.01% | 59.71% | 57.01% |
Quartile 2 | |||||||||
AC | 100.00% | 0.00% | 51.67% | 50.94% | 50.00% | 50.13% | 58.66% | 50.18% | 72.85% |
BF | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
BM | 33.33% | 0.00% | 59.44% | 39.62% | 42.82% | 35.95% | 34.98% | 41.19% | 33.94% |
HOR | 66.67% | 0.00% | 64.94% | 69.81% | 61.28% | 58.99% | 48.76% | 43.53% | 47.96% |
KMP | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
MP | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
QS | 0.00% | 0.00% | 0.00% | 15.09% | 20.77% | 29.87% | 40.99% | 40.29% | 42.99% |
Quartile 3 | |||||||||
AC | 0.00% | 0.00% | 48.33% | 49.06% | 50.00% | 49.87% | 41.34% | 49.82% | 27.15% |
BF | 0.00% | 0.00% | 36.67% | 33.96% | 33.08% | 32.41% | 31.45% | 33.45% | 32.13% |
BM | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
HOR | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
KMP | 0.00% | 0.00% | 44.44% | 49.06% | 45.90% | 46.58% | 47.70% | 46.04% | 47.06% |
MP | 33.33% | 0.00% | 44.44% | 41.51% | 45.90% | 46.08% | 45.94% | 45.50% | 46.15% |
QS | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
Quartile 4 | |||||||||
AC | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
BF | 100.00% | 0.00% | 63.33% | 66.04% | 66.92% | 67.59% | 68.55% | 66.55% | 67.87% |
BM | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
HOR | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
KMP | 0.00% | 0.00% | 55.56% | 50.94% | 54.10% | 53.42% | 52.30% | 53.96% | 52.94% |
MP | 66.67% | 0.00% | 55.56% | 58.49% | 54.10% | 53.92% | 54.06% | 54.50% | 53.85% |
QS | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
i | Built Model (p1) | Validation (p2) | Scaled Euclidean (d1) |
---|---|---|---|
1 | 0.58263 | 0.96667 | 0.14749 |
2 | 0.22916 | 0.03333 | 0.03835 |
… | |||
28 | 0.43718 | 0.16667 | 0.07318 |
Model vs. Validation Result for: | DNA | Natural Language | ||
---|---|---|---|---|
Double-Scaled Euclidean Distance | Similarity Coefficient | Double-Scaled Euclidean Distance | Similarity Coefficient | |
Class 6 | 0.194 | 0.806 (80%) | ||
Class 7 | 0.227 | 0.773 (77%) | ||
Class 9 | 0.231 | 0.769 (77%) | 0.145 | 0.855 (86%) |
Model vs. Validation Result for: | DNA | Natural Language |
---|---|---|
Class 6 | 0.848 | |
Class 7 | 0.795 | |
Class 9 | 0.685 | 0.905 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Markić, I.; Štula, M.; Zorić, M.; Stipaničev, D. Entropy-Based Approach in Selection Exact String-Matching Algorithms. Entropy 2021, 23, 31. https://doi.org/10.3390/e23010031
Markić I, Štula M, Zorić M, Stipaničev D. Entropy-Based Approach in Selection Exact String-Matching Algorithms. Entropy. 2021; 23(1):31. https://doi.org/10.3390/e23010031
Chicago/Turabian StyleMarkić, Ivan, Maja Štula, Marija Zorić, and Darko Stipaničev. 2021. "Entropy-Based Approach in Selection Exact String-Matching Algorithms" Entropy 23, no. 1: 31. https://doi.org/10.3390/e23010031
APA StyleMarkić, I., Štula, M., Zorić, M., & Stipaničev, D. (2021). Entropy-Based Approach in Selection Exact String-Matching Algorithms. Entropy, 23(1), 31. https://doi.org/10.3390/e23010031