More Web Proxy on the site http://driver.im/

research-article

Open access

Reverse-Safe Text Indexing

Authors:

Giulia Bernardini,

Grigorios Loukides,

Solon P. PissisAuthors Info & Claims

Journal of Experimental Algorithmics (JEA), Volume 26

Article No.: 1.10, Pages 1 - 26

https://doi.org/10.1145/3461698

Published: 09 July 2021 Publication History

All formats PDF

Abstract

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z-RSDS. The construction algorithm takes O(nɷ log d) time, where ɷ is the matrix multiplication exponent. We show that, despite the nɷ factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z-RSDS for decision pattern matching queries, whose size can be sublinear in n. A preliminary version of this article appeared in ALENEX 2020.

References

[1]

Alfred V. Aho and Margaret J. Corasick. 1975. Efficient string matching: An aid to bibliographic search. Commun. ACM 18, 6 (1975), 333–340.

Digital Library

[2]

Hiroki Arimura and Takeaki Uno. 2007. An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence. J. Comb. Optim. 13, 3 (2007), 243–262.

[3]

Maurizio Atzori, Francesco Bonchi, Fosca Giannotti, and Dino Pedreschi. 2008. Anonymity preserving pattern discovery. VLDB J. 17, 4 (2008), 703–727.

Digital Library

[4]

Lorraine A. K. Ayad, Golnaz Badkobeh, Gabriele Fici, Alice Héliou, and Solon P. Pissis. 2019. Constructing antidictionaries in output-sensitive space. In Proc. 9th DCC. IEEE, Los Alamitos, CA, 538–547.

[5]

Fahiem Bacchus, Adam J. Grove, Joseph Y. Halpern, and Daphne Koller. 1996. From statistical knowledge bases to degrees of belief. Artif. Intell. 87, 1–2 (1996), 75–143.

Digital Library

[6]

Imon Banerjee, Kevin Li, Martin Seneviratne, Michelle Ferrari, Tina Seto, James D. Brooks, Daniel L. Rubin, and Tina Hernandez-Boussard. 2019. Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment. JAMIA Open 2, 1 (2019), 150–159.

[7]

Carl Barton, Alice Héliou, Laurent Mouchard, and Solon P. Pissis. 2014. Linear-time computation of minimal absent words using suffix array. BMC Bioinform. 15 (2014), 388.

[8]

Carl Barton, Alice Héliou, Laurent Mouchard, and Solon P. Pissis. 2015. Parallelising the computation of minimal absent words. In Proc. 11th PPAM. Revised Selected Papers, Part II. Springer, 243–253.

[9]

David R. Bentley. 2006. Whole-genome re-sequencing. Curr. Opin. Genet. Dev. 16, 6 (2006), 545–552.

[10]

Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone, and Michelle Sweering. 2020a. Combinatorial algorithms for string sanitization. ACM Trans. Knowl. Discov. Data 15, 1 (2020), Article 8, 34 pages.

[11]

Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. 2019. String sanitization: A combinatorial approach. In Proc. ECML/PKDD.

[12]

Giulia Bernardini, Huiping Chen, Gabriele Fici, Grigorios Loukides, and Solon P. Pissis. 2020b. Reverse-safe data structures for text indexing. In Proc. 22nd ALENEX. 199–213.

[13]

Giulia Bernardini, Huiping Chen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Leen Stougie, and Michelle Sweering. 2020c. String sanitization under edit distance. In Proc. 31st CPM. Article 7, 14 pages.

[14]

Giulia Bernardini, Alessio Conte, Garance Gourdel, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Giulia Punzi, Leen Stougie, and Michelle Sweering. 2020d. Hide and mine in strings: Hardness and algorithms. In Proc. 20th ICDM. IEEE, Los Alamitos, CA, 924–929.

[15]

Elisa Bertino, Gabriel Ghinita, and Ashish Kamra. 2011. Access control for databases: Concepts and systems. Found. Trends Databases 3, 1–2 (2011), 1–148.

Digital Library

[16]

Bruhadeshwar Bezawada, Alex X. Liu, Bargav Jayaraman, Ann L. Wang, and Rui Li. 2015. Privacy preserving string matching for cloud computing. In Proc. 35th ICDCS. IEEE, Los Alamitos, CA, 609–618.

[17]

Christian Böhm and Florian Krebs. 2004. The k-nearest neighbour join: Turbo charging the KDD process. Knowl. Inf. Syst. 6, 6 (2004), 728–749.

Digital Library

[18]

Luca Bonomi, Liyue Fan, and Hongxia Jin. 2016. An information-theoretic approach to individual sequential data sanitization. In Proc. 9th WSDM. ACM, New York, NY, 337–346.

Digital Library

[19]

James R. Bunch and John E. Hopcroft. 1974. Triangular factorization and inversion by fast matrix multiplication. Math. Comp. 28, 125 (1974), 231–236. http://www.jstor.org/stable/2005828.

[20]

Arturo Carpi and Aldo de Luca. 2001. Words and special factors. Theor. Comput. Sci. 259, 1–2 (2001), 145–182.

[21]

Bastien Cazaux, Thierry Lecroq, and Eric Rivals. 2019. Linking indexing data structures to de Bruijn graphs: Construction and update. J. Comput. Syst. Sci. 104 (2019), 165–183.

[22]

Supaporn Chairungsee and Maxime Crochemore. 2012. Using minimal absent words to build phylogeny. Theor. Comput. Sci. 450 (2012), 109–116.

Digital Library

[23]

Panagiotis Charalampopoulos, Maxime Crochemore, Gabriele Fici, Robert Mercas, and Solon P. Pissis. 2018b. Alignment-free sequence comparison using absent words. Inf. Comput. 262, Part (2018), 57–68.

[24]

Panagiotis Charalampopoulos, Maxime Crochemore, and Solon P. Pissis. 2018a. On extended special factors of a word. In Proc. 25th SPIRE. 131–138.

[25]

Panagiotis Charalampopoulos, Costas S. Iliopoulos, Chang Liu, and Solon P. Pissis. 2020. Property suffix array with applications in indexing weighted sequences. ACM J. Exp. Algorithmics 25, 1 (April 2020), Article 1.8, 16 pages.

Digital Library

[26]

Rui Chen, Gergely Acs, and Claude Castelluccia. 2012a. Differentially private sequential data publication via variable-length n-grams. In Proc. 4th CCS. ACM, New York, NY, 638–649.

Digital Library

[27]

Rui Chen, Benjamin C. M. Fung, Bipin C. Desai, and Nériah M. Sossou. 2012b. Differentially private transit data publication: A case study on the montreal transportation system. In Proc. 18th KDD. ACM, New York, NY, 213–221.

[28]

Charles J. Colbourn, Wendy J. Myrvold, and Eugene Neufeld. 1996. Two algorithms for unranking arborescences. J. Algorithms 20, 2 (1996), 268–281.

Digital Library

[29]

Maxime Crochemore, Alice Héliou, Gregory Kucherov, Laurent Mouchard, Solon P. Pissis, and Yann Ramusat. 2020. Absent words in a sliding window with applications. Inf. Comput. 270 (2020), 104461.

[30]

Maxime Crochemore, Lucian Ilie, Costas S. Iliopoulos, Marcin Kubica, Wojciech Rytter, and Tomasz Walen. 2013. Computing the longest previous factor. Eur. J. Comb. 34, 1 (2013), 15–26.

Digital Library

[31]

Maxime Crochemore, Filippo Mignosi, and Antonio Restivo. 1998. Automata and forbidden words. Inf. Process. Lett. 67 (1998), 111–117.

Digital Library

[32]

Maxime Crochemore, Filippo Mignosi, Antonio Restivo, and Sergio Salemi. 2000. Data compression using antidictionaries. Proc. IEEE 88, 11 (2000), 1756–1768.

[33]

James Weldon Demmel, Stanley C. Eisenstat, John R. Gilbert, Xiaoye S. Li, and Joseph W. H. Liu. 1999. A supernodal approach to sparse partial pivoting. SIAM J. Matrix Anal. Appl. 20, 3 (1999), 720–755.

Digital Library

[34]

U.S. Department of Health & Human Services. 1996. Health Insurance Portablility and Accountability Act of 1996. Retrieved May 19, 2021 from https://aspe.hhs.gov/report/health-insurance-portability-and-accountability-act-1996.

[35]

Shiri Dori and Gad M. Landau. 2006. Construction of Aho Corasick automaton in linear time for integer alphabets. Inf. Process. Lett. 98, 2 (2006), 66–72.

Digital Library

[36]

Eigen Library. 2020. Eigen Library. Retrieved May 19, 2021 from http://eigen.tuxfamily.org.

[37]

European Parliament. 2015. General Data Protection Regulation. Retrieved May 19, 2021 from http://data.consilium.europa.eu/doc/document/ST-9565-2015-INIT/en/pdf.

[38]

Martin Farach-Colton. 1997. Optimal suffix tree construction with large alphabets. In Proc. 38th FOCS. IEEE, Los Alamitos, CA, 137–143.

[39]

Gabriele Fici and Pawel Gawrychowski. 2019. Minimal absent words in rooted and unrooted trees. In 26th SPIRE. Lecture Notes in Computer Science, Vol. 11811. Springer, 152–161.

[40]

Gabriele Fici, Filippo Mignosi, Antonio Restivo, and Marinella Sciortino. 2006. Word assembly through minimal forbidden words. Theor. Comput. Sci. 359, 1–3 (2006), 214–230.

Digital Library

[41]

Gabriele Fici, Antonio Restivo, and Laura Rizzo. 2019. Minimal forbidden factors of circular words. Theor. Comput. Sci. 792 (2019), 144–153.

[42]

Johannes Fischer and Volker Heun. 2011. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40, 2 (2011), 465–492.

Digital Library

[43]

Michael L. Fredman, János Komlós, and Endre Szemerédi. 1984. Storing a sparse table with worst case access time. J. ACM 31, 3 (1984), 538–544.

Digital Library

[44]

Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. 2016. Computing DAWGs and minimal absent words in linear time for integer alphabets. In 41st MFCS. Leibniz International Proceedings in Informatics, Vol. 58. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 38, 14 pages.

[45]

Sara P. Garcia, O. J. Pinho, João M. O. S. Rodrigues, Carlos A. C. Bastos, and Paulo J. S. G. Ferreira. 2011. Minimal absent words in prokaryotic and eukaryotic genomes. PLoS One 6, 1 (2011), e16065.

[46]

John R. Gilbert and Tim Peierls. 1988. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Sci. Comput. 9, 5 (1988), 862–874.

Digital Library

[47]

Aris Gkoulalas-Divanis and Grigorios Loukides. 2011. Revisiting sequential pattern hiding to enhance utility. In Proc. 17th KDD. ACM, New York, NY, 1316–1324.

Digital Library

[48]

Izrail S. Gradshteyn and Iosif M. Ryzhik. 2007. Table of Integrals, Series, and Products (7 ed.). Elsevier/Academic Press, Amsterdam, Netherlands.

[49]

Roberto Grossi, John Iacono, Gonzalo Navarro, Rajeev Raman, and S. Rao Satti. 2017. Asymptotically optimal encodings of range data structures for selection and top-k queries. ACM Trans. Algorithms 13, 2 (2017), Article 28, 31 pages.

Digital Library

[50]

Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35, 2 (2005), 378–407.

Digital Library

[51]

Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences—Computer Science and Computational Biology. Cambridge University Press.

[52]

Antonin Guttman. 1984. R-trees: A dynamic index structure for spatial searching. In Proc. SIGMOD. ACM, New York, NY, 47–57.

Digital Library

[53]

Robert Gwadera, Aris Gkoulalas-Divanis, and Grigorios Loukides. 2013. Permutation-based sequential pattern hiding. In Proc. 13th ICDM. IEEE, Los Alamitos, CA, 241–250.

[54]

Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. In Proc. SIGMOD, Vol. 29. 1–12.

Digital Library

[55]

Raymond D. Heatherly, Grigorios Loukides, Joshua C. Denny, Jonathan L. Haines, Dan M. Roden, and Bradley A. Malin. 2013. Enabling genomic-phenomic association discovery without sacrificing anonymity. PLoS One 8 (Feb. 2013), 1–13.

[56]

Michael Hoffmann, John Iacono, Patrick K. Nicholson, and Rajeev Raman. 2018. Encoding nearest larger values. Theor. Comput. Sci. 710 (2018), 97–115.

[57]

Joan P. Hutchinson. 1975. On words with prescribed overlapping subsequences. Utilitas Mathematica 7 (1975), 241–250.

[58]

Juhani Karhumäki, Svetlana Puzynina, Michaël Rao, and Markus A. Whiteland. 2017. On cardinalities of k-abelian equivalence classes. Theor. Comput. Sci. 658 (2017), 190–204.

Digital Library

[59]

Juhani Karhumäki, Aleksi Saarela, and Luca Q. Zamboni. 2013. On a generalization of abelian equivalence and complexity of infinite words. J. Comb. Theory, Ser. A 120, 8 (2013), 2189–2206.

Digital Library

[60]

Juha Kärkkäinen, Marcin Piatkowski, and Simon J. Puglisi. 2017. String inference from longest-common-prefix array. In Proc. 44th ICALP. Article 62, 14 pages.

[61]

Carl Kingsford, Michael C. Schatz, and Mihai Pop. 2010. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinform. 11, 1 (2010), 21.

[62]

Donald Ervin Knuth. 1998. The Art of Computer Programming, Volume II: Seminumerical Algorithms (3rd ed.). Addison-Wesley.

Digital Library

[63]

Dmitry Kosolobov and Nikita Sivukhin. 2019. Compressed multiple pattern matching. In 30th CPM.Leibniz International Proceedings in Informatics, Vol. 128. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, Article 13, 14 pages.

[64]

François Le Gall. 2014. Powers of tensors and fast matrix multiplication. In Proc. 25th ISSAC. ACM, New York, NY, 296–303.

Digital Library

[65]

Chris-Andre Leimeister and Burkhard Morgenstern. 2014. Kmacs: The -mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics 30, 14 (May 2014), 2000–2008.

[66]

Jin Li, Qian Wang, Cong Wang, Ning Cao, Kui Ren, and Wenjing Lou. 2010. Fuzzy keyword search over encrypted data in cloud computing. In Proc. 29th INFOCOM. IEEE, Los Alamitos, CA, 1–5.

[67]

Tiancheng Li and Ninghui Li. 2008. Injector: Mining background knowledge for data anonymization. In Proc. 24th ICDE. IEEE, Los Alamitos, CA, 446–455.

Digital Library

[68]

Po-Ching Lin, Ying-Dar Lin, Yuan-Cheng Lai, and Tsern-Huei Lee. 2008. Using string matching for deep packet inspection. IEEE Comput. 41, 4 (2008), 23–28.

Digital Library

[69]

Grigorios Loukides, Aris Gkoulalas-Divanis, and Bradley Malin. 2010. Anonymization of electronic medical records for validating genome-wide association studies. Proc. Natl. Acad. Sci. USA 107, 17 (2010), 7898–7903.

[70]

Grigorios Loukides and Robert Gwadera. 2015. Optimal event sequence sanitization. In Proc. 15th SDM. 775–783.

[71]

Bradley Malin and Latanya Sweeney. 2000. Determining the identifiability of DNA database entries. In Proc. AMIA. 537–541. http://knowledge.amia.org/amia-55142-a2000a-1.606968/t-001-1.609408/f-001-1.609409/a-108-1.609653/a-109-1.609650.

[72]

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

[73]

Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. 2020. Minimal unique substrings and minimal absent words in a sliding window. In 46th SOFSEM.Lecture Notes in Computer Science, Vol. 12011. Springer, 148–160.

[74]

Filippo Mignosi, Antonio Restivo, and Marinella Sciortino. 2002. Words and forbidden factors. Theor. Comput. Sci. 273, 1–2 (2002), 99–117.

Digital Library

[75]

Noman Mohammed, Benjamin C. M. Fung, Patrick C. K. Hung, and Cheuk-Kwong Lee. 2010. Centralized and distributed anonymization for high-dimensional healthcare data. ACM Trans. Knowl. Discov. Data 4 (2010), Article 18, 33 pages.

[76]

Joong Chae Na, Alberto Apostolico, Costas S. Iliopoulos, and Kunsoo Park. 2003. Truncated suffix trees and their application to data compression. Theor. Comput. Sci. 304, 1 (2003), 87–101.

Digital Library

[77]

Shubha U. Nabar, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani. 2008. A survey of query auditing techniques for data privacy. In Privacy-Preserving Data Mining: Models and Algorithms, Charu C. Aggarwal and Philip S. Yu (Eds.). Springer, 415–431.

[78]

Takahiro Ota and Hiroyoshi Morita. 2010. On the adaptive antidictionary code using minimal forbidden words with constant lengths. In Proc ISITA 2010. IEEE, Los Alamitos, CA, 72–77.

[79]

Giorgos Poulis, Spiros Skiadopoulos, Grigorios Loukides, and Aris Gkoulalas-Divanis. 2014. Apriori-based algorithms for k-anonymizing trajectory data. Trans. Data Priv. 7, 2 (2014), 165–194. http://www.tdp.cat/issues11/abs.a194a14.php.

[80]

Diogo Pratas and Jorge M. Silva. 2020. Persistent minimal sequences of SARS-CoV-2. Bioinformatics 36 (2020), 5129–5132.

[81]

Hong Qin, Hao Wang, Xiaochao Wei, Likun Xue, and Lei Wu. 2019. Privacy-preserving wildcards pattern matching protocol for IoT applications. IEEE Access 7 (2019), 36094–36102.

[82]

Rajeev Raman. 2015. Encoding data structures. In Proc. 9th WALCOM. 1–7.

[83]

David F. Robinson and Les R. Foulds. 1981. Comparison of phylogenetic trees. Math. Biosci. 53, 1 (1981), 131–147.

[84]

Naruya Saitou and Masatoshi Nei. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees.Mol. Biol. Evol. 4, 4 (July 1987), 406–425.

[85]

Pierangela Samarati and Latanya Sweeney. 1998. Generalizing data to provide anonymity when disclosing information (Abstract). In 17th ACM SIGACT-SIGMOD-SIGART, Alberto O. Mendelzon and Jan Paredaens (Eds.). ACM, New York, NY, 188.

[86]

Jingbo Shang, Jian Peng, and Jiawei Han. 2016. MACFP: Maximal approximate consecutive frequent pattern mining under edit distance. In Proc. 16th SDM. 558–566.

[87]

Sumit Sidana, Charlotte Laclau, Massih R. Amini, Gilles Vandelle, and André Bois-Crettez. 2017. KASANDR: A large-scale dataset with implicit feedback for recommendation. In Proc. 40th SIGIR. ACM, New York, NY, 1245–1248.

Digital Library

[88]

Raquel M. Silva, Diogo Pratas, Luísa Castro, Armando J. Pinho, and Paulo J. S. G. Ferreira. 2015. Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31, 15 (2015), 2421–2425.

[89]

Henry J. Smith, Tamara Dinev, and Heng Xu. 2011. Information privacy research: An interdisciplinary review. Manag. Inf. Syst. Q. 35, 4 (2011), 989–1015. http://misq.org/catalog/product/view/id/1518/s/information-privacy-research-an-interdisciplinary-review/.

[90]

Acar Tamersoy, Grigorios Loukides, Mehmet Ercan Nergiz, Yücel Saygin, and Bradley Malin. 2012. Anonymization of longitudinal electronic medical records. IEEE Trans. Inf. Technol. Biomed. 16, 3 (2012), 413–423.

Digital Library

[91]

Manolis Terrovitis and Nikos Mamoulis. 2008. Privacy preservation in the publication of trajectories. In Proc. 9th MDM. IEEE, Los Alamitos, CA, 65–72.

Digital Library

[92]

Manolis Terrovitis, Nikos Mamoulis, and Panos Kalnis. 2011. Local and global recoding methods for anonymizing set-valued data. VLDB J. 20, 1 (2011), 83–106.

Digital Library

[93]

Manolis Terrovitis, Giorgos Poulis, Nikos Mamoulis, and Spiros Skiadopoulos. 2017. Local suppression and splitting techniques for privacy preserving publication of trajectories. IEEE Trans. Knowl. Data Eng. 29, 7 (2017), 1466–1479.

Digital Library

[94]

Sharma V. Thankachan, Sriram P. Chockalingam, Yongchao Liu, Ambujam Krishnan, and Srinivas Aluru. 2017. A greedy alignment-free distance estimator for phylogenetic inference. BMC Bioinform. 18, 8 (2017), Article 238, 8 pages.

[95]

George Theodorakopoulos, Reza Shokri, Carmela Troncoso, Jean-Pierre Hubaux, and Jean-Yves Le Boudec. 2014. Prolonging the hide-and-seek game: Optimal trajectory privacy for location-based services. In Proc. 13th WPES. ACM, New York, NY, 73–82.

Digital Library

[96]

Igor Ulitsky, David Burstein, Tamir Tuller, and Benny Chor. 2006. The average common substring approach to phylogenomic reconstruction. J. Comput. Biol. 13, 2 (2006), 336–350.

[97]

Di Wang, Yeye He, Elke Rundensteiner, and Jeffrey F. Naughton. 2013. Utility-maximizing event stream suppression. In Proc. SIGMOD. ACM, New York, NY, 589–600.

[98]

Jing Wang, Nikos Ntarmos, and Peter Triantafillou. 2016. Indexing query graphs to speedup graph query processing. In Proc. 19th EDBT. 41–52.

[99]

Peter Weiner. 1973. Linear pattern matching algorithms. In Proc. 14th SWAT. IEEE, Los Alamitos, CA, 1–11.

Digital Library

[100]

Virginia Vassilevska Williams. 2012. Multiplying matrices faster than Coppersmith-Winograd. In Proc. 44th STOC. ACM, New York, NY, 887–898.

Digital Library

[101]

Xiaokui Xiao, Yufei Tao, and Nick Koudas. 2010. Transparent anonymization: Thwarting adversaries who know the algorithm. ACM Trans. Database Syst. 35, 2 (2010), Article 8, 48 pages.

Digital Library

[102]

Shengzhi Xu, Xiang Cheng, Sen Su, Ke Xiao, and Li Xiong. 2016. Differentially private frequent sequence mining. IEEE Trans. Knowl. Data Eng. 28, 11 (2016), 2910–2926.

Digital Library

[103]

Mohammed J. Zaki, Wagner Meira Jr, and Wagner Meira. 2014. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. http://www.dataminingbook.info/.

Digital Library

Index Terms

Reverse-Safe Text Indexing
1. Theory of computation
  1. Design and analysis of algorithms
    1. Data structures design and analysis
      1. Pattern matching

Recommendations

Linear-size suffix tries

Suffix trees are highly regarded data structures for text indexing and string algorithms MCreight 76, Weiner 73. For any given string w of length n = | w | , a suffix tree for w takes O ( n ) nodes and links. It is often presented as a compacted version ...
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching

The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text $T$ ...
Full-Fledged Real-Time Indexing for Constant Size Alphabets

In this paper we describe a data structure that supports pattern matching queries on a dynamically arriving text over an alphabet of constant size. Each new symbol can be prepended to T in O(1) worst-case time. At any moment, we can report all ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Journal of Experimental Algorithmics

ACM Journal of Experimental Algorithmics Volume 26, Issue

December 2021

479 pages

ISSN:1084-6654

EISSN:1084-6654

DOI:10.1145/3446425

Issue’s Table of Contents

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 July 2021

Accepted: 01 April 2021

Revised: 01 April 2021

Received: 01 October 2020

Published in JEA Volume 26

Author Tags

Qualifiers

Research-article
Research
Refereed

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
467
Total Downloads

Downloads (Last 12 months)113
Downloads (Last 6 weeks)16

Reflects downloads up to 13 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents