Abstract
We describe a new tool, KATKA, that stores a phylogenetic tree T such that later, given a pattern P[1..m] and an integer k, it can quickly return the root of the smallest subtree of T containing all the genomes in which the k-mer \(P [i..i + k - 1]\) occurs, for \(1 \le i \le m - k + 1\). This is similar to KRAKEN’s functionality but with k given at query time instead of at construction time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abedin, P., Hooshmand, S., Ganguly, A., Thankachan, S.V.: The heaviest induced ancestors problem: better data structures and applications. Algorithmica 1–18 (2022). https://doi.org/10.1007/s00453-022-00955-7
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15775-2_37
Bille, P., Gørtz, I.L., Cording, P.H., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. J. Comput. Syst. Sci. 86, 171–180 (2017)
Gagie, T., Gawrychowski, P., Nekrich, Y.: Heaviest induced ancestors and longest common substrings. In: Proceedings of the CCCG (2013)
Gao, Y.: Computing matching statistics on repetitive texts. In: Proceedings of the DCC (2022)
Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comput. Sci. 483, 115–133 (2013)
Nasko, D.J., Koren, S., Phillippy, A.M., Treangen, T.J.: RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol. 19(1), 1–10 (2018)
Navarro, G.: Compact Data Structures: A Practical Approach. Cambridge University Press, Cambridge (2016)
Navarro, G.: Document listing on repetitive collections with guaranteed performance. Theor. Comput. Sci. 772, 58–72 (2019)
Navarro, G.: Personal communication (2013)
Navarro, G.: Wavelet trees for all. J. Discret. Algorithms 25, 2–20 (2014)
Nekrich, Y.: New data structures for orthogonal range reporting and range minima queries. In: Proceedings of the SODA (2021)
Wood, D.E., Lu, J., Langmead, B.: Improved metagenomic analysis with KRAKEN 2. Genome Biol. 20(1), 1–13 (2019)
Wood, D.E., Salzberg, S.L.: KRAKEN: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15(3), 1–12 (2014)
Acknowledgments
Many thanks to Nathaniel Brown, Younan Gao, Simon Gog, Meng He, Finlay Maguire and Gonzalo Navarro, for helpful discussions.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gagie, T., Kashgouli, S., Langmead, B. (2022). KATKA: A KRAKEN-Like Tool with k Given at Query Time. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_14
Download citation
DOI: https://doi.org/10.1007/978-3-031-20643-6_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer ScienceComputer Science (R0)