KATKA: A KRAKEN-Like Tool with k Given at Query Time

Travis Gagie,
Sana Kashgouli &
Ben Langmead

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13617)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

We describe a new tool, KATKA, that stores a phylogenetic tree \(T\) such that later, given a pattern \(P[1..m]\) and an integer \(k\), it can quickly return the root of the smallest subtree of \(T\) containing all the genomes in which the \(k\)-mer \(P[i..i+k-1]\) occurs, for \(1 \le i \le m-k+1\). This is similar to KRAKEN’s functionality but with \(k\) given at query time instead of at construction time.
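To make the query in the abstract concrete, here is a minimal brute-force sketch of its semantics in Python. It is not KATKA's data structure: it scans every genome for every k-mer, whereas the tool answers the same question quickly from a prebuilt index. The tree encoding (a child-to-parent dictionary), the function names `lca` and `kmer_lcas`, and the toy genomes are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def lca(parent: Dict[str, Optional[str]], a: str, b: str) -> str:
    """Lowest common ancestor of a and b, found by walking parent pointers."""
    ancestors = set()
    node: Optional[str] = a
    while node is not None:
        ancestors.add(node)
        node = parent[node]
    node = b
    while node not in ancestors:
        node = parent[node]
    return node


def kmer_lcas(
    parent: Dict[str, Optional[str]],
    genomes: Dict[str, str],
    pattern: str,
    k: int,
) -> List[Optional[str]]:
    """For each k-mer of pattern, return the root of the smallest subtree
    containing every genome (leaf) in which that k-mer occurs, or None
    if the k-mer occurs in no genome. k is chosen per call (query time)."""
    results: List[Optional[str]] = []
    for i in range(len(pattern) - k + 1):
        kmer = pattern[i:i + k]
        # Naive scan over all genomes; a real index avoids this.
        hits = [leaf for leaf, seq in genomes.items() if kmer in seq]
        if not hits:
            results.append(None)
            continue
        root = hits[0]
        for leaf in hits[1:]:
            root = lca(parent, root, leaf)
        results.append(root)
    return results


# Hypothetical toy tree: root has children A and g3; A has leaves g1 and g2.
parent = {"root": None, "A": "root", "g3": "root", "g1": "A", "g2": "A"}
genomes = {"g1": "GATTACA", "g2": "GATTAGA", "g3": "CATTAGA"}

print(kmer_lcas(parent, genomes, "GATTA", 4))  # ['A', 'root']
print(kmer_lcas(parent, genomes, "GATTA", 5))  # ['A']
```

The same stored tree and genomes serve both calls, with only `k` changing between them, which is the contrast with a tool where k is fixed at construction time.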

Acknowledgments

Many thanks to Nathaniel Brown, Younan Gao, Simon Gog, Meng He, Finlay Maguire and Gonzalo Navarro for helpful discussions.

Author information

Authors and Affiliations

Dalhousie University, Halifax, Canada
Travis Gagie & Sana Kashgouli
Johns Hopkins University, Baltimore, USA
Ben Langmead

Corresponding author

Correspondence to Travis Gagie.

Editor information

Editors and Affiliations

Universidad Técnica Federico Santa María, Valparaíso, Chile
Diego Arroyuelo
Universidad de Chile, Santiago, Chile
Barbara Poblete

About this paper

Cite this paper

Gagie, T., Kashgouli, S., Langmead, B. (2022). KATKA: A KRAKEN-Like Tool with k Given at Query Time. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_14

DOI: https://doi.org/10.1007/978-3-031-20643-6_14
Published: 01 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)
