[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Lightweight Metagenomic Classification via eBWT

  • Conference paper
  • First Online:
Algorithms for Computational Biology (AlCoB 2019)

Abstract

The development of Next Generation Sequencing has had a major impact on the study of genetic sequences, and in particular, on the advancement of metagenomics, whose aim is to identify the microorganisms that are present in a sample collected directly from the environment. In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. For the best of our knowledge, this is the first approach that is assembly- and alignment-free, and is not based on k-mers. We show that our experiments confirm the effectiveness of our approach and the high accuracy even in negative control samples. Indeed we only classify 1 short read on 5,726,358 random shuffle reads. Finally, the results are comparable with those achieved by read-mapping classifiers and by k-mer based classifiers.

VG is totally and GR is partially supported by the project MIUR-SIR CMACBioSeq (“Combinatorial methods for analysis and compression of biological sequences”) grant n. RBSI146R5L.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 39.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 49.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    https://github.com/veronicaguerrini/LightMetaEbwt.

  2. 2.

    http://www.gardner-binflab.org/our_research/.

  3. 3.

    https://github.com/veronicaguerrini/LightMetaEbwt/tree/master/Datasets.

  4. 4.

    http://people.unipmn.it/manzini/gap.

  5. 5.

    https://github.com/giovannarosone/BCR_LCP_GSA.

  6. 6.

    https://github.com/felipelouza/egap.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53–86 (2004)

    Article  MathSciNet  Google Scholar 

  2. Bauer, M., Cox, A., Rosone, G.: Lightweight algorithms for constructing and inverting the BWT of string collections. Theoret. Comput. Sci. 483, 134–148 (2013)

    Article  MathSciNet  Google Scholar 

  3. Bonizzoni, P., Della Vedova, G., Nicosia, S., Pirola, Y., Previtali, M., Rizzi, R.: Divide and conquer computation of the multi-string BWT and LCP array. In: Manea, F., Miller, R.G., Nowotka, D. (eds.) CiE 2018. LNCS, vol. 10936, pp. 107–117. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94418-0_11

    Chapter  MATH  Google Scholar 

  4. Burrows, M., Wheeler, D.: A block sorting data compression algorithm. Technical report, DIGITAL System Research Center (1994)

    Google Scholar 

  5. Cox, A., Garofalo, F., Rosone, G., Sciortino, M.: Lightweight LCP construction for very large collections of strings. J. Discrete Algorithms 37, 17–33 (2016)

    Article  MathSciNet  Google Scholar 

  6. Egidi, L., Louza, F.A., Manzini, G., Telles, G.P.: External memory BWT and LCP computation for sequence collections with applications. In: WABI 2018. LIPIcs, vol. 113, pp. 10:1–10:14 (2018)

    Google Scholar 

  7. Egidi, L., Manzini, G.: Lightweight BWT and LCP merging via the gap algorithm. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 176–190. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_15

    Chapter  Google Scholar 

  8. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2000)

    Google Scholar 

  9. Hon, W.-K., Ku, T.-H., Lu, C.-H., Shah, R., Thankachan, S.V.: Efficient algorithm for circular Burrows-Wheeler transform. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 257–268. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31265-6_21

    Chapter  Google Scholar 

  10. Janin, L., Rosone, G., Cox, A.J.: Adaptive reference-free compression of sequence quality scores. Bioinformatics 30(1), 24–30 (2014)

    Article  Google Scholar 

  11. Kim, D., Song, L., Breitwieser, F.P., Salzberg, S.L.: Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26(12), 1721–1729 (2016)

    Article  Google Scholar 

  12. Langmead, B., Trapnell, C., Pop, M., Salzberg, S.L.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)

    Article  Google Scholar 

  13. Lindgreen, S., Adair, K.L., Gardner, P.P.: An evaluation of the accuracy and speed of metagenome analysis tools. Sci. Rep. 6, Article No. 19233 (2016)

    Google Scholar 

  14. Louza, F.A., Telles, G.P., Gog, S., Zhao, L.: Computing Burrows-Wheeler similarity distributions for string collections. In: Gagie, T., Moffat, A., Navarro, G., Cuadros-Vargas, E. (eds.) SPIRE 2018. LNCS, vol. 11147, pp. 285–296. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00479-8_23

    Chapter  Google Scholar 

  15. Louza, F., Gog, S., Telles, G.: Inducing enhanced suffix arrays for string collections. Theor. Comput. Sci. 678, 22–39 (2017)

    Article  MathSciNet  Google Scholar 

  16. Louza, F., Telles, G., Hoffmann, S., Ciferri, C.: Generalized enhanced suffix array construction in external memory. Algorithms Mol. Biol. 12(1), 26 (2017)

    Article  Google Scholar 

  17. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows-Wheeler transform. Theoret. Comput. Sci. 387(3), 298–312 (2007)

    Article  MathSciNet  Google Scholar 

  18. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: A new combinatorial approach to sequence comparison. Theory Comput. Syst. 42(3), 411–429 (2008)

    Article  MathSciNet  Google Scholar 

  19. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M., Versari, L.: Measuring the clustering effect of BWT via RLE. Theoret. Comput. Sci. 698, 79–87 (2017)

    Article  MathSciNet  Google Scholar 

  20. Mantaci, S., Restivo, A., Sciortino, M.: Distance measures for biological sequences: some recent approaches. Int. J. Approx. Reason. 47(1), 109–124 (2008)

    Article  MathSciNet  Google Scholar 

  21. McIntyre, A.B.R., et al.: Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 18(1), 182 (2017)

    Article  Google Scholar 

  22. Menzel, P., Ng, K.L., Krogh, A.: Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Commun. 7, 11257 (2016)

    Article  Google Scholar 

  23. Ng, K.H., Ho, C.K., Phon-Amnuaisuk, S.: A hybrid distance measure for clustering expressed sequence tags originating from the same gene family. PLoS One 7(10), e47216 (2012)

    Article  Google Scholar 

  24. Ounit, R., Lonardi, S.: Higher classification sensitivity of short metagenomic reads with CLARK-S. Bioinformatics 32(24), 3823–3825 (2016)

    Article  Google Scholar 

  25. Ounit, R., Wanamaker, S., Close, T.J., Lonardi, S.: CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16(1), 236 (2015)

    Article  Google Scholar 

  26. Pedersen, M., et al.: Ancient and modern environmental DNA. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370(1660), 20130383 (2015)

    Article  Google Scholar 

  27. Prezza, N., Pisanti, N., Sciortino, M., Rosone, G.: Detecting mutations by eBWT. In: WABI 2018. LIPIcs, vol. 113, pp. 3:1–3:15 (2018)

    Google Scholar 

  28. Restivo, A., Rosone, G.: Balancing and clustering of words in the Burrows-Wheeler transform. Theoret. Comput. Sci. 412(27), 3019–3032 (2011)

    Article  MathSciNet  Google Scholar 

  29. Vinga, S., Almeida, J.: Alignment-free sequence comparison-a review. Bioinformatics 19(4), 513–523 (2003)

    Article  Google Scholar 

  30. Wood, D.E., Salzberg, S.L.: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15(3), R46 (2014)

    Article  Google Scholar 

  31. Yang, L., Zhang, X., Wang, T.: The Burrows-Wheeler similarity distribution between biological sequences based on Burrows-Wheeler transform. J. Theor. Biol. 262(4), 742–749 (2010)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Giovanna Rosone .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Guerrini, V., Rosone, G. (2019). Lightweight Metagenomic Classification via eBWT. In: Holmes, I., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2019. Lecture Notes in Computer Science(), vol 11488. Springer, Cham. https://doi.org/10.1007/978-3-030-18174-1_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-18174-1_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18173-4

  • Online ISBN: 978-3-030-18174-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics