IDBA-MTP: A Hybrid MetaTranscriptomic Assembler Based on Protein Information

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study a microbial community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, it would seem that de novo assembly of the mRNAs is needed. However, NGS reads are short, and mRNAs share many similar regions and differ tremendously in abundance levels, making de novo assembly challenging. The existing assembler IDBA-MT, designed specifically for the assembly of metatranscriptomic data, performs well only on highly expressed mRNAs.

This paper introduces IDBA-MTP, which adopts a novel approach to metatranscriptomic assembly that makes use of the fact that there is a database of millions of known protein sequences associated with mRNAs. How to effectively use the protein information is non-trivial given the size of the database and given that different mRNAs might lead to proteins with similar functions (because different amino acids might have similar characteristics). IDBA-MTP employs a similarity measure between mRNAs and protein sequences, dynamic programming techniques and seed-and-extend heuristics to tackle the problem effectively and efficiently. Experimental results show that IDBA-MTP outperforms existing assemblers by reconstructing 14% more mRNAs. Availability: www.cs.hku.hk/~alse/hkubrg/.
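
As a rough illustration of what a dynamic-programming similarity between an mRNA and a protein sequence can look like, the sketch below translates a nucleotide sequence in each of its three forward reading frames and scores every translation against a protein with a basic Smith-Waterman local alignment. It is not the similarity measure used by IDBA-MTP, which the abstract does not specify. The function names and the match, mismatch, and gap values are assumptions chosen for the example; practical tools typically use an amino acid substitution matrix instead of a flat match/mismatch score.

```python
# Illustrative sketch only: NOT the IDBA-MTP algorithm.
# All names and scoring parameters below are assumptions for this example.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate a nucleotide sequence starting at offset `frame` (0, 1, or 2).

    Accepts RNA (U) or DNA (T) letters; unknown codons become 'X'.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_frame_score(mrna, protein):
    """Return (frame, score) for the forward frame that best matches `protein`."""
    scores = [
        (local_alignment_score(translate(mrna, f), protein), f)
        for f in range(3)
    ]
    score, frame = max(scores, key=lambda t: (t[0], -t[1]))
    return frame, score


if __name__ == "__main__":
    print(best_frame_score("CATGGCCAAAT", "MAK"))
```

Filling the full alignment table for every read against every entry of a protein database with millions of sequences would be far too slow. A seed-and-extend heuristic addresses this by first locating short exact matches (seeds) and running the costlier alignment only around them.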

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Leung, H.C.M., Yiu, S.M., Chin, F.Y.L. (2014). IDBA-MTP: A Hybrid MetaTranscriptomic Assembler Based on Protein Information. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_12

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)