Abstract
Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study a microbial community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, it would seem that de novo assembly of the mRNAs is needed. However, NGS reads are short, and mRNAs share many similar regions and differ tremendously in abundance, which makes de novo assembly challenging. The existing assembler IDBA-MT, designed specifically for the assembly of metatranscriptomic data, performs well only on highly expressed mRNAs.
This paper introduces IDBA-MTP, which adopts a novel approach to metatranscriptomic assembly that exploits the availability of a database of millions of known protein sequences associated with mRNAs. Using the protein information effectively is non-trivial, given the size of the database and given that different mRNAs might lead to proteins with similar functions (because different amino acids might have similar characteristics). IDBA-MTP employs a similarity measure between mRNAs and protein sequences, dynamic programming techniques and seed-and-extend heuristics to tackle the problem effectively and efficiently. Experimental results show that IDBA-MTP outperforms existing assemblers by reconstructing 14% more mRNAs. Availability: www.cs.hku.hk/~alse/hkubrg/.
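To make the idea of comparing mRNAs against known proteins concrete, the following is a minimal sketch of a generic seed-and-extend search. It is not IDBA-MTP's actual algorithm. The function names (translate, build_index, best_protein_hit), the match/mismatch scores, the seed length k and the X-drop threshold are all assumptions chosen for illustration; a real similarity measure would more plausibly use an amino acid substitution matrix and gapped dynamic programming.

```python
from collections import defaultdict

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(mrna, frame=0):
    """Translate an mRNA (or cDNA) string in one forward reading frame."""
    seq = mrna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def build_index(proteins, k=3):
    """Map every amino acid k-mer to the (protein, offset) pairs containing it."""
    index = defaultdict(list)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def _extend(query, target, q, t, step, match, mismatch, xdrop):
    """Ungapped extension in one direction; returns the best score gain."""
    gain = best = 0
    while 0 <= q < len(query) and 0 <= t < len(target):
        gain += match if query[q] == target[t] else mismatch
        if gain > best:
            best = gain
        elif best - gain > xdrop:  # stop once the score falls too far
            break
        q += step
        t += step
    return best


def best_protein_hit(mrna, proteins, k=3, match=2, mismatch=-1, xdrop=3):
    """Return (score, protein name, frame) of the best ungapped hit."""
    index = build_index(proteins, k)
    best = (0, None, None)
    for frame in range(3):
        pep = translate(mrna, frame)
        for qi in range(len(pep) - k + 1):
            for name, ti in index.get(pep[qi:qi + k], ()):
                target = proteins[name]
                score = (k * match
                         + _extend(pep, target, qi + k, ti + k, 1,
                                   match, mismatch, xdrop)
                         + _extend(pep, target, qi - 1, ti - 1, -1,
                                   match, mismatch, xdrop))
                if score > best[0]:
                    best = (score, name, frame)
    return best


# Toy data (made up): the mRNA encodes protein P1 exactly in frame 0.
proteins = {"P1": "MKTAYIAKQR", "P2": "MSLLNVPAGK"}
mrna = "ATGAAAACCGCTTATATTGCGAAACAGCGT"
print(best_protein_hit(mrna, proteins))
```

Because the comparison happens at the amino acid level, two mRNAs that differ only by synonymous codons receive the same score against a protein, which is one reason protein information can link reads that look dissimilar at the nucleotide level.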
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Leung, H.C.M., Yiu, S.M., Chin, F.Y.L. (2014). IDBA-MTP: A Hybrid MetaTranscriptomic Assembler Based on Protein Information. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_12
DOI: https://doi.org/10.1007/978-3-319-05269-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science (R0)