DOI: 10.1145/2649387.2649434
short-paper

Focus: a new multilayer graph model for short read analysis and extraction of biologically relevant features

Published: 20 September 2014

Abstract

With the increasing number of applications in which a group of organisms associated with a common environment is sequenced, there is an urgent need for a new model that represents the sequenced short reads in a way that takes the nature of these organisms into consideration. In addition to facilitating the assembly process, such a model should allow easy extraction of other useful biological information from the short reads, including conserved regions among the input genomes, sequence motifs, and other information critical to the recognition and/or classification of the organisms. We present Focus, a new multilayer graph model for short read analysis and extraction of biologically relevant features. The proposed model can be viewed as a data-mining tool that takes advantage of the multilayer graph representation of the reads to extract useful information about the associated genomes/organisms. Although Focus is not primarily an assembly tool, we assessed it using known assemblers, with excellent results. We also applied Focus in a case study on an HIV read dataset and were able to extract biologically relevant graph features.

Cited By

  • (2017) Parallel NGS Assembly Using Distributed Assembly Graphs Enriched with Biological Knowledge. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 273-282. DOI: 10.1109/IPDPSW.2017.143. Online publication date: May 2017.
  • (2017) On the integration of assembly and non-assembly approaches for comparing biological sequences. 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2232-2234. DOI: 10.1109/BIBM.2017.8218007. Online publication date: Nov 2017.
  • (2016) Graph mining for next generation sequencing: leveraging the assembly graph for biological insights. BMC Genomics, 17:1. DOI: 10.1186/s12864-016-2678-2. Online publication date: 6 May 2016.


Information & Contributors

Information

Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN:9781450328944
DOI:10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data-mining
  2. graph modeling
  3. metagenomics
  4. next generation sequencing

Qualifiers

  • Short-paper

Conference

BCB '14
Sponsor:
BCB '14: ACM-BCB '14
September 20-23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Citations

Cited By

View all
  • (2017)Parallel NGS Assembly Using Distributed Assembly Graphs Enriched with Biological Knowledge2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW.2017.143(273-282)Online publication date: May-2017
  • (2017)On the integration of assembly and non-assembly approaches for comparing biological sequences2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)10.1109/BIBM.2017.8218007(2232-2234)Online publication date: Nov-2017
  • (2016)Graph mining for next generation sequencing: leveraging the assembly graph for biological insightsBMC Genomics10.1186/s12864-016-2678-217:1Online publication date: 6-May-2016

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media