research-article

ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution

Published: 01 July 2017

Abstract

The purpose of de novo assembly is to report contigs that are more contiguous, more complete, and less error-prone. Thanks to the advent of next-generation sequencing (NGS) technologies, the cost of producing high-depth reads has been greatly reduced. However, because of the limitations of NGS, de novo assembly must cope with repeat regions, sequencing errors, and low coverage in some regions. Although many de novo assembly algorithms have been proposed to address these problems, de novo assembly remains a challenge. In this article, we developed an iterative seed-extension algorithm for de novo assembly, called ISEA. To avoid the negative impact of sequencing errors, ISEA uses read overlaps and paired-end information to correct erroneous reads before assembly. While extending seeds in a De Bruijn graph, ISEA uses a carefully designed score function based on paired-end information and the insert size distribution to resolve repeat regions. By employing the insert size distribution, the score function can also reduce the influence of erroneous reads. In scaffolding, ISEA adopts a relaxed strategy to join contigs whose extension was terminated because of low coverage. The performance of ISEA was compared with that of six popular existing assemblers on four real datasets. The experimental results demonstrate that ISEA can effectively obtain longer and more accurate scaffolds.
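The abstract describes choosing among possible extensions of a seed in a De Bruijn graph using a score built from paired-end information and the insert size distribution. The exact ISEA score function is not given here, so the Python below is only a minimal sketch under stated assumptions: insert sizes follow a normal distribution with mean `mu` and standard deviation `sigma`, each candidate is the one-base extension implied by a successor k-mer, and right mates have already been reverse-complemented onto the contig's strand. The names `PlacedPair`, `insert_weight`, `score_candidate`, `choose_extension`, and `min_score` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PlacedPair:
    """Hypothetical read pair whose left mate is already placed on the growing contig."""
    left_pos: int    # 0-based start of the left mate on the contig
    right_mate: str  # right mate, ASSUMED already on the contig's strand


def insert_weight(insert: int, mu: float, sigma: float) -> float:
    """Unnormalized Gaussian weight in (0, 1]; equals 1 when insert == mu.
    ASSUMPTION: insert sizes are normally distributed."""
    z = (insert - mu) / sigma
    return math.exp(-0.5 * z * z)


def score_candidate(contig, candidate, pairs, mu, sigma):
    """Sum insert-size weights of pairs whose right mate overlaps the new bases."""
    extended = contig + candidate
    total = 0.0
    for p in pairs:
        # Only search where a match must overlap the candidate bases;
        # mates lying entirely inside the old contig cannot tell candidates apart.
        start = max(p.left_pos, len(contig) - len(p.right_mate) + 1)
        q = extended.find(p.right_mate, start)
        if q == -1:
            continue  # this pair gives no support to this candidate
        insert = q + len(p.right_mate) - p.left_pos  # implied fragment length
        total += insert_weight(insert, mu, sigma)
    return total


def choose_extension(contig, candidates, pairs, mu, sigma, min_score=1.0):
    """Return (best candidate or None, all scores); None means stop extending."""
    scores = {c: score_candidate(contig, c, pairs, mu, sigma) for c in candidates}
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return None, scores
    return best, scores


if __name__ == "__main__":
    pairs = [PlacedPair(0, "GTCAT"), PlacedPair(2, "TCAT"), PlacedPair(6, "CAG")]
    best, scores = choose_extension("ACGTACGGTCA", ["T", "G"], pairs,
                                    mu=12, sigma=2, min_score=0.5)
    print(best, scores)
```

In this toy example the two pairs supporting `T` imply inserts of 12 and 10, for a total score of about 1.61, while the single pair supporting `G` implies an insert of 6 against an expected 12 and contributes only about 0.011, so `T` is chosen. Weighting by the insert size distribution is what lets an inconsistent pair (for example one placed by a repeat or an erroneous read) count for little. Returning `None` when no candidate reaches `min_score` loosely mirrors a contig whose extension stops for lack of support, which the abstract says ISEA revisits during scaffolding.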

Cited By

  • (2023) An Optimized Scaffolding Algorithm for Unbalanced Sequencing. New Generation Computing, 41:3 (553-579). doi:10.1007/s00354-023-00221-6. Online publication date: 1-Sep-2023
  • (2020) GapReduce: A Gap Filling Algorithm Based on Partitioned Read Sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17:3 (877-886). doi:10.1109/TCBB.2018.2789909. Online publication date: 1-May-2020
  • (2019) A Novel Scaffolding Algorithm Based on Contig Error Correction and Path Extension. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 16:3 (764-773). doi:10.1109/TCBB.2018.2858267. Online publication date: 15-Jul-2019

Information

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 14, Issue 4
July 2017
250 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published in TCBB Volume 14, Issue 4