[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2506583.2506644acmconferencesArticle/Chapter ViewAbstractPublication PagesbcbConference Proceedingsconference-collections
tutorial
Open access

Suffix-Tree Based Error Correction of NGS Reads Using Multiple Manifestations of an Error

Published: 22 September 2013 Publication History

Abstract

Next Generation Sequencing (NGS) technologies produce large quantities of short length reads with higher error rates. Erroneous reads that cannot be aligned, are either ignored during de-novo sequencing, or must be suitably corrected. Such reads pose problems for mapping as well, since it is difficult to distinguish errors from true variants. Methods for detection and correction of errors typically rely on frequencies of substrings of the reads. Suffix trees are often utilized for this purpose, since they can be used to index and count the frequencies of substrings of all lengths. Existing suffix-tree based methods detect errors by identifying statistically under-represented branches (suffixes) and fix them. However, they do not refer back to the reads to put the correction in context. Since an error in a single read manifests itself at multiple nodes of a suffix tree, a read-driven approach that relies on its multiple manifestations is expected to perform better. Based on this observation, we develop an algorithm, PLURIBUS, which reconciles corrections suggested by multiple manifestations of an error using a voting scheme. We compare the accuracy of PLURIBUS in detecting and correcting errors against existing error correction techniques using simulated sequencing data. We also assess the impact of error correction on the performance of sequence assembly. Our results show that PLURIBUS corrects errors with improved precision and enables the assembler to generate longer contigs, particularly when the genome is longer, or coverage is lower. PLURIBUS is freely available at http://compbio.case.edu/pluribus/.

References

[1]
Keith R Bradnam, Joseph N Fass, Anton Alexandrov, Paul Baranay, Michael Bechner, İnanç Birol, Sébastien Boisvert10, Jarrod A Chapman, Guillaume Chapuis, Rayan Chikhi, et al.Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. arXiv preprint arXiv:1301.5406, 2013.
[2]
Weichun Huang, Leping Li, Jason R. Myers, and Gabor T. Marth. Art: a next-generation sequencing read simulator. Bioinformatics, 28(4):593--594, 2012.
[3]
Lucian Ilie, Farideh Fazayeli, and Silvana Ilie. Hitec: accurate error correction in high-throughput sequencing data. Bioinformatics, 27(3):295--302, 2011.
[4]
David Kelley, Michael Schatz, and Steven Salzberg. Quake: quality-aware detection and correction of sequencing errors. Genome Biology, 11(11):R116, 2010.
[5]
Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An eulerian path approach to dna fragment assembly. Proceedings of the National Academy of Sciences, 98(17):9748--9753, 2001.
[6]
Matthew Ruffalo, Thomas LaFramboise, and Mehmet Koyutürk. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics, 27(20):2790--2796, October 2011.
[7]
Leena Salmela. Correction of sequencing errors in a mixed set of reads. Bioinformatics, 26(10):1284--1290, 2010.
[8]
Jan Schröder, Heiko Schröder, Simon J. Puglisi, Ranjan Sinha, and Bertil Schmidt. Shrec: a short-read error correction method. Bioinformatics, 25(17):2157--2163, 2009.
[9]
Jay Shendure and Hanlee Ji. Next-generation dna sequencing. Nature biotechnology, 26(10):1135--1145, 2008.
[10]
Xiao Yang, Karin S. Dorman, and Srinivas Aluru. Reptile: representative tiling for short read error correction. Bioinformatics, 26(20):2526--2533, October 2010.
[11]
Daniel R Zerbino and Ewan Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome research, 18(5):821--829, 2008.

Cited By

View all
  • (2017)Pluribus—Exploring the Limits of Error Correction Using a Suffix TreeIEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2016.258606014:6(1378-1388)Online publication date: 1-Nov-2017
  • (2017)DNA-Seq Error Correction Based on Substring IndicesAlgorithms for Next-Generation Sequencing Data10.1007/978-3-319-59826-0_7(147-166)Online publication date: 19-Sep-2017
  • (2017)String-Matching and Alignment Algorithms for Finding Motifs in NGS DataAlgorithms for Next-Generation Sequencing Data10.1007/978-3-319-59826-0_11(235-264)Online publication date: 19-Sep-2017
  • Show More Cited By

Index Terms

  1. Suffix-Tree Based Error Correction of NGS Reads Using Multiple Manifestations of an Error

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
      September 2013
      987 pages
      ISBN:9781450324342
      DOI:10.1145/2506583
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 September 2013

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Bioinformatics
      2. error correction
      3. next generation sequencing
      4. sequence assembly
      5. suffix trees

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      BCB'13
      Sponsor:
      BCB'13: ACM-BCB2013
      September 22 - 25, 2013
      Wshington DC, USA

      Acceptance Rates

      BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
      Overall Acceptance Rate 254 of 885 submissions, 29%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)311
      • Downloads (Last 6 weeks)12
      Reflects downloads up to 06 Jan 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2017)Pluribus—Exploring the Limits of Error Correction Using a Suffix TreeIEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2016.258606014:6(1378-1388)Online publication date: 1-Nov-2017
      • (2017)DNA-Seq Error Correction Based on Substring IndicesAlgorithms for Next-Generation Sequencing Data10.1007/978-3-319-59826-0_7(147-166)Online publication date: 19-Sep-2017
      • (2017)String-Matching and Alignment Algorithms for Finding Motifs in NGS DataAlgorithms for Next-Generation Sequencing Data10.1007/978-3-319-59826-0_11(235-264)Online publication date: 19-Sep-2017
      • (2016) Objective review of de novo stand‐alone error correction methods for NGS data WIREs Computational Molecular Science10.1002/wcms.12396:2(111-146)Online publication date: 11-Jan-2016

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media