DOI: 10.1145/2808719.2811429
Poster

A new method for DNA sequencing error verification and correction via an on-disk index tree

Published: 09 September 2015

Abstract

Existing sequencing error correction techniques demand large amounts of expensive memory. In this work, we introduce a new disk-based sequencing error correction method to address this problem. The key idea is to use a special on-disk index structure, called the BoND-tree, to store and access a large set of k-mers and their associated metadata on disk. The BoND-tree efficiently processes a set of special box queries that retrieve the relevant k-mers and their counts. A comprehensive voting mechanism is used to identify and correct an erroneous base in a genome sequence. Experiments demonstrate that the proposed method is promising for verifying and correcting sequencing errors, in terms of both accuracy and scalability.
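The abstract does not spell out the box queries or the voting rule. A minimal Python sketch may make the idea concrete, but it rests entirely on our own assumptions. An in-memory `Counter` stands in for the on-disk BoND-tree, and a box query is answered by a linear scan. Each k-mer window that covers a base issues a box query that leaves that base unconstrained. The window then casts one vote for the base of its best-supported match, provided that match's count reaches a hypothetical solidity threshold `solid`. The function names (`count_kmers`, `box_query`, `vote_correct`) and this voting rule are illustrative only. They are not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

BASES = "ACGT"


def count_kmers(reads: Sequence[str], k: int) -> Counter:
    """Count every k-mer in the reads.

    Assumption: an in-memory Counter stands in for the on-disk BoND-tree.
    """
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def box_query(counts: Dict[str, int], box: List[Set[str]]) -> Dict[str, int]:
    """Return the stored k-mers whose i-th base lies in box[i] for every i.

    A box query restricts each dimension (k-mer position) to a subset of the
    alphabet. This sketch scans the whole table; an index such as the
    BoND-tree is meant to answer the same query while pruning most entries.
    """
    return {
        kmer: c
        for kmer, c in counts.items()
        if all(base in allowed for base, allowed in zip(kmer, box))
    }


def vote_correct(read: str, counts: Dict[str, int], k: int, solid: int = 2) -> str:
    """Correct each base by a majority vote over the k-mer windows covering it.

    Hypothetical voting rule (not the authors'): each window issues a box query
    with the base under test left open, and votes for the base of its
    best-supported match if that match's count is at least `solid`.
    Bases that receive no votes are left unchanged.
    """
    corrected = list(read)
    for pos in range(len(read)):
        votes: Counter = Counter()
        # Every window read[start:start + k] that contains `pos`.
        for start in range(max(0, pos - k + 1), min(pos, len(read) - k) + 1):
            offset = pos - start
            box = [{b} for b in read[start:start + k]]
            box[offset] = set(BASES)  # leave the base under test unconstrained
            hits = box_query(counts, box)
            best = max(hits.items(), key=lambda kv: kv[1], default=None)
            if best is not None and best[1] >= solid:
                votes[best[0][offset]] += 1
        if votes:
            corrected[pos] = votes.most_common(1)[0][0]
    return "".join(corrected)


if __name__ == "__main__":
    # Toy data: five error-free reads plus one read with an A->T error at position 4.
    reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"]
    counts = count_kmers(reads, k=4)
    print(vote_correct("ACGTTCGTAC", counts, k=4))  # prints ACGTACGTAC
```

On the toy data, the four windows that cover position 4 each find a frequent k-mer with `A` at that position. The erroneous `T` is therefore replaced, and error-free reads pass through unchanged.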

Cited By

  • (2020) VA-Store: A Virtual Approximate Store Approach to Supporting Repetitive Big Data in Genome Sequence Analyses. IEEE Transactions on Knowledge and Data Engineering, 32(3):602-616. DOI: 10.1109/TKDE.2018.2885952. Online publication date: 1 March 2020.
  • (2019) The BINDS-Tree: A Space-Partitioning Based Indexing Scheme for Box Queries in Non-Ordered Discrete Data Spaces. IEICE Transactions on Information and Systems, E102.D(4):745-758. DOI: 10.1587/transinf.2018DAP0005. Online publication date: 1 April 2019.
  • (2016) Objective review of de novo stand‐alone error correction methods for NGS data. WIREs Computational Molecular Science, 6(2):111-146. DOI: 10.1002/wcms.1239. Online publication date: 11 January 2016.

        Published In

        BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
        September 2015
        683 pages
        ISBN:9781450338530
        DOI:10.1145/2808719
        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. bioinformatics
        2. disk index tree
        3. sequencing error correction

        Qualifiers

        • Poster

        Funding Sources

        • US National Science Foundation

        Conference

        BCB '15

        Acceptance Rates

        BCB '15 paper acceptance rate: 48 of 141 submissions (34%)
        Overall acceptance rate: 254 of 885 submissions (29%)

        Article Metrics

        • Downloads (last 12 months): 1
        • Downloads (last 6 weeks): 1
        Reflects downloads up to 19 Dec 2024.
