[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3386052.3386066acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicbbbConference Proceedingsconference-collections
research-article
Open access

Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads

Published: 18 May 2020 Publication History

Abstract

The seeding heuristic is widely used in many DNA analysis applications to speed up the analysis time. In many applications, seeding takes a substantial amount of the total execution time. In this paper, we present an efficient GPU implementation for computing maximal exact matching (MEM) seeds in long DNA reads. We applied various optimizations to reduce the number of GPU global memory accesses and to avoid redundant computation. Our implementation also extracts maximum parallelism from the MEM computation tasks. We tested our implementation using data from the state-of-the-art third generation Pacbio DNA sequencers, which produces DNA reads that are tens of kilobases long. Our implementation is up to 9x faster for computing MEM seeds as compared to the fastest CPU implementation running on a server-grade machine with 24 threads. Computing suffix array intervals (first part of MEM computation) is up to 3x faster whereas calculating the location of the match (second part) is up to 9x faster. The implementation is publicly available at https://github.com/nahmedraja/GPUseed

References

[1]
T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195--197, 1981.
[2]
S. F. Altschul et al. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403--410, 1990.
[3]
A. L. Delcher et al. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369--2376, 1999.
[4]
P. Ferragina and G. Manzini. Opportunistic data structures with applications. In In IEEE FOCS 2000, FOCS '00, pages 390--398, 2000.
[5]
H. Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, May 2013.
[6]
Y. Liu and B. Schmidt. Long read alignment based on maximal exact match seeds. Bioinformatics, 28(18):i318-i324, 2012.
[7]
G. Miclotte et al. Jabba: hybrid error correction for long sequencing reads. Algorithms for Molecular Biology, 11(1):10, May 2016.
[8]
F. Fernandes and A. T. Freitas. slaMEM: efficient retrieval of maximal exact matches using a sampled LCP array. Bioinformatics, 30(4):464--471, 2014.
[9]
M. Vyverman et al. essaMEM: finding maximal exact matches using enhanced sparse suffix arrays. Bioinformatics, 29(6):802--804, 2013.
[10]
Y. Liu and B. Schmidt. CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing. Design Test, IEEE, 31(1):31--39, Feb 2014.
[11]
M. C. Schatz and C. Trapnell. Fast Exact String Matching on the GPU. Technical report, University of Maryland, 2007.
[12]
A. Abu-Doleh, K. Kaya, M. Abouelhoda, and Ü. V. Çatalyürek. Extracting Maximal Exact Matches on GPU. In 2014 IEEE International Parallel Distributed Processing Symposium Workshops, pages 1417--1426, May 2014.
[13]
J. Pantaleoni and N. Subtil. NVBIO. nvlabs.github.io/nvbio/, 2015.
[14]
T. W. Lam et al. High Throughput Short Read Alignment via Bi-directional BWT. In IEEE BIBM 2009, pages 31--36, Nov 2009.
[15]
H. Li. Burrows Wheeler Aligner version 0.5.9. github.com/lh3/bwa/releases/tag/bwa-0.5.9. Accessed 1 January, 2019.
[16]
N. Ahmed et al. GPU Accelerated API for Alignment of Genomics Sequencing Data. In IEEE BIBM 2017, pages 510--515, Nov 2017.
[17]
H. Li. Exploring single-sample SNP and indel calling with whole-genome de novo assembly. Bioinformatics, 28(14):1838--1844, Jul 2012.
[18]
J. Zhang et al. Optimizing burrows-wheeler transform-based sequence alignment on multicore architectures. In IEEE/ACM CCGrid 2013, May 2013.
[19]
D. Merrill. CUB - a configurable C++ template library of high-performance CUDA primitives. nvlabs.github.io/cub. Accessed 2 January, 2019.
[20]
Pacbio. datasets.pacb.com/2013/Human10x/READS/2530572/0001/Analysis_Results/m130929_024849_42213_c100518541910000001823079209281311_s1_p0.1.subreads.fasta, 2014.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICBBB '20: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics
January 2020
160 pages
ISBN:9781450376761
DOI:10.1145/3386052
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Natl University of Singapore: National University of Singapore
  • RIED, Tokai Univ., Japan: RIED, Tokai University, Japan

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. DNA analysis
  2. GPU
  3. maximal exact matches
  4. seeding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICBBB '20

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 216
    Total Downloads
  • Downloads (Last 12 months)114
  • Downloads (Last 6 weeks)55
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media