More Web Proxy on the site http://driver.im/

research-article

Open access

Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads

Authors:

Zaid Al-ArsAuthors Info & Claims

ICBBB '20: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics

Pages 28 - 34

https://doi.org/10.1145/3386052.3386066

Published: 18 May 2020 Publication History

Abstract

The seeding heuristic is widely used in many DNA analysis applications to speed up the analysis time. In many applications, seeding takes a substantial amount of the total execution time. In this paper, we present an efficient GPU implementation for computing maximal exact matching (MEM) seeds in long DNA reads. We applied various optimizations to reduce the number of GPU global memory accesses and to avoid redundant computation. Our implementation also extracts maximum parallelism from the MEM computation tasks. We tested our implementation using data from the state-of-the-art third generation Pacbio DNA sequencers, which produces DNA reads that are tens of kilobases long. Our implementation is up to 9x faster for computing MEM seeds as compared to the fastest CPU implementation running on a server-grade machine with 24 threads. Computing suffix array intervals (first part of MEM computation) is up to 3x faster whereas calculating the location of the match (second part) is up to 9x faster. The implementation is publicly available at https://github.com/nahmedraja/GPUseed

References

[1]

T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195--197, 1981.

[2]

S. F. Altschul et al. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403--410, 1990.

[3]

A. L. Delcher et al. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369--2376, 1999.

[4]

P. Ferragina and G. Manzini. Opportunistic data structures with applications. In In IEEE FOCS 2000, FOCS '00, pages 390--398, 2000.

[5]

H. Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, May 2013.

[6]

Y. Liu and B. Schmidt. Long read alignment based on maximal exact match seeds. Bioinformatics, 28(18):i318-i324, 2012.

Digital Library

[7]

G. Miclotte et al. Jabba: hybrid error correction for long sequencing reads. Algorithms for Molecular Biology, 11(1):10, May 2016.

[8]

F. Fernandes and A. T. Freitas. slaMEM: efficient retrieval of maximal exact matches using a sampled LCP array. Bioinformatics, 30(4):464--471, 2014.

[9]

M. Vyverman et al. essaMEM: finding maximal exact matches using enhanced sparse suffix arrays. Bioinformatics, 29(6):802--804, 2013.

Digital Library

[10]

Y. Liu and B. Schmidt. CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing. Design Test, IEEE, 31(1):31--39, Feb 2014.

[11]

M. C. Schatz and C. Trapnell. Fast Exact String Matching on the GPU. Technical report, University of Maryland, 2007.

[12]

A. Abu-Doleh, K. Kaya, M. Abouelhoda, and Ü. V. Çatalyürek. Extracting Maximal Exact Matches on GPU. In 2014 IEEE International Parallel Distributed Processing Symposium Workshops, pages 1417--1426, May 2014.

Digital Library

[13]

J. Pantaleoni and N. Subtil. NVBIO. nvlabs.github.io/nvbio/, 2015.

[14]

T. W. Lam et al. High Throughput Short Read Alignment via Bi-directional BWT. In IEEE BIBM 2009, pages 31--36, Nov 2009.

Digital Library

[15]

H. Li. Burrows Wheeler Aligner version 0.5.9. github.com/lh3/bwa/releases/tag/bwa-0.5.9. Accessed 1 January, 2019.

[16]

N. Ahmed et al. GPU Accelerated API for Alignment of Genomics Sequencing Data. In IEEE BIBM 2017, pages 510--515, Nov 2017.

[17]

H. Li. Exploring single-sample SNP and indel calling with whole-genome de novo assembly. Bioinformatics, 28(14):1838--1844, Jul 2012.

Digital Library

[18]

J. Zhang et al. Optimizing burrows-wheeler transform-based sequence alignment on multicore architectures. In IEEE/ACM CCGrid 2013, May 2013.

Digital Library

[19]

D. Merrill. CUB - a configurable C++ template library of high-performance CUDA primitives. nvlabs.github.io/cub. Accessed 2 January, 2019.

[20]

Pacbio. datasets.pacb.com/2013/Human10x/READS/2530572/0001/Analysis_Results/m130929_024849_42213_c100518541910000001823079209281311_s1_p0.1.subreads.fasta, 2014.

Index Terms

Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads

Recommendations

GPU Acceleration of Pyrosequencing Noise Removal
SAAHPC '12: Proceedings of the 2012 Symposium on Application Accelerators in High Performance Computing

Amplicon Noise [1], an updated version of Py-ronoise [2], is a tool for removing noise from metagenomic data recorded by a 454 pyrosequencer. Amplicon Noise has shown to be effective in reducing overestimation of operational taxonomic units (OTUs) and ...
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing

The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers ...
A Genomic Distance for Assembly Comparison Based on Compressed Maximal Exact Matches

Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICBBB '20: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics

January 2020

160 pages

ISBN:9781450376761

DOI:10.1145/3386052

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Natl University of Singapore: National University of Singapore
RIED, Tokai Univ., Japan: RIED, Tokai University, Japan

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2020

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

ICBBB '20

ICBBB '20: 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics

January 19 - 22, 2020

Kyoto, Japan

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
235
Total Downloads

Downloads (Last 12 months)115
Downloads (Last 6 weeks)10

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten