Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment

Authors:

Alberto Zeni,

Seth Onken,

Marco Domenico Santambrogio,

Mehrzad Samadi

PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques

Pages 133 - 143

https://doi.org/10.1145/3656019.3676894

Published: 13 October 2024

Abstract

Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequencing data while decreasing the associated cost. Given the quadratic complexity of exact pairwise algorithms, this trend emphasizes the need for fast and accurate sequence analysis software. In this challenging scenario, we present the first fully GPU-accelerated version of the KSW2 genome alignment library. Results show that our high-performance implementation achieves up to 1145.17 Giga Cell Updates Per Second (GCUPS) and speedups of up to 72.83× on a single NVIDIA Tesla H100 over the state-of-the-art baseline software running on two Intel Xeon Platinum 8358 processors with a total of 128 CPU threads, while preserving alignment accuracy. Using the same configuration, we demonstrate a 66.00× speedup over ksw2d-fast, a state-of-the-art improved version of one of the KSW2 algorithms. Furthermore, we compare our implementation against a recently proposed FPGA implementation of ksw2z, achieving speedups of up to 156.37× using a single H100 GPU. To further highlight the impact of our work, we integrate our accelerated kernels into minimap2, one of the most widely used aligners and mappers. On a single H100 GPU, this improves runtime by up to 8.51× over the baseline software and by up to 8.03× over mm2-fast, an optimized version of minimap2 that integrates ksw2d-fast as its core aligner. Our design accelerates all the algorithms of the state-of-the-art KSW2 aligner suite (splice, double-, and single-gap affine) and, like the original software, supports the Z-drop heuristic and banded alignment to reduce processing time further if needed. Finally, we evaluate our application on the H100 GPU, adapting the Berkeley Roofline model for KSW2 and demonstrating that our implementation is near optimal on our target GPU architecture.
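
For readers unfamiliar with the GCUPS metric quoted above, the following is a minimal sketch of how such a figure is commonly computed. It assumes one cell update per (query position, target position) pair over the full dynamic programming matrix. The abstract does not state the exact counting convention used in the paper (for example, how banded or Z-dropped alignments are counted), so the function name `gcups` and the sample batch are illustrative assumptions only.

```python
def gcups(pairs, seconds):
    """Giga cell updates per second for a batch of pairwise alignments.

    pairs:   iterable of (query_length, target_length) tuples.
    seconds: wall-clock time spent aligning the whole batch.

    Assumption: every cell of the full query x target matrix counts as
    one update; banding or early termination would compute fewer cells.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = sum(q * t for q, t in pairs)  # total DP cells in the batch
    return cells / seconds / 1e9


# Hypothetical batch: 1,000 pairs of 10,000 x 10,000 bases aligned in 0.5 s.
batch = [(10_000, 10_000)] * 1_000
print(gcups(batch, 0.5))  # prints 200.0
```

Because the cell count grows with the product of the two sequence lengths, GCUPS normalizes throughput across datasets with different read lengths, which is why it is often reported alongside raw speedups.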
