[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3656019.3676894acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article
Open access

Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment

Published: 13 October 2024 Publication History

Abstract

Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequencing data while decreasing the associated cost, emphasizing the need for fast and accurate software to perform sequence analysis, given the quadratic complexity of exact pairwise algorithms. In this challenging scenario, we present the first fully GPU-accelerated version of the KSW2 genome alignment library. Results show that our high-performance implementation achieves up to 1145.17 Giga Cell Updates Per Second (GCUPS) and speedups up to 72.83 × on a single NVIDIA Tesla H100 over the state-of-the-art baseline software running on two Intel Xeon Platinum 8358 processors with a total of 128 CPU threads, while preserving alignment accuracy. Using the same configuration, we demonstrate a 66.00 × speedup, versus ksw2d-fast, a state-of-the-art improved version of one of the KSW2 algorithms. Furthermore, we compare our implementation against a recently proposed FPGA implementation of ksw2z, achieving speedups up to 156.37 × using a single H100 GPU. To further highlight the impact of our work, we integrate our accelerated kernels within one of the most used aligners and mappers in the State Of the Art, called minimap2, demonstrating runtime improvements by up to 8.51 × and 8.03 × using a single H100 GPU against the baseline software and mm2-fast, an optimized version of minimap2 which integrates ksw2d-fast as its core aligner. Our design accelerates all the algorithms of the state-of-the-art KSW2 aligner suite (splice, double- and single- gap affine) and supports the Z-drop heuristic and banded alignment as the original software to reduce the processing time further if needed. Finally, we evaluate our application on the H100 GPU, adapting the Berkeley Roofline model for KSW2 and demonstrating that our implementation is near optimal on our target GPU architecture.

References

[1]
Stephen F Altschul, Thomas L Madden, Alejandro A Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 25, 17 (1997), 3389–3402.
[2]
Mia Yang Ang, Teck Yew Low, Pey Yee Lee, Wan Fahmi Wan Mohamad Nazarie, Victor Guryev, and Rahman Jamal. 2019. Proteogenomics: from next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. Clinica chimica acta 498 (2019), 38–46.
[3]
Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L Madden. 2009. BLAST+: architecture and applications. BMC bioinformatics 10, 1 (2009), 1–9.
[4]
Alejandro Chacón, Santiago Marco-Sola, Antonio Espinosa, Paolo Ribeca, and Juan Carlos Moure. 2014. Thread-cooperative, bit-parallel computation of levenshtein distance on GPU. In Proceedings of the 28th ACM international conference on Supercomputing. 103–112.
[5]
Nan Ding and Samuel Williams. 2019. An Instruction Roofline Model for GPUs. 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) (2019).
[6]
Can Firtina, Jisung Park, Mohammed Alser, Jeremie S Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, 2023. BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis. NAR Genomics and Bioinformatics 5, 1 (2023), lqad004.
[7]
Giulia Gerometta, Alberto Zeni, and Marco D Santambrogio. 2023. TSUNAMI: A GPU implementation of the WFA algorithm. In 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 150–161.
[8]
Sneha D Goenka, Yatish Turakhia, Benedict Paten, and Mark Horowitz. 2020. SegAlign: A scalable GPU-based whole genome aligner. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1–13.
[9]
Osamu Gotoh. 1982. An improved algorithm for matching biological sequences. Journal of molecular biology 162, 3 (1982), 705–708.
[10]
Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz. 2010. Understanding sources of inefficiency in general-purpose chips. In Proceedings of the 37th annual international symposium on Computer architecture. 37–47.
[11]
Saurabh Kalikar, Chirag Jain, Vasimuddin Md, and Sanchit Misra. 2022. Accelerating long-read analysis on modern CPUs. bioRxiv (2022), 2021–07.
[12]
Ben Langmead and Steven L Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 4 (2012), 357.
[13]
Heng Li. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
[14]
Heng Li. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 18 (2018), 3094–3100.
[15]
Chi-Man Liu, Thomas Wong, Edward Wu, Ruibang Luo, Siu-Ming Yiu, Yingrui Li, Bingqiang Wang, Chang Yu, Xiaowen Chu, Kaiyong Zhao, 2012. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 28, 6 (2012), 878–879.
[16]
Yongchao Liu, Tuan-Tu Tran, Felix Lauenroth, and Bertil Schmidt. 2014. SWAPHI-LS: Smith-Waterman algorithm on Xeon Phi coprocessors for long DNA sequences. In 2014 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 257–265.
[17]
DR Mani, Karsten Krug, Bing Zhang, Shankha Satpathy, Karl R Clauser, Li Ding, Matthew Ellis, Michael A Gillette, and Steven A Carr. 2022. Cancer proteogenomics: current impact and future prospects. Nature Reviews Cancer 22, 5 (2022), 298–313.
[18]
Saul B Needleman and Christian D Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology 48, 3 (1970), 443–453.
[19]
Torbjørn Rognes and Erling Seeberg. 2000. Six-fold speed-up of Smith–Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16, 8 (2000), 699–706.
[20]
Harisankar Sadasivan, Milos Maric, Eric Dawson, Vishanth Iyer, Johnny Israeli, and Satish Narayanasamy. 2023. Accelerating Minimap2 for accurate long read alignment on GPUs. Journal of biotechnology and biomedicine 6, 1 (2023), 13.
[21]
Eric E Schadt, Steve Turner, and Andrew Kasarskis. 2010. A window into third-generation sequencing. Human molecular genetics 19, R2 (2010), R227–R240.
[22]
Gloria M Sheynkman, Michael R Shortreed, Anthony J Cesnik, and Lloyd M Smith. 2016. Proteogenomics: integrating next-generation sequencing and mass spectrometry to characterize human proteomic variation. Annual review of analytical chemistry 9 (2016), 521–545.
[23]
Temple F Smith, Michael S Waterman, 1981. Identification of common molecular subsequences. Journal of molecular biology 147, 1 (1981), 195–197.
[24]
Zachary D Stephens, Skylar Y Lee, Faraz Faghri, Roy H Campbell, Chengxiang Zhai, Miles J Efron, Ravishankar Iyer, Michael C Schatz, Saurabh Sinha, and Gene E Robinson. 2015. Big data: astronomical or genomical?PLoS biology 13, 7 (2015), e1002195.
[25]
Pawel Suwinski, ChuangKee Ong, Maurice HT Ling, Yang Ming Poh, Asif M Khan, and Hui San Ong. 2019. Advancing personalized medicine through the application of whole exome sequencing and big data analytics. Frontiers in genetics 10 (2019), 49.
[26]
Hajime Suzuki and Masahiro Kasahara. 2018. Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC bioinformatics 19, 1 (2018), 45.
[27]
Carolina Teng. 2022. Accelerating the alignment phase of Minimap2 genome assembly algorithm Using GACT-X in a commercial Cloud FPGA machine.
[28]
Carolina Teng, Renan W Achjian, Caio C Braga, Marcelo K Zuffo, and Wang J Chau. 2021. Accelerating the base-level alignment step of DNA assembling in Minimap2 Algorithm using FPGA. In 2021 IEEE 12th Latin America Symposium on Circuits and System (LASCAS). IEEE, 1–4.
[29]
Yatish Turakhia, Gill Bejerano, and William J Dally. 2018. Darwin: A genomics co-processor provides up to 15,000 x acceleration on long read assembly. In ACM SIGPLAN Notices, Vol. 53. ACM, 199–213.
[30]
Eric S Tvedte, Mark Gasser, Benjamin C Sparklin, Jane Michalski, Carl E Hjelmen, J Spencer Johnston, Xuechu Zhao, Robin Bromley, Luke J Tallon, Lisa Sadzewicz, 2021. Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes. G3 11, 6 (2021), jkab083.
[31]
Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (2009), 65–76.
[32]
Qisheng Xu, Yong Dou, and Yanjie Sun. 2022. Accelerating minimap2 for long-read sequencing on NUMA multi-core CPU. In Proceedings of the 5th International Conference on Computer Science and Software Engineering. 152–158.
[33]
Alberto Zeni, Guido Walter Di Donato, Alessia Della Valle, Filippo Carloni, and Marco D Santambrogio. 2023. On the Genome Sequence Alignment FPGA Acceleration via KSW2z. In 2023 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
[34]
Alberto Zeni, Guido Walter Di Donato, Lorenzo Di Tucci, Marco Rabozzi, and Marco D Santambrogio. 2021. The Importance of Being X-Drop: High Performance Genome Alignment on Reconfigurable Hardware. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 133–141.
[35]
Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, and Katherine Yelick. 2020. Logan: High-performance gpu-based x-drop long-read alignment. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 462–471.
[36]
Anbo Zhou, Timothy Lin, and Jinchuan Xing. 2019. Evaluating nanopore sequencing data processing pipelines for structural variation identification. Genome biology 20 (2019), 1–13.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques
October 2024
375 pages
ISBN:9798400706318
DOI:10.1145/3656019
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 October 2024

Check for updates

Author Tags

  1. DPX
  2. GPU
  3. Genome Alignment
  4. Genomics
  5. KSW2
  6. SIMD
  7. minimap2

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

PACT '24
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 172
    Total Downloads
  • Downloads (Last 12 months)172
  • Downloads (Last 6 weeks)132
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media