DOI: 10.1145/3458817.3476212
research-article
Open access

Accelerating large scale de novo metagenome assembly using GPUs

Published: 13 November 2021 Publication History

Abstract

Metagenomic workflows involve studying uncultured microorganisms directly from the environment. When processed by modern sequencing machines, these environmental samples yield large and complex datasets that exceed the capabilities of metagenomic software. The increasing size and complexity of these datasets make a strong case for exascale-capable metagenome assemblers. However, the underlying algorithmic motifs are not well suited to GPUs. This poses a challenge, since the majority of next-generation supercomputers will rely primarily on GPUs for computation. In this paper, we present a first-of-its-kind GPU-accelerated implementation of the local assembly approach that is an integral part of a widely used large-scale metagenome assembler, MetaHipMer. Local assembly uses algorithms that induce random memory accesses and non-deterministic workloads, which makes GPU offloading a challenging task. Our GPU implementation outperforms the CPU version by about 7x and boosts the performance of MetaHipMer by 42% when running on 64 Summit nodes.
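
To illustrate why this kind of workload is awkward for GPUs, the following is a minimal Python sketch of a k-mer walk that extends a contig using a hash table built from reads. It is not MetaHipMer's implementation: the function names `build_kmer_table` and `extend_right`, the toy reads, and the stopping rules are all assumptions made for illustration. Each step is a hash-table lookup keyed by the previous result (a random memory access), and the number of steps depends entirely on the data (a non-deterministic workload).

```python
from collections import defaultdict


def build_kmer_table(reads, k):
    """Map each k-mer to counts of the bases that follow it in the reads."""
    table = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            table[read[i:i + k]][read[i + k]] += 1
    return table


def extend_right(contig, table, k, max_steps=100):
    """Extend the contig one base at a time while the next base is unambiguous.

    Hypothetical stopping rules: dead end (k-mer not in the table),
    fork (more than one candidate base), revisited k-mer, or step limit.
    """
    seen = set()
    for _ in range(max_steps):
        kmer = contig[-k:]
        if kmer in seen:
            break  # cycle in the walk
        seen.add(kmer)
        extensions = table.get(kmer)  # data-dependent hash lookup
        if not extensions or len(extensions) != 1:
            break  # dead end or fork
        contig += next(iter(extensions))
    return contig


# Toy reads (made up for this example) that overlap on "GTTG".
reads = ["ACGTTG", "GTTGCA"]
table = build_kmer_table(reads, k=3)
extended = extend_right("ACG", table, k=3)
```

With these toy reads the walk continues until it reaches a k-mer with no recorded extension. Adding a read that introduces a second successor for an existing k-mer ends the walk earlier, which is why the work per contig cannot be predicted in advance.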

Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. GPU
  3. genomic
  4. graph algorithms
  5. metagenomic
  6. sequence assembly
  7. sparse data structures

Qualifiers

  • Research-article

Conference

SC '21
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
