DOI: 10.1145/3295500.3357156
research-article

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations

Published: 17 November 2019

Abstract

The computational efficiency of a state-of-the-art ab initio quantum transport (QT) solver, capable of revealing the coupled electrothermal properties of atomically resolved nano-transistors, has been improved by up to two orders of magnitude through a data-centric reorganization of the application. The approach yields coarse- and fine-grained data-movement characteristics that can be used for performance and communication modeling, communication avoidance, and dataflow transformations. The resulting code has been tuned for two top-6 hybrid supercomputers, reaching a sustained performance of 85.45 Pflop/s on 4,560 nodes of Summit (42.55% of the peak) in double precision, and 90.89 Pflop/s in mixed precision. These computational achievements enable the restructured QT simulator to treat realistic nanoelectronic devices made of more than 10,000 atoms in a duration 14x shorter than the original code needs to handle a system with 1,000 atoms, on the same number of CPUs/GPUs and with the same physical accuracy.
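The performance figures quoted in the abstract imply a few derived quantities that the text does not state directly. The following is a back-of-envelope sketch in Python that uses only the numbers given above; the function name `derive_metrics` and the dictionary keys are illustrative, and the derived values are simple arithmetic rather than results reported by the authors.

```python
def derive_metrics(sustained_pflops, fraction_of_peak, nodes, mixed_pflops):
    """Derive implied quantities from the figures quoted in the abstract.

    All inputs come from the abstract; the outputs are back-of-envelope
    estimates, not values reported in the paper.
    """
    # Peak implied by "sustained = fraction_of_peak * peak".
    implied_peak_pflops = sustained_pflops / fraction_of_peak
    # Convert Pflop/s to Tflop/s (x1000) and divide by the node count.
    sustained_tflops_per_node = sustained_pflops * 1000.0 / nodes
    implied_peak_tflops_per_node = implied_peak_pflops * 1000.0 / nodes
    # Ratio of the mixed-precision rate to the double-precision rate.
    mixed_over_double = mixed_pflops / sustained_pflops
    return {
        "implied_peak_pflops": implied_peak_pflops,
        "sustained_tflops_per_node": sustained_tflops_per_node,
        "implied_peak_tflops_per_node": implied_peak_tflops_per_node,
        "mixed_over_double": mixed_over_double,
    }


# Figures quoted in the abstract for the Summit run.
metrics = derive_metrics(
    sustained_pflops=85.45,   # double precision, sustained
    fraction_of_peak=0.4255,  # 42.55% of the peak
    nodes=4560,
    mixed_pflops=90.89,       # mixed precision, sustained
)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the implied peak of the 4,560-node partition is roughly 200 Pflop/s, and the mixed-precision rate is about 6% higher than the double-precision one.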

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN:9781450362290
DOI:10.1145/3295500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC '19

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 120
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 10 Dec 2024

Cited By

  • (2024) CMOS Scaling for the 5 nm Node and Beyond: Device, Process and Technology. Nanomaterials 14:10 (837). DOI: 10.3390/nano14100837. Online publication date: 9-May-2024
  • (2023) MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT Framework. ACM Transactions on Architecture and Code Optimization 20:3 (1-23). DOI: 10.1145/3605148. Online publication date: 22-Jul-2023
  • (2023) FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-15). DOI: 10.1145/3581784.3613214. Online publication date: 12-Nov-2023
  • (2023) Bridging Control-Centric and Data-Centric Optimization. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (173-185). DOI: 10.1145/3579990.3580018. Online publication date: 17-Feb-2023
  • (2023) A Case Study on DaCe Portability & Performance for Batched Discrete Fourier Transforms. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (55-63). DOI: 10.1145/3578178.3578239. Online publication date: 27-Feb-2023
  • (2023) Atomistic Simulation of Nanoscale Devices. IEEE Nanotechnology Magazine 17:4 (4-14). DOI: 10.1109/MNANO.2023.3278968. Online publication date: Aug-2023
  • (2023) The Future of HPC in Nuclear Security. IEEE Internet Computing 27:1 (16-23). DOI: 10.1109/MIC.2022.3229037. Online publication date: 1-Jan-2023
  • (2023) Parametric Optimization on HPC Clusters with Geneva. Computing and Software for Big Science 7:1. DOI: 10.1007/s41781-023-00098-6. Online publication date: 21-Apr-2023
  • (2023) Unified Programming Models for Heterogeneous High-Performance Computers. Journal of Computer Science and Technology 38:1 (211-218). DOI: 10.1007/s11390-023-2888-4. Online publication date: 31-Jan-2023
  • (2023) A Data-Centric Approach for Efficient and Scalable CFD Implementation on Multi-GPUs Clusters. Parallel and Distributed Computing, Applications and Technologies (93-104). DOI: 10.1007/978-981-99-8211-0_10. Online publication date: 29-Nov-2023