SC 2021: St. Louis, Missouri, USA
- Bronis R. de Supinski, Mary W. Hall, Todd Gamblin:
International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2021, St. Louis, Missouri, USA, November 14-19, 2021. ACM 2021, ISBN 978-1-4503-8442-1
ACM Gordon Bell finalists
- David E. Shaw, Peter J. Adams, Asaph Azaria, Joseph A. Bank, Brannon Batson, Alistair Bell, Michael Bergdorf, Jhanvi Bhatt, J. Adam Butts, Timothy Correia, Robert M. Dirks, Ron O. Dror, Michael P. Eastwood, Bruce Edwards, Amos Even, Peter Feldmann, Michael Fenn, Christopher H. Fenton, Anthony Forte, Joseph Gagliardo, Gennette Gill, Maria Gorlatova, Brian Greskamp, J. P. Grossman, Justin Gullingsrud, Anissa Harper, William Hasenplaugh, Mark Heily, Benjamin Colin Heshmat, Jeremy Hunt, Douglas J. Ierardi, Lev Iserovich, Bryan L. Jackson, Nick P. Johnson, Mollie M. Kirk, John L. Klepeis, Jeffrey S. Kuskin, Kenneth M. Mackenzie, Roy J. Mader, Richard McGowen, Adam McLaughlin, Mark A. Moraes, Mohamed H. Nasr, Lawrence J. Nociolo, Lief O'Donnell, Andrew Parker, Jon L. Peticolas, Goran Pocina, Cristian Predescu, Terry Quan, John K. Salmon, Carl Schwink, Keun Sup Shim, Naseer Siddique, Jochen Spengler, Tamas Szalay, Raymond Tabladillo, Reinhard Tartler, Andrew G. Taube, Michael Theobald, Brian Towles, William Vick, Stanley C. Wang, Michael Wazlowski, Madeleine J. Weingarten, John M. Williams, Kevin A. Yuh:
Anton 3: twenty microseconds of molecular dynamics simulation before lunch. 1
- Jianyuan Xiao, Junshi Chen, Jiangshan Zheng, Hong An, Shenghong Huang, Chao Yang, Fang Li, Ziyu Zhang, Yeqi Huang, Wenting Han, Xin Liu, Dexun Chen, Zixi Liu, Ge Zhuang, Jiale Chen, Guoqiang Li, Xuan Sun, Qiang Chen:
Symplectic structure-preserving particle-in-cell whole-volume simulation of tokamak plasmas to 111.3 trillion particles and 25.7 billion grids. 2
- Yong (Alexander) Liu, Xin (Lucy) Liu, Fang (Nancy) Li, Haohuan Fu, Yuling Yang, Jiawei Song, Pengpeng Zhao, Zhen Wang, Dajia Peng, Huarong Chen, Chu Guo, Heliang Huang, Wenzhao Wu, Dexun Chen:
Closing the "quantum supremacy" gap: achieving real-time simulation of a random quantum circuit using a new Sunway supercomputer. 3
- Kien Nguyen-Cong, Jonathan T. Willman, Stan G. Moore, Anatoly B. Belonoshko, Rahulkumar Gayatri, Evan Weinberg, Mitchell A. Wood, Aidan P. Thompson, Ivan I. Oleynik:
Billion atom molecular dynamics simulations of carbon at extreme conditions and experimental time and length scales. 4
- Kohji Yoshikawa, Satoshi Tanaka, Naoki Yoshida:
A 400 trillion-grid Vlasov simulation on Fugaku supercomputer: large-scale distribution of cosmic relic neutrinos in a six-dimensional phase space. 5
- Honghui Shang, Fang Li, Yunquan Zhang, Libo Zhang, You Fu, Yingxiang Gao, Yangjun Wu, Xiaohui Duan, Rongfen Lin, Xin Liu, Ying Liu, Dexun Chen:
Extreme-scale ab initio quantum Raman spectra simulations on the leadership HPC system in China. 6
Computational biology
- Muaaz Gul Awan, Steven A. Hofmeyr, Rob Egan, Nan Ding, Aydin Buluç, Jack Deslippe, Leonid Oliker, Katherine A. Yelick:
Accelerating large scale de novo metagenome assembly using GPUs. 7
- Sree Charan Gundabolu, T. N. Vijaykumar, Mithuna Thottethodi:
FastZ: accelerating gapped whole genome alignment on GPUs. 8
- Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard P. Boardman, Thomas Blumensath, Satoshi Matsuoka:
Scalable FBP decomposition for cone-beam CT reconstruction. 9
Best practice experiences from pre-exascale systems
- Harsh Bhatia, Francesco Di Natale, Joseph Y. Moon, Xiaohua Zhang, Joseph R. Chavez, Fikret Aydin, Christopher B. Stanley, Tomas Oppelstrup, Chris Neale, Sara Kokkila Schumacher, Dong H. Ahn, Stephen Herbein, Timothy S. Carpenter, Sandrasegaram Gnanakaran, Peer-Timo Bremer, James N. Glosli, Felice C. Lightstone, Helgi I. Ingólfsson:
Generalizable coordination of large multiscale workflows: challenges and learnings at scale. 10
- Balazs Gerofi, Kohei Tarumizu, Lei Zhang, Takayuki Okamoto, Masamichi Takagi, Shinji Sumimoto, Yutaka Ishikawa:
Linux vs. lightweight multi-kernels for high performance computing: experiences at pre-exascale. 11
- Woong Shin, Vladyslav Oles, Ahmad Maroof Karimi, J. Austin Ellis, Feiyi Wang:
Revealing power, energy and thermal dynamics of a 200PF pre-exascale supercomputer. 12
Efficient deep learning tools
- J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian T. Foster, Zhao Zhang:
KAISA: an adaptive second-order optimizer framework for deep neural networks. 13
- Evangelos Georganas, Dhiraj D. Kalamkar, Sasikanth Avancha, Menachem Adelman, Cristina Anderson, Alexander Breuer, Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Md. Vasimuddin, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Barukh Ziv, Alexander Heinecke:
Tensor processing primitives: a programming abstraction for efficiency and portability in deep learning workloads. 14
- Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, Minyi Guo:
Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction. 15
Trends in scalable computing
- Thomas Häner, Damian S. Steiger, Torsten Hoefler, Matthias Troyer:
Distributed quantum computing with QMPI. 16
- Abdullah Al-Mamun, Feng Yan, Dongfang Zhao:
BAASH: lightweight, efficient, and reliable blockchain-as-a-service for HPC systems. 17
- Eitan Frachtenberg, Rhody D. Kaner:
Representation of women in HPC conferences. 18
Computational fluid dynamics
- Paul Mullowney, Ruipeng Li, Stephen J. Thomas, Shreyas Ananthan, Ashesh Sharma, Jon S. Rood, Alan B. Williams, Michael A. Sprague:
Preparing an incompressible-flow fluid dynamics code for exascale-class wind energy simulations. 19
- Kumar Saurabh, Masado Ishii, Milinda Fernando, Boshun Gao, Kendrick Tan, Ming-Chen Hsu, Adarsh Krishnamurthy, Hari Sundar, Baskar Ganapathysubramanian:
Scalable adaptive PDE solvers in arbitrary domains. 20
- Martin Kronbichler, Niklas Fehn, Peter Munch, Maximilian Bergbauer, Karl-Robert Wichmann, Carolin Geitner, Momme Allalen, Martin Schulz, Wolfgang A. Wall:
A next-generation discontinuous Galerkin fluid dynamics solver with application to high-resolution lung airflow simulations. 21
Cloud and edge computing
- Laiping Zhao, Yanan Yang, Yiming Li, Xian Zhou, Keqiu Li:
Understanding, predicting and scheduling serverless workloads under partial interference. 22
- Ahmed Ali-Eldin, Bin Wang, Prashant J. Shenoy:
The hidden cost of the edge: a performance comparison of edge and cloud latencies. 23
- Baolin Li, Rohan Basu Roy, Tirthak Patel, Vijay Gadepally, Karen Gettings, Devesh Tiwari:
RIBBON: cost-effective and QoS-aware deep learning model inference using a diverse pool of cloud computing instances. 24
Large scale neural network training: Part I
- Shiyang Chen, Shaoyi Huang, Santosh Pandey, Bingbing Li, Guang R. Gao, Long Zheng, Caiwen Ding, Hang Liu:
E.T.: re-thinking self-attention for transformer models on GPUs. 25
- Ankit Srivastava, Sriram P. Chockalingam, Maneesha Aluru, Srinivas Aluru:
Parallel construction of module networks. 26
- Shigang Li, Torsten Hoefler:
Chimera: efficiently training large-scale neural networks with bidirectional pipelines. 27
Application performance optimization
- Tong Shu, Yanfei Guo, Justin M. Wozniak, Xiaoning Ding, Ian T. Foster, Tahsin M. Kurç:
Bootstrapping in-situ workflow auto-tuning via combining performance models of component applications. 28
- Hatem Ltaief, Jesse Cranney, Damien Gratadour, Yuxi Hong, Laurent Gatineau, David E. Keyes:
Meeting the real-time challenges of ground-based telescopes using low-rank matrix computations. 29
- Romain Égelé, Prasanna Balaprakash, Isabelle Guyon, Venkatram Vishwanath, Fangfang Xia, Rick Stevens, Zhengying Liu:
AgEBO-tabular: joint neural architecture and hyperparameter search with autotuned data-parallel training for tabular data. 30
State of the practice
- Christopher S. Daley, Annemarie Southwell, Rahulkumar Gayatri, Scott Biersdorff, Craig Toepfer, Güray Özen, Nicholas J. Wright:
Non-recurring engineering (NRE) best practices: a case study with the NERSC/NVIDIA OpenMP contract. 31
- Reid Priedhorsky, Shane Richard Canon, Timothy Randles, Andrew J. Younge:
Minimizing privilege for building HPC containers. 32
- Emily Costa, Tirthak Patel, Benjamin Schwaller, Jim M. Brandt, Devesh Tiwari:
Systematically inferring I/O performance variability by examining repetitive job behavior. 33
Networks
- Mayank Parasar, Natalie D. Enright Jerger, Paul V. Gratz, Joshua San Miguel, Tushar Krishna:
SEEC: stochastic escape express channel. 34
- Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler:
Flare: flexible in-network allreduce. 35
- Tianxi Li, Haiyang Shi, Xiaoyi Lu:
HatRPC: hint-accelerated Thrift RPC over RDMA. 36
Hardware efficient deep learning
- Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding:
APNN-TC: accelerating arbitrary precision neural networks on Ampere GPU tensor cores. 37
- Zhuowen Zou, Yeseong Kim, Farhad Imani, Haleh Alimohamadi, Rosario Cammarota, Mohsen Imani:
Scalable edge-based hyperdimensional learning system with brain-like neural adaptation. 38
- Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu:
Dr. Top-k: delegate-centric Top-k on GPUs. 39
Materials science
- Giuseppe M. J. Barca, Jorge L. Galvez Vallejo, David L. Poole, Melisa Alkan, Ryan Stocks, Alistair P. Rendell, Mark S. Gordon:
Enabling large-scale correlated electronic structure calculations: scaling the RI-MP2 method on Summit. 40
- Honghui Shang, Fang Li, Yunquan Zhang, Ying Liu, Libo Zhang, Mingchuan Wu, Yangjun Wu, Di Wei, Huimin Cui, Xin Liu, Fei Wang, Yuxi Ye, Yingxiang Gao, Shuang Ni, Xin Chen, Dexun Chen:
Accelerating all-electron ab initio simulation of Raman spectra for biological systems. 41
- Ping Gao, Xiaohui Duan, Jiaxu Guo, Jin Wang, Zhenya Song, Lizhen Cui, Xiangxu Meng, Xin Liu, Wusheng Zhang, Ming Ma, Guohui Li, Dexun Chen, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang:
LMFF: efficient and scalable layered materials force field on heterogeneous many-core processors. 42
Accelerator architectures
- Gentaro Morimoto, Yohei M. Koyama, Hao Zhang, Teruhisa S. Komatsu, Yousuke Ohno, Keigo Nishida, Itta Ohmura, Hiroshi Koyama, Makoto Taiji:
Hardware acceleration of tensor-structured multilevel Ewald summation method on MDGRAPE-4A, a special-purpose computer system for molecular dynamics simulations. 43
- Benjamin Y. Cho, Jeageun Jung, Mattan Erez:
Accelerating bandwidth-bound deep learning inference with main-memory accelerators. 44
- Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu:
LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing. 45
File system
- Nafiseh Moti, Frederic Schimmelpfennig, Reza Salkhordeh, David Klopp, Toni Cortes, Ulrich Rückert, André Brinkmann:
Simurgh: a fully decentralized and secure NVMM user space file system. 46
- Yiduo Wang, Cheng Li, Xinyang Shao, Youxu Chen, Feng Yan, Yinlong Xu:
Lunule: an agile and judicious metadata load balancer for CephFS. 47
- Qing Zheng, Charles D. Cranor, Gregory R. Ganger, Garth A. Gibson, George Amvrosiadis, Bradley W. Settlemyer, Gary A. Grider:
DeltaFS: a scalable no-ground-truth filesystem for massively-parallel computing. 48
Distributed training and graphs
- Aditya Balu, Sergio Botelho, Biswajit Khara, Vinay Rao, Soumik Sarkar, Chinmay Hegde, Adarsh Krishnamurthy, Santi Adavani, Baskar Ganapathysubramanian:
Distributed multigrid neural solvers on megavoxel domains. 49
- Kasimir Gabert, Kaan Sancak, M. Yusuf Özkaya, Ali Pinar, Ümit V. Çatalyürek:
EIGA: elastic and scalable dynamic graph analysis. 50
- Hongzheng Chen, Minghua Shen, Nong Xiao, Yutong Lu:
Krill: a compiler and runtime system for concurrent graph processing. 51
Tools and modeling
- Chen Wang, Pavan Balaji, Marc Snir:
Pilgrim: scalable and (near) lossless MPI tracing. 52
- Yehia Arafa, Abdel-Hameed A. Badawy, Ammar ElWazir, Atanu Barai, Ali Eker, Gopinath Chennupati, Nandakishore Santhi, Stephan J. Eidenbenz:
Hybrid, scalable, trace-driven performance modeling of GPGPUs. 53
- Hengshan Yue, Xiaohui Wei, Guangli Li, Jianpeng Zhao, Nan Jiang, Jingweijia Tan:
G-SEPM: building an accurate and efficient soft error prediction model for GPGPUs. 54
Performance studies
- Sayan Ghosh, Nathan R. Tallent, Marco Minutoli, Mahantesh Halappanavar, Ramesh Peri, Ananth Kalyanaraman:
Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs. 55
- Kuan-Chieh Hsu, Hung-Wei Tseng:
Accelerating applications using edge tensor processing units. 56
- Qianchao Zhu, Hao Luo, Chao Yang, Mingshuo Ding, Wanwang Yin, Xinhui Yuan:
Enabling and scaling the HPCG benchmark on the newest generation Sunway supercomputer with 42 million heterogeneous cores. 57
Large scale neural network training: Part II
- Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia:
Efficient large-scale language model training on GPU clusters using Megatron-LM. 58
- Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He:
ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning. 59
- Zheng Chai, Yujing Chen, Ali Anwar, Liang Zhao, Yue Cheng, Huzefa Rangwala:
FedAT: a high-performance and communication-efficient federated learning system with asynchronous tiers. 60
High-performance numerical methods
- William S. Moses, Valentin Churavy, Ludger Paehler, Jan Hückelheim, Sri Hari Krishna Narayanan, Michel Schanen, Johannes Doerfert:
Reverse-mode automatic differentiation and optimization of GPU kernels via Enzyme. 61
- Tianchen Zhao, Saibal De, Brian Chen, James Stokes, Shravan K. Veerapaneni:
Overcoming barriers to scalability in variational quantum Monte Carlo. 62
- Lukas Krenz, Carsten Uphoff, Thomas Ulrich, Alice-Agnes Gabriel, Lauren S. Abrahams, Eric M. Dunham, Michael Bader:
3D acoustic-elastic coupling with gravity: the dynamics of the 2018 Palu, Sulawesi earthquake and tsunami. 63
Systems software (1)
- Tyler N. Allen, Rong Ge:
In-depth analyses of unified virtual memory system for GPU accelerated computing. 64
- Jiacheng Ma, Wenyi Wang, Aaron Nelson, Michael Cuevas, Brian Homerding, Conghao Liu, Zhen Huang, Simone Campanoni, Kyle C. Hale, Peter A. Dinda:
Paths to OpenMP in the kernel. 65
- Rupanshu Soi, Michael Bauer, Sean Treichler, Manolis Papadakis, Wonchan Lee, Patrick S. McCormick, Alex Aiken, Elliott Slaughter:
Index launches: scalable, flexible representation of parallel task groups. 66
High performance graph algorithms
- Trevor Steil, Tahsin Reza, Keita Iwabuchi, Benjamin W. Priest, Geoffrey Sanders, Roger Pearce:
TriPoll: computing surveys of triangles in massive-scale temporal graphs with metadata. 67
- Ghadeer Alabandi, Jelena Tesic, Lucas Rusnak, Martin Burtscher:
Discovering and balancing fundamental cycles in large signed graphs. 68
- Lizhi Xiang, Arif Khan, Edoardo Serra, Mahantesh Halappanavar, Aravind Sukumaran-Rajam:
cuTS: scaling subgraph isomorphism on distributed multi-GPU systems using trie based data structure. 69
Linear and multilinear algebra and applications
- Grzegorz Kwasniewski, Marko Kabic, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler:
On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations. 70
- Shengle Lin, Wangdong Yang, Haotian Wang, Qinyun Tsai, Kenli Li:
STM-multifrontal QR: streaming task mapping multifrontal QR factorization empowered by GCN. 71
- Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang:
LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores. 72
HPC and applications
- Honghui Shang, Xin Chen, Xingyu Gao, Rongfen Lin, Lifang Wang, Fang Li, Qian Xiao, Lei Xu, Qiang Sun, Leilei Zhu, Fei Wang, Yunquan Zhang, Haifeng Song:
TensorKMC: kinetic Monte Carlo simulation of 50 trillion atoms driven by deep learning on a new generation of Sunway supercomputer. 73
- Garrett A. Stevenson, Derek Jones, Hyojin Kim, W. F. Drew Bennett, Brian J. Bennion, Monica Borucki, Feliza Bourguet, Aidan Epstein, Magdalena Franco, Brooke Harmon, Stewart He, Max P. Katz, Daniel A. Kirshner, Victoria Lao, Edmond Y. Lau, Jacky Lo, Kevin McLoughlin, Richard Mosesso, Deepa K. Murugesh, Oscar A. Negrete, Edwin A. Saada, Brent Segelke, Maxwell Stefan, Marisa W. Torres, Dina Weilhammer, Sergio Ernesto Wong, Yue Yang, Adam T. Zemla, Xiaohua Zhang, Fangqiang Zhu, Felice C. Lightstone, Jonathan E. Allen:
High-throughput virtual screening of small molecule inhibitors for SARS-CoV-2 protein targets with deep fusion models. 74
- Linus Seelinger, Anne Reinarz, Leonhard Rannabauer, Michael Bader, Peter Bastian, Robert Scheichl:
High performance uncertainty quantification with parallelized multilevel Markov chain Monte Carlo. 75
Sparse neural networks
- Md. Vasimuddin, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj D. Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha:
DistGNN: scalable distributed training for large-scale graph neural networks. 76
- Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru:
Efficient scaling of dynamic graph neural networks. 77
- Zhaodong Chen, Zheng Qu, Liu Liu, Yufei Ding, Yuan Xie:
Efficient tensor core-based GPU kernels for structured sparsity under reduced precision. 78
Systems software (2)
- Jack Kosaian, K. V. Rashmi:
Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs. 79
- Md Hasanur Rahman, Aabid Shamji, Shengjian Guo, Guanpeng Li:
PEPPA-X: finding program test inputs to bound silent data corruption vulnerability in HPC applications. 80
- Sunil Kumar, Akshat Gupta, Vivek Kumar, Sridutt Bhalachandra:
Cuttlefish: library for achieving energy efficiency in multicore parallel programs. 81
Numerical discretization
- Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, Yue Yue:
Temporal vectorization for stencils. 82
- Ioannis Sakiotis, Kamesh Arumugam, Marc F. Paterno, Desh Ranjan, Balsa Terzic, Mohammad Zubair:
PAGANI: a parallel adaptive GPU algorithm for numerical integration. 83
- Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue:
Reducing redundancy in data organization and arithmetic calculation for stencil computations. 84
Performance analysis and optimization
- H. T. Kung, Vikas Natesh, Andrew Sabot:
CAKE: matrix multiplication using constant-bandwidth blocks. 85
- Konstantinos Parasyris, Giorgis Georgakoudis, Harshitha Menon, James Diffenderfer, Ignacio Laguna, Daniel Osei-Kuffuor, Markus Schordan:
HPAC: evaluating approximate computing techniques on HPC OpenMP applications. 86
- Yuya Uezato:
Accelerating XOR-based erasure coding using program optimization techniques. 87
Data analytics and storage systems
- Xin Liang, Qian Gong, Jieyang Chen, Ben Whitney, Lipeng Wan, Qing Liu, David Pugmire, Rick Archibald, Norbert Podhorszki, Scott Klasky:
Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition. 88
- Liangfeng Cheng, Yuchong Hu, Zhaokang Ke, Jia Xu, Qiaori Yao, Dan Feng, Weichun Wang, Wei Chen:
LogECMem: coupling erasure-coded in-memory key-value stores with parity logging. 89
Scalable I/O and persistent memory
- Md. Arifuzzaman, Engin Arslan:
Online optimization of file transfers in high-speed networks. 90
- Zhuohui Duan, Haodi Lu, Haikun Liu, Xiaofei Liao, Hai Jin, Yu Zhang, Song Wu:
Hardware-supported remote persistence for distributed persistent memory. 91
- Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler:
Clairvoyant prefetching for distributed machine learning I/O. 92
Data compression and workflows
- Fabian Knorr, Peter Thoman, Thomas Fahringer:
ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs. 93
- Sihuan Li, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello:
Resilient error-bounded lossy compressor for data transfer. 94
- Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler:
Productivity, portability, performance: data-centric Python. 95
Quantum computing and simulation
- Ellis Wilson, Frank Mueller, Lindsay Bassman, Costin Iancu:
Empirical evaluation of circuit approximations on noisy quantum devices. 96
- Ang Li, Bo Fang, Christopher E. Granade, Guen Prawiroatmodjo, Bettina Heim, Martin Roetteler, Sriram Krishnamoorthy:
SV-sim: scalable PGAS-based state vector simulation of quantum circuits. 97
- Fang Li, Xin Liu, Yong Liu, Pengpeng Zhao, Yuling Yang, Honghui Shang, Weizhe Sun, Zhen Wang, Enming Dong, Dexun Chen:
SW_Qsim: a minimize-memory quantum simulator with high-performance on a new Sunway supercomputer. 98
GPUs and stream processing
- Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong:
MAPA: multi-accelerator pattern allocation policy for multi-tenant GPU servers. 99
- Zhengda Bian, Shenggui Li, Wei Wang, Yang You:
Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters. 100
- Jie Tan, Hanhua Chen, Yonghui Wang, Hai Jin:
Whale: efficient one-to-many data partitioning in RDMA-assisted distributed stream processing systems. 101
Storage and application characteristics
- Wei Zhang, Suren Byna, Hyogi Sim, Sangkeun Lee, Sudharshan Vazhkudai, Yong Chen:
Exploiting user activeness for data retention in HPC systems. 102
- Jinghan Sun, Jian Huang, Marc Snir:
Pinpointing crash-consistency bugs in the HPC I/O stack: a cross-layer approach. 103
- Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang:
Characterization and prediction of deep learning workloads in large-scale GPU datacenters. 104