ICS 2020: Barcelona, Spain
- Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, H. Peter Hofstee:
ICS '20: 2020 International Conference on Supercomputing, Barcelona, Spain, June 2020. ACM 2020, ISBN 978-1-4503-7983-0
Algorithms I
- Max Carlson, Robert M. Kirby, Hari Sundar:
A scalable framework for solving fractional diffusion equations. 2:1-2:11
- Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran:
CFDNet: a deep learning-based accelerator for fluid simulations. 3:1-3:12
- Kanak Mahadik, Qingyun Wu, Shuai Li, Amit Sabne:
Fast distributed bandits for online recommendation systems. 4:1-4:13
- Robin Kumar Sharma, Marc Casas:
Wavefront parallelization of recurrent neural networks on multi-core architectures. 5:1-5:12
- Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. D. K. Panda:
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. 6:1-6:12
- Shaoshuai Zhang, Ruchi Shah, Panruo Wu:
TensorSVM: accelerating kernel machines with tensor engine. 7:1-7:11
Algorithms II
- Brian Donnelly, Michael Gowanlock:
A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU. 8:1-8:12
- Azin Heidarshenas, Serif Yesil, Dimitrios Skarlatos, Sasa Misailovic, Adam Morrison, Josep Torrellas:
V-Combiner: speeding-up iterative graph processing on a shared-memory platform with vertex merging. 9:1-9:13
- Kshitij Shukla, Sai Charan Regunta, Sai Harsh Tondomker, Kishore Kothapalli:
Efficient parallel algorithms for betweenness- and closeness-centrality in dynamic graphs. 10:1-10:12
- Ruoming Jin, Zhen Peng, Wendell Wu, Feodor F. Dragan, Gagan Agrawal, Bin Ren:
Parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms. 11:1-11:13
- Marco Minutoli, Maurizio Drocco, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman:
cuRipples: influence maximization on multi-GPU systems. 12:1-12:11
- Hans Vandierendonck:
Graptor: efficient pull and push style vectorized graph processing. 13:1-13:13
- Babak Falsafi:
Post-moore server architecture. 14:1
Architecture I
- Laith M. AlBarakat, Paul V. Gratz, Daniel A. Jiménez:
SB-Fetch: synchronization aware hardware prefetching for chip multiprocessors. 15:1-15:12
- Vladimir Dimic, Miquel Moretó, Marc Casas, Jan Ciesko, Mateo Valero:
RICH: implementing reductions in the cache hierarchy. 16:1-16:13
- Xianwei Cheng, Hui Zhao, Mahmut T. Kandemir, Beilei Jiang, Gayatri Mehta:
AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling. 17:1-17:13
- Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, Josep Torrellas:
Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors. 18:1-18:13
- Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Kuan-Yu Chen, Ronald G. Dreslinski, Trevor N. Mudge:
Sparse-TPU: adapting systolic arrays for sparse matrices. 19:1-19:12
Architecture II
- Fei Lei, Dezun Dong, Xiangke Liao, José Duato:
Bundlefly: a low-diameter topology for multicore fiber. 20:1-20:11
- Zaid Salamah A. Alzaid, Saptarshi Bhowmik, Xin Yuan, Michael Lang:
Global link arrangement for practical Dragonfly. 21:1-21:11
- Shivani Tripathy, Debiprasanna Sahoo, Manoranjan Satpathy, Madhu Mutyam:
Fuzzy fairness controller for NVMe SSDs. 22:1-22:12
- Imran Fareed, Mincheol Kang, Wonyoung Lee, Soontae Kim:
Leveraging intra-page update diversity for mitigating write amplification in SSDs. 23:1-23:12
- Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden Kwok-Hay So, Martin C. Herbordt, Ang Li, Yanzhi Wang:
CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks. 24:1-24:12
- Gongjin Sun, Seongyoung Kang, Sang-Woo Jun:
BurstZ: a bandwidth-efficient scientific computing accelerator platform for large-scale data. 25:1-25:12
Performance
- Keren Zhou, Mark W. Krentel, John M. Mellor-Crummey:
Tools for top-down performance analysis of GPU-accelerated applications. 26:1-26:12
- Benjamin Welton, Barton P. Miller:
Identifying and (automatically) remedying performance problems in CPU/GPU applications. 27:1-27:13
- Gleison Souza Diniz Mendonca, Chunhua Liao, Fernando Magno Quintão Pereira:
AutoParBench: a unified test framework for OpenMP-based parallelizers. 28:1-28:10
- Zhengchun Liu, Ryan Lewis, Rajkumar Kettimuthu, Kevin Harms, Philip H. Carns, Nageswara S. V. Rao, Ian T. Foster, Michael E. Papka:
Characterization and identification of HPC applications at leadership computing facility. 29:1-29:12
- Jaemin Choi, David F. Richards, Laxmikant V. Kalé, Abhinav Bhatele:
End-to-end performance modeling of distributed GPU applications. 30:1-30:12
- Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, Nandakishore Santhi, Stephan J. Eidenbenz:
Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles. 31:1-31:12
- Michael Wolfe:
Optimizing supercompilers for supercomputers. 32:1
Runtime
- Jesús Carretero, Emmanuel Jeannot, Guillaume Pallez, David E. Singh, Nicolas Vidal:
Mapping and scheduling HPC applications for optimizing I/O. 33:1-33:12
- Isaac Sánchez Barrera, David Black-Schaffer, Marc Casas, Miquel Moretó, Anastasiia Stupnikova, Mihail Popov:
Modeling and optimizing NUMA effects and prefetching with machine learning. 34:1-34:13
- Rohit Zambre, Aparna Chandramowlishwaran, Pavan Balaji:
How I learned to stop worrying about user-visible endpoints and love MPI. 35:1-35:13
- Masab Ahmad, Mohsin Shan, Akif Rehman, Omer Khan:
Accelerating relax-ordered task-parallel workloads using multi-level dependency checking. 36:1-36:11
- Yudong Wu, Mingyao Shen, Yi-Hui Chen, Yuanyuan Zhou:
Tuning applications for efficient GPU offloading to in-memory processing. 37:1-37:12
- Martin Winter, Daniel Mlakar, Mathias Parger, Markus Steinberger:
Ouroboros: virtualized queues for dynamic memory management on GPUs. 38:1-38:12
Compilers
- Ji Liu, Abdullah-Al Kafi, Xipeng Shen, Huiyang Zhou:
MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. 39:1-39:12
- Indu K. Prabhu, V. Krishna Nandivada:
Chunking loops with non-uniform workloads. 40:1-40:12
- Tyler Coy, Shuibing He, Bin Ren, Xuechen Zhang:
Compiler aided checkpointing using crash-consistent data structures in NVMM systems. 41:1-41:13
- Jialiang Tan, Shuyin Jiao, Milind Chabbi, Xu Liu:
What every scientific programmer should know about compiler optimizations? 42:1-42:12
- Tao Wang, Nikhil Jain, David Böhme, David Beckingsale, Frank Mueller, Todd Gamblin:
CodeSeer: input-dependent code variants selection via machine learning. 43:1-43:11