PPoPP 2021: Virtual Event, Republic of Korea
- Jaejin Lee, Erez Petrank:
PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Virtual Event, Republic of Korea, February 27 - March 3, 2021. ACM 2021, ISBN 978-1-4503-8294-6
- Pedro Ramalhete, Andreia Correia, Pascal Felber:
Efficient algorithms for persistent transactional memory. 1-15
- Jingna Zeng, Shady Issa, Paolo Romano, Luís E. T. Rodrigues, Seif Haridi:
Investigating the semantics of futures in transactional memory systems. 16-30
- Yuanhao Wei, Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, Yihan Sun:
Constant-time snapshots with applications to concurrent data structures. 31-46
- Yanjun Wang, Jinwei Liu, Dalin Zhang, Xiaokang Qiu:
Reasoning about recursive tree traversals. 47-61
- Zixian Cai, Zhengyang Liu, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi:
Synthesizing optimal collective algorithms. 62-75
- Xiaozhu Meng, Jonathon M. Anderson, John M. Mellor-Crummey, Mark W. Krentel, Barton P. Miller, Srdan Milakovic:
Parallel binary code analysis. 76-89
- Mahmut Taylan Kandemir, Jihyun Ryoo, Xulong Tang, Mustafa Karaköy:
Compiler support for near data computing. 90-104
- Michael Bauer, Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen M. Shipman, Patrick S. McCormick, Michael Garland, Alex Aiken:
Scaling implicit parallelism via dynamic control replication. 105-118
- Kezhao Huang, Jidong Zhai, Zhen Zheng, Youngmin Yi, Xipeng Shen:
Understanding and bridging the gaps in current GNN performance optimizations. 119-132
- Kai Wang, Don Fussell, Calvin Lin:
A fast work-efficient SSSP algorithm for GPUs. 133-146
- Zhifang Li, Mingcong Han, Shangwei Wu, Chuliang Weng:
ShadowVM: accelerating data plane for data analytics with bare metal CPUs and GPUs. 147-160
- Sepideh Maleki, Udit Agarwal, Martin Burtscher, Keshav Pingali:
BiPart: a parallel and deterministic hypergraph partitioner. 161-174
- Ajay Singh, Trevor Brown, Ali José Mashtizadeh:
NBR: neutralization based reclamation. 175-190
- Daniel Solomon, Adam Morrison:
Efficiently reclaiming memory in concurrent search data structures while bounding wasted memory. 191-204
- Andreia Correia, Pedro Ramalhete, Pascal Felber:
OrcGC: automatic lock-free memory reclamation. 205-218
- Martin Winter, Mathias Parger, Daniel Mlakar, Markus Steinberger:
Are dynamic memory managers on GPUs slow?: a survey and benchmarks. 219-233
- Yang Liu, Wissam M. Sid-Lakhdar, Osni Marques, Xinran Zhu, Chang Meng, James Weldon Demmel, Xiaoye S. Li:
GPTune: multitask learning for autotuning exascale applications. 234-246
- Xiaoyang Zhang, Junmin Xiao, Guangming Tan:
I/O lower bounds for auto-tuning of convolutions in CNNs. 247-261
- Hashim Sharif, Yifan Zhao, Maria Kotsifakou, Akash Kothari, Ben Schreiber, Elizabeth Wang, Yasmin Sarita, Nathan Zhao, Keyur Joshi, Vikram S. Adve, Sasa Misailovic, Sarita V. Adve:
ApproxTuner: a compiler and runtime system for adaptive approximations. 262-277
- Boyuan Feng, Yuke Wang, Guoyang Chen, Weifeng Zhang, Yuan Xie, Yufei Ding:
EGEMM-TC: accelerating scientific computing on tensor cores with extended precision. 278-291
- Constantino Gómez, Filippo Mantovani, Erich Focht, Marc Casas:
Efficiently running SpMV on long vector architectures. 292-303
- Tuowen Zhao, Mary W. Hall, Hans Johansen, Samuel Williams:
Improving communication by optimizing on-node data movement with data layout. 304-317
- Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, Jiajia Li:
Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory. 318-333
- David Álvarez, Kevin Sala, Marcos Maroñas, Aleix Roca, Vicenç Beltran:
Advanced synchronization techniques for task-based runtime systems. 334-347
- Caleb Voss, Vivek Sarkar:
An ownership policy and deadlock detector for promises. 348-361
- Zhimin Li, Harshitha Menon, Kathryn M. Mohror, Peer-Timo Bremer, Yarden Livnat, Valerio Pascucci:
Understanding a program's resiliency through error propagation. 362-373
- Shumpei Shiina, Shintaro Iwasaki, Kenjiro Taura, Pavan Balaji:
Lightweight preemptive user-level threads. 374-388
- Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou:
TurboTransformers: an efficient GPU serving system for transformer models. 389-402
- Marcin Copik, Alexandru Calotoiu, Tobias Grosser, Nicolas Wicki, Felix Wolf, Torsten Hoefler:
Extracting clean performance models from tainted programs. 403-417
- Roberto Castañeda Lozano, Murray Cole, Björn Franke:
Modernizing parallel code with pattern analysis. 418-430
- Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin:
DAPPLE: a pipelined data parallel approach for training large models. 431-445
- Shreyas Gokhale, Sahil Dhoked, Neeraj Mittal:
On group mutual exclusion for dynamic systems. 446-447
- Jacob Nelson, Ahmed Hassan, Roberto Palmieri:
Bundled references: an abstraction for highly-concurrent linearizable range queries. 448-450
- Sadegh Dalvandi, Brijesh Dongol:
Verifying C11-style weak memory libraries. 451-453
- Giorgos Kappes, Stergios V. Anastasiadis:
A lock-free relaxed concurrent queue for fast work distribution. 454-456
- Jesper Larsson Träff, Manuel Pöter:
A more pragmatic implementation of the lock-free, ordered, linked list. 457-459
- Yifeng Chen, Bei Wang, Xiaolin Wang:
Extending MapReduce framework with locality keys. 460-462
- Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler:
On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization. 463-464
- Marquita Ellis, Aydin Buluç, Katherine A. Yelick:
Asynchrony versus bulk-synchrony for a generalized N-body problem from genomics. 465-466
- Tong Shu, Yanfei Guo, Justin M. Wozniak, Xiaoning Ding, Ian T. Foster, Tahsin M. Kurç:
In-situ workflow auto-tuning through combining component models. 467-468
- Da Yan, Wei Wang, Xiaowen Chu:
Simplifying low-level GPU programming with GAS. 469-471
- YuAng Chen, Yeh-Ching Chung:
Corder: cache-aware reordering for optimizing graph analytics. 472-473
- Jiping Yu, Wei Qin, Xiaowei Zhu, Zhenbo Sun, Jianqiang Huang, Xiaohan Li, Wenguang Chen:
DFOGraph: an I/O- and communication-efficient system for distributed fully-out-of-core graph processing. 474-476
- Heng Zhang, Lingda Li, Donglin Zhuang, Rui Liu, Shuang Song, Dingwen Tao, Yanjun Wu, Shuaiwen Leon Song:
An efficient uncertain graph processing framework for heterogeneous architectures. 477-479
- Ruobing Han, Min Si, James Demmel, Yang You:
Dynamic scaling for low-precision learning. 480-482
- Ruofan Wu, Feng Zhang, Zhen Zheng, Xiaoyong Du, Xipeng Shen:
Exploring deep reuse in winograd CNN inference. 483-484
- Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
A novel memory-efficient deep learning training framework via error-bounded lossy compression. 485-487
- Sultan Durrani, Muhammad Saad Chughtai, Abdul Dakkak, Wen-Mei Hwu, Lawrence Rauchwerger:
FFT blitz: the tensor cores strike back. 488-489