PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation

Published: 27 April 2024 | DOI: 10.1145/3620665.3640366

Abstract

This paper introduces two extensions to the popular PyTorch machine learning framework, TorchDynamo and TorchInductor, which implement the torch.compile feature released in PyTorch 2. TorchDynamo is a Python-level just-in-time (JIT) compiler that enables graph compilation in PyTorch programs without sacrificing the flexibility of Python. It achieves this by dynamically modifying Python bytecode before execution and extracting sequences of PyTorch operations into an FX graph, which is then JIT compiled using one of many extensible backends. TorchInductor is the default compiler backend for TorchDynamo, which translates PyTorch programs into OpenAI's Triton for GPUs and C++ for CPUs. Results show that TorchDynamo is able to capture graphs more robustly than prior approaches while adding minimal overhead, and TorchInductor is able to provide a 2.27× inference and 1.41× training geometric mean speedup on an NVIDIA A100 GPU across 180+ real-world models, which outperforms six other compilers. These extensions provide a new way to apply optimizations through compilers in eager mode frameworks like PyTorch.
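
As a concrete sketch of the torch.compile entry point the abstract describes: the snippet below compiles a toy model with the default TorchInductor backend, then swaps in a custom backend to show the FX graph that TorchDynamo captures. The toy model and the print_graph_backend helper are illustrative, not code from the paper; the callable-backend interface (a function taking a torch.fx.GraphModule and example inputs and returning a callable) is part of the public PyTorch 2 API.

    import torch
    import torch.nn as nn

    # Toy model; any nn.Module or plain Python function works with torch.compile.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

    # Default path: TorchDynamo rewrites Python bytecode to capture FX graphs,
    # and TorchInductor compiles them to Triton (GPU) or C++ (CPU).
    compiled = torch.compile(model)
    out = compiled(torch.randn(8, 16))

    # Illustrative custom backend: print the ops TorchDynamo extracted,
    # then fall back to running the captured graph eagerly.
    def print_graph_backend(gm, example_inputs):
        print(gm.graph)
        return gm.forward

    inspected = torch.compile(model, backend=print_graph_backend)
    out = inspected(torch.randn(8, 16))

On the first call to a compiled module, TorchDynamo traces and compiles; later calls with compatible inputs reuse the cached compiled code, which is where the reported speedups come from.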

Published In

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
April 2024, 1299 pages
ISBN: 9798400703850
DOI: 10.1145/3620665

Publisher

Association for Computing Machinery, New York, NY, United States
