Research article (public access)
DOI: 10.1145/3173162.3173171

In-Memory Data Parallel Processor

Published: 19 March 2018

Abstract

Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5x speedup over a multi-core CPU server for a set of applications from Parsec and 763x speedup over a server-class GPU for a set of Rodinia benchmarks.
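
The idea of merging data-flow and vector processing can be pictured with a small sketch. This is an illustrative toy only, not the paper's instruction set, compiler output, or hardware model: the names `Node` and `run_graph` are hypothetical. Each node of a data-flow graph names one scalar operation, and that operation is applied to every element position of its input vectors, much as one instruction might be applied across many memory rows at once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical names throughout: Node and run_graph are illustrative only
# and do not come from the paper.


@dataclass
class Node:
    name: str                   # name of the value this node produces
    op: Callable[..., float]    # scalar operation applied to each element
    inputs: List[str]           # names of graph inputs or upstream nodes


def run_graph(nodes: Sequence[Node],
              feeds: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Evaluate nodes in the given (topological) order.

    Each node applies its scalar op to every element position ("lane")
    of its input vectors, mimicking one operation issued over many rows.
    """
    values = dict(feeds)
    for node in nodes:
        operands = [values[name] for name in node.inputs]
        values[node.name] = [node.op(*lane) for lane in zip(*operands)]
    return values


# Data-flow graph for out = x * y + z, evaluated over whole vectors.
graph = [
    Node("prod", lambda a, b: a * b, ["x", "y"]),
    Node("out", lambda p, c: p + c, ["prod", "z"]),
]
result = run_graph(graph, {
    "x": [1.0, 2.0, 3.0],
    "y": [4.0, 5.0, 6.0],
    "z": [0.5, 0.5, 0.5],
})
print(result["out"])  # [4.5, 10.5, 18.5]
```

In this toy the lanes are processed by an ordinary Python loop. The point is only the program shape: a graph of operations, each of which is defined over whole vectors rather than single values.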

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN:9781450349116
DOI:10.1145/3173162
  • ACM SIGPLAN Notices, Volume 53, Issue 2
    ASPLOS '18
    February 2018
    809 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3296957
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compilation support
  2. data-parallel execution
  3. in-memory computing
  4. reram
  5. spatial architecture

Qualifiers

  • Research-article

Conference

ASPLOS '18

Acceptance Rates

ASPLOS '18 paper acceptance rate: 56 of 319 submissions (18%)
Overall acceptance rate: 535 of 2,713 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 558
  • Downloads (last 6 weeks): 73
Reflects downloads up to 04 Jan 2025

Cited By

  • (2025) Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (135-149). DOI: 10.1007/978-3-031-74643-7_11. Online publication date: 1-Jan-2025
  • (2024) PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation. ACM Transactions on Architecture and Code Optimization 21:4 (1-27). DOI: 10.1145/3690824. Online publication date: 20-Nov-2024
  • (2024) ERA-BS: Boosting the Efficiency of ReRAM-Based PIM Accelerator With Fine-Grained Bit-Level Sparsity. IEEE Transactions on Computers 73:9 (2320-2334). DOI: 10.1109/TC.2023.3290869. Online publication date: Sep-2024
  • (2024) Hardware implementation of memristor-based artificial neural networks. Nature Communications 15:1. DOI: 10.1038/s41467-024-45670-9. Online publication date: 4-Mar-2024
  • (2023) Attacking ReRAM-based Architectures using Repeated Writes. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1-6). DOI: 10.23919/DATE56975.2023.10136903. Online publication date: Apr-2023
  • (2023) MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (800-814). DOI: 10.1145/3613424.3623784. Online publication date: 28-Oct-2023
  • (2023) ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-15). DOI: 10.1145/3581784.3607077. Online publication date: 12-Nov-2023
  • (2023) ReaLPrune: ReRAM Crossbar-Aware Lottery Ticket Pruning for CNNs. IEEE Transactions on Emerging Topics in Computing 11:2 (303-317). DOI: 10.1109/TETC.2022.3223630. Online publication date: 1-Apr-2023
  • (2023) PASGCN: An ReRAM-Based PIM Design for GCN With Adaptively Sparsified Graphs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:1 (150-163). DOI: 10.1109/TCAD.2022.3175031. Online publication date: Jan-2023
  • (2023) SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:1 (204-217). DOI: 10.1109/TCAD.2022.3172907. Online publication date: Jan-2023
