In-Memory Data Parallel Processor

Authors:

Daichi Fujiki,

Scott Mahlke,

Reetuparna Das

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 1 - 14

https://doi.org/10.1145/3173162.3173171

Published: 19 March 2018

Abstract

Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5x speedup over a multi-core CPU server for a set of applications from Parsec and 763x speedup over a server-class GPU for a set of Rodinia benchmarks.
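
As a rough illustration of what merging data-flow and vector processing can look like, the Python sketch below evaluates a small data-flow graph in which every node applies one element-wise operation to all lanes of its inputs. It is not the paper's instruction set, compiler, or TensorFlow front end; `Node`, `run_dataflow`, and the example graph are hypothetical names chosen only for this sketch.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical illustration only: none of these names come from the paper.
@dataclass(frozen=True)
class Node:
    op: Callable[[float, float], float]  # element-wise binary operation
    inputs: Tuple[str, str]              # names of the two producer values


def run_dataflow(
    graph: Dict[str, Node],
    sources: Dict[str, List[float]],
    order: List[str],
) -> Dict[str, List[float]]:
    """Evaluate nodes in the given topological order.

    Each node applies the same operation to every lane of its inputs,
    which is the vector (SIMD-like) half of the model; the graph edges
    are the data-flow half.
    """
    values = dict(sources)
    for name in order:
        node = graph[name]
        a, b = (values[i] for i in node.inputs)
        values[name] = [node.op(x, y) for x, y in zip(a, b)]
    return values


# y = (a + b) * c, computed lane by lane
graph = {
    "sum": Node(operator.add, ("a", "b")),
    "y": Node(operator.mul, ("sum", "c")),
}
sources = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [2.0, 2.0, 2.0]}
result = run_dataflow(graph, sources, ["sum", "y"])
print(result["y"])  # [10.0, 14.0, 18.0]
```

In this toy model the lanes are plain list elements; in an in-memory processor they would correspond to data held in place in the memory arrays, so the same operation could proceed on many elements without moving them to a separate compute unit.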

Index Terms

In-Memory Data Parallel Processor
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Analog computers
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Non-volatile memory

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

March 2018

827 pages

ISBN:9781450349116

DOI:10.1145/3173162

General Chairs:
Xipeng Shen (North Carolina State University, USA),
James Tuck (North Carolina State University, USA)
Program Chairs:
Ricardo Bianchini (Microsoft Research, USA),
Vivek Sarkar (Georgia Institute of Technology, USA)

ACM SIGPLAN Notices Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
Editor:
Matthew Fluet
Rochester Institute of Technology

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPLOS '18

ASPLOS '18: Architectural Support for Programming Languages and Operating Systems

March 24 - 28, 2018

Williamsburg, VA, USA

Acceptance Rates

ASPLOS '18 Paper Acceptance Rate 56 of 319 submissions, 18%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 119
Total Downloads: 3,913
Downloads (Last 12 months): 558
Downloads (Last 6 weeks): 73

Reflects downloads up to 04 Jan 2025

Cited By

Emonds Y, Xi K, Fröning H. (2025). Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 135-149. Online publication date: 1-Jan-2025.
https://doi.org/10.1007/978-3-031-74643-7_11
Ma S, Mhatre K, Weng J, Hanindhito B, Wang Z, Nowatzki T, John L, Arora A. (2024). PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation. ACM Transactions on Architecture and Code Optimization 21:4, 1-27. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3690824
Liu F, Zhao W, Wang Z, Chen Y, Liang X, Jiang L. (2024). ERA-BS: Boosting the Efficiency of ReRAM-Based PIM Accelerator With Fine-Grained Bit-Level Sparsity. IEEE Transactions on Computers 73:9, 2320-2334. Online publication date: Sep-2024.
https://doi.org/10.1109/TC.2023.3290869
Aguirre F, Sebastian A, Le Gallo M, Song W, Wang T, Yang J, Lu W, Chang M, Ielmini D, Yang Y, Mehonic A, Kenyon A, Villena M, Roldán J, Wu Y, Hsu H, Raghavan N, Suñé J, Miranda E, Eltawil A, Setti G, Smagulova K, Salama K, Krestinskaya O, Yan X, Ang K, Jain S, Li S, Alharbi O, Pazos S, Lanza M. (2024). Hardware implementation of memristor-based artificial neural networks. Nature Communications 15:1. Online publication date: 4-Mar-2024.
https://doi.org/10.1038/s41467-024-45670-9
Joardar B, Chakrabarty K. (2023). Attacking ReRAM-based Architectures using Repeated Writes. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. Online publication date: Apr-2023.
https://doi.org/10.23919/DATE56975.2023.10136903
Fujiki D. (2023). MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 800-814. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3623784
Song L, Chen F, Li H, Chen Y, Mohror K, Arnold D, Badia R. (2023). ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607077
Joardar B, Doppa J, Li H, Chakrabarty K, Pande P. (2023). ReaLPrune: ReRAM Crossbar-Aware Lottery Ticket Pruning for CNNs. IEEE Transactions on Emerging Topics in Computing 11:2, 303-317. Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TETC.2022.3223630
Yang T, Li D, Ma F, Song Z, Zhao Y, Zhang J, Liu F, Jiang L. (2023). PASGCN: An ReRAM-Based PIM Design for GCN With Adaptively Sparsified Graphs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:1, 150-163. Online publication date: Jan-2023.
https://doi.org/10.1109/TCAD.2022.3175031
Liu F, Wang Z, Chen Y, He Z, Yang T, Liang X, Jiang L. (2023). SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:1, 204-217. Online publication date: Jan-2023.
https://doi.org/10.1109/TCAD.2022.3172907
