Enhancing FPGAs with Magnetic Tunnel Junction-Based Block RAMs

Authors:

Kosuke Tatsumura,

Sadegh Yazdanshenas,

Vaughn Betz

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 11, Issue 1

Article No.: 6, Pages 1 - 22

https://doi.org/10.1145/3154425

Published: 26 January 2018

Abstract

While plentiful on-chip memory is necessary for many designs to fully utilize an FPGA’s computational capacity, SRAM scaling is becoming more difficult because of increasing device variation. An alternative is to build FPGA block RAM (BRAM) from magnetic tunnel junctions (MTJs), as this emerging embedded memory has a small cell size, low energy usage, and good scalability. We conduct a detailed comparison study of SRAM and MTJ BRAMs that includes cell designs that are robust to device variation, transistor-level design and optimization of all the required BRAM-specific circuits, and variation-aware simulation at the 22 nm node. At a 256 Kb block size, MTJ-BRAM is 3.06× denser and 55% more energy efficient, and its F_max is 274 MHz, which is adequate for most FPGA system clock domains. We also detail further enhancements that allow these 256 Kb MTJ BRAMs to operate at a higher speed of 353 MHz for streaming FIFOs, which are very common in FPGA designs, and describe how the non-volatility of MTJ BRAM enables novel on-chip configuration and power-down modes. For a RAM architecture similar to the latest commercial FPGAs, MTJ-BRAMs could expand FPGA memory capacity by 2.95× with no die size increase.
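The headline figures in the abstract can be turned into a few derived quantities with simple arithmetic. The sketch below is illustrative only: the constants are the values quoted in the abstract, while the helper names (`period_ns`, `relative_area`) and the reading of "3.06× denser" as "an equal-capacity block occupies 1/3.06 of the area" are assumptions made here, not definitions taken from the article.

```python
"""Back-of-the-envelope arithmetic on the figures quoted in the abstract."""

# Values quoted in the abstract (256 Kb block size, 22 nm node).
DENSITY_GAIN = 3.06      # MTJ-BRAM density relative to SRAM-BRAM
FMAX_MHZ = 274.0         # MTJ-BRAM F_max in general operation
FIFO_FMAX_MHZ = 353.0    # MTJ-BRAM F_max with the streaming-FIFO enhancements


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def relative_area(density_gain: float) -> float:
    """Area of an equal-capacity block relative to the baseline.

    Assumption (hypothetical helper): 'N times denser' is read as
    'the same capacity fits in 1/N of the area'.
    """
    return 1.0 / density_gain


# Derived quantities, rounded for display.
print(f"Period at {FMAX_MHZ:.0f} MHz: {period_ns(FMAX_MHZ):.2f} ns")
print(f"Period at {FIFO_FMAX_MHZ:.0f} MHz: {period_ns(FIFO_FMAX_MHZ):.2f} ns")
print(f"FIFO-mode frequency gain: {FIFO_FMAX_MHZ / FMAX_MHZ:.2f}x")
print(f"Relative block area: {relative_area(DENSITY_GAIN):.3f}")
```

Under these assumptions, the 274 MHz figure corresponds to a clock period of roughly 3.65 ns, the FIFO mode to roughly 2.83 ns, and an equal-capacity MTJ block would occupy about a third of the SRAM block's area.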

Cited By

H. Jahanirad. 2023. Dynamic power-gating for leakage power reduction in FPGAs. Frontiers of Information Technology & Electronic Engineering 24, 4 (2023), 582--598. Online publication date: 11-May-2023.
https://doi.org/10.1631/FITEE.2200084
A. Arora, A. Bhamburkar, A. Borda, T. Anand, R. Sehgal, B. Hanindhito, P. Gaillardon, J. Kulkarni, and L. John. 2023. CoMeFa: Deploying Compute-in-Memory on FPGAs for Deep Learning Acceleration. ACM Transactions on Reconfigurable Technology and Systems 16, 3 (2023), 1--34. Online publication date: 27-Jul-2023.
https://dl.acm.org/doi/10.1145/3603504
K. Tatsumura. 2021. Large-scale combinatorial optimization in real-time systems by FPGA-based accelerators for simulated bifurcation. In Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies. 1--6. Online publication date: 21-Jun-2021.
https://dl.acm.org/doi/10.1145/3468044.3468045


Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 11, Issue 1

Special Section on FCCM 2016 and Regular Papers

March 2018

183 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/3178391

Editor:
Steve Wilton
Department of Electrical and Computer Engineering / University of British Columbia / Kaiser 4112, 5500-2332 Main Mall / Vancouver, BC V6T 1Z4 Canada

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2018

Accepted: 01 October 2017

Received: 01 June 2017

Published in TRETS Volume 11, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Connaught Scholarship, Toshiba
NSERC/Intel Industrial Research Chair in Programmable Silicon

Bibliometrics

Article Metrics

Total Citations: 5
Total Downloads: 250
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Citations

Cited By

X. Wang, V. Goyal, J. Yu, V. Bertacco, A. Boutros, E. Nurvitadhi, C. Augustine, R. Iyer, and R. Das. 2021. Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 88--96. Online publication date: May-2021.
https://doi.org/10.1109/FCCM51124.2021.00018
L. Ju, X. Sui, S. Li, M. Zhao, C. Xue, J. Hu, and Z. Jia. 2018. NVM-Based FPGA Block RAM With Adaptive SLC-MLC Conversion. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 11 (2018), 2661--2672. Online publication date: Nov-2018.
https://doi.org/10.1109/TCAD.2018.2857261
