research-article

Enhancing FPGAs with Magnetic Tunnel Junction-Based Block RAMs

Published: 26 January 2018

Abstract

While plentiful on-chip memory is necessary for many designs to fully utilize an FPGA’s computational capacity, SRAM scaling is becoming more difficult because of increasing device variation. An alternative is to build FPGA block RAM (BRAM) from magnetic tunnel junctions (MTJs), as this emerging embedded memory has a small cell size, low energy usage, and good scalability. We conduct a detailed comparison study of SRAM and MTJ BRAMs that includes cell designs that are robust to device variation, transistor-level design and optimization of all the required BRAM-specific circuits, and variation-aware simulation at the 22nm node. At a 256Kb block size, MTJ-BRAM is 3.06× denser and 55% more energy efficient, and its Fmax is 274MHz, which is adequate for most FPGA system clock domains. We also detail further enhancements that allow these 256Kb MTJ BRAMs to operate at a higher speed of 353MHz for streaming FIFOs, which are very common in FPGA designs, and describe how the non-volatility of MTJ BRAM enables novel on-chip configuration and power-down modes. For a RAM architecture similar to the latest commercial FPGAs, MTJ-BRAMs could expand FPGA memory capacity by 2.95× with no die size increase.
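
To put the headline figures in more familiar units, the following is a minimal sketch that converts the quoted Fmax values into clock periods and the quoted density gain into relative area. It assumes "denser" means more bits per unit area, so an equal-capacity block occupies 1/3.06 of the SRAM area. The function names are hypothetical and are not taken from the paper.

```python
# Figures quoted in the abstract (256 Kb block, 22 nm node).
DENSITY_GAIN = 3.06      # MTJ-BRAM bit density relative to SRAM BRAM
FMAX_MHZ = 274.0         # Fmax of the baseline MTJ-BRAM
FIFO_FMAX_MHZ = 353.0    # Fmax with the streaming-FIFO enhancements


def clock_period_ns(fmax_mhz: float) -> float:
    """Minimum clock period in nanoseconds for a given Fmax in MHz."""
    return 1000.0 / fmax_mhz


def relative_area(density_gain: float) -> float:
    """Area of an equal-capacity block relative to the baseline.

    Assumption: density is bits per unit area, so area scales as 1/density.
    """
    return 1.0 / density_gain


if __name__ == "__main__":
    print(f"Period at {FMAX_MHZ:.0f} MHz: {clock_period_ns(FMAX_MHZ):.2f} ns")
    print(f"Period at {FIFO_FMAX_MHZ:.0f} MHz: {clock_period_ns(FIFO_FMAX_MHZ):.2f} ns")
    print(f"Equal-capacity area vs. SRAM: {relative_area(DENSITY_GAIN):.1%}")
```

Under these assumptions, 274MHz corresponds to a clock period of roughly 3.65 ns and 353MHz to roughly 2.83 ns, while an equal-capacity MTJ-BRAM would occupy about a third of the SRAM BRAM area.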

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
Special Section on FCCM 2016 and Regular Papers
March 2018
183 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3178391
Editor: Steve Wilton
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2018
Accepted: 01 October 2017
Received: 01 June 2017
Published in TRETS Volume 11, Issue 1

Author Tags

  1. FPGA
  2. On-chip memory
  3. SRAM
  4. magnetic tunnel junction

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Connaught Scholarship, Toshiba
  • NSERC/Intel Industrial Research Chair in Programmable Silicon

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 03 Mar 2025

Cited By

  • (2023) Dynamic power-gating for leakage power reduction in FPGAs. Frontiers of Information Technology & Electronic Engineering 24:4 (582-598). DOI: 10.1631/FITEE.2200084. Online publication date: 11-May-2023
  • (2023) CoMeFa: Deploying Compute-in-Memory on FPGAs for Deep Learning Acceleration. ACM Transactions on Reconfigurable Technology and Systems 16:3 (1-34). DOI: 10.1145/3603504. Online publication date: 27-Jul-2023
  • (2021) Large-scale combinatorial optimization in real-time systems by FPGA-based accelerators for simulated bifurcation. Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (1-6). DOI: 10.1145/3468044.3468045. Online publication date: 21-Jun-2021
  • (2021) Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (88-96). DOI: 10.1109/FCCM51124.2021.00018. Online publication date: May-2021
  • (2018) NVM-Based FPGA Block RAM With Adaptive SLC-MLC Conversion. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37:11 (2661-2672). DOI: 10.1109/TCAD.2018.2857261. Online publication date: Nov-2018
