
Enhancing FPGAs with Magnetic Tunnel Junction-Based Block RAMs

Published: 26 January 2018

Abstract

While plentiful on-chip memory is necessary for many designs to fully utilize an FPGA’s computational capacity, SRAM scaling is becoming more difficult because of increasing device variation. An alternative is to build FPGA block RAM (BRAM) from magnetic tunnel junctions (MTJs), as this emerging embedded memory has a small cell size, low energy usage, and good scalability. We conduct a detailed comparison study of SRAM and MTJ BRAMs that includes cell designs that are robust to device variation, transistor-level design and optimization of all the required BRAM-specific circuits, and variation-aware simulation at the 22 nm node. At a 256 Kb block size, MTJ-BRAM is 3.06× denser and 55% more energy efficient than SRAM-BRAM, and its Fmax is 274 MHz, which is adequate for most FPGA system clock domains. We also detail further enhancements that allow these 256 Kb MTJ BRAMs to operate at a higher speed of 353 MHz for streaming FIFOs, which are very common in FPGA designs, and we describe how the non-volatility of MTJ BRAM enables novel on-chip configuration and power-down modes. For a RAM architecture similar to that of the latest commercial FPGAs, MTJ-BRAMs could expand FPGA memory capacity by 2.95× with no die size increase.
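As a rough aid to reading the figures quoted in the abstract, the short Python sketch below converts the two Fmax values into clock periods and the density gain into a relative block area. The constants are copied from the abstract. The helper names (`period_ns`, `relative_area`) are illustrative only and do not come from the paper, and treating "3.06× denser" as "1/3.06 of the area for the same capacity" is an assumption made here for illustration.

```python
# Figures quoted in the abstract (256 Kb block, 22 nm node).
FMAX_MHZ = 274.0       # MTJ-BRAM Fmax in general use
FIFO_FMAX_MHZ = 353.0  # MTJ-BRAM Fmax with the streaming-FIFO enhancements
DENSITY_GAIN = 3.06    # MTJ-BRAM density relative to SRAM-BRAM


def period_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / fmax_mhz


def relative_area(density_gain: float) -> float:
    """Area needed for the same capacity, relative to the baseline.

    Assumption: "N times denser" means the same number of bits fits
    in 1/N of the baseline area.
    """
    return 1.0 / density_gain


if __name__ == "__main__":
    print(f"General-use period: {period_ns(FMAX_MHZ):.2f} ns")
    print(f"FIFO-mode period:   {period_ns(FIFO_FMAX_MHZ):.2f} ns")
    print(f"FIFO speed-up:      {FIFO_FMAX_MHZ / FMAX_MHZ:.2f}x")
    print(f"Relative area:      {relative_area(DENSITY_GAIN):.3f}")
```

The sketch deliberately leaves out the 55% energy figure, because the abstract alone does not say whether it refers to energy per access or to total energy.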


        Published In

        ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
        Special Section on FCCM 2016 and Regular Papers
        March 2018
        183 pages
        ISSN: 1936-7406
        EISSN: 1936-7414
        DOI: 10.1145/3178391
        Editor: Steve Wilton
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 26 January 2018
        Accepted: 01 October 2017
        Received: 01 June 2017
        Published in TRETS Volume 11, Issue 1


        Author Tags

        1. FPGA
        2. On-chip memory
        3. SRAM
        4. magnetic tunnel junction

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Connaught Scholarship, Toshiba
        • NSERC/Intel Industrial Research Chair in Programmable Silicon


        Cited By

        • (2023) Dynamic power-gating for leakage power reduction in FPGAs. Frontiers of Information Technology & Electronic Engineering 24:4 (582-598). DOI: 10.1631/FITEE.2200084. Online publication date: 11-May-2023
        • (2023) CoMeFa: Deploying Compute-in-Memory on FPGAs for Deep Learning Acceleration. ACM Transactions on Reconfigurable Technology and Systems 16:3 (1-34). DOI: 10.1145/3603504. Online publication date: 27-Jul-2023
        • (2021) Large-scale combinatorial optimization in real-time systems by FPGA-based accelerators for simulated bifurcation. Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (1-6). DOI: 10.1145/3468044.3468045. Online publication date: 21-Jun-2021
        • (2021) Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (88-96). DOI: 10.1109/FCCM51124.2021.00018. Online publication date: May-2021
        • (2018) NVM-Based FPGA Block RAM With Adaptive SLC-MLC Conversion. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37:11 (2661-2672). DOI: 10.1109/TCAD.2018.2857261. Online publication date: Nov-2018
