DOI: 10.1145/3357526.3357553
research-article

ARC: DVFS-aware asymmetric-retention STT-RAM caches for energy-efficient multicore processors

Published: 30 September 2019

Abstract

Relaxed retention (or volatile) spin-transfer torque RAM (STT-RAM) has been widely studied as a way to reduce STT-RAM's write energy and latency overheads. Given a relaxed-retention STT-RAM level one (L1) cache, we analyze the impact of dynamic voltage and frequency scaling (DVFS), a common optimization in modern processors, on STT-RAM L1 cache design. Our analysis reveals that, beyond the fact that different applications may require different retention times, the clock frequency, which most STT-RAM studies typically ignore, may also significantly affect applications' retention time needs. Based on these findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model that determines the best core on which to run an application, based on the application's characteristics, its retention time requirements, and the available DVFS settings. Results reveal that the proposed approach can reduce average cache energy by 20.19% and overall processor energy by 7.66%, compared to a homogeneous STT-RAM cache design.
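
One way to see why clock frequency could change an application's retention time needs is sketched below. It is a minimal illustration, not the paper's model: it assumes a cache block's lifetime is roughly fixed when counted in clock cycles, so the same block must be retained for longer wall-clock time at a lower DVFS frequency. The function names, the 50,000-cycle lifetime, and the retention options (30, 120, and 1000 microseconds) are hypothetical values chosen only for illustration.

```python
from typing import Optional, Sequence


def block_lifetime_us(lifetime_cycles: int, freq_mhz: float) -> float:
    """Convert a block lifetime from clock cycles to microseconds.

    At f MHz the core executes f cycles per microsecond, so
    time_us = cycles / f.
    """
    return lifetime_cycles / freq_mhz


def pick_retention_us(
    lifetime_cycles: int,
    freq_mhz: float,
    retention_options_us: Sequence[float],
) -> Optional[float]:
    """Return the shortest available retention time that covers the block
    lifetime at the given frequency, or None if no option is long enough.

    Assumption (not from the paper): a shorter retention time is always
    preferable as long as it covers the block lifetime.
    """
    needed_us = block_lifetime_us(lifetime_cycles, freq_mhz)
    for retention_us in sorted(retention_options_us):
        if retention_us >= needed_us:
            return retention_us
    return None


# Hypothetical per-core retention times, in microseconds.
OPTIONS_US = [30.0, 120.0, 1000.0]

# The same 50,000-cycle block lifetime at two hypothetical DVFS settings.
at_2ghz = pick_retention_us(50_000, 2000.0, OPTIONS_US)  # lifetime is 25 us
at_500mhz = pick_retention_us(50_000, 500.0, OPTIONS_US)  # lifetime is 100 us
```

Under these assumed numbers, the block lives for 25 microseconds at 2 GHz and 100 microseconds at 500 MHz, so the shortest sufficient retention time moves from the 30 microsecond option to the 120 microsecond option when the frequency drops.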

    Published In

    MEMSYS '19: Proceedings of the International Symposium on Memory Systems
    September 2019
    517 pages
    ISBN:9781450372060
    DOI:10.1145/3357526

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DVFS
    2. cache
    3. energy efficient systems
    4. nonvolatile memory
    5. retention time
    6. spin-transfer torque RAM (STTRAM)
    7. write energy
    8. write latency

    Qualifiers

    • Research-article

    Conference

    MEMSYS '19
    MEMSYS '19: The International Symposium on Memory Systems
    September 30 - October 3, 2019
    Washington, DC, USA

    Cited By

    • (2024) Domain-Specific STT-MRAM-Based In-Memory Computing: A Survey. IEEE Access 12, 28036-28056. DOI: 10.1109/ACCESS.2024.3365632. Online publication date: 2024
    • (2022) Write-Awareness Prefetching for Non-Volatile Cache in Energy-Constrained IoT Device. IEICE Electronics Express. DOI: 10.1587/elex.19.20210499. Online publication date: 2022
    • (2022) Evaluating the performance and energy of STT-RAM caches for real-world wearable workloads. Future Generation Computer Systems 136, 231-240. DOI: 10.1016/j.future.2022.05.023. Online publication date: Nov-2022
    • (2020) A Study of Runtime Adaptive Prefetching for STTRAM L1 Caches. 2020 IEEE 38th International Conference on Computer Design (ICCD), 247-254. DOI: 10.1109/ICCD50377.2020.00051. Online publication date: Oct-2020
    • (2020) Smartphone processor architecture, operations, and functions: current state-of-the-art and future outlook: energy performance trade-off. The Journal of Supercomputing. DOI: 10.1007/s11227-020-03312-z. Online publication date: 16-May-2020
    • (2019) SCART: Predicting STT-RAM Cache Retention Times Using Machine Learning. 2019 Tenth International Green and Sustainable Computing Conference (IGSC), 1-7. DOI: 10.1109/IGSC48788.2019.8957182. Online publication date: Oct-2019
