2024 Volume 17 Pages 16-35
A magnetic tunnel junction (MTJ) based non-volatile flip-flop (NVFF) is attractive both for non-volatile power gating, which reduces power consumption, and for non-volatile checkpointing, which improves fault tolerance. An MTJ-based NVFF performs a store operation to write the slave latch value to the MTJs, which are non-volatile devices, and a restore operation to write the value held in the MTJs back to the slave latch. However, a store operation is stochastic: its success rate depends on its duration, the NVFF characteristics, the voltage, and the temperature. The success rate varies statically because process variation gives each NVFF in an actual chip different characteristics, and it varies dynamically because voltage and temperature change with the operating environment. Our goal is to reduce the energy consumed by checkpoint creation while ensuring that the checkpoint is created successfully. We propose a learning-based hardware scheme that dynamically finds appropriate store parameters to achieve this goal. The proposed scheme consists of a machine-learning unit and an exploration unit. The machine-learning unit takes the store duration, voltage, and temperature as inputs and learns to predict the success rate of store operations. The exploration unit searches the trained machine-learning unit to find the appropriate parameters. The evaluation shows that the proposed scheme can achieve our goal.
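The interplay between the two units can be illustrated with a minimal software sketch. It is not the proposed hardware design: the logistic predictor `predicted_success`, its coefficients, the `energy_proxy` cost model (energy taken as proportional to V² times duration), and the exhaustive grid search in `explore` are all assumptions introduced here for illustration only. The sketch shows the general idea of querying a success-rate predictor and keeping the cheapest parameter pair whose predicted success rate meets a target.

```python
import math
from itertools import product


def predicted_success(duration_ns, voltage_v, temp_c):
    """Hypothetical stand-in for the trained machine-learning unit.

    A toy logistic curve; the coefficients are invented for illustration
    and do not come from any measured MTJ data.
    """
    score = (0.8 * duration_ns
             + 20.0 * (voltage_v - 1.0)
             + 0.02 * (temp_c - 25.0)
             - 4.0)
    return 1.0 / (1.0 + math.exp(-score))


def energy_proxy(duration_ns, voltage_v):
    """Assumed cost model: energy grows with V^2 * t (arbitrary units)."""
    return voltage_v ** 2 * duration_ns


def explore(temp_c, target, durations, voltages, predict=predicted_success):
    """Grid-search stand-in for the exploration unit.

    Returns the (duration, voltage) pair with the lowest energy proxy whose
    predicted success rate is at least `target`, or None if no candidate
    on the grid qualifies.
    """
    best = None
    for d, v in product(durations, voltages):
        if predict(d, v, temp_c) < target:
            continue  # predicted success rate too low for a safe checkpoint
        cost = energy_proxy(d, v)
        if best is None or cost < best[0]:
            best = (cost, d, v)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    durations = list(range(2, 21, 2))   # candidate store durations (ns)
    voltages = [0.9, 1.0, 1.1, 1.2]     # candidate store voltages (V)
    print(explore(temp_c=25.0, target=0.999,
                  durations=durations, voltages=voltages))
```

Because the current temperature is passed in on every call, rerunning `explore` after a temperature change yields a new parameter pair, which mirrors the dynamic adaptation described above. A grid search is used only for simplicity; any search strategy over the predictor would fit the same interface.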