3.1 Design Implementation
In our design, we assume a restricted class of MAC operations in which the input voltages and the weights (the conductances of the ReRAM devices) can take only two distinct values. To convert the analog output signal of the crossbar into a digital signal, we use a VCO-based ADC. Because it is time based, it offers the advantages of time-domain signals with a relatively simple design procedure. Converting the analog current into a digital signal with the VCO-based ADC requires three stages. In the first stage, the analog bit-line current is transformed into an analog voltage. In the second stage, this voltage is transformed into pulses by the VCO. In the last stage, the generated pulses are counted and mapped to the corresponding digital value with the help of a Lookup Table (LUT). We modulate the power supply directly by regulating the read voltage (\(V_\text{read}\)) applied to the crossbar as the row voltages. Therefore, deactivating the row \(V_\text{read}\) disables the crossbar as well as the ADC. The output of the last stage is the equivalent digital signal, which can be processed by the digital host. The schematic of the whole system, including the ReRAM crossbar and the VCO-based ADC, is shown in Figure 3.
3.1.1 Linking Crossbar and ADC Using Transfer Functions.
A transfer function is a mathematical function that describes the output of a system for each possible input. In our case, we consider two systems: the 1T1R crossbar and the VCO-based ADC. For the crossbar, the input is the specific resistance configuration of the cells that are read out, which is expressed through the equivalent resistance \(R_\text{eq}\) of the crossbar. We consider the case in which all of the rows in a column are selected. \(R_\text{eq}\) is formed by the parallel connection of multiple series connections of ReRAM devices and access transistors (see Figure 1). The equivalent resistance of this parallel connection can then be calculated using
\[R_\text{eq} = \left(\frac{n_\text{LRS}}{R_\text{LRS} + R_\text{transistor,LRS}} + \frac{n - n_\text{LRS}}{R_\text{HRS} + R_\text{transistor,HRS}}\right)^{-1}, \qquad (2)\]
where \(R_\text{LRS}\) and \(R_\text{HRS}\) denote the LRS and HRS resistance, respectively, \(R_\text{transistor,HRS/LRS}\) denotes the drain-source resistance of the transistor connected to a ReRAM cell in the LRS or HRS during readout, \(n\) denotes the number of cells that are read in parallel with one ADC, and \(n_\text{LRS}\) denotes the number of these cells that are in the LRS. The resulting transfer function of the crossbar is shown in Figure 4(a). It follows from Equation (2) when the number of cells in the LRS is varied from one to eight.
In this case, we assume \(R_\text{transistor,HRS}\) to be 26 k\(\Omega\), \(R_\text{transistor,LRS}\) to be 5.8 k\(\Omega\), and the LRS to be 3 k\(\Omega\); the HRS is varied from 15 k\(\Omega\) to 300 k\(\Omega\) to achieve various HRS/LRS ratios. The transistors operate in the saturation region when connected to an LRS or HRS device. Figure 5(b) shows the load-line characteristic of a 1T1R bit-cell with an LRS or HRS ReRAM cell. It shows that the operating point of the transistor, and hence its resistance, depends on the resistive state of the ReRAM cell.
For clarity, at most eight cells are considered to be read out at the same time. The plot shows that a larger (smaller) HRS/LRS ratio leads to a less (more) linear relationship. This is because, for high HRS/LRS ratios, the equivalent resistance is almost exclusively determined by the number of cells in the LRS. The difference between the \(R_\text{eq}\) values decreases strongly for higher numbers of devices in the LRS, which increases the requirements on the ADC performance: it becomes more difficult to distinguish between levels when more cells in the LRS are connected. In addition, a smaller HRS/LRS ratio leads to other issues because of an increased influence of read noise and read disturb. When the ratio between the HRS and the LRS is decreased, the devices become more susceptible to random variations and to stress due to prolonged reading. If a sufficient number of cells are in the LRS, the equivalent resistance is very similar regardless of the HRS/LRS ratio. This means that the transfer function of the crossbar cannot be improved by optimizing the devices and has to be addressed by the ADC.
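The shrinking spacing of \(R_\text{eq}\) can be checked numerically. The following is a minimal sketch, assuming that each 1T1R branch is the series connection of a ReRAM device and its access transistor and that all branches are in parallel; the resistance values are those quoted above, while the function names are our own.

```python
# Illustrative sketch; function and variable names are our own.
R_LRS = 3e3      # LRS resistance in ohms (from the text)
R_T_LRS = 5.8e3  # transistor resistance in series with an LRS cell (from the text)
R_T_HRS = 26e3   # transistor resistance in series with an HRS cell (from the text)


def r_eq(n_lrs, n_cells, r_hrs):
    """Equivalent resistance of n_cells parallel 1T1R branches, n_lrs of them in the LRS."""
    g_lrs = n_lrs / (R_LRS + R_T_LRS)
    g_hrs = (n_cells - n_lrs) / (r_hrs + R_T_HRS)
    return 1.0 / (g_lrs + g_hrs)


def effective_ratio(r_hrs):
    """HRS/LRS ratio of a whole branch, including the series transistor."""
    return (r_hrs + R_T_HRS) / (R_LRS + R_T_LRS)


# Sweep the number of LRS cells from one to eight for three nominal ratios.
for ratio in (5, 10, 100):
    r_hrs = ratio * R_LRS
    values = [r_eq(k, 8, r_hrs) for k in range(1, 9)]
    print(f"HRS/LRS = {ratio:3d}, effective = {effective_ratio(r_hrs):.2f}:",
          [round(v) for v in values])
```

For a nominal ratio of 10, the branch-level ratio evaluates to about 6.36, and the step between neighboring \(R_\text{eq}\) values becomes smaller with every additional cell in the LRS.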
For an ADC, the transfer function gives the digital output value as a function of an analog input signal, usually the input voltage [63]. In our case, it is more useful to express the ADC transfer characteristic as a function of the resistance of the crossbar. Usually, in the design of ADCs, linearity of the input–output relationship is preferred. This means that the input intervals corresponding to the individual ADC outputs have equal widths. Such a transfer function is shown in Figure 4(b) as a black line. To determine whether a linear ADC characteristic is a reasonable choice for CiM using resistive devices, we combined the transfer function of the crossbar with the transfer function of the ADC. This is possible because the output of the crossbar transfer function is the input of the ADC transfer function. The red crosses in Figure 4(b) are obtained using the \(R_\text{eq}\) values of the crossbar transfer function presented in Figure 4(a) for an HRS/LRS ratio of 10. Note that the effective HRS/LRS ratio, when the serially connected transistors are taken into account, is reduced from 10 to 6.36. The intersections between the ADC transfer characteristic (black line) and the red crosses show the ADC output to which each crossbar state is mapped. Figure 4(b) shows that the data in the crossbar do not map well onto the linear ADC characteristic, since most of the possible crossbar outputs are mapped to the same ADC output (here, ‘101’). The reason is that the \(R_\text{eq}\) values of these crossbar outputs differ only slightly from each other. Hence, the commonly used linear ADC transfer function is not an optimal solution for CiM based on resistive devices, and an ADC with a nonlinear transfer function should be better suited. The VCO-based ADC considered here has a strongly nonlinear transfer characteristic, which makes it a suitable ADC candidate for CiM based on resistive memories. Yan et al. discussed a related issue. In [64], they compared different spacings of the resistive states (equal \(\Delta\)R vs. equal \(\Delta\)G) for a neural network using analog ReRAM devices. They concluded that, while both mappings delivered comparable accuracy, the equal \(\Delta\)R mapping was beneficial because it resulted in less severe constraints for the ADC. This result suggests that a matching between crossbar and ADC can be achieved on multiple levels, either in the ADC or in the spacing of the resistance values. Since we use the ReRAM devices in a binary way, different spacings are equivalent to choosing different \(R_\text{off}\)/\(R_\text{on}\) ratios. As shown in Figure 5(a), a smaller ratio improves the linearity of the crossbar's transfer function; however, it leads to other problems such as an increased susceptibility to device variability.
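The collapse of several crossbar states onto one code of a linear ADC can be illustrated with a small sketch. It assumes a 3-bit uniform quantizer whose full-scale range spans the all-LRS to all-HRS values of \(R_\text{eq}\); this range, the bit width, and the function names are our assumptions, so the resulting codes are not meant to reproduce Figure 4(b).

```python
# Illustrative sketch; the ADC range and bit width are assumptions.
R_LRS, R_HRS = 3e3, 30e3         # HRS/LRS ratio of 10 (from the text)
R_T_LRS, R_T_HRS = 5.8e3, 26e3   # transistor resistances (from the text)
N_CELLS, N_BITS = 8, 3


def r_eq(n_lrs):
    """Equivalent resistance with n_lrs of the N_CELLS branches in the LRS."""
    g = n_lrs / (R_LRS + R_T_LRS) + (N_CELLS - n_lrs) / (R_HRS + R_T_HRS)
    return 1.0 / g


def linear_code(r, r_min, r_max, n_bits=N_BITS):
    """Uniform quantizer: equal-width input intervals, clamped at the top code."""
    levels = 2 ** n_bits
    step = (r_max - r_min) / levels
    return min(int((r - r_min) / step), levels - 1)


r_values = [r_eq(k) for k in range(N_CELLS + 1)]   # 0 to 8 cells in the LRS
r_min, r_max = min(r_values), max(r_values)
codes = [linear_code(r, r_min, r_max) for r in r_values]

for k, (r, c) in enumerate(zip(r_values, codes)):
    print(f"{k} LRS cells: R_eq = {r:7.1f} ohm -> code {c:03b}")
```

Under these assumptions, the states with five to eight cells in the LRS share the lowest code while several intermediate codes stay unused, which is the mismatch described above.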
3.1.2 Current-to-Voltage Converter.
In the VCO-based ADC, the bit-line current that contains the result of the MAC operation first needs to be transformed into a voltage. This analog voltage then serves as the input of the VCO. Four connection options are available for this purpose, and we explore all of them: a capacitance, a gate-drain connected NMOS (diode-connected structure), a constant resistance, and none.
• Capacitance: In this case, the voltage across the element approaches \(V_\text{read}\) exponentially with a time constant that is set by the ReRAM states. The frequency of the VCO’s output is therefore time dependent and, after a specific time, becomes almost equal for all of the different ReRAM states. Thus, a capacitor is not a suitable conversion method for this purpose.
• G-D connected NMOS (diode-connected structure): In this case, the gate and drain voltage is \(\sqrt {\frac{2\cdot I_\text{bit-line}}{\mu _\text{n}\cdot C_\text{OX}\cdot \frac{W}{L}}}+V_\text{th}\) (\(\mu _\text{n}\) is the electron mobility, \(C_\text{OX}\) is the capacitance per area of the gate oxide, \(W\) and \(L\) are the width and length of the NMOS, respectively, and \(V_\text{th}\) is the threshold voltage).
• Constant resistance: In this case, the voltage across the resistance is \(R_s\cdot I_\text{bit-line}\).
• None: In this case, the input impedance of the VCO alone converts the current into a voltage.
To select among these four methods, we use the following resolution criterion: the larger the number of cells in the LRS at which the voltage at node Y in Figure 3 and the number of generated pulses flatten out, the higher the resolution. As Figure 6 shows, the diode-connected structure results in a better resolution than the other methods.
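The two closed-form conversion options listed above can be compared with a short sketch. All device parameters below (\(\mu_\text{n} C_\text{OX}\), \(W/L\), \(V_\text{th}\), \(R_s\)) and the current values are placeholder assumptions chosen only for illustration, not values from the design.

```python
import math

# Placeholder device parameters (assumptions, not taken from the design).
MU_N_COX = 200e-6   # mu_n * C_OX in A/V^2
W_OVER_L = 10.0     # transistor aspect ratio W/L
V_TH = 0.4          # threshold voltage in volts
R_S = 5e3           # constant conversion resistance in ohms


def v_diode(i_bitline):
    """Gate/drain voltage of the diode-connected NMOS for a given bit-line current."""
    return math.sqrt(2.0 * i_bitline / (MU_N_COX * W_OVER_L)) + V_TH


def v_resistor(i_bitline):
    """Voltage across the constant resistance for a given bit-line current."""
    return R_S * i_bitline


for i_ua in (10, 50, 100):
    i = i_ua * 1e-6
    print(f"I = {i_ua:3d} uA: diode {v_diode(i):.3f} V "
          f"(V/I = {v_diode(i) / i:8.0f} ohm), resistor {v_resistor(i):.3f} V")
```

The ratio V/I of the diode-connected structure falls as the current grows, whereas the constant resistor keeps the same value for every current.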
It is worth mentioning that the parasitic bit-line capacitor is not required for the correct functionality of the circuit; it exists because of the crossbar wires and junctions. This parasitic capacitor affects different features of the circuit, as discussed in Section 3.2.
3.1.3 Voltage-Controlled Oscillator.
The VCO is an abstract 2-terminal module that takes a DC voltage as its input and produces a periodic signal with a frequency \(f\) as its output. The frequency of the output signal is a function of the DC input voltage. To realize the VCO, we use a ring-oscillator. A ring-oscillator consists of \(n\) inverter gates in a loop (with \(n\) odd and greater than 1), in which the output of the last inverter is connected to the input of the first inverter. In Figure 3, once the input voltage is connected to the bit-lines of the crossbar, all of the nodes of the circuit (marked with X), including node Y, start to oscillate with the same frequency but different phases. Changing the bias voltage of the ring-oscillator (node Y) changes the oscillation frequency, so the ring-oscillator can be regarded as a VCO. Simulations show that the change in the oscillation frequency (\(f\)) is linearly related to the change in the bias voltage (\(V_\text{bias}\)): \(\Delta f = K\cdot \Delta V_\text{bias}\). The constant \(K\) can be adjusted through the number and size of the inverters' transistors. For different resistive states of the crossbar, the bias voltage differs, but the voltage differences can be very small. These closely spaced voltage levels need to be translated into different frequencies; thus, choosing a larger \(K\) helps to improve the resolution. Moreover, the value of \(K\) affects other features of the circuit, for instance, its ability to tolerate the variability in the resistive memories, which is discussed in Section 3.2.
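The role of \(K\) in separating two close bias voltages can be sketched with the linear model \(\Delta f = K\cdot \Delta V_\text{bias}\). The reference frequency, reference voltage, counting window, and the two values of \(K\) are assumptions made for this example only.

```python
def vco_frequency(v_bias, k, f0=1.05e9, v0=0.5):
    """Linear VCO model: f = f0 + K * (v_bias - v0); f0 and v0 are assumed."""
    return f0 + k * (v_bias - v0)


def pulse_count(freq, window):
    """Number of complete oscillation periods inside the counting window."""
    return int(freq * window)


V_A, V_B = 0.500, 0.504   # two closely spaced bias voltages (assumed)
WINDOW = 10e-9            # counting window in seconds (assumed)

for k in (1e9, 50e9):     # VCO gain in Hz/V (assumed)
    n_a = pulse_count(vco_frequency(V_A, k), WINDOW)
    n_b = pulse_count(vco_frequency(V_B, k), WINDOW)
    print(f"K = {k:.0e} Hz/V: counts {n_a} and {n_b}")
```

With the smaller gain, both voltages yield the same count within the window; with the larger gain, the counts differ and the two states become distinguishable.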
3.1.4 Counter and Lookup Table.
The counter is also a crucial module in the proposed VCO-based ADC design. The bit-line current is eventually transformed into a number of pulses. To digitize them, the generated pulses are counted over a fixed period. As discussed earlier, a high speed of the ring-oscillator (high \(K\)) is necessary for an acceptable ADC resolution. The oscillation frequency, however, is limited by the speed of the counter, that is, by how fast the counter can count. In sequential circuits such as counters, two timing parameters determine the speed: data-path length and setup time. Data-path length is the delay of the circuit’s critical path, and setup time is the amount of time for which the data need to be stable before the active edge of the clock. To avoid timing errors in the counter module, the period of the clock (the signal produced by the ring-oscillator) must be greater than or equal to the sum of the data-path delay and the setup time. For a fixed resolution, the speed of the ring-oscillator must therefore be adjusted to the counting frequency of the counter, so the latency is determined only by the counter module. We have selected an asynchronous ripple counter as the counter module in our design because of its area efficiency. An N-bit ripple counter can count \(2^N\) states, which is the maximum number of countable states among counter structures. Moreover, the ripple counter does not need any logic between its flip-flops to function correctly. Finally, the output of the counter must be mapped to a proper digital representation. This mapping is performed by the LUT.
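The timing constraint and the behavior of the ripple counter can be summarized in a brief behavioral sketch. The delay values and the LUT contents are hypothetical and serve only to illustrate the mechanism; the class and function names are our own.

```python
def max_clock_frequency(t_datapath, t_setup):
    """Highest clock frequency that satisfies T_clk >= t_datapath + t_setup."""
    return 1.0 / (t_datapath + t_setup)


class RippleCounter:
    """Behavioral model of an N-bit asynchronous ripple counter."""

    def __init__(self, n_bits):
        self.bits = [0] * n_bits   # bits[0] is the least significant stage

    def pulse(self):
        # Stage 0 toggles on every input pulse; each further stage toggles
        # only when the previous stage falls from 1 to 0.
        for i in range(len(self.bits)):
            self.bits[i] ^= 1
            if self.bits[i] == 1:   # rising edge: the ripple stops here
                break

    def value(self):
        return sum(bit << i for i, bit in enumerate(self.bits))


# Hypothetical LUT: maps a 4-bit pulse count to a 3-bit output code.
LUT = {count: min(count // 2, 7) for count in range(16)}

counter = RippleCounter(4)
for _ in range(11):
    counter.pulse()
print(counter.value(), LUT[counter.value()])
print(f"{max_clock_frequency(150e-12, 50e-12) / 1e9:.1f} GHz")  # assumed delays
```

In this model, an N-bit counter wraps around after \(2^N\) pulses, and no logic is needed between the stages.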
3.1.5 Self-Timing Path (STP).
The premise of the Self-Timing Path (STP) is to measure the time that a column with known resistance states takes to produce a known output and then to use that time interval as the total time (\(T_\text{total}\)) of the required operation. To implement this technique, a dummy column in the same crossbar array is used, in which all resistive values are set to the LRS. As soon as the number of output pulses for this dummy column reaches the number of applied non-zero row voltages, it triggers the other columns to stop counting. This dummy column thus provides a variation-aware \(T_\text{total}\). The acquired \(T_\text{total}\) adapts to global variations of the CMOS and ReRAM devices, temperature, and fluctuations in the peripheral/core voltage supplies [65].
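The self-timing idea can be sketched as follows. The sketch assumes, as a simplification, that a global variation scales the oscillation frequency of every column, including the dummy column, by the same factor; the frequencies and names are illustrative.

```python
def stp_counts(column_freqs, dummy_freq, n_active_rows):
    """Stop all counters when the dummy column has produced n_active_rows pulses."""
    t_total = n_active_rows / dummy_freq             # self-timed window
    counts = [int(f * t_total) for f in column_freqs]
    return t_total, counts


NOMINAL = [5.5e9, 3.25e9, 7.6e9]   # oscillation frequencies of three columns (assumed)
DUMMY = 8e9                        # frequency of the all-LRS dummy column (assumed)

t_nom, counts_nom = stp_counts(NOMINAL, DUMMY, 8)

# Global slow-down by 20 percent, applied to every column alike (assumption).
SCALE = 0.8
t_slow, counts_slow = stp_counts([f * SCALE for f in NOMINAL], DUMMY * SCALE, 8)

print(t_nom, counts_nom)
print(t_slow, counts_slow)
```

The window \(T_\text{total}\) stretches when all columns slow down, so the counts of the data columns stay the same.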
3.2 Theoretical Analysis
In this subsection, we theoretically investigate the impact of each element on the whole circuit.
• Access Transistor: As mentioned in Section 2, 1T1R bit-cells are beneficial both for removing the sneak paths and for improving the writing process. We have chosen an NMOS access transistor located between the source-line and the ReRAM device. A different type (PMOS) or a different location (between the ReRAM device and the bit-line) affects the features of the ADC since the resistance of the access transistor varies under these conditions. Further discussion of this matter is beyond the scope of this article.
• VCO-Impedance: As discussed earlier, we have realized the VCO with a ring-oscillator. During oscillation, the gate voltage of the inverters changes, which in turn varies the resistance of the inverters and of the whole ring-oscillator. The oscillating nature of the ring-oscillator cannot be emulated by a resistance alone; a capacitive element, for instance, is also required to model the oscillation correctly. Thus, we substitute a varying impedance for the ring-oscillator. Because the output of the crossbar array (node Y in Figure 3) oscillates, the source voltage of the access transistors also changes, which results in a time-variant equivalent resistance of the crossbar. Note that the resistive states of the ReRAM devices are data dependent and determined independently (e.g., during the training phase of neural networks). Different data produce a different time-variant equivalent resistance and a different time-variant ring-oscillator impedance. Figure 7 shows a simplified schematic of the VCO-based ADC in which a time-variant \(R_\text{eq}\) is substituted for the crossbar and an \(R_\text{eq}\)-dependent, time-variant impedance (\(Z_\text{os}\)) for the ring-oscillator.
Based on Figure 7, we can theoretically evaluate the features of the VCO-based ADC, such as resolution, variability tolerance, power, energy, and the voltage across the ReRAM devices.
Resolution: As Figure 4(a) also shows, when the number of ReRAM devices in parallel increases, the differences between the resistive states narrow and the states become more difficult to distinguish. For \(N\) parallel ReRAM devices, the two resistive states \(\frac{R_\text{LRS}}{N}\) and \(\frac{R_\text{LRS}}{N-1}||R_\text{HRS}\) have the minimum resistance difference. The resolution of the ADC is specified by these two closest resistive states: the ADC can distinguish between \(N\) levels as long as it is able to produce different outputs for these two states. The ring-oscillator can generate distinct numbers of pulses for these states if its bias voltage (node Y in Figure 3) takes different corresponding values. The voltage at node Y is determined by the voltage division between \(R_\text{eq}\) and \(Z_\text{os}\), and the voltage difference between these two resistive states is obtained via
\[\Delta V_\text{Y} = V_\text{read}\cdot \left(\frac{Z_{\text{os}_1}}{\frac{R_\text{LRS}}{N}+Z_{\text{os}_1}} - \frac{Z_{\text{os}_2}}{\left(\frac{R_\text{LRS}}{N-1}||R_\text{HRS}\right)+Z_{\text{os}_2}}\right). \qquad (3)\]
We must highlight that the bias voltage of the ring-oscillator (node Y) is also oscillating; however, to benefit from the abstract model of the VCO, we consider the effective (DC) value of the voltage at node Y. In DC analysis, capacitors behave as open circuits; thus, \(Z_{\text{os}}\) is purely resistive. As Equation (3) shows, when the number of ReRAM devices (\(N\)) increases, the first terms in the denominators become negligible and the voltage difference between these two states approaches zero. With smaller \(Z_{\text{os}_{1,2}}\) terms, \(N\) can grow larger before the first terms in the denominators of Equation (3), which contain \(N\), become negligible in comparison with \(Z_{\text{os}_{1,2}}\). As discussed before, the bit-line current needs to be converted into the voltage at the input of the VCO via different impedance elements. In Equation (3), the impedance of such a converter is in parallel with \(Z_{\text{os}_{1,2}}\) and makes these terms smaller. This results in a higher number of distinguishable levels. Thus, a current-to-voltage converter is generally beneficial for the resolution. However, even without any converter, the bit-line current is transformed into a voltage by the impedance of the ring-oscillator alone. Now, let us investigate the impact of the constant resistor and of the diode-connected element on the resolution. The diode-connected NMOS element can improve the resolution more than the constant resistor. It can be considered a voltage-controlled resistance whose value equals \(\frac{2}{\mu _\text{n} \cdot C_\text{OX} \cdot \frac{W}{L}} \cdot \frac{V_\text{GS}}{(V_\text{GS}-V_\text{th})^2}\) (\(V_\text{GS}\) is the gate-source voltage of the NMOS). When \(N\) increases, \(V_\text{GS}\) gets larger and the resistance of the diode-connected NMOS decreases. Hence, the terms \(Z_{\text{os}_{1,2}}\) get smaller, resulting in a higher resolution. It is worth mentioning that the nominal values of the LRS and HRS also have an impact on the resolution. For larger values of the LRS and HRS, \(N\) can be larger before the two closest resistive states become indistinguishable.
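The trend described in this paragraph can be reproduced numerically. The sketch below assumes that node Y is set by a resistive voltage divider between \(R_\text{eq}\) and \(Z_\text{os}\), that \(Z_\text{os}\) is the same constant for both states (a simplification of \(Z_{\text{os}_{1,2}}\)), and that \(V_\text{read}\) is 0.3 V; these values and the function names are assumptions.

```python
V_READ = 0.3                 # read voltage in volts (assumed)
R_LRS, R_HRS = 3e3, 30e3     # nominal device resistances in ohms


def parallel(a, b):
    return a * b / (a + b)


def delta_v(n, z_os):
    """Node-Y voltage gap between 'N cells in LRS' and 'N-1 in LRS, one in HRS'."""
    r_all_lrs = R_LRS / n
    r_one_hrs = parallel(R_LRS / (n - 1), R_HRS)
    v_all_lrs = V_READ * z_os / (r_all_lrs + z_os)
    v_one_hrs = V_READ * z_os / (r_one_hrs + z_os)
    return v_all_lrs - v_one_hrs


for n in (2, 4, 8, 16):
    print(f"N = {n:2d}: dV = {1e3 * delta_v(n, 1e3):6.2f} mV (Z_os = 1 kOhm), "
          f"{1e3 * delta_v(n, 5e3):6.2f} mV (Z_os = 5 kOhm)")
```

In this model, the gap shrinks monotonically with \(N\), and for larger \(N\) the smaller \(Z_\text{os}\) leaves a wider gap, which corresponds to a larger number of distinguishable levels.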
Variability tolerance: Due to the stochastic nature of the switching in ReRAM devices, the HRS and LRS resistances are not fixed values; rather, each spans a range. It is therefore desirable for the ADC to tolerate the variability in the ReRAM resistive states, and the proposed VCO-based ADC can do so. To investigate this variability tolerance theoretically, we must first emphasize that, in contrast to the DC analysis above, the capacitors also play a role. The impedance of the ring-oscillator has both a resistive and a capacitive nature. In addition, the parasitic bit-line capacitor has an impedance of magnitude \(\frac{1}{2\pi f C}\), in which \(f\) is the frequency of the voltage and \(C\) is the capacitance of the bit-line capacitor.
To evaluate the tolerance of the VCO-based ADC to variability in the resistive memories theoretically, the change in the frequency with respect to the change in the equivalent resistance must be checked, that is, \(\mid \!\!\frac{ \partial f}{\partial R_\text{eq}}\!\!\mid\). The smaller this term is, the better the variability tolerance. Note that it is the absolute value of \(\frac{ \partial f}{\partial R_\text{eq}}\) that matters for the variability tolerance. According to the chain rule:
\[\frac{\partial f}{\partial R_\text{eq}} = \frac{\partial f}{\partial V_\text{bias}}\cdot \frac{dV_\text{bias}}{dR_\text{eq}}. \qquad (4)\]
In Equation (4), the term \(\frac{\partial f}{ \partial V_\text{bias}}\) is the speed of the ring-oscillator and equals \(K\). The term \(\frac{dV_\text{bias}}{dR_\text{eq}}\) is calculated according to Equation (5), and combining Equations (4) and (5) yields Equation (6). Here, \(Z_\text{os}\) and \(R_\text{eq}\) are in the range of k\(\Omega\), \(f\) is in the range of GHz, and \(C\) is in the range of fF. Thus, the terms \((R_\text{eq}\cdot Z_\text{os}\cdot 2\pi fC)\) and \((2\pi \cdot Z_\text{os}^2\cdot R_\text{eq}\cdot C)\) are negligible. Since the term \((\frac{dZ_\text{os}}{dR_\text{eq}}\cdot R_\text{eq}-Z_\text{os})\) is negative, taking the absolute value gives the final formula for the variability tolerance, Equation (7).
Equation (7) shows that increasing the parasitic bit-line capacitance decreases the variability tolerance. The nominal values of the LRS and HRS also have an impact on the variability tolerance. For higher resistances, the speed of the ring-oscillator (the constant \(K\)) must be higher, since similar data patterns must result in similar digital outputs regardless of the resistive values of the LRS and HRS. Increasing the nominal values of the LRS and HRS increases \(R_\text{eq}\), which forces the bias voltage of the ring-oscillator (node Y in Figure 3) to decrease. The number of generated pulses, however, must remain the same, so the same frequency has to be produced with a lower bias voltage. A higher ring-oscillator speed is detrimental to the variability tolerance, but the larger LRS and HRS values contribute quadratically, which overcompensates for the former effect. To conclude, increasing the LRS and HRS values improves the variability tolerance.
It is interesting to note that adding an impedance element (such as the diode-connected structure in Figure 3) decreases the variability tolerance by decreasing the \(Z_\text{os}\) term in the denominator of Equation (7). This may seem confusing, as \(Z_\text{os}\) is present in both the numerator and the denominator with equal exponent. These two terms do not cancel each other out, since the first term of the numerator \((Z_\text{os}-R_\text{eq}\cdot \frac{\scriptstyle dZ_\text{os}}{\scriptstyle dR_\text{eq}})\) is much larger than the second term \((2\pi f\cdot C\cdot Z_\text{os}^2)\). Thus, the first term is dominant and the second term can be neglected here. However, if the only changing variable in the circuit is \(C\), then the only changing term is \((2\pi f\cdot C\cdot Z_\text{os}^2)\). In that case, the effect of \(C\) on the variability tolerance must be considered.
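The terms discussed above can be traced with a short derivation. It is a sketch under our own assumptions and not necessarily the exact form used for the variability tolerance: node Y is modeled as a voltage divider between \(R_\text{eq}\) and the parallel combination of \(Z_\text{os}\) with the bit-line capacitor, only impedance magnitudes are used, \(Z_\text{os}\) depends on \(R_\text{eq}\), and \(f\) depends on \(R_\text{eq}\) through \(V_\text{bias}\).

```latex
% Assumed model: magnitude-only voltage divider at node Y, with the bit-line
% capacitor (impedance 1/(2 pi f C)) in parallel with Z_os.
\begin{align*}
V_\text{bias} &= V_\text{read}\cdot
  \frac{Z_\text{os}}{R_\text{eq}+Z_\text{os}+R_\text{eq}\cdot Z_\text{os}\cdot 2\pi fC},\\[4pt]
\frac{dV_\text{bias}}{dR_\text{eq}} &= V_\text{read}\cdot
  \frac{\left(\frac{dZ_\text{os}}{dR_\text{eq}}\cdot R_\text{eq}-Z_\text{os}\right)
        -2\pi f\cdot C\cdot Z_\text{os}^2
        -2\pi\cdot Z_\text{os}^2\cdot R_\text{eq}\cdot C\cdot\frac{\partial f}{\partial R_\text{eq}}}
       {\left(R_\text{eq}+Z_\text{os}+R_\text{eq}\cdot Z_\text{os}\cdot 2\pi fC\right)^2},\\[4pt]
% Insert into the chain rule, drop the two negligible terms, take the absolute value:
\left|\frac{\partial f}{\partial R_\text{eq}}\right| &\approx K\cdot V_\text{read}\cdot
  \frac{\left(Z_\text{os}-R_\text{eq}\cdot\frac{dZ_\text{os}}{dR_\text{eq}}\right)
        +2\pi f\cdot C\cdot Z_\text{os}^2}
       {\left(R_\text{eq}+Z_\text{os}\right)^2}.
\end{align*}
```

In this form, a larger \(C\) increases the numerator and hence reduces the variability tolerance, while \(R_\text{eq}\) and \(Z_\text{os}\) enter the denominator quadratically, in line with the discussion above.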
Power consumption: In resistive elements, power equals \(\frac{V^2}{R}\). Here, \(R\) is the total resistive load of the circuit, which is the series connection of the crossbar and the whole VCO-based ADC. Since the data manifest themselves as resistance states, the power consumption is data dependent. The maximum and minimum power consumption occur for all LRS and all HRS, respectively. The power is also affected by the nominal values of the LRS and HRS, as it decreases when \(R_\text{eq}\) increases. For a fixed resolution, the latency is determined only by the counter and is not affected by the data or by the nominal LRS and HRS values. The energy is the power-latency product and is therefore affected by the data and the nominal values of the LRS and HRS in the same way as the power.
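The data dependence of power and energy can be illustrated with a minimal resistive model. The read voltage, the constant \(Z_\text{os}\), and the latency are assumed values, and the ADC is reduced to a single series resistance for simplicity.

```python
V_READ = 0.3                     # read voltage in volts (assumed)
Z_OS = 2e3                       # resistive load of the VCO-based ADC (assumed)
LATENCY = 10e-9                  # counter-determined latency in seconds (assumed)
R_LRS, R_HRS = 3e3, 30e3
R_T_LRS, R_T_HRS = 5.8e3, 26e3   # transistor resistances from Section 3.1.1
N_CELLS = 8


def r_eq(n_lrs):
    """Equivalent crossbar resistance with n_lrs of the N_CELLS cells in the LRS."""
    g = n_lrs / (R_LRS + R_T_LRS) + (N_CELLS - n_lrs) / (R_HRS + R_T_HRS)
    return 1.0 / g


def power(n_lrs):
    """P = V^2 / R with R the series connection of crossbar and ADC load."""
    return V_READ ** 2 / (r_eq(n_lrs) + Z_OS)


def energy(n_lrs):
    """Energy as the power-latency product; the latency does not depend on the data."""
    return power(n_lrs) * LATENCY


for k in (0, 4, 8):
    print(f"{k} LRS cells: P = {1e6 * power(k):5.1f} uW, "
          f"E = {1e15 * energy(k):6.1f} fJ")
```

Because the latency is fixed, the energy follows the same data dependence as the power, with the all-LRS pattern as the worst case.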
Voltage across the memory cells: As discussed in Section 2, read-disturb phenomena are more likely to occur at high voltages across the ReRAM devices. This voltage is also affected by the data and by the nominal values of the LRS and HRS. The largest voltage across the devices occurs for all HRS, since \(R_\text{eq}\) is maximal in this case. Larger (smaller) nominal values of the LRS and HRS also result in a higher (lower) voltage across the ReRAM devices and thus in a higher (lower) probability of read disturb.
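The same voltage-divider picture gives a rough estimate of the voltage across the cells. The sketch assumes a constant \(Z_\text{os}\), a read voltage of 0.3 V, and transistor resistances that stay fixed when the nominal LRS and HRS values are changed; all of these are simplifications made for illustration.

```python
V_READ = 0.3                     # read voltage in volts (assumed)
Z_OS = 2e3                       # resistive load at node Y in ohms (assumed)
R_T_LRS, R_T_HRS = 5.8e3, 26e3   # kept fixed here (simplifying assumption)
N_CELLS = 8


def cell_voltages(n_lrs, r_lrs=3e3, r_hrs=30e3):
    """Return the voltage across the crossbar, one LRS device, and one HRS device."""
    g = n_lrs / (r_lrs + R_T_LRS) + (N_CELLS - n_lrs) / (r_hrs + R_T_HRS)
    r_eq = 1.0 / g
    v_crossbar = V_READ * r_eq / (r_eq + Z_OS)
    v_lrs_device = v_crossbar * r_lrs / (r_lrs + R_T_LRS)
    v_hrs_device = v_crossbar * r_hrs / (r_hrs + R_T_HRS)
    return v_crossbar, v_lrs_device, v_hrs_device


print(cell_voltages(0))             # all HRS
print(cell_voltages(8))             # all LRS
print(cell_voltages(4, 6e3, 60e3))  # doubled nominal LRS and HRS values
```

In this model, the all-HRS pattern and larger nominal resistances both raise the voltage that drops across the crossbar, and with it the voltage seen by the devices.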