Conventional von Neumann architectures cannot successfully meet the demands of emerging computation- and data-intensive applications. These shortcomings can be addressed by embracing new architectural paradigms using emerging technologies. In particular, Computation-In-Memory (CiM) using emerging technologies such as Resistive Random Access Memory (ReRAM) is a promising approach to meet the computational demands of data-intensive applications such as neural networks and database queries. In CiM, computation is done in an analog manner; digitization of the results is costly in several aspects, such as area, energy, and performance, which hinders the potential of CiM. In this article, we propose an efficient Voltage-Controlled-Oscillator (VCO)–based analog-to-digital converter (ADC) design to improve the performance and energy efficiency of the CiM architecture. Due to its efficiency, the proposed ADC can be assigned in a per-column manner instead of sharing one ADC among multiple columns. This will boost the parallel execution and overall efficiency of the CiM crossbar array. The proposed ADC is evaluated using a Multiplication and Accumulation (MAC) operation implemented in ReRAM-based CiM crossbar arrays. Simulation results show that our proposed ADC can distinguish up to 32 levels within 10 ns while consuming less than 5.2 pJ of energy. In addition, our proposed ADC can tolerate ≈30% variability with a negligible impact on the performance of the ADC.

1 Introduction

Existing complementary metal-oxide-semiconductor (CMOS)–based von Neumann architectures, in which the memory and computing units are separated, are facing various device-, circuit-, and architecture-level challenges. These conventional architectures are severely impacted by the slow speed of memory accesses, their limited parallelism, and the stagnation of the clock frequency due to thermal issues, which are well known as the memory wall, the instruction-level parallelism wall, and the power wall, respectively [1, 2, 3]. On the other hand, devices are also facing challenges related to reliability, high leakage, and excessive manufacturing cost [4, 5]. These challenges are more pronounced for data-intensive applications such as neuromorphic computing in which energy efficiency and data movement minimization are of paramount importance [6]. For such application domains, the need to identify viable alternatives has gained growing attention. In this regard, with non-volatility, zero-leakage, and high-density properties, the resistive memory-based Computation-in-Memory (CiM) crossbar structure is a potential candidate to replace traditional architectures and accelerate data-intensive applications [7, 8].

CiM architectures can significantly improve both energy efficiency and performance of computing systems by performing the operation within the memory, which avoids expensive data movement [9, 10, 11]. To enable CiM, the memory module must support operations in addition to its functionality as data storage [12, 13]. CiM can be realized using different memory technologies, such as Dynamic Random Access Memory (DRAM) [14] and Static Random Access Memory (SRAM) [15], as well as emerging resistive-memory technologies such as Resistive Random Access Memory (ReRAM), Phase Change Memory (PCM) and Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM), in which data are represented by resistance states [16]. Since these memory technologies have their own specific properties, they can be used for specific CiM operations [17]. The physical attributes of these resistive memories make them inherently suitable for Multiply-and-Accumulate (MAC) operations in CiM architectures. For example, matrix multiplication (vector-matrix multiplication) using MACs is a frequent operation in several Artificial Intelligence (AI) and signal processing applications. This operation can be accelerated by CiM architecture organized in a crossbar structure, which is performed on the physical level via Ohm’s Law for the multiplication and Kirchhoff’s Law for the accumulation [18].

Since MAC acceleration in a crossbar structure is fundamentally an analog operation, interfaces for connecting digital and analog parts are required and represent a critical part of the design. Digital-to-Analog Converters (DACs) and Analog-to-Digital Converters (ADCs) are crucial components to handle the CiM inputs and outputs, respectively. Due to the complexity of digital-to-analog and analog-to-digital conversion, DACs and ADCs usually occupy a large area and consume significant power compared with the crossbar array itself. Moreover, the ADC phase is considered to be the bottleneck in terms of performance and energy efficiency of CiM architectures [19, 20]. Hence, efficient ADC design is extremely important in order to improve the overall performance and energy consumption of CiM architectures.

Several efforts have been made to improve the efficiency of the ADC phase, either through different circuit methodologies or different system architectures. For example, the authors of [19, 20, 21, 22] proposed a shared ADC scheme in which the ADC is shared between multiple columns in order to improve area and energy efficiency. In their design, the number of columns that share the same ADC depends on both the architecture and the ADC features. The authors of [23, 24], on the other hand, suggested a dedicated ADC design, in which one ADC interface is assigned per column to avoid time-multiplexing schemes to access a shared ADC. In order to amortize the design overheads, the authors of [25] proposed merging the ADC phase with the activation computation phase. Due to the trade-off between different features of the ADC (such as resolution, power, latency, and area), the previous ADC designs are usually optimized for a specific feature. Moreover, the ADC designs in [19, 21, 22, 23, 24] do not consider variability and reliability aspects, which have a severe impact on the resolution as well as the energy efficiency of the ADC. Therefore, an efficient ADC design addressing the impact of variability and reliability issues is of decisive importance in order to improve the performance, area, and energy efficiency of CiM architectures.

In this article, we propose an efficient Voltage-Controlled-Oscillator (VCO)–based ADC design for ReRAM-based CiM crossbar arrays to alleviate the ADC phase bottleneck of analog computation. In our proposed ADC design, the bit-line current as an analog signal coming from the crossbar is first transformed into a voltage. Then, this voltage is transformed into a frequency with the help of the VCO, which is realized using a ring oscillator. Subsequently, the generated pulses are counted with a counter. The output of the counter must be unique for different input combinations. We also theoretically investigate different features of the proposed VCO-based ADC, such as its variability tolerance. To ensure the compatibility of the ADC and the ReRAM devices, we perform real device-level measurements and programming methods. Our contributions in this article are as follows:

• Design of a new VCO-based ADC using a ring-oscillator circuit for a crossbar array of resistive memories.

• Theoretical evaluation of reliability and variation tolerance of the proposed ADC design.

• Demonstration of the influence of variation based on real fabricated and characterized cells.

• Comprehensive comparison done with the state-of-the-art ADC designs.

Simulation results show that our proposed ADC can distinguish up to 32 levels within 10 ns while consuming less than 5.2 pJ of energy. In addition, our proposed ADC can tolerate \(\approx\) 30% variability with a negligible impact on the performance of the ADC. The voltage across the ReRAM devices does not exceed 0.3 V during operation. Thus, the data stored in the resistive memories will not suffer from read-disturb.

The rest of the article is organized as follows. Section 2 presents basic information about the CiM architecture and related work, followed by the details of the proposed VCO-based ADC design and its circuit-level theoretical analysis in Section 3. Section 4 presents the device and circuit level results and introduces a Figure of Merit (FoM) for further comparison with state-of-the-art designs. Section 5 contains our conclusions.

2 Background

2.1 Computation-in-Memory

For data-intensive applications, the demand for high performance and energy efficiency is growing steadily, and existing architectures are incapable of fulfilling these requirements [26, 27]. Computation-in-Memory (CiM) architectures can significantly improve the performance and energy efficiency of data- and computation-intensive applications by performing operation and storage at the same physical location. Hence, CiM architectures remove the overheads associated with the costly data movements performed by conventional von Neumann architectures [28].

Resistive memories — such as PCM [29], STT-MRAM [30], and ReRAM [31] — are promising alternatives to conventional memories such as DRAM or SRAM due to their attractive features, such as high density, almost zero leakage, and non-volatility. Also, due to their resistive characteristics, they are suitable for CiM architectures. In addition, various data-intensive applications and operations, such as MAC, can benefit from the computation in the crossbar organization of the resistive memories. Figure 1 shows the general architecture of the CiM unit (accelerator), built using a crossbar array of resistive memories. As shown by the digital input arrow in Figure 1, two sets of data are provided as an input to the CiM module. The first input set is stored in the resistive cells, which will be represented as the cell’s conductance. This write operation is performed by using write buffer and driver circuits. The other input set is provided by applying input voltages to the crossbar. In this case, the decoder module selects the address for the correct location where the input voltage is provided. As the nature of the computation in the crossbar is analog, the digital data must first be converted into an analog signal. Converting the digital input to the corresponding analog voltage is done through DAC modules and drivers. Similarly, the analog output signal of the crossbar needs to be converted into a digital signal again using ADCs, as the CiM accelerator module is a part of the digital host unit. These stages are controlled via a control unit. Figure 1 also shows how a resistive memory–based crossbar array can efficiently perform MAC operations in an analog manner by using Ohm’s Law and Kirchhoff’s Current Law. According to Ohm’s Law, current is the product of voltage (V; coming from the DAC modules) and conductance (G; these conductances are usually adjustable according to the training procedure [32]). 
According to Kirchhoff’s Current law, the bit-line current \(I_\text{bit-line}\) is equal to the current summation of all of the resistive cells (see Equation (1)) as follows:

\begin{equation} I_\text{bit-line} = [V_1 , V_2 , V_3]\times [G_1 , G_2 , G_3]^T = V_1\times G_1 + V_2\times G_2 + V_3\times G_3 . \tag{1} \end{equation}
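
As a minimal numerical sketch of Equation (1), the bit-line current can be evaluated as the dot product of the row voltages and the cell conductances. The function name `bitline_current` and all voltage and conductance values below are illustrative assumptions, not values taken from this article.

```python
def bitline_current(voltages, conductances):
    """Sum the per-cell currents V_i * G_i that flow into one bit line."""
    if len(voltages) != len(conductances):
        raise ValueError("one input voltage per cell is required")
    # Ohm's Law gives each cell current, Kirchhoff's Current Law sums them.
    return sum(v * g for v, g in zip(voltages, conductances))


# Assumed example: rows 1 and 3 are driven with 0.2 V, row 2 is inactive.
V = [0.2, 0.0, 0.2]                  # volts
G = [1 / 3e3, 1 / 3e3, 1 / 30e3]     # siemens (assumed 3 kOhm and 30 kOhm cells)

I_bl = bitline_current(V, G)
print(f"I_bit-line = {I_bl * 1e6:.2f} uA")
```

Only the cells whose row is driven contribute to the sum, which is why a binary input vector selects the conductances that are accumulated.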

Fig. 1.

2.1.1 ReRAM Technology.

Resistive switching devices have recently gained widespread attention due to their non-volatility, high integration density and their ability to overcome memory bandwidth issues by executing operations within the memory [1]. These properties make them attractive for various applications ranging from non-volatile memory [33] and logic based on computation in memory [34] to neuromorphic computing [35]. Among the various resistively switching technologies, bipolar resistive switching cells based on the Valence Change Mechanism (VCM) show great promise for various kinds of applications due to their properties such as non-linearity, multilevel capability, and stochastic switching behavior. VCM devices consist of a metal/mixed ion-electron conductor/metal structure. VCM switching has been recognized in devices based on various oxide materials such as HfO_x, TaO_x, ZrO_x, STO, or TiO_x [16, 36]. Oxide materials such as HfO_x are already compatible with conventional CMOS processes. If they are fabricated in a crossbar structure, they offer a high density of 4F², where F is the minimum feature size of the process technology. It should be noted that the original memristor publication by HP, which was inspired by the memristor circuit theory of Leon Chua, also describes a VCM cell based on TiO_x [37]. In a VCM cell, the two metal electrodes possess different work functions towards the oxide layer, as shown in Figure 2(a). The electrode with the higher work function forms a Schottky-type contact with the oxide, whereas the electrode with the lower work function forms an Ohmic-type contact. The Schottky-type electrode is then denoted as Active Electrode (AE) while the Ohmic-type electrode is called Ohmic Electrode (OE) [38]. Directly after fabrication, the oxide is insulating, which makes a forming step necessary. During forming, a high voltage is applied, which locally reduces the oxide layer. This generates positively charged oxygen vacancies and reduces the resistance of the device. 
In filamentary switching systems, this reduction is confined to a small portion of the total cell area. During the SET operation, the concentration of oxygen vacancies increases in the vicinity of the AE, which reduces the resistance of the device. In this condition, the ReRAM cell is in the Low Resistance State (LRS). The SET occurs when a negative voltage is applied to the AE. The RESET occurs for a positive voltage applied to the AE, which repels oxygen vacancies from the AE and causes the ReRAM cell to be in the High Resistance State (HRS) [39]. Figure 2(b) shows these two states in an I-V curve, and Figure 2(c) shows the ReRAM 1T1R bit-cell structure. The 1T1R structure is mainly required to program and verify-read the individual conductance values in each cell. During the read of individual cells, sneak current paths, in which unwanted leakage current in the crossbar structure leads to a deviation of the results, may cause errors [40], especially in analog computation. Besides solving the problem of sneak paths, 1T1R structures enable analog programming of the memristors with low standard deviation [41], since they enable accurate measurement of the programmed values. There are different ways to organize 1T1R bit-cells into 1T1R arrays, such as the 1T1R memory array or the Pseudo-Crossbar array structure [42]. In this article, we focus on the Pseudo-Crossbar array since it was specifically developed to enable MAC computation.

Fig. 2.

One of the biggest challenges for the usage of VCM-based ReRAM cells is their variability, which stems from their stochastic and atomistic switching behavior [43] and which is observed, for example, in the variability of the HRS and LRS [44, 45]. In the development of circuits and architectures, this variability has to be addressed as per the application requirements. The resistance distribution of the LRS usually follows a normal distribution while the HRS distribution follows a log-normal distribution [46]. Additionally, during read operation, it has to be ensured that the voltage dropping across the ReRAM device is not too high in order to prevent read disturb, which is the changing of the device state due to consecutive read operations. It has been shown that low voltages will also lead to a switching of the devices, albeit on a much longer time scale [47]. While read disturb cannot completely be eliminated as long as there is a voltage drop across the ReRAM cells during read, having this voltage small for a few nanoseconds per read means that read disturb will happen only after billions of read cycles.
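
The two distribution shapes mentioned above can be sketched with the Python standard library. This is only an illustration of how normally distributed LRS values and log-normally distributed HRS values could be sampled; the mean, median, and spread parameters are assumptions chosen for the example and are not measured values from this article.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed distribution parameters (illustrative only).
LRS_MEAN, LRS_SIGMA = 3e3, 0.3e3        # ohms, normal distribution
HRS_MEDIAN, HRS_LOG_SIGMA = 30e3, 0.3   # ohms, log-normal distribution


def sample_lrs():
    """Draw one LRS resistance from a normal distribution."""
    return random.gauss(LRS_MEAN, LRS_SIGMA)


def sample_hrs():
    """Draw one HRS resistance from a log-normal distribution.

    The median of a log-normal distribution is exp(mu), so mu = ln(median).
    """
    return random.lognormvariate(math.log(HRS_MEDIAN), HRS_LOG_SIGMA)


lrs_samples = [sample_lrs() for _ in range(10_000)]
hrs_samples = [sample_hrs() for _ in range(10_000)]

print(f"LRS mean   = {statistics.mean(lrs_samples):.0f} ohm")
print(f"HRS median = {statistics.median(hrs_samples):.0f} ohm")
print(f"HRS mean   = {statistics.mean(hrs_samples):.0f} ohm")
```

Because the log-normal distribution is skewed, the HRS mean lies above the HRS median, whereas the LRS mean and median coincide for the normal distribution.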

2.1.2 Analog-to-Digital Converter (ADC).

ADCs are generally used whenever an analog signal is needed as an input for digital modules. The CiM accelerator unit implemented with the ReRAM-based crossbar structure performs the operation in an analog manner. An ADC is required at the output of the crossbar to convert the analog result into the digital form, which can then be used by the digital host to which the CiM accelerator belongs. However, ADCs typically occupy a large area and consume significant energy compared with other components of CiM architectures [48]. Moreover, the major performance bottleneck of CiM architecture comes from the ADC phase. Thus, it is increasingly important to design highly efficient and compact ADCs for CiM operations.

ADCs mainly have two processing steps: (1) Sampling and Holding (S/H) and (2) Quantization and Encoding. The sampling frequency in the S/H step needs to meet the Nyquist rate, that is, it must be at least twice the highest frequency of the input signal. In the quantization phase, the reference signal is partitioned into a number of quanta. Then, the analog input is matched with one of these quanta in the encoding phase, and a unique digital code is assigned to the input reflecting which quantum it belongs to. Different ADC designs implement the quantization and encoding steps in different ways.
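
The quantization and encoding steps can be illustrated with an ideal uniform quantizer. The sketch below is a generic textbook model, not the converter proposed in this article; the function name `quantize` and the 1 V reference are assumptions made for the example.

```python
def quantize(v_in, v_ref, n_bits):
    """Return the binary code of the quantum that v_in falls into.

    The range [0, v_ref) is split into 2**n_bits equally wide quanta.
    """
    levels = 2 ** n_bits
    lsb = v_ref / levels                   # width of one quantum
    code = int(v_in / lsb)                 # index of the matching quantum
    code = max(0, min(levels - 1, code))   # clamp out-of-range inputs
    return format(code, f"0{n_bits}b")     # encode as a fixed-width bit string


# Assumed example: 3-bit conversion with a 1 V reference.
for v in (0.0, 0.3, 0.5, 0.99):
    print(f"{v:.2f} V -> {quantize(v, 1.0, 3)}")
```

All inputs inside the same quantum receive the same code, so two analog values can only be told apart if they are separated by at least one quantum width.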

2.2 Related Works

Various previous works took different approaches to alleviate the ADC bottleneck. Each has its own pros and cons, and they also share common shortcomings. The authors of [21] proposed a shared ADC design in which an ADC is shared between multiple columns to tackle the bottleneck of the ADC phase. The authors used a Successive Approximation Register (SAR) ADC as their shared ADC, and the columns have to access the ADC in a time-multiplexed scheme. The SAR-based ADC compares the output of the DAC module with the original analog signal and stores the result of the comparison in an SAR. Due to this closed-loop comparison, the digital output gets increasingly closer to the analog value. The advantage of the SAR-based ADC is that it is fast, has a high resolution, and can produce all of the output bits in one activation cycle. Due to its large area and high power consumption, however, it has to be shared between multiple columns. The authors of [22] have selected a faster type of ADC, the FLASH ADC, which still needs to be shared between multiple columns. In a FLASH ADC, the analog input is compared with all of the different reference signals simultaneously. This makes the FLASH ADC very fast but also expensive in terms of power and area, as multiple comparators are required that all must work in parallel. The authors of [49] have proposed a memristor-based FLASH ADC with higher density than conventional FLASH ADCs; nevertheless, a FLASH ADC still has a large area and is only useful for low resolutions [50]. Due to the low resolution of the FLASH ADC, it cannot produce all of the output bits in one activation cycle.

The authors of [23] and [24] have adopted a different approach: leveraging small-area and low-power ADC modules that are assigned one per column. Both Integrate and Fire (IF) and Sense Amplifier (SA) ADC modules are slow and have a low resolution. To produce an n-bit output, these ADCs need \(2^n\) activation cycles; hence, the time complexity of these approaches is exponential, \(O(2^n)\). The authors of [23] have used an IF ADC module in which a capacitor is charged by the bit-line current. The voltage of this capacitor accumulates and is compared with a predefined threshold voltage. Each time the voltage of the capacitor reaches the threshold voltage, a spike is generated. Then, these spikes are counted by a counter. The number of pulses determines the equivalent digital value of the analog result of the CiM operation. The authors of [25] also used an IF as their ADC, but they have tried to hide the overhead of the ADC phase by merging it with the activation computation phase, which is required in convolutional or fully connected neural networks. On the other hand, the authors of [24] used an SA. This interface is designed and specified for memory-read operations. By modifying this module, it can also be used as an ADC interface. Reusability is the notable advantage of the SA: the interface required for ADCs in the area of computation in the ReRAM crossbar structure can be obtained by modifying an already designed SA with considerably less effort.
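
The integrate-and-fire principle described above can be sketched with an idealized behavioral model. This is not the circuit of [23]; the capacitor value, threshold voltage, observation window, and the subtract-on-reset behavior are all assumptions chosen for illustration.

```python
def integrate_and_fire_count(i_bitline, c_int, v_th, t_window, dt=1e-12):
    """Count the spikes of an ideal integrate-and-fire stage in a time window.

    The capacitor voltage rises by I*dt/C per time step; each time it reaches
    v_th, one spike is counted and the threshold is subtracted (assumed reset).
    """
    v_cap = 0.0
    spikes = 0
    steps = int(round(t_window / dt))
    for _ in range(steps):
        v_cap += i_bitline * dt / c_int
        if v_cap >= v_th:
            spikes += 1
            v_cap -= v_th
    return spikes


# Assumed parameters (illustrative only).
C_INT = 10e-15      # farads
V_TH = 0.1          # volts
T_WINDOW = 10e-9    # seconds

n_low = integrate_and_fire_count(10e-6, C_INT, V_TH, T_WINDOW)
n_high = integrate_and_fire_count(20e-6, C_INT, V_TH, T_WINDOW)
print(n_low, n_high)
```

In this idealized model the spike count grows linearly with the bit-line current, so separating \(2^n\) current levels requires counting on the order of \(2^n\) spikes, which is in line with the exponential cost noted above.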

Architectural-level solutions for the ADC phase bottleneck have also been considered in [19, 20, 25]. For instance, the authors of [20] have used a shared ADC as well. To decrease the number of ADCs and their activity, they have added an analog dataflow to their architecture. In this method, analog partial summations for large kernels (which need to be mapped on multiple crossbars) can be substituted for the digital partial summations without requiring costly ADC phases. With this technique, they have been able to decrease the number of ADCs and their cycles. The authors of [19] have also used shared ADC interfaces and the idea of analog buffers. In contrast to [20], however, the ADC interface in [19] is a Time-to-Digital Converter (TDC). Due to the low supply voltage and technology scaling, designing an ADC in the voltage domain has become more difficult. Time-based signals, on the other hand, can improve the resolution of the ADC under these restrictions [51]. To benefit from the properties of the time-based signals, the authors of [19] used a TDC and a Digital-to-Time Converter (DTC) instead of an ADC and a DAC, respectively. In this approach, signals are distinguishable from each other by a delay from an initial point. The TDC mechanism can be implemented in a fully digital flow [52, 53, 54, 55, 56, 57]; it not only has lower power consumption but is also less vulnerable to noise and to variations in fabrication process, voltage, and temperature [58]. Table 1 summarizes the advantages and disadvantages of the discussed related works.

Table 1.

Type of ADC phase	Area	Power	Resolution	Speed	ADC assignment
SAR ADC [20, 21]	–	–	++	+	Shared
FLASH ADC [22]	-	-	+	++	Shared
IF [23, 25]	++	++	–	–	Dedicated
SA [24]	++	++	–	–	Dedicated
TDC [19]	–	–	+	+	Shared

Table 1. Summary of the Related Work

Low performance is the common shortcoming of the two general approaches, shared and dedicated ADCs. The main factors for low performance are the time-multiplexing scheme in the first approach and the low resolution in the second approach. In addition to the known features of the ADC (namely, resolution, power, latency, and area), specific characteristics of the resistive memories, such as variability in the resistive states and restrictions on the maximum voltage across the device, must be taken into account in the ADC design. This motivates the need for a specific ADC design that fulfills the application requirements. In this article, we propose a VCO-based ADC. The VCO-based ADC is a type of time-based ADC (unlike SAR and FLASH ADCs) that produces signals with a unique frequency proportional to the analog signal [59]. In addition to the already mentioned properties of the time-based signals, the design of the VCO-based ADC is simpler since it does not need high-performance analog blocks such as amplifiers and DACs. There are several VCO-based ADCs available, such as those described in [60, 61, 62]. However, none of them is applicable for CiM due to their large area and high power consumption. Although the proposed VCO-based ADC does not offer high resolution, it is small enough to allow us to assign one ADC to every column. The VCO-based ADC also accommodates the characteristics of the ReRAM devices, such as variability in their resistive states and the restriction on the maximum voltage across the device.

3 Proposed VCO-based ADC Design

3.1 Design Implementation

In our design, we assume a restricted class of MAC operation in which input voltages and weights (the conductance of the ReRAM devices) can acquire only two distinct values. To convert the analog output signal of the crossbar to a digital signal, we use a VCO-based ADC, which is time based and therefore provides the advantages of time-based signals with a relatively easy design procedure. Converting the analog bit-line current into a digital signal with the VCO-based ADC requires three stages. In the first stage, the analog bit-line current is transformed into an analog voltage. In the next stage, the obtained analog voltage is transformed into pulses with the help of the VCO. In the last stage, the generated pulses are counted with a counter and mapped to the corresponding digital signal with the help of a Lookup Table (LUT). The output of this stage is the equivalent digital signal, which can be processed by the digital host. We modulate the power supply directly by regulating the read voltage (\(V_\text{read}\)) applied to the crossbar (as row voltages). Therefore, deactivating the row \(V_\text{read}\) disables the crossbar as well as the ADC. The schematic of the whole system, including the ReRAM crossbar and the VCO-based ADC, is shown in Figure 3.
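
The last stage, mapping a pulse count to a digital value through an LUT, can be sketched as follows. The expected pulse counts are purely hypothetical placeholders, and decoding to the nearest expected count is an assumption made for this illustration rather than the decoding rule of the proposed design.

```python
def build_lut(counts_per_level):
    """Map each expected pulse count to its level; reject ambiguous counts."""
    lut = {}
    for level, count in enumerate(counts_per_level):
        if count in lut:
            raise ValueError(
                f"levels {lut[count]} and {level} share pulse count {count}"
            )
        lut[count] = level
    return lut


def decode(count, lut):
    """Return the level whose expected count is nearest to the measured one."""
    nearest = min(lut, key=lambda expected: abs(expected - count))
    return lut[nearest]


# Hypothetical pulse counts for levels 0..7 (not simulation results).
EXPECTED_COUNTS = [4, 9, 13, 16, 18, 20, 21, 22]
lut = build_lut(EXPECTED_COUNTS)

print(decode(13, lut))   # exact match
print(decode(14, lut))   # off by one pulse, still decoded to a nearby level
```

The uniqueness check in `build_lut` reflects the requirement that the counter output must differ for different input combinations; when two levels produce the same count, no LUT can separate them.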

Fig. 3.

3.1.1 Linking Crossbar and ADC Using Transfer Functions.

A transfer function is a mathematical function that describes the output of a system for each possible input. In our case, we will consider two systems: the 1T1R crossbar and the VCO-based ADC. For the crossbar, the input will be the specific resistance configuration of the cells that we read out, which is given as R_eq of the crossbar. We will consider the case in which all of the rows in a column are selected. R_eq is formed by the parallel connection of multiple series connections of ReRAM devices and access transistors (see Figure 1). The equivalent resistance of this parallel connection can then be calculated using

\begin{align} R_\text{eq}=\bigg (\frac{\# \textrm { of cells in LRS}}{R_\text{LRS}+R_\text{transistor, LRS}}+\frac{n - \# \textrm { of cells in LRS}}{R_\text{HRS}+R_\text{transistor, HRS}}\bigg)^{-1}, \tag{2} \end{align}

where \(R_\text{LRS}\) and \(R_\text{HRS}\) denote the LRS and HRS resistance, respectively, \(R_\text{transistor, LRS/HRS}\) denotes the drain-source resistance of the transistor connected to a ReRAM cell in the LRS or HRS during readout, and \(n\) denotes the number of cells that are read in parallel with one ADC. The resulting transfer function of the crossbar can be seen in Figure 4(a). It follows from Equation (2) when the number of cells in the LRS is changed from one to eight.

Fig. 4.

In this case, we assume \(R_\text{transistor, HRS}\) to be 26 k\(\Omega\), \(R_\text{transistor, LRS}\) to be 5.8 k\(\Omega\), and the LRS resistance to be 3 k\(\Omega\); the HRS resistance was varied from 15 k\(\Omega\) to 300 k\(\Omega\) to achieve various HRS/LRS ratios. The transistors were operating in the saturation region when connected to an LRS or HRS device. Figure 5(b) shows the load line characteristic of a 1T1R bit-cell with an LRS or HRS ReRAM cell. From this, it is clear that the operating point of the transistor, and hence its resistance, depends on the resistive state of the ReRAM cell.

Fig. 5.

For better clarity, at most eight cells are considered to be read out at the same time. From the crossbar transfer function, it can be seen that a larger (smaller) HRS/LRS ratio leads to a less (more) linear relationship. This is because, for high HRS/LRS ratios, the equivalent resistance is almost exclusively determined by the number of cells in the LRS. The difference in the \(R_\text{eq}\) values decreases strongly for higher numbers of devices in the LRS, which increases the requirements on the ADC performance: it becomes more difficult to distinguish between different levels if more cells in the LRS are connected. In addition, a smaller HRS/LRS ratio will lead to other issues because of an increased influence of read noise and read disturb. When the ratio between HRS and LRS is decreased, the devices become more susceptible to random variations and to stress due to prolonged reading. If a sufficient number of cells are in the LRS, the equivalent resistance is very similar, independent of the HRS/LRS ratio. This means that the transfer function of the crossbar cannot be improved by optimizing the devices and has to be addressed by the ADC.
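
A short numerical sketch of Equation (2) makes the shrinking level spacing visible. It uses the resistance values assumed above with an HRS of 30 k\(\Omega\) (an HRS/LRS ratio of 10, one point of the stated 15 k\(\Omega\) to 300 k\(\Omega\) range) and eight cells per column; the function name `r_eq` is a hypothetical helper.

```python
R_LRS = 3e3       # ohms, LRS resistance
R_T_LRS = 5.8e3   # ohms, transistor resistance in series with an LRS cell
R_HRS = 30e3      # ohms, HRS resistance for an HRS/LRS ratio of 10
R_T_HRS = 26e3    # ohms, transistor resistance in series with an HRS cell


def r_eq(n_lrs, n_cells=8):
    """Equivalent resistance of n_cells parallel 1T1R branches, Equation (2)."""
    if not 0 <= n_lrs <= n_cells:
        raise ValueError("n_lrs must lie between 0 and n_cells")
    conductance = (n_lrs / (R_LRS + R_T_LRS)
                   + (n_cells - n_lrs) / (R_HRS + R_T_HRS))
    return 1.0 / conductance


values = [r_eq(k) for k in range(1, 9)]
# Spacing between neighbouring levels as more cells are in the LRS.
steps = [values[i] - values[i + 1] for i in range(len(values) - 1)]

for k, r in enumerate(values, start=1):
    print(f"{k} LRS cells: R_eq = {r:8.1f} ohm")
print("level spacing:", [round(s, 1) for s in steps])

# Ratio of the two branch resistances including the series transistors.
effective_ratio = (R_HRS + R_T_HRS) / (R_LRS + R_T_LRS)
print(f"effective HRS/LRS ratio = {effective_ratio:.2f}")
```

The printed spacing shrinks monotonically as the number of LRS cells grows, which is the reason the upper levels are the hardest for the ADC to separate.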

For an ADC, the transfer function displays its digital output value as a function of an analog input signal, usually the input voltage [63]. In our case, it is more useful to display the ADC transfer characteristic as a function of the resistances of the crossbar. Usually, in the design of ADCs, linearity of the input–output relationship is preferred. This means that the input levels corresponding to one ADC output have equal widths. Such a transfer function can be seen in Figure 4(b), illustrated as a black line. To determine whether a linear ADC characteristic is a reasonable choice for CiM using resistive devices, we combined the transfer function of the crossbar with the transfer function of the ADC. This is possible, as the output of the crossbar transfer function is the same as the input of the ADC transfer function. The red crosses in Figure 4(b) are obtained using the \(R_\text{eq}\) values of the crossbar transfer function as presented in Figure 4(a) and an HRS/LRS ratio of 10. It should be noted here that the effective HRS/LRS ratio, when considering the serially connected transistors, is reduced from 10 to 6.36. The intersections between the ADC transfer characteristic (black line) and red crosses show to which ADC output the crossbar input is mapped. From Figure 4(b), it can be seen that the data in the crossbar do not map very well to the linear ADC characteristic since most of the possible outputs of the crossbar are mapped to the same ADC output (here, ‘101’). This is because the \(R_\text{eq}\) values for these crossbar outputs differ only slightly from each other. This shows that the usual linear ADC transfer function is not an optimal solution for CiM based on resistive devices; an ADC with a nonlinear transfer function should be better suited for CiM applications. 
The VCO-based ADC that is considered here has a strongly nonlinear transfer characteristic, which makes it a suitable ADC candidate for CiM based on resistive memories. Yan et al. discussed a related issue. In [64], they compared different spacings of the resistive states (equal \(\Delta\)R vs. equal \(\Delta\)G) for a neural network using analog ReRAM devices. Their conclusion was that while both mappings delivered a comparable accuracy performance, the equal \(\Delta\)R mapping was beneficial as it resulted in less severe constraints for the ADC. This result suggests that a matching between crossbar and ADC can be achieved on multiple levels, either in the ADC or in the spacing of the resistance values. Since we use the ReRAM devices in a binary way, different spacings are equivalent to choosing different \(R_\text{off}/R_\text{on}\) ratios. As we showed in Figure 5(a), a smaller ratio improves the linearity of the crossbar’s transfer function; however, it leads to other problems such as an increased susceptibility to device variability.

3.1.2 Current-to-Voltage Converter.

In the VCO-based ADC, the bit-line current that contains the result of the MAC operation first needs to be transformed into a voltage, which then serves as the input of the VCO element. We explore four possible options for this conversion: a capacitance, a gate-drain connected NMOS (diode-connected structure), a constant resistance, and none.

• Capacitance: In this case, the voltage across the capacitor approaches the \(V_\text{read}\) value exponentially with a time constant that depends on the ReRAM states. The frequency of the VCO’s output is therefore time dependent and, after a specific time, becomes almost equal for all of the different ReRAM states. Thus, a capacitor is not a suitable conversion method for this purpose.

• G-D connected NMOS (diode-connected structure): In this case, the gate and drain voltage is \(\sqrt {\frac{2\cdot I_\text{bit-line}}{\mu _\text{n}\cdot C_\text{OX}\cdot \frac{W}{L}}}+V_\text{th}\), where \(\mu _\text{n}\) is the electron mobility, \(C_\text{OX}\) is the capacitance per area of the gate oxide, \(W\) and \(L\) are the width and length of the NMOS, respectively, and \(V_\text{th}\) is the threshold voltage.

• Constant resistance: In this case, the voltage across the resistance is \(R_s\cdot I_\text{bit-line}\).

• None: In this case, the conversion of the current into a voltage relies on the input impedance of the VCO.

To select among these four methods, we use the following resolution criterion: the larger the number of LRS devices at which the voltage at node Y in Figure 3 and the number of generated pulses begin to flatten, the higher the resolution. As Figure 6 shows, the diode-connected structure yields a better resolution than the other methods.
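The difference between the diode-connected and the constant-resistance option can be sketched with the two closed-form expressions given in the list above. The Python snippet below is only an illustration: it treats the bit-line current as a given quantity, and the device parameters `BETA` (standing for \(\mu_\text{n}\cdot C_\text{OX}\cdot \frac{W}{L}\)), `V_TH`, and `R_S` are hypothetical values that are not taken from the design.

```python
import math


def v_diode(i_bl, beta, v_th):
    """Voltage of a diode-connected NMOS carrying i_bl (square-law model)."""
    return math.sqrt(2.0 * i_bl / beta) + v_th


def v_resistor(i_bl, r_s):
    """Voltage across a constant resistance r_s carrying i_bl."""
    return r_s * i_bl


BETA = 1e-3   # hypothetical mu_n * C_ox * W / L in A/V^2
V_TH = 0.3    # hypothetical threshold voltage in V
R_S = 2e3     # hypothetical constant resistance in ohm

for i_bl in (50e-6, 100e-6, 200e-6):
    print(f"{i_bl * 1e6:5.0f} uA  diode: {v_diode(i_bl, BETA, V_TH):.3f} V"
          f"  resistor: {v_resistor(i_bl, R_S):.3f} V")
```

The resistor voltage grows linearly with the current, whereas the diode-connected voltage grows with its square root, so the diode-connected element compresses large currents.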

Fig. 6.

It is worth mentioning here that the parasitic bit-line capacitor is not required for the correct functionality of the circuit; it exists because of the crossbar wires and junctions. This parasitic capacitor impacts different features of the circuit, as discussed in Section 3.2.

3.1.3 Voltage-Controlled Oscillator.

The VCO is an abstract 2-terminal module that receives a DC voltage as its input and produces a periodic signal with a frequency \(f\) as its output. The frequency of the output signal is a function of the DC input voltage. To realize the VCO, we use a ring-oscillator, which consists of \(n\) inverter gates in a loop (\(n\) odd and greater than 1), with the output of the last inverter connected to the input of the first inverter. In Figure 3, by connecting the input voltage to the bit-lines of the crossbar, all of the nodes of the circuit that are marked with X, including node Y, start to oscillate with the same frequency but different phases. Changing the bias voltage of the ring-oscillator at node Y changes the frequency of the oscillation, so the ring-oscillator can be considered to be a VCO. Simulations show that the change in the frequency (\(f\)) of the oscillation has a linear relation with the change in the bias voltage (\(V_\text{bias}\)): \(\Delta f = K\cdot \Delta V_\text{bias}\). The constant \(K\) can be adjusted based on the number and size of the inverter’s transistors. For different resistive states of the crossbar, the bias voltage will be different, but the voltage differences can be very small. These close voltage levels need to be translated into different frequencies; thus, choosing a larger \(K\) value helps in improving the resolution. Moreover, the value of \(K\) has an impact on other features of the circuit, for instance, its ability to tolerate the variability in the resistive memories, which will be discussed in Section 3.2.
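The role of \(K\) can be illustrated with a minimal behavioral model of the linear relation \(\Delta f = K\cdot \Delta V_\text{bias}\). In the Python sketch below, the reference point (`f_ref`, `v_ref`), the two gains, the bias voltages, and the 10 ns evaluation time are hypothetical values chosen only to show the effect; they are not simulation results.

```python
def vco_frequency(v_bias, f_ref, v_ref, k):
    """Linear VCO model: f = f_ref + K * (v_bias - v_ref)."""
    return f_ref + k * (v_bias - v_ref)


def pulse_count(v_bias, t_eval, f_ref=5e9, v_ref=0.6, k=20e9):
    """Number of complete oscillation periods within t_eval seconds."""
    return int(vco_frequency(v_bias, f_ref, v_ref, k) * t_eval)


# Two bias voltages that are 10 mV apart, evaluated for 10 ns with two gains.
for k in (5e9, 20e9):
    n1 = pulse_count(0.703, 10e-9, k=k)
    n2 = pulse_count(0.713, 10e-9, k=k)
    print(f"K = {k / 1e9:.0f} GHz/V: {n1} vs {n2} pulses")
```

With the smaller gain, both bias voltages yield the same pulse count, whereas the larger gain separates them.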

3.1.4 Counter and Lookup Table.

The counter is also a crucial module in the proposed VCO-based ADC design. The bit-line current is eventually transformed into a number of pulses, which are digitized by counting them over a fixed period. As discussed earlier, a high speed of the ring-oscillator (high \(K\)) is necessary to obtain an acceptable resolution for the ADC. The oscillation frequency, however, is limited by the speed of the counter, that is, how fast the counter can count. In sequential circuits, including counters, two timing parameters determine the speed: data-path length and setup time. Data-path length is the delay of the circuit’s critical path, and setup time is the amount of time that the data must be stable before the active edge of the clock. To avoid timing errors in the counter module, the period of the clock (the signal produced by the ring-oscillator) must be greater than or equal to the sum of the data-path delay and the setup time. For a fixed resolution, the speed of the ring-oscillator must therefore be adjusted to the counting frequency of the counter, and the latency is determined only by the counter module. We have selected an asynchronous ripple counter as the counter module in our design because of its area efficiency. An N-bit ripple counter can count \(2^N\) states, which is the maximum number of countable states among counter structures. Moreover, the ripple counter does not need any logic between its flip-flops for correct functionality. Finally, the output of the counter must be mapped to a proper digital representation. This mapping is done through the lookup table (LUT).
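The timing constraint and the counting behavior described above can be sketched as follows. This Python model is purely behavioral: the split into 30 ps data-path delay and 10 ps setup time and the contents of `lut` are hypothetical placeholders, and the ripple counter is modeled as a chain of toggle flip-flops without any gate-level timing.

```python
def max_clock_frequency(t_datapath, t_setup):
    """Highest frequency satisfying T_clk >= t_datapath + t_setup."""
    return 1.0 / (t_datapath + t_setup)


def ripple_count(n_pulses, n_bits):
    """Behavioral model of an asynchronous ripple counter (toggle flip-flops)."""
    bits = [0] * n_bits            # bits[0] is the least significant stage
    for _ in range(n_pulses):
        for stage in range(n_bits):
            bits[stage] ^= 1       # this stage toggles
            if bits[stage] == 1:   # no falling edge, so the ripple stops here
                break
    return sum(bit << i for i, bit in enumerate(bits))


# Hypothetical LUT: maps raw pulse counts to the number of LRS cells.
lut = {12: 0, 15: 1, 17: 2, 18: 3}

print(f"f_max = {max_clock_frequency(30e-12, 10e-12) / 1e9:.1f} GHz")
count = ripple_count(17, n_bits=5)
print(count, "->", lut.get(count))
```

An N-bit counter wraps around after \(2^N\) pulses, so the evaluation time and the counter width have to be chosen together.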

3.1.5 Self-Timing Path (STP).

The premise of the Self-Timing Path (STP) is to calculate the time a column with known resistance states takes to produce a known output and then use that time interval as the total time (\(T_\text{total}\)) of the required operation. To implement this technique, a dummy column is used (we take all resistive values as LRS in this dummy column) in the same crossbar array. As soon as the output number of pulses for this dummy column reaches the applied number of non-zero row voltages, it triggers the other columns to stop counting. This dummy column thus provides a variation-aware \(T_\text{total}\). The acquired \(T_\text{total}\) is adaptive to global variations of CMOS and ReRAM devices, temperature, and fluctuations in the peripheral/core voltage supplies [65].
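The idea behind the STP can be sketched with a simple model in which a global variation scales the oscillation frequency of every column by the same factor. This is a strong simplifying assumption made only for illustration, and the nominal frequencies below are hypothetical.

```python
def stp_total_time(f_dummy, n_active_rows):
    """Time until the all-LRS dummy column has produced n_active_rows pulses."""
    return n_active_rows / f_dummy


def column_count(f_column, t_total):
    """Pulses counted in a data column before the dummy column stops it."""
    return int(f_column * t_total)


F_DUMMY = 8.0e9    # hypothetical nominal frequency of the dummy column in Hz
F_COLUMN = 5.3e9   # hypothetical nominal frequency of a data column in Hz

for scale in (0.8, 1.0, 1.2):   # global variation scaling all oscillators alike
    t_total = stp_total_time(F_DUMMY * scale, n_active_rows=16)
    counts = column_count(F_COLUMN * scale, t_total)
    print(f"scale {scale}: T_total = {t_total * 1e9:.2f} ns, count = {counts}")
```

In this model, \(T_\text{total}\) adapts to the global variation while the count of the data column stays the same, which is the behavior the dummy column is meant to provide.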

3.2 Theoretical Analysis

In this subsection, we theoretically investigate the impact of each element on the whole circuit.

• Access Transistor: As mentioned in Section 2, 1T1R bit-cells are beneficial both for removing the sneak paths and for improving the writing process. We have chosen an NMOS access transistor located between the source-line and the ReRAM device. A different type (PMOS) or a different location (between the ReRAM device and the bit-line) affects the features of the ADC, since the resistance of the access transistor varies under these conditions. Further discussion of this matter is beyond the scope of this article.

• VCO-Impedance: As discussed earlier, we have realized the VCO with a ring-oscillator. During oscillation, the gate voltage of the inverters changes, which in turn varies the resistance of the inverters and of the whole ring-oscillator. The oscillating nature of the ring-oscillator cannot be emulated by a mere resistance; a capacitive element, for instance, is also required to model the oscillation correctly. Thus, we substitute a varying impedance for the ring-oscillator. Because the output of the crossbar array (node Y in Figure 3) oscillates, the source voltage of the access transistors also changes, which makes the equivalent resistance of the crossbar time variant. Note that the resistive state of the ReRAM devices is data dependent and determined independently (e.g., during the training phase of neural networks). Different data result in different time-variant equivalent resistances and different time-variant ring-oscillator impedances. Figure 7 shows a simplified schematic of the VCO-based ADC by substituting a time-variant \(R_\text{eq}\) for the crossbar and an \(R_\text{eq}\)-dependent and time-variant impedance (\(Z_\text{os}\)) for the ring-oscillator.

Fig. 7. Illustration of the equivalent circuit for the proposed VCO-based ADC design: substitution of a time-variant \(R_\text{eq}\) for the crossbar and of a time-variant, \(R_\text{eq}\)-dependent impedance for the rest of the ring-oscillator.

Based on Figure 7, we can theoretically evaluate the features of the VCO-based ADC, such as resolution, variability tolerance, power, energy, and voltage across the ReRAM devices.

Resolution: As Figure 4(a) also shows, increasing the number of ReRAM devices in parallel narrows the difference between the resistive states, which makes them more difficult to distinguish. For \(N\) parallel ReRAM devices, the two resistive states \(\frac{R_\text{LRS}}{N}\) and \(\frac{R_\text{LRS}}{N-1}||R_\text{HRS}\) have the minimum resistance difference. The resolution of the ADC is determined by these two closest resistive states: the ADC can distinguish between \(N\) levels as long as it is able to produce different outputs for these two states. The ring-oscillator can generate a distinct number of pulses for these states if its bias voltage (node Y in Figure 3) has different corresponding values. The voltage at node Y is determined by the voltage division between \(R_\text{eq}\) and \(Z_\text{os}\), and the voltage difference between these two resistive states is obtained via

\begin{equation} V_{y_1}-V_{y_2} = V_\text{read}\cdot \left(\left(\frac{Z_{\text{os}_1}}{\frac{R_\text{LRS}}{N}+Z_{\text{os}_1}} \right)- \left(\frac{Z_{\text{os}_2}}{\frac{R_\text{LRS}\cdot R_\text{HRS}}{R_\text{LRS}+R_\text{HRS}\cdot (N-1)}+Z_{\text{os}_2}} \right) \right) . \end{equation}

(3)
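Equation (3) can be evaluated numerically to get a feeling for the orders of magnitude. The Python sketch below makes two simplifying assumptions that are not part of the analysis: both states see the same DC impedance (\(Z_{\text{os}_1} = Z_{\text{os}_2}\)), and this impedance takes one of two hypothetical values. The resistances and \(V_\text{read}\) follow Table 2.

```python
def r_parallel(n_lrs, n_hrs, r_lrs=3e3, r_hrs=30e3):
    """Equivalent resistance of n_lrs LRS and n_hrs HRS devices in parallel."""
    return 1.0 / (n_lrs / r_lrs + n_hrs / r_hrs)


def delta_v(n, z_os, v_read=0.9):
    """Equation (3) under the simplifying assumption Z_os1 = Z_os2 = z_os."""
    r1 = r_parallel(n, 0)        # all N devices in LRS
    r2 = r_parallel(n - 1, 1)    # N-1 devices in LRS, one device in HRS
    return v_read * (z_os / (r1 + z_os) - z_os / (r2 + z_os))


for n in (8, 16, 32):
    for z_os in (2e3, 200.0):    # hypothetical DC impedances in ohm
        print(f"N = {n:2d}, Z_os = {z_os:6.0f} ohm: "
              f"{delta_v(n, z_os) * 1e3:.2f} mV")
```

In this simplified setting, the voltage difference shrinks as \(N\) grows, and for large \(N\) it is larger for the smaller impedance.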

We must highlight that the bias voltage of the ring-oscillator (node Y) is also oscillating; however, to benefit from the abstract model of the VCO, we consider the effective (DC) value of the voltage at node Y. In DC analysis, capacitors behave as open circuits; thus, \(Z_{\text{os}}\) has only a resistive nature. As Equation (3) shows, by increasing the number of ReRAMs (\(N\)), the first terms in the denominators become negligible and the voltage difference between these two states approaches zero. With smaller \(Z_{\text{os}_{1,2}}\) terms, \(N\) can grow larger before the first terms in the denominators of Equation (3), which contain \(N\), become negligible in comparison with \(Z_{\text{os}_{1,2}}\). As discussed before, the bit-line current needs to be converted into a voltage at the input of the VCO via different impedance elements. In Equation (3), the impedance of such a converter is in parallel with \(Z_{\text{os}_{1,2}}\) and makes these terms smaller. This results in a higher number of distinguishable levels. Thus, having a current-to-voltage converter is generally beneficial for the resolution. However, even without any converter, the bit-line current can be transformed into a voltage by relying on the impedance of the ring-oscillator. Now, let us investigate the impact of both the constant resistor and the diode-connected element on the resolution. The diode-connected NMOS element can improve the resolution more than the constant resistor. It can be considered a voltage-controlled resistance whose value is equal to \(\frac{2}{\mu _\text{n} \cdot C_\text{OX} \cdot \frac{W}{L}} \cdot \frac{V_\text{GS}}{(V_\text{GS}-V_\text{th})^2}\) (\(V_\text{GS}\) is the gate-source voltage of the NMOS). By increasing \(N\), \(V_\text{GS}\) gets larger and the resistance of the diode-connected NMOS decreases. Hence, the terms \(Z_{\text{os}_{1,2}}\) get smaller, resulting in a higher resolution. 
It is worth mentioning here that the nominal values of the LRS and HRS also have an impact on the resolution. For larger values of the LRS and HRS, \(N\) could be larger before the two closest resistive states become indistinguishable.
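The statement that the resistance of the diode-connected NMOS decreases with \(V_\text{GS}\) can be checked by differentiating the expression given above. The following short derivation is a sketch that assumes the same square-law model and \(V_\text{GS} > V_\text{th}\):

\begin{equation*} \frac{\partial }{\partial V_\text{GS}}\left(\frac{2}{\mu _\text{n} \cdot C_\text{OX} \cdot \frac{W}{L}} \cdot \frac{V_\text{GS}}{(V_\text{GS}-V_\text{th})^2}\right) = -\frac{2}{\mu _\text{n} \cdot C_\text{OX} \cdot \frac{W}{L}} \cdot \frac{V_\text{GS}+V_\text{th}}{(V_\text{GS}-V_\text{th})^3} < 0 . \end{equation*}

The derivative is negative for every \(V_\text{GS} > V_\text{th}\), so within this model a larger \(V_\text{GS}\) always corresponds to a smaller equivalent resistance.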

Variability tolerance: Due to the stochastic nature of the switching in the ReRAM devices, the HRS and LRS resistance states are not fixed values; rather, they have a range. As a result, it is desirable for the ADC to tolerate the variability in the ReRAM resistive states. The proposed VCO-based ADC can tolerate this variability. To theoretically investigate this variability tolerance, we must first emphasize that, in contrast to the DC analysis, the capacitors also have an influence here. The impedance of the ring-oscillator has a resistive and capacitive nature. In addition, the parasitic bit-line capacitor has an impedance of magnitude \(\frac{1}{2\pi f C}\), in which \(f\) is the frequency of the voltage and \(C\) is the capacitance of the bit-line capacitor.

To theoretically evaluate the tolerance of the VCO-based ADC to variability in the resistive memories, the change in the frequency with respect to the change in the equivalent resistance must be checked, that is, \(\mid \!\!\frac{ \partial f}{\partial R_\text{eq}}\!\!\mid\). The smaller the term \(\mid \!\!\frac{\partial f}{\partial R_\text{eq}}\!\!\mid\) is, the better the variability tolerance. Note that only the absolute value of \(\frac{ \partial f}{\partial R_\text{eq}}\) matters for the variability tolerance. According to the chain rule:

\begin{equation} \left|\frac{\partial f}{\partial R_\text{eq}}\right|\ =\ \left|\frac{\partial f}{ \partial V_\text{bias}}\cdot \frac{\partial V_\text{bias}}{\partial R_\text{eq}}\right|\!. \end{equation}

(4)

In Equation (4), the term \(\frac{\partial f}{ \partial V_\text{bias}}\) is the speed of the ring-oscillator and equals \(K\). The term \(\frac{\partial V_\text{bias}}{\partial R_\text{eq}}\) is calculated according to

\begin{equation} \left|\frac{\partial V_\text{bias}}{\partial R_\text{eq}}\right|\ =\ V_\text{read}\cdot \left|\frac{\partial }{ \partial R_\text{eq}} \frac{Z_\text{os} || \frac{1}{2\pi fC}}{Z_\text{os} || \frac{1}{2\pi fC}+R_\text{eq}}\right|. \end{equation}

(5)

Combining Equations (4) and (5), we get that

\begin{equation} \left|\frac{\partial f}{\partial R_\text{eq}}\right|\ =\ \left| K\cdot V_\text{read}\cdot \frac{\left(\frac{\partial Z_\text{os}}{\partial R_\text{eq}}\cdot R_\text{eq}-Z_\text{os} \right)-2\pi \cdot f\cdot C\cdot Z_\text{os}^2}{(Z_\text{os}+R_\text{eq}\cdot Z_\text{os} \cdot 2\pi fC+R_\text{eq})^2+(2\pi \cdot Z_\text{os}^2\cdot R_\text{eq}\cdot C)}\right|. \end{equation}

(6)

\(Z_\text{os}\) and \(R_\text{eq}\) are in the range of k\(\Omega\)¹, \(f\) is in the range of GHz, and \(C\) is in the range of fF. Thus, the terms \((R_\text{eq}\cdot Z_\text{os}\cdot 2\pi fC)\) and \((2\pi \cdot Z_\text{os}^2\cdot R_\text{eq}\cdot C)\) are negligible. Since the term \((\frac{\partial Z_\text{os}}{\partial R_\text{eq}}\cdot R_\text{eq}-Z_\text{os})\) is negative, taking the absolute value yields the final formula for the variability tolerance:

\begin{equation} \left|\frac{\partial f}{\partial R_\text{eq}}\right|\ = K\cdot V_\text{read}\cdot \frac{\left(Z_\text{os}-\frac{\partial Z_\text{os}}{\partial R_\text{eq}}\cdot R_\text{eq} \right)+2\pi \cdot f\cdot C\cdot Z_\text{os}^2}{(Z_\text{os}+R_\text{eq})^2}. \end{equation}

(7)
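Equation (7) can be evaluated for a concrete operating point to see how the individual terms compare. The Python sketch below uses hypothetical values for \(K\), \(Z_\text{os}\), \(R_\text{eq}\), \(\frac{\partial Z_\text{os}}{\partial R_\text{eq}}\), and \(f\); only their orders of magnitude follow the text, and the variable names are illustrative.

```python
import math


def sensitivity(k, v_read, z_os, r_eq, dz_dr, f, c):
    """|df/dR_eq| according to Equation (7), in Hz per ohm."""
    numerator = (z_os - dz_dr * r_eq) + 2.0 * math.pi * f * c * z_os ** 2
    return k * v_read * numerator / (z_os + r_eq) ** 2


# Hypothetical operating point; only the orders of magnitude follow the text.
params = dict(k=20e9, v_read=0.9, z_os=2e3, r_eq=1e3, dz_dr=0.1, f=1e9)

# Term neglected when going from Equation (6) to Equation (7), for C = 1 fF.
neglected = params["r_eq"] * params["z_os"] * 2.0 * math.pi * params["f"] * 1e-15
retained = params["z_os"] + params["r_eq"]
print(f"neglected term: {neglected:.1f} ohm, retained terms: {retained:.0f} ohm")

for c in (1e-15, 10e-15):
    value = sensitivity(c=c, **params)
    print(f"C = {c * 1e15:.0f} fF: |df/dR_eq| = {value:.3e} Hz/ohm")
```

For these values, the neglected term is two orders of magnitude smaller than the retained ones, and a larger \(C\) yields a larger \(\left|\frac{\partial f}{\partial R_\text{eq}}\right|\), that is, a lower variability tolerance.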

Equation (7) shows that increasing the parasitic bit-line capacitance decreases variability tolerance. The nominal values of the LRS and HRS also have an impact on variability tolerance. For higher resistances, the speed of the ring-oscillator (the constant \(K\)) must be higher, since similar data patterns must result in similar digital outputs regardless of the resistive values of LRS and HRS. Increasing the nominal values of LRS and HRS increases \(R_\text{eq}\), which lowers the bias voltage of the ring-oscillator (node Y in Figure 3). The number of generated pulses, however, must remain the same; that is, the same frequency must be produced with a lower bias voltage. A higher ring-oscillator speed is detrimental to variability tolerance, but the larger LRS and HRS values contribute quadratically through the squared denominator of Equation (7), which overcompensates for the former effect. To conclude, increasing the LRS and HRS values improves variability tolerance.

It is interesting to mention that adding an impedance element (such as the diode-connected structure in Figure 3) decreases variability tolerance by decreasing the \(Z_\text{os}\) term in the denominator of Equation (7). Confusion might arise on this matter, as \(Z_\text{os}\) is present in both the numerator and the denominator with equal exponent. These two terms do not cancel each other out, since the first term of the numerator \((Z_\text{os}-R_\text{eq}\cdot \frac{\partial Z_\text{os}}{\partial R_\text{eq}})\) is much larger than the second term \((2\pi f\cdot C\cdot Z_\text{os}^2).\) Thus, the first term is dominant and the second term can be neglected here. However, if the only changing variable in the circuit is \(C\), then the only changing term is \((2\pi f\cdot C\cdot Z_\text{os}^2)\). In that case, the effect of \(C\) on variability tolerance must be considered.

Power consumption: In resistive elements, power is equal to \(\frac{V^2}{R}\). Here, the term \(R\) indicates the total resistive load of the circuit, which is the series connection of the crossbar and the whole VCO-based ADC. Since the data manifest themselves as resistance states, the power consumption is data dependent. The maximum and minimum power consumption occur for all LRS and all HRS, respectively. The power is also affected by the nominal values of the LRS and HRS, as it is inversely proportional to \(R_\text{eq}\). For a fixed resolution, the latency is determined only by the counter and is not affected by the data or by the nominal values of LRS and HRS. The energy is the power-latency product and is affected by the data and the nominal values of LRS and HRS in the same way as the power.

Voltage across the memory cells: As discussed in Section 2, read-disturb phenomena are more likely to happen at high voltages across the ReRAM devices. This voltage is also affected by the data and the nominal values of the LRS and HRS. The largest voltage across the devices occurs for all HRS, since \(R_\text{eq}\) is maximal in this case. Larger (smaller) nominal values for LRS and HRS also result in a higher (lower) voltage across the ReRAM devices and thus in a higher (lower) probability of read-disturb phenomena.
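The data dependence of power, energy, and cell voltage can be illustrated with a DC voltage-divider model of Figure 7. The Python sketch below assumes a column of 32 cells with the resistances and \(V_\text{read}\) of Table 2, ignores the access transistors, and replaces the data-dependent \(Z_\text{os}\) by a constant, hypothetical value of 2 k\(\Omega\); the 10 ns latency is likewise only an example value.

```python
def column_metrics(n_lrs, n_cells=32, z_os=2e3, v_read=0.9,
                   r_lrs=3e3, r_hrs=30e3):
    """Return (power in W, voltage across the cells in V) for a DC divider."""
    r_eq = 1.0 / (n_lrs / r_lrs + (n_cells - n_lrs) / r_hrs)
    power = v_read ** 2 / (r_eq + z_os)        # P = V^2 / R of the series path
    v_cells = v_read * r_eq / (r_eq + z_os)    # share of V_read on the crossbar
    return power, v_cells


LATENCY = 10e-9   # example evaluation time in s, fixed for a given resolution

for label, n_lrs in (("all HRS", 0), ("all LRS", 32)):
    power, v_cells = column_metrics(n_lrs)
    print(f"{label}: P = {power * 1e6:.0f} uW, "
          f"E = {power * LATENCY * 1e12:.2f} pJ, V_cell = {v_cells:.3f} V")
```

In this model, the all-LRS column draws the most power, whereas the all-HRS column sees the largest voltage across the cells, in line with the two trends described above.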

4 Results

In this section, we present the device- and circuit-level results of the proposed VCO-based ADC design. The first part of this section discusses the ReRAM manufacturing results. The second part discusses the circuit-level results.

4.1 ReRAM Fabrication and Characterization

4.1.1 Device Fabrication.

For the experimental investigation, we fabricated VCM ReRAM cells with a (30 nm Pt/5 nm ZrO₂/20 nm Ta/30 nm Pt) stack as shown in Figure 8(a).

Fig. 8.

The cells are arranged in a 7 \(\mu\)m \(\times\) 7 \(\mu\)m crossbar structure, designed as a 32 \(\times\) 1 cell array. A microscopic picture of this structure is given in Figure 8(b). Using a dedicated probe card, all 32 top electrodes of one array can be connected to the measurement device. The bottom electrode is common to all cells on the die and is realized as a whole-surface platinum layer underneath a structured SiO₂ layer, which separates the individual arrays.

The fabrication workflow of the presented cells is illustrated in Figure 8(c). Onto the Pt bottom electrode covering the whole substrate surface, 30 nm SiO₂ is deposited via e-Beam sputtering. Using UV lithography, the array structure (represented by the vertical lines shown in Figure 8(b)) is transferred to the SiO₂ layer and etched free via hydrofluoric acid. After removing the photo-resist, 5 nm ZrO₂ is deposited via reactive RF sputtering. Another UV lithography step is used to structure the top electrodes; subsequently, 20 nm Ta is deposited on the oxide via RF sputtering. To prevent oxidation of the Ta electrode, it is covered in situ by a 30 nm Pt layer. Note that the SiO₂ layer is used to separate the individual arrays and does not contribute to the resistive switching characteristics. This enables a large bottom electrode in the array structure, resulting in comparatively low series resistances. Since HfO₂ is more common as switching oxide in VCM devices, we would like to note that ZrO₂ and HfO₂ are almost identical with respect to their physico-chemical properties.

4.1.2 Device Measurement.

The cells are characterized using a dedicated probe card providing 32 probes that are connected to a custom array tester based on the \(\mu\)Controller Module platform by aixACCT Systems. A photograph of the measurement setup is shown in Figure 8(d). All voltages are applied via the probe card shown on the left to the Ta top electrode. Using another probe shown on the right, the common Pt bottom electrode is connected to GND. Figure 9 sketches the general measurement flow for forming, SET, and RESET. Each cell requires an initial electroforming step to generate oxygen vacancies and develop a conducting filament [16]. Subsequently, the cell is cycled by alternating RESET and SET operations. The initial electroforming is performed by a triangular voltage pulse with a rise time of 20 ms. The forming stop voltage ranges from 3 V to 5 V. Here, a read-verify algorithm, implemented in the measurement software, is used. Starting with 3 V, the cell is read after each forming operation. In the case of failed forming, the pulse is repeated with increased stop voltage to a maximum of 5 V. During electroforming, the resistance of the cell is lowered by several orders of magnitude. Since this operation requires comparatively high voltages, the decreased resistance would result in a high current through the cell along with high temperature, which could cause irreparable damage to the cell [66]. Therefore, the cell current is limited during forming by adding a series resistance with \(R_\textrm {s} = 10~\textrm {k}\Omega\) via the internal switch matrix of the measurement equipment.

Fig. 9.

For both RESET and SET operations, rectangular voltage pulses with a length of 20 µs are applied. Using an equivalent read-verify algorithm, the pulse height varies from –0.5 V to –5 V for the RESET process and from 0.5 V to 5 V for the SET operations. After each programming pulse, a read pulse is applied and the cell resistance is determined. Unless the state is within the required margins, further programming pulses with increased voltage are applied, as shown schematically in Figure 9. It was observed that no series resistance is required for the pulsed SET and RESET schemes. It may be noted that the typical pulse voltage for a successful SET is approximately 1 V. The typical RESET voltage is –1.8 V. The high maximum voltages of the algorithm are usually not necessary. The algorithm ensures reliable programmability of each cell with a preferably low voltage. Furthermore, it enables programmability of cells into variability margins specified by the application.

4.1.3 Measurement Results.

As outlined in Table 2, the circuit design requires ReRAM resistances of \(R_\text{HRS} = 30~\)k\(\Omega\) and \(R_\text{LRS} = 3~\)k\(\Omega\). To model the ReRAM devices, we use fixed resistances of 3 k\(\Omega\) and 30 k\(\Omega\), which are relatively low resistances. For such low resistances, the type of ReRAM device we used behaves quite linearly and shows little RTN. As discussed in detail in Section 4.2.2, the VCO-based ADC can tolerate variation in the resistive states of the ReRAM devices of up to approximately 30%. Figure 10 shows experimentally obtained cumulative distributions of 7000 HRS and 7000 LRS states. The data are acquired by cycling 32 cells of one array with the read-verify algorithm into the highlighted variability margins of 2.2 k\(\Omega\) to 3.8 k\(\Omega\) for the LRS and 22 k\(\Omega\) to 38 k\(\Omega\) for the HRS. These margins correspond to a deviation of about \(\pm\)27% from the nominal values and thus lie within the tolerated maximum of 30%. Hence, the fabricated devices match the specifications given in Table 2 and are appropriate for the proposed application.

Fig. 10.

Table 2.

Parameters	Specifications
ReRAM Device	ZrO₂/Ta
HRS	30 k\(\Omega\)
LRS	3 k\(\Omega\)
CMOS Technology	28 nm TSMC
Read Voltage	0.9 V with \(\pm\)10% variations
CMOS Specs	TT, 27\(^{\circ }\)C
CMOS Variation	3 \(\sigma\)
Counter/LUT Voltage	0.9 V with \(\pm\)10% variations
Latency of the counter	40 ps

Table 2. Simulation Parameters

4.2 Circuit Level Results

The electrical schematic simulation of this design has been done in TSMC 28 nm technology.

The analog parts of the design, the crossbar and the ring-oscillator, are simulated using the Cadence Spectre simulator, whereas the digital parts, the counter and the LUT, are first described in Verilog and then synthesized with the Cadence GENUS tool. Table 2 shows the simulation parameters.

4.2.1 Design Parameters Tuning and Exploration.

To improve the circuit from the resolution point of view, we first need to tune different parameters of the circuit, such as \(V_\text{read}\) (Figure 11(a)), the transistor sizes in the inverters (Figure 11(b)), the number of the inverter gates in the ring-oscillator (Figure 11(c)), and the width of the diode-connected structure (Figure 11(d)). To select these parameters, a trade-off analysis between different aspects of the circuit must be performed. On the one hand, for \(V_\text{read}\), lower values are beneficial from a power consumption point of view. On the other hand, higher values for \(V_\text{read}\) increase the resolution (larger dynamic range for the number of generated pulses). Due to the timing constraints of the counter, the number of pulses cannot exceed a specific number, and increasing \(V_\text{read}\) beyond a certain voltage (0.9 V) does not have any impact on the resolution. Regarding the size of the transistors in the inverter, smaller widths are beneficial for the area (for similar rise and fall times of the pulses, the width of the PMOS is kept at twice the width of the NMOS), whereas larger widths result in a higher speed of the ring-oscillator, which is beneficial for the resolution. Increasing the width from 0.35 \(\mu\)m to 0.5 \(\mu\)m does not have any noticeable impact on the resolution. Thus, the NMOS width has been set to 0.35 \(\mu\)m. The number of inverter gates in the ring-oscillator also has to be investigated. A shorter ring results in a higher speed of the ring-oscillator, which is beneficial for the resolution. Moreover, a shorter ring improves area efficiency. However, increasing the speed of the ring-oscillator beyond the timing constraints of the counter will not improve the resolution. Rather, this will increase the probability of timing errors. In the proposed VCO-based ADC, the number of inverters is 5. As mentioned before, we have selected a diode-connected NMOS structure as the current-to-voltage converter. 
The width of this device also needs to be determined. The simulations show that the width of the diode-connected NMOS does not have a tangible impact on the resolution. Thus, from the resolution point of view, it can be chosen as small as possible to improve area efficiency. However, as smaller widths are more prone to CMOS variations, we have set the width of the diode-connected NMOS to 0.5 \(\mu\)m; increasing the width beyond this value does not help further. For all four plots of Figure 11, we increase the number of LRS from 0 to 31 and count the number of pulses. Our criterion for selecting the parameters is a larger dynamic range for the number of generated pulses without exceeding the counter’s maximum frequency.

Fig. 11.

4.2.2 VCO-Based ADC Evaluation.

To investigate the latency of the ADC, two points must be considered: (1) the number of generated pulses must be unique (not necessarily linear) for each number of LRS devices and (2) the number of generated pulses must be countable by the counter. Figure 12(a) shows the number of pulses for different numbers of LRS, from 0 to 31. Within a 2 ns evaluation time, the proposed VCO-based ADC generates different numbers of pulses for 13 different levels. For a higher number of levels, the number of generated pulses is no longer unique. To consider the impact of process, voltage, and temperature variability, 100 Monte Carlo runs have been performed with and without the STP technique, which reduces the impact of global variations in CMOS devices (with normal distribution). Figure 12(b) also shows the voltage of node Y in Figure 3 (bias voltage of the ring-oscillator) for 100 Monte Carlo runs. The minimum voltage at node Y is \(0.6~\text{V}\); as \(V_\text{read}\) is \(0.9~\text{V}\), the maximum voltage across the ReRAM devices is less than \(0.3~\text{V}\). Thus, the probability of read disturb in the ReRAM devices is rather low.

Fig. 12.

By increasing the evaluation time, the resolution of the ADC increases at the cost of latency and energy. Figure 13(a) shows the impact of the evaluation time on the number of generated pulses for 100 Monte Carlo runs with the STP technique. Figure 13(b) shows the impact of the evaluation time on the energy and resolution of the ADC. The latter is almost linear, as the number of generated pulses is more or less proportional to the evaluation time.

Fig. 13.

The area of the VCO-based ADC comprises four elements: the diode-connected structure that converts the current into a voltage, the ring-oscillator, the counter, and the LUT. As already discussed, the number of generated pulses of the ADC can be adjusted by changing the evaluation time. To report the area, we consider a 6-stage ripple counter. Table 3 shows the area of each element and the total area of the whole design. As shown in Table 3, the proposed VCO-based ADC has a small enough area to assign one ADC to each column.

Table 3.

Circuit Design	Area (\(\mu\)m\(^2\))
NMOS (diode-connected)	0.022
Ring-oscillator	0.22
Counter	0.81
LUT	0.69
Total area	1.74

Table 3. The Area of the Different Parts of the VCO-Based ADC

The proposed VCO-based ADC can tolerate variation in the ReRAM devices up to a certain percentage. As already discussed, the read-verify programming algorithm at the device level (Section 4.1.2) can keep the variability within the acceptable margin. However, as Figure 14 shows, the resolution of the ADC tends to decrease with increasing variability. To analyze the impact of variability in ReRAM, we have considered the worst case, that is, all of the ReRAM devices either in LRS or HRS. As outlined in Table 2, CMOS variation is also considered in this simulation. A variability of \(10\%\), for instance, means that the resistance of the ReRAM devices can be either 2700 \(\Omega\) or 3300 \(\Omega\) for the LRS and either 27 k\(\Omega\) or 33 k\(\Omega\) for the HRS.

Fig. 14.

Table 4 summarizes the features of our VCO-based ADC. The small area of the proposed VCO-based ADC enables us to assign one ADC per column, and the ADC is able to distinguish between more than one level per activation cycle. Moreover, the proposed ADC meets the requirements of the ReRAM devices, such as voltage across the device and variability.

Table 4.

Feature	Value
Resolution (bit)	3 to 5
Latency (ns)	2 to 10
Voltage across the ReRAM device (V)	less than 0.3
Energy (pJ)	0.8 to 5.2
Area (\(\mu\)m\(^2\))	1.74
ReRAM variation awareness	\(\approx 30\%\)

Table 4. Features of the Proposed VCO-Based ADC

4.2.3 Comparison with State-of-the-Art.

Our proposed ADC provides an efficient solution that implements MAC operation in a group of memristor-based crossbar columns. In this manner, we have included a subset of a fully fledged general purpose vector-matrix multiplication (VMM), which is the most fundamental computational unit in today’s hardware systems implementing machine learning algorithms. In other words, we presented a column that is a block computed in a parallel manner in a crossbar of arbitrary number of columns. This implies that this sub-block can be repeated many times, and similar efficiency can be easily scaled for larger crossbars.

The essential step towards comparing the efficiency of the design is defining a Figure of Merit (FoM). To define a proper FoM, let us first investigate the energy of the ADC phase. The energy of this phase can be determined by

\begin{equation} E=P \times N_\text{ADC} \times L \times \left(\frac{N_\text{C}}{N_\text{ADC}} \right) \times \left(\frac{N_\text{S}}{N_\text{D}} \right). \end{equation}

(8)

Here, \(P\) is the power consumption of the ADC, \(N_\text{ADC}\) is the number of ADCs, \(L\) is the latency of the interface, \(N_\text{C}\) is the number of columns in the crossbar, \(N_\text{S}\) is the number of all possible states covered by the DAC and the crossbar’s rows, and \(N_\text{D}\) is the number of levels distinguishable by the ADC.

In general, the total energy is the power-latency product. The whole power of the ADC phase is equal to the power of one ADC multiplied by the number of ADCs. The latency of the ADC is determined by three factors: the latency of one ADC, the degree of the ADC sharing, and the total number of activations per column. The degree of time-sharing is simply obtained by \(\frac{N_\text{C}}{N_\text{ADC}}\). The total number of activations per column is obtained by \(\frac{N_\text{S}}{N_\text{D}}\), which shows the number of cycles an ADC must be activated in order to convert the analog data of a column to a digital representation.

The FoM must reflect the efficiency of the design; hence, it must include parameters that are directly controllable by the designer. In the energy formula of the ADC (Equation (8)), P, L, and \(N_\text{D}\) are fully controllable by the ADC designer, whereas \(N_\text{C}\) is determined by the crossbar, and \(N_\text{S}\) is determined by the DAC module and the number of crossbar rows. \(N_\text{ADC}\) is semi-controllable by the ADC designer, since it is derived from \(\min \left(\frac{\text{Area~budget~of~the~system}}{\text{Area~of~the~ADC}} , \frac{\text{Power~budget~of~the~system}}{\text{Power~of~the~ADC}} \right)\). The area and power budgets of the system are not controllable by the ADC designer, but the area and power of the ADC are.
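The budget constraint on \(N_\text{ADC}\) can be sketched as follows. The ADC area comes from Table 5; the ADC power is an assumed upper bound obtained by dividing 5.2 pJ by 10 ns, and both system budgets are hypothetical values. Rounding down to a whole number of ADCs is also an assumption of this sketch.

```python
import math


def max_adc_count(area_budget, adc_area, power_budget, adc_power):
    """Largest whole number of ADCs that fits both system budgets."""
    by_area = area_budget / adc_area
    by_power = power_budget / adc_power
    return math.floor(min(by_area, by_power))


ADC_AREA_UM2 = 1.74              # um^2, from Table 5
ADC_POWER_W = 5.2e-12 / 10e-9    # W, assumed bound: 5.2 pJ spent over 10 ns
AREA_BUDGET_UM2 = 500.0          # um^2, hypothetical system budget
POWER_BUDGET_W = 50e-3           # W, hypothetical system budget

n_adc = max_adc_count(AREA_BUDGET_UM2, ADC_AREA_UM2, POWER_BUDGET_W, ADC_POWER_W)
print(n_adc)
```

With these illustrative numbers the power budget, not the area budget, is the binding constraint.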

Defining the FoM as \(\frac{\text{energy~per~module}}{N_\text{D}}\) captures all of the parameters that are controllable by the ADC designer and reflects the efficiency of the circuit. A smaller FoM implies a more efficient circuit design. Table 5 lists different features of the SAR ADC [67], the SA [68], and the proposed VCO-based ADC. The results reported for the SAR ADC [67] and the SA [68] are based on actual measurements, whereas the results for this work are obtained through electrical schematic simulation. As shown in the table, the proposed VCO-based design in the 28 nm technology node has almost the same circuit design efficiency as previous designs (and even better efficiency at lower resolution), and it also considers two reliability concerns of memristor devices: the variability and the voltage across the device.

Feature	SAR ADC [21, 67]	SA [24, 68]	This work
Technology node	32 nm	65 nm	28 nm
Area	9600 \(\mu\)m\(^2\)	78.3 \(\mu\)m\(^2\)	1.74 \(\mu\)m\(^2\)
Latency	1 ns	5 ns	2 to 10 ns
Energy	16 pJ	114 to 130 fJ	0.8 to 5.2 pJ
Resolution (# of levels)	128	1	8 to 32
Variability in memristors	Not considered	Not considered	Considered
Voltage across the memristors	Not considered	Not considered	Less than 0.3 V
FoM	1.25 \(\times \ 10^{-13}\) J	1.14 to 1.30 \(\times \ 10^{-13}\) J	1 to 1.6 \(\times \ 10^{-13}\) J

Table 5. Comparison of Different ADC Interfaces
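The FoM row can be cross-checked with a short calculation of energy divided by the number of distinguishable levels. This sketch makes two assumptions: the SAR ADC energy is taken as 16 pJ, the value consistent with its listed FoM, and for this work the lower energy bound is paired with 8 levels and the upper bound with 32 levels.

```python
def figure_of_merit(energy_joules, n_levels):
    """FoM = energy per module / number of distinguishable levels."""
    return energy_joules / n_levels


# (energy in J, number of levels) pairs read from Table 5.
designs = {
    "SAR ADC": [(16e-12, 128)],
    "SA": [(114e-15, 1), (130e-15, 1)],
    "This work": [(0.8e-12, 8), (5.2e-12, 32)],   # pairing is an assumption
}

fom = {
    name: [figure_of_merit(energy, levels) for energy, levels in points]
    for name, points in designs.items()
}

for name, values in fom.items():
    print(name, ", ".join(f"{v:.3g} J" for v in values))
```

Under these assumptions the computed values fall on the ranges listed in the FoM row.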

5 Conclusion and Future Work

CiM architectures using emerging memory technologies have the potential to overcome the data transfer and performance challenges of conventional von Neumann–based designs. However, because the computation is analog, the efficiency of a CiM architecture is strongly limited by the ADC phase. To address this issue, a VCO-based ADC design is presented in this article. In the proposed ADC design, the bit-line current coming from the crossbar is first converted into a voltage. This voltage then drives a VCO, which generates pulses with a frequency proportional to the voltage. The proposed ADC is evaluated using a ReRAM-based CiM crossbar array. Simulation results show that it can distinguish up to 32 levels within 10 ns while consuming less than 5.2 pJ of energy. In addition, our proposed ADC can tolerate \(\approx\)30% variability of the resistive device state with a negligible impact on the performance of the ADC. A direction for further work is to improve the resolution of our ADC design while keeping it compact and low power. Moreover, efficient programming circuits need to be further explored to realize the write-verify operations. Other future work includes improving the counter and LUT implementations, which currently consume considerable area and power.
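The conversion chain summarized above (current to voltage, voltage to frequency, pulse count to digital level) can be illustrated with a purely behavioral sketch. Every parameter value below is hypothetical, the linear current-to-voltage and voltage-to-frequency relations are idealizations, and the integer division merely stands in for the lookup table; none of this reproduces the circuit-level design.

```python
import math


def vco_adc_level(i_bitline, r_conv, k_vco, window, counts_per_level):
    """Behavioral model: current -> voltage -> frequency -> count -> level."""
    v_ctrl = i_bitline * r_conv            # idealized current-to-voltage step
    f_vco = k_vco * v_ctrl                 # frequency proportional to voltage
    count = math.floor(f_vco * window)     # whole pulses seen by the counter
    return count // counts_per_level       # stand-in for the lookup table


# Hypothetical parameters (illustrative only).
R_CONV = 10e3          # ohm, effective conversion resistance
K_VCO = 10e9           # Hz per volt, VCO gain
WINDOW = 10e-9         # s, counting window
COUNTS_PER_LEVEL = 2   # pulses per output level

currents = [2.5e-6, 10.5e-6, 20.5e-6]      # A, example bit-line currents
levels = [
    vco_adc_level(i, R_CONV, K_VCO, WINDOW, COUNTS_PER_LEVEL)
    for i in currents
]
print(levels)
```

In this idealized model a larger bit-line current yields more pulses within the window and therefore a higher output level.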

Footnote

For the correct functionality of the ADC, \(Z_\text{os}\) and \(R_\text{eq}\) must be in the same range. If \(Z_\text{os}\) is much larger, the voltage at node Y in Figure 7 would be \(V_\text{read}\); if \(Z_\text{os}\) is much smaller, the voltage at this node would be 0. In either case, a change in \(R_\text{eq}\) is not reflected at node Y, and the ADC fails to distinguish between levels.
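The two limiting cases can be illustrated with a simple resistive-divider approximation. This is an assumption made here for illustration, chosen because it reproduces both limits stated above; it is not the article's transfer function.

\begin{equation*} V_\text{Y} \approx V_\text{read} \, \frac{Z_\text{os}}{Z_\text{os} + R_\text{eq}}, \qquad \frac{\partial V_\text{Y}}{\partial R_\text{eq}} \approx - V_\text{read} \, \frac{Z_\text{os}}{\left(Z_\text{os} + R_\text{eq}\right)^2}. \end{equation*}

Under this approximation, \(V_\text{Y} \rightarrow V_\text{read}\) for \(Z_\text{os} \gg R_\text{eq}\) and \(V_\text{Y} \rightarrow 0\) for \(Z_\text{os} \ll R_\text{eq}\), and the sensitivity vanishes in both limits. For a fixed \(R_\text{eq}\), the magnitude of the sensitivity is largest at \(Z_\text{os} = R_\text{eq}\), which is consistent with the requirement that the two be in the same range.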

References

[1] S. Hamdioui, S. Kvatinsky, G. Cauwenberghs, L. Xie, N. Wald, S. Joshi, H. M. Elsayed, H. Corporaal, and K. Bertels. 2017. Memristor for computing: Myth or reality? In Design, Automation and Test in Europe Conference and Exhibition (DATE’17). 722–731.
