CN115910152A

CN115910152A - Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function

Info

Publication number: CN115910152A
Application number: CN202211499170.XA
Authority: CN
Inventors: 蔺智挺; 冯浩冉; 吴秀龙; 彭春雨
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2022-11-28
Filing date: 2022-11-28
Publication date: 2023-04-04

Abstract

The present invention relates to a charge domain memory calculation circuit and a memory calculation circuit having a positive/negative number operation function. The charge domain memory calculation circuit comprises four storage units T1-T4 for storing weight data, a storage unit TSIGN for storing sign bit data, a multiply-accumulate module MAC and a precharge module C-PRE. When the circuit executes signed multiplication, the voltage signal of the external four-bit weight data undergoes the sign bit operation and is input into the multiply-accumulate module MAC through the precharge module C-PRE; the voltage signals of the four-bit weights stored in T1-T4 are then input into the multiply-accumulate module MAC, and Vsum[3]-Vsum[0] output the voltage signals representing the operation result of the multiply-accumulate module MAC. The invention achieves highly parallel data input while meeting the precision requirement, and supports multi-bit multiplication with a sign bit.

Description

Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function

Technical Field

The present invention relates to the field of memory computing technologies, and in particular, to a charge domain memory computing circuit and a memory computing circuit having a positive/negative number computing function based on the charge domain memory computing circuit.

Background

Convolutional Neural Networks (CNNs) play an important role in various intelligent recognition tasks, such as image recognition, speech recognition and natural language processing. Since the von Neumann architecture was proposed, it has defined the basic architecture of computers and remains in use today. In the traditional von Neumann architecture, instructions and data are first written into a memory; after the control unit obtains an instruction through the control bus, the corresponding data in the memory are sent through the data bus to the corresponding processing unit for calculation, and the result is written back to the corresponding unit of the memory through the data bus. This approach limits the speed of data processing, so von Neumann architectures become ill-suited to complex, data-intensive computations as increasingly complex algorithms are introduced.

In order to solve the "memory wall" problem of the von Neumann architecture, researchers proposed the concept of in-memory computing, in which the logic operations required by data-intensive workloads are completed inside the memory, effectively reducing the power consumption and delay caused by data movement. Meanwhile, the memory array is a highly multiplexed structure, so large-scale parallel computation can be performed in the memory. This improves the computing speed, reduces the number of memory accesses during computation, reduces the area of the logic units in the processor, and leaves more space for processing complex operations.

In-memory computing can be carried on various storage structures such as SRAM and RRAM. SRAM offers high data read speed and good compatibility with advanced logic processes, so it is widely used for in-memory computing. Conventional in-memory multiplication methods include in-memory computation (IMC) in the current domain, which has higher computation efficiency than digital techniques and can support multi-bit multiplication. However, current domain in-memory computation targets unsigned operands and cannot perform multiplication on operands with sign bits. In particular, when both the stored data and the external data carry sign bits, the final result cannot accurately reflect the sign bit of the operation result, which leads to calculation errors.

Disclosure of Invention

In view of the above, it is desirable to provide a charge domain memory calculation circuit and a memory circuit having a positive/negative number calculation function based on the charge domain memory calculation circuit, which are capable of solving the problem that the conventional current domain memory calculation method cannot satisfy the multiplication between signed operands.

In order to realize the purpose, the invention adopts the following technical scheme:

A charge domain memory computing circuit comprises four storage units T1-T4 used for storing weight data, a storage unit TSIGN used for storing sign bit data, a multiply-accumulate module MAC and a precharge module C-PRE. The sign bit is either a positive sign bit or a negative sign bit.

The signal output ends of T1-T4 are correspondingly connected with the four signal input ends of the multiply-accumulate module MAC, and TSIGN is used for controlling the sign of the voltage signal input to the multiply-accumulate module MAC by the precharge module C-PRE. Four result output ends of the multiply-accumulate module MAC are connected with the output signal lines Vsum[3] to Vsum[0] in a one-to-one correspondence.

When the charge domain memory calculation circuit executes a signed multiplication operation, the voltage signal of the signed external four-bit weight data and the voltage signal of the sign bit of the internal weight data pre-stored in TSIGN are input into the precharge module C-PRE. The voltage signal of the external four-bit weight, after the sign bit operation, is input into the multiply-accumulate module MAC through the precharge module C-PRE; the voltage signals of the four-bit weight stored in T1-T4 are then input into the multiply-accumulate module MAC, and Vsum[3] to Vsum[0] output voltage signals representing the operation result of the multiply-accumulate module MAC.

Further, the multiply-accumulate module MAC includes NMOS transistors M6, M8, M10, M12, PMOS transistors M7, M9, M11, M13, computation capacitors C1, C2, C3, C4, and transmission gates W1, W2, W3, W4.

The drains of M6, M8, M10 and M12 are used as the signal input terminals of the multiply-accumulate module MAC and are correspondingly connected to the data output terminals of T4, T3, T2 and T1; the sources of M6, M8, M10 and M12 are correspondingly connected to the gates of M7, M9, M11 and M13; and the gates of M6, M8, M10 and M12 are controlled by the control signal line CAL. The two ends of each of C1, C2, C3 and C4 are respectively connected with the signal lines V1 and V2. The source and drain of M7 are respectively connected with the two ends of C1, the source and drain of M9 with the two ends of C2, the source and drain of M11 with the two ends of C3, and the source and drain of M13 with the two ends of C4. The two ends of C2 and C4 are used as the result output ends of the multiply-accumulate module MAC and are correspondingly connected with the output signal lines Vsum[3] to Vsum[0]. The capacitance values of C1 and C3 are the same, the capacitance values of C2 and C4 are the same, and the capacitance values of C1 and C3 are twice those of C2 and C4.

The transmission gates W1, W2, W3 and W4 are connected on the signal lines V1 and V2. W1 controls the connection between C1 and C2, W2 controls the connection between C2 and Vsum[3], Vsum[2], W3 controls the connection between C3 and C4, and W4 controls the connection between C4 and Vsum[1], Vsum[0].

Further, the precharge module C-PRE comprises six NMOS transistors M1-M5 and M14. The drain of M14 is connected to the signal line V1, and the source of M14 is connected to the drains of M1 and M2. The drain of M5 is connected to the signal line V2, the source of M5 is connected to the drains of M3 and M4, and the gates of M14 and M5 are connected to the control signal line PRE.

The sources of M1 and M3 are connected with the voltage signal Vdac of external four-bit weight data, and the sources of M2 and M4 are connected with 1/2VDD. The gates of M2 and M3 are connected to the signal output line TSIGN. The gates of M1 and M4 are connected to the output line of TSIGN's complement signal.

Further, each of the memory cells T1 to T4 and TSIGN is a 6T memory cell including 6 transistors. The 6T memory cell comprises 2 PMOS transistors P1 and P2 and 4 NMOS transistors N1, N2, N3 and N4. P1 and N1 form one inverter structure, P2 and N2 form another inverter structure, and N3 and N4 serve as pass transistors. The sources of P1 and P2 are connected to VDD, and the sources of N1 and N2 are connected to ground. The drain of P1, the drain of N1, the gate of P2 and the gate of N2 are connected together as a storage node Q and connected to the drain of N3. The drain of P2, the drain of N2, the gate of P1 and the gate of N1 are connected together as a storage node QB and connected to the drain of N4. The gates of N3 and N4 are connected with a word line WL. The source of N3 is connected to bit line BL, and the source of N4 is connected to bit line BLB. The bit line BL serves as the signal output terminal of the 6T memory cell.

Further, the sign of the multiplication result of the charge domain memory calculation circuit is determined by the stored data of the memory cell TSIGN and the voltage signal Vdac. If the data stored in memory cell TSIGN is "1", it indicates that the sign bit of the weight data stored in memory cells T4 to T1 is a negative sign bit. When the data stored in the memory cell TSIGN is "0", it indicates that the sign bit of the weight data stored in the memory cells T4 to T1 is a positive sign bit.

When the data stored in the memory cell TSIGN is "1", the voltage difference between the signal lines V1 and V2 is V1-V2 = 1/2VDD-Vdac. When the data stored in the memory cell TSIGN is "0", the voltage difference is V1-V2 = Vdac-1/2VDD.

Furthermore, the multiplication performed by the charge domain memory computing circuit comprises a precharge stage, a multiply-accumulate stage and a charge sharing stage, performed in sequence. The precharge stage precharges the voltage signals of the external four-bit weight data onto the two ends of the calculation capacitors C1, C2, C3 and C4. The multiply-accumulate stage multiplies the stored weight data by the external four-bit weight data. The charge sharing stage performs charge sharing on the multiplication results of T4, T3, T2 and T1 with the external four-bit weight data so as to realize the weighting and accumulation operation.

Further, the specific operation of the pre-charging stage is as follows:

High-level signals are input to M14 and M5 through the control signal line PRE, and high-level signals are input to the transmission gates W1 and W3. Low-level signals are input to M6, M8, M10 and M12 through the control signal line CAL, and low-level signals are input to the transmission gates W2 and W4. The voltages at the two ends of the calculation capacitors C1, C2, C3 and C4 are then the voltages of the signal lines V1 and V2, respectively.

Further, the specific operation of the multiply-accumulate stage is as follows:

A low-level signal is input to M14 and M5 through the control signal line PRE. High-level signals are input to M6, M8, M10 and M12 through the control signal line CAL, and low-level signals are input to the transmission gates W1 to W4. The calculation capacitors C1, C2, C3 and C4 are disconnected from each other and are connected to the storage units T1 to T4 in a one-to-one correspondence.

Further, the specific operation of the charge sharing stage is as follows:

Low-level signals are input to M6, M8, M10 and M12 through the control signal line CAL, and high-level signals are input to the transmission gates W1 to W4. The calculation capacitors C1 and C2 are connected to each other, as are C3 and C4, and all four are disconnected from the memory cells T1-T4. The voltage difference between the output signal lines Vsum[3] and Vsum[2] represents the multiplication result of T4, T3 and the external four-bit weight data. The voltage difference between the output signal lines Vsum[1] and Vsum[0] represents the multiplication result of T2, T1 and the external four-bit weight data.

The invention also relates to a storage circuit with a positive and negative number operation function, which comprises a storage array, an output signal line group, a digital-to-analog conversion module, a time sequence control module, an analog-to-digital conversion module and a digital weighting and accumulation module.

The storage array is formed by a plurality of same storage units in an N × M array form. Where N represents the number of rows of the storage array and M represents the number of columns of the storage array.

The output signal line group comprises M groups of output signal lines Vsum[3] to Vsum[0], and every computing unit in a column is connected to the same group of output signal lines Vsum[3] to Vsum[0].

The digital-to-analog conversion module is used for converting the signed external four-bit weight data into a corresponding analog voltage signal and inputting the analog voltage signal into any one of the storage units in the storage array.

The time sequence control module is used for generating control signals required by calculation operation.

The analog-to-digital conversion module is used for converting the analog voltage signal output by any column in the storage array through the output signal line group into a corresponding digital signal.

The digital weighting and accumulating module is used for carrying out a weighting and accumulating operation on the digital signals output by the analog-to-digital conversion module, so as to output the digital result of the multiplication of the storage units in any column by the external four-bit weight data.

In particular, the storage unit adopts the circuit structure of the charge domain memory computing circuit and realizes the complete function of the circuit.

The technical scheme provided by the invention has the following beneficial effects:

The charge domain memory computing circuit designed by the invention can multiply 5-bit stored weight data including a sign bit by externally input 4-bit weight data. The in-memory multiplication is realized through digital-to-analog conversion, highly parallel data input is achieved while meeting the precision requirement, there are significant advantages in energy efficiency and calculation period, and the requirement of multi-bit multiplication with a sign bit is met.

Drawings

Fig. 1 is a circuit configuration diagram of a charge domain memory computing circuit according to embodiment 1 of the present invention;

Fig. 2 is a circuit configuration diagram of the 6T memory cell used in Fig. 1;

Fig. 3 is an overall configuration diagram of a storage circuit having a positive/negative number operation function according to embodiment 2 of the present invention;

Fig. 4 is a circuit structure diagram of the digital-to-analog conversion module of Fig. 3;

Fig. 5 is a timing diagram of the control signals required to perform a charge domain multiplication operation with internally stored weight data of "1010", based on Fig. 1.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

Example 1

Referring to fig. 1, the present embodiment describes a charge domain memory calculation circuit comprising four memory cells T1 to T4 for storing weight data, one memory cell TSIGN for storing sign bit data, a multiply-accumulate module MAC and a precharge module C-PRE. The sign bit is either a positive sign bit or a negative sign bit.

The signal output ends of T1-T4 are correspondingly connected with the four signal input ends of the multiply-accumulate module MAC, and TSIGN is used for controlling the sign of the voltage signal input to the multiply-accumulate module MAC by the precharge module C-PRE. Four result output ends of the multiply-accumulate module MAC are correspondingly connected with the output signal lines Vsum[3] to Vsum[0].

Based on the above circuit structure, please refer to fig. 1. The data stored by memory cell TSIGN characterizes the sign bit of the internal storage weight data, which is placed in the first column. If memory cell TSIGN stores data of "1", it indicates that the sign bit of the internal storage weight data is negative. If the data stored in memory cell TSIGN is "0," it indicates that the sign bit of the internal storage weight data is positive. Four-bit internal storage weight data multiplied by the external weight data are sequentially stored in the memory cells T4 to T1 from the high order to the low order. The multiplication operation is realized by multiplying each bit of the internal storage weight data by the external four-bit weight data, and finally performing weighting and accumulation to obtain a final operation result.

To facilitate understanding of the charge domain memory calculation circuit of this embodiment, the specific structure of each module and memory cell is described in detail below with reference to fig. 1.

The multiply-accumulate module MAC comprises NMOS transistors M6, M8, M10 and M12, PMOS transistors M7, M9, M11 and M13, calculation capacitors C1, C2, C3 and C4, and transmission gates W1, W2, W3 and W4. The drains of M6, M8, M10 and M12 are used as the signal input terminals of the multiply-accumulate module MAC and are correspondingly connected to the data output terminals of T4, T3, T2 and T1; the sources of M6, M8, M10 and M12 are correspondingly connected to the gates of M7, M9, M11 and M13; and the gates of M6, M8, M10 and M12 are controlled by the control signal line CAL. The two ends of each of C1, C2, C3 and C4 are respectively connected with the signal lines V1 and V2. The source and drain of M7 are respectively connected with the two ends of C1, the source and drain of M9 with the two ends of C2, the source and drain of M11 with the two ends of C3, and the source and drain of M13 with the two ends of C4. The two ends of C2 and C4 are used as the result output ends of the MAC and are correspondingly connected with the output signal lines Vsum[3] to Vsum[0]. The capacitance values of C1 and C3 are the same, the capacitance values of C2 and C4 are the same, and the capacitance values of C1 and C3 are twice those of C2 and C4. The transmission gates W1, W2, W3 and W4 are connected on the signal lines V1 and V2: W1 controls the connection between C1 and C2, W2 controls the connection between C2 and Vsum[3], Vsum[2], W3 controls the connection between C3 and C4, and W4 controls the connection between C4 and Vsum[1], Vsum[0].

The precharge module C-PRE comprises six NMOS transistors M1-M5 and M14. The drain of M14 is connected to the signal line V1, and the source of M14 is connected to the drains of M1 and M2. The drain of M5 is connected to the signal line V2, the source of M5 is connected to the drains of M3 and M4, and the gates of M14 and M5 are connected to the control signal line PRE. The sources of M1 and M3 are connected with the voltage signal Vdac of the external four-bit weight data, and the sources of M2 and M4 are connected with 1/2VDD. The gates of M2 and M3 are connected to the signal output line of TSIGN. The gates of M1 and M4 are connected to the output line of the complement signal of TSIGN.

Each of the memory cells T1 to T4 and TSIGN is a 6T memory cell including 6 transistors. Fig. 2 shows the circuit structure of the 6T memory cell. The 6T memory cell comprises 2 PMOS transistors P1 and P2 and 4 NMOS transistors N1, N2, N3 and N4. P1 and N1 form one inverter structure, P2 and N2 form another inverter structure, and N3 and N4 serve as pass transistors. The sources of P1 and P2 are connected to VDD, and the sources of N1 and N2 are connected to ground. The drain of P1, the drain of N1, the gate of P2 and the gate of N2 are connected together as a storage node Q and connected to the drain of N3. The drain of P2, the drain of N2, the gate of P1 and the gate of N1 are connected together as a storage node QB and connected to the drain of N4. The gates of N3 and N4 are connected with a word line WL. The source of N3 is connected to bit line BL, and the source of N4 is connected to bit line BLB. The bit line BL serves as the signal output terminal of the 6T memory cell.

Based on the specific circuit structure of the charge domain memory calculation circuit, as can be understood from fig. 1, the operation results of the memory cells T4 and T3 and the calculation capacitors C1 and C2 are output through the output signal lines Vsum[3] and Vsum[2]. Because the capacitance value of C1 is twice that of C2, T4, M6, M7 and C1 form a high-order multiply-accumulate unit M-MAC, and T3, M8, M9 and C2 form a low-order multiply-accumulate unit L-MAC. The multiplication result of the weight data stored in T4 and T3 and the external weight data is output through Vsum[3] and Vsum[2]; that is, it can be read from the voltage difference between Vsum[3] and Vsum[2].

The operation results of the memory cells T2 and T1 and the calculation capacitors C3 and C4 are output through the output signal lines Vsum[1] and Vsum[0]. Because the capacitance value of C3 is twice that of C4, T2, M10, M11 and C3 form a high-order multiply-accumulate unit M-MAC, and T1, M12, M13 and C4 form a low-order multiply-accumulate unit L-MAC. The multiplication result of the weight data stored in T2 and T1 and the external weight data is output through Vsum[1] and Vsum[0]; that is, it can be read from the voltage difference between Vsum[1] and Vsum[0].
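The 2:1 capacitor ratio can be illustrated with an idealized charge-conservation model. This is a minimal sketch, assuming ideal capacitors with no parasitics or switch losses; the function name `share_charge` and the use of exact fractions of VDD are illustrative choices, not part of the circuit description. Under this idealization the shared voltage is proportional to 2 × (high-order voltage) + (low-order voltage), scaled by the total capacitance.

```python
from fractions import Fraction

def share_charge(v_high, v_low, c_high=Fraction(2), c_low=Fraction(1)):
    """Ideal charge sharing between two capacitors joined in parallel.

    v_high, v_low: voltages across the high-order and low-order capacitors
                   before sharing (units of VDD).
    c_high, c_low: capacitances in arbitrary units; the 2:1 default mirrors
                   C1 = 2*C2 and C3 = 2*C4 (assumed ideal).
    Total charge is conserved, so V = (C_h*V_h + C_l*V_l) / (C_h + C_l).
    """
    return (c_high * v_high + c_low * v_low) / (c_high + c_low)

# High-order capacitor holds -5/16 VDD, low-order capacitor holds 0 V.
shared = share_charge(Fraction(-5, 16), Fraction(0))
print(shared)  # (2 * (-5/16) + 0) / 3
```

The division by the total capacitance is a fixed scale factor, so the relative weighting of the two bits is unaffected.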

M1 and M3 are connected with Vdac, and M2 and M4 are connected with 1/2VDD. As can be understood from fig. 1, when the data stored in the memory cell TSIGN is "1", the voltage difference between the signal lines V1 and V2 is V1-V2 = 1/2VDD-Vdac. When the data stored in the memory cell TSIGN is "0", the voltage difference is V1-V2 = Vdac-1/2VDD.

Taking 1/2VDD as the reference point for the sign of the external weight data, Vdac > 1/2VDD indicates that the sign bit of the external weight data is positive, and Vdac < 1/2VDD indicates that it is negative. The sign bit operation principle can be seen in the following table:

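Sign bit operation logic table for external weight data and internal storage weight data

TSIGN    Vdac vs. 1/2VDD    V1-V2                  Sign of result
1        Vdac > 1/2VDD      1/2VDD-Vdac < 0        negative
1        Vdac < 1/2VDD      1/2VDD-Vdac > 0        positive
0        Vdac > 1/2VDD      Vdac-1/2VDD > 0        positive
0        Vdac < 1/2VDD      Vdac-1/2VDD < 0        negative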

As can be seen from the above table, when TSIGN stores data "1" and Vdac > 1/2VDD, V1-V2 = 1/2VDD-Vdac < 0, and the sign bit of the final operation result is negative. When TSIGN stores data "1" and Vdac < 1/2VDD, V1-V2 = 1/2VDD-Vdac > 0, and the sign bit of the final operation result is positive. When TSIGN stores data "0" and Vdac > 1/2VDD, V1-V2 = Vdac-1/2VDD > 0, and the sign bit of the final operation result is positive. When TSIGN stores data "0" and Vdac < 1/2VDD, V1-V2 = Vdac-1/2VDD < 0, and the sign bit of the final operation result is negative.

Therefore, the input signals of the precharge module C-PRE comprise Vdac and the data stored in TSIGN. From Vdac and the sign bit data stored in TSIGN, the charge precharged onto the calculation capacitors can be determined, and with it the sign of the calculation result.
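The sign selection performed by the precharge module can be sketched as a small behavioral model. This is a minimal sketch rather than a circuit simulation: the function names `precharge_difference` and `result_sign` are hypothetical, and voltages are expressed as exact fractions of VDD for clarity.

```python
from fractions import Fraction

HALF_VDD = Fraction(1, 2)  # reference level, in units of VDD

def precharge_difference(tsign: int, vdac: Fraction) -> Fraction:
    """Return V1 - V2 (in units of VDD) set up by the precharge module.

    tsign: bit stored in TSIGN (1 = negative stored weight, 0 = positive).
    vdac:  DAC output voltage in units of VDD.
    """
    if tsign not in (0, 1):
        raise ValueError("tsign must be 0 or 1")
    if tsign == 1:
        return HALF_VDD - vdac   # stored weight negative: polarity is flipped
    return vdac - HALF_VDD       # stored weight positive: polarity is kept

def result_sign(tsign: int, vdac: Fraction) -> int:
    """Return +1, -1 or 0 according to the sign of V1 - V2."""
    diff = precharge_difference(tsign, vdac)
    return (diff > 0) - (diff < 0)

# The four cases of the sign bit logic, using Vdac = 13/16 VDD and 3/16 VDD.
for tsign in (1, 0):
    for vdac in (Fraction(13, 16), Fraction(3, 16)):
        print(tsign, vdac, precharge_difference(tsign, vdac), result_sign(tsign, vdac))
```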

The multiplication operation of the charge domain memory calculation circuit of the present embodiment is described below with reference to the sign bit calculation logic. The charge domain multiplication includes three stages: a precharge stage, a multiply-accumulate stage, and a charge sharing stage. The specific operation of each stage is described in detail below:

1. Precharge stage

The signals of the control signal line PRE and the transmission gates W1 and W3 are set to high level, and the signals of the control signal line CAL and the transmission gates W2 and W4 are set to low level. The transmission gates W1-W4 conduct at high level and are off at low level, so the signal lines V1 and V2 are connected with the precharge module C-PRE and the analog voltage signal of the external weight data is transmitted to V1 and V2, while the output signal lines Vsum[3]-Vsum[0] and the storage units T1-T4 are disconnected. The voltages at the two ends of the calculation capacitors C1-C4 are then V1 and V2 respectively, which achieves the purpose of precharging.

2. Multiply-accumulate stage

The control signal line PRE is pulled low, disconnecting the precharge module C-PRE. W1, W2, W3 and W4 are set to low level and the CAL signal is pulled high, so that C1 is connected to T4, C2 to T3, C3 to T2 and C4 to T1, while C1-C4 are disconnected from one another. The voltage difference across each of C1-C4 then represents the multiplication result of the external weight data and the data stored in the corresponding storage unit.

3. Charge sharing stage

After the multiplication stage is completed, the calculation capacitors C1-C4 store different calculation voltage signals. W1, W2, W3 and W4 are set to high level and the control signal line CAL is pulled low. The multiply-accumulate module MAC is then disconnected from the storage units T1-T4, V1 and V2 are connected with Vsum[3] to Vsum[0], and the charge sharing operation is carried out, realizing the high-order weighting and accumulation of the partial products in the analog domain.
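The control-signal levels of the three stages can be summarized as a lookup table. This is a minimal sketch: the dictionary `PHASES` and the helper `describe` are hypothetical names, and the level of PRE during charge sharing is not stated explicitly in the text, so it is assumed here to stay low after the multiply-accumulate stage.

```python
# Control-signal levels per stage (1 = high, 0 = low).
# Assumption: PRE remains low during charge sharing.
PHASES = {
    "precharge": {"PRE": 1, "CAL": 0, "W1": 1, "W2": 0, "W3": 1, "W4": 0},
    "multiply":  {"PRE": 0, "CAL": 1, "W1": 0, "W2": 0, "W3": 0, "W4": 0},
    "share":     {"PRE": 0, "CAL": 0, "W1": 1, "W2": 1, "W3": 1, "W4": 1},
}

def describe(phase: str) -> dict:
    """Translate signal levels into which parts of the circuit are connected."""
    s = PHASES[phase]
    return {
        "cpre_drives_v1_v2": bool(s["PRE"]),            # M14/M5 conducting
        "cells_drive_mac": bool(s["CAL"]),              # M6/M8/M10/M12 conducting
        "c1_c2_joined": bool(s["W1"]),
        "c3_c4_joined": bool(s["W3"]),
        "vsum_connected": bool(s["W2"] and s["W4"]),    # capacitors reach Vsum lines
    }

for name in PHASES:
    print(name, describe(name))
```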

To further understand the calculation principle of the charge domain memory calculation circuit of the present embodiment, specific numerical values are described below.

First, the analog encoding of the external weight data is explained. 1/2VDD is taken as the analog level for 0; each increase of 1/16VDD raises the value by 1, and each decrease of 1/16VDD lowers the value by 1. Values are therefore positive above 1/2VDD and negative below 1/2VDD, and the input range of the external weight data is -7 to 7.
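This encoding can be written as a one-line mapping. The sketch below assumes an ideal conversion and expresses voltages as exact fractions of VDD; the function name `weight_to_vdac` is hypothetical.

```python
from fractions import Fraction

def weight_to_vdac(value: int) -> Fraction:
    """Map a signed external weight in [-7, 7] to Vdac in units of VDD.

    0 maps to 1/2 VDD and each step of 1 corresponds to 1/16 VDD.
    """
    if not -7 <= value <= 7:
        raise ValueError("external weight must be in the range -7..7")
    return Fraction(1, 2) + Fraction(value, 16)

for v in (-7, -5, 0, 5, 7):
    print(v, weight_to_vdac(v))
```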

Take external input weight data of +5 and internal storage weight data of -6 as an example. The +5 is converted to an analog level of 13/16VDD, and the binary data corresponding to the internal storage weight data is 11010, which is stored in the corresponding storage units. The first bit is the sign bit, where "1" indicates that the data is a negative number, and the remaining four bits of the weight data W[3]W[2]W[1]W[0] are 1010 in this order. "1" is stored in the memory cell TSIGN, weight data W[3]=1 is stored in the memory cell T4, weight data W[2]=0 is stored in the memory cell T3, weight data W[1]=1 is stored in the memory cell T2, and weight data W[0]=0 is stored in the memory cell T1.

During the precharge operation, the signals PRE, W1 and W3 are set to high level and the signals CAL, W2 and W4 are set to low level. The voltage difference between V1 and V2 is V1-V2 = 1/2VDD-Vdac = -5/16VDD.

During the multiplication stage, the transmission gates W1, W2, W3 and W4 are set to low level and the CAL signal is pulled high. The weight data bits W[3] and W[2] form one group, and W[1] and W[0] form the other group. When a stored weight data bit is 0, the two ends of the corresponding capacitor are short-circuited, so the voltage difference across the capacitor is about 0. When a stored weight data bit is 1, the voltage difference across the capacitor is about equal to the voltage difference between V1 and V2, namely -5/16VDD. At this stage, the multiplication results of the input analog quantity and the weight data bits W[3], W[2], W[1] and W[0] are -5/16VDD, 0, -5/16VDD and 0, respectively. Multiplication of the data bits by the analog signal is thus achieved.
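The per-bit multiplication in this stage reduces to keeping or shorting the precharged voltage. The sketch below is an idealized model (a stored 0 gives exactly 0 V, a stored 1 leaves the precharged difference untouched); the function name `bit_products` is hypothetical.

```python
from fractions import Fraction

def bit_products(bits, diff):
    """Voltage across C1..C4 after the multiplication stage.

    bits: stored weight bits, MSB first: [W[3], W[2], W[1], W[0]].
    diff: precharged difference V1 - V2 in units of VDD.
    A stored 0 shorts the capacitor (0 V); a stored 1 keeps `diff`.
    """
    return [diff if b else Fraction(0) for b in bits]

# Stored bits 1010 with V1 - V2 = -5/16 VDD, as in the example above.
print(bit_products([1, 0, 1, 0], Fraction(-5, 16)))
```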

In the charge sharing stage, W1, W2, W3 and W4 are set to high level and the CAL signal is pulled low. The charge sharing operation is carried out, realizing the high-order weighting and accumulation of the partial products in the analog domain. The formula of the whole calculation process is equivalent to:

Vsum[3]-Vsum[2] = 2×W[3]×(Vdac-1/2VDD) + W[2]×(Vdac-1/2VDD),

Vsum[1]-Vsum[0] = 2×W[1]×(Vdac-1/2VDD) + W[0]×(Vdac-1/2VDD).

The final result can be weighted and accumulated in the digital domain to yield the final calculated value. The calculation operation formula of the digital domain is equivalent to:

Vsum = 4×(Vsum[3]-Vsum[2]) + (Vsum[1]-Vsum[0]).
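The two analog-domain formulas and the digital-domain combination can be evaluated together as a quick check of the bit weighting. This is a minimal sketch with hypothetical function names; `diff` stands for the precharged difference that the formulas write as (Vdac-1/2VDD), and passing the sign-adjusted value V1-V2 here is an assumption made for illustration. ADC quantization is ignored.

```python
from fractions import Fraction

def pair_difference(w_hi: int, w_lo: int, diff: Fraction) -> Fraction:
    """Equivalent analog-domain result for one pair of bits: 2*w_hi*diff + w_lo*diff."""
    return 2 * w_hi * diff + w_lo * diff

def combine(bits, diff: Fraction) -> Fraction:
    """Digital-domain combination: 4*(Vsum[3]-Vsum[2]) + (Vsum[1]-Vsum[0])."""
    w3, w2, w1, w0 = bits
    upper = pair_difference(w3, w2, diff)  # Vsum[3] - Vsum[2]
    lower = pair_difference(w1, w0, diff)  # Vsum[1] - Vsum[0]
    return 4 * upper + lower

bits = [1, 0, 1, 0]
diff = Fraction(-5, 16)
total = combine(bits, diff)
# The combination weights the bits by 8, 4, 2 and 1.
magnitude = 8 * bits[0] + 4 * bits[1] + 2 * bits[2] + bits[3]
print(total, magnitude * diff)
```

Expanding the formulas shows that the result equals (8×W[3] + 4×W[2] + 2×W[1] + W[0]) × diff, so the four stored bits receive binary weights.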

Thus, the circuit implements a multiply-accumulate calculation of signed 4-bit external weight data with 5-bit weights containing a sign bit. The method realizes highly parallel data input, has significant advantages in energy efficiency and calculation period, and meets the requirement of multi-bit multiplication with a sign bit.

Example 2

Referring to fig. 3, fig. 3 is a diagram illustrating an overall structure of a memory circuit having a positive/negative number operation function. The embodiment introduces a storage circuit with a positive and negative number operation function, which comprises a storage array, an output signal line group, a digital-to-analog conversion module, a time sequence control module, an analog-to-digital conversion module, and a digital weighting and accumulation module.

The digital-to-analog conversion module is used for converting the signed external four-bit weight data into a corresponding analog voltage signal and inputting the analog voltage signal into any one of the storage units in the storage array. The timing control module is used for generating the control signals required by the calculation operation. The analog-to-digital conversion module is used for converting the analog voltage signal output by any column in the storage array through the output signal line group into a corresponding digital signal. The digital weighting and accumulating module is used for carrying out a weighting and accumulating operation on the digital signals output by the analog-to-digital conversion module, so as to output the digital result of the multiplication of the storage units in any column by the external four-bit weight data.

The storage unit adopts the circuit structure of the charge domain memory computing circuit and realizes the complete function of the circuit.

The structure of the digital-to-analog conversion module (DAC) is described in detail with reference to fig. 4. Following the conversion logic for the external weight data in embodiment 1, the digital-to-analog conversion module includes four capacitors C_X, 4C_X, 2C_X and C_X, three switches SW2, SW1 and SW0, and a PMOS transistor Pr. One end of each of the four capacitors is connected to a common node, which serves as the output terminal and outputs Vdac, the analog voltage corresponding to the external weight data. The other end of the first capacitor C_X is connected to 1/2VDD. The other ends of 4C_X, 2C_X and C_X are connected in one-to-one correspondence to one end of SW2, SW1 and SW0; the other end of each switch is connected to VDD, 1/2VDD or VSS, as selected by the switching of SW2, SW1 and SW0. The source of Pr is connected to VSS, its drain is connected to the common node of the four capacitors, and its gate is controlled by Rstn.

The digital-to-analog conversion module DAC converts the signed input digital quantity into an analog quantity. Before the input digital signal arrives, Rstn is set low and all switches are set to VDD/2, so Vdac is approximately VDD/2. When the digital signal arrives, Rstn is pulled high and the control circuit of the DAC sets the switches SW0 to SW2 so that the digital signal is converted into an analog signal. For example, if the input is 0, the digital quantity is the four-bit binary number 0000, SW0 to SW2 are connected to VDD/2, and Vdac = 1/2VDD. If the input is +5, the digital quantity is 0101, SW[0] and SW[2] are set to VDD, and the conversion result is about 13/16VDD. If the input is -5, the digital quantity is -0101 and the conversion result is about 3/16VDD. The output of the DAC is input to the pre-charge module C-PRE.
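The three conversion examples above can be reproduced with an ideal charge-redistribution model of the capacitor network. The sketch below is an illustration under stated assumptions: capacitors and switches are ideal, a negative input is assumed to steer the switches of its set magnitude bits to VSS (the text gives only the resulting 3/16VDD), and the name `dac_output` is hypothetical.

```python
from fractions import Fraction


def dac_output(value, vdd=Fraction(1)):
    """Ideal model of the DAC output Vdac for a signed input in -7..7.

    The result is expressed in units of VDD when vdd is left at 1.
    """
    if not -7 <= value <= 7:
        raise ValueError("magnitude must fit in three bits")
    magnitude = abs(value)
    half = vdd / 2
    # Assumption: set bits go to VDD for a positive input and to VSS (0 V)
    # for a negative input; cleared bits stay at VDD/2.
    active_rail = vdd if value >= 0 else Fraction(0)
    # (capacitance in units of C_X, magnitude bit index) for SW2, SW1, SW0
    switched_caps = [(4, 2), (2, 1), (1, 0)]
    charge = 1 * half  # the fixed C_X whose far end is tied to VDD/2
    total_cap = 1
    for cap, bit in switched_caps:
        bottom = active_rail if (magnitude >> bit) & 1 else half
        charge += cap * bottom
        total_cap += cap
    # The common node settles at the capacitance-weighted average.
    return charge / total_cap


for x in (0, 5, -5):
    print(x, dac_output(x))
```

Under these assumptions the model reduces to Vdac = VDD/2 + x × VDD/16 for an input x, which matches the three values quoted in the text.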

In this embodiment, every 5 memory columns form one block. The first column of each block stores the sign bit (SIGN = 1 means the weight is negative; SIGN = 0 means the weight is positive), and the other columns store the data bits, in decreasing significance from left to right. In the convolutional layer calculation of a neural network, the storage layout of the weights is determined by the size of the convolution kernel. For example, for a 3 × 3 convolution kernel, 3 blocks of the array are activated to take part in the convolutional layer operation.
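The block layout amounts to a sign-magnitude encoding of each weight across five columns. The following sketch illustrates that mapping; the helper names `encode_weight` and `decode_weight` are hypothetical and the list order [SIGN, bit 3, bit 2, bit 1, bit 0] is an assumption based on the left-to-right description above.

```python
def encode_weight(weight):
    """Map a signed weight in -15..15 to five column values.

    Column 0 holds SIGN (1 = negative); columns 1..4 hold the magnitude
    bits from most significant to least significant.
    """
    if not -15 <= weight <= 15:
        raise ValueError("magnitude must fit in four bits")
    sign = 1 if weight < 0 else 0
    magnitude = abs(weight)
    return [sign] + [(magnitude >> k) & 1 for k in (3, 2, 1, 0)]


def decode_weight(columns):
    """Inverse of encode_weight."""
    sign, *bits = columns
    magnitude = 0
    for bit in bits:
        magnitude = (magnitude << 1) | bit
    return -magnitude if sign else magnitude


print(encode_weight(-6))
print(decode_weight([1, 0, 1, 1, 0]))
```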

In the CNN convolutional layer, each block corresponds to one convolution kernel, and the trained weight data are stored in each row of memory cells from the most significant bit to the least significant bit. The whole circuit is then set to the calculation mode, and the multiplication is performed in the same manner as in embodiment 1.

The charge sharing operation performs the weighting and accumulation of the high-order bits of the partial products in the analog domain. The formula of the whole calculation process is equivalent to:

where N is the size of the convolution kernel and i denotes a row that takes part in the multiplication, with 1 ≤ i ≤ N.

All of these calculations are carried out in the charge domain and produce an analog result. A successive approximation ADC quantizes this result into the corresponding digital quantity, and the final calculated value is obtained by weighting and accumulating in the digital domain. The digital-domain calculation is equivalent to:

Vsum = 4 × (Vsum[3] - Vsum[2]) + (Vsum[1] - Vsum[0]).

This embodiment has the same advantageous effects as embodiment 1. In addition, because the array is built from the circuit of embodiment 1, the operation can be performed across the entire array, realizing the multiply-accumulate calculation of signed 4-bit external weight data with 5-bit weights that include a sign bit.
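The signed multiply-accumulate of a single cell can be sketched as an idealized behavioral model of the pre-charge, multiplication and charge sharing stages. The sign steering and the 2:1 capacitor ratio follow the claims below. Everything else is an assumption made for illustration: ideal capacitors and switches, no ADC quantization, a stored 0 fully discharging its computing capacitor through the PMOS across it, and all function names.

```python
from fractions import Fraction

VDD = Fraction(1)  # work in units of VDD


def precharge(vdac, tsign):
    """V1 - V2 after pre-charge: sign steering by TSIGN (1 = negative weight)."""
    half = VDD / 2
    return half - vdac if tsign else vdac - half


def multiply(delta_v, bits):
    """bits = (T4, T3, T2, T1). Assumed: a stored 0 discharges its capacitor,
    a stored 1 leaves the pre-charged voltage on it."""
    return [delta_v if b else Fraction(0) for b in bits]


def charge_share(cap_voltages):
    """Share charge within (C1, C2) and within (C3, C4), C1 = C3 = 2*C2 = 2*C4."""
    v1, v2, v3, v4 = cap_voltages
    high = (2 * v1 + v2) / 3  # Vsum[3] - Vsum[2]
    low = (2 * v3 + v4) / 3   # Vsum[1] - Vsum[0]
    return high, low


def mac_cell(vdac, tsign, bits):
    """Ideal analog result followed by the digital 4:1 weighting."""
    high, low = charge_share(multiply(precharge(vdac, tsign), bits))
    return 4 * high + low


# External input +5 (Vdac = 13/16 VDD) times stored weights +6 and -6
print(mac_cell(Fraction(13, 16), 0, (0, 1, 1, 0)))
print(mac_cell(Fraction(13, 16), 1, (0, 1, 1, 0)))
```

In this idealized model the output equals (Vdac - VDD/2) × (8·T4 + 4·T3 + 2·T2 + T1) / 3, with the sign flipped when TSIGN is 1, so it is proportional to the signed product of the external input and the stored weight.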

For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features should be considered within the scope of this disclosure as long as it contains no contradiction.

The above examples show only some embodiments of the present invention. Although their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A charge domain memory computing circuit, characterized by comprising four storage units T1-T4 for storing weight data, a storage unit TSIGN for storing sign bit data, a multiply-accumulate module MAC and a pre-charge module C-PRE; wherein the sign bit is a positive sign bit or a negative sign bit;

the signal output ends of T1-T4 are connected in one-to-one correspondence to the four signal input ends of the multiply-accumulate module MAC, and TSIGN is used for controlling the sign of the voltage signal that the pre-charge module C-PRE inputs to the multiply-accumulate module MAC; the four result output ends of the multiply-accumulate module MAC are connected in one-to-one correspondence to the output signal lines Vsum[3] to Vsum[0];

when the charge domain memory computing circuit executes a signed multiplication operation, a voltage signal of the signed external four-bit weight data and a voltage signal of the sign bit of the internal weight data pre-stored in TSIGN are input into the pre-charge module C-PRE; the voltage signal of the external four-bit weight data, after the sign bit operation, is input into the multiply-accumulate module MAC through the pre-charge module C-PRE; the voltage signals of the four-bit weights stored in T1-T4 are then input into the multiply-accumulate module MAC; and Vsum[3] to Vsum[0] output voltage signals representing the operation result of the multiply-accumulate module MAC.

2. The charge domain memory computing circuit of claim 1, wherein the multiply-accumulate module MAC comprises NMOS transistors M6, M8, M10, M12, PMOS transistors M7, M9, M11, M13, computing capacitors C1, C2, C3, C4, and transmission gates W1, W2, W3, W4;

the drains of M6, M8, M10 and M12 serve as the signal input ends of the multiply-accumulate module MAC and are connected in one-to-one correspondence to the data output ends of T4, T3, T2 and T1; the sources of M6, M8, M10 and M12 are connected in one-to-one correspondence to the gates of M7, M9, M11 and M13; the gates of M6, M8, M10 and M12 are controlled by a control signal line CAL; the two ends of each of C1, C2, C3 and C4 are connected to signal lines V1 and V2 respectively; the source and drain of M7 are connected to the two ends of C1, the source and drain of M9 to the two ends of C2, the source and drain of M11 to the two ends of C3, and the source and drain of M13 to the two ends of C4; the two ends of C2 and C4 serve as the result output ends of the multiply-accumulate module MAC and are connected in one-to-one correspondence to the output signal lines Vsum[3] to Vsum[0]; the capacitance values of C1 and C3 are equal, the capacitance values of C2 and C4 are equal, and the capacitance of C1 and C3 is twice that of C2 and C4.

3. The charge domain memory computing circuit of claim 2, wherein the pre-charge module C-PRE comprises six NMOS transistors M1-M5 and M14; the drain of M14 is connected to the signal line V1, and the source of M14 is connected to the drains of M1 and M2; the drain of M5 is connected to the signal line V2, and the source of M5 is connected to the drains of M3 and M4; the gates of M14 and M5 are connected to the control signal line PRE;

the sources of M1 and M3 are connected with the voltage signal Vdac of external four-bit weight data, and the sources of M2 and M4 are connected with 1/2VDD;

the gates of M2 and M3 are connected with a signal output line of TSIGN; the gates of M1 and M4 are connected to the output line of TSIGN's complement signal.

4. The charge domain memory computing circuit of claim 3, wherein each of the memory cells T1-T4 and TSIGN is a 6T memory cell comprising 6 transistors; the 6T memory cell comprises 2 PMOS transistors P1 and P2 and 4 NMOS transistors N1, N2, N3 and N4; P1 and N1 form one inverter, P2 and N2 form another inverter, and N3 and N4 serve as pass transistors; the sources of P1 and P2 are connected to VDD, and the sources of N1 and N2 are grounded; the drain of P1, the drain of N1, the gate of P2 and the gate of N2 are connected together as a storage node Q, which is connected to the drain of N3; the drain of P2, the drain of N2, the gate of P1 and the gate of N1 are connected together as a storage node QB, which is connected to the drain of N4; the gates of N3 and N4 are connected to a word line WL; the source of N3 is connected to the bit line BL, and the source of N4 is connected to the bit line BLB; the bit line BL serves as the signal output end of the 6T memory cell.

5. The charge domain memory computing circuit according to claim 4, wherein the sign of the multiplication result of the charge domain memory computing circuit is determined by the data stored in the memory cell TSIGN and the voltage signal Vdac; if the data stored in the memory cell TSIGN is "1", the sign bit of the weight data stored in the memory cells T4 to T1 is a negative sign bit; if the data stored in the memory cell TSIGN is "0", the sign bit of the weight data stored in the memory cells T4 to T1 is a positive sign bit;

when the data stored in the memory cell TSIGN is "1", the voltage difference between the signal lines V1 and V2 is V1 - V2 = 1/2VDD - Vdac; when the data stored in the memory cell TSIGN is "0", the voltage difference between the signal lines V1 and V2 is V1 - V2 = Vdac - 1/2VDD.

6. The charge domain memory computing circuit of claim 5, wherein the multiplication performed by the charge domain memory computing circuit comprises, in sequence, a pre-charge stage, a multiplication stage and a charge sharing stage; the pre-charge stage pre-charges the voltage signal of the external four-bit weight data onto the two ends of the computing capacitors C1, C2, C3 and C4; the multiplication stage multiplies the stored weight data by the external four-bit weight data; and the charge sharing stage shares charge among the multiplication results of T4, T3, T2 and T1 with the external four-bit weight data so as to realize the weighting and accumulation operation.

7. The charge domain memory computing circuit of claim 6, wherein the pre-charge stage is specifically operated as follows:

inputting high-level signals to M14 and M5 through the control signal line PRE, and inputting high-level signals to the transmission gates W1 and W3; inputting low-level signals to M6, M8, M10 and M12 through the control signal line CAL, and inputting low-level signals to the transmission gates W2 and W4; so that the voltages at the two ends of the computing capacitors C1, C2, C3 and C4 are the voltages of the signal lines V1 and V2 respectively.

8. The charge domain memory computing circuit of claim 6, wherein the multiplication stage operates as follows:

inputting low-level signals to M14 and M5 through the control signal line PRE; inputting high-level signals to M6, M8, M10 and M12 through the control signal line CAL, and inputting low-level signals to the transmission gates W1-W4; so that the computing capacitors C1, C2, C3 and C4 are disconnected from one another and are connected in one-to-one correspondence to the storage units T1 to T4.

9. The charge domain memory computing circuit of claim 6, wherein the charge sharing stage operates as follows:

inputting low-level signals to M6, M8, M10 and M12 through the control signal line CAL, and inputting high-level signals to the transmission gates W1-W4; connecting the computing capacitor C1 with C2 and the computing capacitor C3 with C4, and disconnecting the computing capacitors from the storage units T1 to T4; the voltage difference between the output signal lines Vsum[3] and Vsum[2] represents the multiplication result of T4 and T3 with the external four-bit weight data; the voltage difference between the output signal lines Vsum[1] and Vsum[0] represents the multiplication result of T2 and T1 with the external four-bit weight data.

10. A memory circuit having a positive/negative operation function, comprising:

a storage array formed by a plurality of identical storage units arranged in N rows and M columns; wherein N represents the number of rows of the storage array, and M represents the number of columns of the storage array;

an output signal line group comprising M groups of output signal lines Vsum[3] to Vsum[0], wherein the storage units in each column are connected to the same group of output signal lines Vsum[3] to Vsum[0];

a digital-to-analog conversion module for converting the signed external four-bit weight data into a corresponding analog voltage signal and inputting the analog voltage signal into any one of the storage units in the storage array;

the time sequence control module is used for generating a control signal required by calculation operation;

the analog-to-digital conversion module is used for converting the analog voltage signal output by any column in the storage array through the output signal line group into a corresponding digital signal;

the digital weighting and accumulating module is used for carrying out weighting and accumulating operation on the digital signals output by the analog-to-digital conversion module so as to output a digital quantity result of multiplication operation of the storage units in any column and external four-bit weight data;

wherein each storage unit adopts the circuit structure of the charge domain memory computing circuit according to any one of claims 1-9 and implements its complete function.