
CN113269317B - Pulse neural network computing array - Google Patents

Pulse neural network computing array

Info

Publication number
CN113269317B
CN113269317B (application CN202110400723.0A)
Authority
CN
China
Prior art keywords
neural network
membrane potential
pulse
comparator
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110400723.0A
Other languages
Chinese (zh)
Other versions
CN113269317A (en)
Inventor
李丽
沈思睿
傅玉祥
陈沁雨
徐瑾
王心沅
何书专
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110400723.0A priority Critical patent/CN113269317B/en
Publication of CN113269317A publication Critical patent/CN113269317A/en
Application granted granted Critical
Publication of CN113269317B publication Critical patent/CN113269317B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a spiking neural network computing array that supports continuous convolution-pooling operation and parallel inference of the spiking neural network, improving execution efficiency during algorithm inference. The invention comprises a spiking neural network computing cluster formed from a plurality of spiking neural network computing units, each of which comprises a membrane potential accumulator, a pulse emitter, a pooling buffer, and a pooling comparator. The membrane potential accumulator is electrically connected to the pulse emitter, and the pulse emitter is electrically connected to the pooling buffer and the pooling comparator. The membrane potential accumulator accumulates the input pulse sequence; the pulse emitter decides, from the membrane potential supplied by the accumulator, whether to emit a pulse to the next stage; the pooling buffer counts and buffers the emitter's pulses; and the pooling comparator performs comparison operations on the buffered input.

Description

Pulse neural network computing array
Technical Field
The invention relates to the hardware implementation of neural network computation, and in particular to a spiking neural network computing array.
Background
Spiking neural networks (SNNs) use spiking neurons as their computing units and can emulate the information coding and processing of the human brain. Unlike conventional deep neural networks, which transmit information as explicit numeric values, a spiking neural network transmits information through the timing of the spikes in a spike train, providing sparse yet highly accurate computation. A spiking neuron accumulates its inputs into a membrane potential and fires a spike when a specific threshold is reached, enabling event-driven computation. Owing to the sparsity of spike events and the event-driven form of computation, spiking neural networks offer excellent energy efficiency and are currently the neural network structure closest to the biological brain.
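The accumulate-and-fire behavior described above can be sketched in a few lines of Python. This is a behavioral illustration only, not part of the patent; the threshold value 32 is borrowed from the configurable default mentioned later in the description, and all names are illustrative.

```python
def integrate_and_fire(weights, spikes, threshold=32, v=0):
    """Behavioral sketch of one spiking neuron over a time step.

    weights: per-synapse weights; spikes: 0/1 input spike vector.
    Accumulates weighted input spikes into the membrane potential v,
    fires when v exceeds the threshold, and subtracts the threshold
    on firing (soft reset), as the emitter in the description does.
    """
    for w, s in zip(weights, spikes):
        if s:                 # event-driven: accumulate only on a spike
            v += w
    fired = v > threshold
    if fired:
        v -= threshold        # subtract threshold rather than reset to zero
    return fired, v
```

For example, with weights [20, 15] and both input spikes present, the potential reaches 35, exceeds the threshold of 32, a spike fires, and the residual potential is 3.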
Given the characteristics of spiking-network data, a conventional neural network computing array structure cannot exploit this sparsity to the fullest, so the arithmetic array must be designed around the computational characteristics of spiking neural networks. Although some spiking neural network computing arrays have been disclosed, most support only fully connected and convolutional structures. These designs support the pooling layer of a spiking neural network poorly and require significant hardware overhead to complete the pooling operation; such arrays cannot adapt to diverse network structures and therefore generalize poorly.
Because of the event-driven nature of the spiking neural network algorithm model, inference requires accelerating convergence and refreshing the membrane potential. How to design the computing structure to match the event refresh rate and to speed up inference convergence is likewise part of the study of spiking neural network computing units.
In summary, how to design a spiking neural network computing array with high resource utilization, low operation latency, and low cost that supports every layer of the network well is a problem to be solved by existing spiking neural network technology.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a spiking neural network computing array that better supports the pooling layer while comprehensively balancing the precision, area, power consumption, and operation latency of the hardware implementation.
The technical scheme is as follows: to address the above problems, the invention provides a spiking neural network computing array that supports continuous convolution-pooling operation and parallel inference of the spiking neural network, improving execution efficiency during algorithm inference. The invention comprises a spiking neural network computing cluster formed from a plurality of computing units, each comprising a membrane potential accumulator, a pulse emitter, a pooling buffer, and a pooling comparator. The membrane potential accumulator is electrically connected to the pulse emitter, and the pulse emitter is electrically connected to the pooling buffer and the pooling comparator. The membrane potential accumulator accumulates the input pulse sequence; the pulse emitter decides, from the membrane potential supplied by the accumulator, whether to emit a pulse to the next stage; the pooling buffer counts and buffers the emitter's pulses; and the pooling comparator performs comparison operations on the buffered input.
In a further embodiment, the membrane potential accumulator comprises at least one membrane potential loading unit and at least one fixed point accumulator, the membrane potential loading unit and the fixed point accumulator being electrically connected to each other.
In a further embodiment, the pulse transmitter comprises at least one fixed point comparator and at least one fixed point subtractor, the fixed point comparator and the fixed point subtractor being electrically connected to each other.
In a further embodiment, the pooled buffer includes at least one pooled count load unit, at least one counter, and at least one first-in first-out data buffer, the pooled count load unit, counter, and first-in first-out data buffer being electrically connected to each other;
The pooling comparator comprises at least one fixed point number comparator and at least one data manager, wherein the fixed point number comparator and the data manager are electrically connected with each other.
In a further embodiment, the membrane potential loading unit is activated simultaneously with the fixed point accumulator, the loading process being overridden by the accumulation process.
In a further embodiment, the output of the counter is electrically connected to both the fifo and the pooling comparator.
In a further embodiment, the data manager includes a first-in first-out data buffer reading unit, a data packet parsing unit, and a data allocation unit; the first-in first-out data buffer reading unit is electrically connected with the data packet analysis unit, and the data packet analysis unit is electrically connected with the data distribution unit.
In a further embodiment, each spiking neural network computing unit comprises an input port and an output port, the input port being electrically connected to the output port through the membrane potential accumulator, pulse emitter, pooling buffer, and pooling comparator.
In a further embodiment, the plurality of impulse neural network computing units form an impulse neural network computing cluster, the impulse neural network computing units within the cluster sharing the same set of control logic.
In a further embodiment, the input of the impulse neural network specific computing array is electrically connected to the input of the impulse neural network computing cluster, and the output of the impulse neural network computing cluster is electrically connected to the output of the impulse neural network specific computing array.
The beneficial effects are that: the spiking neural network-specific computing array can realize several different operation modes through external configuration signals. In addition, the invention realizes parallel computation of the spiking neural network through an 8x8 computing-unit structure, giving good operational flexibility and hardware resource utilization. The accumulation method fully exploits the sparsity of the spiking neural network, resolves the massive redundancy of conventional computing arrays, and improves the operational efficiency of the spiking neural network.
Drawings
FIG. 1 is a schematic diagram of the spiking neural network computing array of the present invention.
Fig. 2 is a schematic diagram of the structure of the spiking neural network computing unit in the present invention.
FIG. 3 is a schematic diagram of the membrane potential accumulator according to the present invention.
Fig. 4 is a schematic diagram of the structure of the pulse emitter according to the present invention.
FIG. 5 is a schematic diagram of the structure of the pooling buffer according to the present invention.
FIG. 6 is a schematic diagram of the pooling comparator according to the present invention.
Fig. 7 is a schematic diagram of the structure of a single convolutional layer calculation of a spiking neural network.
FIG. 8 is a schematic diagram of the structure of continuous convolution-pooling layer calculation of a spiking neural network.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.
Example 1
Referring to fig. 1, SNPU denotes a spiking neural network processing unit, and SNPU CLUSTER denotes a cluster of such units. The spiking neural network computing array of the invention comprises computing clusters formed from a plurality of computing units. A single computing unit performs computation on the raw data; units within a cluster compute different rows of an image in parallel, while inter-cluster parallelism computes different convolution feature maps of the same image in parallel. The computing array of this embodiment comprises 8 computing clusters, each containing 8 computing units, so that 64 data paths can be processed in parallel. As shown in fig. 2, the computing unit of the invention comprises a membrane potential accumulator, a pulse emitter, a pooling buffer, and a pooling comparator. The membrane potential accumulator is electrically connected to the pulse emitter, and the pulse emitter is electrically connected to the pooling buffer and the pooling comparator.
Describing the above components in further detail with reference to fig. 3: the membrane potential accumulator accumulates the input pulse sequence. Specifically, it comprises a membrane potential loading unit and a fixed-point accumulator. The membrane potential loading unit comprises an initialization input port, a load enable port, a membrane potential load port, and a membrane potential register.
The membrane potential loading unit resets the membrane potential register to 0 when the initialization input port is 1'b1; when the initialization input port is 1'b0, it reads the value of the membrane potential load port into the register according to the enable signal on the load enable port. The fixed-point accumulator has one 8-bit weight input port, one 1-bit input enable port, and one 8-bit membrane potential output port. When the input enable port is 1'b1, the fixed-point accumulator accumulates onto the value of the membrane potential register and outputs the 8-bit membrane potential value.
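As a rough single-cycle behavioral model of this register logic (the signal names and the 8-bit wraparound are assumptions for illustration, not specified by the patent), the priority order follows the text: initialization clears the register, and, per the embodiment above, the loading process is overridden by the accumulation process.

```python
def membrane_accumulator_step(v_reg, init, load_en, load_val, acc_en, weight):
    """One-cycle behavioral model of the membrane potential accumulator.

    init clears the register; otherwise accumulation takes precedence
    over loading ("the loading process being overridden by the
    accumulation process"); a load restores a saved membrane potential.
    Values wrap to 8 bits (an assumption about overflow behavior).
    """
    if init:
        return 0
    if acc_en:                       # accumulation overrides loading
        return (v_reg + weight) & 0xFF
    if load_en:
        return load_val & 0xFF
    return v_reg
```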
Referring to fig. 4, the pulse emitter decides, from the membrane potential value supplied by the accumulator, whether to emit a pulse to the next stage. Specifically, the pulse emitter comprises a fixed-point comparator and a fixed-point subtractor. The fixed-point comparator has two 8-bit inputs and one 1-bit output: it compares the input membrane potential value with the emission threshold, and when the membrane potential exceeds the threshold the comparator output is pulled high; when the membrane potential is below the threshold, the output is not pulled high. The fixed-point subtractor has two 8-bit inputs and one 8-bit output: when the comparator output is pulled high, it performs a fixed-point subtraction on the membrane potential and outputs the membrane potential value minus the threshold. The membrane potential threshold is configurable and may be set to 32 in this design.
As shown in fig. 5, the pooling buffer counts and buffers the pulses of the pulse emitter. Specifically, it comprises a pooling count loading unit, a counter, and a first-in first-out (FIFO) data buffer. The pooling count loading unit comprises a 1-bit initialization input port, a 1-bit load enable port, and an 8-bit pooling count load port. It resets the pooling counter to 0 when the initialization input port is 1'b1; when the initialization input port is 1'b0, it reads the value of the pooling count load port into the counter according to the enable signal on the load enable port. The counter has a 1-bit input port and an 8-bit output port; it increments according to pulse emission and outputs the current pooling count after each time step. The FIFO data buffer has a 1-bit input enable port, a 9-bit input port, a 1-bit output enable port, and a 9-bit output port. Its data format is {emission state (state), pooling count (pool_cnt)}. The FIFO has a depth of 128, which exceeds the length of one image row. The counter outputs its 9-bit data to both the pooling comparator and the FIFO.
Referring to fig. 6, the pooling comparator performs a comparison operation on the buffer's input, finding the maximum value within a mask (pooling window) of the input image. Specifically, it comprises a data manager and a fixed-point comparator. The data manager is connected to the output of the pooling buffer and reads the pooling counts and emission information of the upper and lower rows simultaneously, taking two 9-bit values from the FIFO and two from the pooling counter. The fixed-point comparator has four 8-bit inputs and one 2-bit output: it compares the four 8-bit inputs against one another to find the maximum, which is indicated by the 2-bit output. The four 8-bit numbers comprise two pooling counts fetched in order from the FIFO and two pooling counts output directly by the pooling counter; they correspond to four adjacent pixels spanning two rows of the image, i.e. a 2x2 pooling window.
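The pooling path just described (buffer one row of spike counts in the FIFO, then compare a 2x2 window of buffered and fresh counts) can be sketched as follows. This is a software illustration under the stated assumptions; where the patent's comparator emits a 2-bit index of the maximum, the sketch returns the maximum value itself.

```python
from collections import deque

def maxpool_2x2_spike_counts(rows):
    """Behavioral sketch of the pooling buffer + pooling comparator path.

    rows: rows of per-pixel spike counts (even width and height assumed).
    The upper row's counts wait in a FIFO (depth >= one image row,
    128 in the text) until the lower row arrives; the comparator then
    takes two buffered counts and two fresh counts, a 2x2 window,
    and keeps their maximum.
    """
    fifo = deque(maxlen=128)         # holds the previous row's counts
    out = []
    for r, row in enumerate(rows):
        if r % 2 == 0:
            fifo.extend(row)         # buffer the upper row
        else:
            pooled = []
            for c in range(0, len(row), 2):
                upper = (fifo.popleft(), fifo.popleft())
                lower = (row[c], row[c + 1])
                pooled.append(max(*upper, *lower))   # 4-input comparison
            out.append(pooled)
    return out
```

For example, the two rows [1, 3, 0, 2] and [4, 0, 1, 1] pool to [4, 2].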
The spiking neural network computing array of the invention realizes several different operation modes through external configuration signals. When the mode select signal is set to 1'b0 (mode 0), the array sends the output of the pulse emitter directly to the array's output port for computation by the next network layer (see fig. 7): the membrane potential is computed and the emitter output is determined from it. In mode 0, only the membrane potential and the pulse emitter output are computed.
When the mode select signal is set to 1'b1 (mode 1), the emitter output inside the array is fed into the pooling buffer; the pooling layer is computed by the pooling buffer and pooling comparator and the result is then sent to the array's output port (see fig. 8). The membrane potential value is computed first, and the layer's output is then sent to the pooling buffer and pooling comparator to find the maximum within the mask. In mode 1, the membrane potential and the emitter values are fed continuously into the pooling unit for computation; at the same time, the membrane potential is also output and stored.
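The two externally configured modes can be summarized in a small sketch (names are illustrative, and the real array operates on hardware pulse streams rather than Python lists):

```python
def snn_array_step(mode_sel, spike_row_pair):
    """Sketch of the two operation modes selected by the mode signal.

    mode_sel = 0: emitter outputs pass straight to the array output
    (convolution layer only, as in fig. 7).
    mode_sel = 1: emitter outputs pass through the pooling path
    (2x2 max over two rows) before the array output, as in fig. 8.
    spike_row_pair: two rows of per-pixel spike counts (illustrative).
    """
    upper, lower = spike_row_pair
    if mode_sel == 0:
        return upper + lower                      # pass-through
    # mode 1: 2x2 max pooling over the two rows
    return [max(upper[c], upper[c + 1], lower[c], lower[c + 1])
            for c in range(0, len(upper), 2)]
```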
The spiking neural network computing array realizes parallel computation of the spiking neural network through its 8x8 computing-unit structure, giving good operational flexibility and hardware resource utilization. The accumulation method fully exploits the sparsity of the spiking neural network, resolves the massive redundancy of conventional computing arrays, and improves the operational efficiency of the spiking neural network.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to the specific details of the above embodiments, and various equivalent changes can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the equivalent changes belong to the protection scope of the present invention.

Claims (9)

1. The impulse neural network computing array is characterized by comprising impulse neural network computing clusters formed by a plurality of impulse neural network computing units;
The pulse neural network calculation unit comprises a membrane potential accumulator, and is used for performing accumulation operation on an input pulse sequence; the membrane potential accumulator comprises at least one membrane potential loading unit and at least one fixed-point accumulator, and the membrane potential loading unit and the fixed-point accumulator are electrically connected with each other;
The membrane potential loading unit resets a membrane potential register to 0 when the initialization input port is 1'b1, and when the initialization input port is 1'b0 reads the value of the membrane potential load port into the register according to the enable signal on the load enable port;
The fixed-point accumulator is provided with one 8-bit weight input port, one 1-bit input enable port, and one 8-bit membrane potential output port; when the input enable port is 1'b1, the fixed-point accumulator accumulates the values of the membrane potential register and outputs an 8-bit membrane potential value; the pulse emitter is electrically connected to the membrane potential accumulator and decides, from the membrane potential supplied by the accumulator, whether to emit a pulse to the next stage;
The pooling buffer area is electrically connected with the pulse emitter and is used for counting and buffering the pulses of the pulse emitter;
The pooling comparator is electrically connected with the pulse transmitter and is used for comparing and operating the input of the buffer area;
The pulse emitter decides, from the membrane potential value supplied by the accumulator, whether to emit a pulse to the next stage, and comprises a fixed-point comparator and a fixed-point subtractor; the fixed-point comparator has two 8-bit inputs and one 1-bit output; it compares the input membrane potential value with the emission threshold, and when the membrane potential exceeds the threshold the comparator output is pulled high; when the membrane potential is below the threshold, the output is not pulled high; the fixed-point subtractor has two 8-bit inputs and one 8-bit output; when the comparator output is pulled high, the subtractor performs a fixed-point subtraction on the membrane potential and outputs the membrane potential value minus the threshold.
2. The pulsed neural network computational array of claim 1, wherein the pulsed transmitter comprises at least one fixed-point comparator and at least one fixed-point subtractor, the fixed-point comparator and fixed-point subtractor being electrically connected to each other.
3. The pulsed neural network computing array of claim 1, wherein the pooling buffer comprises at least one pooled count load unit, at least one counter, and at least one first-in-first-out data buffer, the pooled count load unit, counter, and first-in-first-out data buffer being electrically connected to one another;
The pooling comparator comprises at least one fixed point number comparator and at least one data manager, wherein the fixed point number comparator and the data manager are electrically connected with each other.
4. The pulsed neural network computational array of claim 1 wherein the membrane potential loading unit is activated simultaneously with the fixed-point accumulator and the loading process is overridden by the accumulation process.
5. A pulsed neural network computational array according to claim 3, wherein the counter outputs are simultaneously electrically connected to the fifo and the pooling comparator.
6. A pulsed neural network computational array according to claim 3, wherein the data manager comprises a first-in first-out data buffer reading unit, a data packet parsing unit, and a data distribution unit; the first-in first-out data buffer reading unit is electrically connected with the data packet analysis unit, and the data packet analysis unit is electrically connected with the data distribution unit.
7. A pulsed neural network computational array according to any one of claims 1 to 6, wherein each pulsed neural network computational cell comprises an input port and an output port, the input port being electrically connected to the output port through a membrane potential accumulator, a pulse emitter, a pooling buffer, and a pooling comparator.
8. The impulse neural network computing array of claim 7, wherein the plurality of impulse neural network computing units form an impulse neural network computing cluster, and impulse neural network computing units within the cluster share the same set of control logic.
9. The impulse neural network computing array of claim 8, wherein an input of the impulse neural network-specific computing array is electrically connected to an input of the impulse neural network computing cluster, and an output of the impulse neural network computing cluster is electrically connected to an output of the impulse neural network-specific computing array.
CN202110400723.0A 2021-04-14 2021-04-14 Pulse neural network computing array Active CN113269317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110400723.0A CN113269317B (en) 2021-04-14 2021-04-14 Pulse neural network computing array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110400723.0A CN113269317B (en) 2021-04-14 2021-04-14 Pulse neural network computing array

Publications (2)

Publication Number Publication Date
CN113269317A CN113269317A (en) 2021-08-17
CN113269317B true CN113269317B (en) 2024-05-31

Family

ID=77229076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110400723.0A Active CN113269317B (en) 2021-04-14 2021-04-14 Pulse neural network computing array

Country Status (1)

Country Link
CN (1) CN113269317B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902106B (en) * 2021-12-06 2022-02-22 成都时识科技有限公司 Pulse event decision device, method, chip and electronic equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11501130B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816102A (en) * 2017-11-22 2019-05-28 英特尔公司 Reconfigurable nerve synapse core for spike neural network
CN110046695A (en) * 2019-04-09 2019-07-23 中国科学技术大学 A kind of configurable high degree of parallelism spiking neuron array
CN110348564A (en) * 2019-06-11 2019-10-18 中国人民解放军国防科技大学 SCNN reasoning acceleration device based on systolic array, processor and computer equipment
CN111325321A (en) * 2020-02-13 2020-06-23 中国科学院自动化研究所 Brain-like computing system based on multi-neural network fusion and execution method of instruction set
CN112232486A (en) * 2020-10-19 2021-01-15 南京宁麒智能计算芯片研究院有限公司 Optimization method of YOLO pulse neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique. Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead. IEEE Access, vol. 8, 2020, full text. *
Zhao Liang. Memristor-based spiking neural network design and its application in image classification. China Master's Theses Full-text Database, 2020-03-15, full text. *

Also Published As

Publication number Publication date
CN113269317A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
Liu et al. Neu-NoC: A high-efficient interconnection network for accelerated neuromorphic systems
CN109447241B (en) Dynamic reconfigurable convolutional neural network accelerator architecture for field of Internet of things
CN110751280A (en) Configurable convolution accelerator applied to convolutional neural network
CN110516801A (en) A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput
CN109302357B (en) On-chip interconnection structure for deep learning reconfigurable processor
CN112418396B (en) Sparse activation perception type neural network accelerator based on FPGA
CN113269317B (en) Pulse neural network computing array
CN113298237A (en) Convolutional neural network on-chip training accelerator based on FPGA
CN112862091B (en) Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
Liang et al. A 1.13 μJ/classification spiking neural network accelerator with a single-spike neuron model and sparse weights
CN109800872B (en) Neuromorphic processor based on segmented multiplexing and parameter quantification sharing
CN113283587A (en) Winograd convolution operation acceleration method and acceleration module
CN116167424B (en) CIM-based neural network accelerator, CIM-based neural network accelerator method, CIM-based neural network storage processing system and CIM-based neural network storage processing equipment
CN111723924B (en) Deep neural network accelerator based on channel sharing
Yan et al. Acceleration and optimization of artificial intelligence CNN image recognition based on FPGA
CN110555519A (en) Low-complexity convolutional neural network based on symbol random computation
CN112346704B (en) Full-streamline type multiply-add unit array circuit for convolutional neural network
CN114116052A (en) Edge calculation method and device
CN115440226A (en) Low-power-consumption system applied to voice keyword recognition based on impulse neural network
Liu et al. Energy-Efficient and Low-Latency Optical Network-on-Chip Architecture and Mapping Solution for Artificial Neural Networks
Shen et al. Learning-based adaptation to applications and environments in a reconfigurable network-on-chip
US20240296142A1 (en) Neural network accelerator
CN113296953A (en) Distributed computing architecture, method and device of cloud side heterogeneous edge computing network
Lin et al. Design of convolutional neural network SoC system based on FPGA
CN116882467B (en) Edge-oriented multimode configurable neural network accelerator circuit structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant