CN116109489A - Denoising method and related equipment - Google Patents
Denoising method and related equipment
- Publication number
- CN116109489A (application number CN202111321228.7A)
- Authority
- CN
- China
- Prior art keywords
- vision sensor
- target
- pixel
- dynamic vision
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/70—Denoising; Smoothing (G06T—Image data processing or generation, in general; G06T5/00—Image enhancement or restoration)
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
- G06N3/0464—Convolutional networks [CNN, ConvNet] (G06N—Computing arrangements based on specific computational models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06T2207/20081—Training; Learning (G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20—Special algorithmic details)
- G06T2207/20084—Artificial neural networks [ANN]
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a denoising method and related equipment in the field of artificial intelligence. The method comprises: acquiring a first dynamic vision sensor signal; and denoising the first dynamic vision sensor signal with a spiking neural network model to obtain a second dynamic vision sensor signal, wherein the postsynaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model includes a target parameter determined from the autocorrelation coefficients of the dynamic vision sensor signal. Embodiments of the application improve the denoising effect on dynamic vision sensor signals.
Description
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a denoising method and related equipment.
Background
Existing dynamic vision sensor (Dynamic Vision Sensor, DVS) signal denoising methods fall mainly into filtering-based denoising and denoising based on artificial neural networks (Artificial Neural Network, ANN) under a deep-learning framework. Filtering-based denoising uses temporal or spatial filters and removes events that are isolated in time or space. ANN-based denoising typically compresses the dynamic vision sensor data stream into images to raise the data density, and then applies networks originally designed for RGB-image denoising to the frame-compressed images; among these, denoising based on two-dimensional (2D) convolutional neural networks (Convolutional Neural Networks, CNN) generally requires an additional noise model containing temporal information, while denoising based on three-dimensional (3D) convolutional neural networks requires temporal convolution over a time window.
However, when the data are sparse, filtering-based denoising easily removes events together with noise, and its performance on benchmark data sets is inferior to ANN-based denoising; ANN-based denoising, in turn, suffers from large network scale, heavy computation, and long processing time. Moreover, because dynamic vision sensor signals are highly sparse, vary widely in data density, and have high temporal resolution, existing denoising methods cannot achieve a good denoising effect on them.
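As a concrete illustration of the temporal/spatial filtering idea above, the following is a minimal sketch of a spatio-temporal background-activity filter: an event is kept only if some pixel in its neighbourhood fired within a recent time window. The function name and the `tau_us` value are illustrative assumptions, not taken from the patent:

```python
# Minimal sketch of a spatio-temporal event filter (illustrative only):
# keep an event if any pixel in its 3x3 neighbourhood fired within the
# last tau_us microseconds; otherwise treat it as isolated noise.
import numpy as np

def background_activity_filter(events, width, height, tau_us=5000):
    """events: iterable of (x, y, p, t) tuples, t in microseconds."""
    last_ts = np.full((height, width), -np.inf)  # last event time per pixel
    kept = []
    for x, y, p, t in events:
        y0, y1 = max(0, y - 1), min(height, y + 2)
        x0, x1 = max(0, x - 1), min(width, x + 2)
        if (t - last_ts[y0:y1, x0:x1] <= tau_us).any():  # temporal support
            kept.append((x, y, p, t))
        last_ts[y, x] = t
    return kept
```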
Disclosure of Invention
The application provides a denoising method and related equipment that can improve the denoising effect on dynamic vision sensor signals.
According to a first aspect, the present application relates to a denoising method comprising: acquiring a first dynamic vision sensor signal; and denoising the first dynamic vision sensor signal using a spiking neural network (Spiking Neural Network, SNN) model to obtain a second dynamic vision sensor signal, wherein a postsynaptic membrane voltage (Post-Synaptic Potential, PSP) kernel function of a spiking neuron of the spiking neural network model comprises a target parameter, the target parameter being determined from an autocorrelation coefficient of the dynamic vision sensor signal.
In the application, a spiking neural network model is used to denoise the dynamic vision sensor signal. Because the target parameter in the postsynaptic membrane voltage kernel function of its spiking neurons is determined from the autocorrelation coefficient of the dynamic vision sensor signal, the kernel function enables the model to learn the temporal correlation of the signal and to remove noise events whose temporal correlation is weak, improving the denoising effect on the dynamic vision sensor signal. In addition, owing to the inherent temporal dynamics of the spiking neural network model, it denoises the highly sparse dynamic vision sensor signal as a stream: the input and output data have the same number of frames, the data pass through the model once, and no traversing sliding-window computation in the time dimension (i.e., no three-dimensional convolution) is needed. Compared with existing ANN-based denoising methods, this greatly reduces running time, network scale, and computation.
In one possible implementation, the autocorrelation coefficients of the dynamic vision sensor signal include a plurality of autocorrelation coefficients, the plurality of autocorrelation coefficients being derived from a plurality of first target dynamic vision sensor signals over a preset period of time; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained according to distribution fitting of the autocorrelation coefficients in time.
In this implementation, a preset function is obtained by fitting the distribution over time of a plurality of autocorrelation coefficients derived from a plurality of first target dynamic vision sensor signals within a preset time period, so that the preset function expresses the relation between the autocorrelation coefficient of the dynamic vision sensor signal and time. The inverse of the preset function then expresses the relation between time and the autocorrelation coefficient. Evaluating this inverse function at a preset autocorrelation coefficient threshold yields a time value that is used as the value of the target parameter in the postsynaptic membrane voltage kernel function of the spiking neuron. In this way the target parameter is adjusted so that the spiking neural network model learns the temporal correlation of dynamic vision sensor signals.
In one possible implementation, any one of the plurality of autocorrelation coefficients is the average of the first values corresponding to D second target dynamic vision sensor signals, D being a positive integer. The D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals, among the plurality of first target dynamic vision sensor signals, that belong to the same first preset period, the preset time period comprising a plurality of first preset periods. The first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to a plurality of pixels. The second value corresponding to any pixel among the plurality of pixels is obtained from a first signal value of that pixel and a target signal value of a first target pixel, the first target pixel being a pixel whose proximity to that pixel is not greater than a preset proximity threshold. Said second target dynamic vision sensor signal includes the first signal value of that pixel and is the w-th first target dynamic vision sensor signal among the plurality, w being a positive integer. The target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, which is the (w+q)-th first target dynamic vision sensor signal among the plurality, q being a positive integer. The plurality of pixels are pixels in the photosensitive element of the dynamic vision sensor, or pixels in a two-dimensional image acquired through the photosensitive element of the dynamic vision sensor; likewise, the first target pixel is a pixel, in the photosensitive element or in such a two-dimensional image, whose proximity to that pixel is not greater than the preset proximity threshold.
In this implementation, the plurality of first target dynamic vision sensor signals are the dynamic vision sensor signals at a plurality of moments within the preset time period, and the autocorrelation coefficient of the dynamic vision sensor signal within that period can be computed by a time sliding-window method. For example, a window whose size equals the first preset period slides over the preset time period in steps of one first preset period. During one slide, the window frames D of the first target dynamic vision sensor signals, denoted the D second target dynamic vision sensor signals, and the autocorrelation coefficient for that window position is computed from them. Because each first target dynamic vision sensor signal includes the first signal values of a plurality of pixels, each second target dynamic vision sensor signal within a window (or first preset period) also does, so the autocorrelation coefficient for each window can be computed from the first signal values of neighbouring pixels at different moments. After the window has slid over the whole preset time period, the autocorrelation coefficients for all window positions are obtained, yielding the plurality of autocorrelation coefficients.
In one possible implementation manner, the first target pixel includes a plurality of second target pixels, and the second value corresponding to any one pixel is accumulated according to the third values corresponding to the plurality of second target pixels; the third value corresponding to any one of the plurality of second target pixels is a product of the first signal value of the any one pixel and the target signal value of the any one second target pixel.
In this implementation, for any pixel among all pixels in the dynamic vision sensor signal, the first signal value of that pixel at one moment is multiplied by the first signal value of each of its neighbouring pixels at another moment, giving one third value per neighbouring pixel; these third values are then accumulated to give the second value corresponding to that pixel. The second values of all pixels are accumulated to give the first value corresponding to the second target dynamic vision sensor signal at that moment. Performing these operations for the second target dynamic vision sensor signals at all moments within a window (the D second target dynamic vision sensor signals) gives the corresponding first values, and averaging all the first values obtained within the window (or first preset period) gives the autocorrelation coefficient of the dynamic vision sensor signal for that window.
In one possible implementation, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained from the second signal value of that pixel, where: if the second signal value is greater than 0, the first signal value is 1; if the second signal value is less than 0, the first signal value is -1. The second signal value of the pixel is any one of its m third signal values, m being a positive integer, and the m third signal values are obtained from a plurality of third dynamic vision sensor signals within the preset time period, where: the g-th of the m third signal values is the sum of the (g-1)-th third signal value and the accumulated fourth signal values of the pixel within the g-th second preset period, with g a positive integer and 1 ≤ g ≤ m; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period comprises the g-th second preset period. When g equals 1, the 1st third signal value is the accumulated sum of the fourth signal values of the pixel within the 1st second preset period.
In this implementation, multiple dynamic vision sensor signals over a period of time are compressed into one dynamic vision sensor signal to increase the data density. For example, the plurality of third dynamic vision sensor signals within the preset time period are frame-compressed with the second preset period as the window size, yielding the plurality of first target dynamic vision sensor signals. The first signal value of any pixel in a first target dynamic vision sensor signal is obtained by binarizing the accumulated fourth signal values of that pixel over the third dynamic vision sensor signals: if the accumulated value is greater than 0, the first signal value is set to 1; if the accumulated value is less than 0, the first signal value is set to -1. In addition, because the first target dynamic vision sensor signal has a higher data density than the third dynamic vision sensor signals, training the spiking neural network model on the first target dynamic vision sensor signals improves training efficiency.
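The frame-compression and binarization procedure just described can be sketched as follows (a minimal illustration; how an accumulated value of exactly 0 should be binarized is not specified in the text, so it is left at 0 here as an assumption):

```python
import numpy as np

def compress_frames(raw, window):
    """raw: array [T, H, W] of per-step fourth signal values.
    Returns first signal values [T // window, H, W] in {-1, 0, +1}."""
    T, H, W = raw.shape
    n = T // window
    # accumulated fourth signal values within each second preset period
    per_window = raw[: n * window].reshape(n, window, H, W).sum(axis=1)
    # g-th "third signal value" = (g-1)-th value + g-th window sum
    running = np.cumsum(per_window, axis=0)
    # binarize: > 0 -> +1, < 0 -> -1 (exactly 0 kept as 0, an assumption)
    return np.sign(running).astype(np.int8)
```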
In one possible implementation, the spiking neural network model includes N convolutional layers and N deconvolution layers, wherein: the output of the j-th convolutional layer is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolution layer is the input of the (j+1)-th deconvolution layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolution layer; and the output of the N-th convolutional layer is the input of the 1st deconvolution layer, with 1 ≤ j ≤ N, and N and j positive integers.
In this implementation, the spiking neural network model comprises N symmetric convolutional layers and N deconvolution layers, each convolutional layer being connected by a skip connection to the deconvolution layer symmetric to it. The model thus extracts and reconstructs features of the dynamic vision sensor signal through deconvolution and skip connections, which helps ensure that the extracted features are complete and the reconstructed features are faithful.
According to a second aspect, the present application relates to a denoising apparatus; for its beneficial effects, see the description of the first aspect, which is not repeated here. The denoising apparatus has the functionality to realize the behaviour in the method examples of the first aspect. The functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions. In one possible implementation, the denoising apparatus includes: an acquisition unit for acquiring a first dynamic vision sensor signal; and a processing unit for denoising the first dynamic vision sensor signal with a spiking neural network model to obtain a second dynamic vision sensor signal, the postsynaptic membrane voltage kernel function of the spiking neurons of the model including a target parameter determined from the autocorrelation coefficient of the dynamic vision sensor signal.
In one possible implementation, the autocorrelation coefficients of the dynamic vision sensor signal include a plurality of autocorrelation coefficients, the plurality of autocorrelation coefficients being derived from a plurality of first target dynamic vision sensor signals over a preset period of time; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained according to distribution fitting of the autocorrelation coefficients in time.
In one possible implementation, any one of the plurality of autocorrelation coefficients is the average of the first values corresponding to D second target dynamic vision sensor signals, D being a positive integer. The D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals, among the plurality of first target dynamic vision sensor signals, that belong to the same first preset period, the preset time period comprising a plurality of first preset periods. The first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to a plurality of pixels. The second value corresponding to any pixel among the plurality of pixels is obtained from a first signal value of that pixel and a target signal value of a first target pixel, the first target pixel being a pixel whose proximity to that pixel is not greater than a preset proximity threshold. Said second target dynamic vision sensor signal includes the first signal value of that pixel and is the w-th first target dynamic vision sensor signal among the plurality, w being a positive integer. The target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, which is the (w+q)-th first target dynamic vision sensor signal among the plurality, q being a positive integer.
In one possible implementation manner, the first target pixel includes a plurality of second target pixels, and the second value corresponding to any one pixel is accumulated according to the third values corresponding to the plurality of second target pixels; the third value corresponding to any one of the plurality of second target pixels is a product of the first signal value of the any one pixel and the target signal value of the any one second target pixel.
In one possible implementation, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained from the second signal value of that pixel, where: if the second signal value is greater than 0, the first signal value is 1; if the second signal value is less than 0, the first signal value is -1. The second signal value of the pixel is any one of its m third signal values, m being a positive integer, and the m third signal values are obtained from a plurality of third dynamic vision sensor signals within the preset time period, where: the g-th of the m third signal values is the sum of the (g-1)-th third signal value and the accumulated fourth signal values of the pixel within the g-th second preset period, with g a positive integer and 1 ≤ g ≤ m; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period comprises the g-th second preset period. When g equals 1, the 1st third signal value is the accumulated sum of the fourth signal values of the pixel within the 1st second preset period.
In one possible implementation, the spiking neural network model includes N convolutional layers and N deconvolution layers, wherein: the output of the j-th convolutional layer is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolution layer is the input of the (j+1)-th deconvolution layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolution layer; and the output of the N-th convolutional layer is the input of the 1st deconvolution layer, with 1 ≤ j ≤ N, and N and j positive integers.
According to a third aspect, the present application relates to an electronic device comprising: one or more processors; a computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the electronic device to perform the method of any one of the possible embodiments of the first aspect.
According to a fourth aspect, the present application relates to a computer readable storage medium comprising program code for performing the method of any one of the possible embodiments of the first aspect when being executed by a computer device.
According to a fifth aspect, the present application relates to a chip comprising: a processor for calling and running a computer program from a memory, so that a device on which the chip is mounted performs the method according to any one of the possible embodiments of the first aspect.
According to a sixth aspect, the present application relates to a computer program product comprising program code which, when run, performs the method of any one of the possible embodiments of the first aspect.
Drawings
The drawings used in the embodiments of the present application are described below.
Fig. 1 is a schematic structural diagram of a neural network according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a pulse neural network according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a denoising method according to an embodiment of the present application;
FIG. 4 is a graph showing the distribution of autocorrelation coefficients of a dynamic visual sensor signal over time according to an embodiment of the present application;
FIG. 5 is a schematic diagram of post-synaptic membrane voltage kernel functions corresponding to different time parameters provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a training process of a pulse neural network model according to an embodiment of the present application;
FIG. 7 is a schematic diagram showing the effect of different target parameters on training of the impulse neural network model provided in the embodiments of the present application;
Fig. 8 is a graph comparing denoising effects of a pulse neural network model and denoising effects of a three-dimensional convolutional neural network model according to an embodiment of the present application;
fig. 9 is a graph comparing the effect of denoising a signal denoised by a pulse neural network model and a signal denoised by a three-dimensional convolutional neural network model according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a denoising apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
First, the related art to which the embodiments of the present application relate will be described so as to facilitate understanding of the present application by those skilled in the art.
(I) Dynamic vision sensor and noise
A dynamic vision sensor, also known as an event camera or neuromorphic camera, is an imaging sensor that responds to local brightness changes. Dynamic vision sensors do not use a shutter to capture images as with conventional cameras. Instead, each pixel within the dynamic vision sensor operates independently and asynchronously, reporting when brightness changes, otherwise remaining silent. Dynamic vision sensors have a temporal resolution on the order of microseconds, a dynamic range of 120dB, and less underexposure/overexposure and motion blur than frame cameras. Because the dynamic vision sensor has the advantages of asynchronous triggering, high time resolution, high dynamic range, low delay, low bandwidth, low power consumption and the like, the dynamic vision sensor can be mounted on a mobile terminal platform (such as a mobile phone, an unmanned aerial vehicle, a car and the like) for vision tasks such as object detection, tracking, identification, depth estimation and the like.
Unlike traditional schemes, in which frames are acquired at a fixed frequency and all pixel information in each frame is read out sequentially, the dynamic vision sensor does not need to read every pixel of a picture; it only acquires the addresses and information of the pixels whose light intensity has changed. Specifically, when the dynamic vision sensor detects that the light-intensity change of a pixel is greater than or equal to a preset threshold, it emits an event signal for that pixel: if the change is positive, i.e., the pixel jumps from low to high brightness, a '+1' event signal is emitted, marked as a positive event; if the change is negative, i.e., the pixel jumps from high to low brightness, a '-1' event signal is emitted, marked as a negative event; if the change is smaller than the preset threshold, no event signal is emitted. The dynamic vision sensor marks the events of each pixel to form an event stream. The light-intensity change information acquired by the dynamic vision sensor can take the form (X, Y, P, T), where 'X, Y' is the event address, 'P' is the event output, and 'T' is the time the event was generated. The event address matches a pixel in a two-dimensional image associated with the dynamic vision sensor, i.e., it corresponds to a pixel position in a reference color image: 'X' and 'Y' can be the row and column positions in the reference color image, 'P' is the value of the real-time light-intensity change, and 'T' is its generation time.
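The event-generation rule above can be illustrated with a short sketch that converts a sequence of intensity frames into (X, Y, P, T) events; the logarithmic reference level and the threshold value are illustrative assumptions:

```python
import numpy as np

def dvs_events(frames, times, threshold=0.2):
    """frames: array [T, H, W] of light intensities; times: length-T stamps.
    Emits (x, y, p, t) whenever the log intensity of a pixel moves by at
    least `threshold` from its per-pixel reference level."""
    ref = np.log(frames[0] + 1e-6)        # reference log intensity
    events = []
    for frame, t in zip(frames[1:], times[1:]):
        logi = np.log(frame + 1e-6)
        diff = logi - ref
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((x, y, 1 if diff[y, x] > 0 else -1, t))
        ref[fired] = logi[fired]          # reset reference where events fired
    return events
```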
Dynamic vision sensors based on address-event representation (AER) mimic the working mechanism of biological vision, whereas traditional visual image acquisition is based on "frames" captured at a fixed frequency and suffers from high redundancy, high latency, low dynamic range, and large data volume. In a dynamic vision sensor the pixels work asynchronously: only the addresses and information of pixels whose light intensity changed are output, rather than passively reading out every pixel of a frame in sequence, which eliminates redundant data at the source. The sensor thus responds to scene changes in real time and outputs a super-sparse image representation of asynchronous events. Because of its high sensitivity, however, its output is usually accompanied by noise, including background noise, device thermal noise, and so on. Meanwhile, the wide variation in data density and the high temporal resolution pose challenges for traditional image-based denoising algorithms.
(II) Spiking neural network
A neural network is a computing system that mimics the structure of the biological brain for data processing. The biological brain consists of a large number of neurons connected in different ways, with each neuron connected to the next through a synaptic structure for information transmission. Neural networks have powerful nonlinear, adaptive, and fault-tolerant information-processing capability.
Correspondingly, as shown in FIG. 1, each node simulates a neuron and performs a specific operation, such as an activation function; the connections between nodes mimic synapses, the weight of a synapse representing the strength of the connection between two neurons.
In current artificial neural networks, information is transmitted as analog values: each neuron accumulates the values of its predecessor neurons through multiply-add operations and passes the result, after an activation function, to its successor neurons. In the more brain-like spiking neural network, information is transmitted as pulse sequences: each neuron regulates its membrane voltage by accumulating the pulse sequences of its presynaptic neurons, and when the membrane voltage reaches a certain threshold the neuron issues new pulses and transmits them to its successor neurons, thereby realizing information transmission, information processing, and nonlinear transformation. Many different neuron types may be used in a spiking neural network, such as the Integrate-and-Fire (IF) model, the Leaky Integrate-and-Fire (LIF) model, the Spike Response Model (SRM), threshold-variable neurons, and so on.
Some important terms of spiking neural networks are as follows:
A spiking neuron is the basic building block of a spiking neural network; it integrates information by receiving pulse inputs. A pulse input increases the neuron's membrane voltage, and when the membrane voltage rises above a certain threshold voltage, the neuron emits a pulse and transmits it to other neurons (a minimal sketch of this behaviour follows the definitions below).
A synapse is the carrier of pulse transmission; spiking neurons are connected to one another by synapses.
The postsynaptic membrane voltage, also known as the postsynaptic potential, is the change that a pulse issued by a presynaptic neuron produces in the membrane voltage of the postsynaptic neuron.
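The integrate-and-fire behaviour described in these definitions can be sketched as a single update step of a leaky integrate-and-fire neuron; all parameter values here are illustrative assumptions:

```python
def lif_step(v, input_current, v_rest=0.0, tau_m=2.0, theta=1.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron.
    Returns (new membrane voltage, spike flag)."""
    v = v + dt / tau_m * (v_rest - v) + input_current  # leak + integration
    if v >= theta:          # membrane voltage crossed the threshold
        return v_rest, 1    # emit a pulse and reset to the resting voltage
    return v, 0
```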
Compared with artificial neural networks, the spiking neural network, inspired by the brain's neural networks, features low energy consumption, asynchronous computation, and temporal dynamics, making it an ideal technical approach for processing the highly sparse streaming data of dynamic vision sensors.
The technical solution provided by this application uses a spiking neural network model to denoise dynamic vision sensor signals. It specifically covers: the architecture design of the spiking neural network model, the training design, a technique for increasing the data density of dynamic vision sensor signals, and a technique for adjusting the parameters of the spiking neurons based on characteristics of the dynamic vision sensor signal (such as its temporal correlation).
The technical scheme provided by the application is described in detail below in connection with the specific embodiments.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a spiking neural network according to an embodiment of the present application. The spiking neural network model includes N convolutional layers and N deconvolution layers, wherein: the output of the j-th convolutional layer is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolution layer is the input of the (j+1)-th deconvolution layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolution layer; and the output of the N-th convolutional layer is the input of the 1st deconvolution layer, with 1 ≤ j ≤ N, and N and j positive integers.
As can be seen, skip connections are added between the convolutional layers and the deconvolution layers of the spiking neural network. The computing units of the network are spiking neurons, which may be of several different types, such as the integrate-and-fire model, the leaky integrate-and-fire model, the spike response model, or threshold-variable neurons.
In one possible implementation, the neurons of the spiking neural network employ a spike response model, defined by the following integral equation:

$$u(t) = \sum_f \eta\left(t - t^{(f)}\right) + \sum_{s} \kappa_{ext}(t - s) + u_{rest} \tag{1}$$

In formula (1), t denotes time and u(t) the voltage at time t; η denotes the change in the neuron's voltage after each pulse it fires, f indexes the neuron's firing times, and t^(f) is the pulse-firing time; κ_ext denotes the postsynaptic membrane voltage kernel function, which is an exponential function; s denotes a pulse-input time, and the summation over s represents the voltage produced by the cumulative effect of the pulses of the presynaptic neurons; u_rest denotes the resting voltage. The neuron fires a pulse when the voltage u(t) exceeds a certain threshold θ.
Owing to the high sparsity of dynamic vision sensor output data, conventional loss functions based on the Euclidean distance or the L2 norm tend to produce a "zero output". To address this problem:
in one possible implementation, a loss function based on Fan Luosu m distance (Van Rossum) is adopted in the impulse neural network to better embody the error of the impulse sequence and avoid the insufficient network output. Wherein, fan Luosu mu distance is specifically defined as follows:
given a sequence of output pulses:
$$u = \{u_1, u_2, \ldots, u_n\} \tag{2}$$

In formula (2), u_1, u_2, …, u_n all denote pulse times.
If the target pulse sequence is:
$$v = \{v_1, v_2, \ldots, v_n\} \tag{3}$$

In formula (3), v_1, v_2, …, v_n all denote pulse times.
then the Van Rossum distance is defined by:

$$D^2(u, v) = \frac{1}{\tau} \int_0^{\infty} \left[f(t; u) - f(t; v)\right]^2 dt \tag{4}$$

In formula (4), τ denotes the time constant of the kernel function h(t) and t denotes time; f(t; u) and f(t; v) denote the convolutions of the pulse sequences with a specific kernel function, given by formulas (5) and (6) respectively:

$$f(t; u) = \sum_{i=1}^{n} h(t - u_i) \tag{5}$$

$$f(t; v) = \sum_{i=1}^{n} h(t - v_i) \tag{6}$$
wherein the kernel function h(t) is defined as:

$$h(t) = e^{-t/\tau}\, \Theta(t) \tag{7}$$

with Θ(t) the Heaviside step function.
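A discretized computation of this distance, following formulas (4) to (7), might look as follows (the time grid and constants are illustrative assumptions):

```python
import numpy as np

def van_rossum_sq(u, v, tau=5.0, t_max=100.0, dt=0.1):
    """Squared Van Rossum distance, Eq. (4)-(7): convolve each spike train
    with h(t) = exp(-t/tau) * Heaviside(t) and integrate the squared
    difference of the filtered traces."""
    t = np.arange(0.0, t_max, dt)
    def filtered(spikes):                       # f(t; .) of Eq. (5)/(6)
        f = np.zeros_like(t)
        for s in spikes:
            f += np.where(t >= s, np.exp(-(t - s) / tau), 0.0)
        return f
    diff = filtered(u) - filtered(v)
    return np.sum(diff ** 2) * dt / tau         # discretized Eq. (4)

# e.g. van_rossum_sq([10.0, 30.0], [10.5, 31.0]) -> small positive value
```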
the impulse neural network model shown in fig. 2 comprises N symmetrical convolution layers and N inverse convolution layers, and each convolution layer is connected with the inverse convolution layer symmetrical to the convolution layer in a jumping manner; therefore, the impulse neural network model performs feature extraction and reconstruction on the dynamic vision sensor signals through deconvolution and jump connection, and is beneficial to ensuring that the extracted features are complete and the reconstructed features are true.
Referring to FIG. 3, FIG. 3 is a flow chart illustrating a process 300 of a denoising method according to one embodiment of the present application. Process 300 is described as a series of steps or operations; it should be understood that process 300 may be performed in various orders and/or concurrently, and is not limited to the execution order depicted in FIG. 3. Process 300 may be performed by an electronic device, including a server or a terminal, and includes, but is not limited to, the following steps or operations:
Step 301: acquiring a first dynamic vision sensor signal;
Step 302: denoising the first dynamic vision sensor signal with a spiking neural network model to obtain a second dynamic vision sensor signal, wherein the postsynaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model includes a target parameter determined from the autocorrelation coefficient of the dynamic vision sensor signal.
The structure of the spiking neural network model may be as shown in FIG. 2.
The first dynamic vision sensor signal is the original signal acquired by the dynamic vision sensor, and the second dynamic vision sensor signal is the signal after denoising by the spiking neural network model.
The autocorrelation coefficient of the dynamic vision sensor signal may be the temporal-data autocorrelation coefficient $\langle x(s)\, x(s-t) \rangle$, where x denotes the signal, t the interval (lag) time, and s the time point; the average is taken over the signal at different time points s.
The target parameter can be preset from the autocorrelation coefficient of the dynamic vision sensor signal before the spiking neural network model is trained, or adjusted in real time from the autocorrelation coefficient during training. In addition, during training the target parameter can be made learnable or held fixed.
In the application, a spiking neural network model is used to denoise the dynamic vision sensor signal. Because the target parameter in the postsynaptic membrane voltage kernel function of its spiking neurons is determined from the autocorrelation coefficient of the dynamic vision sensor signal, the kernel function enables the model to learn the temporal correlation of the signal and to remove noise events whose temporal correlation is weak, improving the denoising effect on the dynamic vision sensor signal. In addition, owing to the inherent temporal dynamics of the spiking neural network model, it denoises the highly sparse dynamic vision sensor signal as a stream: the input and output data have the same number of frames, the data pass through the model once, and no traversing sliding-window computation in the time dimension (i.e., no three-dimensional convolution) is needed. Compared with existing ANN-based denoising methods, this greatly reduces running time, network scale, and computation.
In one possible implementation, the autocorrelation coefficients of the dynamic vision sensor signal include a plurality of autocorrelation coefficients, the plurality of autocorrelation coefficients being derived from a plurality of first target dynamic vision sensor signals over a preset period of time; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained according to distribution fitting of the autocorrelation coefficients in time.
This implementation obtains the target parameter in the postsynaptic membrane voltage kernel function from the autocorrelation coefficients of the dynamic vision sensor signal over a period of time. For example: a plurality of autocorrelation coefficients are obtained from a plurality of first target dynamic vision sensor signals within a preset time period, the coefficients corresponding one-to-one with a plurality of moments, i.e., distributed over time, as shown in FIG. 4; a preset function is then obtained by fitting this distribution over time (the dynamic vision sensor signal correlation curve); and the target parameter is obtained from a preset autocorrelation coefficient threshold and the fitted preset function. Specifically:
First, the dynamic vision sensor signal correlation curve is fitted by a function-fitting method, i.e., a preset function is obtained by fitting the shape of the distribution of the plurality of autocorrelation coefficients over time; for example, the fitted preset function may take the form:
$$y = b e^{-ax} + c \tag{8}$$
wherein the values of a, b and c in equation (8) are determined based on the shape of the distribution of the actual autocorrelation coefficients over time.
Then, the inverse function of the preset function is solved based on a preset autocorrelation coefficient threshold; for example, the inverse of the preset function may be:

$$x = -\frac{1}{a} \ln\frac{y - c}{b} \tag{9}$$

The preset autocorrelation coefficient threshold is taken as the value of y in formula (9); the resulting value of x is the value of the target parameter of the postsynaptic membrane voltage kernel function.
In one example, there are a plurality of preset autocorrelation coefficient thresholds, forming a preset autocorrelation coefficient threshold interval. Solving the inverse of the preset function over this threshold interval yields a value interval for the target parameter of the postsynaptic membrane voltage kernel function, from which the target parameter is selected, according to the following principles:
(1) The time span of the postsynaptic membrane voltage kernel function can be matched with the correlation of the dynamic vision sensor signal, namely, the abscissa scale of the postsynaptic membrane voltage kernel function and the time correlation range of the dynamic vision sensor signal are matched as much as possible.
(2) The time span of the postsynaptic membrane voltage kernel function should not be close to 0, avoiding weakening the time dynamics of the neurons.
For example, the threshold interval of autocorrelation coefficients is preset to be [0.2,0.5], and the interval of values of the target parameters of the postsynaptic membrane voltage kernel function is obtained to be [5, 13].
Assuming the neurons of the spiking neural network model are spike response models, the postsynaptic membrane voltage kernel function may be a double-exponential function:

$$\kappa_{ext}(t) = \beta\left(e^{-t/\tau_s} - e^{-t/\tau_m}\right) \tag{10}$$

In formula (10), κ_ext denotes the postsynaptic membrane voltage kernel function, also called the amplitude; β denotes an amplitude-adjustment coefficient; τ_s denotes the first time parameter, i.e., the target parameter described in this application; and τ_m denotes the second time parameter. For example, if the value interval of τ_s is [5, 13], one may set τ_s = 5, with τ_m a fixed value, e.g., τ_m = 2. FIG. 5 shows the postsynaptic membrane voltage kernel functions corresponding to several different time parameters.
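A short sketch evaluating formula (10) for several candidate values of τ_s (as in FIG. 5); β = 1 and τ_m = 2 follow the example above, and the time grid is an assumption:

```python
import numpy as np

def psp_kernel(t, tau_s, tau_m=2.0, beta=1.0):
    """Double-exponential PSP kernel of Eq. (10); beta is illustrative."""
    t = np.asarray(t, dtype=float)
    return beta * (np.exp(-t / tau_s) - np.exp(-t / tau_m)) * (t >= 0)

t = np.linspace(0.0, 50.0, 501)
for tau_s in (5.0, 9.0, 13.0):            # candidates from the interval [5, 13]
    k = psp_kernel(t, tau_s)
    print(tau_s, round(t[k.argmax()], 1)) # the kernel peak shifts with tau_s
```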
In this implementation, a preset function is obtained by fitting the distribution over time of a plurality of autocorrelation coefficients derived from a plurality of first target dynamic vision sensor signals within a preset time period, so that the preset function expresses the relation between the autocorrelation coefficient of the dynamic vision sensor signal and time. The inverse of the preset function then expresses the relation between time and the autocorrelation coefficient. Evaluating this inverse function at a preset autocorrelation coefficient threshold yields a time value that is used as the value of the target parameter in the postsynaptic membrane voltage kernel function of the spiking neuron. In this way the target parameter is adjusted so that the spiking neural network model learns the temporal correlation of dynamic vision sensor signals.
It should be noted that the present application may compute the autocorrelation coefficient of the dynamic vision sensor signal over a period of time online by the time sliding-window method. The specific operation is as follows:
Suppose C_q denotes the autocorrelation coefficient between dynamic vision sensor signals separated by an interval q. C_q is computed as:

$$C_q = \frac{1}{D} \sum_{k=1}^{D} \sum_{x=1}^{W} \sum_{y=1}^{H} p_{x,y}\left(S_{k,x,y},\, S_{k+q,x',y'},\, \Delta\right) \tag{11}$$

In formula (11), D denotes the number of dynamic vision sensor signals within the sliding window; W and H are the dimensions of the pixel array along the x-axis and y-axis respectively, i.e., the width and height of the sliding window; S_{k,x,y} denotes the signal value of the pixel with coordinates (x, y) in the k-th dynamic vision sensor signal within the sliding window, with S_{k,x,y} = 1 or S_{k,x,y} = -1; and S_{k+q,x',y'} denotes the signal value of the pixel with coordinates (x', y') in the dynamic vision sensor signal separated by q from the k-th one, with S_{k+q,x',y'} = 1 or S_{k+q,x',y'} = -1. The pixel with coordinates (x', y') is a neighbouring pixel of the pixel with coordinates (x, y), with x' and y' ranging over:

$$x - \Delta \le x' \le x + \Delta \tag{12}$$

$$y - \Delta \le y' \le y + \Delta \tag{13}$$

In formula (11), p_{x,y}(S_{k,x,y}, S_{k+q,x',y'}, Δ) denotes the sum, over the neighbourhood of proximity Δ, of the products of S_{k,x,y} and S_{k+q,x',y'}, where Δ denotes the proximity. It is computed as:

$$p_{x,y}\left(S_{k,x,y},\, S_{k+q,x',y'},\, \Delta\right) = \sum_{x'=x-\Delta}^{x+\Delta}\; \sum_{y'=y-\Delta}^{y+\Delta} S_{k,x,y} \cdot S_{k+q,x',y'} \tag{14}$$

Assuming Δ = 1, a series of values of C_q can be computed from formulas (11) to (14); the distribution of these values of C_q over time is shown in FIG. 4.
In one possible implementation manner, any one of the autocorrelation coefficients is an average value of the first values corresponding to D second target dynamic vision sensor signals, and D is a positive integer; the D second target dynamic vision sensor signals are D first target dynamic vision sensor signals belonging to the same first preset period in the plurality of first target dynamic vision sensor signals, and the preset time period comprises a plurality of first preset periods; the first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating the second values corresponding to a plurality of pixels; the second value corresponding to any pixel in the plurality of pixels is obtained according to a first signal value of the any pixel and a target signal value of a first target pixel, wherein the first target pixel is a pixel whose proximity to the any pixel is not greater than a preset proximity threshold; the any second target dynamic vision sensor signal includes the first signal value of the any pixel, the any second target dynamic vision sensor signal is the w-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and w is a positive integer; the target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal is the (w+q)-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and q is a positive integer.
The pixels are a plurality of pixels in a photosensitive element of the dynamic vision sensor, or the pixels are a plurality of pixels in a two-dimensional image, and the two-dimensional image is acquired through the photosensitive element of the dynamic vision sensor; the first target pixel is a pixel, the proximity degree of which to any pixel is not greater than a preset proximity degree threshold value, in a photosensitive element of the dynamic vision sensor, or the first target pixel is a pixel, the proximity degree of which to any pixel is not greater than a preset proximity degree threshold value, in a two-dimensional image, and the two-dimensional image is acquired through the photosensitive element of the dynamic vision sensor.
Specifically, there are a plurality of first target dynamic vision sensor signals within a preset time period, and the autocorrelation coefficient of the dynamic vision sensor signals within the preset time period is calculated by a time sliding window method, where the size of the sliding window may be the size of the first preset period; D represents the number of first target dynamic vision sensor signals within the same first preset period, i.e., the D second target dynamic vision sensor signals within the same first preset period; W and H are the width and height of the photosensitive element of the dynamic vision sensor, respectively; (x, y) represents the coordinates of any pixel in the photosensitive element of the dynamic vision sensor, or (x, y) represents the coordinates of any pixel in the two-dimensional image; (x′, y′) represents the coordinates of a pixel whose proximity to the any pixel (x, y) is not greater than the preset proximity threshold in the photosensitive element of the dynamic vision sensor, or (x′, y′) represents the coordinates of such a pixel in the two-dimensional image, that is, the coordinates of the first target pixel; S_{k,x,y} represents the first signal value of the any pixel (x, y) in the w-th first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals; S_{k+q,x′,y′} represents the first signal value of the first target pixel (x′, y′) in the (w+q)-th first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, i.e., the first signal value of the first target pixel (x′, y′) in the third target dynamic vision sensor signal; the first value is Σ_{x=1}^{W} Σ_{y=1}^{H} p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ), and the second value is p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ).
In this implementation manner, the plurality of first target dynamic vision sensor signals are the dynamic vision sensor signals at a plurality of moments within the preset time period, and the autocorrelation coefficient of the dynamic vision sensor signals within the preset time period can be calculated by a time sliding window method. For example, a time window whose size is that of the first preset period slides over the preset time period, and the time interval of each slide is the size of one first preset period. During one slide, the time window may frame D of the plurality of first target dynamic vision sensor signals, which are denoted as the D second target dynamic vision sensor signals, and the autocorrelation coefficient of the dynamic vision sensor signals corresponding to that time window is calculated based on the D second target dynamic vision sensor signals. Because each of the plurality of first target dynamic vision sensor signals includes the first signal values of a plurality of pixels, each of the D second target dynamic vision sensor signals in each time window (or first preset period) also includes the first signal values of a plurality of pixels, so the autocorrelation coefficient corresponding to each time window (or first preset period) can be calculated from the first signal values of adjacent pixels at different moments. In this way, after the time window slides multiple times over the preset time period, the autocorrelation coefficients corresponding to the multiple time windows can be calculated, thereby obtaining a plurality of autocorrelation coefficients.
In one possible implementation manner, the first target pixel includes a plurality of second target pixels, and the second value corresponding to the any pixel is obtained by accumulating the third values corresponding to the plurality of second target pixels; the third value corresponding to any one of the plurality of second target pixels is the product of the first signal value of the any pixel and the target signal value of the any second target pixel.
Specifically, S_{k,x,y} represents the first signal value of the any pixel (x, y); S_{k+q,x′,y′} represents the target signal value of the any second target pixel (x′, y′); the third value is S_{k,x,y} · S_{k+q,x′,y′}, and the second value is Σ_{x′,y′} S_{k,x,y} · S_{k+q,x′,y′}.
In this implementation manner, for any pixel in all pixels in the dynamic vision sensor signal, multiplying the first signal value of the any pixel at a certain moment by the first signal value of each adjacent pixel of the any pixel at another moment to obtain a third value corresponding to each adjacent pixel, and obtaining a plurality of third values corresponding to the any pixel; then accumulating the third numerical values corresponding to each adjacent pixel to obtain a second numerical value corresponding to any pixel; and accumulating the second values corresponding to all the pixels to obtain the first value corresponding to the second target dynamic vision sensor signal at a certain moment. The above operations are performed for the second target dynamic vision sensor signals (for example, D second target dynamic vision sensor signals) at all times in a time window (or a first preset period), so as to obtain first values corresponding to the second target dynamic vision sensor signals (for example, D second target dynamic vision sensor signals) at all times; and calculating an average value of all the first values obtained in the time window (or the first preset period), so as to obtain the autocorrelation coefficient of the dynamic vision sensor signal corresponding to the time window (or the first preset period).
It should be noted that the time resolution of the dynamic vision sensor signal can be as fine as 1 μs; in order to increase the data density of the dynamic vision sensor signal, the dynamic vision sensor signals within a certain period of time can be compressed into one dynamic vision sensor signal, that is, a plurality of dynamic vision sensor signals are compressed into one dynamic vision sensor signal.
The specific operation of compressing the dynamic vision sensor signal will be described below, taking any pixel (x, y) as an example. Given a series of dynamic vision sensor signals {s_1, s_2, ..., s_r}, where s_r represents the signal polarization value of the pixel (x, y) in the dynamic vision sensor signal at time t_r, with a positive event valued 1 and a negative event valued -1: a frame-compression time window T is set, e.g., T = 500 μs, and S_g denotes the accumulated signal value of the pixel (x, y) after the g-th frame compression. If gT ≤ t_r < (g+1)T, then S_g += s_r, i.e., S_g = S_g + s_r. After traversing the series of dynamic vision sensor signals, a reset is performed according to the following principle to obtain the signal reset value of the pixel (x, y): if S_g > 0, then S_g = 1; if S_g < 0, then S_g = -1.
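A per-pixel Python sketch of this compression follows; the helper name and example events are hypothetical, and leaving an accumulated value of exactly 0 at 0 is an assumption the passage does not settle.

```python
import numpy as np

def compress_events(timestamps_us, polarities, T_us=500):
    """Compress one pixel's event stream into frame values in {-1, 0, +1}.

    timestamps_us : event times t_r of pixel (x, y), in microseconds
    polarities    : corresponding values s_r in {+1, -1}
    T_us          : frame-compression time window T, e.g. 500 us
    """
    n_frames = int(max(timestamps_us) // T_us) + 1
    S = np.zeros(n_frames)
    for t_r, s_r in zip(timestamps_us, polarities):
        g = int(t_r // T_us)   # g*T <= t_r < (g+1)*T
        S[g] += s_r            # accumulate: S_g = S_g + s_r
    return np.sign(S)          # reset: S_g > 0 -> 1, S_g < 0 -> -1

ts = [10, 120, 480, 530, 900, 1400]
ps = [1, -1, 1, 1, -1, -1]
print(compress_events(ts, ps))  # [ 1.  0. -1.]
```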
In one possible implementation, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained according to the second signal value of the any pixel, wherein: if the second signal value of the any pixel is greater than 0, the first signal value of the any pixel is 1; if the second signal value of the any pixel is smaller than 0, the first signal value of the any pixel is -1; the second signal value of the any pixel is any one of m third signal values of the any pixel, and m is a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period, wherein: the g-th third signal value in the m third signal values is the sum of the (g-1)-th third signal value in the m third signal values and the accumulated sum of the fourth signal values of the any pixel within the g-th second preset period, g is a positive integer, 1 ≤ g ≤ m, the plurality of third dynamic vision sensor signals comprise the fourth signal values of the any pixel within the g-th second preset period, and the preset time period comprises the g-th second preset period; when g is equal to 1, the 1st third signal value is the accumulated sum of the fourth signal values of the any pixel within the 1st second preset period.
Specifically, the size of the frame-compression time window is taken as the size of the second preset period, and the plurality of third dynamic vision sensor signals within the preset time period are frame-compressed to obtain the plurality of first target dynamic vision sensor signals, where the third dynamic vision sensor signals are the original dynamic vision sensor signals; for example, r third dynamic vision sensor signals are frame-compressed to obtain m first target dynamic vision sensor signals. The first signal value of any pixel (x, y) included in any first target dynamic vision sensor signal is a signal reset value, the second signal value or the third signal value of the any pixel (x, y) is a signal accumulated value, and the fourth signal value of the any pixel (x, y) is a signal polarization value.
In this implementation, a plurality of dynamic vision sensor signals within a period of time are compressed into one dynamic vision sensor signal to increase the data density of the dynamic vision sensor signal. For example, taking the size of the second preset period as the size of the time window, the plurality of third dynamic vision sensor signals within the preset time period are frame-compressed to obtain the plurality of first target dynamic vision sensor signals, where the first signal value of any pixel included in any first target dynamic vision sensor signal is obtained by resetting the accumulated value of the fourth signal values of the any pixel in the plurality of third dynamic vision sensor signals, and the reset principle is: if the accumulated value of the fourth signal values of the any pixel in the plurality of third dynamic vision sensor signals is greater than 0, the first signal value is set to 1; if the accumulated value is smaller than 0, the first signal value is set to -1. In addition, because the first target dynamic vision sensor signal has a higher data density than the third dynamic vision sensor signal, training the impulse neural network model with the first target dynamic vision sensor signals can improve training efficiency.
It should be appreciated that the first target dynamic vision sensor signal may be an original dynamic vision sensor signal, such as a third dynamic vision sensor signal; the first target dynamic vision sensor signal may also be a signal obtained by frame-compressing the original dynamic vision sensor signals, for example, a plurality of third dynamic vision sensor signals are frame-compressed to obtain one first target dynamic vision sensor signal. When the first target dynamic vision sensor signal is an original dynamic vision sensor signal, the first signal value of any pixel in the first target dynamic vision sensor signal is a signal polarization value; when the first target dynamic vision sensor signal is a signal obtained by frame compression, the first signal value of any pixel in the first target dynamic vision sensor signal is a signal reset value.
Referring to fig. 6, fig. 6 is a schematic diagram of a training process of a pulse neural network model according to an embodiment of the present application. On a training data set of dynamic vision sensor signals, the impulse neural network model can be trained effectively by applying the technical scheme of the present application, and it essentially converges after a period of training; for example, the loss essentially converges after about 50 iterations.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating the influence of different target parameters on the training of the impulse neural network model according to the embodiment of the present application. Different values of the target parameter τ_s of the postsynaptic membrane voltage kernel function lead to different training effects for the impulse neural network model: as τ_s increases, the training loss first decreases and then increases, and the loss is smallest when τ_s is around 10. Therefore, a suitable target parameter τ_s can be selected through the technical scheme of the present application, so that the training of the impulse neural network model is optimal.
Referring to fig. 8, fig. 8 is a graph comparing the denoising effect of the impulse neural network model with that of the three-dimensional convolutional neural network model according to the embodiment of the present application; compared with the traditional three-dimensional convolutional neural network model, the impulse neural network model provided by the present application has a significantly better denoising effect.
The comparison of the impulse neural network model and the three-dimensional convolutional neural network model in terms of running time, parameter quantity and calculation amount is shown in table 1.
TABLE 1 Comparison of run time, parameter count and computation amount of the impulse neural network model and the three-dimensional convolutional neural network model

Network model | Run time | Parameter count | Computation amount |
---|---|---|---|
Three-dimensional convolutional neural network model | 170 s | 6.54 MB | 531.5 G |
Impulse neural network model | 14 s | 150 KB | 1 G–10 G |
As can be seen from Table 1, with similar denoising effects, the impulse neural network model provided by the present application has obvious advantages in parameter count, run time and computation amount; the statistics in Table 1 were obtained on a Tesla V100 GPU over 1000 images of size 346 × 260 collected with a dynamic vision sensor.
Referring to fig. 9, fig. 9 is a graph comparing the effects of the impulse neural network model and the three-dimensional convolutional neural network model on a subsequent task after denoising, according to the embodiment of the present disclosure. When the denoised dynamic vision sensor signal is used for a subsequent deblurring task, the deblurred image based on the signal denoised by the impulse neural network model retains more real detail, and is closer to reality, than the deblurred image based on the signal denoised by the three-dimensional convolutional neural network model.
In summary, the technical scheme of the present application sets the target parameter of the postsynaptic membrane voltage kernel function so that the impulse neural network model learns the temporal correlation of the dynamic vision sensor signals, and performs streaming denoising of the dynamic vision sensor signal using the temporal dynamics of the pulse neurons, thereby greatly reducing the network scale, running time and computation amount.
It should be noted that the series of steps or operations described in the process 300 may also correspond to the corresponding descriptions of the embodiments shown with reference to fig. 1 and 2.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a denoising apparatus according to an embodiment of the present application; the denoising apparatus 1000 is applied to an electronic device including a server and a terminal, the denoising apparatus 1000 including: an acquisition unit 1001 for acquiring a first dynamic vision sensor signal; the processing unit 1002 is configured to perform denoising processing on the first dynamic vision sensor signal by using a pulse neural network model to obtain a second dynamic vision sensor signal, where a post-synaptic membrane voltage kernel function of a pulse neuron of the pulse neural network model includes a target parameter, and the target parameter is determined according to an autocorrelation coefficient of the dynamic vision sensor signal.
In one possible implementation, the autocorrelation coefficients of the dynamic vision sensor signal include a plurality of autocorrelation coefficients, the plurality of autocorrelation coefficients being derived from a plurality of first target dynamic vision sensor signals over a preset period of time; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained according to distribution fitting of the autocorrelation coefficients in time.
In one possible implementation manner, any one of the autocorrelation coefficients is an average value of the first values corresponding to D second target dynamic vision sensor signals, and D is a positive integer; the D second target dynamic vision sensor signals are D first target dynamic vision sensor signals belonging to the same first preset period in the plurality of first target dynamic vision sensor signals, and the preset time period comprises a plurality of first preset periods; the first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating the second values corresponding to a plurality of pixels; the second value corresponding to any pixel in the plurality of pixels is obtained according to a first signal value of the any pixel and a target signal value of a first target pixel, wherein the first target pixel is a pixel whose proximity to the any pixel is not greater than a preset proximity threshold; the any second target dynamic vision sensor signal includes the first signal value of the any pixel, the any second target dynamic vision sensor signal is the w-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and w is a positive integer; the target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal is the (w+q)-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and q is a positive integer.
In one possible implementation manner, the first target pixel includes a plurality of second target pixels, and the second value corresponding to the any pixel is obtained by accumulating the third values corresponding to the plurality of second target pixels; the third value corresponding to any one of the plurality of second target pixels is the product of the first signal value of the any pixel and the target signal value of the any second target pixel.
In one possible implementation, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained according to the second signal value of the any pixel, wherein: if the second signal value of the any pixel is greater than 0, the first signal value of the any pixel is 1; if the second signal value of the any pixel is smaller than 0, the first signal value of the any pixel is -1; the second signal value of the any pixel is any one of m third signal values of the any pixel, and m is a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period, wherein: the g-th third signal value in the m third signal values is the sum of the (g-1)-th third signal value in the m third signal values and the accumulated sum of the fourth signal values of the any pixel within the g-th second preset period, g is a positive integer, 1 ≤ g ≤ m, the plurality of third dynamic vision sensor signals comprise the fourth signal values of the any pixel within the g-th second preset period, and the preset time period comprises the g-th second preset period; when g is equal to 1, the 1st third signal value is the accumulated sum of the fourth signal values of the any pixel within the 1st second preset period.
In one possible implementation, the impulse neural network model includes N convolution layers and N deconvolution layers, wherein: the output of the j-th convolution layer in the N convolution layers is the input of the (j+1)-th convolution layer in the N convolution layers, the output of the j-th deconvolution layer in the N deconvolution layers is the input of the (j+1)-th deconvolution layer in the N deconvolution layers, the output of the j-th convolution layer is also the input of the (N-j)-th deconvolution layer in the N deconvolution layers, the output of the N-th convolution layer in the N convolution layers is the input of the 1st deconvolution layer in the N deconvolution layers, 1 ≤ j ≤ N, and N and j are positive integers.
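The wiring described above can be sketched structurally in PyTorch; the channel widths, the ReLU activations, and the use of addition to merge a skip connection into a deconvolution layer's input are illustrative assumptions, and the spiking neuron dynamics of the pulse neural network model are deliberately elided.

```python
import torch
import torch.nn as nn

class SkipEncoderDecoder(nn.Module):
    """N convolution layers feeding N deconvolution layers with skips.

    conv j feeds conv j+1; deconv i feeds deconv i+1; conv N feeds
    deconv 1; and conv j additionally feeds deconv N-j (skip connection).
    """
    def __init__(self, n=3, ch=16):
        super().__init__()
        self.n = n
        self.convs = nn.ModuleList(
            [nn.Conv2d(1 if j == 0 else ch, ch, 3, padding=1)
             for j in range(n)])
        self.deconvs = nn.ModuleList(
            [nn.ConvTranspose2d(ch, 1 if j == n - 1 else ch, 3, padding=1)
             for j in range(n)])

    def forward(self, x):
        skips = []
        for conv in self.convs:
            x = torch.relu(conv(x))
            skips.append(x)            # output of conv layer j (1-indexed)
        for i, deconv in enumerate(self.deconvs, start=1):
            if i < self.n:             # conv (N-i) also feeds deconv i
                x = x + skips[self.n - i - 1]
            x = deconv(x)
        return x

net = SkipEncoderDecoder()
print(net(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 1, 32, 32])
```

Addition is used here only so the shapes line up; concatenation along the channel dimension would express the same topology.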
It should be noted that, the implementation of each unit of the denoising apparatus 1000 described in fig. 10 may also correspond to the corresponding descriptions of the embodiments shown in fig. 1 to 9. Also, the beneficial effects of the denoising apparatus 1000 described in fig. 10 can be referred to the corresponding descriptions of the embodiments shown in fig. 1 to 9, and the descriptions are not repeated here.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device 1110 according to an embodiment of the present application, where the electronic device 1110 includes a processor 1111, a memory 1112, and a communication interface 1113, and the processor 1111, the memory 1112, and the communication interface 1113 are connected to each other through a bus 1114.
The processor 1111 may be one or more central processing units (central processing unit, CPU), and in the case where the processor 1111 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 1111 of the electronic device 1110 is configured to read the computer program code stored in the memory 1112 and execute the method of any of the embodiments shown in fig. 3.
It should be noted that the electronic device may be a server or a terminal, and the implementation of the operations of the electronic device 1110 described in fig. 11 may also correspond to the corresponding descriptions of the embodiments shown in fig. 1 to 9. Also, the advantageous effects of the electronic device 1110 described in fig. 11 may refer to the corresponding descriptions of the embodiments shown in fig. 1 to 9, and the descriptions are not repeated here.
The embodiment of the application also provides a chip, which includes at least one processor, a memory and an interface circuit; the memory, the interface circuit and the at least one processor are interconnected through lines, and the memory stores a computer program; the method flow of any of the embodiments shown in fig. 3 is implemented when the computer program is executed by the processor.
The embodiments of the present application further provide a computer readable storage medium, where a computer program is stored, and when the computer program runs on a computer, the method flow of any one of the embodiments shown in fig. 3 is implemented.
Embodiments of the present application also provide a computer program product, which when run on a computer implements the method flow of any one of the embodiments shown in fig. 3.
It should be appreciated that the processors referred to in the embodiments of the present application may be central processing units (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be understood that the memory referred to in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
Note that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated into the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should also be understood that the terms "first", "second", "third" and the various numerals referred to herein are merely for descriptive convenience and are not intended to limit the scope of the present application.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method shown in the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present application.
Claims (15)
1. A denoising method, comprising:
acquiring a first dynamic vision sensor signal;
denoising the first dynamic vision sensor signal by adopting a pulse neural network model to obtain a second dynamic vision sensor signal, wherein a postsynaptic membrane voltage kernel function of a pulse neuron of the pulse neural network model comprises target parameters, and the target parameters are determined according to autocorrelation coefficients of the dynamic vision sensor signal.
2. The method of claim 1, wherein the autocorrelation coefficients of the dynamic vision sensor signal comprise a plurality of autocorrelation coefficients derived from a plurality of first target dynamic vision sensor signals over a preset period of time;
the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained according to distribution fitting of the autocorrelation coefficients in time.
3. The method of claim 2, wherein any one of the plurality of autocorrelation coefficients is an average of the first values corresponding to D second target dynamic vision sensor signals, D being a positive integer; the D second target dynamic vision sensor signals are D first target dynamic vision sensor signals belonging to the same first preset period in the plurality of first target dynamic vision sensor signals, and the preset time period comprises a plurality of first preset periods;
the first numerical value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels;
The second value corresponding to any pixel in the plurality of pixels is obtained according to a first signal value of the any pixel and a target signal value of a first target pixel, wherein the first target pixel is a pixel whose proximity to the any pixel is not greater than a preset proximity threshold; the any second target dynamic vision sensor signal includes the first signal value of the any pixel, the any second target dynamic vision sensor signal is the w-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and w is a positive integer; the target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal is the (w+q)-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and q is a positive integer.
4. A method according to claim 3, wherein the first target pixel comprises a plurality of second target pixels, and the second value corresponding to the any pixel is obtained by accumulating the third values corresponding to the plurality of second target pixels;
The third value corresponding to any one of the plurality of second target pixels is a product of the first signal value of the any one pixel and the target signal value of the any one second target pixel.
5. The method of any of claims 2-4, wherein a first signal value of any pixel comprised by any one of the plurality of first target dynamic vision sensor signals is derived from a second signal value of the any pixel; wherein: if the second signal value of any pixel is greater than 0, the first signal value of any pixel is 1; if the second signal value of any pixel is smaller than 0, the first signal value of any pixel is-1;
the second signal value of the any pixel is any one of m third signal values of the any pixel, and m is a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period; wherein: the g-th third signal value in the m third signal values is the sum of the (g-1)-th third signal value in the m third signal values and the accumulated sum of the fourth signal values of the any pixel within the g-th second preset period, g is a positive integer, 1 ≤ g ≤ m, the plurality of third dynamic vision sensor signals comprise the fourth signal values of the any pixel within the g-th second preset period, and the preset time period comprises the g-th second preset period; when g is equal to 1, the 1st third signal value is the accumulated sum of the fourth signal values of the any pixel within the 1st second preset period.
6. The method of any of claims 1-5, wherein the impulse neural network model comprises N convolution layers and N deconvolution layers, wherein: the output of the j-th convolution layer in the N convolution layers is the input of the (j+1)-th convolution layer in the N convolution layers, the output of the j-th deconvolution layer in the N deconvolution layers is the input of the (j+1)-th deconvolution layer in the N deconvolution layers, the output of the j-th convolution layer is the input of the (N-j)-th deconvolution layer in the N deconvolution layers, the output of the N-th convolution layer in the N convolution layers is the input of the 1st deconvolution layer in the N deconvolution layers, 1 ≤ j ≤ N, and N and j are positive integers.
7. A denoising apparatus, comprising:
an acquisition unit for acquiring a first dynamic vision sensor signal;
and the processing unit is used for denoising the first dynamic vision sensor signal by adopting a pulse neural network model so as to obtain a second dynamic vision sensor signal, and the postsynaptic membrane voltage kernel function of the pulse neuron of the pulse neural network model comprises target parameters which are determined according to the autocorrelation coefficients of the dynamic vision sensor signal.
8. The apparatus of claim 7, wherein the autocorrelation coefficients of the dynamic vision sensor signal comprise a plurality of autocorrelation coefficients derived from a plurality of first target dynamic vision sensor signals over a preset period of time;
the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained according to distribution fitting of the autocorrelation coefficients in time.
9. The apparatus of claim 8, wherein any one of the plurality of autocorrelation coefficients is an average of the first values corresponding to D second target dynamic vision sensor signals, D being a positive integer; the D second target dynamic vision sensor signals are D first target dynamic vision sensor signals belonging to the same first preset period in the plurality of first target dynamic vision sensor signals, and the preset time period comprises a plurality of first preset periods;
the first numerical value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels;
The second value corresponding to any pixel in the plurality of pixels is obtained according to a first signal value of the any pixel and a target signal value of a first target pixel, wherein the first target pixel is a pixel whose proximity to the any pixel is not greater than a preset proximity threshold; the any second target dynamic vision sensor signal includes the first signal value of the any pixel, the any second target dynamic vision sensor signal is the w-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and w is a positive integer; the target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal is the (w+q)-th first target dynamic vision sensor signal in the plurality of first target dynamic vision sensor signals, and q is a positive integer.
10. The apparatus of claim 9, wherein the first target pixel comprises a plurality of second target pixels, and the second value corresponding to the any pixel is obtained by accumulating the third values corresponding to the plurality of second target pixels;
The third value corresponding to any one of the plurality of second target pixels is a product of the first signal value of the any one pixel and the target signal value of the any one second target pixel.
11. The apparatus of any of claims 8-10, wherein a first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is derived from a second signal value of the any pixel; wherein: if the second signal value of any pixel is greater than 0, the first signal value of any pixel is 1; if the second signal value of any pixel is smaller than 0, the first signal value of any pixel is-1;
the second signal value of the any pixel is any one of m third signal values of the any pixel, and m is a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period; wherein: the g-th third signal value in the m third signal values is the sum of the (g-1)-th third signal value in the m third signal values and the accumulated sum of the fourth signal values of the any pixel within the g-th second preset period, g is a positive integer, 1 ≤ g ≤ m, the plurality of third dynamic vision sensor signals comprise the fourth signal values of the any pixel within the g-th second preset period, and the preset time period comprises the g-th second preset period; when g is equal to 1, the 1st third signal value is the accumulated sum of the fourth signal values of the any pixel within the 1st second preset period.
12. The apparatus of any of claims 7-11, wherein the impulse neural network model comprises N convolution layers and N deconvolution layers, wherein: the output of the j-th convolution layer in the N convolution layers is the input of the (j+1)-th convolution layer in the N convolution layers, the output of the j-th deconvolution layer in the N deconvolution layers is the input of the (j+1)-th deconvolution layer in the N deconvolution layers, the output of the j-th convolution layer is the input of the (N-j)-th deconvolution layer in the N deconvolution layers, the output of the N-th convolution layer in the N convolution layers is the input of the 1st deconvolution layer in the N deconvolution layers, 1 ≤ j ≤ N, and N and j are positive integers.
13. An electronic device, comprising:
one or more processors;
a computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the electronic device to perform the method of any one of claims 1-6.
14. A computer readable storage medium comprising program code for performing the method of any of claims 1-6 when executed by a computer device.
15. A chip, comprising: a processor for calling and running a computer program from a memory, causing a device on which the chip is mounted to perform the method of any of claims 1-6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111321228.7A CN116109489A (en) | 2021-11-09 | 2021-11-09 | Denoising method and related equipment |
PCT/CN2022/130027 WO2023083121A1 (en) | 2021-11-09 | 2022-11-04 | Denoising method and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111321228.7A CN116109489A (en) | 2021-11-09 | 2021-11-09 | Denoising method and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116109489A true CN116109489A (en) | 2023-05-12 |
Family
ID=86253115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111321228.7A Pending CN116109489A (en) | 2021-11-09 | 2021-11-09 | Denoising method and related equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116109489A (en) |
WO (1) | WO2023083121A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116989800B (en) * | 2023-09-27 | 2023-12-15 | 安徽大学 | Mobile robot visual navigation decision-making method based on pulse reinforcement learning |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008079388A2 (en) * | 2006-12-22 | 2008-07-03 | President And Fellows Of Harvard College | A visual prosthesis methods of creating visual perceptions |
KR102399548B1 (en) * | 2016-07-13 | 2022-05-19 | 삼성전자주식회사 | Method for neural network and apparatus perform same method |
CN111105581B (en) * | 2019-12-20 | 2022-03-15 | 上海寒武纪信息科技有限公司 | Intelligent early warning method and related product |
CN112085768B (en) * | 2020-09-02 | 2023-12-26 | 北京灵汐科技有限公司 | Optical flow information prediction method, optical flow information prediction device, electronic equipment and storage medium |
CN112184760B (en) * | 2020-10-13 | 2021-05-25 | 中国科学院自动化研究所 | High-speed moving target detection and tracking method based on dynamic vision sensor |
CN112987026A (en) * | 2021-03-05 | 2021-06-18 | 武汉大学 | Event field synthetic aperture imaging algorithm based on hybrid neural network |
- 2021-11-09: CN CN202111321228.7A — patent CN116109489A, status: active, Pending
- 2022-11-04: WO PCT/CN2022/130027 — patent WO2023083121A1, status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2023083121A1 (en) | 2023-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111539879B (en) | Blind video denoising method and device based on deep learning | |
Baldwin et al. | Time-ordered recent event (tore) volumes for event cameras | |
Iliadis et al. | Deep fully-connected networks for video compressive sensing | |
Divakar et al. | Image denoising via CNNs: An adversarial approach | |
WO2022036777A1 (en) | Method and device for intelligent estimation of human body movement posture based on convolutional neural network | |
CN112541877B (en) | Defuzzification method, system, equipment and medium for generating countermeasure network based on condition | |
US20200151858A1 (en) | Image contrast enhancement method and device, and storage medium | |
CN111914997B (en) | Method for training neural network, image processing method and device | |
CN111861894B (en) | Image motion blur removing method based on generation type countermeasure network | |
Sun et al. | Learning non-local range Markov random field for image restoration | |
CN108133188A (en) | A kind of Activity recognition method based on motion history image and convolutional neural networks | |
CN110163813A (en) | A kind of image rain removing method, device, readable storage medium storing program for executing and terminal device | |
CN107133923B (en) | A non-blind deblurring method for blurred images based on adaptive gradient sparse model | |
CN111199521B (en) | Video deblurring three-dimensional convolution depth network method embedded with Fourier aggregation | |
CN111079764A (en) | Low-illumination license plate image recognition method and device based on deep learning | |
CN112529854A (en) | Noise estimation method, device, storage medium and equipment | |
CN114926737A (en) | Low-power-consumption target detection method based on convolutional pulse neural network | |
CN115035597A (en) | Variable illumination action recognition method based on event camera | |
WO2023083121A1 (en) | Denoising method and related device | |
CN112446387B (en) | Object identification method and device | |
CN108304913A (en) | A method of realizing convolution of function using spiking neuron array | |
İncetaş | Anisotropic diffusion filter based on spiking neural network model | |
CN114926348B (en) | Device and method for removing low-illumination video noise | |
CN117542118A (en) | UAV aerial video action recognition method based on dynamic modeling of spatiotemporal information | |
CN115619674A (en) | A low-illuminance video enhancement method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |