CN114298289A - Data processing method, data processing device, and storage medium - Google Patents
- Publication number: CN114298289A (application CN202011000482.2A)
- Authority: CN (China)
- Prior art keywords: data, input, data set, coding, groups
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The embodiment of the application discloses a data processing method, a data processing device, and a storage medium in the field of artificial intelligence. By providing a neural network model in which the coding and decoding modules are arranged in parallel, the method not only accelerates computation through parallelism but also improves time performance and reduces the huge computation amount of the fully connected layer. The method comprises the following steps: acquiring a data set to be input; and inputting the data set to be input into a neural network model to obtain target extraction data, wherein the neural network model comprises at least two coding and decoding modules, and the at least two coding and decoding modules are arranged in parallel.
Description
Technical Field
The embodiments of the application relate to the field of computer technology, and in particular to a data processing method, a data processing device, and a storage medium.
Background
Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, AI basic theory, and the like.
The Seq2Seq model is a structural model composed of an encoder and a decoder. The Transformer structure is a Seq2Seq model built on the attention mechanism, and has been applied with great success to various tasks in the AI field, such as natural language processing (NLP) and automatic speech recognition (ASR).
However, the Transformer structure applied to NLP, ASR and other tasks in existing solutions adopts a stacked arrangement, i.e. a plurality of codec modules, each composed of a multi-head attention layer, a feedforward layer and so on, are stacked in series. Although abstract information can be extracted deeply, stacking the coding and decoding modules in series means that the Transformer structure cannot be sufficiently optimized in time performance; and because of the serial connection, the next coding and decoding module can begin its calculation only after the previous one has finished, so that the fully connected layers occupy a large amount of computation. In addition, other existing schemes share the attention weights of several adjacent layers of the encoder and the decoder in the codec module, so as to save memory space and achieve acceleration. However, in the overall Transformer structure, a plurality of codec modules composed of a multi-head attention layer, a feedforward layer, and the like are still stacked in series.
Therefore, how to optimize the time performance of the Transformer structure as a whole and reduce the huge computation amount of the fully connected layer has become a problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the application provide a data processing method, a data processing device, and a storage medium. By providing a neural network model in which the coding and decoding modules are arranged in parallel, computation can be accelerated through parallelism, time performance is improved, and the huge computation amount of the fully connected layer is reduced.
In a first aspect, an embodiment of the present application provides a data processing method, where the method may include: acquiring a data set to be input; and inputting the data set to be input into a neural network model to obtain target extraction data, wherein the neural network model comprises at least two coding and decoding modules, and the at least two coding and decoding modules are arranged in parallel. In this way, by providing a neural network model in which the codec modules are arranged in parallel, not only can computation be accelerated in parallel, but time performance is also improved.
In some embodiments, after acquiring the data set to be input, the method further comprises: processing the data set to be input based on a target seed function to obtain a first input data set, wherein the first input data set comprises at least one first input data, and the resolution corresponding to each first input data in the at least one first input data is different; correspondingly, inputting the data set to be input into the neural network model to obtain the target extraction data includes: inputting each first input data in the first input data set into each of the at least two coding and decoding modules, respectively, to obtain at least one output data; and deriving the target extraction data based on the at least one output data. In this way, introducing the target seed function allows the neural network model to extract the target data better.
In some embodiments, processing the data set to be input based on the target seed function to obtain the first input data set includes: sampling the data set to be input based on a preset sampling rate corresponding to the target seed function to obtain the first input data set. In this way, the first input data set is representative of the whole, so that the target data can be fully extracted.
In some embodiments, processing the data set to be input based on the target seed function to obtain the first input data set includes: grouping the data set to be input to obtain M groups of data to be input, and dividing the preset sampling rate corresponding to the target seed function to obtain M sub-sampling rates, where M is an integer and M ≥ 2; for each group of data to be input in the M groups, sampling that group based on the corresponding sub-sampling rate to obtain M groups of first input data; and deriving the first input data set based on the M groups of first input data. In this way, the first input data finally sampled from each group improve the representativeness of the whole data set to be input, thereby improving the accuracy of the finally extracted target data.
In some embodiments, the method further comprises: constraining a first output format to a normal distribution according to a preset distribution parameter, where the first output format is the output format corresponding to the dimensions of the data output by the self-attention mechanism layer in the coding and decoding module. In this way, constraining the output format of the self-attention mechanism layer in each codec module to a normal distribution makes the expression capability of the neural network model richer.
In some embodiments, the method further comprises: dividing the dimensions of the data input by the feedforward layer in the coding and decoding module to obtain N groups of first dimensions, where N is an integer and N ≥ 2; sequentially performing dimension increasing and dimension reducing on each of the N groups of first dimensions to obtain N groups of second dimensions; and splicing the N groups of second dimensions to obtain a third dimension, where the third dimension is the same as the dimension of the data input by the feedforward layer. In this way, the reduced feedforward layer can greatly reduce the number of parameters while retaining the data to the maximum extent.
In a second aspect, an embodiment of the present application provides a data processing apparatus, which may include: an acquisition unit for acquiring a data set to be input; and a processing unit for inputting the data set to be input acquired by the acquisition unit into a neural network model to obtain target extraction data, wherein the neural network model comprises at least two coding and decoding modules, and the at least two coding and decoding modules are arranged in parallel.
In some embodiments, the processing unit is further configured to, after the acquisition unit acquires the data set to be input, process the data set to be input based on a target seed function to obtain a first input data set, where the first input data set includes at least one first input data, and the resolution corresponding to each first input data in the at least one first input data is different; input each first input data in the first input data set into each of the at least two coding and decoding modules, respectively, to obtain at least one output data; and derive the target extraction data based on the at least one output data.
In some embodiments, the processing unit is configured to sample the data set to be input according to a preset sampling rate corresponding to the target seed function, so as to obtain a first input data set.
In some embodiments, the processing unit is configured to:
group the data set to be input to obtain M groups of data to be input, and divide the preset sampling rate corresponding to the target seed function to obtain M sub-sampling rates, where M is an integer and M ≥ 2;
for each group of data to be input in the M groups, sample that group based on the corresponding sub-sampling rate to obtain M groups of first input data;
and derive the first input data set based on the M groups of first input data.
In some embodiments, the processing unit is configured to:
constrain a first output format to a normal distribution according to a preset distribution parameter, where the first output format is the output format corresponding to the dimensions of the data output by the self-attention mechanism layer in the coding and decoding module.
In some embodiments, the processing unit is configured to:
divide the dimensions of the data input by the feedforward layer in the coding and decoding module to obtain N groups of first dimensions, where N is an integer and N ≥ 2;
sequentially perform dimension increasing and dimension reducing on each of the N groups of first dimensions to obtain N groups of second dimensions;
and splice the N groups of second dimensions to obtain a third dimension, where the third dimension is the same as the dimension of the data input by the feedforward layer.
In a third aspect, embodiments of the present application provide a computer-readable storage medium comprising a computer program which, when run on a processor, causes the processor to perform the method according to the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a processor, causes the processor to perform the method according to the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor, and is configured to support a data processing device to implement the functions in the first aspect or any one of the possible implementation manners of the first aspect. In one possible design, the system-on-chip further includes a memory, the memory being used to hold program instructions and data necessary for the data processing device. The chip system may be constituted by a chip, or may include a chip and other discrete devices.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiments of the application, by providing a neural network model with the coding and decoding modules arranged in parallel, parallel accelerated computation can be realized, time performance is improved, and the huge computation amount of the fully connected layer is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present application.
Fig. 1 is a schematic structural diagram of a system architecture provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of object detection using a convolutional neural network model provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a chip hardware structure according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a stacked Transformer structure in a prior art scheme;
FIG. 5 is a schematic diagram of a weight-sharing Transformer structure in a prior art scheme;
FIG. 6 is a schematic structural diagram of a Transformer provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of a method for data processing provided in an embodiment of the present application;
FIG. 8 is a schematic illustration of a normally distributed self-attention mechanism layer provided in an embodiment of the present application;
FIG. 9a is a schematic diagram of a feed forward layer provided in a prior art scheme;
FIG. 9b is a schematic diagram of a reduced feed-forward layer provided in an embodiment of the present application;
fig. 10 is a schematic hardware structure diagram of a communication device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the application provide a data processing method, a data processing device, and a storage medium. By providing a neural network model in which the coding and decoding modules are arranged in parallel, computation can be accelerated through parallelism, time performance is improved, and the huge computation amount of the fully connected layer is reduced.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments of the application are mainly applied to scenarios such as language processing and text processing, for example NLP, ASR, etc., which are not specifically limited in this application.
The method provided by the application is described from the model training side and the model application side as follows:
the training method of the neural network provided by the embodiment of the application relates to processing of natural language, automatic language identification and the like, and can be particularly applied to data training, machine learning, deep learning and the like, symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like are carried out on training data (such as a data set to be input in the application), and finally the trained neural network is obtained, so that target data can be conveniently extracted.
The data processing method provided in the embodiments of the application may input a data set to be input (such as speech to be input or text to be input) into the trained neural network to obtain output data (such as the target extraction data in this application). It should be noted that the neural network training method provided in the embodiments of the present application and the data processing method of the embodiments are inventions based on the same concept, and can also be understood as two parts of a system or two stages of an overall process: for example, a model training phase and a model application phase.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the unit may be:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$

where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining together a number of the above single neural units, i.e. the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
(2) Deep neural network
A deep neural network (DNN) can be understood as a neural network with many hidden layers; "many" here has no special metric, and what we often call a multi-layer neural network and a deep neural network are essentially the same thing. Dividing a DNN by the positions of its layers, the neural network inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is necessarily connected to any neuron of the (i+1)-th layer. Although the DNN appears complex, the work of each layer is not complex; it is simply the following linear relational expression: $y' = \alpha(Wx' + b)$, where $x'$ is the input vector, $y'$ is the output vector, $b$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply performs this simple operation on the input vector $x'$ to obtain the output vector $y'$. Because a DNN has many layers, the number of coefficient matrices $W$ and offset vectors $b$ is also large. These parameters are defined in the DNN as follows; first consider the definition of the coefficient $W$. Taking a three-layer DNN as an example, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$: the superscript 3 represents the layer in which the coefficient $W$ is located, while the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron at layer L-1 to the j-th neuron at layer L is defined as $W^{L}_{jk}$. Note that the input layer has no $W$ parameter. In deep neural networks, more hidden layers make the network better able to depict complex situations in the real world. Theoretically, a model with more parameters has higher complexity and larger "capacity", which means that it can accomplish more complex learning tasks.
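As a small illustration of the layer-by-layer computation $y' = \alpha(Wx' + b)$ and the $W^{L}_{jk}$ indexing, consider the following NumPy sketch; the layer sizes and random weights are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [8, 16, 16, 4]   # input layer, two hidden layers, output layer

# weights[L] has shape (n_out, n_in): entry (j, k) is the coefficient from the
# k-th neuron of one layer to the j-th neuron of the next, matching W^L_{jk}.
# The input layer has no W parameter.
weights = [rng.standard_normal((n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.standard_normal(layer_sizes[0])
for W, b in zip(weights, biases):
    x = sigmoid(W @ x + b)          # y' = alpha(W x' + b), layer by layer
print(x.shape)                      # (4,)
```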
(3) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In convolutional layers of convolutional neural networks, one neuron may be connected to only a portion of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way image information is extracted is location independent. The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Recurrent neural networks (RNNs) are used to process sequence data. In the traditional neural network model, the layers from the input layer through the hidden layer to the output layer are fully connected, while the nodes within each layer are unconnected. Although this common neural network solves many problems, it is still incapable of solving many others. For example, to predict the next word in a sentence, you generally need to use the previous words, because the words in a sentence are not independent. The RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output: the nodes between the hidden layers are no longer unconnected but connected, and the input of a hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. The training of an RNN is the same as that of a conventional CNN or DNN.
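The recurrence can be illustrated with a few lines of NumPy; the sizes, random weights, and tanh activation are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
W_x = rng.standard_normal((d_h, d_in)) * 0.1   # input -> hidden
W_h = rng.standard_normal((d_h, d_h)) * 0.1    # hidden(t-1) -> hidden(t)
b = np.zeros(d_h)

h = np.zeros(d_h)                               # memorized previous information
for x_t in rng.standard_normal((5, d_in)):      # a length-5 input sequence
    h = np.tanh(W_x @ x_t + W_h @ h + b)        # current output depends on h(t-1)
print(h.shape)                                  # (16,)
```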
(5) Residual neural network (ResNet)
The residual neural network was proposed to solve the degradation problem that arises when a neural network has too many hidden layers. The degradation problem is: when the network has more hidden layers, its accuracy saturates and then degrades sharply, and this degradation is not caused by overfitting, but by the fact that during backpropagation, the gradients reaching the lower layers are weakly correlated and are updated insufficiently, so that the prediction accuracy of the finally obtained model decreases. When a neural network degrades, a shallow network can achieve a better training effect than a deep one; if the features of the lower layers are passed to the higher layers, the effect should be at least no worse than that of the shallow network, and this can be achieved through an identity mapping. This identity mapping is called a residual connection (shortcut), and it is easier to optimize this residual mapping than the original mapping.
(6) Classifier
Many neural network architectures have a classifier for classifying objects in an image. The classifier is generally composed of a fully connected layer and a softmax function, and can output the probabilities of different classes according to the input.
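A classifier head of this kind can be sketched in a couple of lines; the feature and class counts are assumed.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(512, 10),       # fully connected layer: 512 features -> 10 classes
    nn.Softmax(dim=-1),       # probabilities over the 10 classes
)

probs = classifier(torch.randn(1, 512))
print(probs.sum().item())     # ~1.0
```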
Some basic contents of the neural network are briefly introduced above. However, it should be understood that, in the following embodiments of the present application, the mentioned neural network model may be understood as a model optimized based on the above convolutional neural network, and the system architecture of the embodiments of the present application is described in detail below with reference to fig. 1.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in FIG. 1, the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection device 160.
In addition, the execution device 110 includes a calculation module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114. Wherein, the calculation module 111 may include the target model/rule 101, and the pre-processing module 113 and the pre-processing module 114 are optional.
The data acquisition device 160 is used to acquire training data. For the data processing method of the embodiment of the present application, the training data may include a data set to be input, such as speech to be input or text to be input. After the training data is collected, the data collection device 160 stores the training data in the database 130, and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes the input data set to be input and compares the output target extraction data with the pre-labeled output result, until the difference between the target extraction data output by the training device 120 and the pre-labeled output result is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
The target model/rule 101 can be used for implementing the data processing method of the embodiment of the present application, that is, the data set to be input (after the relevant preprocessing) is input into the target model/rule 101, and the target extraction data can be obtained. The target model/rule 101 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, for example the execution device 110 shown in fig. 1, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud. In fig. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include: the data set to be input, input by the client device. The client device 140 may specifically be a terminal device.
The preprocessing module 113 and the preprocessing module 114 are configured to perform preprocessing according to input data (such as a data set to be input) received by the I/O interface 112, and in this embodiment of the application, the preprocessing module 113 and the preprocessing module 114 may not be provided (or only one of the preprocessing modules may be provided), and the computing module 111 is directly used to process the input data.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 presents the processing result, such as the target extraction data obtained as described above, to the client device 140, thereby providing it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 1, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data inputted to the I/O interface 112 and the output result outputted from the I/O interface 112 as shown in the figure may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 1, the target model/rule 101 is obtained by training with the training device 120; in this embodiment it may be the neural network model of the present application. Specifically, the neural network model provided in the present application may be a model optimized on the basis of a CNN, a deep convolutional neural network (DCNN), and the like.
Since CNN is a very common neural network, the structure of CNN will be described in detail below with reference to fig. 2. As described in the introduction of the basic concept above, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, where the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to images input thereto.
As shown in fig. 2, Convolutional Neural Network (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where pooling is optional), and a neural network layer 230. The relevant contents of these layers are described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
the convolutional layer/pooling layer 220 shown in fig. 2 may include layers such as example 221 and 226, for example: in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, 226 is a pooling layer; in another implementation, 221, 222 are convolutional layers, 223 is a pooling layer, 224, 225 are convolutional layers, and 226 is a pooling layer. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
A convolutional layer may use a number of weight matrices (convolution kernels) to extract different features from the input. The weight values in these weight matrices need to be obtained through a large amount of training in practical applications, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (e.g., 221) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 200 increases, the later convolutional layers (e.g., 226) extract more complex features, such as features with high-level semantics, and features with higher-level semantics are more suitable for the problem to be solved.
A pooling layer:
since it is often desirable to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 221-226 illustrated by 220 in fig. 2, this may be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image into images of smaller size. The average pooling operator may compute the pixel values in the image over a certain range to produce an average as the result of the average pooling. The max pooling operator may take the pixel with the largest value in a particular range as the result of the max pooling. In addition, just as the size of the weight matrix used in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum of the corresponding sub-region of the image input to the pooling layer.
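A short illustration of both pooling operators on an assumed 4×4 input:

```python
import torch
import torch.nn.functional as F

img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)  # (batch, ch, H, W)
print(F.avg_pool2d(img, kernel_size=2))   # each value: mean of a 2x2 sub-region
print(F.max_pool2d(img, kernel_size=2))   # each value: max of a 2x2 sub-region
```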
The neural network layer 230:
after processing by convolutional layer/pooling layer 220, convolutional neural network 200 is not sufficient to output the required output information. Because, as previously described, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (required class information or other relevant information), the convolutional neural network 200 needs to generate one or a set of the required number of classes of output using the neural network layer 230. Accordingly, a plurality of hidden layers (231, 232 to 23n shown in fig. 2) and an output layer 240 may be included in the neural network layer 230, and parameters included in the hidden layers may be pre-trained according to related training data of a specific task type, for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the hidden layers in the neural network layer 230, the last layer of the whole convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the categorical cross entropy, used specifically for calculating the prediction error. Once the forward propagation of the whole convolutional neural network 200 (i.e. the propagation from 210 to 240 in fig. 2) is completed, the backward propagation (i.e. the propagation from 240 to 210 in fig. 2) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output through the output layer and the ideal result.
It should be noted that the convolutional neural network 200 shown in fig. 2 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
It should be understood that the convolutional neural network (CNN) 200 shown in fig. 2 may be used to perform the data processing method of the embodiment of the present application; as shown in fig. 2, the data set to be input yields the target extraction data after being processed by the input layer 210, the convolutional layer/pooling layer 220 and the neural network layer 230.
Fig. 3 is a hardware structure of a chip provided in an embodiment of the present application, where the chip includes a neural network processor. The chip may be provided in the execution device 110 as shown in fig. 1 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 1 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithms for the various layers in the convolutional neural network shown in fig. 2 can all be implemented in a chip as shown in fig. 3.
The neural network processor NPU is mounted as a coprocessor on a main Central Processing Unit (CPU) (host CPU), and tasks are allocated by the main CPU. The core portion of the NPU is an arithmetic circuit 303, and the controller 304 controls the arithmetic circuit 303 to extract data in a memory (weight memory or input memory) and perform an operation.
In some implementations, the arithmetic circuitry 303 includes a plurality of processing units (PEs) internally. In some implementations, the operational circuitry 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 303 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and buffers it on each PE in the arithmetic circuit 303. The arithmetic circuit 303 takes the matrix A data from the input memory 301, performs matrix arithmetic with matrix B, and stores the partial or final result of the obtained matrix in an accumulator 308.
The vector calculation unit 307 may further process the output of the operation circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 307 may be used for network calculation of a non-convolution/non-FC layer in a neural network, such as pooling (Pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 307 can store the processed output vector to the unified buffer 306. For example, the vector calculation unit 307 may apply a non-linear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 307 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 303, for example, for use in subsequent layers in a neural network.
The unified memory 306 is used to store input data as well as output data.
A direct memory access controller (DMAC) 305 is used to transfer input data in the external memory to the input memory 301 and/or the unified memory 306, to store the weight data from the external memory into the weight memory 302, and to store data from the unified memory 306 into the external memory.
A Bus Interface Unit (BIU) 310, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through a bus.
An instruction fetch buffer 309, coupled to the controller 304, is used to store instructions used by the controller 304.
The controller 304 is configured to call the instructions cached in the instruction fetch memory 309, so as to control the operation process of the operation accelerator.
Generally, the unified memory 306, the input memory 301, the weight memory 302 and the instruction fetch memory 309 are On-Chip (On-Chip) memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM) or other readable and writable memories.
The operation of each layer in the convolutional neural network shown in fig. 2 can be performed by the arithmetic circuit 303 or the vector calculation unit 307.
The execution device 110 in fig. 1 described above is capable of executing the steps of the data processing method of the embodiment of the present application, and the CNN model shown in fig. 2 and the chip shown in fig. 3 may also be used to execute those steps. It is to be understood that the input layer 210, the convolutional layer/pooling layer 220, and the neural network layer 230 described in fig. 2 above can be understood as one codec module; while only one codec module is depicted in fig. 2, in practical applications a plurality of codec modules may be arranged in parallel to facilitate parallel computation, as described below with reference to fig. 6.
In the related art, the Transformer structure applied to NLP, ASR and other tasks adopts a stacked arrangement, i.e. a plurality of codec modules, each composed of a multi-head attention layer, a feedforward layer and the like, are stacked in series. Fig. 4 shows a schematic diagram of a stacked Transformer structure in a prior art scheme. As can be seen from fig. 4, the encoding and decoding module in the Transformer structure is a module mainly composed of a multi-head attention layer (MHA) and a feedforward layer (FF); it can also be seen from fig. 4 that multiple modules need to be connected in series, as indicated by Nx in fig. 4. In addition, fig. 4 also shows that position encoding information (positional encoding) is introduced at the input end for recording history information, and a residual connection (residual connection) layer is used for optimizing gradient propagation in the multi-layer network of the Transformer structure. However, although the Transformer structure shown in fig. 4 can be used to extract abstract information deeply, stacking a plurality of codec modules in series does not sufficiently optimize the temporal performance of the Transformer structure; and because of the serial connection, the next coding and decoding module can begin its calculation only after the previous one has finished, so that the fully connected layers occupy a large amount of computation.
In order to save memory space, fig. 5 shows a weight-sharing Transformer structure in a prior art scheme. As can be seen from fig. 5, attention weights are shared between several adjacent layers of the encoder and the decoder in the codec module; for example, the weights between the attention mechanism layer of the m-th layer and the attention mechanism layer of the (m+i)-th layer are shared. However, as is clear from the Transformer structure shown in fig. 5, a plurality of codec modules are still stacked in series in the overall structure, which therefore cannot be sufficiently optimized in terms of time performance, and the fully connected layers still occupy a large amount of calculation.
Therefore, in order to solve the problems caused by the Transformer structures shown in fig. 4 and fig. 5, an embodiment of the present application provides a data processing method, which can be applied to application scenarios such as NLP and ASR. Please refer to fig. 6, which is a schematic structural diagram of a Transformer according to an embodiment of the present application. As shown in fig. 6, at least two codec modules are arranged in parallel, and each codec module may include a convolutional layer (CONV), a normally distributed self-attention layer (normal self-attention), a residual layer, and a reduced feedforward layer (reduced FF). In this way, after the data to be input is obtained, it is optionally passed through a target seed function to obtain a first input data set in which the first input data have different resolutions, and each first input data in the first input data set can then be input into one coding and decoding module. The first input data are then processed in parallel by the codec modules, and the finally output data pass through a fully connected layer to obtain the target extraction data. It can be understood that the output format of the dimensions output by the normally distributed self-attention layer in each codec module conforms to a normal distribution, so that the extracted target data can be expressed more fully; and the reduced feedforward layer keeps the dimension of its final output consistent with that of its input by sequentially increasing and then reducing the dimension of each group, while the parameter count is reduced to 1/N of the unreduced amount, where N is the number of groups and N ≥ 2.
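To make the parallel arrangement concrete, here is a minimal PyTorch sketch. It is an illustrative assumption, not the patented implementation: each codec module of fig. 6 is abstracted into a standard encoder block (the CONV, residual, normally distributed self-attention and reduced-FF details are omitted), and all names are hypothetical.

```python
import torch
import torch.nn as nn

class ParallelCodecModel(nn.Module):
    def __init__(self, d_model=256, n_branches=4, n_classes=10):
        super().__init__()
        # One simplified codec module per branch; the real structure also
        # contains CONV, residual, normally distributed self-attention and
        # a reduced feedforward layer.
        self.branches = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_branches)
        )
        self.fc = nn.Linear(d_model * n_branches, n_classes)  # shared fully connected layer

    def forward(self, first_inputs):
        # first_inputs: list of n_branches tensors, each (batch, seq_i, d_model);
        # the sequence lengths may differ because the resolutions differ.
        outs = [branch(x).mean(dim=1)                 # (batch, d_model) per branch
                for branch, x in zip(self.branches, first_inputs)]
        return self.fc(torch.cat(outs, dim=-1))      # target extraction data

model = ParallelCodecModel()
inputs = [torch.randn(2, seq, 256) for seq in (100, 50, 25, 13)]
print(model(inputs).shape)  # torch.Size([2, 10])
```

Note that this Python loop evaluates the branches one after another; actual parallel acceleration would dispatch the independent branches to concurrent streams or devices, which is possible precisely because no branch waits on another's output.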
In order to better understand the method for data processing proposed in the embodiment of the present application, please refer to fig. 7, which is a schematic diagram illustrating an embodiment of a method for data processing provided in the embodiment of the present application. As shown in fig. 7, the method may include:
701. a dataset to be input is acquired.
In an embodiment, the data set to be input includes at least one data to be input, where the data to be input may include, but is not limited to, a text to be input, a voice to be input, and the like, and is not limited in this embodiment of the application.
702. And processing the data set to be input based on a target seed function to obtain a first input data set, wherein the first input data set comprises at least one first input data, and the resolution corresponding to each first input data in the at least one first input data is different.
In an embodiment, the target seed function is a function that can sample the data set to be input to obtain inputs with different resolutions, which ensures that data can be fully extracted in the Transformer structure shown in fig. 6. Processing the data set to be input with the target seed function therefore makes the resolution of each first input data in the obtained first input data set different. Each first input data with a different resolution reflects input data randomly sampled from the data set to be input that can be fed to one coding and decoding module, so the input to each codec module is both random and representative and can reflect the correlations among the input data, which facilitates full extraction.
In particular, the form of the above target seed function can be understood as follows:

$f_{seed} = f_{factorized}(f_{select}(\mathrm{input}))$

where input can be understood as the data set to be input, and $f_{seed}()$ as the target seed function. The selection function $f_{select}()$ mainly processes the input to obtain a second input data set containing at least one second input data with different resolutions; and $f_{factorized}()$ mainly decomposes the second input data set, for example: the second input data set is split into at least one group of second input data, and the groups are then merged, so that the finally decomposed value serves as the first input data set. Compared with inputting the second input data set directly into the codec modules, the first input data set thus compresses the data volume.
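As a rough illustration, the following sketch implements one plausible reading of $f_{select}$ and $f_{factorized}$; the random order-preserving sampling and the group-and-average merging are assumptions for illustration, not the patent's exact definitions.

```python
import torch

def f_select(x, rates=(1.0, 0.5, 0.25)):
    # Produce second input data at different resolutions by sampling the
    # time axis of x (seq_len, d_model) at each rate.
    out = []
    for r in rates:
        n = max(1, int(x.size(0) * r))
        idx = torch.randperm(x.size(0))[:n].sort().values   # random, order-preserving
        out.append(x[idx])
    return out

def f_factorized(second_inputs, n_groups=2):
    # Split each second input into groups and merge (here: average) them,
    # compressing the data volume before it enters the codec modules.
    first_inputs = []
    for s in second_inputs:
        usable = s[: s.size(0) // n_groups * n_groups]
        groups = usable.reshape(n_groups, -1, s.size(-1))
        first_inputs.append(groups.mean(dim=0))
    return first_inputs

x = torch.randn(100, 256)                       # data set to be input
first_input_set = f_factorized(f_select(x))
print([t.shape for t in first_input_set])
```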
Optionally, in some embodiments, processing the data set to be input based on the target seed function to obtain the first input data set may be understood with reference to the following modes:

First mode: sampling the data set to be input based on the preset sampling rate corresponding to the target seed function to obtain the first input data set.
In an embodiment, the preset sampling rate may be determined as needed, and is not limited herein. In addition, after the data set to be input is acquired, sampling can be performed based on the preset sampling rate corresponding to the target seed function, so that the finally sampled first input data set is representative of the whole and the target data can be fully extracted.
Second mode: grouping the data set to be input to obtain M groups of data to be input, and dividing the preset sampling rate corresponding to the target seed function to obtain M sub-sampling rates, where M is an integer and M ≥ 2; for each group of data to be input in the M groups, sampling that group based on the corresponding sub-sampling rate to obtain M groups of first input data; and deriving the first input data set based on the M groups of first input data.
In an embodiment, assuming the preset sampling rate is σ′, it is further subdivided into M sub-sampling rates, such as σ′₁, σ′₂, …, σ′_M; similarly, the acquired data set X to be input is divided into M groups of data to be input, such as X₁, X₂, …, X_M. In this way, the M groups of data to be input are sampled at the M sub-sampling rates respectively, and the first input data obtained from each group are then assembled into the first input data set.

Note that $\sigma' = \sum_{i} X_{i}\sigma'_{i} / X$, where $0 < i \le M$ and $M \ge 2$.
It can be understood that, in this way, the first input data finally sampled from each group improve the representativeness of the whole data set to be input, thereby improving the accuracy of the finally extracted target data. In addition, compared with the first mode, in which sampling is performed directly on the whole data set to be input at the preset sampling rate, the second mode samples by group, so that when the data set to be input exhibits obvious stratification, the first input data sampled from each group are more representative.
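The following sketch illustrates the second mode under assumed numbers (M = 4 groups, sub-rates 0.10 to 0.40) and checks the relation $\sigma' = \sum_{i} X_{i}\sigma'_{i} / X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))            # data set to be input, X = 1000 items
M = 4
groups = np.array_split(X, M)                  # M groups of data to be input
sub_rates = [0.10, 0.20, 0.30, 0.40]           # M sub-sampling rates (assumed)

first_input_groups = [
    g[rng.choice(len(g), size=int(len(g) * r), replace=False)]
    for g, r in zip(groups, sub_rates)
]
first_input_set = np.concatenate(first_input_groups)

overall = sum(len(g) * r for g, r in zip(groups, sub_rates)) / len(X)
print(first_input_set.shape, overall)          # (250, 16) 0.25 -> sigma' = 0.25
```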
703. And inputting each first input data in the first input data set into each coding and decoding module of the at least two coding and decoding modules to obtain at least one output data.
In an embodiment, after the first input data set is obtained, each first input data in the set may be input into one of the parallel codec modules according to the Transformer structure shown in fig. 6. After the first input data is processed by each coding and decoding module, each coding and decoding module can output corresponding output data.
704. Target extraction data is derived based on the at least one output data.
In the embodiment, the output data output by each coding and decoding module serve as the input of the fully connected layer, so that the target extraction data can be obtained after the processing of the fully connected layer.
It will be appreciated that each of the codec modules described above, as well as the fully-connected layer, may be included in the deep convolutional neural network module described above.
Optionally, in other embodiments, in order to make the expression capability of the Transformer structure shown in fig. 6 more sufficient, the data processing method may further include: constraining a first output format to a normal distribution according to a preset distribution parameter, where the first output format is the output format corresponding to the dimensions of the data output by the self-attention mechanism layer in the coding and decoding module.
In an embodiment, since the normal distribution can fully express the distribution state of the data, the first output format of the self-attention mechanism layer may be constrained according to the preset distribution parameter; specifically, the output format corresponding to the dimensions of the data output by the self-attention mechanism layer in the codec module is constrained to a normal distribution. Fig. 8 shows a schematic diagram of a normally distributed self-attention mechanism layer provided in an embodiment of the present application. As seen from fig. 8, assuming that the data output by the self-attention mechanism layer is 512-dimensional and needs to pass through 8 heads of the multi-head attention layer, the first output format of the attention layer can be constrained to a normal distribution of the form {32, 32, 48, 48, 80, 80, 96, 96}.
It is understood that the above-mentioned fig. 8 is only a schematic description, and in practical applications, the first output format of the self-attention mechanism layer may also be constrained in the form of other normal distributions, which is not limited in the embodiment of the present application.
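As an illustration of the constrained output format, the sketch below builds a self-attention layer whose per-head widths follow the {32, 32, 48, 48, 80, 80, 96, 96} split of fig. 8; giving each head its own projection width is an assumed realization, since standard multi-head attention uses equal head sizes.

```python
import torch
import torch.nn as nn

class NormalSplitSelfAttention(nn.Module):
    def __init__(self, d_model=512, head_dims=(32, 32, 48, 48, 80, 80, 96, 96)):
        super().__init__()
        assert sum(head_dims) == d_model       # the split covers all 512 dimensions
        self.head_dims = head_dims
        # One q/k/v projection per head, each with its own width.
        self.qkv = nn.ModuleList(nn.Linear(d_model, 3 * d) for d in head_dims)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        heads = []
        for proj, d in zip(self.qkv, self.head_dims):
            q, k, v = proj(x).split(d, dim=-1)
            att = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
            heads.append(att @ v)               # (batch, seq, d)
        return self.out(torch.cat(heads, dim=-1))

layer = NormalSplitSelfAttention()
print(layer(torch.randn(2, 10, 512)).shape)    # torch.Size([2, 10, 512])
```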
Optionally, in other embodiments, the data processing method may further include: dividing the dimensions of the data input by the feedforward layer in the coding and decoding module to obtain N groups of first dimensions, where N is an integer and N ≥ 2; sequentially performing dimension increasing and dimension reducing on each of the N groups of first dimensions to obtain N groups of second dimensions; and splicing the N groups of second dimensions to obtain a third dimension, where the third dimension is the same as the dimension of the data input by the feedforward layer.
It can be understood that the dimensionality of the data input by the feedforward layer is first divided into N groups of first dimensions; each group of first dimensions is then subjected to dimension increasing followed by dimension reducing, and the resulting second dimensions are finally spliced, so that the third dimension finally obtained is consistent with the dimensionality of the data input by the feedforward layer while the amount of parameters is reduced. For example, assuming that the dimensionality of the data input by the feedforward layer is a and the dimensionality expanded in the middle is b, then after division into N groups the parameter number is N × 2 × (a/N) × (b/N) = 2ab/N, i.e. 1/N of the 2ab parameters required without grouping.
For example, please refer to fig. 9a, which is a schematic diagram of a feedforward layer provided in the prior art. As can be seen from fig. 9a, assuming that the dimensionality of the data input by the feedforward layer is 512, the data is directly raised to 2048 dimensions and then reduced back to 512 dimensions; in the manner of fig. 9a, the parameter number is 2 × 512 × 2048.
Please refer to fig. 9b, which is a schematic diagram of a reduced feedforward layer provided in an embodiment of the present application. As can be seen from fig. 9b, assuming that the dimensionality of the data input by the feedforward layer is 512, the 512 dimensions are divided into two groups of 256 dimensions; each group of 256 dimensions is first raised to 1024 dimensions and then reduced back to 256 dimensions, and the two resulting 256-dimensional groups are spliced into 512 dimensions. In the manner of fig. 9b, the parameter number is 2 × (2 × 256 × 1024) = 2 × 512 × 1024, i.e. half of the 2 × 512 × 2048 required in fig. 9a. It is clear that the reduced feedforward layer of fig. 9b significantly lowers the parameter amount.
It is understood that, in practical applications, the dimensionality of the data input by the feedforward layer may also be divided in other proportions, and the first dimension and the second dimension described above may take other values, which is not specifically limited in this embodiment. A sketch of the reduced feedforward layer is given below.
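The reduced feedforward layer of fig. 9b can be sketched in PyTorch as follows. The class name and the chunk-based grouping are assumptions made for the example; the printed parameter count reproduces the halving worked out above (bias terms are omitted so the count is exact).

```python
import torch
import torch.nn as nn

class ReducedFeedForward(nn.Module):
    """Split the input dimension into N groups, raise and then reduce each
    group independently, and splice the results back to the input dimension."""

    def __init__(self, d_model=512, d_hidden=2048, n_groups=2):
        super().__init__()
        d_in, d_mid = d_model // n_groups, d_hidden // n_groups
        self.groups = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_in, d_mid, bias=False),   # dimension increasing
                nn.ReLU(),
                nn.Linear(d_mid, d_in, bias=False))   # dimension reducing
            for _ in range(n_groups))
        self.n_groups = n_groups

    def forward(self, x):
        chunks = x.chunk(self.n_groups, dim=-1)              # N groups of first dimensions
        outs = [g(c) for g, c in zip(self.groups, chunks)]   # N groups of second dimensions
        return torch.cat(outs, dim=-1)                       # spliced third dimension

ffn = ReducedFeedForward()
y = ffn(torch.randn(2, 10, 512))                             # shape is preserved
print(sum(p.numel() for p in ffn.parameters()))              # 1048576 = 2*512*2048 / 2
```

The trade-off is that the groups no longer mix information with one another inside the feedforward layer, which is why the splicing step is needed to restore the full 512-dimensional representation.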
In the embodiment of the application, by providing a neural network model whose coding and decoding modules are arranged in parallel, calculation can be accelerated in parallel; by introducing the target seed function, the neural network model can extract the target data better; by constraining the output format of the self-attention mechanism layer in each coding and decoding module to a normal distribution, the expressive capability of the neural network model is made richer; and the reduced feedforward layer greatly reduces the parameter amount while retaining the data to the maximum extent.
The scheme provided by the embodiment of the application is mainly introduced from the perspective of the method. It is to be understood that, in order to realize the above functions, the data processing device includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the functions described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
From the perspective of an entity device, the data processing device may be specifically implemented by one entity device, may also be implemented by multiple entity devices together, and may also be a logic function unit in one entity device, which is not specifically limited in this embodiment of the present application.
For example, the data processing device described above may be implemented by the communication device in fig. 10. Fig. 10 is a schematic hardware structure diagram of a communication device according to an embodiment of the present application. The communication device comprises at least one processor 1001, a memory 1002, and a transceiving device 1003.
The processor 1001 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of the present application.
The transceiving device 1003 may be any apparatus of the transceiver type for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN). The transceiving device 1003 may be connected with the processor 1001.
The memory 1002 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1002 may exist separately and be connected to the processor 1001; the memory 1002 may also be integrated with the processor 1001.
The memory 1002 is used for storing computer-executable instructions for executing the solution of the present application, and the execution is controlled by the processor 1001. The processor 1001 is configured to execute the computer-executable instructions stored in the memory 1002, so as to implement the data processing method provided by the above method embodiment of the present application.
In a possible implementation manner, the computer-executable instructions in the embodiment of the present application may also be referred to as application program code, which is not specifically limited in the embodiment of the present application.
In a specific implementation, as an embodiment, the processor 1001 may include one or more CPUs, such as CPU0 and CPU1 in fig. 10.
From the perspective of functional units, the present application may perform functional unit division on the data processing apparatus according to the above method embodiments, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one functional unit. The integrated functional unit can be realized in a form of hardware or a form of software functional unit.
For example, in a case where each functional unit is divided in an integrated manner, fig. 11 shows a schematic structural diagram of a data processing device provided in an embodiment of the present application. As shown in fig. 11, an embodiment of the data processing device 110 of the present application may include:
an acquisition unit 1101 configured to acquire a data set to be input;
the processing unit 1102 is configured to input the to-be-input data set acquired by the acquiring unit 1101 into a neural network model to obtain target extraction data, where the neural network model includes at least two encoding and decoding modules, and each of the at least two encoding and decoding modules is arranged in a parallel manner.
In some embodiments, the processing unit 1102 is further configured to, after the obtaining unit 1101 obtains a data set to be input, process the data set to be input based on a target seed function to obtain a first input data set, where the first input data set includes at least one first input data, and a resolution corresponding to each first input data in the at least one first input data is different; inputting each first input data in the first input data set into each coding and decoding module of the at least two coding and decoding modules to obtain at least one output data; target extraction data is derived based on the at least one output data.
In some embodiments, the processing unit 1102 is configured to sample the data set to be input according to a preset sampling rate corresponding to the target seed function, so as to obtain a first input data set.
In some embodiments, the processing unit 1102 is configured to:
grouping the data set to be input to obtain M groups of data to be input, and grouping the preset sampling rate corresponding to the target seed function to obtain M sub-sampling rates, wherein M is an integer greater than or equal to 2;
for each group of data to be input in the M groups of data to be input, sampling the data to be input based on the M sub-sampling rates to obtain M groups of first input data;
the first input data set is derived based on the M groups of first input data.
In some embodiments, the processing unit 1102 is configured to:
and constraining a first output format to be normal distribution according to a preset distribution parameter, wherein the first output format is an output format corresponding to the dimensionality of data output by a self-attention mechanism layer in the coding and decoding module.
In some embodiments, the processing unit 1102 is configured to:
dividing the dimensionality of the data input by the feedforward layer in the coding and decoding module to obtain N groups of first dimensions, wherein N is an integer greater than or equal to 2;
sequentially performing dimension increasing and dimension reducing on each of the N groups of first dimensions to obtain N groups of second dimensions;
and splicing the N groups of second dimensions to obtain a third dimension, wherein the third dimension is the same as the dimension of the data input by the feedforward layer.
The data processing device 110 provided in the embodiment of the present application is configured to execute the methods in the method embodiments corresponding to fig. 7, fig. 8, and fig. 9b; therefore, this embodiment can be understood with reference to the relevant parts of those method embodiments.
In the embodiment of the present application, the data processing device 110 is presented in a form of dividing each functional unit in an integrated manner. "functional unit" herein may refer to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that may provide the described functionality. In a simple embodiment, those skilled in the art will appreciate that the data processing device 110 may take the form shown in FIG. 10.
For example, the processor 1001 in fig. 10 may cause the data processing device 110 to execute the method performed by the data processing device in the method embodiments corresponding to fig. 7, fig. 8, and fig. 9b by calling the computer-executable instructions stored in the memory 1002.
In particular, the functions/implementation procedure of the processing unit 1102 in fig. 11 may be implemented by the processor 1001 in fig. 10 invoking the computer-executable instructions stored in the memory 1002. The functions/implementation procedure of the acquisition unit 1101 in fig. 11 may be implemented by the transceiving device 1003 in fig. 10.
In the device of fig. 10, the respective components are communicatively connected; that is, the processing unit (or processor), the storage unit (or memory), and the transceiving unit (transceiver) communicate with one another via internal connection paths to transfer control and/or data signals. The above method embodiments of the present application may be applied to a processor, or the steps of the above method embodiments may be implemented by a processor. The processor may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a central processing unit (CPU), a network processor (NP), a combination of a CPU and an NP, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in this application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in this application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware. Although only one processor is shown in the figure, the apparatus may comprise a plurality of processors, or a processor may comprise a plurality of processing units. Specifically, the processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
The memory is used for storing computer instructions executed by the processor. The memory may be a memory circuit or a memory. The memory may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, an electrically erasable programmable read-only memory, or a flash memory. Volatile memory may be random access memory, which acts as external cache memory. The memory may be independent of the processor, or may be a storage unit in the processor, which is not limited herein. Although only one memory is shown in the figure, the apparatus may comprise a plurality of memories or the memory may comprise a plurality of memory units.
The transceiver is used for enabling the processor to interact with other apparatuses or network elements. Specifically, the transceiver may be a communication interface of the apparatus, a transceiving circuit, or a communication unit, and may also be a transceiver; the transceiver may also be a communication interface or transceiving circuit of the processor. Alternatively, the transceiver may be a transceiver chip. The transceiver may also include a transmitting unit and/or a receiving unit. In one possible implementation, the transceiver may include at least one communication interface. In another possible implementation, the transceiver may also be a unit implemented in software. In the embodiments of the application, the processor may interact with other elements or network elements via the transceiver. For example, the processor obtains or receives content from other network elements through the transceiver. If the processor and the transceiver are physically separate components, the processor may also interact with other elements of the apparatus without going through the transceiver.
In one possible implementation, the processor, the memory, and the transceiver may be connected to each other by a bus. The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In the embodiments of the present application, various illustrations are made for the convenience of understanding. However, these examples are merely examples and are not meant to be the best mode of carrying out the present application.
The above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof, and when implemented using software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The technical solutions provided by the present application are introduced in detail, and the present application applies specific examples to explain the principles and embodiments of the present application, and the descriptions of the above examples are only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (14)
1. A method of data processing, comprising:
acquiring a data set to be input;
and inputting the data set to be input into a neural network model to obtain target extraction data, wherein the neural network model comprises at least two coding and decoding modules, and each coding and decoding module of the at least two coding and decoding modules is arranged in a parallel mode.
2. The method of claim 1, wherein after acquiring the data set to be input, the method further comprises:
processing the data set to be input based on a target seed function to obtain a first input data set, wherein the first input data set comprises at least one first input data, and the resolution corresponding to each first input data in the at least one first input data is different;
correspondingly, inputting the data set to be input into the neural network model to obtain target extraction data comprises:
inputting each first input data in the first input data set into each coding and decoding module of the at least two coding and decoding modules to obtain at least one output data;
target extraction data is derived based on the at least one output data.
3. The method of claim 2, wherein processing the data set to be input based on the target seed function to obtain the first input data set comprises:
and sampling the data set to be input based on a preset sampling rate corresponding to the target seed function to obtain a first input data set.
4. The method of claim 2, wherein processing the data set to be input based on the target seed function to obtain the first input data set comprises:
grouping the data set to be input to obtain M groups of data to be input, and grouping a preset sampling rate corresponding to the target seed function to obtain M sub-sampling rates, wherein M is an integer greater than or equal to 2;
for each group of data to be input in the M groups of data to be input, sampling the data to be input based on the M sub-sampling rates to obtain M groups of first input data;
the first input data set is derived based on the M groups of first input data.
5. The method according to any one of claims 1-4, further comprising:
and constraining a first output format to be normal distribution according to a preset distribution parameter, wherein the first output format is an output format corresponding to the dimensionality of data output by a self-attention mechanism layer in the coding and decoding module.
6. The method according to any one of claims 1-5, further comprising:
dividing the dimensionality of the data input by the feedforward layer in the coding and decoding module to obtain N groups of first dimensions, wherein N is an integer greater than or equal to 2;
sequentially performing dimension increasing and dimension reducing on each of the N groups of first dimensions to obtain N groups of second dimensions;
and splicing the N groups of second dimensions to obtain a third dimension, wherein the third dimension is the same as the dimension of the data input by the feedforward layer.
7. A data processing apparatus, characterized by comprising:
an acquisition unit for acquiring a data set to be input;
and the processing unit is used for inputting the data set to be input acquired by the acquisition unit into a neural network model to acquire target extraction data, wherein the neural network model comprises at least two coding and decoding modules, and each coding and decoding module of the at least two coding and decoding modules is arranged in a parallel manner.
8. The data processing device of claim 7,
the processing unit is further configured to, after the obtaining unit obtains a data set to be input, process the data set to be input based on a target seed function to obtain a first input data set, where the first input data set includes at least one first input data, and a resolution corresponding to each first input data in the at least one first input data is different;
the processing unit is configured to:
inputting each first input data in the first input data set into each coding and decoding module of the at least two coding and decoding modules to obtain at least one output data;
target extraction data is derived based on the at least one output data.
9. The data processing device of claim 8,
and the processing unit is used for sampling the data set to be input according to a preset sampling rate corresponding to the target seed function to obtain a first input data set.
10. The data processing device of claim 8, wherein the processing unit is configured to:
grouping the data set to be input to obtain M groups of data to be input, and grouping a preset sampling rate corresponding to the target seed function to obtain M sub-sampling rates, wherein M is an integer greater than or equal to 2;
for each group of data to be input in the M groups of data to be input, sampling the data to be input based on the M sub-sampling rates to obtain M groups of first input data;
the first input data set is derived based on the M groups of first input data.
11. The data processing device of any of claims 7-10, wherein the processing unit is to:
and constraining a first output format to be normal distribution according to a preset distribution parameter, wherein the first output format is an output format corresponding to the dimensionality of data output by a self-attention mechanism layer in the coding and decoding module.
12. The data processing device of any of claims 7-11, wherein the processing unit is to:
dividing the dimensionality of the data input by the feedforward layer in the coding and decoding module to obtain N groups of first dimensions, wherein N is an integer greater than or equal to 2;
sequentially performing dimension increasing and dimension reducing on each of the N groups of first dimensions to obtain N groups of second dimensions;
and splicing the N groups of second dimensions to obtain a third dimension, wherein the third dimension is the same as the dimension of the data input by the feedforward layer.
13. A data processing apparatus, characterized by comprising:
a processor, a memory; the processor and the memory are communicated with each other;
the memory is used for storing a computer program;
the processor is adapted to execute the computer program in the memory, performing the method of any of claims 1-6.
14. A computer-readable storage medium comprising a computer program which, when run on a processor, causes the processor to perform the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011000482.2A CN114298289A (en) | 2020-09-21 | 2020-09-21 | Data processing method, data processing equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114298289A true CN114298289A (en) | 2022-04-08 |
Family
ID=80964235
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011000482.2A Pending CN114298289A (en) | 2020-09-21 | 2020-09-21 | Data processing method, data processing equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114298289A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115033400A (en) * | 2022-06-15 | 2022-09-09 | 北京智源人工智能研究院 | Intermediate data transmission method, dendritic module, neural network model and related method |
CN117472591A (en) * | 2023-12-27 | 2024-01-30 | 北京壁仞科技开发有限公司 | Method for data calculation, electronic device and storage medium |
CN117472591B (en) * | 2023-12-27 | 2024-03-22 | 北京壁仞科技开发有限公司 | Method for data calculation, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||