CN117216542A - Model training method and related device - Google Patents
- Publication number
- CN117216542A (CN202310572666.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- features
- dimension
- feature
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a model training method and a related device. For a target object whose actual category is known, a training sample is determined according to the table class content of the target object, the training sample comprising N dimension features. An attention sub-model of an initial classification model generates sample wave features comprising N wavelet features. The relevance and importance of the dimension features are measured from the perspective of wave similarity to determine the attention weight of each wavelet feature in the sample wave features. After wave features to be predicted are obtained based on the attention weights, the corresponding prediction category is determined through the initial classification model, and the attention weights are adjusted according to the difference between the prediction category and the actual category of the training sample, yielding a classification model that can perform category recognition on table class content. Because each dimension feature is represented in the form of a wave, the initial classification model does not need complex model parameters for determining the attention weights, which improves model training efficiency.
Description
Technical Field
The application relates to the field of data processing, in particular to a model training method and a related device.
Background
With the development of Internet information technology, tables are widely used in many industries, such as security, as carriers capable of recording a large amount of content. When tables are used to record content, the table class content can be analyzed; in particular, it can be classified, that is, a classification model can identify the category of the object identified by the table class content.
In the related art, the classification model for tables is mainly a Transformer model, which relies on a self-attention mechanism to recognize table class content. The self-attention mechanism lets the contents of the multiple dimensions involved in the table class content interact, so that the classification model can aggregate the contents of different dimensions and obtain a recognition result for the global content; that is, the classification model obtains an accurate recognition result by attending to the global content.
However, in the actual training of the classification model, the table class content usually involves many dimensions. In that case, the self-attention mechanism requires the classification model to configure a massive number of model parameters so that the contents of the many dimensions can interact successfully. The excessive parameter count multiplies the complexity of the classification model and seriously affects its training efficiency and training effect.
Disclosure of Invention
To solve the above technical problem, the application provides a model training method and a related device. For a training sample determined from table class content, each dimension feature can be represented in the form of a wave. No other model parameters need to be introduced: the relevance and importance of the dimension features are determined from the perspective of wave similarity, directly through the wavelet feature corresponding to each dimension feature, and the attention weight of each dimension feature is determined accordingly. In other words, no complex and redundant model parameters need to be set for determining the attention weights during model training, which effectively reduces the model size and training time and improves model training efficiency.
The embodiments of the application disclose the following technical solutions:
In one aspect, an embodiment of the application provides a model training method, comprising:
determining a training sample according to table class content identifying a target object, wherein a sample label corresponding to the training sample identifies the actual category of the target object, and the N dimension features included in the training sample are determined according to the dimension content of the N data dimensions involved in the table class content, where N > 1;
taking the training sample as input data of an initial classification model, and generating sample wave features of the training sample through an attention sub-model of the initial classification model, wherein in the sample wave features the wavelet feature corresponding to each of the N dimension features is represented in the form of a wave;
determining the attention weight of each wavelet feature in the sample wave features according to the wave similarity among the wavelet features;
determining, through a predictor sub-model of the initial classification model, a prediction category corresponding to the training sample according to wave features to be predicted that are generated from the sample wave features and the attention weights;
performing model training on the initial classification model based on the difference between the prediction category and the actual category, wherein the attention weights are adjusted through the model training to obtain a classification model, and the classification model is used for performing category recognition on the table class content of an object to be recognized.
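The last two steps of the method can be sketched as follows. This is only an illustrative sketch under stated assumptions: the text above does not specify the structure of the predictor sub-model or the form of the "difference", so mean pooling, a linear classifier with softmax, and a cross-entropy loss are assumed here, and the names `predict_category_probs`, `cross_entropy`, and `class_weights` are hypothetical.

```python
import math

def predict_category_probs(wave_features, class_weights):
    """Assumed predictor sub-model: mean-pool the wave features to be
    predicted, then apply a linear classifier followed by softmax."""
    width = len(wave_features[0])
    pooled = [sum(row[c] for row in wave_features) / len(wave_features)
              for c in range(width)]
    logits = [sum(w * v for w, v in zip(weights, pooled))
              for weights in class_weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, actual_category):
    """Assumed measure of the difference between the prediction and the
    actual category given by the sample label."""
    return -math.log(probs[actual_category])

# Hypothetical wave features to be predicted: N = 2 features, 2 channels each.
wave_features = [[1.0, 0.0], [0.8, 0.2]]
class_weights = [[2.0, 0.0], [0.0, 2.0]]  # one weight row per category
probs = predict_category_probs(wave_features, class_weights)
loss = cross_entropy(probs, actual_category=0)
```

In a real training loop this loss would be minimized with a gradient-based optimizer, which is what adjusts the quantities the method describes as being adjusted through model training.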
In another aspect, an embodiment of the application provides a model training device, comprising a first determining unit, a generating unit, a second determining unit, a third determining unit, and a training unit:
the first determining unit is configured to determine a training sample according to table class content identifying a target object, wherein a sample label corresponding to the training sample identifies the actual category of the target object, and the N dimension features included in the training sample are determined according to the dimension content of the N data dimensions involved in the table class content, where N > 1;
the generating unit is configured to take the training sample as input data of an initial classification model and generate sample wave features of the training sample through an attention sub-model of the initial classification model, wherein in the sample wave features the wavelet feature corresponding to each of the N dimension features is represented in the form of a wave;
the second determining unit is configured to determine the attention weight of each wavelet feature in the sample wave features according to the wave similarity among the wavelet features;
the third determining unit is configured to determine, through a predictor sub-model of the initial classification model, a prediction category corresponding to the training sample according to wave features to be predicted that are generated from the sample wave features and the attention weights;
the training unit is configured to perform model training on the initial classification model based on the difference between the prediction category and the actual category, wherein the attention weights are adjusted through the model training to obtain a classification model, and the classification model is used for performing category recognition on the table class content of an object to be recognized.
Optionally, the generating unit is configured to:
for the i-th dimension feature among the N dimension features, determine the amplitude feature and the phase feature corresponding to the i-th dimension feature through the amplitude model parameters and the phase model parameters of the attention sub-model of the initial classification model;
and generate the wavelet feature corresponding to the i-th dimension feature based on the amplitude feature and the phase feature corresponding to the i-th dimension feature.
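A minimal sketch of this optional step follows. The text above does not give the formula that combines amplitude and phase, so the sketch assumes an Euler-style expansion into a real part (amplitude times cosine of phase) and an imaginary part (amplitude times sine of phase), and assumes the amplitude and phase model parameters are plain linear maps. The names `linear`, `wavelet_feature`, `amplitude_params`, and `phase_params` are hypothetical.

```python
import math

def linear(x, weights, bias):
    """Apply a linear map (weights, bias) to a feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def wavelet_feature(dim_feature, amplitude_params, phase_params):
    """Represent one dimension feature as a wave (assumed Euler form)."""
    amplitude = linear(dim_feature, *amplitude_params)  # amplitude feature
    phase = linear(dim_feature, *phase_params)          # phase feature
    real = [a * math.cos(p) for a, p in zip(amplitude, phase)]
    imag = [a * math.sin(p) for a, p in zip(amplitude, phase)]
    return real + imag  # real and imaginary parts, concatenated

# Hypothetical 2-channel example: identity amplitude map, zero phase map.
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
zero_phase = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
wave = wavelet_feature([0.5, -2.0], identity, zero_phase)
```

With a zero phase the wave reduces to its amplitude, so the real part equals the input and the imaginary part vanishes; a learned phase map would instead shift each channel according to the content of the feature.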
Optionally, the training unit is configured to:
perform model training on the initial classification model based on the difference between the prediction category and the actual category, wherein the attention weights, the amplitude model parameters, and the phase model parameters are adjusted through the model training to obtain the classification model.
Optionally, the phase feature corresponding to the i-th dimension feature is determined based on semantic information of the i-th dimension feature.
Optionally, the second determining unit is configured to:
for the i-th dimension feature among the N dimension features, determine the sub-attention weights of the wavelet feature of the i-th dimension feature with respect to the wavelet features of the other N-1 dimension features, according to the wave similarity between the wavelet feature of the i-th dimension feature and each of the wavelet features of the other N-1 dimension features;
the generating unit is further configured to:
for the i-th dimension feature, generate the wavelet feature to be predicted corresponding to the i-th dimension feature in the wave features to be predicted, according to the sub-attention weights and the wavelet features of the N-1 dimension features respectively associated with the sub-attention weights.
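The two optional steps above can be sketched as follows. The text does not define how wave similarity is computed or how the sub-attention weights are normalized, so cosine similarity and a softmax over the other N-1 wavelet features are assumed here; the function names are hypothetical. Note that the sketch introduces no learned parameters for the weights themselves, which matches the stated goal of the method.

```python
import math

def cosine_similarity(u, v):
    """Assumed wave-similarity measure between two wavelet features."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sub_attention_weights(wavelets, i):
    """Softmax over the similarity of wavelet i to the other N-1 wavelets."""
    others = [j for j in range(len(wavelets)) if j != i]
    sims = [cosine_similarity(wavelets[i], wavelets[j]) for j in others]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return {j: e / total for j, e in zip(others, exps)}

def wavelet_to_predict(wavelets, i):
    """Weighted sum of the other N-1 wavelet features for dimension i."""
    weights = sub_attention_weights(wavelets, i)
    width = len(wavelets[i])
    return [sum(weights[j] * wavelets[j][c] for j in weights)
            for c in range(width)]

# Hypothetical wavelet features for N = 3 dimensions.
wavelets = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
weights_0 = sub_attention_weights(wavelets, 0)
fused_0 = wavelet_to_predict(wavelets, 0)
```

In this example the second wavelet is nearly in phase with the first and the third is opposite to it, so the second receives the larger sub-attention weight.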
Optionally, the third determining unit is configured to:
determine the prediction category corresponding to the training sample through the predictor sub-model of the initial classification model according to the wave features to be predicted and the training sample.
Optionally, there are M attention sub-models, M > 1, wherein the output data of the j-th attention sub-model is the input data of the (j+1)-th attention sub-model, the output data of the M-th attention sub-model is the wave features to be predicted, and j ≥ 1.
Optionally, the initial classification model further comprises a downsampling sub-model arranged between the k-th attention sub-model and the (k+1)-th attention sub-model of the M attention sub-models.
The output data of the k-th attention sub-model is the input data of the downsampling sub-model, and the output data of the downsampling sub-model is the input data of the (k+1)-th attention sub-model. The downsampling sub-model downsamples its input data, so that the channel number of its output data is smaller than the channel number of its input data.
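The chaining of attention sub-models and the placement of the downsampling sub-model can be sketched as below. The downsampling scheme is not specified in the text; averaging adjacent channel pairs is an assumption chosen only because it reduces the channel number. The attention sub-model is a placeholder, and all names are hypothetical.

```python
def downsample(features):
    """Halve the channel number by averaging adjacent channel pairs
    (assumed scheme; the text only requires fewer output channels)."""
    return [
        [(row[c] + row[c + 1]) / 2.0 for c in range(0, len(row) - 1, 2)]
        for row in features
    ]

def run_stack(features, attention_sub_models, k):
    """Chain M attention sub-models, downsampling after the k-th (1-based)."""
    for index, sub_model in enumerate(attention_sub_models, start=1):
        features = sub_model(features)
        if index == k and index < len(attention_sub_models):
            features = downsample(features)
    return features

def placeholder_attention(features):
    """Stand-in for a wave-based attention sub-model; returns a copy."""
    return [list(row) for row in features]

# N = 2 features with 4 channels each, M = 3 sub-models, downsample after k = 1.
sample = [[1.0, 3.0, 5.0, 7.0], [2.0, 4.0, 6.0, 8.0]]
output = run_stack(sample, [placeholder_attention] * 3, k=1)
```

The output of the last sub-model in the chain plays the role of the wave features to be predicted.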
Optionally, the first determining unit is configured to:
when the dimension content of the N data dimensions involved in the table class content comprises Q continuous data contents and P discrete data contents, encode the P discrete data contents according to the number of values involved in the P discrete data contents to obtain P coding features;
and generate the training sample based on the P coding features and the Q data features respectively corresponding to the Q continuous data contents.
Optionally, the first determining unit is configured to:
stack the P coding features and the Q data features, each group taken as a whole, as two adjacent layers to obtain the training sample.
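As an illustration of how discrete and continuous contents might be combined, the sketch below assumes one-hot coding, where the length of each coding feature equals the number of values the discrete content can take; the text only says the coding depends on the number of values, so this choice is an assumption. The column names, `vocabularies`, and `build_training_sample` are hypothetical.

```python
def one_hot(value, vocabulary):
    """Encode a discrete value as a vector with one slot per possible value."""
    return [1.0 if value == candidate else 0.0 for candidate in vocabulary]

def build_training_sample(discrete, continuous, vocabularies):
    """Return the P coding features followed by the Q data features,
    keeping each group together as one adjacent block."""
    coding_features = [one_hot(discrete[name], vocabularies[name])
                       for name in sorted(discrete)]
    data_features = [[float(continuous[name])] for name in sorted(continuous)]
    return coding_features + data_features

# Hypothetical row with P = 2 discrete and Q = 2 continuous contents.
vocabularies = {"region": ["north", "south", "east"], "status": ["open", "closed"]}
row_discrete = {"region": "south", "status": "open"}
row_continuous = {"amount": 12.5, "duration": 3}
sample = build_training_sample(row_discrete, row_continuous, vocabularies)
```

Keeping the coding features and the data features in two contiguous groups mirrors the stacking described above, where the two groups form adjacent layers of the training sample.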
In yet another aspect, an embodiment of the present application provides a computer device including a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the method of the above aspect according to the computer program.
In yet another aspect, embodiments of the present application provide a computer readable storage medium storing a computer program for executing the method described in the above aspect.
In yet another aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of the above aspect.
According to the above technical solutions, for a target object whose actual category is known, a training sample is determined according to the table class content of the target object. Because the table class content involves the dimension content of N data dimensions, the determined training sample comprises N dimension features in one-to-one correspondence with those data dimensions. To reduce the adverse effect of excessive feature dimensions on model training, sample wave features corresponding to the training sample can be generated through the attention sub-model of the initial classification model, in which the wavelet feature corresponding to each of the N dimension features is represented in the form of a wave. Therefore, when the initial classification model learns the attention weight of each dimension from the training sample, the wavelet features share a unified wave representation, and no other model parameters need to be introduced to analyze and compare the dimension features. The relevance among the dimension features can be measured directly from the perspective of wave similarity through the wavelet features, and the importance of each dimension feature can be measured in turn, so that the attention weight of each wavelet feature in the sample wave features can be determined quickly and accurately. When many of the N wavelet features have high wave similarity, that is, when many of the N dimension features are highly correlated, those dimension features identify the target object more consistently and better reflect information related to the target object and its category; their importance in the training sample is higher than that of the other dimension features. The attention weights determined in this way can therefore effectively guide the model to learn knowledge relevant to classification.
After the wave features to be predicted are obtained by fusing the global information of the training sample based on the attention weights, the corresponding prediction category is determined through the initial classification model, and the attention weights are adjusted according to the difference between the prediction category and the actual category of the training sample, yielding a classification model that can perform category recognition on table class content. Because each dimension feature is represented in the form of a wave, a feature representation that makes the relevance and importance of the dimension features easier to determine, the initial classification model does not need complex and redundant model parameters for determining the attention weights. This effectively reduces the model size and training time and improves model training efficiency.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a model training scenario provided in an embodiment of the present application;
FIG. 2 is a flow chart of a model training method according to an embodiment of the present application;
FIG. 3 is a model structure diagram of an initial classification model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model training device according to an embodiment of the present application;
FIG. 5 is a block diagram of a terminal device according to an embodiment of the present application;
FIG. 6 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
As carriers capable of recording a large amount of content, tables are widely used in many industries. For example, in the security industry, content used for security detection can be recorded in tables. When a table is used to record content, the table class content can be analyzed; in particular, it can be classified, that is, a classification model identifies the category of the object identified by the table class content. For example, in the security industry, a classification model can identify whether the object identified by the table class content belongs to a dangerous category.
In the related art, the classification model for tables is mainly a Transformer model, which relies on a self-attention mechanism to recognize table class content. Specifically, a feature corresponding to each dimension can be determined according to the content of each dimension involved in the table class content, and the features of the different dimensions are input into the classification model. The classification model may first extract the independent information of each feature, for example through a multilayer perceptron (MLP), and then perform information interaction among the independent information of the multiple features, so that the classification model can aggregate the information of the features of different dimensions and obtain a recognition result for the global information.
However, the number of model parameters that must be configured in the computation of the self-attention mechanism is highly correlated with the number of dimensions involved in the table class content. For example, when the self-attention mechanism computes with an attention matrix, the more dimensions the table class content involves, the more model parameters the attention matrix requires. In the actual training of a classification model, the table class content usually involves many dimensions, and the excessive parameter count multiplies the complexity of the classification model, seriously affecting its training efficiency and training effect.
Therefore, the embodiments of the application provide a model training method and a related device. For a training sample determined from table class content, each dimension feature can be represented in the form of a wave. No other model parameters need to be introduced: the relevance and importance of the dimension features are determined from the perspective of wave similarity, directly through the wavelet feature corresponding to each dimension feature, and the attention weight of each dimension feature is determined accordingly. In other words, no complex and redundant model parameters need to be set for determining the attention weights during model training, which effectively reduces the model size and training time and improves model training efficiency.
The model training method provided by the embodiments of the application can be implemented by a computer device, which can be a terminal device or a server. The server can be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Terminal devices include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, aircraft, and the like. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
The model training method provided by the embodiment of the application relates to an artificial intelligence technology, wherein artificial intelligence (Artificial Intelligence, AI) is a theory, a method, a technology and an application system which simulate, extend and expand human intelligence by using a digital computer or a machine controlled by the digital computer, sense environment, acquire knowledge and acquire an optimal result by using the knowledge. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
The model training method provided by the embodiments of the application mainly relates to machine learning in artificial intelligence. Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The model training method provided by the embodiments of the application can be used to train a classification model in machine learning that is mainly used for table class content. For table class content, a self-attention mechanism in machine learning and deep learning can attend to the global content. Because the table class content involves many dimensions, the classification model represents the features in the form of waves during training, so that no complex and redundant model parameters need to be set for determining the attention weights.
Fig. 1 is a schematic diagram of a model training scenario provided in an embodiment of the present application, where the foregoing computer device is a server.
The classification model is a model capable of classifying table class content and can be obtained by training and optimizing an initial classification model. To ensure the reliability of its recognition results, the classification model needs to produce its output based on the global information of the table class content. The related art mainly relies on a self-attention mechanism to aggregate the contents of the different dimensions involved in the table class content and obtain a recognition result for the global content. However, the self-attention mechanism requires the classification model to configure a massive number of model parameters so that the contents of the many dimensions can interact successfully, and the excessive parameter count multiplies the complexity of the classification model, seriously affecting its training efficiency and training effect.
In this regard, in the application scenario of this embodiment, the initial classification model to be trained may include an attention sub-model and a predictor sub-model. The attention sub-model can generate sample wave features corresponding to the training sample and determine the attention weight of each wavelet feature in the sample wave features from the perspective of wave similarity. The predictor sub-model can perform category recognition on the target object identified by the training sample according to the sample wave features and the attention weights.
Specifically, for a target object whose actual category is known, a training sample may be determined according to the table class content of the target object. Because the table class content involves the dimension content of N data dimensions, as shown in FIG. 1, the determined training sample may include N dimension features in one-to-one correspondence with the dimension content of the N data dimensions.
To reduce the adverse effect of excessive feature dimensions on model training, sample wave features corresponding to the training sample can be generated through the attention sub-model of the initial classification model. The sample wave features are the training sample expressed in the form of waves; because the training sample comprises N dimension features, as shown in FIG. 1, the sample wave features comprise a wavelet feature corresponding to each of the N dimension features.
By representing the N dimension features of the training sample in the form of waves, the initial classification model, when learning the attention weight of each dimension from the training sample, can measure the relevance among the dimension features directly from the perspective of wave similarity through the wavelet features, and can in turn measure the importance of each dimension feature, so that the attention weight of each wavelet feature in the sample wave features can be determined quickly and accurately.
After the wave features to be predicted are obtained by fusing the global information of the training sample based on the attention weights, the corresponding prediction category is determined through the initial classification model, and the initial classification model is trained according to the difference between the prediction category and the actual category of the training sample. The attention weights can be adjusted during model training, yielding a classification model that can perform category recognition on table class content.
Because each dimension feature is represented in the form of a wave, the initial classification model can measure the relevance and importance of the dimension features from the perspective of wave similarity; that is, it does not need complex and redundant model parameters for determining the attention weights, which effectively reduces the model size and training time and improves model training efficiency.
Fig. 2 is a flowchart of a method for model training according to an embodiment of the present application, where the method may be performed by a computer device, and in this embodiment, the computer device is taken as a server for illustration, and the method includes:
S201, determining a training sample according to table class content identifying a target object.
A table is a carrier capable of recording a large amount of content; to record more content, a table generally includes a plurality of rows and columns. The table class content is content recorded in table form for the target object, and the target object is the object identified by the table class content. That is, among the large amount of content recorded in a table, the content that identifies the same object can be used as table class content, and the recorded content may be related or independent. In one table, the content recorded in each row usually identifies the same object, so the content recorded in the same row can be used as table class content. It should be noted that if content recorded in different tables identifies the same object, the content recorded in those tables can also be used as table class content for that object.
In practical applications, a table describes one object in a plurality of dimensions. That is, the table class content identifying the target object involves dimension content of N data dimensions, where N is greater than 1. A data dimension is a dimension involved in the table class content, and the dimension content is the content of the table class content in the corresponding data dimension; the dimension content may take various forms, such as text or numbers. When the content recorded in the same row of a table is used as the table class content, the data dimensions involved in the table class content may be the columns covered by that row, and the dimension content corresponding to each data dimension may be the content recorded in each column.
To train the model in the subsequent steps, as shown in fig. 3, the server may determine a training sample according to the table class content. In this embodiment, since the table class content includes dimension content of N data dimensions, the training sample corresponding to the table class content may include N-dimensional features corresponding to the N data dimensions, where a dimension feature refers to quantifiable information reflected in the dimension content and may be used to characterize the corresponding dimension content. Specifically, the server may determine the N-dimensional features respectively corresponding to the N data dimensions according to the dimension content of the N data dimensions involved in the table class content, and then form the training sample from the N-dimensional features.
To realize supervised training of the model, the training sample is configured with a corresponding sample label, which identifies the actual category of the object corresponding to the training sample. Since the table class content corresponds to the target object, the object corresponding to the training sample is also the target object; that is, the sample label identifies the actual category of the target object, where the actual category is the true value of the category of the object. For example, when a classification model is used to identify whether the object identified by the table class content belongs to a dangerous category, the sample label is the dangerous category if the target object corresponding to the training sample belongs to the dangerous category, and the safe category otherwise.
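As a minimal sketch of S201, the following Python shows how one table row might be turned into a training sample of N dimension features together with a sample label. The encoder `encode_cell`, the column names, the feature length of 4 and the label convention (1 for the dangerous category, 0 for the safe category) are illustrative assumptions and are not specified by the method described here.

```python
from typing import Dict, List, Union

Cell = Union[str, int, float]

def encode_cell(value: Cell, dim: int = 4) -> List[float]:
    """Hypothetical encoder: turn one cell into a length-`dim` dimension feature."""
    if isinstance(value, (int, float)):
        # Numeric content: keep the value in the first channel.
        return [float(value)] + [0.0] * (dim - 1)
    # Text content: count characters per bucket (illustrative only).
    feature = [0.0] * dim
    for ch in value:
        feature[ord(ch) % dim] += 1.0
    return feature

def row_to_sample(row: Dict[str, Cell], columns: List[str], dim: int = 4) -> List[List[float]]:
    """One dimension feature per data dimension (column) of the row."""
    return [encode_cell(row[c], dim) for c in columns]

columns = ["age", "city", "balance"]           # N = 3 data dimensions (assumed names)
row = {"age": 42, "city": "Oslo", "balance": 1250.5}
sample = row_to_sample(row, columns)           # training sample: N dimension features
label = 1                                      # sample label (assumed: 1 = dangerous, 0 = safe)
```

Each column of the row yields exactly one dimension feature, so the length of `sample` equals N regardless of whether a cell holds text or a number.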
S202, taking the training sample as input data of an initial classification model, and generating sample wave characteristics of the training sample through an attention sub-model of the initial classification model.
The classification model refers to a model capable of classifying table class contents, and classifying the table class contents refers to performing category recognition on objects identified by the table class contents, namely, the classification model refers to a model capable of recognizing categories of the objects identified by the table class contents.
The initial classification model is an initialized model that can be trained with training samples to obtain the classification model, that is, a classification model that has not yet undergone model training. The initial classification model to be trained comprises an attention sub-model and a predictor model. The attention sub-model can generate the sample wave features corresponding to the training sample and determine the attention weight of each wavelet feature in the sample wave features, and the predictor model can perform category identification on the target object identified by the training sample according to the sample wave features and the attention weights.
The server may use the training sample as input data of the initial classification model and generate the sample wave features corresponding to the training sample through the attention sub-model of the initial classification model, where the sample wave features are the training sample represented in wave form. For example, the attention sub-model may include a phase-aware token mixing (PATM) module, and the PATM module may determine the corresponding sample wave features according to the training sample.
In this embodiment, representing the training sample in wave form is inspired by quantum mechanics. Specifically, the training sample includes N-dimensional features. Because the N-dimensional features are derived from different data dimensions, they are sparse, that is, the correlation among the N-dimensional features is not high. When the category of the target object of the training sample is actually identified, to ensure the accuracy of the identification result, the model cannot rely only on the independent information reflected by part of the features in the training sample; it needs the global information of the N-dimensional features in the training sample. The sparsity of the N-dimensional features, however, makes it difficult for the model to obtain global information from them. In this case, the wavelet feature corresponding to each dimension feature included in the training sample can be determined, where a wavelet feature is a dimension feature represented in wave form, and the N wavelet features corresponding to the N-dimensional features form the sample wave features corresponding to the training sample. Representing the N-dimensional features from different data dimensions in this unified form makes it easier for the model to measure the correlation among the N-dimensional features and thereby obtain the global information.
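The following is a minimal sketch of how a dimension feature could be expressed as a wavelet feature, assuming the amplitude-and-phase formulation commonly associated with phase-aware token mixing: the amplitude of each channel is the feature value, the phase comes from a phase estimator, and Euler's formula unfolds the wave into a real part and an imaginary part. The linear phase estimator and the `phase_weights` values are hypothetical; the text does not specify how the PATM module computes phases.

```python
import math
from typing import List, Tuple

Wave = Tuple[List[float], List[float]]  # (real parts, imaginary parts)

def to_wavelet(feature: List[float], phase_weights: List[float]) -> Wave:
    """Express one dimension feature as a wave: amplitude * e^{i * phase}."""
    # Hypothetical phase estimator: one learnable weight per channel.
    phases = [w * x for w, x in zip(phase_weights, feature)]
    real = [x * math.cos(t) for x, t in zip(feature, phases)]
    imag = [x * math.sin(t) for x, t in zip(feature, phases)]
    return real, imag

def to_sample_wave(sample: List[List[float]], phase_weights: List[float]) -> List[Wave]:
    """Sample wave features: one wavelet feature per dimension feature."""
    return [to_wavelet(f, phase_weights) for f in sample]

sample = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # N = 3 dimension features
sample_wave = to_sample_wave(sample, phase_weights=[0.3, 0.7])
```

Under this assumption the modulus of every channel equals the absolute value of the original feature value, so the wave form changes only the representation and not the information carried by the training sample.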
It should be noted that, since the sample wave feature is a training sample expressed in the form of a wave, the sample wave feature is only different from the training sample in the form of a representation, and the identified object should still be the same, i.e. the object identified by the sample wave feature should still be the target object.
In addition, to determine the corresponding sample wave features according to the training sample, a normalization operation may be performed on the training sample first. For example, a layer normalization (Layer Norm, LN) layer may be added to the attention sub-model, so that the training sample is normalized first and the corresponding sample wave features are then determined according to the normalized training sample.
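The normalization step can be sketched as follows. This is a plain layer normalization over the channels of each dimension feature; the learnable scale and shift that an LN layer usually carries are omitted for brevity, and the example values are made up.

```python
import math
from typing import List

def layer_norm(feature: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one dimension feature to zero mean and (about) unit variance."""
    mean = sum(feature) / len(feature)
    var = sum((x - mean) ** 2 for x in feature) / len(feature)
    return [(x - mean) / math.sqrt(var + eps) for x in feature]

sample = [[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]]
normalized = [layer_norm(f) for f in sample]
```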
S203, determining the attention weight of each wavelet feature in the sample wave features according to the wave similarity among the wavelet features.
To ensure the reliability of the identification result of the initial classification model, the initial classification model needs to obtain its output based on the global information of the N-dimensional features in the training sample. In the related art, the independent information of each dimension feature is mainly aggregated into global information by relying on a self-attention mechanism, which requires additional model parameters to be set and learned for determining the attention weights.
In this regard, since the sample wave features corresponding to the training sample are generated in S202 and include N wavelet features uniformly represented in wave form, the correlation among the N-dimensional features can be measured directly through the N wavelet features from the perspective of wave similarity. Wave similarity refers to the similarity of different wavelet features in waveform, and the wave similarity between wavelet features is positively correlated with the correlation between the dimension features corresponding to those wavelet features. Specifically, the higher the wave similarity between different wavelet features, the more similar the information that the corresponding dimension features reflect about the target object, that is, the higher the correlation between those dimension features; the lower the wave similarity, the lower the correlation between the corresponding dimension features. That is, in this embodiment, the correlation among the dimension features can be obtained by directly analyzing and comparing the dimension features from the perspective of wave similarity, without introducing other model parameters.
Based on the relevance of the dimension features, the importance degree of each dimension feature can be further measured. Specifically, when many of the N-dimensional features are highly relevant to one another, the information these dimension features reflect about the target object is similar, that is, these dimension features identify the target object consistently. In other words, among the sparse N-dimensional features in the training sample, the dimension features that are highly relevant to many others identify the target object consistently as a whole; when category identification is performed on the object of the training sample, these dimension features reflect information related to the category of the target object, so their importance degree is higher than that of the other dimension features in the training sample.
That is, the importance of each dimension feature is positively correlated with the relevance of the dimension feature to other dimension features in the N-dimensional feature, the more dimension features in the N-dimensional feature that are more relevant to the dimension feature, the higher the importance of the dimension feature, and correspondingly, the fewer dimension features in the N-dimensional feature that are more relevant to the dimension feature, the lower the importance of the dimension feature.
According to the importance degree of each dimension feature, the attention weight of the wavelet feature corresponding to each dimension feature can be determined. The attention weight of a wavelet feature is the weight that the independent information of the corresponding dimension feature occupies in the global information obtained by fusion, and the attention weights guide what the model learns from each dimension feature: the higher the attention weight of a wavelet feature, the greater the weight of the independent information of the corresponding dimension feature in the fused global information, and the more knowledge the model learns from that dimension feature.
The importance degree of each dimension feature and the attention weight of the wavelet feature corresponding to each dimension feature are positively correlated, specifically, the higher the importance degree of one dimension feature is, the higher the attention weight of the wavelet feature corresponding to the dimension feature is, or the lower the importance degree of one dimension feature is, the lower the attention weight of the wavelet feature corresponding to the dimension feature is.
Positively correlating the importance degree of each dimension feature with the attention weight of its wavelet feature ensures that the determined attention weights guide the independent information more relevant to classification to be fused into the global information. A dimension feature with a higher importance degree reflects information related to the category of the target object; giving its wavelet feature a higher attention weight allows the category-related independent information in that dimension feature to be fused into the global information, which effectively guides the model to learn knowledge related to the category.
It should be noted that the attention weight of the wavelet feature corresponding to each dimension feature is closely related to the relevance and importance degree of that dimension feature, and each dimension feature of a training sample is a specific instance of the corresponding data dimension in that training sample. The attention weights therefore mainly represent the relevance and importance degree of the data dimensions.
In short, when each dimension feature is represented in wave form, the dimension features can be analyzed and compared directly from the perspective of wave similarity to obtain the relevance and importance degree of the dimension features, so that the attention weight corresponding to each wavelet feature is determined directly without introducing other model parameters.
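The chain from wave similarity to relevance, importance degree and attention weight can be sketched as below. The concrete choices are assumptions made for illustration: wave similarity is taken as the cosine similarity of the complex-valued wavelets, the importance degree of a dimension feature as its mean similarity to the other N-1 wavelets, and the attention weights as a softmax over the importance degrees. The text only requires that these quantities be positively correlated and does not fix the functions.

```python
import math
from typing import List, Tuple

Wave = Tuple[List[float], List[float]]  # (real parts, imaginary parts)

def wave_similarity(a: Wave, b: Wave) -> float:
    """Assumed similarity: real part of <a, b> divided by |a| * |b|."""
    dot = sum(ar * br + ai * bi for ar, ai, br, bi in zip(a[0], a[1], b[0], b[1]))
    norm_a = math.sqrt(sum(r * r + i * i for r, i in zip(a[0], a[1])))
    norm_b = math.sqrt(sum(r * r + i * i for r, i in zip(b[0], b[1])))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def attention_weights(waves: List[Wave]) -> List[float]:
    """Parameter-free attention weights, one per wavelet feature (requires N > 1)."""
    n = len(waves)
    # Importance degree: mean relevance to the other N-1 dimension features.
    importance = [
        sum(wave_similarity(waves[i], waves[k]) for k in range(n) if k != i) / (n - 1)
        for i in range(n)
    ]
    peak = max(importance)
    exps = [math.exp(v - peak) for v in importance]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

# Two identical wavelets and one orthogonal wavelet.
waves = [([1.0, 0.0], [0.0, 0.0]), ([1.0, 0.0], [0.0, 0.0]), ([0.0, 1.0], [0.0, 0.0])]
weights = attention_weights(waves)
```

In this example the two mutually similar wavelets receive equal weights that exceed the weight of the dissimilar one, and no trainable parameter is involved in computing them.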
In one possible implementation, for an ith one of the N-dimensional features, determining the attention weight of each of the wavelet features in the sample wave features according to the wave similarity between the wavelet features in S203 includes:
determining sub-attention weights of the wavelet features of the ith dimension feature relative to the wavelet features of the N-1 dimension feature according to the wave similarity between the wavelet features of the ith dimension feature and the wavelet features of other N-1 dimension features respectively;
For the ith dimension feature, the model training method further comprises:
and generating the wavelet characteristics to be predicted corresponding to the ith dimension characteristic in the wave characteristics to be predicted according to the sub-attention weights and the wavelet characteristics of the N-1 dimension characteristics respectively associated with the sub-attention weights.
The attention weight of each wavelet feature is determined according to the wave similarity among the wavelet features, specifically, the relevance among the feature dimensions is determined according to the wave similarity among the wavelet features, and then the importance degree among the feature dimensions is determined, so that the attention weight corresponding to each wavelet feature is determined, and the independent information more relevant to classification can be fused into global information by the model according to the attention weight.
For the ith dimension feature of the N-dimensional features, the relevance between the ith dimension feature and the N-1 dimension features can be measured from the perspective of wave similarity, where the N-1 dimension features are the dimension features other than the ith dimension feature among the N-dimensional features. The wave similarity between the wavelet feature of the ith dimension feature and the wavelet feature of another dimension feature is positively correlated with the relevance between the ith dimension feature and that dimension feature. Specifically, the higher the wave similarity between the wavelet feature of the ith dimension feature and the wavelet feature of another dimension feature, the more similar the information that the two dimension features reflect about the target object, that is, the higher the relevance between them; the lower the wave similarity, the lower the relevance. That is, from the perspective of wave similarity, the ith dimension feature and the N-1 dimension features can be analyzed and compared to obtain the relevance between the ith dimension feature and each of the N-1 dimension features.
According to the relevance between the ith dimension feature and the other N-1 dimension features, the sub-attention weights of the wavelet feature of the ith dimension feature relative to the wavelet features of the N-1 dimension features can be determined. The sub-attention weight between the ith dimension feature and another dimension feature is the weight given to the independent information of that dimension feature when information is fused for the ith dimension feature: the larger the sub-attention weight, the more of that dimension feature's independent information is fused.
The relevance between the ith dimension feature and another dimension feature is positively correlated with the sub-attention weight of the wavelet feature of the ith dimension feature relative to the wavelet feature of that dimension feature. Specifically, the higher the relevance between the ith dimension feature and another dimension feature, the larger the corresponding sub-attention weight; the lower the relevance, the smaller the corresponding sub-attention weight.
Because the sub-attention weights of the wavelet feature of the ith dimension feature are positively correlated with the relevance between the ith dimension feature and the other dimension features, when information fusion is performed for the ith dimension feature to obtain global information, more information similar to the ith dimension feature is drawn, according to the sub-attention weights, from the dimension features that are more relevant to it. When the ith dimension feature is highly relevant to many of the N-1 dimension features, more information similar to the ith dimension feature is fused and the model learns more information related to the ith dimension feature; when it is highly relevant to only a few of the N-1 dimension features, less such information is fused. In this way, the sub-attention weights allow the model to better learn the information related to the ith dimension feature.
It should be noted that the ith dimension feature has not one sub-attention weight but N-1 sub-attention weights, in one-to-one correspondence with the N-1 dimension features, and the attention weight corresponding to the ith dimension feature may be formed from these N-1 sub-attention weights.
After determining the sub-attention weights, the server may generate the wavelet feature to be predicted corresponding to the ith dimension feature in the wave feature to be predicted according to the sub-attention weights and the wavelet features of the N-1 dimension features respectively associated with the sub-attention weights, where the wavelet feature to be predicted corresponding to the ith dimension feature represents the fusion result corresponding to the ith dimension feature obtained according to the sub-attention weights. For example, the wavelet feature to be predicted corresponding to the ith dimension feature may be determined as follows:
$$\tilde{o}_i = \sum_{k} w_{ik}\,\tilde{z}_k$$
where $\tilde{o}_i$ represents the fusion result corresponding to the ith dimension feature obtained according to the sub-attention weights, $k$ represents the kth dimension feature of the N-1 dimension features, $w_{ik}$ represents the sub-attention weight of the wavelet feature of the ith dimension feature relative to the wavelet feature of the kth dimension feature, and $\tilde{z}_k$ represents the wavelet feature of the kth dimension feature.
The wavelet feature to be predicted corresponding to the ith dimension feature, denoted $o_i$, is then obtained from the fusion result $\tilde{o}_i$ corresponding to the ith dimension feature obtained according to the sub-attention weights.
According to the above formulas, information related to the ith dimension feature in the N-1 dimension features can be fused according to the sub-attention weights to obtain the fusion result corresponding to the ith dimension feature, and this fusion result is used as the wavelet feature to be predicted corresponding to the ith dimension feature. In a subsequent step, the prediction category corresponding to the training sample can be obtained according to the wavelet features to be predicted respectively corresponding to the N-dimensional features.
It should be noted that the wavelet feature to be predicted corresponding to the ith dimension feature is used as an input of the predictor model in a subsequent step, so that the predictor model can learn from the ith dimension feature. To enable the model to also learn the independent information of the ith dimension feature itself, when the sub-attention weights of the wavelet feature of the ith dimension feature relative to the wavelet features of the other N-1 dimension features are determined, a sub-attention weight of the wavelet feature of the ith dimension feature relative to itself can also be determined. The information related to the ith dimension feature in the N-1 dimension features and the independent information of the ith dimension feature are then fused according to the sub-attention weights, to obtain a wavelet feature to be predicted that includes the independent information of the ith dimension feature.
For the ith dimension feature in the N dimension features, the wavelet features of the ith dimension feature and the N-1 dimension features can be respectively compared, the relevance of the wavelet features of the ith dimension feature and the N-1 dimension features can be determined, and then the sub-attention weights of the ith dimension feature and the N-1 dimension features can be determined, so that the model can learn knowledge of the ith dimension feature better through the sub-attention weights.
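The per-feature fusion described above can be sketched as follows, assuming the pairwise wave similarities have already been computed into a matrix `sim` and that sub-attention weights are obtained from one row of that matrix by a softmax (an illustrative choice). Each wavelet is given here as a flat list of numbers, for example its real and imaginary parts concatenated. The `include_self` flag is a hypothetical switch for the variant in which the ith dimension feature also receives a sub-attention weight relative to itself.

```python
import math
from typing import Dict, List

def sub_attention_weights(sim: List[List[float]], i: int,
                          include_self: bool = False) -> Dict[int, float]:
    """Sub-attention weights of wavelet i relative to the other wavelets."""
    ks = [k for k in range(len(sim)) if include_self or k != i]
    peak = max(sim[i][k] for k in ks)
    exps = {k: math.exp(sim[i][k] - peak) for k in ks}  # stable softmax
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def fuse(wavelets: List[List[float]], weights: Dict[int, float]) -> List[float]:
    """Fusion result for one dimension feature: sum over k of w_ik * z_k."""
    dim = len(wavelets[0])
    return [sum(w * wavelets[k][c] for k, w in weights.items()) for c in range(dim)]

sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]                      # assumed wave similarities, N = 3
wavelets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

w0 = sub_attention_weights(sim, i=0)         # N-1 = 2 sub-attention weights
fused0 = fuse(wavelets, w0)                  # wavelet feature to be predicted for i = 0
w0_self = sub_attention_weights(sim, i=0, include_self=True)
```

Because wavelet 1 is far more similar to wavelet 0 than wavelet 2 is, it receives the larger sub-attention weight and contributes more to the fusion result.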
S204, according to the wave characteristics to be predicted generated by the sample wave characteristics and the attention weight, determining the prediction category corresponding to the training sample through a predictor model of the initial classification model.
The wave feature to be predicted is the fusion result obtained by fusing the sample wave features according to the attention weights. The attention weight of each wavelet feature is the weight that the corresponding independent information occupies in the fused global information, and it is positively correlated with the importance degree of the dimension feature corresponding to that wavelet feature. The wave feature to be predicted obtained by fusion according to the attention weights therefore contains the independent information of the wavelet features that are more relevant to classification, so that the predictor model can learn effective classification-related knowledge when the wave feature to be predicted is used as its input. For example, after determining the attention weights, the PATM module in the attention sub-model can determine the wave feature to be predicted corresponding to the sample wave features according to the attention weights.
According to the wave feature to be predicted, the server can determine the prediction category corresponding to the training sample through the predictor model of the initial classification model, where the prediction category is the prediction result, obtained through the predictor model, of the category of the object corresponding to the wave feature to be predicted. Specifically, the predictor model may include a single fully connected layer; the wave feature to be predicted is passed through this layer to obtain category scores, each of which is the prediction score of one candidate category for the object identified by the wave feature to be predicted, and the corresponding prediction category is determined from the category scores through a Softmax function. Since the wave feature to be predicted is obtained by fusion based on the sample wave features, and the object identified by the sample wave features is the target object, the object identified by the wave feature to be predicted is still the target object; that is, the prediction category is the prediction result of the category of the target object obtained through the predictor model.
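A minimal sketch of such a predictor model is given below: a single fully connected layer produces one category score per category, and Softmax turns the scores into probabilities from which the prediction category is taken. The weight and bias values, the flattened input vector and the two-category setup are made-up examples.

```python
import math
from typing import List, Tuple

def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features: List[float], weight: List[List[float]],
            bias: List[float]) -> Tuple[int, List[float]]:
    """Single fully connected layer followed by Softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weight, bias)]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

features = [0.2, -0.4, 1.0]                      # flattened wave feature to be predicted
weight = [[1.0, 0.0, 0.5], [-1.0, 0.5, -0.5]]    # one row per category (assumed values)
bias = [0.0, 0.1]
category, probs = predict(features, weight, bias)
```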
In one possible implementation manner, determining, in S204, a prediction category corresponding to the training sample according to the wave feature to be predicted generated by the sample wave feature and the attention weight through a predictor model of the initial classification model includes:
And determining the prediction category corresponding to the training sample through a predictor model of the initial classification model according to the wave characteristics to be predicted and the training sample.
The wave feature to be predicted is the fusion result obtained by fusing the sample wave features according to the attention weights. Because the training sample is determined from table class content, which includes dimension content of many data dimensions, and the dimension content of different data dimensions is sparse, the fusion information reflected by the wave feature to be predicted may be more or less than the original information reflected by the training sample. When the fusion information is more than the original information, inputting only the wave feature to be predicted into the predictor model gives the predictor model too much information to refer to; in this case, the training sample and the wave feature to be predicted can be input into the predictor model together, which reinforces the original information reflected by the training sample so that the predictor model learns the training sample better. When the fusion information is less than the original information, inputting only the wave feature to be predicted gives the predictor model too little information to refer to; in this case, the wave feature to be predicted and the training sample can likewise be input together, so that the predictor model refers to both the fusion information reflected by the wave feature to be predicted and the original information reflected by the training sample, which avoids an unreliable prediction category caused by insufficient fusion information.
For example, the attention sub-model can include a feature mixing (Token Mixing) module and a channel multilayer perceptron (Channel Multilayer Perceptron, Channel MLP) module. The Token Mixing module can take the wave feature to be predicted output by the PATM module together with the training sample as the input of the Channel MLP module, and the Channel MLP module can perform a channel convolution operation on the wave feature to be predicted and the training sample to extract features from them; the output of the Channel MLP module can be used as the input of the predictor model. That is, the server may input the wave feature to be predicted and the training sample into the predictor model together, so that the predictor model refers to both the fusion information reflected by the wave feature to be predicted and the original information reflected by the training sample to obtain the prediction category, which ensures the reliability of the prediction result.
In the actual training process of the initial classification model, before the training sample and the wave feature to be predicted are input into the predictor model together, feature extraction may be performed on the training sample. For example, the attention sub-model may include a channel fully connected (Channel FC) module, which may perform feature extraction on the N-dimensional features in the training sample through a channel convolution operation. The Token Mixing module may take the output of the Channel FC module together with the wave feature to be predicted output by the PATM module as the input of the Channel MLP module, so that the Channel MLP module performs feature extraction on the wave feature to be predicted and the training sample. The output of the Channel MLP module may be used as the input of the predictor model, so that the predictor model can better make predictions from the wave feature to be predicted and the training sample.
In addition, an LN layer may be added before the Token Mixing module and the Channel MLP module in the attention sub-model, so that the data input to the Token Mixing module and the Channel MLP module are normalized and then subjected to corresponding operations.
Because the dimension content of the plurality of data dimensions included in the table class content is generally sparse, the fusion information reflected by the wave feature to be predicted may be more or less than the original information reflected by the training sample. By taking the training sample and the wave feature to be predicted together as the input of the predictor model, a more reliable prediction category output by the predictor model can be ensured.
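The joint input can be sketched as below. Two points are assumptions: the wave feature to be predicted and the training sample are combined by flattening and concatenation (the text only says they are input together), and the Channel MLP is modelled as two linear layers with a ReLU in between. All weights are made-up values.

```python
from typing import List

def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def channel_mlp(x: List[float], w1, b1, w2, b2) -> List[float]:
    """Assumed Channel MLP: linear -> ReLU -> linear."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

def joint_input(wave_to_predict: List[List[float]], sample: List[List[float]]) -> List[float]:
    """Assumed combination: flatten both inputs and concatenate them."""
    flat_wave = [v for feature in wave_to_predict for v in feature]
    flat_sample = [v for feature in sample for v in feature]
    return flat_wave + flat_sample

wave_to_predict = [[0.1, 0.9], [0.4, 0.6]]   # fusion information
sample = [[1.0, 2.0], [0.5, -1.0]]           # original information
x = joint_input(wave_to_predict, sample)
w1, b1 = [[0.1] * 8, [-0.1] * 8], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [0.0]
out = channel_mlp(x, w1, b1, w2, b2)         # would be passed on to the predictor model
```

With this arrangement the downstream layer sees the fused values and the original values side by side, so neither kind of information is lost when the other is sparse.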
S205, model training is carried out on the initial classification model based on the difference between the predicted category and the actual category, and the attention weight is adjusted through the model training so as to obtain the classification model.
Because the prediction category is the prediction result, determined by the initial classification model, for the category of the target object, and the actual category is the true value of the category of the target object, the server can determine, based on the difference between the prediction category and the actual category, the gap between the prediction result of the initial classification model and the true value, and thereby obtain the training optimization direction of the initial classification model. The training optimization direction can be reflected in various ways based on this difference. For example, it can be reflected by a cross entropy loss function, which is determined by calculating the cross entropy between the prediction category and the actual category. The initial classification model can then be trained according to the training optimization direction to obtain a classification model that can be put into use, and the classification model is used for category recognition of the table type content of an object to be identified.
The attention weight reflects the weight that the independent information of each wavelet feature occupies in the global information obtained by fusion. The more the independent information of wavelet features relevant to classification is fused into the global information, the smaller the difference between the corresponding prediction category and the actual category. Therefore, the attention weight can be adjusted in the training process of the initial classification model, so that the obtained classification model can, according to the determined attention weight, fuse into the global information the independent information corresponding to the dimension features of the data dimensions most relevant to classification, and thereby obtain an accurate prediction result. It should be noted that, because the attention weight mainly reflects the relevance between data dimensions and their importance degree, adjusting the attention weight in the model training process measures the importance degree of each of the N data dimensions, so that the obtained classification model learns more about the dimension features corresponding to the important data dimensions in the table type content of the object to be identified, and thereby obtains more accurate prediction results.
In addition, in the training process of the initial classification model, a plurality of training samples are needed to train the initial classification model so that the attention weight can be adjusted to its optimum and the classification model obtained. Although in this embodiment the initial classification model does not need complex model parameters for determining the attention weight, the training samples determined from table type content still include many dimension features. For this reason, the initial classification model can be trained by stochastic gradient descent. Specifically, one training sample is randomly extracted from the plurality of training samples and the model is updated according to the gradient; another training sample is then randomly extracted and the model is updated according to the gradient again, and so on. In this way, when the number of training samples is large, a usable classification model can be obtained without training on all of the training samples, which further improves the training efficiency of the model.
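The single-sample update procedure with a cross entropy loss can be sketched as follows. This is a minimal illustration that assumes a plain softmax classifier standing in for the initial classification model; the function names, the learning rate, and the toy data are hypothetical and not taken from the application.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, actual_class):
    """Cross entropy between the predicted distribution and the actual class."""
    return -math.log(probs[actual_class])

def sgd_step(weights, features, actual_class, lr=0.1):
    """Update the weights in place from ONE training sample and return its loss."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    for c, row in enumerate(weights):
        # Gradient of the cross entropy w.r.t. logit c is p_c - 1[c == actual].
        grad_logit = probs[c] - (1.0 if c == actual_class else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * grad_logit * x
    return cross_entropy(probs, actual_class)

# Assumed toy data: two classes, two features per sample.
samples = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
rng = random.Random(0)
weights = [[0.0, 0.0], [0.0, 0.0]]

losses = []
for _ in range(200):
    features, actual = rng.choice(samples)  # randomly extract one training sample
    losses.append(sgd_step(weights, features, actual))
```

Each iteration touches a single randomly drawn sample, so the cost of one update does not grow with the size of the training set.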
It can be seen that, for a target object whose actual category has been determined, a training sample is determined according to the table type content of the target object. Since the table type content relates to the dimension content of N data dimensions, the determined training sample includes N-dimensional features corresponding to the N data dimensions. In order to reduce the adverse effect of excessive feature dimensions on model training, a sample wave feature corresponding to the training sample can be generated through the attention sub-model of the initial classification model, and the wavelet feature corresponding to each dimension feature of the N-dimensional features in the sample wave feature is represented in the form of a wave. Therefore, when the initial classification model learns the attention weight of each dimension based on the training sample, the wavelet features are uniformly represented as waves, so no other model parameters need to be introduced to analyze and compare the dimension features. The relevance among the dimension features can be measured directly from the angle of wave similarity, the importance degree of each dimension feature can then be measured, and the attention weight of each wavelet feature in the sample wave feature can be determined rapidly and accurately. When many of the N wavelet features have high wave similarity, that is, when many of the N-dimensional features are highly correlated, these dimension features identify the target object more consistently and better reflect information relating the target object to its category; in other words, they are more important than the other dimension features in the training sample. The determined attention weight can therefore effectively guide the model to learn effective knowledge related to the classification. After the wave feature to be predicted is obtained by fusing the global information of the training sample based on the attention weight, the corresponding prediction category is determined through the initial classification model, and the attention weight is adjusted according to the difference between the prediction category and the actual category of the training sample, so as to obtain a classification model that can be used for category recognition of table type content. Because representing each dimension feature in the form of a wave is a feature expression that makes it easier to determine the relevance and importance degree of the dimension features, complex and redundant model parameters for determining the attention weight do not need to be set in the initial classification model, which effectively reduces the model volume and training time and improves model training efficiency.
Since, in quantum mechanics, a wave is mainly characterized by its amplitude and phase, in one possible implementation inspired by quantum mechanics, for the i-th dimension feature of the N-dimensional features, generating the sample wave feature of the training sample through the attention sub-model of the initial classification model in S202 includes:
S11, according to the i-th dimension feature, determining an amplitude feature and a phase feature corresponding to the i-th dimension feature through an amplitude model parameter and a phase model parameter of the attention sub-model of the initial classification model;
S12, generating a wavelet feature corresponding to the i-th dimension feature based on the amplitude feature and the phase feature corresponding to the i-th dimension feature.
In the application, the server represents the training sample in the form of waves to obtain the corresponding sample wave feature. The sample wave feature includes wavelet features respectively corresponding to the N-dimensional features in the training sample, where a wavelet feature is a dimension feature represented in the form of a wave. In quantum mechanics, an entity can be represented by a wave that has an amplitude and a phase, where the amplitude represents the maximum intensity of the wave and the phase represents the position within a wave period. Inspired by this, in the embodiment, for the i-th dimension feature of the N-dimensional features, the wavelet feature of the i-th dimension feature can be represented through an amplitude feature and a phase feature: the amplitude feature represents the maximum feature value of the wavelet feature of the i-th dimension feature, and the phase feature represents the position of the i-th dimension feature in the corresponding wavelet feature.
Specifically, for the i-th dimension feature, the server may perform feature extraction on the i-th dimension feature through the amplitude model parameter of the attention sub-model to obtain the maximum feature value of the i-th dimension feature and thereby determine the corresponding amplitude feature, where the amplitude model parameter is the model parameter of the attention sub-model used for obtaining the amplitude feature. For example, the attention sub-model may include a Channel FC module, which performs feature extraction on the i-th dimension feature through a channel convolution operation and uses the result of the feature extraction as the amplitude feature of the i-th dimension feature. The corresponding formulas are as follows:
$$\text{Channel-FC}(t_i, W_c) = W_c t_i, \quad i = 1, 2, \ldots, N$$
where $\text{Channel-FC}(t_i, W_c)$ represents the channel convolution operation performed on the i-th dimension feature through the amplitude model parameter, $W_c$ represents the amplitude model parameter, and $t_i$ represents the i-th dimension feature.
$$|z_i| = \text{Channel-FC}(t_i, W_c), \quad i = 1, 2, \ldots, N$$
where $z_i$ represents the amplitude feature of the i-th dimension feature, $|z_i|$ represents the absolute value of the amplitude feature of the i-th dimension feature, and $\text{Channel-FC}(t_i, W_c)$ represents the channel convolution operation performed on the i-th dimension feature through the amplitude model parameter.
The method can extract the characteristics of the ith dimension through the channel convolution operation by the formula to obtain the maximum characteristic value of the ith dimension, thereby determining the amplitude characteristics of the ith dimension.
At this time, for the ith dimension feature, the server may further perform phase estimation on the position of the ith dimension feature in the wave period through a phase model parameter of the attention sub-model, where the phase model parameter is a model parameter of the attention sub-model for acquiring the phase feature, so as to determine the phase feature of the ith dimension feature, and the corresponding formula is as follows:
$$\theta_i = \Theta(t_i, W_\theta), \quad i = 1, 2, \ldots, N$$
where $\theta_i$ represents the phase feature of the i-th dimension feature, $\Theta(\alpha)$ represents the phase estimation operation performed on $\alpha$, $W_\theta$ represents the phase model parameter, and $t_i$ represents the i-th dimension feature.
The phase estimation operation can be performed on the ith dimension characteristic according to the phase model parameters through the formula, and the phase characteristic of the ith dimension characteristic is determined, wherein the phase estimation operation can also be realized through a channel convolution operation.
It should be noted that, because the number of data dimensions related to the table type content is usually large, in this embodiment the amplitude model parameter and the phase model parameter are shared by all of the N-dimensional features in order to reduce the amount of computation of the model.
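The determination of amplitude and phase features with shared parameters can be sketched as follows. This is an illustration only: it assumes that both the channel convolution and the phase estimation operation are plain linear maps, and the names `linear`, `amplitude_and_phase`, `w_c`, `w_theta` and the toy sizes are hypothetical.

```python
import random

def linear(weight, feature):
    """Matrix-vector product: one linear map applied to a single dimension feature."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]

def amplitude_and_phase(sample, w_c, w_theta):
    """Return (amplitudes, phases) for every dimension feature t_i in the sample.

    The same w_c and w_theta are reused for all i, so the number of parameters
    does not grow with the number of data dimensions N.
    """
    amplitudes = [linear(w_c, t_i) for t_i in sample]   # |z_i| = W_c t_i
    phases = [linear(w_theta, t_i) for t_i in sample]   # theta_i = Theta(t_i, W_theta)
    return amplitudes, phases

rng = random.Random(0)
N, d = 3, 4  # assumed toy sizes: N dimension features, each of length d
sample = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
w_c = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
w_theta = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

amplitudes, phases = amplitude_and_phase(sample, w_c, w_theta)
```

Because the phases are computed from the dimension features themselves, two samples with different content in the same data dimension receive different phase features.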
After the amplitude feature and the phase feature corresponding to the i-th dimension feature are determined, the i-th dimension feature can be expressed in the form of a wave through the amplitude feature and the phase feature, and the wavelet feature corresponding to the i-th dimension feature can then be generated directly. Specifically, the wavelet feature may be represented in the complex domain, that is, the wavelet feature may be expanded using Euler's formula, where the expanded wavelet feature includes a real part unit and an imaginary part unit. The corresponding formula is as follows:
$$\tilde{z}_i = |z_i| \odot e^{s\theta_i} = |z_i| \odot \cos\theta_i + s\,|z_i| \odot \sin\theta_i, \quad i = 1, 2, \ldots, N$$
where $\tilde{z}_i$ represents the wavelet feature of the i-th dimension feature, $|z_i|$ represents the absolute value of the amplitude feature of the i-th dimension feature, $\theta_i$ represents the phase feature of the i-th dimension feature, $s$ is the imaginary unit satisfying $s^2 = -1$, $|z_i| \odot \cos\theta_i$ represents the real part unit of the wavelet feature of the i-th dimension feature, and $s\,|z_i| \odot \sin\theta_i$ represents the imaginary part unit of the wavelet feature of the i-th dimension feature.
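The Euler expansion of a wavelet feature into its real part unit and imaginary part unit can be checked numerically with the short sketch below. The function names and the toy amplitude and phase values are hypothetical; Python's built-in complex type plays the role of the complex domain.

```python
import cmath
import math

def wavelet_feature(amplitude, phase):
    """Element-wise amplitude * e^{s*theta}, returned as Python complex numbers."""
    return [a * cmath.exp(1j * th) for a, th in zip(amplitude, phase)]

def real_and_imaginary_units(amplitude, phase):
    """Euler expansion: real unit a*cos(theta), imaginary unit a*sin(theta)."""
    real = [a * math.cos(th) for a, th in zip(amplitude, phase)]
    imag = [a * math.sin(th) for a, th in zip(amplitude, phase)]
    return real, imag

amplitude = [2.0, 1.0, 0.5]           # assumed toy amplitude feature
phase = [0.0, math.pi / 2, math.pi]   # assumed toy phase feature

wave = wavelet_feature(amplitude, phase)
real, imag = real_and_imaginary_units(amplitude, phase)
```

Both routes describe the same wave: the real and imaginary components of `wave` agree with `real` and `imag` up to floating-point rounding.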
It should be noted that, when determining the wavelet features corresponding to each dimension feature through the amplitude feature and the phase feature of each dimension feature, the wave similarity of different wavelet features may be comprehensively determined from two different angles, that is, the higher the similarity of the amplitude feature and the higher the similarity of the phase feature of different wavelet features, the higher the corresponding wave similarity, the lower the similarity of the amplitude feature and the lower the similarity of the phase feature of different wavelet features, and the lower the corresponding wave similarity.
In addition, when the wavelet feature of the i-th dimension feature includes a real part unit and an imaginary part unit, the step in the foregoing embodiment of generating, according to the sub-attention weights of the i-th dimension feature and the wavelet features of the N-1 dimension features respectively associated with the sub-attention weights, the wavelet feature to be predicted corresponding to the i-th dimension feature in the wave feature to be predicted specifically includes:
fusing, according to the sub-attention weights, the real part units and the imaginary part units in the wavelet features of the N-1 dimension features respectively associated with the sub-attention weights, to generate the wavelet feature to be predicted corresponding to the i-th dimension feature.
Since the wavelet features represented by the wave form include two parts of real part units and imaginary part units in the complex domain, when the wavelet features of the N-1-dimensional feature are fused according to the sub-attention weights, the real part units and the imaginary part units of the wavelet features of the N-1-dimensional feature can be fused respectively, and the corresponding formulas are as follows:
where the left-hand side represents the wavelet feature to be predicted corresponding to the i-th dimension feature, k denotes the k-th dimension feature among the N-1 dimension features, the weighting term represents the sub-attention weight of the wavelet feature of the i-th dimension feature relative to the wavelet feature of the k-th dimension feature, $|z_k|$ represents the absolute value of the amplitude feature of the k-th dimension feature, and $\theta_k$ represents the phase feature of the k-th dimension feature.
According to the formula, the real part and the imaginary part of the information related to the ith dimension feature in the N-1 dimension feature can be respectively fused according to the sub-attention weight, and the fusion result is integrated, so that the wavelet feature to be predicted corresponding to the ith dimension feature is obtained.
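The separate fusion of real part units and imaginary part units can be sketched as follows. Several details here are assumptions made only for illustration: each sub-attention weight is a single scalar `attention[i][k]`, the same weight is applied to the real and imaginary units, and the two fused results are combined into one complex value. The function name and the toy values are hypothetical.

```python
import math

def fuse_wavelets(attention, amplitudes, phases):
    """For each i, fuse the real and imaginary units of the other N-1 wavelet
    features separately, weighted by the sub-attention weights attention[i][k]."""
    n, d = len(amplitudes), len(amplitudes[0])
    fused = []
    for i in range(n):
        real = [0.0] * d
        imag = [0.0] * d
        for k in range(n):
            if k == i:
                continue  # only the N-1 other dimension features are fused
            for c in range(d):
                real[c] += attention[i][k] * amplitudes[k][c] * math.cos(phases[k][c])
                imag[c] += attention[i][k] * amplitudes[k][c] * math.sin(phases[k][c])
        # Integrate the two fused parts into one wavelet feature to be predicted.
        fused.append([complex(r, m) for r, m in zip(real, imag)])
    return fused

# Assumed toy input: N = 3 dimension features with d = 1 channel each.
amplitudes = [[1.0], [2.0], [3.0]]
phases = [[0.0], [math.pi / 2], [0.0]]
attention = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.25, 0.75, 0.0]]

fused = fuse_wavelets(attention, amplitudes, phases)
```

In this toy case the first fused feature mixes a purely imaginary contribution (phase pi/2) with a purely real one (phase 0), which shows how waves with different phases contribute to different parts of the result.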
Inspired by quantum mechanics, the phase characteristics and the amplitude characteristics corresponding to the respective dimension characteristics can be determined firstly, and the wavelet characteristics corresponding to the respective dimension characteristics can be determined according to the phase characteristics and the amplitude characteristics, so that the dimension characteristics of a plurality of data dimensions included in a training sample generated by table contents can be accurately represented in the form of waves.
When the attention sub-model determines the wavelet features through the amplitude model parameter and the phase model parameter, in order to improve the accuracy of the classification model, the amplitude model parameter and the phase model parameter may also be adjusted, in addition to the attention weight, in the model training process. In a possible implementation, performing model training on the initial classification model based on the difference between the prediction category and the actual category in S205, and adjusting the attention weight through the model training to obtain the classification model, includes:
based on the difference between the predicted category and the actual category, model training is carried out on the initial classification model, and the attention weight, the amplitude model parameter and the phase model parameter are adjusted through model training so as to obtain the classification model.
When the trained classification model is adopted to conduct category recognition on the table class content of the object to be recognized, the amplitude characteristics corresponding to the dimension content of each data dimension in the table class content of the object to be recognized can be determined according to the amplitude model parameters of the classification model, the phase characteristics corresponding to the dimension content of each data dimension in the table class content of the object to be recognized can be determined according to the phase model parameters of the classification model, the wave characteristics corresponding to each data dimension can be determined according to the amplitude characteristics and the phase characteristics corresponding to each data dimension, and the classification result of the table class content of the object to be recognized can be obtained based on the wave characteristics corresponding to each data dimension.
Based on the above, the phase model parameter and the amplitude model parameter in the classification model are closely related to the accuracy of the wave characteristics corresponding to each data dimension of the table class content of the object to be identified, and the accuracy of the wave characteristics is closely related to the accuracy of the classification model, so that the phase model parameter and the amplitude model parameter in the classification model are closely related to the accuracy of the classification model. At this time, in order to ensure the accuracy of the classification model, in the process of performing model training on the initial classification model, on the basis of adjusting the attention weight, the phase model parameter and the amplitude model parameter can be adjusted, so that the obtained classification model can obtain more accurate wave characteristics corresponding to each data dimension of the table content of the object to be identified according to the determined phase model parameter and amplitude model parameter, thereby further improving the accuracy of the classification model.
In the training process of the initial classification model, the model parameters used for generating the amplitude features and the phase features can thus also be adjusted, which improves the accuracy of the wave features that the classification model generates for each data dimension of the table type content of the object to be identified, and in turn improves the accuracy of the classification model.
In one possible implementation, the phase feature corresponding to the ith dimension feature is determined based on semantic information of the ith dimension feature.
In the actual training process of the model, the dimensional features of different training samples under the same data dimension are usually different, for example, when the data dimension is color, the information represented by the dimensional feature corresponding to the training sample a may be red, and the information represented by the dimensional feature corresponding to the training sample B may be green.
In this regard, when there is a distinction between dimensional features of different training samples in the same data dimension, there should be a distinction between wavelet features determined according to different dimensional features of different training samples, in order to more clearly show the distinction between wavelet features, corresponding phase features may be determined according to semantic information of each dimensional feature, where the semantic information refers to a linguistic meaning of information expression shown by the dimensional features, so that the distinction between different training samples in the same data dimension may be shown on the phase features, that is, the phase features of the wavelet features may be dynamically changed according to specific semantics of the dimensional features of the training samples, so that the corresponding wavelet features may also be dynamically changed, thereby implementing dynamic fusion of wavelet features and improving fusion accuracy.
Because the dimension characteristics of different training samples under the same data dimension are usually different, the corresponding phase characteristics are determined through the semantic information of the dimension characteristics, so that the difference in the semantic information reflected by the dimension characteristics of the different training samples can be reflected on the phase, thereby realizing the dynamic fusion of wavelet characteristics and improving the fusion precision.
Because the training samples corresponding to table type content generally have many data dimensions and the dimension features corresponding to the data dimensions are sparse, in one possible implementation the initial classification model may include M attention sub-models, M > 1, where the output data of the j-th attention sub-model is the input data of the (j+1)-th attention sub-model, the output data of the M-th attention sub-model is the wave feature to be predicted, and j is greater than or equal to 1 and less than M.
The independent information of each dimension characteristic in the training sample can be fused through the attention sub-model to obtain global information capable of integrally expressing the training sample, and because the data dimension related to the training sample is usually more, and the dimension characteristic corresponding to each data dimension is also sparse, in order to strengthen the fusion degree of the global information, as shown in fig. 3, the initial classification model can include M attention sub-models, M >1, so as to fuse the training sample for multiple times, wherein the M attention sub-models are sequentially arranged, for example, the original training sample can be used as input data of the 1 st attention sub-model, the output data of the 1 st attention sub-model can be used as input data of the 2 nd attention sub-model, namely, the output data of the previous attention sub-model in the M attention sub-models can be used as input data of the next attention sub-model, and the corresponding formula is as follows:
$$T_1 = T, \quad T_{j+1} = F_j(T_j)$$
where $T_1$ represents the input data of the 1st attention sub-model, $T$ represents the training sample, $T_{j+1}$ represents the input data of the (j+1)-th attention sub-model, and $F_j(T_j)$ represents the output data of the j-th attention sub-model.
After the training sample passes through the M attention sub-models, as shown in fig. 3, the output data of the M-th attention sub-model can be used as the wave feature to be predicted; that is, the output data obtained after multiple rounds of fusion by the M attention sub-models is used as the final fusion result of the training sample.
Because the data dimensionality related to the training samples corresponding to the table content is usually more, and the dimensionality characteristics corresponding to each data dimensionality are sparse, the training samples are fused for multiple times through a plurality of attention sub-models, so that the fusion degree of global information is enhanced, and the accuracy of the models is improved.
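The chaining of M attention sub-models, in which each sub-model consumes the output of the previous one, can be sketched as follows. The function `make_mixer` is a hypothetical stand-in for an attention sub-model (it simply blends every dimension feature with the mean of all dimension features) and is not the fusion defined in the application; the toy sample is likewise assumed.

```python
def run_attention_stack(sample, sub_models):
    """Apply T_1 = T, T_{j+1} = F_j(T_j) and return the output of the last sub-model."""
    current = sample
    for sub_model in sub_models:
        current = sub_model(current)
    return current

def make_mixer(weight):
    """Hypothetical stand-in for one attention sub-model: every dimension feature
    is blended with the mean of all dimension features."""
    def mixer(sample):
        n = len(sample)
        mean = [sum(column) / n for column in zip(*sample)]
        return [[(1 - weight) * x + weight * m for x, m in zip(row, mean)]
                for row in sample]
    return mixer

sample = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]     # N = 3 dimension features, d = 2
sub_models = [make_mixer(0.5) for _ in range(3)]  # M = 3
to_predict = run_attention_stack(sample, sub_models)
```

With each pass the rows move closer to the shared mean, which illustrates how repeated fusion strengthens the global information carried by every dimension feature.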
In order to alleviate the over-fitting problem in model training when the initial classification model includes M attention sub-models, in one possible implementation, the initial classification model may further include a downsampling sub-model disposed between the k-th attention sub-model and the (k+1)-th attention sub-model of the M attention sub-models.
The output data of the k-th attention sub-model is the input data of the downsampling sub-model, and the output data of the downsampling sub-model is the input data of the (k+1)-th attention sub-model. The downsampling sub-model is used for downsampling its input data, so that the number of channels of its output data is smaller than the number of channels of its input data.
When the initial classification model includes M attention sub-models, the multiple attention sub-models may cause an overfitting problem in the model training process, for which a downsampling sub-model is set between two adjacent attention sub-models in the M attention sub-models, as shown in fig. 3, and a downsampling sub-model is set between the kth attention sub-model and the kth+1th attention sub-model, where output data of a previous attention sub-model of the downsampling sub-model is input data of the downsampling sub-model, and output data of the downsampling sub-model is input data of a next attention sub-model of the downsampling sub-model. The input data can be downsampled through the downsampling sub-model, the channel number of the input data is reduced, the channel number of the output data of the downsampling sub-model is smaller than that of the input data of the downsampling sub-model, so that the problem of overfitting in the model training process is effectively alleviated, meanwhile, the M attention sub-models are divided into different stages, namely the M attention sub-models are divided into stages through the downsampling sub-model, the downsampling sub-model can downsample the input data in various modes, and the channel number of the input data can be reduced through simple channel convolution.
It should be noted that, for M attention sub-models in the initial classification model, in order to further alleviate the over-fitting problem in the model training process, a plurality of downsampling sub-models may be set, and different downsampling sub-models may be set between different two attention sub-models, so as to reduce the number of channels of the input data multiple times.
When the initial classification model comprises M attention sub-models, the downsampling sub-model is arranged between two adjacent attention sub-models in the M attention sub-models, and can downsample input data, so that the number of channels of output data of the downsampling sub-model is smaller than that of the input data of the downsampling sub-model, and the problem of over-fitting in the model training process is effectively alleviated.
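Channel reduction by a downsampling sub-model can be sketched as follows, assuming the downsampling is a single linear projection applied to every dimension feature. The function name `downsample`, the projection matrix, and the toy input are hypothetical.

```python
def downsample(sample, projection):
    """Reduce the channel count of every dimension feature with a linear map.

    projection has shape (out_channels, in_channels), out_channels < in_channels.
    """
    return [[sum(w * x for w, x in zip(row, feature)) for row in projection]
            for feature in sample]

# Assumed toy input: N = 2 dimension features with 4 channels each.
sample = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
# Assumed projection that averages neighbouring channel pairs (4 -> 2 channels).
projection = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

reduced = downsample(sample, projection)  # [[1.5, 3.5], [0.5, 0.5]]
```

The number of dimension features N is unchanged; only the channel count per feature shrinks, so the attention sub-models that follow operate on a smaller representation.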
In one possible implementation, when the dimension content of the N data dimensions related to the table class content includes Q continuous data contents and P discrete data contents, determining the training sample according to the table class content identifying the target object in S201 includes:
S21, encoding the P discrete data contents according to the number of values involved in the P discrete data contents to obtain P coding features;
S22, generating training samples based on the P coding features and Q data features corresponding to the Q continuous data contents respectively.
The table type content relates to a plurality of data dimensions whose dimension content is usually heterogeneous, including, for example, continuous data content such as continuous values and discrete data content such as ordinal numbers. Continuous data content refers to data content for which the range between any two values can be subdivided without limit; for example, a degree of liking that can be subdivided into infinitely many values is continuous data content. Discrete data content refers to data content for which only a limited number of values lie between any two values; for example, when a rating has ten grades, the number of grades between any two grades is limited, so the grade is discrete data content.
In this embodiment, the dimension content of the N data dimensions related to the table type content needs to be converted into a training sample including N-dimensional features for subsequent analysis. Because the number of values of discrete data content is limited, that is, discrete data content is not continuous, the model may understand the semantics of such discontinuous data content with low accuracy. For this reason, when the dimension content of the N data dimensions includes Q continuous data contents and P discrete data contents, the server may encode the P discrete data contents according to the number of values involved in the P discrete data contents. For example, one-hot encoding may be applied to the P discrete data contents during encoding, converting them into P coding features whose represented information is continuous. For the p-th discrete data content among the P discrete data contents, the formula for obtaining its coding feature is as follows:
$$t^{\mathrm{dis}}_p = b^{\mathrm{dis}}_p + W^{\mathrm{dis}}_p e_p \in \mathbb{R}^d$$
where $t^{\mathrm{dis}}_p$ represents the coding feature of the p-th discrete data content, $b^{\mathrm{dis}}_p$ represents the bias of the p-th discrete data content, $W^{\mathrm{dis}}_p$ represents the weight of the p-th discrete data content, $e_p$ represents the one-hot encoding result of the p-th discrete data content, and $\in \mathbb{R}^d$ means that the coding feature of the p-th discrete data content lies in the corresponding d-dimensional space.
Through the formula, for the p-th discrete data content among the P discrete data contents, after one-hot encoding is performed on it, the result can be multiplied by the weight of the p-th discrete data content and added to the bias of the p-th discrete data content to obtain the coding feature of the p-th discrete data content.
Since the Q continuous data contents are continuous, the corresponding Q data features can be determined directly from the Q continuous data contents. For the q-th continuous data content among the Q continuous data contents, the formula for obtaining its data feature is as follows:
$$t^{\mathrm{con}}_q = b^{\mathrm{con}}_q + W^{\mathrm{con}}_q x_q \in \mathbb{R}^d$$
where $t^{\mathrm{con}}_q$ represents the data feature of the q-th continuous data content, $b^{\mathrm{con}}_q$ represents the bias of the q-th continuous data content, $W^{\mathrm{con}}_q$ represents the weight of the q-th continuous data content, $x_q$ represents the q-th continuous data content, and $\in \mathbb{R}^d$ means that the data feature of the q-th continuous data content lies in the corresponding d-dimensional space.
Through the formula, the q-th continuous data content among the Q continuous data contents can be directly multiplied by its weight and added to its bias to obtain the data feature of the q-th continuous data content.
After the P coding features and the Q data features are acquired, since both the coding features and the data features are continuous, the P coding features and the Q data features may be used as N-dimensional features to obtain a training sample including the N-dimensional features.
In order to avoid that the discrete data content influences the accuracy of the semantic understanding of the model, continuous coding features are obtained by coding the discrete data content, and the continuous coding features and the data features corresponding to the continuous data content are used as N-dimensional features, so that information reflected by the N-dimensional features in a training sample is continuous, and the accuracy of the model is improved.
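The two conversions described above (one-hot encoding followed by weight and bias for discrete data content, and weight and bias applied directly for continuous data content) can be sketched as follows. The function names, the example dimensions "grade" and "height", and all parameter values are hypothetical; in training the weights and biases would be learned.

```python
def one_hot(value_index, num_values):
    """One-hot vector whose length equals the number of values the content can take."""
    return [1.0 if k == value_index else 0.0 for k in range(num_values)]

def encode_discrete(value_index, num_values, weight, bias):
    """Coding feature = bias + weight applied to the one-hot vector (length d)."""
    e = one_hot(value_index, num_values)
    return [b + sum(w * x for w, x in zip(row, e)) for row, b in zip(weight, bias)]

def encode_continuous(value, weight, bias):
    """Data feature = bias + weight * value (length d)."""
    return [b + w * value for w, b in zip(weight, bias)]

# Assumed toy parameters with d = 2.
grade_weight = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]  # shape (d, number of grades)
grade_bias = [0.0, 0.5]
height_weight = [2.0, -1.0]
height_bias = [0.1, 0.1]

coding_feature = encode_discrete(1, 3, grade_weight, grade_bias)   # grade index 1 of 3
data_feature = encode_continuous(1.5, height_weight, height_bias)
```

Both results are vectors of the same length d, which is what allows coding features and data features to be placed side by side in one training sample.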
In one possible implementation, generating the training samples in S22 based on the P coding features and Q data features corresponding to the Q continuous data contents respectively includes:
stacking the P coding features and the Q data features adjacently as two wholes to obtain the training sample.
Although the data features and the coding features are both derived from the table class content, a coding feature is obtained by encoding discrete data content, whereas a data feature is obtained by processing continuous data content. The data dimensions corresponding to discrete data content and those corresponding to continuous data content generally have low relevance to each other: the dimension content of another data dimension that is highly relevant to a discrete data dimension is usually also discrete data content, and the dimension content of another data dimension that is highly relevant to a continuous data dimension is usually also continuous data content. Therefore, to facilitate analysis of the relevance between the dimension features of the training sample in subsequent steps, the P coding features and the Q data features may be stacked separately as two wholes to obtain the training sample, so that the P coding features form one whole and the Q data features form another whole within the training sample. The corresponding formula is as follows:
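$$T = \mathrm{stack}\left[t_1^{(\mathrm{num})}, \ldots, t_Q^{(\mathrm{num})}, t_1^{(\mathrm{cat})}, \ldots, t_P^{(\mathrm{cat})}\right] \in \mathbb{R}^{N \times d}$$
where $T$ represents the training sample, $t_1^{(\mathrm{num})}$ represents the data feature of the 1st continuous data content, $t_Q^{(\mathrm{num})}$ represents the data feature of the Q-th continuous data content, $t_1^{(\mathrm{cat})}$ represents the coding feature of the 1st discrete data content, $t_P^{(\mathrm{cat})}$ represents the coding feature of the P-th discrete data content, and $\in \mathbb{R}^{N \times d}$ means that the training sample lies in the corresponding N×d-dimensional space.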
By the above formula, the P coding features and the Q data features can be stacked adjacently as two wholes, yielding a training sample in which the coding features and the data features are stacked separately.
After obtaining the training sample in which the P coding features and the Q data features are stacked adjacently as two wholes, the training sample may also be written in a simpler form, as follows:
$$T = [t_1, t_2, \ldots, t_N]$$
where $T$ represents the training sample, $t_1$ represents the 1st dimension feature in the training sample, $t_2$ represents the 2nd dimension feature in the training sample, and $t_N$ represents the N-th dimension feature in the training sample.
Because the relevance between the data dimensions corresponding to discrete data content and those corresponding to continuous data content is generally low, the P coding features and the Q data features are stacked adjacently as two wholes, yielding a training sample in which the coding features and the data features are stacked separately, so that the relevance between the dimension features in the training sample can be analyzed in subsequent steps.
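The stacking described above can be sketched in a few lines of Python. The sketch assumes that every feature is already a d-dimensional list of floats and that the data features are placed before the coding features; the function name and the example values are hypothetical.

```python
from typing import List

Feature = List[float]


def build_training_sample(data_features: List[Feature],
                          coding_features: List[Feature]) -> List[Feature]:
    """Stack Q data features and P coding features as two adjacent blocks.

    Returns an N x d nested list with N = Q + P; the Q data features come
    first and the P coding features follow, so each group stays contiguous.
    """
    sample = list(data_features) + list(coding_features)
    if not sample:
        raise ValueError("at least one feature is required")
    d = len(sample[0])
    if any(len(f) != d for f in sample):
        raise ValueError("all features must share the same dimension d")
    return sample


# Hypothetical example with Q = 2 continuous and P = 1 discrete column, d = 2.
T = build_training_sample(
    data_features=[[1.1, -1.8], [0.4, 0.0]],
    coding_features=[[1.3, -0.6]],
)
```

Keeping each group contiguous means that the rows of T can be addressed either by group or simply as t_1 through t_N.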
On the basis of the foregoing embodiments corresponding to fig. 1 to 3, fig. 4 is a schematic device diagram of a model training device provided by the embodiment of the present application, where the model training device 400 includes a first determining unit 401, a generating unit 402, a second determining unit 403, a third determining unit 404, and a training unit 405:
a first determining unit 401, configured to determine a training sample according to table class content identifying a target object, where a sample label corresponding to the training sample is used to identify an actual class of the target object, and N-dimensional features included in the training sample are determined according to dimension content of N data dimensions related to the table class content, and N >1;
a generating unit 402, configured to generate, using the training sample as input data of the initial classification model, a sample wave feature of the training sample through an attention sub-model of the initial classification model, where wavelet features corresponding to each dimension of the N-dimensional features are represented in a wave form;
a second determining unit 403 for determining an attention weight of each of the wavelet features in the sample wave features based on the wave similarity between the wavelet features;
a third determining unit 404, configured to determine, according to the wave feature to be predicted generated by the sample wave feature and the attention weight, a prediction category corresponding to the training sample through a predictor model of the initial classification model;
The training unit 405 is configured to perform model training on the initial classification model based on the difference between the predicted class and the actual class, adjust the attention weight through the model training to obtain a classification model, and the classification model is used for performing class recognition on the table class content of the object to be recognized.
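The text does not specify how the difference between the predicted class and the actual class is measured. A common choice, assumed here purely for illustration, is softmax cross-entropy over the class scores produced by the predictor sub-model; all names below are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(scores: List[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


def cross_entropy(scores: List[float], actual_class: int) -> float:
    """Assumed difference measure: negative log-probability of the actual class."""
    return -math.log(softmax(scores)[actual_class])


# Hypothetical predictor output for a 3-class problem.
scores = [0.1, 2.0, -1.0]
loss_if_label_is_1 = cross_entropy(scores, 1)
loss_if_label_is_2 = cross_entropy(scores, 2)
```

The loss is small when the actual class receives the highest score and large otherwise, which is the signal that training would use to adjust the attention weights.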
In one possible implementation, the generating unit 402 is configured to:
aiming at the ith dimension feature in the N dimension features, according to the ith dimension feature, determining the amplitude feature and the phase feature corresponding to the ith dimension feature through the amplitude model parameter and the phase model parameter of the attention sub model of the initial classification model;
and generating wavelet features corresponding to the ith dimension features based on the amplitude features and the phase features corresponding to the ith dimension features.
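One way to picture the amplitude and phase step is to treat each wavelet feature as a vector of complex numbers, with one amplitude and one phase per channel. The sketch below assumes that the amplitude and phase model parameters are plain linear maps and that the wave is formed as amplitude · e^{i·phase}; the text does not confirm either choice, and all names are hypothetical.

```python
import cmath
import math
from typing import List


def linear(x: List[float], w: List[List[float]]) -> List[float]:
    """Compute x @ w for a vector x of length d_in and w of shape (d_in, d_out)."""
    return [sum(x[k] * w[k][c] for k in range(len(x))) for c in range(len(w[0]))]


def wavelet_feature(feature: List[float],
                    amp_w: List[List[float]],
                    phase_w: List[List[float]]) -> List[complex]:
    """Build a wave per channel from an amplitude feature and a phase feature."""
    amplitude = linear(feature, amp_w)   # amplitude feature (assumed linear map)
    phase = linear(feature, phase_w)     # phase feature (assumed linear map)
    # cmath.rect(r, phi) returns r * (cos(phi) + i * sin(phi)).
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


# Hypothetical i-th dimension feature with d = 2 channels.
t_i = [1.0, 2.0]
amp_params = [[1.0, 0.0], [0.0, 1.0]]            # amplitude = [1.0, 2.0]
phase_params = [[0.0, 0.0], [0.0, math.pi / 4]]  # phase = [0.0, pi / 2]
wave_i = wavelet_feature(t_i, amp_params, phase_params)
```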
In one possible implementation, the training unit 405 is configured to:
based on the difference between the predicted category and the actual category, model training is carried out on the initial classification model, and the attention weight, the amplitude model parameter and the phase model parameter are adjusted through model training to obtain the classification model.
In one possible implementation, the phase feature corresponding to the ith dimension feature is determined based on semantic information of the ith dimension feature.
In a possible implementation, the second determining unit 403 is configured to:
Aiming at the ith dimension feature in the N dimension features, determining the sub-attention weights of the wavelet features of the ith dimension feature relative to the wavelet features of the N-1 dimension features according to the wave similarity between the wavelet features of the ith dimension feature and the wavelet features of other N-1 dimension features respectively;
the generating unit 402 is further configured to:
and aiming at the ith dimension feature, generating a wavelet feature to be predicted corresponding to the ith dimension feature in the wave features to be predicted according to the sub-attention weights and the wavelet features of the N-1 dimension features respectively associated with the sub-attention weights.
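The following sketch illustrates how sub-attention weights could be derived from wave similarity and then used to fuse the other N-1 wavelet features. The similarity measure (real part of the complex inner product) and the softmax normalization are assumptions made for illustration, since the text does not fix either; the function names are hypothetical.

```python
import math
from typing import Dict, List

Wave = List[complex]


def wave_similarity(a: Wave, b: Wave) -> float:
    """Assumed similarity: real part of the inner product of two waves."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b))


def sub_attention_weights(waves: List[Wave], i: int) -> Dict[int, float]:
    """Softmax over the similarities between wave i and every other wave."""
    others = [j for j in range(len(waves)) if j != i]
    sims = [wave_similarity(waves[i], waves[j]) for j in others]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return {j: e / total for j, e in zip(others, exps)}


def wave_to_predict(waves: List[Wave], i: int) -> Wave:
    """Weighted sum of the other N-1 waves using the sub-attention weights."""
    weights = sub_attention_weights(waves, i)
    d = len(waves[i])
    return [sum(weights[j] * waves[j][c] for j in weights) for c in range(d)]


# Hypothetical N = 3 wavelet features with a single channel each:
# wave 1 is in phase with wave 0, wave 2 is in anti-phase.
waves = [[1 + 0j], [1 + 0j], [-1 + 0j]]
weights_0 = sub_attention_weights(waves, 0)
fused_0 = wave_to_predict(waves, 0)
```

Under this assumed measure, waves that are in phase with the i-th wave receive larger weights than waves in anti-phase, and no extra comparison parameters are needed beyond the waves themselves.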
In a possible implementation manner, the third determining unit 404 is configured to:
and determining the prediction category corresponding to the training sample through a predictor model of the initial classification model according to the wave characteristics to be predicted and the training sample.
In one possible implementation, there are M attention sub-models, M > 1, where the output data of the j-th attention sub-model is the input data of the (j+1)-th attention sub-model, and the output data of the M-th attention sub-model is the wave feature to be predicted, j ≥ 1.
In one possible implementation, the initial classification model further includes a downsampling sub-model disposed between the k-th attention sub-model and the (k+1)-th attention sub-model of the M attention sub-models,
the output data of the k-th attention sub-model is the input data of the downsampling sub-model, the output data of the downsampling sub-model is the input data of the (k+1)-th attention sub-model, the downsampling sub-model is used for downsampling its input data, and through the downsampling the number of channels of the output data of the downsampling sub-model is smaller than that of the input data of the downsampling sub-model.
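The chaining of attention sub-models with a downsampling sub-model in between can be sketched as a simple pipeline of callables. The attention sub-model here is an identity placeholder, and the downsampler is assumed to be a fixed linear projection that averages adjacent channel pairs; both are hypothetical stand-ins chosen only to show the data flow and the channel reduction.

```python
from typing import Callable, List

Features = List[List[float]]            # N rows, each with some number of channels
Stage = Callable[[Features], Features]


def make_downsampler(proj: List[List[float]]) -> Stage:
    """Return a stage that projects every row from d_in to d_out < d_in channels."""
    d_in, d_out = len(proj), len(proj[0])
    if d_out >= d_in:
        raise ValueError("downsampling must reduce the channel count")

    def downsample(features: Features) -> Features:
        return [[sum(row[k] * proj[k][c] for k in range(d_in)) for c in range(d_out)]
                for row in features]

    return downsample


def placeholder_attention(features: Features) -> Features:
    """Stand-in for an attention sub-model; simply copies its input."""
    return [list(row) for row in features]


def run_stages(x: Features, stages: List[Stage]) -> Features:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        x = stage(x)
    return x


# Hypothetical M = 2 attention sub-models with a 4 -> 2 channel downsampler
# placed between them (k = 1).
pair_average = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
stages = [placeholder_attention, make_downsampler(pair_average), placeholder_attention]
output = run_stages([[1.0, 3.0, 5.0, 7.0], [2.0, 2.0, 4.0, 4.0]], stages)
```

Reducing the channel count part-way through the chain means that every later attention sub-model operates on smaller inputs.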
In a possible implementation manner, the first determining unit 401 is configured to:
when the dimension content of N data dimensions related to the table type content comprises Q continuous data contents and P discrete data contents, P coding features are obtained by coding the P discrete data contents according to the number of values related to the P discrete data contents;
and generating training samples based on the P coding features and Q data features corresponding to the Q continuous data contents respectively.
In a possible implementation manner, the first determining unit 401 is configured to:
stacking the P coding features and the Q data features adjacently as two wholes to obtain the training sample.
It can be seen that, for a target object whose actual class has been determined, a training sample is determined according to the table class content of the target object. Since the table class content relates to the dimension content of N data dimensions, the determined training sample includes N-dimensional features corresponding to the N data dimensions. To reduce the adverse effect of excessive feature dimensions on model training, a sample wave feature corresponding to the training sample can be generated through the attention sub-model of the initial classification model, and the wavelet feature corresponding to each dimension of the N-dimensional features is represented in the sample wave feature in the form of a wave. Therefore, when the initial classification model learns the attention weight of each dimension based on the training sample, the wavelet features share a unified wave form, so no additional model parameters need to be introduced to analyze and compare the dimension features. The relevance among the dimension features can be measured directly from the wavelet features in terms of wave similarity, which in turn measures the importance of each dimension feature, so the attention weight of each wavelet feature in the sample wave feature can be determined quickly and accurately. When many of the N wavelet features have high wave similarity to one another, that is, many of the N dimension features are highly relevant to one another, those dimension features identify the target object more consistently and better reflect information relating the target object to its class; their importance is therefore higher than that of the other dimension features in the training sample. The attention weights determined in this way can thus effectively guide the model to learn effective knowledge related to the classification. After the wave feature to be predicted is obtained by fusing the global information of the training sample based on the attention weights, the corresponding prediction category is determined through the initial classification model, and the attention weights are adjusted according to the difference between the prediction category and the actual category of the training sample, yielding a classification model that can be used for class recognition of table class content. Because representing each dimension feature in wave form is a feature representation that makes the relevance and importance of the dimension features easier to determine, complex and redundant model parameters for determining the attention weights need not be set in the initial classification model, which effectively reduces model size and training time and improves model training efficiency.
The embodiment of the application also provides a computer device, which is the computer device introduced above, and can comprise a terminal device or a server, and the model training device can be configured in the computer device. The computer device is described below with reference to the accompanying drawings.
If the computer device is a terminal device, please refer to fig. 5, an embodiment of the present application provides a terminal device, taking the terminal device as a mobile phone as an example:
fig. 5 is a block diagram showing a part of the structure of a mobile phone related to a terminal device provided by an embodiment of the present application. Referring to fig. 5, the mobile phone includes: radio Frequency (RF) circuitry 1410, memory 1420, input unit 1430, display unit 1440, sensor 1450, audio circuitry 1460, wireless fidelity (WiFi) module 1470, processor 1480, and power supply 1490. Those skilled in the art will appreciate that the handset configuration shown in fig. 5 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or may be arranged in a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 5:
The RF circuit 1410 may be used for receiving and transmitting signals during messaging or a call; in particular, after downlink information from a base station is received, it is handed to the processor 1480 for processing; in addition, uplink data is sent to the base station.
The memory 1420 may be used to store software programs and modules, and the processor 1480 performs various functional applications and data processing of the cellular phone by executing the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a storage program area that may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and a storage data area; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 1430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1430 may include a touch panel 1431 and other input devices 1432.
The display unit 1440 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit 1440 may include a display panel 1441.
The handset can also include at least one sensor 1450, such as a light sensor, motion sensor, and other sensors.
Audio circuitry 1460, speaker 1461, microphone 1462 may provide an audio interface between the user and the handset.
WiFi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive emails, browse webpages, access streaming media and the like through a WiFi module 1470, so that wireless broadband Internet access is provided for the user.
The processor 1480 is a control center of the handset, connects various parts of the entire handset using various interfaces and lines, performs various functions of the handset and processes data by running or executing software programs and/or modules stored in the memory 1420, and invoking data stored in the memory 1420.
The handset also includes a power supply 1490 (e.g., a battery) that provides power to the various components.
In this embodiment, the processor 1480 included in the terminal apparatus also has the following functions:
determining a training sample according to table class content of the identified target object, wherein a sample label corresponding to the training sample is used for identifying the actual class of the target object, and N-dimensional characteristics included in the training sample are determined according to dimension content of N data dimensions related to the table class content, wherein N is more than 1;
The training sample is used as input data of an initial classification model, and sample wave characteristics of the training sample are generated through an attention sub-model of the initial classification model, wherein wavelet characteristics corresponding to each dimension of N-dimensional characteristics are represented in a wave form in the sample wave characteristics;
determining the attention weight of each wavelet feature in the sample wave features according to the wave similarity among the wavelet features;
according to the wave characteristics to be predicted generated by the sample wave characteristics and the attention weights, determining a prediction category corresponding to the training sample through a predictor model of the initial classification model;
based on the difference between the predicted category and the actual category, model training is carried out on the initial classification model, the attention weight is adjusted through the model training to obtain a classification model, and the classification model is used for carrying out category recognition on the table category content of the object to be recognized.
If the computer device is a server, refer to fig. 6, which is a block diagram of a server 1500 according to the embodiment of the present application. The server 1500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1522 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may be transitory or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1522 may be configured to communicate with the storage medium 1530 and to execute, on the server 1500, the series of instruction operations in the storage medium 1530.
The server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 6.
In addition, the embodiment of the application also provides a storage medium for storing a computer program for executing the method provided by the embodiment.
The present application also provides a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method provided by the above embodiments.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, where the above program may be stored in a computer readable storage medium, and when the program is executed, the program performs steps including the above method embodiments; and the aforementioned storage medium may be at least one of the following media: read-only Memory (ROM), RAM, magnetic disk or optical disk, and the like, on which a computer program can be stored.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment is mainly described in a different point from other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, with reference to the description of the method embodiments in part. The apparatus and system embodiments described above are merely illustrative, in which elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
The foregoing is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be included in the scope of the present application. Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.
Claims (14)
1. A method of model training, the method comprising:
determining a training sample according to table class content of an identified target object, wherein a sample label corresponding to the training sample is used for identifying the actual class of the target object, and N-dimensional characteristics included in the training sample are determined according to dimension content of N data dimensions related to the table class content, wherein N is more than 1;
the training sample is used as input data of an initial classification model, and sample wave characteristics of the training sample are generated through an attention sub-model of the initial classification model, wherein wavelet characteristics corresponding to each dimension of the N-dimension characteristics are represented in a wave form;
determining an attention weight of each of the wavelet features in the sample wave feature according to the wave similarity among the wavelet features;
according to the wave characteristics to be predicted generated by the sample wave characteristics and the attention weights, determining a prediction category corresponding to the training sample through a predictor model of the initial classification model;
based on the difference between the predicted category and the actual category, performing model training on the initial classification model, and adjusting the attention weight through the model training to obtain a classification model, wherein the classification model is used for carrying out category recognition on table category contents of an object to be recognized.
2. The method of claim 1, wherein generating sample wave features of the training sample for an i-th dimension feature of the N-dimensional features by the attention sub-model of the initial classification model comprises:
according to the ith dimension feature, determining an amplitude feature and a phase feature corresponding to the ith dimension feature through an amplitude model parameter and a phase model parameter of the attention sub-model of the initial classification model;
and generating wavelet features corresponding to the ith dimension features based on the amplitude features and the phase features corresponding to the ith dimension features.
3. The method of claim 2, wherein the model training the initial classification model based on the difference between the predicted class and the actual class, adjusting the attention weight through the model training to obtain a classification model, comprises:
based on the difference between the predicted category and the actual category, model training is performed on the initial classification model, and the attention weight, the amplitude model parameter and the phase model parameter are adjusted through the model training to obtain a classification model.
4. The method of claim 2, wherein the phase feature corresponding to the i-th dimensional feature is determined based on semantic information of the i-th dimensional feature.
5. The method of claim 1, wherein said determining the attention weight of each of said wavelet features in said sample wave feature based on the wave similarity between said wavelet features for an i-th one of the N-dimensional features comprises:
determining sub-attention weights of the wavelet features of the ith dimension feature relative to the wavelet features of the N-1 dimension feature according to wave similarity between the wavelet features of the ith dimension feature and the wavelet features of other N-1 dimension features respectively;
for the i-th dimensional feature, the method further comprises:
and generating the wavelet characteristics to be predicted corresponding to the ith dimension characteristic in the wave characteristics to be predicted according to the sub-attention weights and the wavelet characteristics of the N-1 dimension characteristics respectively associated with the sub-attention weights.
6. The method according to claim 1, wherein the determining, by a predictor model of the initial classification model, a prediction category corresponding to the training sample according to a wave feature to be predicted generated by the sample wave feature and the attention weight comprises:
and determining a prediction category corresponding to the training sample through a predictor model of the initial classification model according to the wave characteristics to be predicted and the training sample.
7. The method according to claim 1, wherein there are M attention sub-models, M > 1, wherein the output data of the j-th attention sub-model is the input data of the (j+1)-th attention sub-model, and the output data of the M-th attention sub-model is the wave feature to be predicted, j ≥ 1.
8. The method of claim 7, wherein the initial classification model further comprises a downsampling sub-model disposed between a k-th attention sub-model and a (k+1)-th attention sub-model of the M attention sub-models,
the output data of the k-th attention sub-model is the input data of the downsampling sub-model, the output data of the downsampling sub-model is the input data of the (k+1)-th attention sub-model, the downsampling sub-model is used for downsampling its input data, and through the downsampling the number of channels of the output data of the downsampling sub-model is smaller than that of the input data of the downsampling sub-model.
9. The method according to any one of claims 1-8, wherein the dimension content of the N data dimensions related to the table class content includes Q continuous data contents and P discrete data contents, and the determining the training sample according to the table class content identifying the target object includes:
Coding the P discrete data contents according to the number of the numerical values related to the P discrete data contents to obtain P coding features;
and generating the training sample based on the P coding features and Q data features respectively corresponding to the Q continuous data contents.
10. The method of claim 9, wherein the generating the training sample based on the P coding features and Q data features respectively corresponding to the Q continuous data contents comprises:
stacking the P coding features and the Q data features adjacently as two wholes to obtain the training sample.
11. A model training device, which is characterized in that the device comprises a first determining unit, a generating unit, a second determining unit, a third determining unit and a training unit:
the first determining unit is configured to determine a training sample according to table class content identifying a target object, where a sample tag corresponding to the training sample is used to identify an actual class of the target object, and N-dimensional features included in the training sample are determined according to dimension content of N data dimensions related to the table class content, where N is greater than 1;
The generating unit is used for taking the training sample as input data of an initial classification model, and generating sample wave characteristics of the training sample through an attention sub-model of the initial classification model, wherein wavelet characteristics corresponding to each dimension of the N-dimension characteristics are represented in a wave form in the sample wave characteristics;
the second determining unit is used for determining the attention weight of each wavelet feature in the sample wave features according to the wave similarity among the wavelet features;
the third determining unit is configured to determine, according to the wave feature to be predicted generated by the sample wave feature and the attention weight, a prediction category corresponding to the training sample through a predictor model of the initial classification model;
the training unit is used for carrying out model training on the initial classification model based on the difference between the prediction category and the actual category, adjusting the attention weight through the model training to obtain a classification model, and the classification model is used for carrying out category recognition on the table category content of the object to be recognized.
12. A computer device, the computer device comprising a processor and a memory:
The memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the method of any of claims 1-10 according to the computer program.
13. A computer readable storage medium for storing a computer program which, when executed by a computer device, implements the method of any one of claims 1-10.
14. A computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310572666.3A CN117216542A (en) | 2023-05-19 | 2023-05-19 | Model training method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117216542A (en) | 2023-12-12 |
Family
ID=89048543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310572666.3A Pending CN117216542A (en) | 2023-05-19 | 2023-05-19 | Model training method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117216542A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117932497A (en) * | 2024-03-19 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Model determination method and related device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117932497A (en) * | 2024-03-19 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Model determination method and related device |
CN117932497B (en) * | 2024-03-19 | 2024-06-25 | 腾讯科技(深圳)有限公司 | Model determination method and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111554268B (en) | Language identification method based on language model, text classification method and device | |
US11055567B2 (en) | Unsupervised exception access detection method and apparatus based on one-hot encoding mechanism | |
CN109885756B (en) | CNN and RNN-based serialization recommendation method | |
US20210005183A1 (en) | Orthogonally constrained multi-head attention for speech tasks | |
CN110598620B (en) | Deep neural network model-based recommendation method and device | |
Zeng et al. | Spatial and temporal learning representation for end-to-end recording device identification | |
US20230206928A1 (en) | Audio processing method and apparatus | |
CN111666416A (en) | Method and apparatus for generating semantic matching model | |
Aziguli et al. | A robust text classifier based on denoising deep neural network in the analysis of big data | |
Kopčan et al. | Anomaly detection using autoencoders and deep convolution generative adversarial networks | |
CN117113146A (en) | Main body classification processing method, related device and medium | |
US20220207353A1 (en) | Methods and systems for generating recommendations for counterfactual explanations of computer alerts that are automatically detected by a machine learning algorithm | |
EA038264B1 (en) | Method of creating model for analysing dialogues based on artificial intelligence for processing user requests and system using such model | |
CN117216542A (en) | Model training method and related device | |
HajiAkhondi-Meybodi et al. | Vit-cat: Parallel vision transformers with cross attention fusion for popularity prediction in mec networks | |
CN116049650A (en) | RFSFD-T network-based radio frequency signal fingerprint identification method and system | |
CN116090536A (en) | Neural network optimization method, device, computer equipment and storage medium | |
CN113160823B (en) | Voice awakening method and device based on impulse neural network and electronic equipment | |
CN115130650A (en) | Model training method and related device | |
Nag et al. | CNN based approach for post disaster damage assessment | |
Zhang et al. | Small-footprint keyword spotting based on gated Channel Transformation Sandglass residual neural network | |
Serghini et al. | 1-D Convolutional Neural Network-Based Models for Cooperative Spectrum Sensing | |
Li et al. | [Retracted] Visual Information Features and Machine Learning for Wushu Arts Tracking | |
Zeng et al. | End-to-end Recording Device Identification Based on Deep Representation Learning | |
CN112733930B (en) | Human behavior sensing system, method and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||