WO2021115159A1 - Text recognition network model training method, text recognition method, device, terminal, and computer storage medium - Google Patents
Text recognition network model training method, text recognition method, device, terminal, and computer storage medium Download PDF Info
- Publication number
- WO2021115159A1 WO2021115159A1 PCT/CN2020/133116 CN2020133116W WO2021115159A1 WO 2021115159 A1 WO2021115159 A1 WO 2021115159A1 CN 2020133116 W CN2020133116 W CN 2020133116W WO 2021115159 A1 WO2021115159 A1 WO 2021115159A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attention
- feature
- channel
- picture
- input
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- The embodiments of the present application relate to the field of computer vision technology, and more specifically, to a text recognition network model training method, a text recognition method, a device, a terminal, and a computer storage medium.
- Handwritten Chinese character recognition has long been an active and challenging research direction in the field of computer vision. It has been studied since the 1960s and has made great progress, and many real-life applications are closely related to it, such as mail sorting, bank check reading, and the transcription of books and handwritten notes. Despite extensive research, handwritten Chinese character recognition remains a very challenging task. On the one hand, Chinese has a large number of character categories and many visually similar characters, which are easy to confuse; on the other hand, writing styles differ greatly between people, so even characters of the same class can look markedly different, which makes handwritten Chinese character recognition very difficult.
- the embodiments of the present application provide a text recognition network model training method, a text recognition method, a device, a terminal and a computer storage medium thereof, which can improve the accuracy of visually confusing text recognition.
- The embodiment of the present application provides a method for training a text recognition network model, which includes the following steps: standardize each picture in the original data set and label each picture with its character category to obtain a standard training data set with character category labels; input each picture in the standard training data set into a convolutional neural network, extract the convolutional features of the picture, and obtain a depth feature map containing the convolutional features; input the depth feature map into an attention mechanism module with multiple channels to obtain the attention weight of each channel, and rescale each channel of the depth feature map using the attention weights to obtain multiple attention feature maps; input each attention feature map separately into a fully connected layer to obtain multiple attention feature vectors; fuse the multiple attention feature vectors and input the result to the character-class fully connected layer for character category prediction; and, according to the character category prediction result and the character category labels, design a target loss function and iterate with the backpropagation algorithm to minimize the target loss function and optimize the attention weights.
- An embodiment of the present application provides a text recognition method, which includes: standardizing a picture to be tested and scaling it to a preset height H and a preset width W; inputting the picture to be tested into a convolutional neural network and extracting the convolutional features of the picture to be tested to obtain a depth feature map containing the convolutional features; inputting the depth feature map into an attention mechanism module with multiple channels to obtain the attention weight of each channel, and rescaling each channel of the depth feature map using the attention weights to obtain multiple attention feature maps; inputting each attention feature map separately into a fully connected layer to obtain multiple attention feature vectors; and fusing the multiple attention feature vectors and inputting the result to the character-class fully connected layer for character category prediction.
- An embodiment of the present application provides a text recognition network model training device, including: a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the text recognition network model training method as described in the embodiment of the second aspect is implemented.
- An embodiment of the present application provides a character recognition device, including: a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the character recognition method described in the embodiment of the third aspect is implemented.
- an embodiment of the present application provides a terminal, which includes the text recognition network model training device as described in the foregoing fourth aspect or includes the text recognition device as described in the fifth aspect.
- An embodiment of the present application provides a computer storage medium that stores computer-executable instructions, where the computer-executable instructions are used to execute the text recognition network model training method described in the embodiment of the second aspect or to perform the character recognition method described in the embodiment of the third aspect.
- FIG. 1 is a schematic diagram of the process of a text recognition network model training method and a text recognition method provided by an embodiment of the present application;
- FIG. 2 is a flowchart of a method for training a text recognition network model provided by an embodiment of the present application
- FIG. 3 is a network structure diagram of a text recognition network model provided by an embodiment of the present application, where "CA" represents a channel attention mechanism (Channel Attention);
- Fig. 4 is a structural diagram of a convolutional neural network provided by an embodiment of the present application.
- FIG. 5 is a structural diagram of an attention mechanism module provided by an embodiment of the present application.
- FIG. 6 is a flowchart of a character recognition method provided by another embodiment of the present application.
- FIG. 7 is a structural diagram of a text recognition network model training device provided by another embodiment of the present application.
- FIG. 8 is a structural diagram of a character recognition device provided by another embodiment of the present application.
- One existing method proposes a handwritten Chinese character recognition approach based on a recurrent neural network (RNN) and an attention mechanism. This method uses a residual convolutional neural network as the backbone network and uses an RNN to iteratively update the attention distribution to correct character predictions.
- This method can use the attention mechanism to locate the local area of characters to recognize visually similar Chinese characters.
- This method has two main shortcomings. First, because it iteratively updates the attention distribution, each step depends heavily on the prediction result of the previous iteration, so initial errors may accumulate and the improvement in recognition accuracy is limited. Second, the method requires multiple RNN iterations, which makes training slow and the process complicated, because the internal mechanism of the RNN prevents it from making full use of GPU parallel computing, and problems such as vanishing or exploding gradients are prone to occur during backpropagation.
- this application provides a text recognition network model training method, text recognition method, device, terminal, and computer storage medium.
- Features of the input picture are first extracted by the convolutional neural network, distinguishing attention features are then obtained through the attention mechanism module, and the character category prediction result is obtained after feature fusion.
- A loss function is also designed according to the character category labels of the input pictures and the character category prediction results, and the attention weights are optimized, thereby improving the accuracy of text recognition and making the recognition of difficult samples more robust.
- FIG. 1 is a schematic flowchart of a text recognition network model training method and a text recognition method provided by an embodiment of the present application, wherein the solid arrow represents the training step, and the dashed arrow represents the recognition step.
- The text recognition network model includes a deep convolutional neural network, a multi-channel attention mechanism module, a contrastive attention feature learning branch, and a multi-attention feature fusion module.
- A deep convolutional neural network is a neural network that can be used for classification. The network is mainly composed of convolutional layers and pooling layers. The convolutional layers extract image features; the role of the pooling layers is to reduce the dimensionality of the feature vectors output by the convolutional layers and to reduce overfitting.
- the parameters in the network can be updated through the back propagation algorithm.
- the deep convolutional neural network is composed of 14 convolutional layers and 4 pooling layers.
- The attention mechanism module imitates the way humans observe things. Generally speaking, when people look at a picture, they do not only grasp the image as a whole but also pay more attention to some local information in the picture, such as the position of a table or the type of goods. In the field of computer vision, the essence of the attention mechanism is to select, from the input information, the information that needs more attention and to extract features from the key parts.
- Introducing the attention mechanism can, on the one hand, increase the expressive ability of the model without increasing its complexity; on the other hand, the attention mechanism selects only the input information that is important to the model for processing, which can improve the efficiency of the neural network.
- Contrastive attention feature learning branch: extracting only the global features of an image can classify general objects well, but the fine-grained classification problem of handwritten Chinese characters requires attention to the distinguishing local features of the characters. The purpose of contrastive attention feature learning is to let the multi-channel attention mechanism modules locate multiple local regions of the input sample and to train under the supervision of the contrast loss function and the regional center loss function to obtain dispersed attention regions, so that the model is more likely to locate the distinguishing features of characters, thereby reducing the recognition error rate of visually similar characters.
- an embodiment of the present application proposes a text recognition network model training method, which includes the following steps:
- Step S100: standardize each picture in the original data set and label each picture with its character category to obtain a standard training data set with character category labels;
- Step S200: input each picture in the standard training data set into a convolutional neural network, extract the convolutional features of the picture, and obtain a depth feature map containing the convolutional features;
- Step S300: input the depth feature map into the attention mechanism module with multiple channels to obtain the attention weight of each channel, and rescale each channel of the depth feature map using the attention weights to obtain multiple attention feature maps;
- Step S400: input each attention feature map into a fully connected layer to obtain multiple attention feature vectors;
- Step S500: fuse the multiple attention feature vectors and input the result to the character-class fully connected layer for character category prediction;
- Step S600: according to the character category prediction result and the character category labels, design a target loss function and iterate with the backpropagation algorithm to minimize the target loss function and optimize the attention weights.
- The convolutional neural network includes 2 convolutional layers (conv1, conv2) and 4 convolution modules (Conv-Block). Each Conv-Block is a "bottleneck" structure of 3 convolutional layers, in which the middle layer has fewer channels than the layers before and after it. The Conv-Blocks are connected by max pooling layers with a stride of 2, each of which halves the resolution of the input feature map. After the 4 convolution modules (Conv-Block), the output is a depth feature map X_i of size 6*6*448; these depth feature maps X_i contain high-level semantic information obtained through the 14 convolutional layers.
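The shape bookkeeping described above (a 64-channel stem, one stride-2 pooling per stage, a final 6*6*448 depth feature map) can be sketched as a small shape-tracing function. The intermediate block widths below are illustrative assumptions; the text only fixes the stem width (64) and the final shape (6*6*448).

```python
# Shape bookkeeping for the backbone described in the text:
# 2 stem convs -> 96x96x64, then 4 stride-2 pooling stages interleaved with
# 4 bottleneck Conv-Blocks of 3 conv layers each (2 + 4*3 = 14 conv layers),
# ending at the 6x6x448 depth feature map X_i.
def backbone_shapes(h=96, w=96):
    shape = (h, w, 64)                      # after conv1, conv2 + BN + ReLU
    trace = [shape]
    block_channels = [112, 224, 336, 448]   # assumed widths; only 448 is stated
    for c in block_channels:
        h, w = shape[0] // 2, shape[1] // 2  # stride-2 max pool halves resolution
        shape = (h, w, c)                    # Conv-Block sets the channel count
        trace.append(shape)
    return trace

trace = backbone_shapes()
assert trace[-1] == (6, 6, 448)              # depth feature map X_i
```

This only checks that the stated resolutions are mutually consistent (96 halves to 6 in exactly four pooling steps); it is not the network itself.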
- Step S300 comprises: taking the 6*6*448 depth feature map X_i output by the last convolution module (Conv-Block) as input, feeding it to S attention mechanism modules with multiple channels, and computing the attention feature maps. In this embodiment the value of S is 2; the attention mechanism module draws on the channel attention mechanism introduced by the SENet method.
- σ is the Sigmoid function
- δ is the ReLU function
- r is the channel compression ratio
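The SE-style channel attention described above (global average pooling to a channel descriptor, a gated bottleneck with ReLU and Sigmoid, then per-channel rescaling) can be sketched in numpy. The weight matrices `W1`, `W2` stand in for the learned gating parameters and follow the usual SENet shapes for a compression ratio r; this is a sketch of the mechanism, not the patent's exact module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(X, W1, W2):
    """SE-style channel attention on a depth feature map X of shape (H, W, C).

    z: channel descriptor from global average pooling over the HxW spatial dims.
    a = sigmoid(W2 @ relu(W1 @ z)): gating with channel compression ratio r,
    i.e. W1 has shape (C/r, C) and W2 has shape (C, C/r).
    Returns the rescaled attention feature map and the attention weights.
    """
    z = X.mean(axis=(0, 1))                      # channel descriptor, shape (C,)
    a = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))    # attention weights in (0, 1)
    return X * a, a                              # rescale each channel of X
```

Each of the S attention modules would hold its own (W1, W2) pair, producing S differently rescaled attention feature maps from the same X_i.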
- Step S400 specifically includes: inputting the multiple attention feature maps obtained in step S300 into the contrastive attention feature learning branch to extract the attention features of the local distinguishing regions; that is, each attention feature map is input separately into a fully connected layer containing 768 neurons. The operator F_flatt(·) flattens a matrix into a 1-dimensional vector, and [·] represents the concatenation operation. Y_i represents the scores of picture I_i over the 3755 character classes, and the class with the highest score is the character category prediction result.
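The flatten-project-concatenate-classify pipeline just described can be sketched as follows. The weight matrices are stand-ins for the learned fully connected layers (768-dimensional attention vectors and 3755 classes in the text; tiny sizes work the same way), so this is a sketch of the data flow, not the trained model.

```python
import numpy as np

def predict_class(att_maps, W_fc, W_cls):
    """Flatten each attention feature map, project it through a fully connected
    layer to get the attention feature vectors f_i^s, concatenate them
    ([.] cascade), and score the character classes with a softmax layer.
    """
    # F_flatt: flatten each (H, W, C) map to a 1-D vector, then FC -> f_i^s
    feats = [W_fc @ m.reshape(-1) for m in att_maps]
    fused = np.concatenate(feats)              # [f_i^1, ..., f_i^S]
    logits = W_cls @ fused                     # character-class fully connected layer
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()                     # softmax over character classes
    return int(np.argmax(scores)), scores      # highest-scoring class wins
```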
- Step S600 specifically includes: taking the character category label gt as the expected output of the network model and the prediction result as the predicted output of the network model, designing a target loss function between the expected output and the predicted output, and minimizing the cross-entropy loss function L_cls during training to ensure that each attention feature map can locate regions that are important for character classification. For the contrastive attention feature learning branch, the multiple attention features obtained in step S300 are taken as input, and metric learning loss functions, namely the contrast loss function and the regional center loss function, are used to make the attention feature maps of the network model focus on different distinguishing regions of the input picture. Specifically, the contrast loss function is applied to the attention features to capture separable attention regions.
- L_cls is the cross-entropy loss function
- L_center is the regional center loss function, used to reduce the distance between the attention features of characters of the same class
- L_contra is the contrast loss function, which pushes the multiple attention feature vectors f_i^s of the picture I_i apart
- λ is a hyperparameter used to control the relative weight of the two loss functions
- the contrast loss function is defined as:
- D(I i ) is defined as:
- m is the preset threshold
- The purpose of the contrast loss function is to push the multiple attention feature vectors f_i^s of the input picture I_i apart in the high-dimensional space, so that the distance between any two vectors is greater than the preset threshold m. In this implementation m is set to 40, which ensures that the local character features located by each attention feature map differ, so that the text recognition network model is more likely to discover the distinguishing features of a character.
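The source defines the contrast loss only in an equation image that is not reproduced here, so the sketch below is one plausible reading of "push the vectors apart until their pairwise distance exceeds m": a margin hinge over pairwise Euclidean distances between the attention feature vectors of one picture.

```python
import numpy as np

def contrast_loss(feats, m=40.0):
    """Margin-based contrast loss over the S attention feature vectors f_i^s of
    one picture I_i: penalize any pair of vectors closer than the threshold m.

    Assumed form (the exact formula in the source is elided): a hinge
    max(0, m - ||f^s - f^t||) summed over all pairs s < t.
    """
    S = len(feats)
    loss = 0.0
    for s in range(S):
        for t in range(s + 1, S):
            d = np.linalg.norm(feats[s] - feats[t])
            loss += max(0.0, m - d)   # zero once the pair is farther apart than m
    return loss
```

With m = 40 as in the text, the loss vanishes once every pair of attention vectors is at least 40 apart, which is the stated goal of separating the located local features.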
- the regional center loss function is defined as:
- The regional center loss function is used to reduce the distance between the attention features of characters of the same class, so that the multiple attention features learned for the same character class are similar to each other and each attention feature map is activated at the same character part. Here the center of the s-th attention feature of class y_i has dimension d, the dimension of the feature; the attention feature centers are initialized with a Gaussian distribution with mean 0 and variance 1, and the feature centers are then updated according to the regional center loss function algorithm.
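The source's regional center loss formula also appears only as an image, so the sketch below follows the standard center-loss recipe that the description matches: pull each attention feature toward the running per-class, per-slot center, with centers initialized from N(0, 1) and updated toward the batch mean. The 1/2 squared-distance form and the update step size alpha are assumptions.

```python
import numpy as np

def region_center_loss(feats, labels, centers, slot):
    """Regional center loss for one attention slot s: average half squared
    distance between each attention feature and the center of its class for
    that slot. centers[slot] has shape (num_classes, d) and is initialized
    from N(0, 1) as described in the text.
    """
    diffs = feats - centers[slot][labels]        # f_i^s minus its class center
    return 0.5 * np.sum(diffs ** 2) / len(feats)

def update_centers(feats, labels, centers, slot, alpha=0.5):
    """Move each used class center a damped step toward its features' mean
    (standard center-loss update; the source does not spell out its rule)."""
    for c in np.unique(labels):
        mask = labels == c
        delta = (feats[mask] - centers[slot][c]).sum(axis=0) / (1 + mask.sum())
        centers[slot][c] += alpha * delta
    return centers
```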
- The backpropagation algorithm is used to iterate, and the cross-entropy loss function is minimized during training to obtain the optimized network model.
- the original data set is used for iterative training during the training process to obtain the parameters of the network model.
- an embodiment of the present application proposes a text recognition method, which uses a text recognition network model trained in the foregoing embodiment of the present application to recognize handwritten Chinese character pictures, including the following steps:
- Step A100: standardize the picture I_i to be tested and scale it to a preset height H and a preset width W;
- Step A200: input the picture I_i to be tested into the convolutional neural network and extract its convolutional features to obtain a depth feature map X_i containing the convolutional features;
- Step A300: input the depth feature map X_i into the attention mechanism module with multiple channels to obtain the attention weight of each channel, and rescale each channel of the depth feature map X_i using the attention weights to obtain multiple attention feature maps;
- Step A400: input each attention feature map into a fully connected layer separately to obtain multiple attention feature vectors f_i^s;
- Step A500: fuse the multiple attention feature vectors f_i^s and input the result to the character-class fully connected layer for character category prediction.
- Step A200 specifically includes: the convolutional neural network includes 2 convolutional layers (conv1, conv2) and 4 convolution modules. The picture I_i to be tested is input into the 2 convolutional layers (conv1, conv2), each of which is followed by a batch normalization layer (Batch Normalization, BN) and the nonlinear activation function ReLU, yielding a feature map of size 96*96*64. This feature map is then input into a max pooling layer with a stride of 2 and downsampled to a 48*48*64 feature map, which is then input into the 4 convolution modules (Conv-Block), each composed of 3 convolutional layers. The output is a depth feature map X_i of size 6*6*448, which contains high-level semantic information obtained through the 14 convolutional layers.
- σ is the Sigmoid function
- δ is the ReLU function
- r is the channel compression ratio
- Step A400 specifically includes: inputting the multiple attention feature maps obtained in step A300 into the contrastive attention feature learning branch to extract the attention features of the local distinguishing regions; that is, each attention feature map is input separately into a fully connected layer containing 768 neurons. The operator F_flatt(·) flattens a matrix into a 1-dimensional vector, and [·] represents the concatenation operation. Y_i represents the corresponding scores of the picture I_i to be tested over the 3755 classes of Chinese characters, and the class with the highest score is the character category prediction result.
- An embodiment of the present application provides a text recognition network model training device 100, including: a memory 101, a processor 102, and a computer program stored in the memory and runnable on the processor. When the processor executes the computer program, the text recognition network model training method in the foregoing embodiment is implemented, for example, steps S100 to S600 of the method in FIG. 2 described above are executed.
- the processor 102 and the memory 101 may be connected by a bus or in other ways. In FIG. 7, the connection by a bus is taken as an example.
- An embodiment of the present application provides a text recognition device 200, including: a memory 201, a processor 202, and a computer program stored in the memory and runnable on the processor. When the processor executes the computer program, the character recognition method in the above embodiment is implemented, for example, steps A100 to A500 of the method in FIG. 6 described above are executed.
- the processor 202 and the memory 201 may be connected by a bus or in other ways. In FIG. 8, the connection by a bus is taken as an example.
- An embodiment of the present application also provides a terminal, which includes the text recognition network model training device 100 described in the foregoing embodiment or includes the text recognition device 200 described in the foregoing embodiment.
- the terminal can be any type of smart terminal, such as a smart phone, a tablet computer, a laptop computer, or a desktop computer.
- An embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are executed by a processor or a controller. For example, execution by the processor 102 in FIG. 7 can enable the processor 102 to execute the text recognition network model training method in the foregoing embodiment, for example, to execute steps S100 to S600 of the method in FIG. 2 described above; execution by the processor 202 in FIG. 8 can cause the processor 202 to execute the character recognition method in the foregoing embodiment, for example, to execute steps A100 to A500 of the method in FIG. 6 described above.
- Features of the input picture are extracted through the convolutional neural network, the distinguishing attention features are then obtained through the attention mechanism module, and the character category prediction result is obtained after feature fusion. A loss function is also designed according to the character category label of the input picture and the character category prediction result, and the attention weights are optimized, so as to improve the accuracy of character recognition and make the recognition of difficult samples more robust.
- A computer storage medium includes volatile and non-volatile media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data.
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.
Abstract
Description
Claims (18)
- A text recognition network model training method, comprising the following steps: standardizing each picture in an original data set and labeling each picture with a character category to obtain a standard training data set with character category labels; inputting each picture in the standard training data set into a convolutional neural network, extracting convolutional features of the picture, and obtaining a depth feature map containing the convolutional features; inputting the depth feature map into an attention mechanism module having multiple channels to obtain an attention weight for each channel, and rescaling each channel of the depth feature map using the attention weights to obtain multiple attention feature maps; inputting each of the attention feature maps separately into a fully connected layer to obtain multiple attention feature vectors; performing feature fusion on the multiple attention feature vectors and inputting the result into a character-class fully connected layer for character category prediction; and, according to the result of the character category prediction and the character category labels, designing a target loss function, iterating with a backpropagation algorithm, minimizing the target loss function, and optimizing the attention weights.
- The text recognition network model training method according to claim 1, wherein standardizing each picture in the original data set comprises: computing the mean and variance of each picture I_i (i=1,···,N) in the original data set and scaling the height and width of each picture to a preset height H and a preset width W, where N is the number of pictures in the original data set.
- The text recognition network model training method according to claim 2, wherein the convolutional neural network includes several convolutional layers and several convolution modules; inputting each picture in the standard training data set into the convolutional neural network, extracting the convolutional features of the picture, and obtaining the depth feature map containing the convolutional features comprises: inputting the standardized pictures I_i (i=1,···,N) into the several convolutional layers, each convolutional layer being followed by a batch normalization layer and the nonlinear activation function ReLU; then inputting the result into a max pooling layer for sampling, and then into the several convolution modules, each convolution module being composed of the same number of convolutional layers and batch normalization layers, each batch normalization layer following each convolutional layer, the convolution modules being connected to each other by max pooling layers; the last convolution module outputting the depth feature map X_i containing the convolutional features.
- The text recognition network model training method according to claim 1 or 3, wherein the attention weights are obtained by the following steps: the attention mechanism module uses global average pooling to aggregate the input depth feature map over the spatial dimensions to generate a channel descriptor, and processes the channel descriptor with a gating mechanism with Sigmoid activation to obtain the attention weight of each channel.
- The text recognition network model training method according to claim 3, wherein inputting the depth feature map into the attention mechanism module having multiple channels to obtain the attention weight of each channel and rescaling each channel of the depth feature map using the attention weights to obtain multiple attention feature maps comprises: the attention mechanism module uses global average pooling to aggregate the input depth feature map X_i over the H×W spatial dimensions to generate a channel descriptor z^s=[z_1,···,z_C], where the c-th element z_c of z^s is computed as: where s=1,···,S, S being the number of attention mechanism modules, and c=1,···,C, C being the number of channels; a gating mechanism with Sigmoid activation is applied on z^s to process the channel descriptor and obtain the attention weight of each attention mechanism module:
- The text recognition network model training method according to claim 6, wherein performing feature fusion on the multiple attention feature vectors and inputting the result into the character-class fully connected layer for character category prediction comprises: performing feature fusion on the multiple attention feature vectors f_i^s (s=1,···,S) and then inputting the result into the character-class fully connected layer for character category prediction: Y_i = softmax(W·[f_i^1,···,f_i^S]), where [·] represents the concatenation operation, Y_i represents the scores of picture I_i over the character categories, and the category with the highest score is the result of the character category prediction.
- The text recognition network model training method according to claim 7, wherein designing a target loss function according to the result of the character category prediction and the character category labels, iterating with the backpropagation algorithm, minimizing the target loss function, and optimizing the attention weights comprises: defining the target loss function as L_total = L_cls + λ(L_center + L_contra), where L_cls is the cross-entropy loss function, L_center is the regional center loss function used to reduce the distance between the attention features of characters of the same class, L_contra is the contrast loss function that pushes the multiple attention feature vectors f_i^s of picture I_i apart in the high-dimensional space, and λ is a hyperparameter used to control the weights of the two loss functions; the contrast loss function is defined as: where D(I_i) is defined as: where m is a preset threshold; the regional center loss function is defined as: and, according to the target loss function, iterating with the backpropagation algorithm, minimizing the cross-entropy loss function, and optimizing the attention weights.
- A text recognition method, comprising: standardizing a picture to be tested and scaling it to a preset height H and a preset width W; inputting the picture to be tested into a convolutional neural network and extracting convolutional features of the picture to be tested to obtain a depth feature map containing the convolutional features; inputting the depth feature map into an attention mechanism module having multiple channels to obtain an attention weight for each channel, and rescaling each channel of the depth feature map using the attention weights to obtain multiple attention feature maps; inputting each of the attention feature maps separately into a fully connected layer to obtain multiple attention feature vectors; and performing feature fusion on the multiple attention feature vectors and inputting the result into a character-class fully connected layer for character category prediction.
- The text recognition method according to claim 9, wherein the convolutional neural network includes several convolutional layers and several convolution modules; inputting the picture to be tested into the convolutional neural network and extracting the convolutional features of the picture to be tested to obtain the depth feature map containing the convolutional features comprises: inputting the picture to be tested I_i into the several convolutional layers, each convolutional layer being followed by a batch normalization layer and the nonlinear activation function ReLU; then inputting the result into a max pooling layer for sampling, and then into the several convolution modules, each convolution module being composed of the same number of convolutional layers and batch normalization layers, each batch normalization layer following each convolutional layer, the convolution modules being connected to each other by max pooling layers; the last convolution module outputting the depth feature map X_i containing the convolutional features.
- The text recognition method according to claim 9 or 10, wherein the attention weights are obtained by the following steps: the attention mechanism module uses global average pooling to aggregate the input depth feature map over the spatial dimensions to generate a channel descriptor, and processes the channel descriptor with a gating mechanism with Sigmoid activation to obtain the attention weight of each channel.
- The character recognition method according to claim 10, wherein inputting the deep feature map into an attention mechanism module having multiple channels to obtain an attention weight for each channel, and rescaling each channel of the deep feature map with the attention weights to obtain multiple attention feature maps, comprises: the attention mechanism module uses global average pooling to aggregate the input deep feature map X_i over the H×W spatial dimensions to generate a channel descriptor z^s = [z_1, ..., z_C], where the c-th element z_c of z^s is computed as z_c = (1/(H×W)) Σ_{h=1}^{H} Σ_{w=1}^{W} X_i(h, w, c), with s = 1, ..., S, S being the number of attention mechanism modules, and c = 1, ..., C, C being the number of channels; a gating mechanism with Sigmoid activation is applied on z^s to process the channel descriptor and obtain the attention weights of each attention mechanism module: [formula omitted in the source].
- The character recognition method according to claim 13, wherein fusing the multiple attention feature vectors and inputting them into the character-class fully connected layer for character class prediction comprises: fusing the multiple attention feature vectors f_i^s (s = 1, ..., S) and then inputting them into the character-class fully connected layer for character class prediction: Y_i = softmax(W · [f_i^1, ..., f_i^S]), where [·] denotes the concatenation operation, Y_i denotes the scores of picture I_i for the character classes, and the class with the highest score is the character class prediction result.
- A character recognition network model training apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the character recognition network model training method according to any one of claims 1 to 8.
- A character recognition apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the character recognition method according to any one of claims 9 to 14.
- A terminal, comprising the character recognition network model training apparatus according to claim 15 or the character recognition apparatus according to claim 16.
- A computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the character recognition network model training method according to any one of claims 1 to 8, or to execute the character recognition method according to any one of claims 9 to 14.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911253120.1A CN113033249A (zh) | 2019-12-09 | 2019-12-09 | 文字识别方法、装置、终端及其计算机存储介质 |
CN201911253120.1 | 2019-12-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021115159A1 true WO2021115159A1 (zh) | 2021-06-17 |
Family
ID=76329519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/133116 WO2021115159A1 (zh) | 2019-12-09 | 2020-12-01 | 文字识别网络模型训练方法、文字识别方法、装置、终端及其计算机存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113033249A (zh) |
WO (1) | WO2021115159A1 (zh) |
Cited By (73)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113421318A (zh) * | 2021-06-30 | 2021-09-21 | 合肥高维数据技术有限公司 | 一种基于多任务生成对抗网络的字体风格迁移方法和系统 |
CN113469335A (zh) * | 2021-06-29 | 2021-10-01 | 杭州中葳数字科技有限公司 | 一种利用不同卷积层特征间关系为特征分配权重的方法 |
CN113487013A (zh) * | 2021-06-29 | 2021-10-08 | 杭州中葳数字科技有限公司 | 一种基于注意力机制的排序分组卷积方法 |
CN113538675A (zh) * | 2021-06-30 | 2021-10-22 | 同济人工智能研究院(苏州)有限公司 | 一种为激光点云计算注意力权重的神经网络及训练方法 |
CN113569727A (zh) * | 2021-07-27 | 2021-10-29 | 广东电网有限责任公司 | 遥感影像中施工场地的识别方法、系统、终端及介质 |
CN113610045A (zh) * | 2021-08-20 | 2021-11-05 | 大连理工大学 | 深度特征集成学习的遥感图像目标识别泛化性方法 |
CN113627590A (zh) * | 2021-07-29 | 2021-11-09 | 中汽创智科技有限公司 | 一种卷积神经网络的注意力模块、注意力机制及卷积神经网络 |
CN113673451A (zh) * | 2021-08-25 | 2021-11-19 | 上海鹏冠生物医药科技有限公司 | 一种用于组织细胞学病理片图像特征抽取的图卷积模块 |
CN113688830A (zh) * | 2021-08-13 | 2021-11-23 | 湖北工业大学 | 基于中心点回归的深度学习目标检测方法 |
CN113705733A (zh) * | 2021-09-29 | 2021-11-26 | 平安医疗健康管理股份有限公司 | 医疗票据图像处理方法及装置、电子设备、存储介质 |
CN113705344A (zh) * | 2021-07-21 | 2021-11-26 | 西安交通大学 | 基于全手掌的掌纹识别方法、装置、终端设备及存储介质 |
CN113705568A (zh) * | 2021-08-27 | 2021-11-26 | 深圳市商汤科技有限公司 | 文字识别网络训练方法、装置、计算机设备及存储介质 |
CN113762357A (zh) * | 2021-08-18 | 2021-12-07 | 江苏大学 | 基于深度学习的智能药房处方检查方法 |
CN113763965A (zh) * | 2021-08-26 | 2021-12-07 | 江苏大学 | 一种多重注意力特征融合的说话人识别方法 |
CN113763412A (zh) * | 2021-09-08 | 2021-12-07 | 理光软件研究所(北京)有限公司 | 图像处理方法、装置及电子设备、计算机可读存储介质 |
CN113780170A (zh) * | 2021-09-10 | 2021-12-10 | 昭通亮风台信息科技有限公司 | 基于ssd深度学习网络的火灾检测识别方法、系统及火灾报警方法 |
CN113793627A (zh) * | 2021-08-11 | 2021-12-14 | 华南师范大学 | 一种基于注意力的多尺度卷积语音情感识别方法及装置 |
CN113836850A (zh) * | 2021-11-26 | 2021-12-24 | 成都数之联科技有限公司 | 模型获得方法及系统及装置及介质及产品缺陷检测方法 |
CN113850741A (zh) * | 2021-10-10 | 2021-12-28 | 杭州知存智能科技有限公司 | 图像降噪方法、装置、电子设备以及存储介质 |
CN113869426A (zh) * | 2021-09-29 | 2021-12-31 | 北京搜狗科技发展有限公司 | 一种公式识别方法及装置 |
CN113963352A (zh) * | 2021-09-22 | 2022-01-21 | 支付宝(杭州)信息技术有限公司 | 识别图片和训练神经网络的方法和装置 |
CN113989541A (zh) * | 2021-09-23 | 2022-01-28 | 神思电子技术股份有限公司 | 一种基于特征聚合的着装分类方法及系统 |
CN114037600A (zh) * | 2021-10-11 | 2022-02-11 | 长沙理工大学 | 一种基于新注意力机制的新CycleGAN风格迁移网络 |
CN114049634A (zh) * | 2022-01-12 | 2022-02-15 | 深圳思谋信息科技有限公司 | 一种图像识别方法、装置、计算机设备和存储介质 |
CN114119979A (zh) * | 2021-12-06 | 2022-03-01 | 西安电子科技大学 | 基于分割掩码和自注意神经网络的细粒度图像分类方法 |
CN114119997A (zh) * | 2021-11-26 | 2022-03-01 | 腾讯科技(深圳)有限公司 | 图像特征提取模型的训练方法、装置、服务器和存储介质 |
CN114118415A (zh) * | 2021-11-29 | 2022-03-01 | 暨南大学 | 一种轻量级瓶颈注意力机制的深度学习方法 |
CN114140357A (zh) * | 2021-12-02 | 2022-03-04 | 哈尔滨工程大学 | 一种基于协同注意力机制的多时相遥感图像云区重建方法 |
CN114140873A (zh) * | 2021-11-09 | 2022-03-04 | 武汉众智数字技术有限公司 | 一种基于卷积神经网络多层次特征的步态识别方法 |
CN114140685A (zh) * | 2021-11-11 | 2022-03-04 | 国网福建省电力有限公司 | 一种自适应环境的变电站仪表读数识别方法、设备和介质 |
CN114220178A (zh) * | 2021-12-16 | 2022-03-22 | 重庆傲雄在线信息技术有限公司 | 基于通道注意力机制的签名鉴别系统及方法 |
CN114220012A (zh) * | 2021-12-16 | 2022-03-22 | 池明旻 | 一种基于深度自注意力网络的纺织品棉麻鉴别方法 |
CN114266938A (zh) * | 2021-12-23 | 2022-04-01 | 南京邮电大学 | 一种基于多模态信息和全局注意力机制的场景识别方法 |
CN114429633A (zh) * | 2022-01-28 | 2022-05-03 | 北京百度网讯科技有限公司 | 文本识别方法、模型的训练方法、装置、电子设备及介质 |
CN114445299A (zh) * | 2022-01-28 | 2022-05-06 | 南京邮电大学 | 一种基于注意力分配机制的双残差去噪方法 |
CN114530210A (zh) * | 2022-01-06 | 2022-05-24 | 山东师范大学 | 药物分子筛选方法及系统 |
CN114566216A (zh) * | 2022-02-25 | 2022-05-31 | 桂林电子科技大学 | 一种基于注意力机制的剪接位点预测及解释性方法 |
CN114612791A (zh) * | 2022-05-11 | 2022-06-10 | 西南民族大学 | 一种基于改进注意力机制的目标检测方法及装置 |
CN114639169A (zh) * | 2022-03-28 | 2022-06-17 | 合肥工业大学 | 基于注意力机制特征融合与位置无关的人体动作识别系统 |
CN114694211A (zh) * | 2022-02-24 | 2022-07-01 | 合肥工业大学 | 非接触式多生理参数的同步检测方法和系统 |
CN114724219A (zh) * | 2022-04-11 | 2022-07-08 | 辽宁师范大学 | 一种基于注意力遮挡机制的表情识别方法 |
CN114757511A (zh) * | 2022-03-31 | 2022-07-15 | 广州市赛皓达智能科技有限公司 | 一种基于深度学习的电网施工进度与安全识别方法 |
CN114881011A (zh) * | 2022-07-12 | 2022-08-09 | 中国人民解放军国防科技大学 | 多通道中文文本更正方法、装置、计算机设备和存储介质 |
CN114973222A (zh) * | 2021-12-20 | 2022-08-30 | 西北工业大学宁波研究院 | 基于显式监督注意力机制的场景文本识别方法 |
CN114998482A (zh) * | 2022-06-13 | 2022-09-02 | 厦门大学 | 文字艺术图案智能生成方法 |
CN115034256A (zh) * | 2022-05-05 | 2022-09-09 | 上海大学 | 基于深度学习的近地面目标声震信号分类识别系统及方法 |
CN115147297A (zh) * | 2022-06-09 | 2022-10-04 | 浙江华睿科技股份有限公司 | 一种图像处理方法及装置 |
CN115251948A (zh) * | 2022-07-14 | 2022-11-01 | 深圳未来脑律科技有限公司 | 一种双模态运动想象的分类识别方法、系统和存储介质 |
CN115439849A (zh) * | 2022-09-30 | 2022-12-06 | 杭州电子科技大学 | 基于动态多策略gan网络的仪表数字识别方法及系统 |
CN115471851A (zh) * | 2022-10-11 | 2022-12-13 | 小语智能信息科技(云南)有限公司 | 融合双重注意力机制的缅甸语图像文本识别方法及装置 |
CN115568860A (zh) * | 2022-09-30 | 2023-01-06 | 厦门大学 | 基于双注意力机制的十二导联心电信号的自动分类方法 |
CN115658865A (zh) * | 2022-10-26 | 2023-01-31 | 茅台学院 | 一种基于注意力预训练的图片问答方法 |
CN115993365A (zh) * | 2023-03-23 | 2023-04-21 | 山东省科学院激光研究所 | 一种基于深度学习的皮带缺陷检测方法及系统 |
CN116052154A (zh) * | 2023-04-03 | 2023-05-02 | 中科南京软件技术研究院 | 一种基于语义增强与图推理的场景文本识别方法 |
CN116246331A (zh) * | 2022-12-05 | 2023-06-09 | 苏州大学 | 一种圆锥角膜自动分级方法、装置及存储介质 |
CN116259067A (zh) * | 2023-05-15 | 2023-06-13 | 济南大学 | 一种高精度识别pid图纸符号的方法 |
CN116405310A (zh) * | 2023-04-28 | 2023-07-07 | 北京宏博知微科技有限公司 | 一种网络数据安全监测方法及系统 |
CN116563615A (zh) * | 2023-04-21 | 2023-08-08 | 南京讯思雅信息科技有限公司 | 基于改进多尺度注意力机制的不良图片分类方法 |
CN116597258A (zh) * | 2023-07-18 | 2023-08-15 | 华东交通大学 | 一种基于多尺度特征融合的矿石分选模型训练方法及系统 |
CN116934733A (zh) * | 2023-08-04 | 2023-10-24 | 湖南恩智测控技术有限公司 | 一种芯片的可靠性测试方法及测试系统 |
WO2023202543A1 (zh) * | 2022-04-18 | 2023-10-26 | 北京字跳网络技术有限公司 | 文字处理方法、装置、电子设备及存储介质 |
CN116993679A (zh) * | 2023-06-30 | 2023-11-03 | 芜湖合德传动科技有限公司 | 一种基于目标检测的伸缩机皮带磨损检测方法 |
CN117036891A (zh) * | 2023-08-22 | 2023-11-10 | 睿尔曼智能科技(北京)有限公司 | 一种基于跨模态特征融合的图像识别方法及系统 |
CN117037173A (zh) * | 2023-09-22 | 2023-11-10 | 武汉纺织大学 | 一种二阶段的英文字符检测与识别方法及系统 |
CN117079295A (zh) * | 2023-09-19 | 2023-11-17 | 中航西安飞机工业集团股份有限公司 | 一种航空电缆张力计指针识别与读数方法及系统 |
CN117173716A (zh) * | 2023-09-01 | 2023-12-05 | 湖南天桥嘉成智能科技有限公司 | 一种基于深度学习的高温板坯id字符识别方法和系统 |
WO2024022060A1 (zh) * | 2022-07-28 | 2024-02-01 | 杭州堃博生物科技有限公司 | 一种图像配准方法、装置及存储介质 |
CN117523685A (zh) * | 2023-11-15 | 2024-02-06 | 中国矿业大学 | 基于非对称对比融合的双模态生物特征识别方法及系统 |
CN117573810A (zh) * | 2024-01-15 | 2024-02-20 | 腾讯烟台新工科研究院 | 一种多语言产品包装说明书文字识别查询方法及系统 |
CN117593610A (zh) * | 2024-01-17 | 2024-02-23 | 上海秋葵扩视仪器有限公司 | 图像识别网络训练及部署、识别方法、装置、设备及介质 |
CN117809314A (zh) * | 2023-11-21 | 2024-04-02 | 中化现代农业有限公司 | 文字识别方法、装置、电子设备和存储介质 |
CN118279679A (zh) * | 2024-06-04 | 2024-07-02 | 深圳大学 | 基于深度学习模型的图像分类方法、图像分类设备及介质 |
CN118429733A (zh) * | 2024-07-05 | 2024-08-02 | 湖南大学 | 一种多头注意力驱动的厨余垃圾多标签分类方法 |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364860B (zh) * | 2020-11-05 | 2024-06-25 | 北京字跳网络技术有限公司 | 字符识别模型的训练方法、装置和电子设备 |
CN113326833B (zh) * | 2021-08-04 | 2021-11-16 | 浩鲸云计算科技股份有限公司 | 一种基于中心损失的文字识别改进训练方法 |
CN113610164B (zh) * | 2021-08-10 | 2023-12-22 | 北京邮电大学 | 一种基于注意力平衡的细粒度图像识别方法及其系统 |
CN113657534B (zh) * | 2021-08-24 | 2024-07-05 | 北京经纬恒润科技股份有限公司 | 一种基于注意力机制的分类方法及装置 |
CN113741528B (zh) * | 2021-09-13 | 2023-05-23 | 中国人民解放军国防科技大学 | 一种面向多无人机碰撞规避的深度强化学习训练加速方法 |
CN114898345A (zh) * | 2021-12-13 | 2022-08-12 | 华东师范大学 | 一种阿拉伯语文本识别方法及系统 |
CN114580542A (zh) * | 2022-03-07 | 2022-06-03 | 京东科技信息技术有限公司 | 生成可供性检测模型的方法、装置、设备及存储介质 |
CN114677661B (zh) * | 2022-03-24 | 2024-10-18 | 智道网联科技(北京)有限公司 | 一种路侧标识识别方法、装置和电子设备 |
CN114743206B (zh) * | 2022-05-17 | 2023-10-27 | 北京百度网讯科技有限公司 | 文本检测方法、模型训练方法、装置、电子设备 |
CN116432521B (zh) * | 2023-03-21 | 2023-11-03 | 浙江大学 | 一种基于多模态重建约束的手写汉字识别和检索方法 |
CN118072973B (zh) * | 2024-04-15 | 2024-06-28 | 慧医谷中医药科技(天津)股份有限公司 | 基于医学知识库的智能问诊方法与系统 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368831A (zh) * | 2017-07-19 | 2017-11-21 | 中国人民解放军国防科学技术大学 | 一种自然场景图像中的英文文字和数字识别方法 |
US20190114770A1 (en) * | 2017-10-13 | 2019-04-18 | Shenzhen Keya Medical Technology Corporation | Systems and methods for detecting cancer metastasis using a neural network |
CN110097049A (zh) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | 一种自然场景文本检测方法及系统 |
CN110334705A (zh) * | 2019-06-25 | 2019-10-15 | 华中科技大学 | 一种结合全局和局部信息的场景文本图像的语种识别方法 |
- 2019-12-09: CN CN201911253120.1A patent/CN113033249A/zh active Pending
- 2020-12-01: WO PCT/CN2020/133116 patent/WO2021115159A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113033249A (zh) | 2021-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021115159A1 (zh) | Character recognition network model training method, character recognition method, apparatus, terminal, and computer storage medium thereof | |
Chherawala et al. | Feature set evaluation for offline handwriting recognition systems: application to the recurrent neural network model | |
RU2661750C1 (ru) | Распознавание символов с использованием искусственного интеллекта | |
Tarawneh et al. | Invoice classification using deep features and machine learning techniques | |
CN110647912A (zh) | 细粒度图像识别方法、装置、计算机设备及存储介质 | |
Haghighi et al. | Stacking ensemble model of deep learning and its application to Persian/Arabic handwritten digits recognition | |
EP3166020A1 (en) | Method and apparatus for image classification based on dictionary learning | |
CN114444600A (zh) | 基于记忆增强原型网络的小样本图像分类方法 | |
Khayyat et al. | Towards author recognition of ancient Arabic manuscripts using deep learning: A transfer learning approach | |
US20240152749A1 (en) | Continual learning neural network system training for classification type tasks | |
Palani et al. | Detecting and extracting information of medicines from a medical prescription using deep learning and computer vision | |
US11816909B2 (en) | Document clusterization using neural networks | |
Aharrane et al. | A comparison of supervised classification methods for a statistical set of features: Application: Amazigh OCR | |
WO2022062403A9 (zh) | 表情识别模型训练方法、装置、终端设备及存储介质 | |
Zou et al. | Supervised feature learning via L2-norm regularized logistic regression for 3D object recognition | |
CN111144469A (zh) | 基于多维关联时序分类神经网络的端到端多序列文本识别方法 | |
Bappi et al. | BNVGLENET: Hypercomplex Bangla handwriting character recognition with hierarchical class expansion using Convolutional Neural Networks | |
Rayeed et al. | BdSL47: A complete depth-based Bangla sign alphabet and digit dataset | |
Mahapatra et al. | Generator based methods for off-line handwritten character recognition | |
Cheng et al. | Maximum entropy regularization and chinese text recognition | |
Liu et al. | Combined with the residual and multi-scale method for Chinese thermal power system record text recognition | |
Chattyopadhyay et al. | Classification of MNIST image dataset using improved convolutional neural network | |
Sharma | Handwritten digit recognition using support vector machine | |
Chen et al. | Image-enhanced Adaptive Learning Rate Handwritten Vision Processing Algorithm Based on CNN | |
Zhao | Handwritten digit recognition and classification using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20900602; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20900602; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.02.2023) |