TWI736230B

TWI736230B - Image processing method, electronic equipment and storage medium

Info

Publication number: TWI736230B
Application number: TW109113453A
Authority: TW
Inventors: 孫紅斌; 岳曉宇; 曠章輝; 藺琛皓; 張偉
Original assignee: 大陸商深圳市商湯科技有限公司
Priority date: 2019-12-27
Filing date: 2020-04-22
Publication date: 2021-08-11
Also published as: JP2022518889A; JP7097513B2; WO2021128578A1; TW202125307A; CN111191715A; KR20210113192A

Abstract

本發明涉及一種圖像處理方法、電子設備和儲存媒體，所述方法包括：對圖像進行識別，確定所述圖像中的多個目標區域，所述目標區域爲待提取文本所在區域；確定所述圖像中各目標區域之間的相對位置特徵；確定各所述目標區域的目標特徵，所述目標特徵包括所述待提取文本的特徵；透過圖卷積神經網路，對所述相對位置特徵和所述目標特徵進行特徵提取，得到提取後的特徵；根據提取後的特徵，確定所述待提取文本對應的字段。The present invention relates to an image processing method, electronic equipment and storage medium. The method includes: recognizing an image and determining multiple target regions in the image, where the target region is the region where the text to be extracted is located; and determining The relative position feature between each target area in the image; determine the target feature of each target area, the target feature includes the feature of the text to be extracted; through the graph convolutional neural network, the relative The location feature and the target feature are feature extracted to obtain the extracted feature; according to the extracted feature, the field corresponding to the text to be extracted is determined.

Description

Image processing method, electronic equipment and storage medium

本發明涉及電腦技術領域，尤其涉及一種圖像處理方法、電子設備和儲存媒體。The present invention relates to the field of computer technology, in particular to an image processing method, electronic equipment and storage media.

圖像中關鍵文字訊息提取在自動化辦公等場景中有著非常重要的作用，例如，透過對圖像中的關鍵文字訊息提取，可以實現諸如收據訊息提取、發票訊息提取、身份訊息提取等功能。The extraction of key text messages in images plays a very important role in automated office and other scenarios. For example, by extracting key text messages in images, functions such as receipt information extraction, invoice information extraction, and identity information extraction can be realized.

在對圖像中的文字進行提取時，會將識別出的文字對應到不同的字段中，以便後續對文字進行結構化儲存、展示等操作。例如，識別出來的文字是“19.88元”，需要確定“19.88元”是對應字段“總價”，還是對應字段“單價”，以便後續將“19.88元”作爲某個字段的值進行儲存。When extracting the text in the image, the recognized text will be mapped to different fields for subsequent operations such as structured storage and display of the text. For example, if the recognized text is "19.88 yuan", it is necessary to determine whether "19.88 yuan" corresponds to the field "total price" or the corresponding field "unit price", so that "19.88 yuan" is subsequently stored as the value of a certain field.

通常，會根據圖像中文字的排布規則，預先定義模板，模板中定義了某個位置的文字和字段的對應關係，這樣便可以確定識別出來的位於某個位置的文字對應的字段。例如，預先定義圖像右下角的文字對應的字段爲“總價”，這樣可以確定圖像右下角識別出來的“19.88元”對應的字段爲“總價”。Usually, a template is defined in advance according to the arrangement rules of the text in the image, and the corresponding relationship between the text at a certain position and the field is defined in the template, so that the field corresponding to the recognized text at a certain position can be determined. For example, predefine the field corresponding to the text in the lower right corner of the image as "Total Price", so that it can be determined that the field corresponding to "19.88 Yuan" identified in the lower right corner of the image is "Total Price".

本發明提出了一種圖像處理技術方案。The present invention proposes a technical solution for image processing.

根據本發明的一方面，提供了一種圖像處理方法，包括：對圖像進行識別，確定所述圖像中的多個目標區域，所述目標區域爲待提取文本所在區域；確定所述圖像中各目標區域之間的相對位置特徵；確定各所述目標區域的目標特徵，所述目標特徵包括所述待提取文本的特徵；透過圖卷積神經網路，對所述相對位置特徵和所述目標特徵進行特徵提取，得到提取後的特徵；根據提取後的特徵，確定所述待提取文本對應的字段。According to one aspect of the present invention, there is provided an image processing method, including: recognizing an image, determining a plurality of target regions in the image, where the target region is the region where the text to be extracted is located; determining the image The relative position feature between each target area in the image; determine the target feature of each target area, the target feature includes the feature of the text to be extracted; through the graph convolutional neural network, the relative position feature and The target feature is subjected to feature extraction to obtain the extracted feature; according to the extracted feature, the field corresponding to the text to be extracted is determined.

在本發明實施例中，能夠透過圖卷積神經網路，基於各目標區域之間的相對位置特徵以及待提取文本的特徵，確定圖像中的待提取文本對應的字段。可不依賴於固定的模板進行文本提取，相對於基於模板進行文本提取的方式，在對沒有適配模板的圖像進行文本提取時，準確性較高。In the embodiment of the present invention, a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position characteristics between the target regions and the characteristics of the text to be extracted. The text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.

在一種可能的實現方式中，透過圖卷積神經網路，對所述相對位置特徵和所述目標特徵進行特徵提取，得到提取後的特徵，包括：以各所述目標特徵爲圖的節點，以各所述相對位置特徵爲連接兩個節點的邊，構建連通圖；透過圖卷積神經網路，對所述連通圖進行疊代更新，將疊代更新後滿足收斂條件的連通圖作爲提取後的特徵。In a possible implementation manner, feature extraction is performed on the relative position feature and the target feature through a graph convolutional neural network to obtain the extracted features, including: taking each of the target features as nodes of the graph, Use each of the relative position features as the edges connecting the two nodes to construct a connected graph; through the graph convolutional neural network, the connected graph is iteratively updated, and the connected graph that satisfies the convergence condition after the iterative update is taken as the extraction After the characteristics.

在本發明實施例中，構建的連通圖既包含了圖像中的目標特徵，也包含了圖像中目標特徵之間的相對位置特徵，可以從整體上表徵圖像中文字的特徵，因此能夠提高關鍵訊息提取結果的準確性。In the embodiment of the present invention, the constructed connected graph includes not only the target feature in the image, but also the relative position feature between the target features in the image, which can characterize the characteristics of the text in the image as a whole, so it can Improve the accuracy of key information extraction results.

圖卷積神經網路在對特徵進行提取時，能夠以連通圖的形式表示圖像，對特徵進行提取。連通圖由若干個節點（Node）及連接兩個節點的邊（Edge）所構成，邊用於刻畫不同節點之間的關係。因此，透過圖卷積神經網路提取後的特徵，能夠準確地表徵各目標區域之間的相對位置和待提取文本的特徵，以提高後續文本提取時的準確性。When extracting features, graph convolutional neural networks can represent images in the form of connected graphs and extract features. Connected graph is composed of several nodes (Node) and edges (Edge) connecting two nodes. Edges are used to describe the relationship between different nodes. Therefore, the features extracted through the graph convolutional neural network can accurately characterize the relative position between the target regions and the features of the text to be extracted, so as to improve the accuracy of subsequent text extraction.

在一種可能的實現方式中，根據提取後的特徵，確定所述待提取文本對應的字段，包括：根據預先定義的多個預設類別，對圖卷積神經網路輸出的連通圖中的節點進行分類，得到節點的類別，所述預設類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；根據所述節點的類別，確定待提取文本對應於預設字段的標識或字段值。In a possible implementation manner, determining the field corresponding to the text to be extracted according to the extracted features includes: according to a plurality of predefined categories, the nodes in the connected graph output by the graph convolutional neural network The classification is performed to obtain the category of the node, and the preset category includes: the category of the identifier that the characterizing text belongs to the preset field, and the category of the field value of the characterizing text belonging to the preset field; and the text to be extracted is determined according to the category of the node The identifier or field value corresponding to the preset field.

在本發明實施例中，透過預先定義預設類別爲預設字段的標識或字段，根據提取的特徵對待提取文本進行分類，即可得到待提取文本對應於預設字段的標識或字段值，提高了文本提取時的準確性。In the embodiment of the present invention, by predefining the preset category as the identifier or field of the preset field, and classifying the text to be extracted according to the extracted features, the identifier or the field value of the text to be extracted corresponding to the preset field can be obtained, which improves The accuracy of text extraction is improved.

在一種可能的實現方式中，確定所述圖像中各目標區域之間的相對位置特徵，包括：確定圖像中的第一目標區域和第二目標區域的相對位置參數；對所述相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵。In a possible implementation manner, determining the relative position characteristics between the target areas in the image includes: determining the relative position parameters of the first target area and the second target area in the image; The parameters are characterized to obtain the relative position characteristics of the first target area and the second target area.

在一種可能的實現方式中，所述相對位置參數包括下述至少一種：第一目標區域相對於第二目標區域的橫向距離和縱向距離；所述第一目標區域的寬高比；所述第二目標區域的寬高比；所述第一目標區域和所述第二目標區域的相對尺寸關係。In a possible implementation manner, the relative position parameter includes at least one of the following: the lateral distance and the longitudinal distance of the first target area relative to the second target area; the aspect ratio of the first target area; the first target area 2. The aspect ratio of the target area; the relative size relationship between the first target area and the second target area.

在本發明實施例中，相對位置參數即包含了橫向距離和縱向距離，也包含了第一目標區域的寬高比，也包含了第一目標區域和第二目標區域的相對尺寸關係，可以使得關鍵訊息的提取結果更加準確。In the embodiment of the present invention, the relative position parameter includes the horizontal distance and the vertical distance, the aspect ratio of the first target area, and the relative size relationship between the first target area and the second target area, so that The extraction result of key information is more accurate.

在一種可能的實現方式中，對所述相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵，包括：將所述相對位置參數透過正餘弦變換矩陣映射到一個D維的空間，得到D維的特徵向量，D爲正整數；透過預設權重矩陣，將所述D維的特徵向量轉化爲1維的權重值；透過預設激勵函數對所述權重值進行處理，得到相對位置特徵。In a possible implementation manner, performing characterization processing on the relative position parameters to obtain the relative position characteristics of the first target area and the second target area includes: mapping the relative position parameters to one through a sine-cosine transformation matrix In the D-dimensional space, a D-dimensional eigenvector is obtained, where D is a positive integer; the D-dimensional eigenvector is converted into a 1-dimensional weight value through a preset weight matrix; the weight value is performed through a preset excitation function Processing to obtain relative position characteristics.

在本發明實施例中，透過特徵化處理可以將相對位置參數轉換爲圖卷積神經網路的邊所需的數據格式，便於後續透過圖卷積神經網路進行特徵提取。In the embodiment of the present invention, the relative position parameter can be converted into the data format required by the edge of the graph convolutional neural network through the characteristic processing, which is convenient for subsequent feature extraction through the graph convolutional neural network.

在一種可能的實現方式中，確定各所述目標區域的目標特徵，包括：確定目標區域中的像素數據，對所述像素數據進行特徵提取，得到視覺特徵；確定目標區域中的文本字符，對所述文本字符進行特徵提取，得到字符特徵；根據提取到的視覺特徵和字符特徵，確定目標區域的目標特徵。In a possible implementation manner, determining the target feature of each target area includes: determining pixel data in the target area, performing feature extraction on the pixel data to obtain visual features; determining text characters in the target area, The text characters are subjected to feature extraction to obtain character features; according to the extracted visual features and character features, the target features of the target area are determined.

在本發明實施例中，考慮到圖像中會存在由於拍照視角、光線、遮擋等原因帶來的干擾因素，因此，透過文字檢測識別通常會有較多的誤識，即可能會識別出錯誤的文本字符，這可能會影響關鍵訊息提取的準確性。而透過視覺訊息的提取，將視覺訊息考慮到關鍵訊息提取中，會降低文本誤識對關鍵訊息提取的影響。即使文本識別錯誤，但由於視覺訊息不會改變太大，因此二者結合能夠提高關鍵訊息提取結果的準確性。In the embodiment of the present invention, considering that there will be interference factors in the image due to the camera angle, light, occlusion, etc., there will usually be more misunderstandings through text detection and recognition, that is, recognition errors may be detected. This may affect the accuracy of key information extraction. Through the extraction of visual information, the visual information is taken into account in the extraction of key information, which will reduce the influence of text misunderstanding on the extraction of key information. Even if the text recognition is wrong, the visual information will not change too much, so the combination of the two can improve the accuracy of the key information extraction results.

在一種可能的實現方式中，根據提取到的視覺特徵和字符特徵，確定目標區域的目標特徵，包括：將所述視覺特徵和字符特徵賦予不同的權重；對賦予權重後的所述視覺特徵和字符特徵進行融合，得到目標區域的目標特徵。In a possible implementation manner, determining the target feature of the target area according to the extracted visual features and character features includes: assigning different weights to the visual features and character features; and assigning weights to the visual features and The character features are fused to obtain the target feature of the target area.

在本發明實施例中，透過對視覺特徵和字符特徵賦予不同的權重，能夠提高關鍵訊息提取結果的準確性。In the embodiment of the present invention, by assigning different weights to visual features and character features, the accuracy of the key information extraction results can be improved.

在一種可能的實現方式中，所述方法透過預先構建的分類網路實現，所述分類網路的訓練步驟如下：將樣本圖像輸入所述分類網路中處理，得到樣本圖像中待提取文本的第一預測類別，以及所述第一預測類別中各個類別之間的對應關係；根據所述第一預測類別，以及所述樣本圖像的標注類別，訓練所述分類網路，所述標注類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；根據所述對應關係，以及標注的待提取文本之間的對應關係，訓練所述分類網路。In a possible implementation, the method is implemented through a pre-built classification network, and the training steps of the classification network are as follows: input sample images into the classification network for processing, and obtain sample images to be extracted The first prediction category of the text, and the correspondence between each category in the first prediction category; training the classification network according to the first prediction category and the label category of the sample image, the The labeling category includes: the type of the identifier that the characterizing text belongs to the preset field, and the type of the field value of the characterizing text belonging to the preset field; training the classification according to the corresponding relationship and the corresponding relationship between the labeled text to be extracted network.

在本發明實施例中，透過對樣本圖像的類別和各個類別之間的對應關係進行標注，能夠更準確地對分類網路進行訓練，訓練得到的分類網路在對沒有適配模板的圖像進行文本提取時，準確性較高。In the embodiment of the present invention, the classification network can be trained more accurately by labeling the classification of the sample image and the corresponding relationship between each classification. When extracting text, the accuracy is higher.

在一種可能的實現方式中，所述圖像包括下述至少一種：收據圖像、發票圖像、名片圖像。In a possible implementation manner, the image includes at least one of the following: a receipt image, an invoice image, and a business card image.

根據本發明的一方面，提供了一種圖像處理裝置，包括：識別模組，用於對圖像進行識別，確定所述圖像中的多個目標區域，所述目標區域爲待提取文本所在區域；相對位置特徵確定模組，用於確定所述圖像中各目標區域之間的相對位置特徵；目標特徵確定模組，用於確定各所述目標區域的目標特徵，所述目標特徵包括所述待提取文本的特徵；圖卷積模組，用於透過圖卷積神經網路，對所述相對位置特徵和所述目標特徵進行特徵提取，得到提取後的特徵；字段確定模組，用於根據提取後的特徵，確定所述待提取文本對應的字段。According to one aspect of the present invention, there is provided an image processing device including: a recognition module for recognizing an image and determining multiple target regions in the image, where the target region is where the text to be extracted is located Region; a relative position feature determination module, used to determine the relative position feature between each target area in the image; a target feature determination module, used to determine the target feature of each target area, the target feature includes The features of the text to be extracted; a graph convolution module, which is used to perform feature extraction on the relative position feature and the target feature through a graph convolutional neural network to obtain the extracted features; a field determination module, It is used to determine the field corresponding to the text to be extracted according to the extracted features.

在一種可能的實現方式中，圖卷積模組包括：第一圖卷積子模組和第二圖卷積子模組，其中，第一圖卷積子模組，用於以各所述目標特徵爲圖的節點，以各所述相對位置特徵爲連接兩個節點的邊，構建連通圖；第二圖卷積子模組，用於透過圖卷積神經網路，對所述連通圖進行疊代更新，將疊代更新後滿足收斂條件的連通圖作爲提取後的特徵。In a possible implementation, the graph convolution module includes: a first graph convolution sub-module and a second graph convolution sub-module, wherein the first graph convolution sub-module is used to The target feature is the node of the graph, and each of the relative position features is the edge connecting the two nodes to construct a connected graph; the second graph convolution sub-module is used to analyze the connected graph through the graph convolutional neural network The iterative update is performed, and the connected graph that meets the convergence condition after the iterative update is used as the extracted feature.

圖卷積神經網路在對特徵進行提取時，能夠以連通圖的形式表示圖像，對特徵進行提取。連通圖由若干個節點及連接兩個節點的邊所構成，邊用於刻畫不同節點之間的關係。因此，透過圖卷積神經網路提取後的特徵，能夠準確地表徵各目標區域之間的相對位置和待提取文本的特徵，以提高後續文本提取時的準確性。When extracting features, graph convolutional neural networks can represent images in the form of connected graphs and extract features. A connected graph is composed of several nodes and edges connecting two nodes. The edges are used to describe the relationship between different nodes. Therefore, the features extracted through the graph convolutional neural network can accurately characterize the relative position between the target regions and the features of the text to be extracted, so as to improve the accuracy of subsequent text extraction.

在一種可能的實現方式中，字段確定模組包括：第一字段確定子模組和第二字段確定子模組，其中，第一字段確定子模組，用於根據預先定義的多個預設類別，對圖卷積神經網路輸出的連通圖中的節點進行分類，得到節點的類別，所述預設類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；第二字段確定子模組，用於根據所述節點的類別，確定待提取文本對應於預設字段的標識或字段值。In a possible implementation manner, the field determination module includes: a first field determination sub-module and a second field determination sub-module, wherein the first field determination sub-module is used to determine the sub-module according to a plurality of predefined presets. Category, to classify the nodes in the connected graph output by the graph convolutional neural network to obtain the category of the node. The preset category includes: the category representing the identifier of the text belonging to the preset field, and the category representing the text belonging to the preset field The category of the field value; the second field determination sub-module is used to determine the identifier or the field value of the preset field corresponding to the text to be extracted according to the category of the node.

在一種可能的實現方式中，相對位置特徵確定模組包括：第一相對位置特徵確定子模組和第二相對位置特徵確定子模組，其中，第一相對位置特徵確定子模組，用於確定圖像中的第一目標區域和第二目標區域的相對位置參數；第二相對位置特徵確定子模組，用於對所述相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵。In a possible implementation, the relative position feature determination module includes: a first relative position feature determination sub-module and a second relative position feature determination sub-module, wherein the first relative position feature determination sub-module is used for Determine the relative position parameters of the first target area and the second target area in the image; the second relative position feature determination sub-module is used to characterize the relative position parameters to obtain the first target area and the second target area. The relative location characteristics of the target area.

在一種可能的實現方式中，第二相對位置特徵確定子模組，用於將所述相對位置參數透過正餘弦變換矩陣映射到一個D維的空間，得到D維的特徵向量，D爲正整數；透過預設權重矩陣，將所述D維的特徵向量轉化爲1維的權重值；透過預設激勵函數對所述權重值進行處理，得到相對位置特徵。In a possible implementation, the second relative position feature determination submodule is used to map the relative position parameters to a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional eigenvector, where D is a positive integer Transform the D-dimensional feature vector into a 1-dimensional weight value through a preset weight matrix; process the weight value through a preset excitation function to obtain a relative position feature.

在一種可能的實現方式中，目標特徵確定模組，包括第一目標特徵確定子模組、第二目標特徵確定子模組和第三目標特徵確定子模組，其中，第一目標特徵確定子模組，用於確定目標區域中的像素數據，對所述像素數據進行特徵提取，得到視覺特徵；第二目標特徵確定子模組，用於確定目標區域中的文本字符，對所述文本字符進行特徵提取，得到字符特徵；第三目標特徵確定子模組，用於根據提取到的視覺特徵和字符特徵，確定目標區域的目標特徵。In a possible implementation, the target feature determination module includes a first target feature determination sub-module, a second target feature determination sub-module, and a third target feature determination sub-module, wherein the first target feature determination sub-module The module is used to determine the pixel data in the target area, and perform feature extraction on the pixel data to obtain visual features; the second target feature determination sub-module is used to determine the text characters in the target area and compare the text characters Perform feature extraction to obtain character features; the third target feature determination sub-module is used to determine the target features of the target area according to the extracted visual features and character features.

在一種可能的實現方式中，第三目標特徵確定子模組，用於將所述視覺特徵和字符特徵賦予不同的權重；對賦予權重後的所述視覺特徵和字符特徵進行融合，得到目標區域的目標特徵。In a possible implementation manner, the third target feature determination sub-module is used to assign different weights to the visual features and character features; to fuse the weighted visual features and character features to obtain the target area Target characteristics.

在一種可能的實現方式中，所述裝置透過預先構建的分類網路實現，所述裝置還包括：第一訓練模組，用於將樣本圖像輸入所述分類網路中處理，得到樣本圖像中待提取文本的第一預測類別，以及所述第一預測類別中各個類別之間的對應關係；第二訓練模組，用於根據所述第一預測類別，以及所述樣本圖像的標注類別，訓練所述分類網路，所述標注類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；第三訓練模組，用於根據所述對應關係，以及標注的待提取文本之間的對應關係，訓練所述分類網路。In a possible implementation manner, the device is implemented through a pre-built classification network, and the device further includes: a first training module for inputting sample images into the classification network for processing to obtain sample images The first prediction category of the text to be extracted in the image, and the corresponding relationship between each category in the first prediction category; the second training module is used for the first prediction category and the sample image The labeling category is used to train the classification network. The labeling category includes: the category of the identifier that characterizes the text belonging to the preset field, and the category of the field value that characterizes the text belonging to the preset field; the third training module is used to Training the classification network by describing the corresponding relationship and the corresponding relationship between the labeled texts to be extracted.

根據本發明的一方面，提供了一種電子設備，包括：處理器；用於儲存處理器可執行指令的記憶體；其中，所述處理器被配置爲調用所述記憶體儲存的指令，以執行上述方法。According to one aspect of the present invention, there is provided an electronic device including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute The above method.

根據本發明的一方面，提供了一種電腦可讀儲存媒體，其上儲存有電腦程式指令，所述電腦程式指令被處理器執行時實現上述方法。According to one aspect of the present invention, there is provided a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.

根據本發明的一方面，提供了一種電腦程式，包括電腦可讀代碼，當所述電腦可讀代碼在電子設備中運行時，所述電子設備中的處理器執行用於實現上述方法的指令。According to one aspect of the present invention, there is provided a computer program including computer readable code, and when the computer readable code is run in an electronic device, a processor in the electronic device executes instructions for implementing the above method.

應當理解的是，以上的一般描述和後文的細節描述僅是示例性和解釋性的，而非限制本發明。根據下面參考附圖對示例性實施例的詳細說明，本發明的其它特徵及方面將變得清楚。It should be understood that the above general description and the following detailed description are only exemplary and explanatory, rather than limiting the present invention. According to the following detailed description of exemplary embodiments with reference to the accompanying drawings, other features and aspects of the present invention will become clear.

以下將參考附圖詳細說明本發明的各種示例性實施例、特徵和方面。附圖中相同的附圖標記表示功能相同或相似的元件。儘管在附圖中示出了實施例的各種方面，但是除非特別指出，不必按比例繪製附圖。Various exemplary embodiments, features, and aspects of the present invention will be described in detail below with reference to the drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, unless otherwise noted, the drawings are not necessarily drawn to scale.

在這裏專用的詞“示例性”意爲“用作例子、實施例或說明性”。這裏作爲“示例性”所說明的任何實施例不必解釋爲優於或好於其它實施例。The dedicated word "exemplary" here means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" need not be construed as being superior or better than other embodiments.

本文中術語“和/或”，僅僅是一種描述關聯對象的關聯關係，表示可以存在三種關係，例如，A和/或B，可以表示：單獨存在A，同時存在A和B，單獨存在B這三種情況。另外，本文中術語“至少一種”表示多種中的任意一種或多種中的至少兩種的任意組合，例如，包括A、B、C中的至少一種，可以表示包括從A、B和C構成的集合中選擇的任意一個或多個元素。The term "and/or" in this article is only an association relationship describing associated objects, which means that there can be three types of relationships, for example, A and/or B can mean: A alone exists, A and B exist at the same time, and B exists alone. three conditions. In addition, the term "at least one" herein means any one or any combination of at least two of the multiple, for example, including at least one of A, B, and C, and may mean including those made from A, B, and C Any one or more elements selected in the set.

另外，爲了更好地說明本發明，在下文的具體實施方式中給出了眾多的具體細節。本領域技術人員應當理解，沒有某些具體細節，本發明同樣可以實施。在一些實例中，對於本領域技術人員熟知的方法、手段、元件和電路未作詳細描述，以便於凸顯本發明的主旨。In addition, in order to better illustrate the present invention, numerous specific details are given in the following specific embodiments. Those skilled in the art should understand that the present invention can also be implemented without certain specific details. In some examples, the methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order to highlight the gist of the present invention.

隨著人工智慧技術的發展，基於圖像進行關鍵訊息提取的技術取得了長足的發展，在進行關鍵訊息提取時，可以將圖像中的文本識別出來，另外，還會確定識別出來的文本的結構化訊息，即確定識別出的某一文本對應結構化數據中的哪個字段，以方便後續對識別出的數據進行結構化地儲存、展示等操作。With the development of artificial intelligence technology, the technology for extracting key information based on images has made great progress. When extracting key information, the text in the image can be recognized. In addition, the recognition of the text will also be determined. The structured message is to determine which field in the structured data corresponds to a certain identified text, so as to facilitate subsequent operations such as structured storage and display of the identified data.

爲了提高關鍵訊息提取的準確性，本發明實施例提供了一種圖像處理方法，可以基於各目標區域之間的相對位置特徵，以及待提取文本的特徵，透過圖卷積神經網路，確定圖像中的待提取文本對應的字段。該方法可不依賴於固定的模板進行文字提取，相對於基於模板進行文字訊息提取的方式，在對沒有適配模板的圖像進行文字訊息提取時，準確性較高。In order to improve the accuracy of key information extraction, an embodiment of the present invention provides an image processing method, which can determine the image based on the relative position feature between the target regions and the feature of the text to be extracted through the image convolutional neural network. The field corresponding to the text to be extracted in the image. This method can perform text extraction without relying on a fixed template. Compared with the method of extracting text information based on a template, the accuracy is higher when extracting text information from an image without a suitable template.

本發明實施例提供的圖像處理方法，可應用於圖像中關鍵訊息的提取，可以實現諸如收據訊息提取、發票訊息提取、身份訊息提取等功能，具備較高的應用價值。The image processing method provided by the embodiment of the present invention can be applied to the extraction of key information in the image, can realize functions such as receipt information extraction, invoice information extraction, and identity information extraction, and has high application value.

圖1示出根據本發明實施例的圖像處理方法的流程圖，如圖1所示，所述圖像處理方法包括：Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present invention. As shown in Fig. 1, the image processing method includes:

步驟S11，對圖像進行識別，確定圖像中的多個目標區域。Step S11, the image is recognized, and multiple target regions in the image are determined.

目標區域爲待提取文本所在區域。The target area is the area where the text to be extracted is located.

由於待提取文本在圖像上的分布往往是較爲分散的，例如，文本“總價”和“19.88元”之間是有一定間隔的，因此，在確定目標區域時，可以根據文本在圖像上的分布關係，以文本之間的間隔爲依據，對圖像進行劃分，得到多個目標區域。另外也可以是根據其它方式對目標區域進行劃分，具體劃分方式可以視本發明的具體應用場景而定，本發明對此不作限制。Since the distribution of the text to be extracted on the image is often relatively scattered, for example, there is a certain interval between the text "total price" and "19.88 yuan". Therefore, when determining the target area, you can use the text in the image to determine the target area. The distribution relationship on the image is based on the spacing between the texts, and the image is divided to obtain multiple target regions. In addition, the target area may also be divided according to other methods. The specific division method may depend on the specific application scenario of the present invention, which is not limited by the present invention.

在進行目標區域的確定後，可以將構成一個詞、構成一句話或表達某一含義的文本所在的區域確定爲1個目標區域，例如，待提取文本“總價”所在的區域爲1個目標區域，“19.88元”所在的區域爲1個目標區域。After the target area is determined, the area where the text that constitutes a word, a sentence, or expresses a certain meaning can be determined as a target area, for example, the area where the text "total price" is to be extracted is a target Area, the area where "19.88 yuan" is located is a target area.

對於具體確定圖像中目標區域的方式，本發明對此不作限制。As for the specific method of determining the target area in the image, the present invention does not limit this.

步驟S12，確定圖像中各目標區域之間的相對位置特徵。Step S12: Determine the relative position characteristics between the target regions in the image.

相對位置特徵能夠表徵各目標區域之間的相對位置關係，具體的相對位置特徵可以根據2個目標區域的中心點確定，也可以是根據2個目標區域的某個頂點確定，本發明對此不作限制。另外，本發明中的相對位置特徵還可以根據一些其它參數確定，具體將會在後文本發明可能的實現方式中進行論述，此處不做贅述。The relative position feature can characterize the relative position relationship between the target areas. The specific relative position feature can be determined according to the center points of the two target areas, or according to a certain vertex of the two target areas. The present invention does not do this. limit. In addition, the relative position feature in the present invention can also be determined based on some other parameters, which will be specifically discussed in the possible implementation of the invention in the following text, and will not be repeated here.

步驟S13，確定各目標區域的目標特徵。Step S13: Determine the target feature of each target area.

目標特徵包括待提取文本的特徵。待提取文本的特徵爲待提取文本自身的特徵，該特徵即可以包括待提取文本整體上的視覺特徵，也可以包括待提取文本的文本字符的特徵，或者上述兩個特徵中的一個。The target feature includes the feature of the text to be extracted. The feature of the text to be extracted is the feature of the text to be extracted. The feature may include the visual feature of the text to be extracted as a whole, the feature of the text character of the text to be extracted, or one of the above two features.

步驟S14，透過圖卷積神經網路，對相對位置特徵和目標特徵進行特徵提取，得到提取後的特徵。Step S14: Perform feature extraction on the relative position feature and the target feature through the graph convolutional neural network to obtain the extracted feature.

將相對位置特徵和目標特徵輸入圖卷積神經網路中，進行特徵提取，可以得到提取後的特徵。The relative position feature and the target feature are input into the graph convolutional neural network, and feature extraction is performed to obtain the extracted features.

圖卷積神經網路在對特徵進行提取時，能夠以連通圖的形式表示圖像，對特徵進行提取。連通圖由若干個節點（Node）及連接兩個節點的邊（Edge）所構成，邊用於刻畫不同節點之間的關係。When extracting features, graph convolutional neural networks can represent images in the form of connected graphs and extract features. Connected graph is composed of several nodes (Node) and edges (Edge) connecting two nodes. Edges are used to describe the relationship between different nodes.

因此，透過圖卷積神經網路提取後的特徵，能夠準確地表徵各目標區域之間的相對位置和待提取文本的特徵，以提高後續文本提取時的準確性。Therefore, the features extracted through the graph convolutional neural network can accurately characterize the relative position between the target regions and the features of the text to be extracted, so as to improve the accuracy of subsequent text extraction.

步驟S15，根據提取後的特徵，確定待提取文本對應的字段。Step S15: Determine the field corresponding to the text to be extracted according to the extracted features.

根據提取後的特徵確定待提取文本對應的字段時，具體可透過訓練好的網路來實現，該網路可以根據提取後的特徵對待提取文本進行分類，分類的類別用於表徵與待提取文本對應的字段。在根據提取後的特徵確定了待提取文本的類別後，即確定了待提取文本對應的字段。When determining the field corresponding to the text to be extracted according to the extracted features, it can be implemented through a trained network. The network can classify the text to be extracted based on the extracted features. The classification category is used to characterize the text to be extracted. The corresponding field. After the category of the text to be extracted is determined according to the extracted features, the field corresponding to the text to be extracted is determined.

後文將會對網路的訓練過程進行描述，此處不做贅述。The training process of the network will be described later, so I won’t repeat it here.

根據本發明的實施例，能夠透過圖卷積神經網路，基於各目標區域之間的相對位置特徵以及待提取文本的特徵，確定圖像中的待提取文本對應的字段。可不依賴於固定的模板進行文本提取，相對於基於模板進行文本提取的方式，在對沒有適配模板的圖像進行文本提取時，準確性較高。According to the embodiment of the present invention, a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted. The text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.

在一種可能的實現方式中，確定圖像中各目標區域之間的相對位置特徵，包括：確定圖像中的第一目標區域和第二目標區域的相對位置參數；對相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵。In a possible implementation manner, determining the relative position characteristics between the target areas in the image includes: determining the relative position parameters of the first target area and the second target area in the image; characterizing the relative position parameters Through processing, the relative position characteristics of the first target area and the second target area are obtained.

這裏的第一目標區域和第二目標區域爲圖像中的任意兩個目標區域。The first target area and the second target area here are any two target areas in the image.

其中，圖像中的第一目標區域和第二目標區域的相對位置參數包括下述至少一種：Wherein, the relative position parameters of the first target area and the second target area in the image include at least one of the following:

第一目標區域相對於第二目標區域的橫向距離和縱向距離；The horizontal distance and the vertical distance of the first target area relative to the second target area;

第一目標區域的寬高比；The aspect ratio of the first target area;

第二目標區域的寬高比；The aspect ratio of the second target area;

第一目標區域和第二目標區域的相對尺寸關係。The relative size relationship between the first target area and the second target area.

其中，第一目標區域相對於第二目標區域的橫向距離和縱向距離，可以是第一目標區域的參考點與第二目標區域的參考點的橫向距離和縱向距離，對於目標區域參考點的選取，可以是目標區域的中心點，也可以是目標區域的某個頂點，對具體參考點的選取，本發明不作限制。Wherein, the horizontal and vertical distances of the first target area relative to the second target area may be the horizontal and vertical distances between the reference point of the first target area and the reference point of the second target area, and the selection of the reference point of the target area It can be the center point of the target area or a vertex of the target area. The selection of a specific reference point is not limited in the present invention.

爲便於更清楚地理解相對位置特徵的確定過程，下面透過具體的數學表達式來對相對位置特徵的確定過程進行說明，需要說明的是，本發明中提供的具體數學表達式爲本發明實施例在具體實施時的一種可能的實現方式，而不應當理解爲對本發明實施例保護範圍的限制。To facilitate a clearer understanding of the process of determining relative position features, the following describes the process of determining relative position features through specific mathematical expressions. It should be noted that the specific mathematical expressions provided in the present invention are embodiments of the present invention. A possible implementation manner during specific implementation should not be understood as a limitation on the protection scope of the embodiments of the present invention.

對於一個待提取文本而言，其所在的目標區域往往是矩形的，那麽對於待提取文本 t _i ，可以表示爲 t _i = > x _i, y _i, h _i, w _i, s _i >，其中， x _i, y _i 分別表示目標區域的參考點在預設坐標系中的橫坐標和縱坐標， h _i, w _i 分別表示目標區域的高度和寬度， s _i 表示待提取文本的字符。 For a text to be extracted, the target area is often rectangular, so for the text to be extracted t _i , it can be expressed as t _i => x _i , y _i , h _i , w _i , s _i >, where , x _{_i,} y _i represent the reference point of the target area in a predetermined horizontal and vertical coordinates of the coordinate system, h _{_i,} w _i represent the height and width of the target area, s _i represents the character of the text to be extracted.

那麽，在一種可能的實現方式中，第一目標區域相對於第二目標區域的橫向距離Δ x _ij 和縱向距離Δ y _ij ，表達式如下： Then, in a possible implementation manner, the lateral distance Δ x _ij and the longitudinal distance Δ y _{ij of the} first target area relative to the second target area are expressed as follows:

（1）

(1)

（2）

(2)

其中，第一目標區域爲待提取文本 t _i 所在的區域，第二目標區域爲待提取文本 t _j 所在的區域。 The first target area is the area where the text t _i is to be extracted, and the second target area is the area where the text t _{j is to be extracted.}

在一種可能的實現方式中，還可以對橫向距離Δ x _ij 和縱向距離Δ y _ij 進行歸一化處理，得到歸一化後的橫向距離和縱向距離，具體可透過圖像的尺寸參數對Δ x _ij 和Δ y _ij 進行歸一化。例如，在透過圖像的寬度W進行歸一化時，得到相對位置參數

的表達式如下： In a possible implementation manner, the horizontal distance Δ x _ij and the vertical distance Δ y _ij can also be normalized to obtain the normalized horizontal distance and vertical distance. Specifically, the size parameter of the image can be compared to Δ x _ij and Δ y _ij are normalized. For example, when the width W of the transmitted image is normalized, the relative position parameter is obtained

The expression is as follows:

（3）

(3)

另外，也可用圖像的高 H來進行歸一化，此處不作贅述。 In addition, the high H of the image can also be used for normalization, which will not be repeated here.

透過對橫向距離Δ x _ij 和縱向距離Δ y _ij 進行歸一化處理，降低了被識別圖像的放大或縮小對最終結果的影響，使得關鍵訊息的提取結果更加準確。 By normalizing the horizontal distance Δ x _ij and the vertical distance Δ y _ij , the influence of the enlargement or reduction of the recognized image on the final result is reduced, and the extraction result of the key information is more accurate.

在一種可能的實現方式中，第一目標區域的寬高比爲 w _i /h _i ，第二目標區域的寬高比爲 w _j /h _j 。 In a possible implementation manner, the aspect ratio of the first target area is w _i /h _i , and the aspect ratio of the second target area is w _j /h _j .

第一目標區域和第二目標區域的相對尺寸關係，可以表徵第一目標區域尺寸和第二目標區域尺寸之間的相對大小關係。由於某些字段的文本的尺寸之間具備一些特定的關係，因此，相對位置特徵中考慮了第一目標區域和第二目標區域的相對尺寸關係，可以使得關鍵訊息的提取結果更加準確。The relative size relationship between the first target area and the second target area may represent the relative size relationship between the size of the first target area and the size of the second target area. Since there are some specific relationships between the text sizes of certain fields, the relative size relationship between the first target area and the second target area is considered in the relative position feature, which can make the extraction result of the key information more accurate.

例如，文本“地址”的尺寸較短，文本“xx市xx街道xx路xx號”的尺寸較長，因此，這兩個尺寸之間的差距較大；而文本“總價”和“19.88元”的尺寸之間的差距較小。因此，目標區域的相對尺寸關係能夠在一定程度上反應文本對應的字段類別。For example, the size of the text "address" is shorter, and the size of the text "xx city xx street xx road xx number" is longer, so the difference between the two sizes is larger; while the text "total price" and "19.88 yuan" "The gap between the sizes is smaller. Therefore, the relative size relationship of the target area can reflect the field category corresponding to the text to a certain extent.

在一種可能的實現方式中，相對尺寸關係

的表達式如下： In a possible implementation, the relative size relationship

The expression is as follows:

（4）

(4)

在一種可能的實現方式中，對上述公式涉及的相對位置參數進行整合，得到的整合後的相對位置參數的表達式如下：In a possible implementation manner, the relative position parameters involved in the above formulas are integrated, and the resulting expression of the integrated relative position parameters is as follows:

（5）

(5)

該實現方式中，相對位置參數即包含了歸一化後的橫向距離和縱向距離，也包含了第一目標區域的寬高比，也包含了第一目標區域和第二目標區域的相對尺寸關係，可以使得關鍵訊息的提取結果更加準確。In this implementation, the relative position parameter includes the normalized horizontal and vertical distances, the aspect ratio of the first target area, and the relative size relationship between the first target area and the second target area. , Can make the extraction result of key information more accurate.

在一種可能的實現方式中，在得到相對位置參數後，可以對相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵。In a possible implementation manner, after the relative position parameters are obtained, the relative position parameters can be characterized to obtain the relative position characteristics of the first target area and the second target area.

對相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵，包括：將相對位置參數透過正餘弦變換矩陣映射到一個D維的空間，得到D維的特徵向量，D爲正整數；將D維的特徵向量乘以一個預設權重矩陣，得到一個1維的權重值；透過預設激勵函數對權重值進行處理，得到相對位置特徵。Characterizing the relative position parameters to obtain the relative position characteristics of the first target area and the second target area includes: mapping the relative position parameters to a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, D is a positive integer; the D-dimensional feature vector is multiplied by a preset weight matrix to obtain a 1-dimensional weight value; the weight value is processed through the preset activation function to obtain the relative position feature.

這裏的正餘弦變換矩陣爲傅立葉正弦變換或者餘弦變換時所使用的變換矩陣。The sine-cosine transformation matrix here is the transformation matrix used in Fourier sine transformation or cosine transformation.

這裏的預設權重矩陣的具體值可透過網路訓練來確定，初始值可以透過隨機等方式確定，在進行網路訓練時，將會對預設權重矩陣進行調優。後文將會對網路的訓練過程進行描述，此處不做贅述。The specific value of the preset weight matrix here can be determined through network training, and the initial value can be determined through random methods. During network training, the preset weight matrix will be tuned. The training process of the network will be described later, so I won’t repeat it here.

這裏的預設激勵函數例如可以是線性整流函數（Rectified Linear Unit, ReLU），具體的激勵函數可以視本發明的實際應用場景而定，本發明對此不作限制。The preset activation function here may be, for example, a linear rectification function (Rectified Linear Unit, ReLU), and the specific activation function may depend on the actual application scenario of the present invention, which is not limited by the present invention.

爲便於理解對相對位置參數進行特徵化處理的過程，下面透過具體的表達式來說明特徵化處理後的相對位置特徵 e _ij ，具體請參見公式（6）： In order to facilitate the understanding of the process of characterizing relative position parameters, the following specific expressions are used to illustrate the relative position characteristics e _ij after characterization. For details, please refer to formula (6):

（6）

(6)

其中，M表示正餘弦變換矩陣，M( r _ij )表示將所述相對位置參數 r _ij 透過正餘弦變換矩陣M映射到一個D維的空間，W _m 爲預設權重矩陣，ReLU表示線性整流函數。 Where, M represents a sine-cosine transformation matrix, M( r _ij ) represents that the relative position parameter r _ij is mapped to a D-dimensional space through a sine-cosine transformation matrix M, W _m is a preset weight matrix, and ReLU represents a linear rectification function .

本發明實施例中，透過特徵化處理可以將相對位置參數轉換爲圖卷積神經網路的邊所需的數據格式，便於後續透過圖卷積神經網路進行特徵提取。In the embodiment of the present invention, the relative position parameter can be converted into the data format required by the edge of the graph convolutional neural network through the characteristic processing, which is convenient for subsequent feature extraction through the graph convolutional neural network.

如前文所述，本發明實施例中的目標特徵即可以包括待提取文本整體上的視覺特徵，也可以包括待提取文本的文本字符的特徵。As mentioned above, the target feature in the embodiment of the present invention may include the visual feature of the text to be extracted as a whole, or the feature of the text character of the text to be extracted.

那麽，在一種可能的實現方式中，確定各目標區域的目標特徵，包括：確定目標區域中的像素數據，對像素數據進行特徵提取，得到視覺特徵；確定目標區域中的文本字符，對文本字符進行特徵提取，得到文本字符特徵；根據提取到的視覺特徵和字符特徵，確定目標區域的目標特徵。Then, in a possible implementation manner, determining the target characteristics of each target area includes: determining the pixel data in the target area, extracting the pixel data to obtain the visual characteristics; determining the text characters in the target area, and comparing the text characters Perform feature extraction to obtain text character features; according to the extracted visual features and character features, determine the target features of the target area.

其中，視覺特徵可以反映目標區域中文本在整體上的視覺訊息。在提取視覺特徵時，具體可以透過感興趣區域對齊（Region of Interest Align，RoI Align）方法來提取，對於具體提取視覺特徵的方式，本發明不作限制。Among them, the visual features can reflect the overall visual information of the text in the target area. When extracting the visual features, it can be extracted through the Region of Interest Align (RoI Align) method. The specific method of extracting the visual features is not limited in the present invention.

本發明實施例中，考慮到圖像中會存在由於拍照視角、光線、遮擋等原因帶來的干擾因素，因此，透過文字檢測識別通常會有較多的誤識，即可能會識別出錯誤的文本字符，這可能會影響關鍵訊息提取的準確性。而透過視覺訊息的提取，將視覺訊息考慮到關鍵訊息提取中，會降低文本誤識對關鍵訊息提取的影響。即使文本識別錯誤，但由於視覺訊息不會改變太大，因此二者結合能夠提高關鍵訊息提取結果的準確性。In the embodiment of the present invention, considering that there will be interference factors in the image due to the camera angle, light, occlusion, etc., there will usually be more misunderstandings through text detection and recognition, that is, wrong recognition may be recognized. Text characters, which may affect the accuracy of key information extraction. Through the extraction of visual information, the visual information is taken into account in the extraction of key information, which will reduce the influence of text misunderstanding on the extraction of key information. Even if the text recognition is wrong, the visual information will not change too much, so the combination of the two can improve the accuracy of the key information extraction results.

確定目標區域中的文本字符時，可以透過文字識別技術對文本字符進行識別提取。例如，可以透過光學字符識別技術（Optical Character Recognition，OCR）對文本字符進行特徵提取，得到文本字符。對於具體提取文本字符的方式，本發明不做限制。When determining the text characters in the target area, the text characters can be recognized and extracted through text recognition technology. For example, optical character recognition technology (Optical Character Recognition, OCR) can be used to extract features of text characters to obtain text characters. The present invention does not limit the specific method of extracting text characters.

在一種可能的實現方式中，對所述文本字符進行特徵提取，得到字符特徵，包括：透過獨熱（one-hot）編碼的方式將文本字符映射到一個低維特徵空間；然後透過雙向的長短時序網路（Bi-LSTM）對低維特徵空間中的文本字符進行處理，得到文本的特徵表示，即得到了待提取文本的字符特徵。In a possible implementation manner, performing feature extraction on the text characters to obtain character features includes: mapping the text characters to a low-dimensional feature space through one-hot encoding; and then through bidirectional length Time series network (Bi-LSTM) processes the text characters in the low-dimensional feature space to obtain the feature representation of the text, that is, obtain the character features of the text to be extracted.

透過獨熱編碼，可以將離散特徵（文本字符）的取值擴展到歐式空間，離散特徵的某個取值對應於歐式空間中的某個點，會讓特徵之間的計算更加合理。Through one-hot encoding, the value of discrete features (text characters) can be extended to Euclidean space. A certain value of discrete features corresponds to a point in Euclidean space, which makes the calculation between features more reasonable.

在一種可能的實現方式中，根據提取到的視覺特徵和字符特徵，確定目標區域的目標特徵，包括：將所述視覺特徵和字符特徵賦予不同的權重；對賦予權重後的所述視覺特徵和字符特徵進行融合（例如相加），得到目標區域的目標特徵。In a possible implementation manner, determining the target feature of the target area according to the extracted visual features and character features includes: assigning different weights to the visual features and character features; and assigning weights to the visual features and The character features are merged (for example, added) to obtain the target feature of the target area.

考慮到視覺特徵和字符特徵對提取結果的影響可能不同，因此這裏透過對視覺特徵和字符特徵賦予不同的權重，來提高提取結果的準確性。這裏的權重可以是透過網路訓練進行優化得到的，具體訓練過程詳見後文描述，此處不做贅述。Considering that visual features and character features may have different effects on the extraction results, hereby assigning different weights to visual features and character features to improve the accuracy of the extraction results. The weights here can be optimized through network training. The specific training process is described in detail later, and will not be repeated here.

爲便於理解對文字字符進行特徵化處理的過程，下面透過具體的表達式來說明特徵化處理後的字符特徵。In order to facilitate the understanding of the process of characterizing text characters, the following uses specific expressions to illustrate the character characteristics after the characterization process.

對於一個待提取文本而言，對文本字符s _i 進行特徵提取得到字符特徵t _i 的過程可表示爲可用公式（7）。 For a text to be extracted, _{the process of performing feature extraction on the text character s i} to obtain the character feature t _i can be expressed as the available formula (7).

（7）

(7)

其中，W∈R ^C×D 表示獨熱編碼的投影矩陣，Bi-LSTM表示透過雙向長短時序網路對獨熱編碼後的文本字符進行處理， s _i ^j 表示文本字符 s _i 中的第 j個字符。 Wherein, W∈R ^{C × D} represents a hot encoded projection matrix, Bi-LSTM showing processing of text characters after hot encoded sequence via a bidirectional network length, s _i ^j denotes the j-th text characters of s _i character.

透過對字符特徵t _i 賦予權重 α _i ，對視覺特徵v _i 賦予權重（1 - α _i ），得到目標特徵n _i 具體詳見公式（8）和（9）。 By assigning a weight α _i to the character feature t _{i and a} weight (1- α _i ) to the visual feature v _i , the target feature n _{i can} be obtained by referring to formulas (8) and (9).

（8）

(8)

（9）

(9)

其中，W _t ∈R ^1×Dt 和W _v ∈R ^1×Dv 爲一維投影矩陣，具體可以透過網路訓練進行優化得到， σ爲激勵函數。U _t ∈R ^Dh×Dt 和U _v ∈R ^Dh×Dt 爲投影參數，也可以透過網路訓練進行優化得到。 Among them, W _t ∈ R ^1×Dt and W _v ∈ R ^1×Dv are one-dimensional projection matrices, which can be optimized through network training, and σ is the excitation function. U _t ∈R ^Dh×Dt and U _v ∈R ^Dh×Dt are projection parameters, which can also be optimized through network training.

在得到目標特徵n _i 和相對位置特徵 e _ij 後，即可透過圖卷積神經網路，對相對位置特徵和目標特徵進行特徵提取。 After the target feature n _i and the relative location feature e _ij are obtained, the relative location feature and the target feature can be extracted through the graph convolutional neural network.

在一種可能的實現方式中，透過圖卷積神經網路，對相對位置特徵和目標特徵進行特徵提取，得到提取後的特徵，包括：以各目標特徵爲圖的節點，以各相對位置特徵爲連接兩個節點的邊，構建連通圖；透過圖卷積神經網路，對連通圖進行疊代更新，將疊代更新後滿足收斂條件的連通圖作爲提取後的特徵。In a possible implementation method, the relative position feature and the target feature are extracted through the graph convolutional neural network, and the extracted features are obtained, including: taking each target feature as the node of the graph, and taking each relative position feature as Connect the edges of the two nodes to construct a connected graph; through the graph convolutional neural network, the connected graph is iteratively updated, and the connected graph that satisfies the convergence condition after the iterative update is used as the extracted feature.

在將目標區域的相對位置特徵作爲連接兩個節點的邊構建連通圖時，會將相對位置特徵作爲節點之間的鄰接矩陣的一個參數，當然鄰接矩陣中還可以包含節點的語義相似性等其它參數，本發明對其它參數的具體設置不作限制。When constructing a connected graph using the relative position feature of the target area as the edge connecting two nodes, the relative position feature will be used as a parameter of the adjacency matrix between the nodes. Of course, the adjacency matrix can also include the semantic similarity of the nodes and other things. Parameters, the present invention does not limit the specific settings of other parameters.

請參閱圖2，爲本發明提供的一種連通圖的示意圖，該連通圖中，圖的節點爲各目標特徵，連接兩個節點的邊爲目標區域的相對位置特徵。Please refer to FIG. 2, which is a schematic diagram of a connected graph provided by the present invention. In the connected graph, the nodes of the graph are target features, and the edge connecting two nodes is the relative position feature of the target area.

本發明實施例構建的連通圖中，既包含了圖像中的目標特徵，也包含了圖像中目標特徵之間的相對位置特徵，可以從整體上表徵圖像中文字的特徵，因此能夠提高關鍵訊息提取結果的準確性。The connected graph constructed by the embodiment of the present invention includes not only the target features in the image, but also the relative position features between the target features in the image, which can characterize the characteristics of the text in the image as a whole, and therefore can improve The accuracy of key information extraction results.

在構建好連通圖後，可以透過圖卷積神經網路，對連通圖進行疊代更新，將疊代更新後滿足收斂條件的連通圖作爲提取後的特徵。每次疊代過程中，任意節點i的特徵，是透過與節點i相連的各節點的鄰接矩陣，對各節點的特徵值進行投影來更新的，在經過多次疊代後，各節點的特徵值將不會再隨疊代次數的增加而變化，即節點的特徵值保持不變，此時即可視爲滿足收斂條件，滿足收斂條件的連通圖即可作爲提取後的特徵。After the connected graph is constructed, the connected graph can be iteratively updated through the graph convolutional neural network, and the connected graph that satisfies the convergence condition after the iterative update is used as the extracted feature. In each iteration process, the feature of any node i is updated by projecting the feature value of each node through the adjacency matrix of each node connected to node i. After multiple iterations, the feature of each node The value will no longer change with the increase of the number of iterations, that is, the eigenvalue of the node remains unchanged. At this time, it can be regarded as meeting the convergence condition, and the connected graph meeting the convergence condition can be used as the extracted feature.

爲便於理解，第 l+1次疊代時節點N的特徵N ^l+1 的表達式如下： For ease of understanding, the expression of the feature N ^l+1 of node N in the l +1 iteration is as follows:

（10）

(10)

其中，N ^l 爲第 l次疊代時節點N的特徵，W ^l 爲轉換矩陣，可以透過網路訓練進行優化得到，A ^l 爲節點的鄰接矩陣，節點 i和 j的鄰接矩陣 A ^l _ij 的表達式如下： Among them, N ^l is the feature of node N in the lth ^{iteration, W l} is the conversion matrix, which can be optimized through network training, A ^l is the adjacency matrix of the node, and the adjacency matrix A ^l _ij of the nodes i and j The expression is as follows:

（11）

(11)

（12）

(12)

其中，(n ^l _i ) ^T表示n ^l _i 的轉置，

表示歸一化參數，可以透過網路訓練進行優化得到。 Among them, (n ^l _i ) ^T represents the transposition of ^{n l} _i,

Indicates normalized parameters, which can be optimized through network training.

在得到提取後的特徵後，在一種可能的實現方式中，根據提取後的特徵，確定待提取文本對應的字段，包括：根據預先定義的多個預設類別，對圖卷積神經網路輸出的連通圖中的節點進行分類，得到節點的類別，所述預設類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；根據所述節點的類別，確定待提取文本對應於預設字段的標識或字段值。After the extracted features are obtained, in a possible implementation manner, according to the extracted features, determine the field corresponding to the text to be extracted, including: outputting the image convolutional neural network according to a plurality of predefined categories The nodes in the connected graph of the node are classified to obtain the category of the node. The preset category includes: the category of the identifier that characterizes the text belonging to the preset field, and the category of the field value that characterizes the text belonging to the preset field; Type, to determine the identifier or field value of the preset field corresponding to the text to be extracted.

由於識別到的文本中，即可能有表徵預設字段的標識的文本，也可能有表徵預設字段的字段值的文本。表徵預設字段的標識的文本，爲圖像中用來指示字段值屬哪個字段的文本，而字段值是字段下的具體值，例如，對於預設字段“總價”而言，圖像中識別出的文本“總價”、“總價格”或者“sub total”等，都是預設字段“總價”的一種具體的標識；對於識別出的文本“19.88元”、“¥：19.88”等，都是預設字段的字段值。As the recognized text, there may be text that characterizes the identifier of the preset field, and there may also be text that characterizes the field value of the preset field. The text representing the identifier of the preset field is the text in the image used to indicate which field the field value belongs to, and the field value is the specific value under the field. For example, for the preset field "Total Price", the image The recognized text "total price", "total price" or "sub total", etc., are all a specific identification of the preset field "total price"; for the recognized text "19.88 yuan" and "¥: 19.88" And so on, are the field values of the preset fields.

因此，對於某一個預設字段，可以設置2個類別分別對應該預設字段，其中，1個類別是表徵文本屬預設字段的標識的類別，1個類別是表徵文本屬預設字段的字段值的類別。當有多個不同的預設字段時，每一預設字段均可以設置2個類別，如此便會有多個表徵文本屬預設字段的標識的類別，以及多個表徵文本屬預設字段的字段值的類別。Therefore, for a certain preset field, two categories can be set to correspond to the preset field respectively. Among them, one category is the category that characterizes the text belonging to the preset field, and the other category is the field that characterizes the text belongs to the preset field. The category of the value. When there are multiple different preset fields, each preset field can be set to 2 categories, so there will be multiple characterizing texts belonging to the preset field’s identification category, and multiple characterizing texts belonging to the preset field. The category of the field value.

例如，在針對商品購物小票進行識別時，預設字段可設置爲“名稱”、“地址”、“電話號碼”、“日期”、“時間”、“商品類目”、“商品名稱”、“商品單價”、“單品總價”、“稅費”、“合計總價”、“提示”，共計12個預設字段，那麽可以預設24個類別，分別表示各預設字段的預設字段標識，以及各預設字段的字段值。另外，還可以設置類別“其它”，以將不屬上述類別的文本進行區分提取，即共計設置25個類別。For example, when identifying a product shopping receipt, the preset fields can be set to "name", "address", "phone number", "date", "time", "product category", "product name", "Commodity unit price", "Single product total price", "Taxes", "Total total price", "Reminder", a total of 12 preset fields, then 24 categories can be preset, which respectively indicate the preset value of each preset field. Set the field identifier and the field value of each preset field. In addition, the category "Others" can be set to distinguish and extract texts that do not belong to the above categories, that is, a total of 25 categories are set.

上述舉例中的25個具體預設類別示例如下：Examples of the 25 specific preset categories in the above examples are as follows:

名稱-標識；名稱-字段值；地址-標識；地址-字段值；電話號碼-標識；電話號碼-字段值；日期-標識；日期-字段值；時間-標識；時間-字段值；商品類目-標識；商品類目-字段值；商品名稱-標識；商品名稱-字段值；商品單價-標識；商品單價-字段值；單品總價-標識；單品總價-字段值；稅費-標識；稅費-字段值；合計總價-標識；合計總價-字段值；提示-標識；提示-字段值；其它。Name-identification; name-field value; address-identification; address-field value; telephone number-identification; telephone number-field value; date-identification; date-field value; time-identification; time-field value; product category -Identification; product category-field value; product name-identification; product name-field value; product unit price-identification; product unit price-field value; single product total price-identification; single product total price-field value; tax- Identification; Taxes-Field Value; Total Total Price-Identification; Total Total Price-Field Value; Prompt-Identification; Prompt-Field Value; Others.

在一種可能的實現方式中，本發明實施例的圖像處理方法可透過預先構建的分類網路實現，該分類網路的訓練步驟如下：In a possible implementation manner, the image processing method of the embodiment of the present invention can be implemented through a pre-built classification network, and the training steps of the classification network are as follows:

將樣本圖像輸入所述分類網路中處理，得到樣本圖像中待提取文本的第一預測類別，以及所述第一預測類別中各個類別之間的對應關係；Inputting the sample image into the classification network for processing, to obtain the first prediction category of the text to be extracted in the sample image, and the corresponding relationship between each category in the first prediction category;

根據所述第一預測類別，以及所述樣本圖像的標注類別，訓練所述分類網路，所述標注類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；Train the classification network according to the first prediction category and the label category of the sample image. The label category includes: the category of the identifier that characterizes the text belonging to the preset field, and the category of the character that characterizes the text belonging to the preset field. The category of the field value;

根據所述對應關係，以及標注的標注類別之間的對應關係，訓練所述分類網路。Training the classification network according to the corresponding relationship and the corresponding relationship between the labeled annotation categories.

該分類網路可用於實現本發明的圖像處理技術，該分類網路中可以包含前文所述的圖卷積神經網路，另外，爲實現本發明的各項功能，該分類網路中還可以包含其它網路，例如，Bi-LSTM網路，對於本發明的分類網路中包含的網路，可根據本發明實施例的具體應用場景而定，本發明對此不做限制。The classification network can be used to implement the image processing technology of the present invention. The classification network can include the graph convolutional neural network described above. In addition, in order to implement the functions of the present invention, the classification network also includes Other networks may be included, for example, a Bi-LSTM network. The networks included in the classification network of the present invention may be determined according to the specific application scenarios of the embodiments of the present invention, and the present invention does not limit this.

請參閱圖3，爲本申請提供的一種分類網路的具體實現方式的結構示意圖，網路中包含目標特徵提取模組、相對位置特徵提取模組、卷積網路特徵提取模組、分類模組。透過目標特徵提取模組提取包含待提取文本的圖像的目標特徵，透過相對位置特徵提取模組提取圖像的相對位置特徵；將目標特徵和相對位置特徵輸入至卷積網路特徵提取模組，進行疊代更新，得到疊代提取後的特徵；然後透過分類模組對疊代提取後的特徵進行分類，得到節點的預測類別。由於類別表徵與待提取文本對應的字段，在根據提取後的特徵確定了待提取文本的類別後，即確定了待提取文本對應的字段。對於各個模組具體功能的實現，請參考本發明中的相關論述，此處不做贅述。Please refer to Figure 3, which is a schematic structural diagram of a specific implementation of a classification network provided for this application. The network includes a target feature extraction module, a relative position feature extraction module, a convolutional network feature extraction module, and a classification model. Group. Extract the target feature of the image containing the text to be extracted through the target feature extraction module, and extract the relative position feature of the image through the relative position feature extraction module; input the target feature and relative position feature to the convolutional network feature extraction module , Perform iterative update to obtain the features extracted by the iterative; then classify the features extracted by the iterative through the classification module to obtain the predicted category of the node. Since the category characterizes the field corresponding to the text to be extracted, after the category of the text to be extracted is determined according to the extracted features, the field corresponding to the text to be extracted is determined. For the realization of the specific functions of each module, please refer to the relevant discussion in the present invention, which will not be repeated here.

在上述訓練過程中，標注類別可以是上文所述的預設類別，此處不再贅述。In the above training process, the label category may be the preset category described above, which will not be repeated here.

在根據所述第一預測類別，以及所述樣本圖像的標注類別，訓練所述分類網路時，可以根據第一預測類別相對於標注類別的損失，調整分類網路中的參數，以使分類網路對樣本圖像的預測類別與標注類別之間的差異最小。When training the classification network according to the first prediction category and the label category of the sample image, the parameters in the classification network can be adjusted according to the loss of the first prediction category relative to the label category, so that The classification network has the smallest difference between the predicted category and the labeled category of the sample image.

此外，在訓練時，利用兩個文本是否分別屬同一預設字段的標識和標識值，對分類網路的分類準確度也有益處。爲便於後續描述，這裏將分別屬同一預設字段的標識和標識值的兩個文本稱爲字段對，例如：文本“總價”和“19.88元”構成字段對。In addition, during training, using the identification and identification value of whether the two texts belong to the same preset field is also beneficial to the classification accuracy of the classification network. For the convenience of the subsequent description, the two texts that respectively belong to the identifier and the identifier value of the same preset field are referred to as a field pair. For example, the text "total price" and "19.88 yuan" constitute a field pair.

因此，在訓練所述分類網路時，分類網路還會輸出第一預測類別中各個類別之間的對應關係，同時，樣本圖像中也會對文本之間的對應關係進行標注。那麽，便可以根據分類網路輸出的對應關係，以及標注的待提取文本之間的對應關係，訓練所述分類網路。Therefore, when training the classification network, the classification network will also output the correspondence between each category in the first prediction category, and at the same time, the correspondence between the texts will also be marked in the sample image. Then, the classification network can be trained according to the correspondence between the output of the classification network and the correspondence between the labeled texts to be extracted.

訓練時所使用的損失函數具體可以是交叉熵損失函數（Cross Entropy Loss，CE），具體的損失函數可以根據實際需求選擇，本發明對此不作具體限定。The loss function used during training may specifically be a cross entropy loss function (Cross Entropy Loss, CE), and the specific loss function may be selected according to actual requirements, which is not specifically limited in the present invention.

根據本發明的實施例，訓練後的分類網路可用於在文字關鍵訊息提取時確定待提取文本對應的字段，具體詳見本發明提供的實施例，由於在訓練時利用了待提取文本之間的對應關係，因此訓練得到的分類網路在對沒有適配模板的圖像進行文本提取時，準確性較高。According to the embodiment of the present invention, the trained classification network can be used to determine the field corresponding to the text to be extracted during the extraction of key text messages. For details, please refer to the embodiment provided by the present invention. Therefore, the trained classification network has higher accuracy when extracting text from images without a suitable template.

在一種可能的實現方式中，識別的圖像包括下述至少一種：收據圖像、發票圖像、名片圖像。當然，在實際應用中，本發明的實施例也可用於對其它圖像的識別，本發明對此不作具體限定。In a possible implementation manner, the recognized image includes at least one of the following: a receipt image, an invoice image, and a business card image. Of course, in practical applications, the embodiments of the present invention can also be used to recognize other images, and the present invention does not specifically limit this.

根據本發明的實施例，在進行文本提取時，不僅利用目標區域中的文本字符特徵，還利用了目標區域的視覺特徵，降低了文本字符誤識對最後分類的影響，提高了文本提取時的準確性；另外，透過建立了文本區域之間的空間位置聯繫，可以不依賴於預先設計好的模板，可以處理未見過的模板，有更好的拓展性。According to the embodiment of the present invention, when text extraction is performed, not only the text character features in the target area are used, but also the visual features of the target area are used, which reduces the influence of misrecognition of text characters on the final classification and improves the performance of text extraction. Accuracy; In addition, by establishing the spatial position relationship between the text areas, you can deal with unseen templates without relying on pre-designed templates, and have better scalability.

在一種可能的實現方式中，所述圖像處理方法可以由終端設備或伺服器等電子設備執行，終端設備可以爲用戶設備（User Equipment，UE）、行動設備、用戶終端、終端、行動電話、無線電話、個人數位助理（Personal Digital Assistant，PDA）、手持設備、計算設備、車載設備、可穿戴設備等，所述方法可以透過處理器調用記憶體中儲存的電腦可讀指令的方式來實現。或者，可透過伺服器執行所述方法。In a possible implementation, the image processing method can be executed by electronic equipment such as a terminal device or a server, and the terminal device can be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a mobile phone, Wireless phones, personal digital assistants (Personal Digital Assistants, PDAs), handheld devices, computing devices, vehicle-mounted devices, wearable devices, etc., the method can be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method can be executed through a server.

可以理解，本發明提及的上述各個方法實施例，在不違背原理邏輯的情況下，均可以彼此相互結合形成結合後的實施例，限於篇幅，本發明不再贅述。本領域技術人員可以理解，在具體實施方式的上述方法中，各步驟的具體執行順序應當以其功能和可能的內在邏輯確定。It can be understood that the various method embodiments mentioned in the present invention can be combined with each other to form a combined embodiment without violating the principle and logic. The length is limited, and the present invention will not be repeated. Those skilled in the art can understand that, in the above method of the specific implementation, the specific execution order of each step should be determined by its function and possible internal logic.

此外，本發明還提供了圖像處理裝置、電子設備、電腦可讀儲存媒體、程式，上述均可用來實現本發明提供的任一種圖像處理方法，相應技術方案和描述參見方法部分的相應記載，不再贅述。In addition, the present invention also provides image processing devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any image processing method provided by the present invention. For the corresponding technical solutions and descriptions, please refer to the corresponding records in the method section. ,No longer.

圖4示出根據本發明實施例的圖像處理裝置的方塊圖，如圖4所示，所述圖像處理裝置20包括：FIG. 4 shows a block diagram of an image processing device according to an embodiment of the present invention. As shown in FIG. 4, the image processing device 20 includes:

識別模組21，用於對圖像進行識別，確定所述圖像中的多個目標區域，所述目標區域爲待提取文本所在區域；The recognition module 21 is used for recognizing an image and determining multiple target areas in the image, where the target area is the area where the text to be extracted is located;

相對位置特徵確定模組22，用於確定所述圖像中各目標區域之間的相對位置特徵；The relative position feature determining module 22 is used to determine the relative position feature between each target area in the image;

目標特徵確定模組23，用於確定各所述目標區域的目標特徵，所述目標特徵包括所述待提取文本的特徵；The target feature determining module 23 is configured to determine the target feature of each target area, where the target feature includes the feature of the text to be extracted;

圖卷積模組24，用於透過圖卷積神經網路，對所述相對位置特徵和所述目標特徵進行特徵提取，得到提取後的特徵；The graph convolution module 24 is configured to perform feature extraction on the relative position feature and the target feature through the graph convolution neural network to obtain the extracted feature;

字段確定模組25，用於根據提取後的特徵，確定所述待提取文本對應的字段。The field determination module 25 is used to determine the field corresponding to the text to be extracted according to the extracted features.

在一種可能的實現方式中，圖卷積模組24包括：第一圖卷積子模組和第二圖卷積子模組，其中：In a possible implementation manner, the graph convolution module 24 includes: a first graph convolution sub-module and a second graph convolution sub-module, wherein:

第一圖卷積子模組，用於以各所述目標特徵爲圖的節點，以各所述相對位置特徵爲連接兩個節點的邊，構建連通圖；The first graph convolution submodule is used to construct a connected graph by taking each of the target features as the nodes of the graph, and using each of the relative position features as the edges connecting the two nodes;

第二圖卷積子模組，用於透過圖卷積神經網路，對所述連通圖進行疊代更新，將疊代更新後滿足收斂條件的連通圖作爲提取後的特徵。The second graph convolution submodule is used to iteratively update the connected graph through the graph convolutional neural network, and use the connected graph that satisfies the convergence condition after the iterative update as the extracted feature.

在一種可能的實現方式中，字段確定模組25包括：第一字段確定子模組和第二字段確定子模組，其中：In a possible implementation manner, the field determination module 25 includes: a first field determination sub-module and a second field determination sub-module, wherein:

第一字段確定子模組，用於根據預先定義的多個預設類別，對圖卷積神經網路輸出的連通圖中的節點進行分類，得到節點的類別，所述預設類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；The first field determines the sub-module, which is used to classify the nodes in the connected graph output by the graph convolutional neural network according to a plurality of pre-defined preset categories to obtain the category of the node, and the preset category includes: characterization The type of the identifier of the text belonging to the preset field, and the type of the field value representing the text belonging to the preset field;

第二字段確定子模組，用於根據所述節點的類別，確定待提取文本對應於預設字段的標識或字段值。The second field determination submodule is used to determine the identifier or field value of the preset field corresponding to the text to be extracted according to the type of the node.

在一種可能的實現方式中，相對位置特徵確定模組22包括：第一相對位置特徵確定子模組和第二相對位置特徵確定子模組，其中：In a possible implementation manner, the relative position feature determination module 22 includes: a first relative position feature determination sub-module and a second relative position feature determination sub-module, wherein:

第一相對位置特徵確定子模組，用於確定圖像中的第一目標區域和第二目標區域的相對位置參數；The first relative position feature determining sub-module is used to determine the relative position parameters of the first target area and the second target area in the image;

第二相對位置特徵確定子模組，用於對所述相對位置參數進行特徵化處理，得到第一目標區域和第二目標區域的相對位置特徵。The second relative position feature determination sub-module is used to perform characterization processing on the relative position parameters to obtain the relative position features of the first target area and the second target area.

在一種可能的實現方式中，所述相對位置參數包括下述至少一種：In a possible implementation manner, the relative position parameter includes at least one of the following:

所述第一目標區域的寬高比；The aspect ratio of the first target area;

所述第二目標區域的寬高比；The aspect ratio of the second target area;

所述第一目標區域和所述第二目標區域的相對尺寸關係。The relative size relationship between the first target area and the second target area.

在一種可能的實現方式中，目標特徵確定模組23，包括第一目標特徵確定子模組、第二目標特徵確定子模組和第三目標特徵確定子模組，其中：In a possible implementation manner, the target feature determination module 23 includes a first target feature determination sub-module, a second target feature determination sub-module, and a third target feature determination sub-module, wherein:

第一目標特徵確定子模組，用於確定目標區域中的像素數據，對所述像素數據進行特徵提取，得到視覺特徵；The first target feature determination sub-module is used to determine pixel data in the target area, and perform feature extraction on the pixel data to obtain visual features;

第二目標特徵確定子模組，用於確定目標區域中的文本字符，對所述文本字符進行特徵提取，得到字符特徵；The second target feature determination submodule is used to determine text characters in the target area, and perform feature extraction on the text characters to obtain character features;

第三目標特徵確定子模組，用於根據提取到的視覺特徵和字符特徵，確定目標區域的目標特徵。The third target feature determination sub-module is used to determine the target feature of the target area according to the extracted visual features and character features.

在一種可能的實現方式中，所述裝置透過預先構建的分類網路實現，所述裝置還包括：In a possible implementation manner, the device is implemented through a pre-built classification network, and the device further includes:

第一訓練模組，用於將樣本圖像輸入所述分類網路中處理，得到樣本圖像中待提取文本的第一預測類別，以及所述第一預測類別中各個類別之間的對應關係；The first training module is used to input the sample image into the classification network for processing to obtain the first prediction category of the text to be extracted in the sample image, and the corresponding relationship between each category in the first prediction category ；

第二訓練模組，用於根據所述第一預測類別，以及所述樣本圖像的標注類別，訓練所述分類網路，所述標注類別包括：表徵文本屬預設字段的標識的類別，以及表徵文本屬預設字段的字段值的類別；The second training module is used to train the classification network according to the first prediction category and the label category of the sample image, and the label category includes: a category that characterizes a text belonging to an identifier of a preset field, And the type of the field value representing the text belonging to the preset field;

第三訓練模組，用於根據所述對應關係，以及標注的待提取文本之間的對應關係，訓練所述分類網路。The third training module is used to train the classification network according to the corresponding relationship and the corresponding relationship between the labeled texts to be extracted.

在一些實施例中，本發明實施例提供的裝置具有的功能或包含的模組可以用於執行上文方法實施例描述的方法，其具體實現可以參照上文方法實施例的描述，爲了簡潔，這裏不再贅述。In some embodiments, the functions or modules contained in the device provided by the embodiments of the present invention can be used to execute the methods described in the above method embodiments. For specific implementation, refer to the description of the above method embodiments. For brevity, I won't repeat it here.

本發明實施例還提出一種電腦可讀儲存媒體，其上儲存有電腦程式指令，所述電腦程式指令被處理器執行時實現上述方法。電腦可讀儲存媒體可以是揮發性電腦可讀儲存媒體或非揮發性電腦可讀儲存媒體。An embodiment of the present invention also provides a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor. The computer-readable storage medium may be a volatile computer-readable storage medium or a non-volatile computer-readable storage medium.

本發明實施例還提出一種電子設備，包括：處理器；用於儲存處理器可執行指令的記憶體；其中，所述處理器被配置爲調用所述記憶體儲存的指令，以執行上述方法。An embodiment of the present invention also provides an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above method.

本發明實施例還提供了一種電腦程式産品，包括電腦可讀代碼，當電腦可讀代碼在設備上運行時，設備中的處理器執行用於實現如上任一實施例提供的圖像處理方法的指令。The embodiment of the present invention also provides a computer program product, including computer readable code. When the computer readable code runs on the device, the processor in the device executes the image processing method provided in any of the above embodiments. instruction.

本發明實施例還提供了另一種電腦程式産品，用於儲存電腦可讀指令，指令被執行時使得電腦執行上述任一實施例提供的圖像處理方法的操作。The embodiment of the present invention also provides another computer program product for storing computer-readable instructions, which when executed, cause the computer to perform the operation of the image processing method provided in any of the above-mentioned embodiments.

電子設備可以被提供爲終端、伺服器或其它形態的設備。Electronic devices can be provided as terminals, servers, or other types of devices.

圖5示出根據本發明實施例的一種電子設備800的方塊圖。例如，電子設備800可以是行動電話，電腦，數位廣播終端，訊息收發設備，遊戲控制台，平板設備，醫療設備，健身設備，個人數位助理等終端。FIG. 5 shows a block diagram of an electronic device 800 according to an embodiment of the present invention. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.

參照圖5，電子設備800可以包括以下一個或多個組件：處理組件802，記憶體804，電源組件806，多媒體組件808，音訊組件810，輸入/輸出（I/O）的介面812，感測器組件814，以及通訊組件816。5, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor The device component 814, and the communication component 816.

處理組件802通常控制電子設備800的整體操作，諸如與顯示，電話呼叫，數據通訊，相機操作和記錄操作相關聯的操作。處理組件802可以包括一個或多個處理器820來執行指令，以完成上述的方法的全部或部分步驟。此外，處理組件802可以包括一個或多個模組，便於處理組件802和其他組件之間的交互。例如，處理組件802可以包括多媒體模組，以方便多媒體組件808和處理組件802之間的交互。The processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method. In addition, the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.

記憶體804被配置爲儲存各種類型的數據以支持在電子設備800的操作。這些數據的示例包括用於在電子設備800上操作的任何應用程式或方法的指令，連絡人數據，電話簿數據，訊息，圖像，視訊等。記憶體804可以由任何類型的揮發性或非揮發性儲存設備或者它們的組合實現，如靜態隨機存取記憶體（SRAM），電子可抹除可程式化唯讀記憶體（EEPROM），可抹除可程式化唯讀記憶體（EPROM），可程式化唯讀記憶體（PROM），唯讀記憶體（ROM），磁記憶體，快閃記憶體，磁碟或光碟。The memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method operated on the electronic device 800, contact data, phone book data, messages, images, videos, etc. The memory 804 can be realized by any type of volatile or non-volatile storage devices or their combination, such as static random access memory (SRAM), electronically erasable programmable read-only memory (EEPROM), and erasable In addition to programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, floppy disk or optical disk.

電源組件806爲電子設備800的各種組件提供電力。電源組件806可以包括電源管理系統，一個或多個電源，及其他與爲電子設備800生成、管理和分配電力相關聯的組件。The power supply component 806 provides power for various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.

多媒體組件808包括在所述電子設備800和用戶之間的提供一個輸出介面的螢幕。在一些實施例中，螢幕可以包括液晶顯示器（LCD）和觸控面板（TP）。如果螢幕包括觸控面板，螢幕可以被實現爲觸控螢幕，以接收來自用戶的輸入訊號。觸控面板包括一個或多個觸控感測器以感測觸控、滑動和觸控面板上的手勢。所述觸控感測器可以不僅感測觸控或滑動動作的邊界，而且還檢測與所述觸控或滑動操作相關的持續時間和壓力。在一些實施例中，多媒體組件808包括一個前置鏡頭和/或後置鏡頭。當電子設備800處於操作模式，如拍攝模式或視訊模式時，前置鏡頭和/或後置鏡頭可以接收外部的多媒體數據。每個前置鏡頭和後置鏡頭可以是一個固定的光學透鏡系統或具有焦距和光學變焦能力。The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor can not only sense the boundary of the touch or sliding action, but also detect the duration and pressure related to the touch or sliding operation. In some embodiments, the multimedia component 808 includes a front lens and/or a rear lens. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front lens and/or the rear lens can receive external multimedia data. Each front lens and rear lens can be a fixed optical lens system or have focal length and optical zoom capabilities.

音訊組件810被配置爲輸出和/或輸入音訊訊號。例如，音訊組件810包括一個麥克風（MIC），當電子設備800處於操作模式，如呼叫模式、記錄模式和語音識別模式時，麥克風被配置爲接收外部音訊訊號。所接收的音訊訊號可以被進一步儲存在記憶體804或經由通訊組件816發送。在一些實施例中，音訊組件810還包括一個揚聲器，用於輸出音訊訊號。The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC). When the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signal can be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

I/O介面812爲處理組件802和周圍介面模組之間提供介面，上述周圍介面模組可以是鍵盤，滑鼠，按鈕等。這些按鈕可包括但不限於：主頁按鈕、音量按鈕、啓動按鈕和鎖定按鈕。The I/O interface 812 provides an interface between the processing component 802 and the surrounding interface modules. The surrounding interface modules may be keyboards, mice, buttons, and so on. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.

感測器組件814包括一個或多個感測器，用於爲電子設備800提供各個方面的狀態評估。例如，感測器組件814可以檢測到電子設備800的打開/關閉狀態，組件的相對定位，例如所述組件爲電子設備800的顯示器和小鍵盤，感測器組件814還可以檢測電子設備800或電子設備800一個組件的位置改變，用戶與電子設備800接觸的存在或不存在，電子設備800方位或加速/減速和電子設備800的溫度變化。感測器組件814可以包括接近感測器，被配置用來在沒有任何的物理接觸時檢測附近物體的存在。感測器組件814還可以包括光感測器，如CMOS或CCD圖像感測器，用於在成像應用中使用。在一些實施例中，該感測器組件814還可以包括加速度感測器，陀螺儀感測器，磁感測器，壓力感測器或溫度感測器。The sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of the components. For example, the component is the display and the keypad of the electronic device 800. The sensor component 814 can also detect the electronic device 800 or The position of a component of the electronic device 800 changes, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and the temperature change of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

通訊組件816被配置爲便於電子設備800和其他設備之間有線或無線方式的通訊。電子設備800可以接入基於通訊標準的無線網路，如WiFi，2G或3G，或它們的組合。在一個示例性實施例中，通訊組件816經由廣播通道接收來自外部廣播管理系統的廣播訊號或廣播相關訊息。在一個示例性實施例中，所述通訊組件816還包括近場通訊（NFC）模組，以促進短程通訊。例如，在NFC模組可基於射頻識別（RFID）技術，紅外數據協會（IrDA）技術，超寬帶（UWB）技術，藍牙（BT）技術和其他技術來實現。The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related messages from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

在示例性實施例中，電子設備800可以被一個或多個應用專用積體電路（ASIC）、數位訊號處理器（DSP）、數位訊號處理設備（DSPD）、可程式化邏輯裝置（PLD）、現場可程式化邏輯閘陣列（FPGA）、控制器、微控制器、微處理器或其他電子元件實現，用於執行上述方法。In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processor (DSP), digital signal processing device (DSPD), programmable logic device (PLD), On-site programmable logic gate array (FPGA), controller, microcontroller, microprocessor or other electronic components are used to implement the above methods.

在示例性實施例中，還提供了一種非揮發性電腦可讀儲存媒體，例如包括電腦程式指令的記憶體804，上述電腦程式指令可由電子設備800的處理器820執行以完成上述方法。In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.

圖6示出根據本發明實施例的一種電子設備1900的方塊圖。例如，電子設備1900可以被提供爲一伺服器。參照圖6，電子設備1900包括處理組件1922，其進一步包括一個或多個處理器，以及由記憶體1932所代表的記憶體資源，用於儲存可由處理組件1922的執行的指令，例如應用程式。記憶體1932中儲存的應用程式可以包括一個或一個以上的每一個對應於一組指令的模組。此外，處理組件1922被配置爲執行指令，以執行上述方法。FIG. 6 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. 6, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions that can be executed by the processing component 1922, such as application programs. The application program stored in the memory 1932 may include one or more modules each corresponding to a set of commands. In addition, the processing component 1922 is configured to execute instructions to perform the above-described methods.

電子設備1900還可以包括一個電源組件1926被配置爲執行電子設備1900的電源管理，一個有線或無線網路介面1950被配置爲將電子設備1900連接到網路，和一個輸入輸出（I/O）介面1958。電子設備1900可以操作基於儲存在記憶體1932的操作系統，例如Windows ServerTM，Mac OS XTM，UnixTM, LinuxTM，FreeBSDTM或類似。The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input and output (I/O) Interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.

在示例性實施例中，還提供了一種非揮發性電腦可讀儲存媒體，例如包括電腦程式指令的記憶體1932，上述電腦程式指令可由電子設備1900的處理組件1922執行以完成上述方法。In an exemplary embodiment, there is also provided a non-volatile computer-readable storage medium, such as a memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

本發明可以是系統、方法和/或電腦程式産品。電腦程式産品可以包括電腦可讀儲存媒體，其上載有用於使處理器實現本發明的各個方面的電腦可讀程式指令。The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling the processor to implement various aspects of the present invention.

電腦可讀儲存媒體可以是可以保持和儲存由指令執行設備使用的指令的有形設備。電腦可讀儲存媒體例如可以是――但不限於――電儲存設備、磁儲存設備、光儲存設備、電磁儲存設備、半導體儲存設備或者上述的任意合適的組合。電腦可讀儲存媒體的更具體的例子（非窮舉的列表）包括：可攜式電腦盤、硬碟、隨機存取記憶體（RAM）、唯讀記憶體（ROM）、可抹除式可程式化唯讀記憶體（EPROM或閃存）、靜態隨機存取記憶體（SRAM）、可攜式壓縮磁碟唯讀記憶體（CD-ROM）、數位影音光碟（DVD）、記憶卡、軟碟、機械編碼設備、例如其上儲存有指令的打孔卡或凹槽內凸起結構、以及上述的任意合適的組合。這裏所使用的電腦可讀儲存媒體不被解釋爲瞬時訊號本身，諸如無線電波或者其他自由傳播的電磁波、透過波導或其他傳輸媒介傳播的電磁波（例如，透過光纖電纜的光脈衝）、或者透過電線傳輸的電訊號。The computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples of computer-readable storage media (non-exhaustive list) include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), removable Programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital audio-visual disc (DVD), memory card, floppy disk , Mechanical encoding equipment, such as a punch card on which instructions are stored or a raised structure in a groove, and any suitable combination of the above. The computer-readable storage medium used here is not interpreted as the instantaneous signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber optic cables), or through wires The transmitted electrical signal.

這裏所描述的電腦可讀程式指令可以從電腦可讀儲存媒體下載到各個計算/處理設備，或者透過網路、例如網際網路、區域網路、廣域網路和/或無線網路下載到外部電腦或外部儲存設備。網路可以包括銅傳輸電纜、光纖傳輸、無線傳輸、路由器、防火牆、交換機、閘道電腦和/或邊緣伺服器。每個計算/處理設備中的網路介面卡或者網路介面從網路接收電腦可讀程式指令，並轉發該電腦可讀程式指令，以供儲存在各個計算/處理設備中的電腦可讀儲存媒體中。The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network Or external storage device. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. The network interface card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for computer-readable storage in each computing/processing device In the media.

用於執行本發明操作的電腦程式指令可以是彙編指令、指令集架構（ISA）指令、機器指令、機器相關指令、微代碼、固件指令、狀態設置數據、或者以一種或多種程式化語言的任意組合編寫的源代碼或目標代碼，所述程式化語言包括面向對象的程式化語言—諸如Smalltalk、C++等，以及常規的過程式程式化語言—諸如“C”語言或類似的程式化語言。電腦可讀程式指令可以完全地在用戶電腦上執行、部分地在用戶電腦上執行、作爲一個獨立的套裝軟體執行、部分在用戶電腦上部分在遠端電腦上執行、或者完全在遠端電腦或伺服器上執行。在涉及遠端電腦的情形中，遠端電腦可以透過任意種類的網路—包括區域網路(LAN)或廣域網路(WAN)—連接到用戶電腦，或者，可以連接到外部電腦（例如利用網際網路服務提供商來透過網際網路連接）。在一些實施例中，透過利用電腦可讀程式指令的狀態訊息來個性化定制電子電路，例如可程式化邏輯電路、現場可程式化邏輯閘陣列（FPGA）或可程式化邏輯陣列（PLA），該電子電路可以執行電腦可讀程式指令，從而實現本發明的各個方面。The computer program instructions used to perform the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, status setting data, or any of one or more programming languages. Combined source code or object code, the programming language includes object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as "C" language or similar programming languages. Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, executed as a stand-alone software package, partly on the user's computer and partly executed on the remote computer, or entirely on the remote computer or Execute on the server. In the case of a remote computer, the remote computer can be connected to the user’s computer through any kind of network-including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using the Internet). Internet service provider to connect via the Internet). In some embodiments, the electronic circuit is personalized by using the status information of the computer-readable program instructions, such as programmable logic circuit, field programmable logic gate array (FPGA) or programmable logic array (PLA), The electronic circuit can execute computer-readable program instructions to realize various aspects of the present invention.

這裏參照根據本發明實施例的方法、裝置（系統）和電腦程式産品的流程圖和/或方塊圖描述了本發明的各個方面。應當理解，流程圖和/或方塊圖的每個方塊以及流程圖和/或方塊圖中各方塊的組合，都可以由電腦可讀程式指令實現。Herein, various aspects of the present invention are described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each block of the flowchart and/or block diagram and the combination of each block in the flowchart and/or block diagram can be implemented by computer-readable program instructions.

這些電腦可讀程式指令可以提供給通用電腦、專用電腦或其它可程式化數據處理裝置的處理器，從而生産出一種機器，使得這些指令在透過電腦或其它可程式化數據處理裝置的處理器執行時，産生了實現流程圖和/或方塊圖中的一個或多個方塊中規定的功能/動作的裝置。也可以把這些電腦可讀程式指令儲存在電腦可讀儲存媒體中，這些指令使得電腦、可程式化數據處理裝置和/或其他設備以特定方式工作，從而，儲存有指令的電腦可讀媒體則包括一個製造品，其包括實現流程圖和/或方塊圖中的一個或多個方塊中規定的功能/動作的各個方面的指令。These computer-readable program instructions can be provided to the processors of general-purpose computers, dedicated computers, or other programmable data processing devices to produce a machine that allows these instructions to be executed by the processors of the computer or other programmable data processing devices At this time, a device that implements the functions/actions specified in one or more blocks in the flowchart and/or block diagram is produced. It is also possible to store these computer-readable program instructions in a computer-readable storage medium. These instructions make the computer, the programmable data processing device and/or other equipment work in a specific manner, so that the computer-readable medium storing the instructions is It includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowchart and/or block diagram.

也可以把電腦可讀程式指令加載到電腦、其它可程式化數據處理裝置、或其它設備上，使得在電腦、其它可程式化數據處理裝置或其它設備上執行一系列操作步驟，以産生電腦實現的過程，從而使得在電腦、其它可程式化數據處理裝置、或其它設備上執行的指令實現流程圖和/或方塊圖中的一個或多個方塊中規定的功能/動作。It is also possible to load computer-readable program instructions on a computer, other programmable data processing device, or other equipment, so that a series of operation steps are executed on the computer, other programmable data processing device, or other equipment to generate a computer realization In this way, instructions executed on a computer, other programmable data processing device, or other equipment realize the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

附圖中的流程圖和方塊圖顯示了根據本發明的多個實施例的系統、方法和電腦程式産品的可能實現的體系架構、功能和操作。在這點上，流程圖或方塊圖中的每個方塊可以代表一個模組、程式段或指令的一部分，所述模組、程式段或指令的一部分包含一個或多個用於實現規定的邏輯功能的可執行指令。在有些作爲替換的實現中，方塊中所標注的功能也可以以不同於附圖中所標注的順序發生。例如，兩個連續的方塊實際上可以基本並行地執行，它們有時也可以按相反的順序執行，這依所涉及的功能而定。也要注意的是，方塊圖和/或流程圖中的每個方塊、以及方塊圖和/或流程圖中的方塊的組合，可以用執行規定的功能或動作的專用的基於硬體的系統來實現，或者可以用專用硬體與電腦指令的組合來實現。The flowcharts and block diagrams in the accompanying drawings show the possible implementation architecture, functions, and operations of the system, method, and computer program product according to multiple embodiments of the present invention. In this regard, each block in the flowchart or block diagram can represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction includes one or more logic for implementing the specified Executable instructions for the function. In some alternative implementations, the functions marked in the block may also occur in a different order than the order marked in the drawings. For example, two consecutive blocks can actually be executed basically in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions. It can be realized, or it can be realized by a combination of dedicated hardware and computer instructions.

該電腦程式産品可以具體透過硬體、軟體或其結合的方式實現。在一個可選實施例中，所述電腦程式産品具體體現爲電腦儲存媒體，在另一個可選實施例中，電腦程式産品具體體現爲軟體産品，例如軟體開發包(Software Development Kit，SDK)等等。The computer program product can be implemented through hardware, software, or a combination thereof. In an optional embodiment, the computer program product is specifically embodied as a computer storage medium. In another optional embodiment, the computer program product is specifically embodied as a software product, such as a software development kit (SDK), etc. Wait.

以上已經描述了本發明的各實施例，上述說明是示例性的，並非窮盡性的，並且也不限於所披露的各實施例。在不偏離所說明的各實施例的範圍和精神的情況下，對於本技術領域的普通技術人員來說許多修改和變更都是顯而易見的。本文中所用術語的選擇，旨在最好地解釋各實施例的原理、實際應用或對市場中的技術的改進，或者使本技術領域的其它普通技術人員能理解本文披露的各實施例。The embodiments of the present invention have been described above, and the above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Without departing from the scope and spirit of the illustrated embodiments, many modifications and changes are obvious to those of ordinary skill in the art. The choice of terms used herein is intended to best explain the principles, practical applications, or improvements to technologies in the market of the embodiments, or to enable other ordinary skilled in the art to understand the embodiments disclosed herein.

20······· 圖像處理裝置 21······· 識別模組 22······· 相對位置特徵確定模組 23······· 目標特徵確定模組 24······· 圖卷積模組 25······· 字段確定模組 800····· 電子設備 802····· 處理組件 804····· 記憶體 806····· 電源組件 808····· 多媒體組件 810····· 音訊組件 812····· 輸入/輸出介面 814····· 感測器組件 816····· 通訊組件 820····· 處理器 1900··· 電子設備 1922··· 處理組件 1926··· 電源組件 1932··· 記憶體 1950··· 網路介面 1958··· 輸入/輸出介面 20······· Image Processing Device 21······· Identification Module 22······· Relative Position Feature Determination Module 23······· Target feature determination module 24······· Graph Convolution Module 25·······Field Confirmation Module 800····· Electronic Equipment 802·····Processing components 804·····Memory 806····· Power Supply 808····· Multimedia Components 810····· Audio Components 812····· Input/Output Interface 814····· Sensor Assembly 816····· Communication Components 820····· Processor 1900··· Electronic Equipment 1922··· Handling components 1926··· Power Supply 1932··· Memory 1950··· Network Interface 1958··· Input/Output Interface

此處的附圖被併入說明書中並構成本說明書的一部分，這些附圖示出了符合本發明的實施例，並與說明書一起用於說明本發明的技術方案。圖1示出根據本發明實施例的圖像處理方法的流程圖；圖2示出根據本發明實施例的一種連通圖的結構示意圖；圖3示出根據本發明實施例的一種分類網路的結構示意圖；圖4示出根據本發明實施例的一種圖像處理裝置的方塊圖；圖5示出根據本發明實施例的一種電子設備的方塊圖；及圖6示出根據本發明實施例的一種電子設備的方塊圖。 The drawings here are incorporated into the specification and constitute a part of the specification. These drawings show embodiments in accordance with the present invention and are used together with the specification to illustrate the technical solution of the present invention. Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present invention; Figure 2 shows a schematic structural diagram of a connected graph according to an embodiment of the present invention; Figure 3 shows a schematic structural diagram of a classification network according to an embodiment of the present invention; Fig. 4 shows a block diagram of an image processing apparatus according to an embodiment of the present invention; Figure 5 shows a block diagram of an electronic device according to an embodiment of the present invention; and Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present invention.

Claims

An image processing method, including: recognizing an image, determining multiple target regions in the image, where the target region is the region where the text to be extracted is located; determining the distance between the target regions in the image Determine the target features of each of the target regions, the target features include the features of the text to be extracted; through the graph convolutional neural network, the image is represented in the form of a connected graph, and the Feature extraction is performed on the relative position feature and the target feature to obtain the extracted feature; according to the extracted feature, the field corresponding to the text to be extracted is determined.

The method according to claim 1, wherein the feature extraction is performed on the relative position feature and the target feature through a graph convolutional neural network to obtain the extracted features, including: taking each of the target features as a graph Nodes, using each of the relative position features as the edges connecting the two nodes to construct a connected graph; through the graph convolutional neural network, the connected graph is iteratively updated, and the connected graph that satisfies the convergence condition after the iterative update As the extracted feature.

The method according to claim 2, wherein, according to the extracted features, determining the field corresponding to the text to be extracted includes: according to a plurality of pre-defined preset categories, the connected graph output by the graph convolutional neural network The nodes are classified to obtain the category of the node. The preset category includes: the category representing the identifier of the text belonging to the preset field, and the table The signature text belongs to the category of the field value of the preset field; according to the category of the node, it is determined that the text to be extracted corresponds to the identification or field value of the preset field.

The method according to claim 1, wherein determining the relative position characteristics between the target areas in the image includes: determining the relative position parameters of the first target area and the second target area in the image; The relative position parameters are characterized to obtain the relative position characteristics of the first target area and the second target area.

The method according to claim 4, wherein the relative position parameter includes at least one of the following: the horizontal distance and the vertical distance of the first target area relative to the second target area; the aspect ratio of the first target area; The aspect ratio of the second target area; the relative size relationship between the first target area and the second target area.

The method according to any one of claim 4 or 5, wherein, performing characterization processing on the relative position parameter to obtain the relative position characteristics of the first target area and the second target area includes: transmitting the relative position parameter through The sine-cosine transformation matrix is mapped to a D-dimensional space to obtain a D-dimensional eigenvector, where D is a positive integer; the D-dimensional eigenvector is converted into a 1-dimensional weight value through a preset weight matrix; The weight value is processed through the preset activation function to obtain the relative position feature.

The method according to claim 1, wherein determining the target feature of each target area includes: determining pixel data in the target area, performing feature extraction on the pixel data to obtain visual features; determining text characters in the target area , Perform feature extraction on the text characters to obtain character features; determine the target feature of the target area according to the extracted visual features and character features.

The method according to claim 7, wherein determining the target feature of the target area according to the extracted visual features and character features includes: assigning different weights to the visual features and character features; and assigning weights to the visual The feature and character feature are merged to obtain the target feature of the target area.

The method according to claim 1, wherein the method is implemented through a pre-built classification network, and the training steps of the classification network are as follows: input the sample image into the classification network for processing, and obtain the sample image The first prediction category of the text to be extracted, and the correspondence between each category in the first prediction category; training the classification network according to the first prediction category and the label category of the sample image, The labeling category includes: the category of the identifier that the characterizing text belongs to the preset field, and the category of the field value of the characterizing text belonging to the preset field; According to the corresponding relationship and the corresponding relationship between the labeled texts to be extracted, the classification network is trained.

An electronic device, comprising: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute any of the request items 1 to 9 The method described in one item.

A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions are executed by a processor to implement the method described in any one of request items 1 to 9.