WO2024041032A1 - Method and device for generating editable document based on non-editable graphics-text image - Google Patents
- Publication number
- WO2024041032A1 (PCT/CN2023/092757)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- editable
- features
- relationship
- elements
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 60
- 238000013145 classification model Methods 0.000 claims abstract description 71
- 238000012549 training Methods 0.000 claims abstract description 18
- 238000001514 detection method Methods 0.000 claims description 36
- 238000013528 artificial neural network Methods 0.000 claims description 11
- 238000003860 storage Methods 0.000 claims description 8
- 230000008569 process Effects 0.000 claims description 5
- 230000001502 supplementing effect Effects 0.000 abstract 1
- 238000010586 diagram Methods 0.000 description 12
- 230000006870 function Effects 0.000 description 7
- 238000004891 communication Methods 0.000 description 6
- 238000004590 computer program Methods 0.000 description 6
- 238000012360 testing method Methods 0.000 description 4
- 238000006243 chemical reaction Methods 0.000 description 3
- 238000009826 distribution Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000012795 verification Methods 0.000 description 3
- 238000002591 computed tomography Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000000306 recurrent effect Effects 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 230000001629 suppression Effects 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000000047 product Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Definitions
- the present invention relates to the field of computer technology, and in particular to methods and devices for generating editable documents based on non-editable graphics and text images.
- the present invention provides a method and device for generating editable documents based on non-editable graphics-text images, to solve the prior-art problem that saving non-editable graphics-text data as an editable document is time-consuming and inefficient, and to achieve efficient and accurate conversion of non-editable graphics-text images into editable documents.
- a method for generating an editable document based on a non-editable graphics-text image includes: obtaining the non-editable graphics-text image; extracting contour features from the non-editable graphics-text image; and extracting text features from the non-editable graphics-text image, wherein the contour features include the shape, position and size of each contour and the color within it, and the text features include text box coordinates, text content, text color and font size; generating initial structured data from the contour features and the text features; determining the relationship between two elements in the non-editable graphics-text image based on a pre-trained element relationship classification model together with the contour features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the contour features and/or text features of two elements and the relationship label of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphics-text image.
- the pre-trained element relationship classification model includes multiple pre-trained binary classification models.
- determining the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model, the contour features and the text features includes: determining, based on each pre-trained binary classification model together with the contour features and the text features, a classification result for the relationship between every two elements in the non-editable graphics-text image; and determining the final classification result for the relationship between every two elements based on the classification result with the largest probability value among the multiple classification results.
- the process of determining the pre-trained element relationship classification model after training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements includes: obtaining the contour features and text features of multiple non-editable graphics-text images; determining the data set based on the contour features and/or text features corresponding to every two elements in each non-editable graphics-text image, and determining the relationship label of every two elements based on the relationship between them; dividing the samples of the data set into positive samples and negative samples according to the relationship label of every two elements and its corresponding binary classification model; and training the corresponding binary classification model on the positive and negative samples to obtain the pre-trained binary classification model.
- determining the text features in the non-editable graphics-text image based on text detection and text recognition includes: determining the text boxes included in the image and their coordinates based on a preset text box detection algorithm; determining the text content of each text box based on a preset text recognition algorithm; determining the text color in each text box based on the coordinates of the box and the pixel histogram within it; and determining the font size within each text box based on its coordinates.
- determining the contour features in the non-editable graphics-text image based on contour detection and shape recognition includes: determining at least one contour included in the image based on a preset contour detection algorithm; identifying the shape of each of the at least one contour with a shape recognition model based on a pre-trained residual neural network; determining the relative size of each shape based on the size of its minimum circumscribed rectangle, and the position of each contour based on the coordinates of a preset position on its shape; and determining the color of each contour based on the color corresponding to its centroid coordinates.
- determining the at least one contour included in the non-editable graphics-text image includes: determining a set of contours included in the image based on a preset contour detection algorithm; filtering out, according to the coordinates of each contour and of the text boxes, the contours that coincide with text boxes; and determining the at least one contour based on the remaining contours.
- generating the editable document corresponding to the non-editable graphics-text image includes: obtaining and displaying the final structured data; and, based on the final structured data, generating an image corresponding to the contour features and text corresponding to the text features at the corresponding positions on a canvas, thereby determining an initial editable document corresponding to the non-editable graphics-text image.
- the method further includes: adding to, modifying or deleting from the initial editable document in response to a user operation.
- the present invention also provides a device for generating editable documents based on non-editable graphic and textual images.
- the device includes: an acquisition module for acquiring a non-editable graphics-text image; a determination module for extracting contour features from the non-editable graphics-text image and extracting text features from the non-editable graphics-text image, wherein the contour features include the shape, position and size of each contour and the color within it, and the text features include text box coordinates, text content, text color and font size; a first generation module for generating initial structured data from the contour features and the text features; and a second generation module for generating, based on the final structured data, an editable document corresponding to the non-editable graphics-text image.
- the present invention also provides a computer device, including a memory and a processor.
- Computer-readable instructions are stored in the memory.
- when executed by the processor, the computer-readable instructions cause the processor to perform the steps of the above method for generating an editable document based on a non-editable graphics-text image.
- the present invention also provides a storage medium storing computer-readable instructions.
- when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps of the above method for generating an editable document based on a non-editable graphics-text image.
- with the above method and device, the elements contained in a non-editable graphics-text image and their corresponding attributes can be determined from the image's contour features and text features, and initial structured data can be generated from these elements and attributes; the relationship between every two elements is then further determined by the pre-trained element relationship classification model and supplemented into the initial structured data to obtain the final structured data, from which an editable document corresponding to the non-editable graphics-text image is generated. Conversion therefore no longer relies on manually re-creating non-editable graphics-text images as editable documents, and such images are converted into editable documents efficiently and accurately.
- Figure 1 is the first schematic flowchart of the method for generating editable documents based on non-editable graphics-text images provided by the present invention
- Figure 2 is the second schematic flowchart of the method for generating editable documents based on non-editable graphics-text images provided by the present invention
- Figure 3 is the third schematic flowchart of the method for generating editable documents based on non-editable graphics-text images provided by the present invention
- Figure 4 is the fourth schematic flowchart of the method for generating editable documents based on non-editable graphics-text images provided by the present invention
- Figure 5 is the fifth schematic flowchart of the method for generating editable documents based on non-editable graphics-text images provided by the present invention
- Figure 6 is the sixth schematic flowchart of the method for generating editable documents based on non-editable graphics-text images provided by the present invention
- Figure 7 is a schematic diagram of a graphic image provided by the present invention.
- Figure 8 is a schematic diagram of the initial structured data provided by the present invention.
- Figure 9 is a schematic diagram of the final structured data provided by the present invention.
- Figure 10 is a schematic diagram of the display interface of the graphic editor provided by the present invention.
- Figure 11 is a schematic framework diagram of a device for generating editable documents based on non-editable graphics and text images provided by the present invention
- Figure 12 is a schematic diagram of the electronic device provided by the present invention.
- CTPN is a text detection algorithm proposed in ECCV 2016.
- CTPN is a deep neural network that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. It can effectively detect horizontally distributed text in complex scenes.
- CNN: convolutional neural network
- LSTM: long short-term memory network
- CTPN is used to detect text boxes in non-editable graphic images.
- Non-maximum suppression (NMS) suppresses elements that are not local maxima while searching for local maxima.
- In object detection, many rectangular boxes that may contain objects are found in a picture, and each rectangular box is assigned a category classification probability.
- NMS is used to filter out a portion of these rectangular boxes.
- Here, NMS is used to filter the text boxes detected in the non-editable graphics-text image, so that the text boxes detected by CTPN are closer to the text boxes in the original non-editable graphics-text image.
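To make this filtering concrete, below is a minimal NumPy sketch of IoU-based NMS as it is commonly implemented; the 0.5 threshold and the (x1, y1, x2, y2) box layout are illustrative assumptions rather than values taken from this patent.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedily keep the highest-scoring boxes, suppressing overlapping neighbours."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]                  # indices sorted by score, descending
    keep = []
    while order.size > 0:
        i = order[0]                                # current local maximum
        keep.append(int(i))
        # intersection of box i with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop every box whose overlap with box i exceeds the threshold
        order = order[1:][iou <= iou_threshold]
    return keep
```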
- OpenCV is a cross-platform computer vision and machine learning software library released under the Apache 2.0 license (open source) that runs on Linux, Windows, Android and Mac OS. OpenCV is lightweight and efficient: it consists of a series of C functions and a small number of C++ classes, provides interfaces for Python, Ruby, MATLAB and other languages, and implements many common algorithms in image processing and computer vision.
- CTC is used to solve the alignment problem between input data and the given labels; it enables end-to-end training and outputs sequence results of variable length. Since there are character intervals between words in natural-scene text images, and such images may be deformed, the same text can appear with different spacings, and repeated characters can appear in the raw recognition results. The CTC model can therefore be used to remove separator characters and repeated characters from the text recognition results.
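A minimal sketch of the standard CTC greedy decoding rule just described (first collapse repeated labels, then drop blanks); treating index 0 as the blank symbol is an assumption made for illustration.

```python
def ctc_greedy_decode(label_indices, blank=0):
    """Collapse repeats, then remove blanks - the standard CTC decoding rule."""
    decoded, prev = [], None
    for idx in label_indices:
        if idx != prev and idx != blank:            # skip repeats and blank symbols
            decoded.append(idx)
        prev = idx
    return decoded

# With blank=0, [5, 5, 0, 5, 3, 3] decodes to [5, 5, 3]: the adjacent
# repeats collapse, but the blank keeps the two genuine 5s distinct.
```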
- Structured data, also called row data, is data logically expressed and implemented by a two-dimensional table structure. It strictly follows data format and length specifications and is mainly stored and managed in relational databases.
- the present invention provides a method for generating editable documents based on non-editable graphics-text images, solving the problem that graphics-text data could previously only be saved as non-editable images, and the problem that readers who want to edit the graphics-text data in a document must recreate it from scratch, so that non-editable graphics-text data is converted into editable form efficiently.
- Figure 1 is a schematic flowchart of a method for generating editable documents based on non-editable graphics and text images provided by the present invention. It can be understood that the method of generating an editable document based on a non-editable graphic image can be executed by a device that generates an editable document based on a non-editable graphic image.
- the device for generating an editable document based on a non-editable graphic image may be a computer device.
- a method of generating editable documents based on non-editable graphic images is proposed, which may include the following steps:
- Step 110 Obtain non-editable graphic and text images.
- graphic images include images and text.
- images and text have related properties.
- an image has properties such as the shape, position and size of its outline and the color within the outline
- text has text box coordinates, text content, text color, and font size.
- Non-editable graphics-text images are images in which the shapes, colors, and font sizes and colors of the graphics and text cannot be directly adjusted, such as images in PNG or JPG format.
- Step 120 Extract contour features from non-editable graphic and text images; extract text features from non-editable graphic and text images.
- the outline features include the shape, position, size and color of the outline;
- the text features include text box coordinates, text content, text color and font size.
- the contour detection and shape recognition methods are used to detect the relevant attribute characteristics of the contour included in the non-editable graphic image, such as shape, position, size and color within the contour.
- Text detection and recognition methods are used to detect relevant attribute characteristics of text included in non-editable graphic images, such as text box coordinates, text content, text color and font size.
- the elements and corresponding attributes contained in the graphic image can be determined, and initial structured data can be generated based on these elements and corresponding attributes.
- Step 130 Generate initial structured data based on contour features and text features.
- the initial structured data is generated from the contour features and text features to facilitate subsequent processing based on it.
- dictionary S = {contour C1: (position coordinates, color attribute value, shape attribute value, shape size), contour C2: (position coordinates, color attribute value, shape attribute value, shape size), ..., contour Ck: (position coordinates, color attribute value, shape attribute value, shape size)}.
- image shape J = {image 1: (image path, shape position coordinates, shape size), image 2: (image path, shape position coordinates, shape size), ..., image z: (image path, shape position coordinates, shape size)}.
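The two dictionaries above might be rendered in Python roughly as follows; every key name and concrete value here is hypothetical and chosen only to illustrate the layout.

```python
# Hypothetical rendering of dictionary S and image shape J as Python dicts.
initial_structured_data = {
    "shapes": {   # one entry per detected contour C1 .. Ck
        "C1": {"position": ((12, 40), (96, 88)),    # position coordinates
               "color": (255, 128, 0),              # color attribute value
               "shape": "arrow",                    # shape attribute value
               "size": (84, 48)},                   # shape size
    },
    "figures": {  # one entry per embedded image 1 .. z
        "image1": {"path": "fig1.jpg",              # image path
                   "position": ((120, 30), (360, 210)),
                   "size": (240, 180)},
    },
}
```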
- each of contour C1 through contour Ck, and each of image 1 through image z, is one element.
- Step 140 Determine the relationship between two elements in the non-editable graphic and text image based on the pre-trained element relationship classification model and the contour features and text features.
- the pre-trained element relationship classification model is determined after training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements.
- the elements are outlines or texts included in non-editable graphic images, such as graphics, images, text boxes or text content.
- relationship tag of two elements is used to identify the relationship between the two elements, which can be, for example, surrounding, containing, associated, or independent.
- Surrounding can be understood as an element surrounding another element, such as text above an arrow.
- Containment can be understood as an element containing another element, such as text within a certain outline.
- Association can be understood as contact between two elements, such as arrows connected to text boxes, arrows connected to other shapes, etc.
- Independence can be understood as relationships other than the above three types of relationships.
- the initial structured data includes features such as the shape, position, size and color within each contour, together with text box coordinates, text content, text color and font size. To represent the non-editable graphics-text image more comprehensively and accurately, the relationship between elements in the image can be further determined based on the pre-trained element relationship classification model.
- the relationship between two elements in the non-editable graphics-text image is thus further determined and supplemented into the initial structured data above, yielding the final structured data.
- Step 150 Supplement the initial structured data based on the relationship between the two elements to obtain the final structured data.
- Step 160 Generate an editable document corresponding to the non-editable graphic and text image based on the final structured data.
- the editable document corresponding to a non-editable graphics-text image is a document in which shapes, colors, and font sizes and colors can be directly adjusted, such as a file in Visio format.
- the method provided by the present invention for generating editable documents based on non-editable graphics-text images can thus determine, from the contour features and text features of a non-editable graphics-text image, the elements it contains and their corresponding attributes, and convert the image into an editable document on that basis.
- the pre-trained element relationship classification model includes multiple binary classification models; correspondingly, determining the relationship between two elements in the non-editable graphics-text image includes the following steps:
- Step 210 Based on each pre-trained binary classification model and the contour features and text features, determine the classification result of the relationship between the two elements in the non-editable graphic and text image.
- the binary classification model is used to determine whether the relationship between two elements is a relationship corresponding to a preset relationship label. For example, it can be used to determine whether the relationship between two elements is an inclusion relationship.
- the binary classification model can be a support vector machine (SVM).
- the classification result may be, for example, that the probability that the relationship between the two elements is an inclusion relationship is 90%, and the probability that the relationship between the two elements is not an inclusion relationship is 10%.
- Step 220 Based on the classification result with the largest probability value among the determined multiple classification results, determine the final classification result of the relationship between each two elements in the non-editable graphic and text image.
- the classification result with the largest probability value is determined as the final classification result.
- for example, if the classification results of the multiple binary classification models for the same pair of elements are: containment with probability 90%, association with probability 20%, independence with probability 15%, and surround with probability 10%, then the classification result corresponding to 90% is taken as the final classification result, that is, the relationship between the two elements is determined to be containment.
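One plausible way to implement this one-vs-rest vote, assuming each binary classifier exposes scikit-learn's predict_proba and that the relationship names used as keys are hypothetical:

```python
def classify_pair(models, pair_features):
    """Run every binary relationship model and keep the most confident positive class."""
    best_label, best_prob = None, -1.0
    for label, clf in models.items():               # e.g. "surround", "contain", ...
        prob = clf.predict_proba([pair_features])[0][1]   # P(pair HAS this relationship)
        if prob > best_prob:
            best_label, best_prob = label, prob
    return best_label, best_prob
```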
- the process of determining the pre-trained element relationship classification model after training on a data set consisting of the contour features and/or text features of two elements and the relationship labels of the two elements includes the following steps:
- Step 310 Obtain the outline features and text features of multiple non-editable graphic and text images.
- the feature values in the contour features and text features can be normalized, and one-hot encoding can be used to encode the feature values.
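A sketch of that preprocessing with scikit-learn (assuming version 1.2 or later for the sparse_output flag); the concrete feature values are placeholders:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Numeric attributes (coordinates, sizes) scaled to [0, 1].
numeric = np.array([[12.0, 40.0, 84.0, 48.0],
                    [120.0, 30.0, 240.0, 180.0]])
numeric_scaled = MinMaxScaler().fit_transform(numeric)

# Categorical attributes (e.g. the recognized shape class) one-hot encoded.
shape_class = np.array([["circle"], ["arrow"]])
shape_onehot = OneHotEncoder(sparse_output=False).fit_transform(shape_class)

pair_features = np.hstack([numeric_scaled, shape_onehot])
```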
- Step 320 Determine a data set based on the contour features and/or text features corresponding to each two elements in each non-editable graphic image; and determine the relationship label of each two elements based on the relationship between each two elements.
- the contour features and/or text features corresponding to every two elements contained in each non-editable graphics-text image can be used as samples, and the relationship between every two elements is used as the label of the corresponding sample, to determine the data set for training the binary classification models.
- one-hot encoding can also be used to encode the relationship labels to facilitate subsequent processing.
- Step 330 Based on the relationship labels of each two elements and their corresponding binary classification models, the samples of the data set are determined as positive samples and negative samples.
- the corresponding binary classification models can be, for example: a binary classification model for determining whether the relationship between two elements is surround, one for determining whether it is containment, one for determining whether it is association, and one for determining whether it is independence.
- samples with the same relationship label category corresponding to the two-classification model can be classified as positive samples, and samples with different relationship label categories corresponding to the two-classification model can be classified as negative samples.
- for example, for the binary classification model used to determine whether the relationship between two elements is association, samples in which the relationship between the two elements is association are determined as positive samples, and samples in which the relationship is surround, containment or independence are determined as negative samples.
- Step 340 Train the corresponding binary classification model based on the positive samples and negative samples to obtain a pre-trained binary classification model.
- the positive and negative samples can be divided into a training set, a validation set and a test set according to preset proportions, and the corresponding binary classification model is then trained on the training set to obtain the pre-trained binary classification model.
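A hedged scikit-learn sketch of this training step; the 70/15/15 split and the RBF kernel are assumptions (the text only says "preset proportions"), and the features and labels are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 10))                  # placeholder encoded pair features
y = rng.integers(0, 2, 200)                # 1 = positive sample, 0 = negative sample

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

clf = SVC(kernel="rbf", probability=True)  # probability=True enables predict_proba later
clf.fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
print("test accuracy:", clf.score(X_test, y_test))
```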
- determining text features in non-editable graphic and text images based on text detection and text recognition methods includes the following steps:
- Step 410 Determine the text box and its coordinates included in the non-editable graphic image based on a preset text box detection algorithm.
- the CTPN text detection network can be used as the preset text box detection algorithm to detect text in the non-editable graphics-text image and obtain initial text boxes; the NMS algorithm is then used to filter out redundant boxes among the initial text boxes, and finally a text line construction algorithm connects the text boxes belonging to the same text sequence to obtain the connected text boxes.
- non-editable graphics and text images can be preprocessed to facilitate subsequent detection of text boxes and text.
- non-editable graphics and text images can be grayscaled and binarized.
- grayscale can be achieved through the OpenCV function cv2.cvtColor
- binarization can be achieved through the OpenCV function cv2.threshold.
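The preprocessing just described, sketched with OpenCV; the Otsu flag is one reasonable choice of threshold, not one mandated by the text:

```python
import cv2

img = cv2.imread("diagram.png")                   # the non-editable graphics-text image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # grayscale
# Otsu picks the binarization threshold automatically from the histogram.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```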
- coordinate set T = {T1, T2, ..., Tt}. It can be understood that the connected text boxes can be expressed in the form of the coordinates corresponding to the text boxes, for example the coordinates of the diagonal of each text box.
- each element Tt in the coordinate set T is recorded in the form (x1, y1, x2, y2), where x1 and x2 are the abscissas of the two diagonal corners of the text box, and y1 and y2 are the corresponding ordinates.
- Step 420 Determine the text content included in each text box based on a preset text recognition algorithm.
- the CRNN text recognition algorithm can be used as the preset text recognition algorithm to identify the text content included in each text box.
- the CRNN text recognition algorithm can use the MobileNetv3 network to extract features from the image area corresponding to the input text detection box set T to obtain a feature map.
- the height of the input image can be 32, and the width can be any number greater than 0.
- after feature extraction, the height of the feature map becomes 1.
- the Im2Seq network layer in the CRNN text recognition algorithm can be used to transform the resulting feature map into a feature sequence for input into the subsequent sequence model; the feature sequence is then input into the BiLSTM model.
- the BiLSTM model learns from the feature sequence, and a fully connected layer produces the predicted label distribution, which includes the recognized text and the corresponding probabilities. Finally, the predicted label distribution can be input into the CTC layer and decoded to obtain the text content recognized for the text box coordinate set T.
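A minimal PyTorch sketch of the Im2Seq step, assuming the backbone has already collapsed the feature-map height to 1; the channel count and width below are illustrative:

```python
import torch

def im2seq(feature_map: torch.Tensor) -> torch.Tensor:
    """Turn a (B, C, 1, W) feature map into a (B, W, C) sequence for the BiLSTM."""
    b, c, h, w = feature_map.shape
    assert h == 1, "the backbone must reduce the height to 1 before Im2Seq"
    return feature_map.squeeze(2).permute(0, 2, 1)   # each column becomes one timestep

seq = im2seq(torch.randn(4, 256, 1, 80))             # 80 timesteps of 256-dim features
```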
- Step 430 Determine the text color in each text box based on the coordinates of each text box and the pixel histogram in each text box.
- the pixel histogram in each text box can be obtained by combining the coordinates of each text box, thereby obtaining the text color value.
- Step 440 Determine the size of the font in the text box based on the coordinates of each text box.
- the height of each text box can be determined from its coordinates, and the relative size of the font within the text box is then determined from that height.
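One simple heuristic implementing steps 430 and 440 together: treat the dominant color cluster inside the box as background, average the rest as the text color, and read the relative font size off the box height. The details (NumPy-based counting, RGB input) are assumptions, not the patent's prescribed implementation:

```python
import numpy as np

def text_color_and_font_size(img, box):
    """Estimate text color from the in-box pixel histogram, font size from box height."""
    x1, y1, x2, y2 = box
    region = img[y1:y2, x1:x2].reshape(-1, 3)
    colors, counts = np.unique(region, axis=0, return_counts=True)
    background = colors[counts.argmax()]            # most frequent color = background
    text_pixels = region[np.any(region != background, axis=1)]
    text_color = text_pixels.mean(axis=0) if len(text_pixels) else background
    font_size = y2 - y1                             # relative size follows box height
    return text_color, font_size
```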
- determining the contour features in the non-editable graphic image includes the following steps:
- Step 510 Determine at least one contour included in the non-editable graphic image based on a preset contour detection algorithm.
- step 510 may include step 5101 to step 5103, which are described below.
- Step 5101 Based on a preset contour detection algorithm, determine a set of contours included in the non-editable graphic image.
- OpenCV's function cv2.findContours can be used to perform contour detection and obtain a collection of contours included in non-editable graphics and text images.
- non-editable graphic and text images can be pre-processed to facilitate subsequent contour detection.
- non-editable graphics and text images can be grayscaled and binarized.
- Step 5102 Based on the coordinates of each contour and the text box, filter out the contours that overlap the text box.
- the contours obtained based on the preset contour detection algorithm include not only the contours of some images, but also the contours of text boxes. Therefore, the contours of the text boxes need to be removed to obtain the contours of the images in the non-editable graphic and text images.
- for the currently traversed contour Cx, check whether the degree of overlap between Cx and any text box is greater than a preset threshold; if so, contour Cx is deleted, yielding an updated contour set C.
- Step 5103 Determine at least one contour based on the remaining contours.
- the contours in the updated contour set C are connected.
- the OpenCV function approxPolyDP can be used to perform contour approximation.
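Steps 5101 to 5103 might look like the following OpenCV sketch, reusing the binary image from preprocessing and the text_boxes from text detection; the 0.8 overlap threshold and the approxPolyDP epsilon are assumptions:

```python
import cv2

contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

def overlaps_text_box(contour, text_boxes, threshold=0.8):
    """True if the contour's bounding rectangle mostly coincides with some text box."""
    x, y, w, h = cv2.boundingRect(contour)
    for tx1, ty1, tx2, ty2 in text_boxes:
        ix = max(0, min(x + w, tx2) - max(x, tx1))  # horizontal intersection
        iy = max(0, min(y + h, ty2) - max(y, ty1))  # vertical intersection
        if w * h > 0 and (ix * iy) / (w * h) > threshold:
            return True
    return False

kept = [c for c in contours if not overlaps_text_box(c, text_boxes)]
# approximate each remaining contour with a simpler, connected polygon
approx = [cv2.approxPolyDP(c, 0.01 * cv2.arcLength(c, True), True) for c in kept]
```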
- Step 520 Identify the shape of each contour in at least one contour based on the shape recognition model of the pre-trained residual neural network.
- the residual neural network may be ResNet50, for example.
- the data set can be randomly divided into a training set, a validation set and a test set; the training set is used to train the ResNet50 model, while the validation set and test set are used to tune the parameters of the ResNet50 model and to test its performance, respectively.
- the cross entropy of multi-label classification is used as the loss function to train the shape recognition model based on the residual neural network, and the pre-trained shape recognition model of the residual neural network is obtained.
- the classification categories corresponding to the shape recognition model of the pre-trained residual neural network can be predetermined.
- the classification categories are determined based on common shapes, and the shapes of the relevant classification categories are set in the subsequent visual editing platform. This makes it easy to create editable shapes.
- shapes can be divided into three categories: arrow shape categories, basic shape categories, and image shape categories.
- the arrow shape categories include: up arrow, down arrow, straight arrow, curved arrow, bidirectional arrow, straight line and curve.
- Basic shape classes include: circle, triangle, square, sector, ellipse, parallelogram and rhombus.
- the image shape category includes non-arrow, non-basic images presented in graphics files. The classification categories defined for the shape recognition model are consistent with the shape types provided by the graphics editor of the present invention.
- each contour can be pre-processed.
- data enhancement can be performed on the contour, including operations such as scaling down and enlarging, and color transformation.
- Step 530 Determine the relative size of the shape based on the minimum circumscribed rectangle size of the shape of each outline, and determine the position of each outline based on the coordinates of the preset position of the shape of each outline.
- the coordinates of the preset position of each contour shape may be the position coordinates of the upper left corner and the lower right corner of each contour shape, or may be other coordinates on the shape of each contour that can identify the contour.
- step 530 provides one possible method for determining the size and position of a contour; other methods can also be used, and the present invention is not limited in this regard.
- Step 540 Determine the centroid coordinates of each contour in at least one contour.
- the updated contour set C = {C1, C2, ..., Cc} can be traversed, and the centroid coordinates of each contour in the set are calculated as that contour's centroid attribute value.
- OpenCV's function cv2.moments can be used to calculate the first-order geometric moment to obtain the centroid position of the specified contour.
- Step 550 Determine the color of each contour based on the color corresponding to the centroid coordinate of each contour.
- the color value of the centroid coordinate is obtained according to the centroid coordinate of each contour as the color attribute value of the contour.
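Steps 530 to 550 combined into one OpenCV sketch; cv2.boundingRect is used here as the circumscribed rectangle (cv2.minAreaRect would handle rotated shapes), which is an implementation choice rather than something the text prescribes:

```python
import cv2

def contour_attributes(img, contour):
    """Size/position from the bounding rectangle; color sampled at the centroid."""
    x, y, w, h = cv2.boundingRect(contour)
    m = cv2.moments(contour)                        # first-order geometric moments
    if m["m00"] == 0:                               # degenerate contour: use box centre
        cx, cy = x + w // 2, y + h // 2
    else:
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
    color = tuple(int(v) for v in img[cy, cx])      # pixel color at the centroid
    return {"position": ((x, y), (x + w, y + h)), "size": (w, h),
            "centroid": (cx, cy), "color": color}
```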
- generating an editable document corresponding to a non-editable graphic image based on the final structured data includes the following steps:
- Step 610 Obtain and display the final structured data.
- this step can be implemented by a graphic editor.
- the final structured data can be as shown in Figure 9 below, and will not be described in detail here.
- the graphic editor can be a self-built graphic editor based on Electron.
- Step 620 Based on the final structured data, generate images corresponding to outline features and text corresponding to text features at corresponding positions on the canvas, and determine the initial editable document corresponding to the non-editable graphic and text images.
- when the editor renders the visual interface, it first creates a shape based on the contour shape in the contour features, then determines the shape's position on the canvas from the contour's position coordinates, and then fills the shape's color and adjusts its size.
- a text box is then created based on the text box coordinates in the structured data, the recognized text content is generated inside the text box, and the text color and size are adjusted; the generated and created content is saved, thereby producing the initial editable document corresponding to the non-editable graphics-text image.
- the method further includes:
- content is added to, modified in, or deleted from the initial editable document in response to a user operation.
- the initial editable document can be saved in a format editable by other graphic editors, and further edits such as adding, modifying, and deleting can be performed in other graphic editors.
- the other editor may be Visio.
- Figure 7 is a schematic diagram of graphic images. As shown in Figure 7, it includes CT scan images, arrows, multiple text contents, and background graphics corresponding to the text contents.
- FIG. 8 is a schematic diagram of the initial structured data corresponding to the graphic image in FIG. 7 . As shown in FIG. 8 , it includes graphic related information (shapes), image related information (figures) and text features (texts) included in the graphic image as shown in FIG. 7 .
- the graphics-related information (shapes) and the image-related information (figures) correspond to the contour features described above.
- figures corresponds to the CT scan image in Figure 7, e.g. figures: [{id: element 21, path: "fig1.jpg", size: (a21, w21), position: ((x41, y41), (x42, y42))}], where path: "fig1.jpg" is the path under which the image is saved, position: ((x41, y41), (x42, y42)) is the image's relative position in the non-editable graphics-text image, and size: (a21, w21) is the size of the image.
- in shapes, {id: element 1, type: RightDirectionalConnector, color: (r1, g1, b1), size: (a1, w1), position: ((x1, y1), (x2, y2))} describes one contour:
- type: RightDirectionalConnector is the shape of the contour
- color: (r1, g1, b1) is the color of the contour
- size: (a1, w1) is the size of the contour
- position: ((x1, y1), (x2, y2)) is the position of the contour.
- FIG. 9 is a schematic diagram of the final structured data corresponding to the graphic image in FIG. 7 .
- in addition to the graphics-related information, image-related information and text features shown in Figure 8, the final structured data also includes the relationship between every two elements.
- for element 1, the corresponding final structured data is: "Self attributes: [type: RightDirectionalConnector, color: (r1, g1, b1), size: (a1, w1), position: ((x1, y1), (x2, y2))], relationship: [[combination object: element 2, combination relationship: association], [combination object: element 19, combination relationship: surround], [combination object: element 21, combination relationship: association]]".
- that is, compared with the initial structured data, the final structured data shown in Figure 9 additionally records, for each element, the related elements and the relationship between each pair.
- Figure 10 is a schematic diagram of the display interface of the graphics editor provided by the present invention. As shown in Figure 10, the editable document corresponding to the non-editable graphics-text image can be opened and displayed, and simple adjustments can be made to its content, such as adjusting the size, color and position of fonts, selecting line thickness, adjusting the overall layout of graphics, text and images in the editable document, and adding or modifying shapes.
- the device for generating editable documents based on non-editable graphics-text images provided by the present invention is described below.
- the device described below and the method for generating editable documents based on non-editable graphics-text images described above may be referred to in correspondence with each other.
- a device for generating an editable document based on a non-editable graphic image may include:
- the acquisition module 1110 is used to acquire non-editable graphic and text images
- the first determination module 1120 is used to extract outline features from the non-editable graphic and text images; extract text features from the non-editable graphic and text images; wherein the outline features include the shape of the outline, Position, size and color within the outline; the text features include text box coordinates, text content, text color and font size;
- the first generation module 1130 is used to generate initial structured data according to the outline features and the text features;
- the second determination module 1140 is used to determine the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model together with the contour features and the text features; wherein the pre-trained element relationship classification model is determined after training on a data set composed of the contour features and/or text features of the two elements and the relationship labels of the two elements;
- the supplement module 1150 is used to supplement the initial structured data based on the relationship between the two elements to obtain the final structured data
- the second generation module 1160 is configured to generate an editable document corresponding to the non-editable graphic image based on the final structured data.
- the device provided by the present invention for generating editable documents based on non-editable graphics-text images can determine the elements contained in a non-editable graphics-text image and their corresponding attributes from the image's contour features and text features, and generate the initial structured data from them; the relationship between every two elements is then further determined based on the pre-trained element relationship classification model and supplemented into the initial structured data to obtain the final structured data.
- editable documents corresponding to the non-editable graphics-text images are then generated from the final structured data, so that conversion no longer relies on manual work and non-editable graphics-text images are converted into editable documents efficiently and accurately.
- the second determining module 1140 includes:
- a first determination unit configured to determine the classification result of the relationship between every two elements in the non-editable graphics-text image based on each pre-trained binary classification model together with the contour features and the text features;
- the second determination unit is used to determine the final classification result of the relationship between each two elements in the non-editable graphic and text image based on the classification result with the largest probability value among the plurality of determined classification results.
- the second determination module 1140 further includes:
- the acquisition unit is used to acquire the outline features and text features of multiple non-editable graphic and text images
- the third determination unit is used to determine the data set based on the contour features and/or text features corresponding to each two elements in each non-editable graphic image; and determine each two elements based on the relationship between each two elements. relationship tag;
- the fourth determination unit is used to determine the samples of the data set as positive samples and negative samples based on the relationship labels of each two elements and their corresponding binary classification models;
- the training unit is used to train the corresponding binary classification model based on positive samples and negative samples to obtain a pre-trained binary classification model.
- the first determining module 1120 includes:
- the fifth determination unit is used to determine the text box and its coordinates included in the non-editable graphic image based on a preset text box detection algorithm
- the sixth determination unit is used to determine the text content included in each text box based on the preset text recognition algorithm
- the seventh determination unit is used to determine the text color in each text box based on the coordinates of each text box and the pixel histogram in each text box;
- the eighth determination unit is used to determine the size of the font in the text box based on the coordinates of each text box.
- the first determining module 1120 further includes:
- a ninth determination unit configured to determine at least one contour included in the non-editable graphic image based on a preset contour detection algorithm
- a recognition unit configured to recognize the shape of each contour in the at least one contour based on a shape recognition model based on a pre-trained residual neural network
- a tenth determination unit configured to determine the relative size of the shape based on the minimum circumscribed rectangle size of the shape of each outline, and determine the position of each outline based on the coordinates of the preset position of the shape of each outline;
- An eleventh determination unit configured to determine the color of each contour based on the color corresponding to the centroid coordinate of each contour.
- the ninth determining unit includes:
- a twelfth determination unit configured to determine a set of contours included in the non-editable graphic image based on a preset contour detection algorithm
- a filtering unit configured to filter out the contours that coincide with the text box according to the coordinates of each contour and the text box;
- a thirteenth determination unit configured to determine at least one contour based on the remaining contours.
- the second generation module 1160 includes:
- an acquisition and display unit is used to acquire and display the final structured data
- a generating unit configured to generate, based on the final structured data, an image corresponding to the contour features and text corresponding to the text features at the corresponding positions on the canvas, and to determine the initial editable document corresponding to the non-editable graphics-text image.
- Figure 12 illustrates a schematic diagram of the physical structure of an electronic device.
- the electronic device may include: a processor (processor) 1210, a communications interface (communications interface) 1220, a memory (memory) 1230 and a communication bus 1240.
- the processor 1210, the communication interface 1220, and the memory 1230 complete communication with each other through the communication bus 1240.
- the processor 1210 can call the logical instructions in the memory 1230 to execute the method of generating an editable document based on a non-editable graphics-text image, the method including: obtaining a non-editable graphics-text image; and generating an editable document corresponding to the non-editable graphics-text image.
- the above-mentioned logical instructions in the memory 1230 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
- the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
- the aforementioned storage media include: USB flash drives, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.
- the present invention also provides a computer program product.
- the computer program product includes a computer program stored on a non-transitory computer-readable storage medium.
- the computer program includes program instructions; when the program instructions are read and executed by a computer, the computer can execute the method for generating an editable document based on a non-editable graphics-text image provided by the present invention.
- the method for generating an editable document based on a non-editable graphics-text image includes: obtaining the non-editable graphics-text image; extracting contour features from the non-editable graphics-text image and extracting text features from it, wherein the contour features include the shape, position and size of each contour and the color within it, and the text features include text box coordinates, text content, text color and font size; generating initial structured data from the contour features and the text features; determining the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model together with the contour features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the contour features and/or text features of the two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain the final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphics-text image.
- the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored.
- when executed by a processor, the computer program implements the method for generating an editable document based on a non-editable graphics-text image provided by the present invention.
- the method for generating an editable document based on a non-editable graphics-text image includes: obtaining the non-editable graphics-text image; extracting contour features from it; extracting text features from it, wherein the contour features include the shape, position and size of each contour and the color within it, and the text features include text box coordinates, text content, text color and font size; generating initial structured data from the contour features and the text features; determining the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model together with the contour features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the contour features and/or text features of the two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain the final structured data; and generating, based on the final structured data, the editable document corresponding to the non-editable graphics-text image.
- the device embodiments described above are only illustrative.
- the units described as separate components may or may not be physically separated.
- the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement this without creative effort.
- each embodiment can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
- the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments or in certain parts of the embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a method and device for generating an editable document based on a non-editable graphics-text image. The method comprises: acquiring a non-editable graphics-text image; extracting contour features and text features from the non-editable graphics-text image; generating initial structured data according to the contour features and the text features; determining a relationship between two elements in the non-editable graphics-text image on the basis of a pre-trained element relationship classification model, the contour features and the text features, the pre-trained element relationship classification model being determined by training on the basis of a data set composed of contour features and/or text features of the two elements and a relationship tag of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, on the basis of the final structured data, an editable document corresponding to the non-editable graphics-text image. On this basis, non-editable graphics-text data can be efficiently and accurately converted into an editable document.
Description
The present invention relates to the field of computer technology, and in particular to a method and device for generating an editable document based on a non-editable graphics-text image.
Because graphics-text images cannot be edited, the graphics-text data they contain cannot be quickly modified on the basis of the existing elements. To save non-editable graphics-text data as an editable document, the document still has to be re-created manually from scratch, which leads to low utilization of graphics-text data and makes manual production time-consuming and laborious. Therefore, how to efficiently and accurately convert non-editable graphics-text images into editable documents is a technical problem that currently needs to be solved.
Summary of the Invention
The present invention provides a method and device for generating an editable document based on a non-editable graphics-text image, so as to solve the prior-art problem that saving non-editable graphics-text data as an editable document is time-consuming and inefficient, and to achieve efficient and accurate conversion of non-editable graphics-text images into editable documents.
A method for generating an editable document based on a non-editable graphics-text image includes: obtaining a non-editable graphics-text image; extracting contour features from the non-editable graphics-text image; extracting text features from the non-editable graphics-text image, wherein the contour features include the shape, position, and size of each contour and the color within it, and the text features include text box coordinates, text content, text color, and font size; generating initial structured data according to the contour features and the text features; determining the relationship between two elements in the non-editable graphics-text image based on a pre-trained element relationship classification model together with the contour features and the text features, wherein the pre-trained element relationship classification model is determined by training on a data set composed of the contour features and/or text features of the two elements and the relationship label of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphics-text image.
In one embodiment, the pre-trained element relationship classification model includes multiple pre-trained binary classification models. Correspondingly, determining the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model together with the contour features and the text features includes: determining, based on each pre-trained binary classification model together with the contour features and the text features, a classification result for the relationship between each pair of elements in the non-editable graphics-text image; and determining the final classification result for the relationship between each pair of elements based on the classification result with the largest probability value among the multiple determined classification results.
In one embodiment, the process of determining the pre-trained element relationship classification model by training on a data set composed of the contour features and/or text features of two elements and the relationship label of the two elements includes: obtaining the contour features and text features of multiple non-editable graphics-text images; determining the data set based on the contour features and/or text features corresponding to each pair of elements in each non-editable graphics-text image, and determining the relationship label of each pair of elements based on the relationship between them; dividing the samples of the data set into positive samples and negative samples based on the relationship label of each pair of elements and the corresponding binary classification model; and training the corresponding binary classification model on the positive samples and negative samples to obtain the pre-trained binary classification model.
In one embodiment, determining the text features in the non-editable graphics-text image based on text detection and text recognition methods includes: determining the text boxes included in the non-editable graphics-text image and their coordinates based on a preset text box detection algorithm; determining the text content included in each text box based on a preset text recognition algorithm; determining the text color in each text box according to the coordinates of the text box and the pixel histogram within it; and determining the font size within each text box based on the coordinates of the text box.
In one embodiment, determining the contour features in the non-editable graphics-text image based on contour detection and shape recognition methods includes: determining at least one contour included in the non-editable graphics-text image based on a preset contour detection algorithm; identifying the shape of each of the at least one contour with a shape recognition model based on a pre-trained residual neural network; determining the relative size of each shape from the size of its minimum bounding rectangle, and determining the position of each contour from the coordinates of a preset position on its shape; and determining the color of each contour from the color at its centroid coordinates.
In one embodiment, determining at least one contour included in the non-editable graphics-text image based on a preset contour detection algorithm includes: determining the set of contours included in the non-editable graphics-text image based on the preset contour detection algorithm; filtering out the contours that coincide with the text boxes according to the coordinates of each contour and of the text boxes; and determining at least one contour from the remaining contours.
In one embodiment, generating an editable document corresponding to the non-editable graphics-text image based on the final structured data includes: obtaining and displaying the structured data; and, based on the final structured data, generating an image corresponding to the contour features and text corresponding to the text features at the corresponding positions on a canvas, thereby determining an initial editable document corresponding to the non-editable graphics-text image.
In one embodiment, after the image corresponding to the contour features and the text corresponding to the text features are created at the corresponding positions on the canvas based on the final structured data, the method further includes: adding to, modifying, or deleting from the initial editable document in response to a user's operation.
The present invention also provides a device for generating an editable document based on a non-editable graphics-text image. The device includes: an acquisition module for obtaining a non-editable graphics-text image; a determination module for extracting contour features and text features from the non-editable graphics-text image, wherein the contour features include the shape, position, and size of each contour and the color within it, and the text features include text box coordinates, text content, text color, and font size; a first generation module for generating structured data according to the contour features and the text features; and a second generation module for generating, based on the final structured data, an editable document corresponding to the non-editable graphics-text image.
The present invention also provides a computer device including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the above method for generating an editable document based on a non-editable graphics-text image.
The present invention also provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above method for generating an editable document based on a non-editable graphics-text image.
With the above method and device for generating an editable document based on a non-editable graphics-text image, the contour features and text features of the non-editable graphics-text image are determined, so that the elements contained in the image and their attributes can be identified and initial structured data can be generated from them. The relationship between two elements is then determined based on the pre-trained element relationship classification model, and the initial structured data is supplemented with this relationship to obtain the final structured data. An editable document corresponding to the non-editable graphics-text image is then generated based on the final structured data. Converting non-editable graphics-text images into editable documents therefore no longer depends on manual work, and the conversion is efficient and accurate.
Figure 1 is the first schematic flow diagram of the method for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 2 is the second schematic flow diagram of the method for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 3 is the third schematic flow diagram of the method for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 4 is the fourth schematic flow diagram of the method for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 5 is the fifth schematic flow diagram of the method for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 6 is the sixth schematic flow diagram of the method for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 7 is a schematic diagram of a graphics-text image provided by the present invention;
Figure 8 is a schematic diagram of the initial structured data provided by the present invention;
Figure 9 is a schematic diagram of the final structured data provided by the present invention;
Figure 10 is a schematic diagram of the display interface of the graphics-text editor provided by the present invention;
Figure 11 is a schematic framework diagram of the device for generating an editable document based on a non-editable graphics-text image provided by the present invention;
Figure 12 is a schematic diagram of the electronic device provided by the present invention.
In order to make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this disclosure have the ordinary meanings understood by persons of ordinary skill in the field to which this disclosure belongs. "First", "second", and similar words used in the embodiments of this disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and so on are only used to express relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
To facilitate understanding, the technical terms involved in this application are explained below.
(1) Connectionist text proposal network (CTPN)
CTPN is a text detection algorithm proposed at ECCV 2016. It is a deep neural network that combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) and can effectively detect horizontally distributed text in complex scenes.
In the present invention, CTPN is used to detect text boxes in non-editable graphics-text images.
(2) Non-maximum suppression (NMS)
Non-maximum suppression suppresses elements that are not local maxima and searches for local maxima. Common object detection algorithms of recent years (such as R-CNN, SPP-Net, Fast R-CNN, and Faster R-CNN) ultimately produce, from one picture, many rectangular boxes that may contain objects and assign a class probability to each box; NMS is used to filter out a subset of these boxes. In the present invention, NMS is used to filter the text boxes detected in the non-editable graphics-text image, so that the text boxes produced by CTPN are closer to those in the original image.
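For illustration only, a minimal IoU-based NMS sketch in Python is given below; the function name, the 0.5 threshold, and the (x1, y1, x2, y2) box array format are assumptions for illustration, not part of the claimed method.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes; drop boxes that overlap a kept box too much.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) array.
    """
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]                          # current highest-scoring box
        keep.append(i)
        # intersection of box i with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # keep only boxes with low overlap
    return keep
```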
(3) Text line construction algorithm
In the present invention, the text line construction algorithm is used to connect the text boxes that remain after NMS filtering, thereby forming text detection boxes.
(4) OpenCV
OpenCV is a cross-platform computer vision and machine learning software library released under the Apache 2.0 (open-source) license that runs on Linux, Windows, Android, and Mac OS. OpenCV is lightweight and efficient: it consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby, and MATLAB, and implements many general-purpose algorithms in image processing and computer vision.
(5) Convolutional recurrent neural network (CRNN)
CRNN is mainly used for end-to-end recognition of text sequences of variable length. Instead of first segmenting individual characters, it converts text recognition into a sequence-learning problem with temporal dependencies, that is, image-based sequence recognition.
(6) Connectionist temporal classification (CTC)
CTC solves the problem of aligning input data with given labels; it enables end-to-end training and outputs sequence results of variable length. Because text in natural-scene images has gaps between characters and may be deformed, the same character can take different forms and may appear repeatedly in the raw recognition output. The CTC model can therefore be used to remove blank characters and repeated characters from the text recognition result.
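The collapsing step that CTC performs at decoding time can be sketched as a greedy decoder: take the best class per time step, merge consecutive repeats, then drop blanks. The blank index of 0 and the per-step argmax input are illustrative assumptions.

```python
def ctc_greedy_decode(logit_argmax, blank=0):
    """Collapse a per-time-step best-class sequence into a label sequence.

    logit_argmax: list of class indices, one per time step.
    """
    decoded = []
    prev = None
    for idx in logit_argmax:
        if idx != prev and idx != blank:  # skip repeats and the blank symbol
            decoded.append(idx)
        prev = idx
    return decoded

# e.g. [7, 7, 0, 7, 3, 3] -> [7, 7, 3]: the blank separates two genuine 7s
print(ctc_greedy_decode([7, 7, 0, 7, 3, 3]))
```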
(7) Structured data
Structured data, also called row data, is data logically expressed and implemented by a two-dimensional table structure. It strictly follows data format and length specifications and is mainly stored and managed in relational databases.
It can be understood that the elements in graphics-text data take many forms, and different kinds of elements have different attributes such as shape, size, and font color; manually converting non-editable graphics-text data into editable graphics-text data is therefore time-consuming and labor-intensive. The present invention accordingly provides a method for generating an editable document based on a non-editable graphics-text image, which solves the problem that graphics-text data could previously only be saved as non-editable images, as well as the problem that users who wanted to edit the graphics-text data in a document usually had to re-create it from scratch, thereby efficiently converting non-editable graphics-text data into editable graphics-text data.
The method and device for generating an editable document based on a non-editable graphics-text image provided by the present invention are described below with reference to the accompanying drawings.
Figure 1 is a schematic flow diagram of a method for generating an editable document based on a non-editable graphics-text image provided by the present invention. It can be understood that the method can be executed by a device for generating an editable document based on a non-editable graphics-text image, and that this device can be a computer device.
As shown in Figure 1, in one embodiment, a method for generating an editable document based on a non-editable graphics-text image is proposed, which can include the following steps:
Step 110: Obtain a non-editable graphics-text image.
A graphics-text image includes images and text, each with associated attributes. For example, an image has attributes such as the shape, position, and size of its contour and the color within the contour, while text has text box coordinates, text content, text color, and font size. For a concrete graphics-text image, see Figure 7 below; it is not detailed here.
A non-editable graphics-text image is an image in which the shape and color of the images or graphics and the size and color of the fonts cannot be adjusted directly, for example a picture in PNG or JPG format.
Step 120: Extract contour features from the non-editable graphics-text image, and extract text features from the non-editable graphics-text image.
The contour features include the shape, position, and size of each contour and the color within it; the text features include text box coordinates, text content, text color, and font size.
Contour detection and shape recognition methods are used to detect the attribute features of the contours included in the non-editable graphics-text image, such as shape, position, size, and the color within the contour. Text detection and recognition methods are used to detect the attribute features of the text included in the image, such as text box coordinates, text content, text color, and font size.
It can be understood that by determining the contour features and text features, the elements contained in the graphics-text image and their attributes can be identified, and initial structured data can be generated from these elements and attributes.
Step 130: Generate initial structured data according to the contour features and the text features.
It can be understood that generating initial structured data from the contour features and text features makes it convenient to subsequently determine the final structured data from the initial structured data.
Specifically, the contour features and text features can be saved in dictionary form. For example, the contour dictionary S = {contour C1: (position coordinates, color attribute value, shape attribute value, shape size), contour C2: (position coordinates, color attribute value, shape attribute value, shape size), ..., contour Ck: (position coordinates, color attribute value, shape attribute value, shape size)}. Similarly, the image-shape dictionary J = {image 1: (image path, shape position coordinates, shape size), image 2: (image path, shape position coordinates, shape size), ..., image z: (image path, shape position coordinates, shape size)}. Each of contour C1, contour C2, contour Ck, image 1, image 2, and image z is an element.
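As a minimal sketch of this dictionary form in Python, the snippet below shows one contour element and one text element; the key names and example values are illustrative assumptions, not the patent's exact schema.

```python
# Initial structured data: one entry per detected element, keyed by element id.
initial_structured_data = {
    "element1": {                               # a contour element
        "type": "RightDirectionalConnector",    # shape class from the recognizer
        "color": (52, 120, 246),                # RGB sampled inside the contour
        "size": (40, 160),                      # (height, width) of the shape
        "position": ((10, 20), (170, 60)),      # top-left and bottom-right corners
    },
    "element13": {                              # a text element
        "content": "image encoder",             # recognized text content
        "color": (0, 0, 0),                     # text color from the pixel histogram
        "size": 14,                             # relative font size from box height
        "position": ((30, 25), (150, 45)),      # text box diagonal coordinates
    },
}
```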
Step 140: Determine the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model together with the contour features and the text features.
The pre-trained element relationship classification model is determined by training on a data set composed of the contour features and/or text features of two elements and the relationship label of the two elements. An element is a contour or a piece of text included in the non-editable graphics-text image, for example a graphic, an image, a text box, or text content.
The relationship label of two elements identifies the relationship between them, which can be, for example, surrounding, containing, associated, or independent. Surrounding means that one element lies around another, for example text above an arrow. Containing means that one element contains another, for example text inside a contour. Associated means that the two elements touch, for example an arrow connected to a text box or to another shape. Independent covers relationships other than the above three.
It can be understood that the initial structured data includes features such as the shape, position, and size of each contour, the color within it, text box coordinates, text content, text color, and font size. To represent the relationships between elements in the non-editable graphics-text image more comprehensively and accurately, the relationship between two elements can be further determined with the pre-trained element relationship classification model and added to the initial structured data to obtain the final structured data.
Step 150: Supplement the initial structured data with the relationship between the two elements to obtain the final structured data.
For the final structured data, see Figure 9 below; it is not detailed here.
Step 160: Generate an editable document corresponding to the non-editable graphics-text image based on the final structured data.
The editable document corresponding to the non-editable graphics-text image is a document in which the shapes and colors of the images and the size and color of the fonts can be adjusted directly, for example a document in Visio format.
In the method provided by the present invention, the contour features and text features of the non-editable graphics-text image are determined, so that the elements contained in the image and their attributes can be identified and initial structured data can be generated from them; the relationship between two elements is then determined with the pre-trained element relationship classification model and added to the initial structured data to obtain the final structured data; and an editable document corresponding to the non-editable graphics-text image is generated based on the final structured data. Converting non-editable graphics-text images into editable documents therefore no longer depends on manual work, and the conversion is efficient and accurate.
In one embodiment, the pre-trained element relationship classification model includes multiple binary classification models. Correspondingly, as shown in Figure 2, determining the relationship between two elements in the non-editable graphics-text image based on the pre-trained element relationship classification model together with the contour features and the text features includes the following steps:
Step 210: Determine, based on each pre-trained binary classification model together with the contour features and the text features, a classification result for the relationship between the two elements in the non-editable graphics-text image.
A binary classification model judges whether the relationship between two elements is the relationship corresponding to a preset relationship label; for example, it can judge whether the relationship between two elements is a containing relationship. The binary classification model can be a support vector machine (SVM).
A classification result can be, for example: the probability that the relationship between the two elements is a containing relationship is 90%, and the probability that it is not is 10%.
In addition, if a non-editable graphics-text image has t contours and c texts, then classification results for the relationships of [(t + c) * (t + c - 1) / 2] element pairs need to be computed.
Step 220: Determine the final classification result for the relationship between each pair of elements in the non-editable graphics-text image based on the classification result with the largest probability value among the determined classification results.
It can be understood that each relationship label can have a corresponding binary classification model, so multiple relationship labels correspond to multiple binary classification models. Therefore, to determine the relationship between two elements, the classification result with the largest probability value among the results of the multiple binary classification models is taken as the final classification result. For example, if the classification results of the multiple binary classification models for the same pair of elements are: containing with probability 90%, associated with probability 20%, independent with probability 15%, and surrounding with probability 10%, then the result corresponding to 90% is the final classification result, that is, the relationship between the two elements is determined to be containing.
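A minimal sketch of this one-vs-rest combination, assuming each per-label binary classifier exposes a scikit-learn-style predict_proba; the label names and the feature-vector format are illustrative assumptions.

```python
import numpy as np

def classify_pair(pair_features, binary_models):
    """Pick the relationship whose binary classifier is most confident.

    pair_features: 1-D feature vector for one element pair.
    binary_models: dict mapping a label ("contain", "surround", ...) to a
                   fitted binary classifier with predict_proba.
    """
    x = np.asarray(pair_features).reshape(1, -1)
    scores = {}
    for label, model in binary_models.items():
        # probability that this pair belongs to the label's positive class
        scores[label] = model.predict_proba(x)[0, 1]
    return max(scores, key=scores.get)  # label with the largest probability
```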
In one embodiment, as shown in Figure 3, the process of determining the pre-trained element relationship classification model by training on a data set composed of the contour features and/or text features of two elements and the relationship label of the two elements includes the following steps:
Step 310: Obtain the contour features and text features of multiple non-editable graphics-text images.
For the process of obtaining the contour features and text features of the non-editable graphics-text images, refer to the description above; for brevity, it is not repeated here.
In addition, after the contour features and text features are obtained, the feature values can be normalized and encoded with one-hot encoding.
Step 320: Determine the data set based on the contour features and/or text features corresponding to each pair of elements in each non-editable graphics-text image, and determine the relationship label of each pair of elements based on the relationship between them.
It can be understood that, to determine the relationship between each pair of elements in each non-editable graphics-text image, the contour features and/or text features corresponding to each pair of elements can be taken as a sample, and the relationship between the pair as that sample's label, thereby determining the data set used to train the binary classification models.
In addition, after the relationship label of each pair of elements is determined, the label can be one-hot encoded to facilitate subsequent processing.
Step 330: Divide the samples of the data set into positive samples and negative samples based on the relationship label of each pair of elements and the corresponding binary classification model.
It can be understood that each relationship label can have a corresponding binary classification model. The corresponding binary classification models can therefore be, for example: one for judging whether the relationship between two elements is a surrounding relationship, one for judging whether it is a containing relationship, one for judging whether it is an associated relationship, and one for judging whether it is an independent relationship.
Specifically, samples whose relationship label matches the label of a binary classification model are taken as its positive samples, and samples with a different relationship label as its negative samples. For example, for the binary classification model that judges whether the relationship between two elements is an associated relationship, samples whose pair relationship is associated are positive samples, while samples whose pair relationship is surrounding, containing, or independent are negative samples.
It can also be understood that, because each relationship label has its own binary classification model, the positive and negative samples differ for each model.
Step 340: Train the corresponding binary classification model on the positive samples and negative samples to obtain the pre-trained binary classification model.
Specifically, the positive samples and negative samples can each be divided into a training set, a validation set, and a test set in preset proportions, and the corresponding binary classification model is then trained on the training set to obtain the pre-trained binary classification model.
It can be understood that, following the description of Figure 2, once the pre-trained binary classification models are obtained, the relationship between each pair of elements in a non-editable graphics-text image can be determined based on them.
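A minimal sketch of training one such per-label SVM with scikit-learn, assuming pair feature vectors X and string labels y have already been built as described above; the 80/10/10 split ratio is an illustrative assumption.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_label_svm(X, y, target_label):
    """Train the binary SVM for one relationship label (one-vs-rest)."""
    binary_y = [1 if label == target_label else 0 for label in y]  # positives vs. negatives
    # hold out 20%, then split the held-out part into validation and test halves
    X_train, X_rest, y_train, y_rest = train_test_split(X, binary_y, test_size=0.2, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    model = SVC(probability=True)  # probability=True enables predict_proba
    model.fit(X_train, y_train)
    print(f"{target_label}: validation accuracy = {model.score(X_val, y_val):.3f}")
    return model

# one binary model per relationship label, e.g.:
# models = {lbl: train_label_svm(X, y, lbl)
#           for lbl in ["surround", "contain", "associate", "independent"]}
```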
In one embodiment, as shown in Figure 4, determining the text features in the non-editable graphics-text image based on text detection and text recognition methods includes the following steps:
Step 410: Determine the text boxes included in the non-editable graphics-text image and their coordinates based on a preset text box detection algorithm.
Specifically, the CTPN text detection network can be used as the preset text box detection algorithm to perform text detection on the non-editable graphics-text image and obtain initial text boxes; the NMS algorithm is then used to filter out redundant initial text boxes; and finally the text line construction algorithm connects the text boxes that belong to the same text sequence to obtain the connected text boxes.
It can be understood that a non-editable graphics-text image may contain no text, so if the preset text box detection algorithm detects no initial text box, no subsequent text box processing is needed.
It can also be understood that, before the CTPN text detection network is applied to the non-editable graphics-text image, the image can be preprocessed to facilitate the subsequent detection of text boxes and characters. For example, the image can be converted to grayscale and binarized; grayscale conversion can be implemented with OpenCV's cv2.cvtColor function and binarization with cv2.threshold.
In addition, after the text boxes are obtained, the connected text boxes still need to be rectified so that text boxes matching the tilt angle and region of the text can be generated later, finally yielding the set of text box coordinates T = {T1, T2, ..., Tt} for the t text boxes contained in the non-editable graphics-text image. The connected text boxes can be represented by their coordinates, for example the coordinates of a diagonal: each element Tt of the set T is recorded as (x1, y1, x2, y2), where x1 and x2 are the horizontal coordinates and y1 and y2 the vertical coordinates of the two diagonal corners of the text box.
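A minimal sketch of the grayscale-plus-binarization preprocessing with OpenCV; the input path and the fixed threshold value of 127 are illustrative assumptions.

```python
import cv2

image = cv2.imread("diagram.png")                # non-editable graphics-text image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # grayscale conversion
# fixed-threshold binarization; cv2.THRESH_OTSU could be added to pick the threshold automatically
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("diagram_binary.png", binary)
```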
Step 420: Determine the text content included in each text box based on a preset text recognition algorithm.
Specifically, the CRNN text recognition algorithm can be used as the preset text recognition algorithm to recognize the text content in each text box. First, a MobileNetv3 network in the CRNN pipeline extracts features from the image regions corresponding to the text detection box set T, producing feature maps; the input image height can be 32 with any width greater than 0, and after MobileNetv3 the feature map height becomes 1. Next, an Im2Seq layer reshapes the feature map into a feature sequence for input to the subsequent sequence model. The feature sequence is then fed into a BiLSTM model, which learns from it and uses a fully connected layer to obtain the predicted label distribution, containing the recognized characters and their probabilities. Finally, the predicted label distribution can be fed into a CTC layer and decoded to obtain the text content recognized for the text box coordinate set T.
Step 430: Determine the text color in each text box according to the coordinates of the text box and the pixel histogram within it.
Specifically, using the text box coordinate set T, the pixel histogram within each text box of the non-editable graphics-text image can be computed from the box's coordinates, and the text color value obtained from it.
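One way to read the histogram is to take the most frequent color inside the cropped box; the sketch below additionally assumes the text is darker than a near-white background, which is an assumption for illustration and not stated in the patent.

```python
import cv2
import numpy as np

def text_color(image, box):
    """Estimate the text color inside a text box as the most frequent dark color."""
    (x1, y1), (x2, y2) = box
    crop = image[y1:y2, x1:x2].reshape(-1, 3)        # flatten the region to BGR pixels
    colors, counts = np.unique(crop, axis=0, return_counts=True)
    order = np.argsort(counts)[::-1]                 # colors by descending frequency
    for idx in order:
        b, g, r = colors[idx]
        if int(b) + int(g) + int(r) < 3 * 200:       # skip near-white background pixels
            return (int(r), int(g), int(b))
    return tuple(int(v) for v in colors[order[0]][::-1])  # fall back to dominant color
```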
Step 440: Determine the font size within each text box based on the coordinates of the text box.
It can be understood that the height of each text box can be determined from its coordinates, and the relative size of the font within the box can then be determined from the box height.
In one embodiment, as shown in Figure 5, determining the contour features in the non-editable graphics-text image based on contour detection and shape recognition methods includes the following steps:
Step 510: Determine at least one contour included in the non-editable graphics-text image based on a preset contour detection algorithm.
Specifically, step 510 can include steps 5101 to 5103, described below.
Step 5101: Determine the set of contours included in the non-editable graphics-text image based on the preset contour detection algorithm.
Specifically, OpenCV's cv2.findContours function can be used for contour detection, obtaining the set of contours included in the non-editable graphics-text image, for example C = {C1, C2, ..., Cc}, where C1, C2, and Cc each denote a contour.
It can be understood that, before contour detection, the non-editable graphics-text image can be preprocessed to facilitate the detection, for example by grayscale conversion and binarization.
Step 5102: Filter out the contours that coincide with text boxes according to the coordinates of each contour and of the text boxes.
It can be understood that the contours obtained by the preset contour detection algorithm include not only the contours of graphics but also the contours of text boxes; the text box contours therefore need to be removed to obtain the contours of the graphics in the non-editable graphics-text image.
Specifically, the contour set C = {C1, C2, ..., Cc} can be traversed; for the currently traversed contour Cx, check whether the overlap between Cx and any text box exceeds a preset threshold, and if so delete Cx, yielding the updated contour set C.
Step 5103: Determine at least one contour from the remaining contours.
Specifically, the contours in the updated contour set C are connected; OpenCV's approxPolyDP function can be used for contour approximation.
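A minimal sketch of steps 5101 to 5103 with OpenCV; the overlap test via bounding-rectangle intersection, the 0.8 threshold, and the 0.01 approximation tolerance are illustrative assumptions.

```python
import cv2

def detect_shape_contours(binary, text_boxes, overlap_threshold=0.8):
    """Find contours, drop those mostly coinciding with a text box, approximate the rest."""
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    kept = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        overlaps_text = False
        for (tx1, ty1), (tx2, ty2) in text_boxes:
            # intersection between the contour's bounding rect and the text box
            ix = max(0, min(x + w, tx2) - max(x, tx1))
            iy = max(0, min(y + h, ty2) - max(y, ty1))
            if w * h > 0 and (ix * iy) / (w * h) > overlap_threshold:
                overlaps_text = True
                break
        if not overlaps_text:
            epsilon = 0.01 * cv2.arcLength(contour, True)          # approximation tolerance
            kept.append(cv2.approxPolyDP(contour, epsilon, True))  # polygonal approximation
    return kept
```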
Step 520: Identify the shape of each of the at least one contour with a shape recognition model based on a pre-trained residual neural network.
The residual neural network can be, for example, ResNet50. Specifically, the data set can be divided randomly into a training set, a validation set, and a test set; the training set is used to train the ResNet50 model, while the validation set and test set are used to tune the model's parameters and to test its performance, respectively. The shape recognition model based on the residual neural network is trained with multi-label classification cross-entropy as the loss function, yielding the pre-trained residual-network shape recognition model.
It can be understood that the classification categories of the pre-trained residual-network shape recognition model can be predetermined; the categories are chosen according to common shapes, and the corresponding shape types are provided in the subsequent visual editing platform, making it convenient to generate editable shapes. Specifically, the shapes can be divided into three classes. The arrow class includes: up arrow, down arrow, straight arrow, curved arrow, bidirectional arrow, straight line, and curve. The basic shape class includes: circle, triangle, square, sector, ellipse, parallelogram, and rhombus. The image shape class includes the non-arrow, non-basic images presented in the graphics-text document. The classes defined for the shape recognition model are kept consistent with the shape types provided by the graphics-text editor of the present invention.
It can also be understood that each contour can be preprocessed before its shape is identified, for example with data augmentation such as proportional scaling and color transformation.
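A minimal sketch of fine-tuning a ResNet50 shape classifier with torchvision; the count of 15 output classes (7 arrow types + 7 basic shapes + 1 image-shape class), the pre-trained weights, and the optimizer settings are illustrative assumptions that the patent does not fix.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SHAPE_CLASSES = 15  # 7 arrow types + 7 basic shapes + 1 image-shape class

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_SHAPE_CLASSES)  # replace the classifier head

criterion = nn.CrossEntropyLoss()  # cross-entropy loss for shape classification
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of contour crops and shape labels."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```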
Step 530: Determine the relative size of each shape from the size of the minimum bounding rectangle of the contour's shape, and determine the position of each contour from the coordinates of a preset position on its shape.
The coordinates of the preset position can be, for example, the coordinates of the top-left and bottom-right corners of the contour's shape, or other coordinates on the shape that can identify the contour.
It can be understood that step 530 gives one possible way of determining the size and position of a contour; other methods can also be used, and the present invention is not limited in this respect.
Step 540: Determine the centroid coordinates of each of the at least one contour.
Specifically, the updated contour set C = {C1, C2, ..., Cc} can be traversed and the centroid coordinates of each contour computed as its centroid attribute value; OpenCV's cv2.moments function can be used to compute the first-order geometric moments and obtain the centroid position of a given contour.
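A minimal sketch of this centroid computation with cv2.moments, which also reads the image color at the centroid as used in step 550 below; the variable names are illustrative, and a non-degenerate contour (m00 != 0) is assumed.

```python
import cv2

def contour_centroid_color(image, contour):
    """Return the centroid of a contour and the image color at that centroid."""
    m = cv2.moments(contour)
    cx = int(m["m10"] / m["m00"])  # centroid x from first-order moments
    cy = int(m["m01"] / m["m00"])  # centroid y
    b, g, r = image[cy, cx]        # OpenCV stores pixels as BGR
    return (cx, cy), (int(r), int(g), int(b))
```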
Step 550: Determine the color of each contour based on the color corresponding to its centroid coordinates.
Specifically, the color value at the centroid coordinates of each contour is obtained as the contour's color attribute value.
In one embodiment, as shown in Figure 6, generating an editable document corresponding to the non-editable graphics-text image based on the final structured data includes the following steps:
Step 610: Obtain and display the final structured data.
It can be understood that this step can be performed by a graphics-text editor; specifically, the final structured data can be as shown in Figure 9 below and is not detailed here. The graphics-text editor can be a self-built editor based on Electron.
Step 620: Based on the final structured data, generate an image corresponding to the contour features and text corresponding to the text features at the corresponding positions on the canvas, thereby determining the initial editable document corresponding to the non-editable graphics-text image.
Specifically, when rendering the visual interface, the editor first creates a shape according to the contour shape in the contour features, then places the shape on the canvas according to the contour's position coordinates, and then fills the shape color and adjusts the shape size. After the contours are created, the text boxes are created according to the text box coordinates in the structured data, the recognized text content is placed in each text box, and the text color and size are adjusted. The generated and created content is then saved, producing the initial editable document corresponding to the non-editable graphics-text image.
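As a rough illustration of this rendering order (shapes first, then text), the following sketch draws from the structured-data dictionary shown earlier; matplotlib stands in for the editor's canvas, which is an assumption for illustration, and every shape is drawn as its bounding rectangle for simplicity.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def render(structured_data, width=800, height=600):
    """Draw contour elements first, then text elements, mirroring the editor's order."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.set_xlim(0, width)
    ax.set_ylim(height, 0)  # invert y so coordinates behave like image coordinates
    for element in structured_data.values():
        if "type" in element:  # contour element: drawn here as its bounding rectangle
            (x1, y1), (x2, y2) = element["position"]
            rgb = tuple(c / 255 for c in element["color"])
            ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                           fill=False, edgecolor=rgb))
    for element in structured_data.values():
        if "content" in element:  # text element: place the recognized content
            (x1, y1), _ = element["position"]
            rgb = tuple(c / 255 for c in element["color"])
            ax.text(x1, y1, element["content"], color=rgb, fontsize=element["size"])
    plt.show()
```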
In one embodiment, after the image corresponding to the contour features and the text corresponding to the text features are created at the corresponding positions on the canvas, the method further includes:
adding to, modifying, or deleting from the initial editable document in response to a user's operation.
It can be understood that, after editing in the graphics-text editor is finished, the initial editable document can be saved in a format editable by other graphics-text editors, where it can be further edited (additions, modifications, deletions), achieving compatibility between editors. In one feasible embodiment, the other editor can be Visio.
Figure 7 is a schematic diagram of a graphics-text image. As shown in Figure 7, it includes a CT scan image, arrows, multiple pieces of text content, and the background graphics corresponding to the text content.
Figure 8 is a schematic diagram of the initial structured data corresponding to the graphics-text image of Figure 7. As shown in Figure 8, it includes the graphics information (shapes), the image information (figures), and the text features (texts) of the graphics-text image shown in Figure 7.
The graphics information (shapes) and image information (figures) correspond to the contour features described above. figures corresponds to the CT scan image in Figure 7: figures: [{id: element 21, path: "fig1.jpg", size: (a21, w21), position: ((x41, y41), (x42, y42))}], where path: "fig1.jpg" is the saving path of the image, position: ((x41, y41), (x42, y42)) is the image's relative position within the non-editable graphics-text image, and size: (a21, w21) is the image's size.
此外,shapes中的{id:元素1,type:RightDirectionalConnector,color:(r1,g1,b1),size:(a1,w1),position:((x1,y1),(x2,y2))}中,“type:RightDirectionalConnector”为轮廓的形状,“color:(r1,g1,b1)”为轮廓的颜色,“size:(a1,w1)”为轮廓的大小,“position:((x1,y1),(x2,y2))}”为轮廓的位置。In addition, {id: element 1, type: RightDirectionalConnector, color: (r1, g1, b1), size: (a1, w1), position: ((x1, y1), (x2, y2))} in shapes , "type: RightDirectionalConnector" is the shape of the outline, "color: (r1, g1, b1)" is the color of the outline, "size: (a1, w1)" is the size of the outline, "position: ((x1, y1) , (x2, y2))}” is the position of the outline.
可以理解,shapes中的其他项与前述的id为元素1的项的情况类似,为了简洁,此处不再赘述。It can be understood that other items in shapes are similar to the aforementioned item with the id of element 1. For the sake of brevity, they will not be described again here.
同理,texts中的{id:元素13,content:"图像编码器",color:(r13,g13,b13),size:(a13,w13),position:((x25,y25),(x26,y26))}中,“content:"图像编码器"”为文本的内容,“color:(r13,g13,b13)”为文本的颜色,“size:(a13,w13)”为字体的大小,“position:((x25,y25),(x26,y26))”为文本框的位置坐标。texts中的其他项与前述的id为元素13的项的情况类似,为了简洁,此处不再赘述。Similarly, {id in texts: element 13, content: "Image Encoder", color: (r13, g13, b13), size: (a13, w13), position: ((x25, y25), (x26, y26))}, "content: "Image Encoder"" is the content of the text, "color: (r13, g13, b13)" is the color of the text, "size: (a13, w13)" is the font size, "position: ((x25, y25), (x26, y26))" is the position coordinate of the text box. The other items in texts are similar to the aforementioned item with the id of element 13. For the sake of brevity, they will not be described again here.
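Expressed as a Python literal, the initial structured data of Figure 8 might look like the sketch below. The field names follow the notation above, but every numeric value is a placeholder invented for illustration; the coordinates, sizes, and colors in Figures 8 and 9 themselves are symbolic.

```python
# Placeholder values only; the figures use symbolic coordinates such as (x1, y1).
initial_structured_data = {
    "shapes": [
        {"id": "element 1", "type": "RightDirectionalConnector",
         "color": (0, 0, 0), "size": (120, 40),
         "position": ((100, 200), (220, 240))},
    ],
    "figures": [
        {"id": "element 21", "path": "fig1.jpg",
         "size": (300, 260), "position": ((400, 80), (700, 340))},
    ],
    "texts": [
        {"id": "element 13", "content": "Image Encoder",
         "color": (0, 0, 0), "size": 14,
         "position": ((110, 205), (210, 235))},
    ],
}
```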
Figure 9 is a schematic diagram of the final structured data corresponding to the graphic-text image in Figure 7. As shown in Figure 9, in addition to the graphics information, image information, and text features of Figure 8, it also includes the relationships between pairs of elements. Taking element 1 as an example, its final structured data is: "own attributes: [type: RightDirectionalConnector, color: (r1, g1, b1), size: (a1, w1), position: ((x1, y1), (x2, y2))], relationships: [[combination object: element 2, combination relationship: association], [combination object: element 19, combination relationship: surround], [combination object: element 21, combination relationship: association]]". Compared with Figure 8, the final structured data of Figure 9 additionally records the elements related to element 1 and the relationship between each pair: "relationships: [[combination object: element 2, combination relationship: association], [combination object: element 19, combination relationship: surround], [combination object: element 21, combination relationship: association]]".
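A small helper can illustrate how the pairwise relationships are folded back into the structured data. This is a sketch under assumptions: it reuses the placeholder dictionary from the previous sketch, takes classification results as (id_a, id_b, label) triples, and records each relationship on both elements, which is a simplification for directional relations such as "surround".

```python
def supplement_relationships(data, relations):
    """Add classified pairwise relationships to the initial structured data."""
    elements = {e["id"]: e
                for group in ("shapes", "figures", "texts")
                for e in data.get(group, [])}
    for element in elements.values():
        element.setdefault("relationships", [])
    for id_a, id_b, label in relations:
        # Recorded symmetrically here; directional relations may need one side only.
        elements[id_a]["relationships"].append(
            {"combination object": id_b, "combination relationship": label})
        elements[id_b]["relationships"].append(
            {"combination object": id_a, "combination relationship": label})
    return data  # now the final structured data

final_structured_data = supplement_relationships(
    initial_structured_data,
    [("element 1", "element 21", "association")])  # placeholder relation
```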
Figure 10 is a schematic diagram of the display interface of the graphic-text editor provided by the present invention. As shown in Figure 10, the editor can open and display the editable document corresponding to a non-editable graphic-text image and make simple adjustments to its content, for example adjusting the size, color, and position of fonts in the image, selecting line thickness, adjusting the overall layout of the graphics and text in the editable document, and adding or modifying shapes.
The device provided by the present invention for generating an editable document based on a non-editable graphic-text image is described below. The device described below and the method described above for generating an editable document based on a non-editable graphic-text image may be referred to in correspondence with each other.
As shown in Figure 11, in one embodiment, a device for generating an editable document based on a non-editable graphic-text image is provided. The device may include:
an acquisition module 1110, configured to acquire a non-editable graphic-text image;
a first determination module 1120, configured to extract outline features from the non-editable graphic-text image and extract text features from the non-editable graphic-text image, wherein the outline features include the shape, position, and size of each outline and the color within the outline, and the text features include text box coordinates, text content, text color, and font size;
a first generation module 1130, configured to generate initial structured data according to the outline features and the text features;
a second determination module 1140, configured to determine the relationship between two elements in the non-editable graphic-text image based on a pre-trained element relationship classification model together with the outline features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements;
a supplement module 1150, configured to supplement the initial structured data with the relationships between pairs of elements to obtain final structured data;
a second generation module 1160, configured to generate, based on the final structured data, an editable document corresponding to the non-editable graphic-text image.
With the device provided by the present invention for generating an editable document based on a non-editable graphic-text image, the outline features and text features of the non-editable graphic-text image are determined, from which the elements contained in the image and their attributes can be determined; initial structured data is generated from these elements and attributes; the relationship between each pair of elements is then determined based on the pre-trained element relationship classification model and supplemented into the initial structured data to obtain the final structured data; and an editable document corresponding to the non-editable graphic-text image is generated from the final structured data. Converting a non-editable graphic-text image into an editable document therefore no longer relies on manual work, and the conversion is performed efficiently and accurately.
In one embodiment, the second determination module 1140 includes:
a first determination unit, configured to determine, based on each pre-trained binary classification model together with the outline features and the text features, a classification result for the relationship between every two elements in the non-editable graphic-text image;
a second determination unit, configured to determine the final classification result for the relationship between every two elements in the non-editable graphic-text image based on the classification result with the largest probability value among the determined classification results, as illustrated in the sketch below.
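A minimal sketch of this one-vs-rest selection, assuming scikit-learn-style binary classifiers that expose predict_proba and a NumPy feature vector already built for the element pair; the actual model family and feature encoding are not fixed by the method.

```python
import numpy as np

def classify_pair(pair_features, binary_models, relation_labels):
    """One binary model per relationship type; the type whose model
    assigns the highest positive-class probability is the final result."""
    probs = [model.predict_proba(pair_features.reshape(1, -1))[0, 1]
             for model in binary_models]
    best = int(np.argmax(probs))
    return relation_labels[best], probs[best]
```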
In one embodiment, the second determination module 1140 further includes:
an acquisition unit, configured to acquire the outline features and text features of multiple non-editable graphic-text images;
a third determination unit, configured to determine a data set based on the outline features and/or text features corresponding to every two elements in each non-editable graphic-text image, and to determine the relationship label of every two elements based on their relationship;
a fourth determination unit, configured to divide the samples of the data set into positive samples and negative samples based on the relationship label of every two elements and the corresponding binary classification model;
a training unit, configured to train the corresponding binary classification model on the positive and negative samples to obtain a pre-trained binary classification model, as sketched below.
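The training loop this describes can be sketched as follows. The sketch assumes scikit-learn SVMs as a stand-in binary classifier and assumes pairwise feature vectors have already been concatenated from the two elements' outline and/or text features; any classifier exposing a probability estimate would fit the same pattern.

```python
from sklearn.svm import SVC

def train_relation_models(samples, relation_labels):
    """samples: list of (pair_feature_vector, relation_label).
    For each relationship type, pairs carrying that label are the positives
    and all remaining pairs are the negatives, giving one binary model
    per relationship type."""
    features = [f for f, _ in samples]
    models = {}
    for relation in relation_labels:
        targets = [1 if label == relation else 0 for _, label in samples]
        models[relation] = SVC(probability=True).fit(features, targets)
    return models
```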
In one embodiment, the first determination module 1120 includes:
a fifth determination unit, configured to determine, based on a preset text box detection algorithm, the text boxes included in the non-editable graphic-text image and their coordinates;
a sixth determination unit, configured to determine, based on a preset text recognition algorithm, the text content included in each text box;
a seventh determination unit, configured to determine the text color in each text box according to the coordinates of the text box and the pixel histogram within the text box;
an eighth determination unit, configured to determine the font size within each text box based on the coordinates of the text box, as sketched below.
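The color and font-size steps can be sketched with NumPy. The sketch assumes the detector and recognizer have already produced (box, content) pairs; taking the second-most-frequent pixel value as the text color and the box height as the font size are illustrative simplifications of the histogram-based approach described above, not the method's required heuristics.

```python
import numpy as np

def text_features_from_boxes(image, boxes_with_text):
    """boxes_with_text: [((x1, y1, x2, y2), content), ...] from a text
    detection + recognition pipeline; image is an H x W x 3 array."""
    features = []
    for (x1, y1, x2, y2), content in boxes_with_text:
        pixels = image[y1:y2, x1:x2].reshape(-1, 3)
        values, counts = np.unique(pixels, axis=0, return_counts=True)
        order = np.argsort(counts)[::-1]
        # The most frequent value is usually the background; the next peak
        # of the pixel histogram approximates the text color.
        idx = order[1] if len(order) > 1 else order[0]
        features.append({
            "position": ((x1, y1), (x2, y2)),
            "content": content,
            "color": tuple(int(v) for v in values[idx]),
            "size": y2 - y1,  # box height as a proxy for font size
        })
    return features
```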
In one embodiment, the first determination module 1120 further includes:
a ninth determination unit, configured to determine, based on a preset contour detection algorithm, at least one contour included in the non-editable graphic-text image;
a recognition unit, configured to recognize the shape of each of the at least one contour using a shape recognition model based on a pre-trained residual neural network;
a tenth determination unit, configured to determine the relative size of each shape based on the size of the minimum enclosing rectangle of the contour's shape, and to determine the position of each contour according to the coordinates of a preset position of the contour's shape;
an eleventh determination unit, configured to determine the color of each contour based on the color corresponding to the contour's centroid coordinates, as sketched below.
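A sketch of these steps with OpenCV follows. The Otsu thresholding and the axis-aligned bounding rectangle are assumptions, and classify_shape is a hypothetical callable standing in for the pre-trained residual-network shape recognition model, which is not reproduced here.

```python
import cv2

def outline_features(image, classify_shape):
    """classify_shape: callable mapping a cropped contour region to a shape
    label, standing in for the pre-trained ResNet shape recognition model."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    outlines = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)  # minimum enclosing rectangle
        moments = cv2.moments(contour)
        if moments["m00"] == 0:
            continue
        cx = int(moments["m10"] / moments["m00"])  # centroid coordinates
        cy = int(moments["m01"] / moments["m00"])
        outlines.append({
            "type": classify_shape(image[y:y + h, x:x + w]),
            "size": (w, h),
            "position": ((x, y), (x + w, y + h)),
            "color": tuple(int(v) for v in image[cy, cx]),  # color at centroid
        })
    return outlines
```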
In one embodiment, the ninth determination unit includes:
a twelfth determination unit, configured to determine, based on the preset contour detection algorithm, the set of contours included in the non-editable graphic-text image;
a filtering unit, configured to filter out the contours that coincide with the text boxes according to the coordinates of each contour and of the text boxes;
a thirteenth determination unit, configured to determine the at least one contour from the remaining contours, as sketched below.
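One plausible way to implement the coincidence test is an intersection-over-union check between each contour's bounding box and the detected text boxes; the threshold below is an assumed value, not one specified by the method.

```python
def filter_text_overlaps(contour_boxes, text_boxes, iou_threshold=0.5):
    """Keep only contours whose bounding boxes do not coincide with any
    detected text box; all boxes are (x1, y1, x2, y2) tuples."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union else 0.0

    return [c for c in contour_boxes
            if all(iou(c, t) < iou_threshold for t in text_boxes)]
```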
In one embodiment, the second generation module 1160 includes:
an acquisition and display unit, configured to acquire and display the final structured data;
a generation unit, configured to generate, based on the final structured data, the images corresponding to the outline features and the text corresponding to the text features at the corresponding positions on the canvas, and to determine the initial editable document corresponding to the non-editable graphic-text image.
Figure 12 illustrates the physical structure of an electronic device. As shown in Figure 12, the electronic device may include a processor 1210, a communication interface 1220, a memory 1230, and a communication bus 1240, where the processor 1210, the communication interface 1220, and the memory 1230 communicate with one another through the communication bus 1240. The processor 1210 can call logic instructions in the memory 1230 to execute the method for generating an editable document based on a non-editable graphic-text image, the method including: acquiring a non-editable graphic-text image; extracting outline features from the non-editable graphic-text image; extracting text features from the non-editable graphic-text image, wherein the outline features include the shape, position, and size of each outline and the color within the outline, and the text features include text box coordinates, text content, text color, and font size; generating initial structured data according to the outline features and the text features; determining the relationship between two elements in the non-editable graphic-text image based on a pre-trained element relationship classification model together with the outline features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphic-text image.
In addition, the logic instructions in the memory 1230 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer, enable the computer to execute the method provided by the present invention for generating an editable document based on a non-editable graphic-text image, the method including: acquiring a non-editable graphic-text image; extracting outline features from the non-editable graphic-text image; extracting text features from the non-editable graphic-text image, wherein the outline features include the shape, position, and size of each outline and the color within the outline, and the text features include text box coordinates, text content, text color, and font size; generating initial structured data according to the outline features and the text features; determining the relationship between two elements in the non-editable graphic-text image based on a pre-trained element relationship classification model together with the outline features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphic-text image.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements the method provided by the present invention for generating an editable document based on a non-editable graphic-text image, the method including: acquiring a non-editable graphic-text image; extracting outline features from the non-editable graphic-text image; extracting text features from the non-editable graphic-text image, wherein the outline features include the shape, position, and size of each outline and the color within the outline, and the text features include text box coordinates, text content, text color, and font size; generating initial structured data according to the outline features and the text features; determining the relationship between two elements in the non-editable graphic-text image based on a pre-trained element relationship classification model together with the outline features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphic-text image.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement them without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the part of the above technical solutions that in essence contributes to the prior art can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments or certain parts of the embodiments.
It can be understood that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
It can be understood that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still make various modifications to the foregoing embodiments. The technical solutions described in the embodiments may be modified, or some of the technical features thereof may be equivalently substituted; however, these modifications or substitutions shall not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of each embodiment of the present invention.
Claims (10)
- A method for generating an editable document based on a non-editable graphic-text image, characterized in that the method includes: acquiring a non-editable graphic-text image; extracting outline features from the non-editable graphic-text image; extracting text features from the non-editable graphic-text image, wherein the outline features include the shape, position, and size of each outline and the color within the outline, and the text features include text box coordinates, text content, text color, and font size; generating initial structured data according to the outline features and the text features; determining the relationship between two elements in the non-editable graphic-text image based on a pre-trained element relationship classification model together with the outline features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating, based on the final structured data, an editable document corresponding to the non-editable graphic-text image.
- The method for generating an editable document based on a non-editable graphic-text image according to claim 1, characterized in that the pre-trained element relationship classification model includes multiple pre-trained binary classification models, and correspondingly, determining the relationship between two elements in the non-editable graphic-text image based on the pre-trained element relationship classification model together with the outline features and the text features includes: determining, based on each pre-trained binary classification model together with the outline features and the text features, a classification result for the relationship between every two elements in the non-editable graphic-text image; and determining the final classification result for the relationship between every two elements in the non-editable graphic-text image based on the classification result with the largest probability value among the determined classification results.
- The method for generating an editable document based on a non-editable graphic-text image according to claim 2, characterized in that the process of determining the pre-trained element relationship classification model after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements includes: acquiring the outline features and text features of multiple non-editable graphic-text images; determining a data set based on the outline features and/or text features corresponding to every two elements in each non-editable graphic-text image, and determining the relationship label of every two elements based on their relationship; dividing the samples of the data set into positive samples and negative samples based on the relationship label of every two elements and the corresponding binary classification model; and training the corresponding binary classification model on the positive samples and the negative samples to obtain a pre-trained binary classification model.
- The method for generating an editable document based on a non-editable graphic-text image according to claim 1, characterized in that determining the text features in the non-editable graphic-text image based on text detection and text recognition methods includes: determining, based on a preset text box detection algorithm, the text boxes included in the non-editable graphic-text image and their coordinates; determining, based on a preset text recognition algorithm, the text content included in each text box; determining the text color in each text box according to the coordinates of each text box and the pixel histogram within the text box; and determining the font size within each text box based on the coordinates of the text box.
- The method for generating an editable document based on a non-editable graphic-text image according to claim 4, characterized in that determining the outline features in the non-editable graphic-text image based on contour detection and shape recognition methods includes: determining, based on a preset contour detection algorithm, at least one contour included in the non-editable graphic-text image; recognizing the shape of each of the at least one contour using a shape recognition model based on a pre-trained residual neural network; determining the relative size of each shape based on the size of the minimum enclosing rectangle of the contour's shape, and determining the position of each contour according to the coordinates of a preset position of the contour's shape; and determining the color of each contour based on the color corresponding to the contour's centroid coordinates.
- The method for generating an editable document based on a non-editable graphic-text image according to claim 5, characterized in that determining, based on the preset contour detection algorithm, at least one contour included in the non-editable graphic-text image includes: determining, based on the preset contour detection algorithm, the set of contours included in the non-editable graphic-text image; filtering out the contours that coincide with the text boxes according to the coordinates of each contour and of the text boxes; and determining the at least one contour from the remaining contours.
- The method for generating an editable document based on a non-editable graphic-text image according to claim 1, characterized in that generating, based on the final structured data, an editable document corresponding to the non-editable graphic-text image includes: acquiring and displaying the final structured data; and generating, based on the final structured data, the images corresponding to the outline features and the text corresponding to the text features at the corresponding positions on the canvas, and determining the initial editable document corresponding to the non-editable graphic-text image.
- A device for generating an editable document based on a non-editable graphic-text image, characterized in that the device includes: an acquisition module, configured to acquire a non-editable graphic-text image; a first determination module, configured to extract outline features from the non-editable graphic-text image and extract text features from the non-editable graphic-text image, wherein the outline features include the shape, position, and size of each outline and the color within the outline, and the text features include text box coordinates, text content, text color, and font size; a first generation module, configured to generate initial structured data according to the outline features and the text features; a second determination module, configured to determine the relationship between two elements in the non-editable graphic-text image based on a pre-trained element relationship classification model together with the outline features and the text features, wherein the pre-trained element relationship classification model is determined after training on a data set composed of the outline features and/or text features of two elements and the relationship labels of the two elements; a supplement module, configured to supplement the initial structured data with the relationship between the two elements to obtain final structured data; and a second generation module, configured to generate, based on the final structured data, an editable document corresponding to the non-editable graphic-text image.
- A computer device, including a memory and a processor, characterized in that computer-readable instructions are stored in the memory, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the method for generating an editable document based on a non-editable graphic-text image according to any one of claims 1 to 7.
- A storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to execute the steps of the method for generating an editable document based on a non-editable graphic-text image according to any one of claims 1 to 7.
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211036598.0A (CN115392188A) | 2022-08-23 | 2022-08-23 | Method and device for generating editable document based on non-editable image-text images
CN202211036598.0 | 2022-08-23 | |

Publications (1)

Publication Number | Publication Date
---|---
WO2024041032A1 | 2024-02-29

Family ID: 84121969

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2023/092757 | Method and device for generating editable document based on non-editable graphics-text image | 2022-08-23 | 2023-05-08

Country Status (2)

Country | Link
---|---
CN (1) | CN115392188A (en)
WO (1) | WO2024041032A1 (en)
Families Citing this family (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN115392188A | 2022-08-23 | 2022-11-25 | 杭州未名信科科技有限公司 | Method and device for generating editable document based on non-editable image-text images
US20240169143A1 | 2022-11-18 | 2024-05-23 | Microsoft Technology Licensing, Llc | Method and system of generating an editable document from a non-editable document
CN115640788B | 2022-12-23 | 2023-03-21 | 北京左医科技有限公司 | Method and device for structuring non-editable document
CN117576713B | 2023-10-16 | 2024-09-13 | 国网湖北省电力有限公司经济技术研究院 | Electric network infrastructure archive electronic intelligent identification method and device based on improved LSTM-CTC
Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN108399386A | 2018-02-26 | 2018-08-14 | 阿博茨德(北京)科技有限公司 | Information extracting method in pie chart and device
CN108416377A | 2018-02-26 | 2018-08-17 | 阿博茨德(北京)科技有限公司 | Information extracting method in block diagram and device
US20200104586A1 | 2018-09-28 | 2020-04-02 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for manual editing of character recognition results
CN113221711A | 2021-04-30 | 2021-08-06 | 北京金山数字娱乐科技有限公司 | Information extraction method and device
CN115392188A | 2022-08-23 | 2022-11-25 | 杭州未名信科科技有限公司 | Method and device for generating editable document based on non-editable image-text images
Also Published As

Publication number | Publication date
---|---
CN115392188A | 2022-11-25
Legal Events

Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23856126; Country of ref document: EP; Kind code of ref document: A1