CN118193463A - OFD plate type drift automatic correction method and system based on multi-flexible interpolation description - Google Patents
- Publication number: CN118193463A (application number CN202410600722.4A)
- Authority: CN (China)
- Prior art keywords: text, coordinate, ofd, target, path
- Legal status: Granted (as listed by Google; the status is an assumption, not a legal conclusion)
Classifications
- G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F16/00: Information retrieval; database structures therefor; file system structures therefor > G06F16/10: File systems; file servers)
- G06F40/186: Templates (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F40/00: Handling natural language data > G06F40/10: Text processing > G06F40/166: Editing, e.g. inserting or deleting)
Abstract
The application relates to an OFD plate drift automatic correction method and system based on multi-flexible interpolation description, applied in the technical field of document processing. The method comprises: acquiring a target OFD template and target OFD document data; extracting the text objects, text object coordinates, path objects, and path object coordinates of the target OFD template; and extracting the text data in the target OFD document data. The text data is bound to the text object coordinates to generate text coordinates, which are concatenated to form a text coordinate sequence; the path object coordinates are concatenated to form a path coordinate sequence. A coordinate sequence to be used is generated from the text coordinate sequence and the path coordinate sequence; ideal coordinate values are generated from a preset multi-flexible interpolation description algorithm and the coordinate sequence to be used; and drift correction is performed on the target OFD template according to the ideal coordinate values to generate a correction document. The application improves layout adjustment efficiency.
Description
Technical Field
The application relates to the technical field of document processing, and in particular to an OFD plate type drift automatic correction method and system based on multi-flexible interpolation description.
Background
OFD (Open Fixed-layout Document) is an open, office-oriented document format standard. It offers openness, security, and permission control, and is widely used to generate plate-type documents in scenarios such as bank bills and electronic invoices.
However, because the template design is not precise enough, or because the filled-in data is longer than expected, generated OFD plate-type documents often suffer from drift: page objects deviate from their expected positions, which harms the readability and neatness of the document.
At present, layout is usually corrected manually, i.e., the position or font size of each drifting object is adjusted by hand. This is labor-intensive and time-consuming, prone to omissions and inaccurate corrections, and results in low overall adjustment efficiency.
Disclosure of Invention
In order to improve the adjustment efficiency, the application provides an OFD plate drift automatic correction method and system based on multi-flexible interpolation description.
In a first aspect, the application provides an OFD plate drift automatic correction method based on multi-flexible interpolation description, which adopts the following technical scheme:
An OFD plate drift automatic correction method based on multi-flexible interpolation description comprises the following steps:
acquiring a target OFD template and target OFD document data;
extracting the text objects, text object coordinates, path objects, and path object coordinates of the target OFD template;
extracting the text data in the target OFD document data;
binding the text data to the text objects based on the text object coordinates to generate text coordinates, and concatenating the text coordinates to form a text coordinate sequence;
concatenating the path object coordinates to generate a path coordinate sequence;
generating a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence;
generating ideal coordinate values based on a preset multi-flexible interpolation description algorithm and the coordinate sequence to be used; and
performing drift correction on the target OFD template according to the ideal coordinate values to generate a correction document.
With this technical scheme, no manual intervention is needed: an algorithm automatically calculates the ideal distribution of the text objects in the target OFD template and completes drift correction automatically. This improves the speed and quality of drift correction and keeps the document neat and attractive, thereby achieving efficient and accurate drift correction.
Optionally, extracting the text objects, text object coordinates, path objects, and path object coordinates of the target OFD template includes:
parsing the page object layer of the target OFD template to determine its text objects and path objects;
acquiring the coordinate origin of the target OFD template; and
determining the text object coordinates and the path object coordinates based on the coordinate origin.
Optionally, generating the coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence includes:
acquiring the X coordinate value of each text coordinate;
arranging the text coordinates in ascending order of X coordinate value to generate a text coordinate sequence to be used;
acquiring the X coordinate value of each path coordinate; and
arranging the path coordinates in ascending order of X coordinate value to generate a path coordinate sequence to be used.
Optionally, generating the ideal coordinate values based on the preset multi-flexible interpolation description algorithm and the coordinate sequence to be used includes:
processing the text coordinate sequence to be used and the path coordinate sequence to be used with the preset multi-flexible interpolation description algorithm to generate an objective function;
determining a target adjustment value based on the objective function; and
determining the ideal coordinate values based on the target adjustment value.
Optionally, the method further comprises:
acquiring the distribution characteristics of the target OFD template;
dividing the target OFD template into a plurality of document modules based on the distribution characteristics, and determining the number of document modules; and
taking document modules with the same distribution characteristics as a drift correction group.
Optionally, after the document modules with the same distribution characteristics are taken as a drift correction group, the method further includes:
obtaining drift correction data and a result score for the drift correction group;
judging, based on the result score, whether the drift correction of the correction document is accurate;
if the drift correction of the correction document is accurate, storing the drift correction data; and
if the drift correction of the correction document is inaccurate, performing drift correction again based on the drift correction result.
Optionally, after drift correction is performed on the target OFD template according to the ideal coordinate values to generate the correction document, the method further includes:
monitoring the correction document and judging whether newly added drift content appears;
if newly added drift content appears, acquiring the position distribution characteristics of the location of the newly added drift content;
judging whether the position distribution characteristics are existing distribution characteristics;
if they are existing distribution characteristics, performing drift correction on the newly added drift content based on the stored drift correction data; and
if they are not existing distribution characteristics, performing drift correction based on the newly added drift content.
In a second aspect, the application provides an OFD plate drift automatic correction system based on multi-flexible interpolation description, which adopts the following technical scheme:
An OFD plate drift automatic correction system based on multi-flexible interpolation description, comprising:
a target document acquisition module, configured to acquire a target OFD template and target OFD document data;
an object coordinate extraction module, configured to extract the text objects, text object coordinates, path objects, and path object coordinates of the target OFD template;
a document data extraction module, configured to extract the text data in the target OFD document data;
a text sequence generating module, configured to bind the text data to the text objects based on the text object coordinates, generate text coordinates, and concatenate the text coordinates to form a text coordinate sequence;
a path sequence generating module, configured to concatenate the path object coordinates to generate a path coordinate sequence;
a standby sequence generating module, configured to generate a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence;
an ideal coordinate generation module, configured to generate ideal coordinate values based on a preset multi-flexible interpolation description algorithm and the coordinate sequence to be used; and
a correction document generation module, configured to perform drift correction on the target OFD template according to the ideal coordinate values to generate a correction document.
With this technical scheme, the system likewise requires no manual intervention: the ideal distribution of the text objects in the target OFD template is calculated automatically by the algorithm and drift correction is completed automatically, improving the speed and quality of drift correction and achieving efficient and accurate drift correction.
In a third aspect, the present application provides an electronic device, which adopts the following technical scheme:
An electronic device comprising a processor coupled with a memory;
The processor is configured to execute a computer program stored in the memory, so that the electronic device performs the OFD plate drift automatic correction method based on multi-flexible interpolation description according to any implementation of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
A computer-readable storage medium storing a computer program that can be loaded by a processor to execute the OFD plate drift automatic correction method based on multi-flexible interpolation description according to any implementation of the first aspect.
Drawings
Fig. 1 is a schematic flow chart of an OFD plate drift automatic correction method based on multi-flexible interpolation description according to an embodiment of the present application.
Fig. 2 is a block diagram of an OFD plate drift automatic correction system based on multi-flexible interpolation description according to an embodiment of the present application.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings.
The embodiment of the application provides an OFD plate type drift automatic correction method based on multi-flexible interpolation description, which can be executed by an electronic device. The electronic device may be a server or a terminal device; the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal device may be, but is not limited to, a smartphone, a tablet computer, or a desktop computer.
Fig. 1 is a schematic flow chart of an OFD plate drift automatic correction method based on multi-flexible interpolation description according to an embodiment of the present application.
As shown in fig. 1, the main flow of the method is described as follows (steps S101 to S108):
step S101, a target OFD template and target OFD document data are acquired.
In this embodiment, the target OFD template is the template to be drift-corrected in the current run, for example a company's invoice template. It consists of two parts: static content such as fixed tables and headers, and dynamic content such as variable amounts and customer names. The target OFD document data is business data. For example, if a field in the template is named UserName and the corresponding UserName value in the plate-type document is Zhang Lei, then Zhang Lei is business data, i.e., data to be filled into or substituted in the target OFD template.
And step S102, extracting a text object, text object coordinates, a path object and path object coordinates of the target OFD template.
For step S102, the page object layer of the target OFD template is parsed to determine its text objects and path objects; the coordinate origin of the target OFD template is acquired; and the text object coordinates and path object coordinates are determined based on the coordinate origin.
In this embodiment, the page object layer of the target OFD template is the <Page></Page> element. The page object contains text objects and path objects, and the element attributes of each text object and path object carry the corresponding coordinates. During extraction, the text objects and path objects are extracted first. Text objects are text content such as "Zhang San", "a certain company in City A", or "ten thousand yuan"; each individual text object corresponds to one coordinate (x, y), i.e., one text object coordinate. Each path object corresponds to a line in the target OFD template and likewise to one path object coordinate. When determining the coordinates, the objects are drawn and extracted relative to the zero-point coordinate origin according to constraint information such as font, font size, character spacing, line spacing, and frame position. The coordinate origin is chosen on the display area of the drawing board to be drawn, generally the upper-left corner of the visible area printed on paper. If a bleed area (a margin that guards against large trimming errors) is defined, it must be excluded first, and the upper-left corner of the remaining area is set as the coordinate origin.
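A minimal Python sketch of step S102 follows. It assumes the OFD namespace `http://www.ofdspec.org/2016` and that each `TextObject` and `PathObject` carries a `Boundary` attribute of the form "x y width height"; the function name `extract_objects`, the sample page, and the `bleed` parameter (a uniform bleed margin subtracted from both axes to move the origin) are hypothetical. It reads only each object's `Boundary` and first `TextCode`, not the font or spacing constraints.

```python
import xml.etree.ElementTree as ET

# Assumption: OFD page content files use this namespace URI.
OFD_URI = "http://www.ofdspec.org/2016"
NS = {"ofd": OFD_URI}

SAMPLE = """<ofd:Page xmlns:ofd="http://www.ofdspec.org/2016">
  <ofd:Content>
    <ofd:Layer ID="1">
      <ofd:TextObject ID="2" Boundary="23 15 40 6" Font="3" Size="4">
        <ofd:TextCode X="0" Y="4">Zhang Lei</ofd:TextCode>
      </ofd:TextObject>
      <ofd:PathObject ID="4" Boundary="20 25 170 0.5"/>
    </ofd:Layer>
  </ofd:Content>
</ofd:Page>"""


def _origin_shifted(boundary: str, bleed: float):
    """Take (x, y) from an 'x y w h' Boundary and shift it so the origin sits
    at the upper-left corner after removing a (hypothetical) bleed margin."""
    x, y, _w, _h = (float(v) for v in boundary.split())
    return (x - bleed, y - bleed)


def extract_objects(page_xml: str, bleed: float = 0.0):
    """Return (text_objects, path_objects) found anywhere in the page XML."""
    root = ET.fromstring(page_xml)
    texts, paths = [], []
    for obj in root.iter(f"{{{OFD_URI}}}TextObject"):
        code = obj.find("ofd:TextCode", NS)
        texts.append({
            "id": obj.get("ID"),
            "text": code.text if code is not None else "",
            "xy": _origin_shifted(obj.get("Boundary"), bleed),
        })
    for obj in root.iter(f"{{{OFD_URI}}}PathObject"):
        paths.append({"id": obj.get("ID"),
                      "xy": _origin_shifted(obj.get("Boundary"), bleed)})
    return texts, paths


texts, paths = extract_objects(SAMPLE, bleed=3.0)
print(texts)  # [{'id': '2', 'text': 'Zhang Lei', 'xy': (20.0, 12.0)}]
print(paths)  # [{'id': '4', 'xy': (17.0, 22.0)}]
```

Using `root.iter` rather than a fixed path means the objects are found regardless of how deeply layers are nested.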
Step S103, extracting text data in the target OFD document data.
In this embodiment, the target OFD document data is processed to extract the text data to be used. Because the target OFD document data may contain useless descriptive text, the useful data is extracted in advance to speed up the subsequent binding.
Step S104, binding the text data with the text object based on the text object coordinates, generating text coordinates, and connecting the text coordinates in series to form a text coordinate sequence.
In this embodiment, the text data is bound to the text object coordinates to generate text coordinates. The text data is filled into the corresponding positions in order, and the text content at each text object coordinate is replaced with the corresponding text data. For example, if the text object coordinates are (2, 3) and (3, 3) and the text data is Zhang Lei (张磊, two Chinese characters), the character 张 (Zhang) is filled into the slot at (2, 3) and the character 磊 (Lei) into the slot at (3, 3). In this way, valid content is supplied to the target OFD template so that all text objects in the resulting OFD template are valid text objects, and the coordinates of these valid text objects are concatenated into a text coordinate sequence, which corresponds to the OFD document that is actually used.
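Steps S103 and S104 can be sketched in Python as a field filter followed by a character-to-slot binding. The helper names, the record layout, and the choice to raise an error when the text has more characters than slots (rather than truncating) are all assumptions for illustration.

```python
def extract_text_data(record, template_fields):
    """Keep only the fields the template actually uses (step S103)."""
    return {k: record[k] for k in template_fields if k in record}


def bind_text(slots, text):
    """Bind one character to each (x, y) slot coordinate, in order (step S104).
    `slots` is a hypothetical list of text object coordinates for one field."""
    if len(text) > len(slots):
        # Hypothetical policy: report overflow instead of silently truncating.
        raise ValueError(f"{len(text)} characters but only {len(slots)} slots")
    return list(zip(text, slots))


def text_coordinate_sequence(bound_fields):
    """Concatenate the coordinates of all bound characters into one sequence."""
    return [xy for field in bound_fields for _ch, xy in field]


record = {"UserName": "张磊", "Remark": "internal note"}
data = extract_text_data(record, ["UserName"])
bound = bind_text([(2, 3), (3, 3)], data["UserName"])
print(bound)                              # [('张', (2, 3)), ('磊', (3, 3))]
print(text_coordinate_sequence([bound]))  # [(2, 3), (3, 3)]
```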
Step S105, concatenating the path object coordinates to generate a path coordinate sequence.
In this embodiment, the path objects are bound to their path object coordinates in the same way as the text data, and a path coordinate sequence is then generated.
Step S106, generating a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence.
For step S106, the X coordinate value of each text coordinate is acquired, and the text coordinates are arranged in ascending order of X value to generate a text coordinate sequence to be used; the X coordinate value of each path coordinate is acquired, and the path coordinates are arranged in ascending order of X value to generate a path coordinate sequence to be used.
In this embodiment, the text coordinate sequence and the path coordinate sequence are each sorted in ascending order of X coordinate value, so that the coordinate with the smallest X value comes first and the one with the largest X value comes last. This yields the coordinate sequences to be used, namely the text coordinate sequence to be used and the path coordinate sequence to be used. Taking a text coordinate sequence as an example, for text object 1 (200, 500), text object 2 (100, 600), and text object 3 (150, 550), sorting by X value concatenates text object 2, text object 3, and text object 1 in that order, giving the text coordinate sequence to be used [(100, 600), (150, 550), (200, 500)].
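Step S106 reduces to a sort on the X component, shown here with a hypothetical helper name. The patent does not say how coordinates with equal X values are ordered; because Python's `sorted()` is stable, this sketch keeps them in their original order.

```python
def to_be_used_sequence(coords):
    """Sort (x, y) coordinates in ascending order of X (step S106).
    sorted() is stable, so coordinates with equal X keep their original order."""
    return sorted(coords, key=lambda p: p[0])


text_coords = [(200, 500), (100, 600), (150, 550)]  # text objects 1, 2, 3
print(to_be_used_sequence(text_coords))  # [(100, 600), (150, 550), (200, 500)]
```

The same function applies unchanged to the path coordinate sequence.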
Step S107, generating ideal coordinate values based on a preset multi-flexible interpolation description algorithm and a coordinate sequence to be used.
For step S107, the coordinate sequence to be used is processed with the preset multi-flexible interpolation description algorithm to generate an objective function; a target adjustment value is determined based on the objective function; and the ideal coordinate values are determined based on the target adjustment value.
In this embodiment, a preset multi-flexible interpolation description (MLID) algorithm is applied to the obtained coordinate sequence for data fitting, yielding an interpolation description function y = f(x) that reflects the overall distribution pattern of the text objects. The core formula of the MLID algorithm is

$$y = f(x) = \sum_{i=0}^{n} a_i x^{b_i}$$

where $n$ is the number of interpolation terms, $a_i$ is the interpolation coefficient, and $b_i$ is the base number, which serves as the power of $x$ in the $i$-th term. The value of $n$ is set automatically according to the number of path objects on the page rather than the number of text objects. Path objects are the lines on the page, and their number is identified automatically when the document is parsed; because each path object may affect the plate layout, this number directly determines how many terms are needed to construct an accurate interpolation description function.
The preset multi-flexible interpolation description algorithm determines the parameters $a_i$ and $b_i$ of the interpolation description function y = f(x) by least-squares fitting, so that the objective function

$$Q(a, b) = \sum_{i} \left(y_i - f(x_i)\right)^2$$

reaches its minimum, where $(x_i, y_i)$ are the coordinate points in the coordinate sequence. A heuristic optimization algorithm then searches the search space for the optimal combination of (a, b): a group of initial (a, b) values is randomly generated as the initial population, and the current optimal solution is output after iterative optimization once the termination condition is met.

Next, the obtained interpolation description function y = f(x) is taken as the objective function, and an optimization algorithm solves for the ideal Y coordinate value $\hat{y}_i$ of each text object, with the goal of minimizing the total deviation distance between the text coordinates and the objective function. The ideal Y coordinate value is calculated as

$$\hat{y}_i = f(x_i), \quad i = 1, \dots, m$$

where $m$ is the number of text objects and $x_i$ is the X coordinate value of the $i$-th text object. Finally, a common unconstrained optimization algorithm such as gradient descent or coordinate descent is used to solve the problem. Taking gradient descent as an example, the iterative formula is

$$\hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} - \eta \frac{\partial J}{\partial \hat{y}_i}, \qquad J = \frac{1}{2} \sum_{i} \left(y_i - \hat{y}_i\right)^2$$

where $\eta$ is the learning rate and $J$ is the objective function.
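The fitting and evaluation can be sketched in Python under one simplifying assumption: the exponents are fixed to b_i = i, so f is a polynomial and Q is linear in the a_i, which lets the least-squares problem be solved exactly through the normal equations. With free real-valued b_i this no longer holds and a numerical search is needed. The names `interp` and `fit_poly` are hypothetical.

```python
def interp(x, a, b=None):
    """f(x) = sum_i a_i * x**b_i; if b is omitted, assume b_i = i (a polynomial)."""
    exps = b if b is not None else range(len(a))
    return sum(ai * x ** bi for ai, bi in zip(a, exps))


def fit_poly(xs, ys, n):
    """Least-squares a_0..a_n for f(x) = sum a_i * x**i, minimising
    Q = sum (y_k - f(x_k))**2 via the normal equations (V^T V) a = V^T y.
    Needs at least n + 1 distinct x values, otherwise the system is singular."""
    m = n + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for c in range(col, m):
                A[row][c] -= factor * A[col][c]
            r[row] -= factor * r[col]
    # Back substitution.
    a = [0.0] * m
    for row in range(m - 1, -1, -1):
        a[row] = (r[row] - sum(A[row][c] * a[c] for c in range(row + 1, m))) / A[row][row]
    return a


# Usage: n is taken from the number of path objects (lines) on the page.
path_objects = ["line-1", "line-2"]          # hypothetical parsed lines
xs, ys = [1.0, 2.0, 3.0], [1.5, 3.5, 6.5]    # X-sorted text coordinates
a = fit_poly(xs, ys, n=len(path_objects))    # close to [0.5, 0.5, 0.5] here
ideal_y = [interp(x, a) for x in xs]         # y_hat_i = f(x_i)
```

With three points and n = 2 the quadratic passes through every point, so the ideal values reproduce the inputs; with more text objects than terms, f smooths them instead.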
Specifically, the interpolation coefficients $a_i$ and base numbers $b_i$ are calculated by minimizing the objective function Q(a, b); that is, the parameters of y = f(x) are estimated by the least-squares method so that the sum of squared differences between the actual coordinate points $(x_i, y_i)$ and the points predicted by the interpolation description function is as small as possible. At the code level, a heuristic optimization technique searches the space of possible parameter values for the optimal solution: a set of initial parameter values is generated from the path object coordinates, and the algorithm then adjusts the parameters iteratively until it finds the combination that minimizes the objective function and thus most accurately reflects the distribution of text objects on the OFD page. The entire calculation is executed automatically by the computer, so the interpolation function can adapt to complex page layouts while preserving the structural accuracy of the document. Here a and b denote the collections of the values $a_i$ and $b_i$, the parameters used to construct y = f(x) in the preset multi-flexible interpolation description algorithm; minimizing Q means finding the set of $a_i$ and $b_i$ values that minimizes the sum of squared differences between all observation points $(x_i, y_i)$ and the model predictions $f(x_i)$.

It should be noted that each template page corresponds to one objective function, whose target value (termination threshold) is set between 0.01 and 0.1. A lower value gives a better fit but slows convergence. On a server, 0.01 can be specified directly; on a terminal operated by a person, the value should be set according to the terminal's actual performance.
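The patent does not name the heuristic it uses. The sketch below picks one simple option, an elitist evolutionary search (keep the better half of the population, mutate it to refill the rest), starting from a random initial population; `heuristic_fit`, `q_value`, the search ranges, and the mutation scale are all assumptions. The `tol` argument plays the role of the 0.01 to 0.1 termination threshold.

```python
import random


def q_value(params, xs, ys):
    """Q(a, b) = sum_k (y_k - f(x_k))**2 with params = [(a_i, b_i), ...]."""
    return sum((y - sum(a * x ** b for a, b in params)) ** 2 for x, y in zip(xs, ys))


def heuristic_fit(xs, ys, n, tol=0.05, pop_size=30, max_iter=2000, seed=0):
    """Elitist evolutionary search over (a_i, b_i), i = 0..n.
    Assumes all xs >= 0 so that x ** b stays real for real b >= 0.
    Stops when the best Q drops below `tol` or after `max_iter` rounds."""
    rng = random.Random(seed)

    def random_individual():
        return [(rng.uniform(-1.0, 1.0), rng.uniform(0.0, 3.0)) for _ in range(n + 1)]

    def mutate(ind, scale=0.1):
        # Perturb every coefficient; clamp exponents at 0.
        return [(a + rng.gauss(0.0, scale), max(0.0, b + rng.gauss(0.0, scale)))
                for a, b in ind]

    def score(ind):
        return q_value(ind, xs, ys)

    population = sorted((random_individual() for _ in range(pop_size)), key=score)
    for _ in range(max_iter):
        if score(population[0]) < tol:          # termination condition
            break
        parents = population[: pop_size // 2]   # keep the better half (elitism)
        children = [mutate(rng.choice(parents)) for _ in range(pop_size - len(parents))]
        population = sorted(parents + children, key=score)
    best = population[0]
    return best, score(best)
```

Because the best individual is always carried over, Q never increases between rounds; a server might call `heuristic_fit(xs, ys, n, tol=0.01)`, while a slower terminal could use a looser `tol`.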
Step S108, performing drift correction on the target OFD template according to the ideal coordinate values to generate a correction document.
In this embodiment, the coordinates of each text object in the target OFD template are adjusted according to its obtained ideal coordinate value ŷ_i, the page object layer is then reorganized according to the OFD standard, and the drift-corrected OFD document file is output, i.e., the final correction document is generated.
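Step S108 could be sketched as below: ideal Y values (keyed by a hypothetical object `ID`) are written back into each `TextObject`'s `Boundary`. It assumes the OFD namespace `http://www.ofdspec.org/2016` and a "x y width height" `Boundary` layout, treats the ideal values as relative to a bleed-adjusted origin, adjusts only the y component, and leaves re-packaging of the OFD container out of scope. `apply_ideal_y` is an illustrative name.

```python
import xml.etree.ElementTree as ET

OFD_URI = "http://www.ofdspec.org/2016"  # assumed OFD namespace
ET.register_namespace("ofd", OFD_URI)    # keep the 'ofd:' prefix on output


def apply_ideal_y(page_xml: str, ideal_y_by_id: dict, bleed: float = 0.0) -> str:
    """Rewrite the y component of each TextObject Boundary ('x y w h') with its
    ideal value. Ideal values are relative to the bleed-adjusted origin, so the
    bleed is added back before writing (':g' keeps up to 6 significant digits)."""
    root = ET.fromstring(page_xml)
    for obj in root.iter(f"{{{OFD_URI}}}TextObject"):
        oid = obj.get("ID")
        if oid in ideal_y_by_id:
            x, _old_y, w, h = obj.get("Boundary").split()
            obj.set("Boundary", f"{x} {ideal_y_by_id[oid] + bleed:g} {w} {h}")
    return ET.tostring(root, encoding="unicode")


page = ('<ofd:Page xmlns:ofd="http://www.ofdspec.org/2016">'
        '<ofd:TextObject ID="2" Boundary="23 15 40 6"/></ofd:Page>')
print(apply_ideal_y(page, {"2": 12.5}, bleed=3.0))
```

Objects without an entry in `ideal_y_by_id` are left untouched.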
The procedure is illustrated below with a specific example:
Assume that the coordinates of the following three text objects are extracted and that two lines are taken as path objects, so n = 2 (corresponding to the two lines):
The x-coordinate and y-coordinate of the text object are respectively as follows:
text object 1 (1.0, 1.5);
text object 2 (2.0, 3.5);
text object 3 (3.0, 6.5);
a polynomial interpolation description function is fitted using the preset multi-flexible interpolation description algorithm; in this example, a quadratic function is selected as the interpolation description function:
y=f(x)=a0+a1*x+a2*x^2;
The optimal values of a0, a1, and a2 that minimize the objective function Q(a, b) must be found; the optimization process yields the following parameter values:
a0 (constant term of quadratic function) =0.1;
a1 (first order coefficient) =0.5;
a2 (quadratic coefficient) =0.2;
Thus, the interpolation description function is y=0.1+0.5*x+0.2*x^2.
For text object 1, whose x-coordinate is 1.0, the optimized Y coordinate is calculated as:
ŷ=f(1.0)=0.1+0.5*1.0+0.2*1.0^2=0.8;
The calculated ŷ is the ideal Y coordinate of text object 1 given by the interpolation description function. The ideal values of the other text objects are calculated in the same way, each text object is moved to its calculated ideal Y coordinate, and the corrected OFD document is output.
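The arithmetic of the example can be checked in a few lines of Python using exactly the stated coefficients a0 = 0.1, a1 = 0.5, a2 = 0.2; results are rounded to suppress floating-point noise. These coefficients are taken as given in the example; for reference, the quadratic that passes exactly through the three sample points (1.0, 1.5), (2.0, 3.5), (3.0, 6.5) has a0 = a1 = a2 = 0.5.

```python
a0, a1, a2 = 0.1, 0.5, 0.2                               # values stated in the example
points = {1: (1.0, 1.5), 2: (2.0, 3.5), 3: (3.0, 6.5)}   # text object -> (x, y)


def f(x):
    """Interpolation description function y = a0 + a1*x + a2*x^2."""
    return a0 + a1 * x + a2 * x ** 2


ideal = {obj: round(f(x), 10) for obj, (x, _y) in points.items()}
print(ideal)  # {1: 0.8, 2: 1.9, 3: 3.4}
```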
In this embodiment, the distribution characteristics of the target OFD template are acquired; the target OFD template is divided into a plurality of document modules based on the distribution characteristics, and the number of document modules is determined; and document modules with the same distribution characteristics are taken as a drift correction group.
To make operation more convenient and improve overall drift correction efficiency, the whole target OFD template is divided into several document modules, and modules with the same distribution characteristics form one drift correction group. The groups are ranked by the number of document modules they contain: the more modules a group has, the more frequently it is used. After drift correction is completed, the accurate correction data must be stored so that the same scheme can be reused when the same situation occurs again, which makes subsequent selection more convenient.
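The grouping can be sketched as follows. The patent does not define "distribution characteristics", so each module here carries a hypothetical hashable `feature` key (for example, a table-row layout); `group_modules` is an illustrative name.

```python
from collections import defaultdict


def group_modules(modules):
    """Group modules sharing a (hypothetical) 'feature' key into drift
    correction groups, largest group first (treated as most frequently used)."""
    groups = defaultdict(list)
    for module in modules:
        groups[module["feature"]].append(module["name"])
    # sorted() is stable even with reverse=True, so equal-sized groups keep
    # first-seen order.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


mods = [{"name": "header", "feature": "title-row"},
        {"name": "line1", "feature": "table-row"},
        {"name": "line2", "feature": "table-row"}]
print(group_modules(mods))
# [('table-row', ['line1', 'line2']), ('title-row', ['header'])]
```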
In this embodiment, the drift correction data and result score of each drift correction group are acquired; whether the drift correction of the correction document is accurate is judged based on the result score; if it is accurate, the drift correction data is stored; if it is inaccurate, drift correction is performed again based on the drift correction result.
The drift correction data of accurate corrections is stored so that, when documents with the same distribution characteristics appear later, the successful data can be applied directly, reducing the computer's processing load and improving overall correction efficiency. If the drift correction of the correction document is inaccurate, correction is performed again in the same drift correction manner until the actual requirements are met.
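A sketch of the score-and-store loop, assuming hypothetical callables `correct(group_key, attempt)` and `score(data)`, an accuracy threshold of 0.9, and a cap of five rounds; the patent specifies neither the scoring method, the threshold, nor a round limit.

```python
def correct_with_feedback(group_key, correct, score, store,
                          threshold=0.9, max_rounds=5):
    """Run correction, score it, and store the data once it is accurate;
    otherwise correct again, up to max_rounds attempts (all hypothetical)."""
    for attempt in range(max_rounds):
        data = correct(group_key, attempt)
        if score(data) >= threshold:
            store[group_key] = data   # accurate: keep for reuse
            return data
    return None                       # still inaccurate after max_rounds
```

For instance, with a scorer that returns 0.5, 0.8, 1.1 on successive attempts, the third attempt is accepted and stored under the group key.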
In this embodiment, the correction document is monitored to judge whether newly added drift content appears; if so, the position distribution characteristics of the location of the newly added drift content are acquired, and it is judged whether they are existing distribution characteristics; if they are, drift correction is performed on the newly added drift content based on the stored drift correction data; if not, drift correction is performed based on the newly added drift content.
To improve drift correction efficiency, when a correction document with newly added drift content is corrected, only the newly added part needs to be processed. The position distribution characteristics of the newly added drift content are determined and compared with the existing distribution characteristics. If they match, the drift correction data corresponding to the existing distribution characteristics is applied directly; if they do not, drift correction is performed on the newly added drift content using the drift correction method described above. The specific drift correction method is not repeated here.
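The reuse of stored correction data for newly added drift content can be sketched as a cache lookup keyed by the (hypothetical) position distribution characteristic; `compute` stands in for the full drift correction procedure, and `correct_new_drift` is an illustrative name.

```python
def correct_new_drift(new_items, store, compute):
    """new_items: iterable of (item_id, feature) for newly added drift content.
    Reuse stored correction data for known features; otherwise compute it once,
    store it, and reuse it for later items with the same feature."""
    results = {}
    for item_id, feature in new_items:
        if feature not in store:            # new distribution characteristic
            store[feature] = compute(feature)
        results[item_id] = store[feature]   # existing (or just-added) data
    return results
```

Only the newly drifted items are processed, and each unseen characteristic triggers at most one full correction.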
It should be noted that the fitting interpolation description function algorithm used in the present application may be replaced with the Loess or IBK algorithm according to actual needs. If the Loess algorithm is adopted, a more complex locally weighted regression must be computed on the text object coordinates instead of simply applying a global function. Correspondingly, if a completely different regression method is adopted, new steps may need to be introduced, such as adjusting the weights and selecting the neighborhood size, depending on the specific non-parametric regression algorithm chosen. The replacement algorithm, and any modifications made to accommodate it, can be added and adjusted according to actual requirements and are not specifically limited herein.
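To show what the Loess alternative involves, here is a minimal local-linear LOESS sketch with tricube weights. The span `frac` plays the role of the neighborhood size mentioned above. This is a generic textbook LOESS written for illustration (no robustness iterations), not the patent's multi-flexible interpolation algorithm, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def loess_point(xs: Sequence[float], ys: Sequence[float], x0: float, frac: float = 0.5) -> float:
    """Local linear LOESS estimate at x0 using tricube weights over the k nearest points."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # neighborhood size from the span
    order = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    dmax = abs(xs[order[-1]] - x0) or 1.0  # guard against a zero-width window
    s0 = s1 = s2 = t0 = t1 = 0.0
    for i in order:
        u = abs(xs[i] - x0) / dmax
        w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
        dx = xs[i] - x0
        s0 += w
        s1 += w * dx
        s2 += w * dx * dx
        t0 += w * ys[i]
        t1 += w * dx * ys[i]
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:  # degenerate window: fall back to the weighted mean
        return t0 / s0 if s0 > 0 else float(ys[order[0]])
    return (s2 * t0 - s1 * t1) / det  # intercept of the weighted local line at x0


def loess(xs: Sequence[float], ys: Sequence[float], frac: float = 0.5) -> List[float]:
    """Smooth every point of the sequence."""
    return [loess_point(xs, ys, x, frac) for x in xs]
```

Because each estimate solves its own weighted least-squares problem, the cost is one small regression per coordinate rather than one global fit, which is the extra computation the paragraph refers to.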
Fig. 2 is a block diagram of an OFD plate drift automatic correction system 200 based on a multi-flexible interpolation description according to an embodiment of the present application.
As shown in fig. 2, the OFD plate drift automatic correction system 200 based on the multi-flexible interpolation description mainly includes:
A target document acquisition module 201 for acquiring a target OFD template and target OFD document data;
the object coordinate extraction module 202 is configured to extract a text object, a text object coordinate, a path object and a path object coordinate of the target OFD template;
a document data extraction module 203, configured to extract text data in target OFD document data;
A text sequence generating module 204, configured to bind text data with text objects based on text object coordinates, generate text coordinates, and concatenate the text coordinates to form a text coordinate sequence;
The path sequence generating module 205 is configured to concatenate the path object coordinates to generate a path coordinate sequence;
A standby sequence generating module 206, configured to generate a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence;
An ideal coordinate generating module 207, configured to generate ideal coordinate values based on a preset multi-flexible interpolation description algorithm and a coordinate sequence to be used;
And the correction document generation module 208 is used for carrying out drift correction on the target OFD template according to the ideal coordinate values to generate a correction document.
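Modules 204 and 205 can be pictured with the sketch below. The source does not say how text data are matched to text objects "based on text object coordinates"; here it is assumed, hypothetically, that each data item carries an approximate position and is bound to the nearest template text object, whose coordinates it then takes. `TextObject` and `bind_text` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextObject:
    obj_id: str  # hypothetical identifier of a template text object
    x: float
    y: float


def bind_text(objects: List[TextObject],
              data: List[Tuple[str, float, float]]) -> List[Tuple[float, float, str]]:
    """Bind each (value, x, y) data item to the nearest template text object.

    Returns the text coordinate sequence as (x, y, value) tuples carrying the
    template object's coordinates.
    """
    sequence = []
    for value, qx, qy in data:
        nearest = min(objects, key=lambda o: (o.x - qx) ** 2 + (o.y - qy) ** 2)
        sequence.append((nearest.x, nearest.y, value))
    return sequence
```

The path coordinate sequence of module 205 needs no binding step: it is simply the list of path object coordinates.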
As an optional implementation manner of this embodiment, the object coordinate extraction module 202 is specifically configured to parse a page object layer of the target OFD template, and determine a text object and a path object of the target OFD template; acquiring a coordinate origin of a target OFD template; text object coordinates and path object coordinates are determined based on the origin of coordinates.
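A sketch of this extraction step using the standard library XML parser follows. It assumes the page content layout of the OFD standard (GB/T 33190-2016): `TextObject` and `PathObject` elements inside a `Layer`, a `Boundary` attribute of the form "x y width height", and `TextCode` children whose `X`/`Y` attributes are offsets within that boundary. The namespace URI, the `origin` parameter and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Namespace commonly used in OFD files (assumed here).
NS = {"ofd": "http://www.ofdspec.org/2016"}


def parse_boundary(text: str) -> Tuple[float, float, float, float]:
    x, y, w, h = (float(v) for v in text.split())
    return x, y, w, h


def extract_objects(content_xml: str,
                    origin: Tuple[float, float] = (0.0, 0.0)) -> Dict[str, List[tuple]]:
    """Return text and path objects with coordinates relative to `origin`."""
    root = ET.fromstring(content_xml)
    ox, oy = origin
    texts, paths = [], []
    for layer in root.iter(f"{{{NS['ofd']}}}Layer"):
        for obj in layer:
            tag = obj.tag.split("}")[-1]  # strip the namespace prefix
            bx, by, _, _ = parse_boundary(obj.get("Boundary", "0 0 0 0"))
            if tag == "TextObject":
                code = obj.find("ofd:TextCode", NS)
                # TextCode X/Y are taken as offsets inside the Boundary box.
                cx = float(code.get("X", 0)) if code is not None else 0.0
                cy = float(code.get("Y", 0)) if code is not None else 0.0
                value = (code.text or "") if code is not None else ""
                texts.append((obj.get("ID"), bx + cx - ox, by + cy - oy, value))
            elif tag == "PathObject":
                paths.append((obj.get("ID"), bx - ox, by - oy))
    return {"text": texts, "path": paths}
```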
As an optional implementation manner of this embodiment, the standby sequence generating module 206 is specifically configured to obtain the X coordinate values of the text coordinates; arrange the text coordinates in ascending order of X coordinate value to generate a text coordinate sequence to be used; obtain the X coordinate values of the path coordinates; and arrange the path coordinates in ascending order of X coordinate value to generate a path coordinate sequence to be used.
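The ordering step amounts to two stable sorts on the X value. In this sketch (the function name is hypothetical), coordinates are tuples whose first element is X; any further elements, such as the bound text value, are carried along unchanged.

```python
from typing import List, Sequence, Tuple


def to_be_used_sequences(text_coords: Sequence[tuple],
                         path_coords: Sequence[tuple]) -> Tuple[List[tuple], List[tuple]]:
    """Sort text and path coordinates in ascending order of their X value."""
    # sorted() is stable, so items with equal X keep their original relative order.
    text_sorted = sorted(text_coords, key=lambda c: c[0])
    path_sorted = sorted(path_coords, key=lambda c: c[0])
    return text_sorted, path_sorted
```

Sorting by X is what makes the later interpolation well defined: an interpolant over the path points needs its abscissae in increasing order.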
As an optional implementation manner of this embodiment, the ideal coordinate generating module 207 is specifically configured to process the text coordinate sequence to be used and the path coordinate sequence to be used based on the preset multi-flexible interpolation description algorithm to generate an objective function; determine a target adjustment value based on the objective function; and determine ideal coordinate values based on the target adjustment value.
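The source does not disclose the multi-flexible interpolation description algorithm itself, so the following is only a simplified stand-in showing the shape of the three steps. The path is interpolated piecewise-linearly at each text X, the objective is J(δ) = Σ (y_i + δ − (p(x_i) − gap))², and the target adjustment value is its closed-form minimizer, δ* = mean((p(x_i) − gap) − y_i). The `gap` parameter (desired vertical offset of text from the path) and all function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def interp(px: Sequence[float], py: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation over path points sorted by ascending X (clamped at the ends)."""
    if x <= px[0]:
        return py[0]
    if x >= px[-1]:
        return py[-1]
    j = bisect_left(px, x)  # px[j - 1] < x <= px[j], so px[j] > px[j - 1]
    x0, x1, y0, y1 = px[j - 1], px[j], py[j - 1], py[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def objective(delta: float, text_seq, px, py, gap: float = 0.0) -> float:
    """Sum of squared deviations of shifted text Y values from their path-based targets."""
    return sum((y + delta - (interp(px, py, x) - gap)) ** 2 for x, y in text_seq)


def ideal_coordinates(text_seq: List[Tuple[float, float]],
                      path_seq: List[Tuple[float, float]],
                      gap: float = 0.0) -> Tuple[float, List[Tuple[float, float]]]:
    """Return the target adjustment value and the ideal text coordinates."""
    px = [p[0] for p in path_seq]
    py = [p[1] for p in path_seq]
    residuals = [(interp(px, py, x) - gap) - y for x, y in text_seq]
    delta = sum(residuals) / len(residuals)  # closed-form minimizer of objective()
    return delta, [(x, y + delta) for x, y in text_seq]
```

A single global shift keeps the example short; a more flexible scheme would fit per-segment adjustments, which is closer to what the name of the patented algorithm suggests.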
As an alternative implementation manner of this embodiment, the OFD plate drift automatic correction system 200 based on the multi-flexible interpolation description further includes:
The distribution characteristic acquisition module is used for acquiring the distribution characteristics of the target OFD template;
The document module dividing module is used for dividing the target OFD template into a plurality of document modules based on the distribution characteristics and determining the number of the document modules;
And the correction subgroup dividing module is used for taking the document modules with the same distribution characteristics as a drift correction subgroup.
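Grouping document modules into drift correction subgroups is a group-by on the distribution characteristic. In this sketch, each module is assumed to be given as a `(module_id, feature)` pair with a hashable feature; both the representation and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def group_modules(modules: List[Tuple[str, Hashable]]) -> Dict[Hashable, List[str]]:
    """Put modules sharing a distribution characteristic into one drift correction subgroup."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for module_id, feature in modules:
        groups[feature].append(module_id)
    return dict(groups)
```

The number of document modules is just `len(modules)`, and each subgroup can then be corrected once and its result reused for all of its members.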
As an alternative implementation manner of this embodiment, the OFD plate drift automatic correction system 200 based on the multi-flexible interpolation description further includes:
the data result acquisition module is used for acquiring drift correction data and result scores of the drift correction subgroups;
the correction accuracy judging module is used for judging whether drift correction of the correction document is accurate or not based on the result score;
The correction data storage module is used for storing drift correction data;
and the drift result correction module is used for carrying out drift correction based on the drift correction result.
As an alternative implementation manner of this embodiment, the OFD plate drift automatic correction system 200 based on the multi-flexible interpolation description further includes:
the newly added drift judging module is used for monitoring the correction document and judging whether newly added drift content appears or not;
the position characteristic acquisition module is used for acquiring position distribution characteristics of positions where the newly added drift content is located;
The existing feature judging module is used for judging whether the position distribution feature is an existing distribution feature or not;
the first drift correction module is used for carrying out drift correction on the newly added drift content based on the drift correction data;
And the second drift correction module is used for carrying out drift correction based on the newly added drift content.
In one example, a module in any of the above apparatuses may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
For another example, when a module in an apparatus is implemented by a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of invoking a program. For yet another example, the modules may be integrated together and implemented as a system-on-a-chip (SOC).
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Fig. 3 is a block diagram of an electronic device 300 according to an embodiment of the present application.
As shown in FIG. 3, the electronic device 300 includes a processor 301 and a memory 302, and may further include one or more of an input/output (I/O) interface 303, a communication component 304, and a communication bus 305.
The processor 301 is configured to control the overall operation of the electronic device 300 so as to complete all or part of the steps of the OFD plate drift automatic correction method based on the multi-flexible interpolation description. The memory 302 is used to store various types of data to support operation of the electronic device 300; such data may include, for example, instructions for any application or method running on the electronic device 300, as well as application-related data. The memory 302 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The I/O interface 303 provides an interface between the processor 301 and other interface modules, such as a keyboard, a mouse, or buttons; these buttons may be virtual or physical. The communication component 304 is used for wired or wireless communication between the electronic device 300 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 304 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
The electronic device 300 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the OFD plate drift automatic correction method based on the multi-flexible interpolation description given in the above embodiments.
The communication bus 305 may include a pathway for transferring information between the aforementioned components. The communication bus 305 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on.
The electronic device 300 may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like, and may also be a server, and the like.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the OFD plate drift automatic correction method based on the multi-flexible interpolation description.
The computer-readable storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the application is not limited to the specific combinations of the features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the application, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.
Claims (8)
1. An OFD plate drift automatic correction method based on multi-flexible interpolation description is characterized by comprising the following steps:
Acquiring a target OFD template and target OFD document data;
extracting a text object, a text object coordinate, a path object and a path object coordinate of the target OFD template;
extracting text data in the target OFD document data;
Binding the text data with the text object based on the text object coordinates to generate text coordinates, and connecting the text coordinates in series to form a text coordinate sequence;
connecting the path object coordinates in series to generate a path coordinate sequence;
Generating a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence;
generating ideal coordinate values based on a preset multi-flexible interpolation description algorithm and the coordinate sequence to be used;
and carrying out drift correction on the target OFD template according to the ideal coordinate values to generate a correction document.
2. The method of claim 1, wherein the extracting text object, text object coordinates, path object, and path object coordinates of the target OFD template comprises:
Analyzing a page object layer of the target OFD template, and determining a text object and a path object of the target OFD template;
Acquiring a coordinate origin of the target OFD template;
the text object coordinates and the path object coordinates are determined based on the origin of coordinates.
3. The method of claim 1, wherein the generating a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence comprises:
Acquiring an X coordinate value of the text coordinate;
arranging the text coordinates in ascending order according to the X coordinate values to generate a text coordinate sequence to be used;
Acquiring an X coordinate value of the path coordinate;
and arranging the path coordinates in ascending order according to the X coordinate values to generate a path coordinate sequence to be used.
4. The method of claim 3, wherein the generating ideal coordinate values based on a preset multi-flexible interpolation description algorithm and the coordinate sequence to be used comprises:
calculating the text coordinate sequence to be used and the path coordinate sequence to be used based on the preset multi-flexible interpolation description algorithm to generate an objective function;
Determining a target adjustment value based on the objective function;
and determining ideal coordinate values based on the target adjustment value.
5. The method according to claim 1, wherein the method further comprises:
Acquiring the distribution characteristics of the target OFD template;
dividing the target OFD template into a plurality of document modules based on the distribution characteristics, and determining the number of the document modules;
Document modules with the same distribution characteristics are taken as a drift correction group.
6. The method of claim 5, further comprising, after said grouping document modules having the same distribution characteristics as a drift correction group:
Obtaining drift correction data and a result score for the drift correction group;
Judging whether drift correction of the correction document is accurate or not based on the result score;
if the drift correction of the correction document is accurate, storing the drift correction data;
and if the drift correction of the correction document is inaccurate, carrying out drift correction based on the drift correction result.
7. The method according to claim 6, further comprising, after performing drift correction on the target OFD template according to the ideal coordinate values to generate a correction document:
monitoring the correction document and judging whether newly added drift content appears or not;
if the newly added drifting content appears, acquiring the position distribution characteristics of the position where the newly added drifting content is located;
Judging whether the position distribution characteristic is an existing distribution characteristic or not;
If the distribution characteristics are the existing distribution characteristics, drift correction is carried out on the newly added drift content based on the drift correction data;
And if the distribution characteristic is not the existing distribution characteristic, drift correction is carried out based on the newly added drift content.
8. An OFD plate drift automatic correction system based on multi-flexible interpolation description, comprising:
the target document acquisition module is used for acquiring a target OFD template and target OFD document data;
The object coordinate extraction module is used for extracting a text object, a text object coordinate, a path object and a path object coordinate of the target OFD template;
The document data extraction module is used for extracting text data in the target OFD document data;
the text sequence generation module is used for binding the text data with the text object based on the text object coordinates, generating text coordinates and connecting the text coordinates in series to form a text coordinate sequence;
the path sequence generating module is used for connecting the path object coordinates in series to generate a path coordinate sequence;
a standby sequence generating module, configured to generate a coordinate sequence to be used based on the text coordinate sequence and the path coordinate sequence;
the ideal coordinate generation module is used for generating ideal coordinate values based on a preset multi-flexible interpolation description algorithm and the coordinate sequence to be used;
And the correction document generation module is used for carrying out drift correction on the target OFD template according to the ideal coordinate values to generate a correction document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410600722.4A CN118193463B (en) | 2024-05-15 | 2024-05-15 | OFD format drift automatic correction method and system based on multi-flexible interpolation description |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118193463A true CN118193463A (en) | 2024-06-14 |
CN118193463B CN118193463B (en) | 2024-09-03 |
Family ID: 91405355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410600722.4A Active CN118193463B (en) | 2024-05-15 | 2024-05-15 | OFD format drift automatic correction method and system based on multi-flexible interpolation description |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118193463B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11282959A (en) * | 1998-03-27 | 1999-10-15 | Nec Corp | Character string collation device, its method, storage medium, document classification device, character reader and true/false judgement device |
KR20190001894A (en) * | 2017-06-28 | 2019-01-07 | 주식회사 오리지널메이커스 | Method for creating web documents and Apparatus thereof |
US20210089712A1 (en) * | 2019-09-19 | 2021-03-25 | Palantir Technologies Inc. | Data normalization and extraction system |
CN112800366A (en) * | 2020-12-31 | 2021-05-14 | 北京华宇信息技术有限公司 | OFD document online browsing method |
CN115167718A (en) * | 2022-06-14 | 2022-10-11 | 百望股份有限公司 | OFD document local amplification display method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN118193463B (en) | 2024-09-03 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |