CN112183036B - Format document generation method, device, equipment and storage medium - Google Patents

Format document generation method, device, equipment and storage medium

Info

Publication number
CN112183036B
CN112183036B CN201910527126.7A CN201910527126A CN112183036B CN 112183036 B CN112183036 B CN 112183036B CN 201910527126 A CN201910527126 A CN 201910527126A CN 112183036 B CN112183036 B CN 112183036B
Authority
CN
China
Prior art keywords
filled
information
document
field
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910527126.7A
Other languages
Chinese (zh)
Other versions
CN112183036A (en
Inventor
张祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910527126.7A
Publication of CN112183036A
Application granted
Publication of CN112183036B
Legal status: Active
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a method, a device, equipment and a storage medium for generating a format document, wherein the method comprises the following steps: acquiring an original document, identifying the original document, and determining a field area to be filled in the original document; determining the associated information of the information area to be filled in the field area to be filled in; generating a form to be filled according to the field to be filled; acquiring a filled form corresponding to the form to be filled, and extracting filled information in the filled form; and associating the original document, the associated information and the filled-in information to generate a new document. The method and the device can intelligently extract the content fields needing to be filled in the document, generate the corresponding form, realize automatic summarization and statistics of data, and simultaneously generate the complete format document according to the filled form.

Description

Format document generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a format document generation method, apparatus, device, and storage medium.
Background
A format document is a document whose format is relatively fixed and of which only part of the content needs to be modified for a given scenario or transaction. In daily life and work, a large number of format documents need to be filled in. For example, a company's HR staff fill in labor contracts for newly hired employees, a real-estate agent fills in rental and house purchase contracts with tenants and landlords, and a bank provides loan contracts to borrowers. The format of these documents is fixed, but the personal or organizational information and the data agreed in each contract differ. The usual practice is to print a blank contract and have the different parties fill it in by hand. This process is error-prone, and when staff later compile statistics on the filled-in information, it has to be re-entered from the contract content, so the accuracy and efficiency of information statistics are low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method, an apparatus, a device and a storage medium for generating a format document, which can intelligently extract content fields to be filled in a document, generate a corresponding form, and can realize automatic summary and statistics of data by collecting the filled forms, and simultaneously can generate a complete format document according to the filled forms.
In order to solve the above technical problem, an embodiment of the present invention provides a format document generating method, where the method includes:
acquiring an original document, identifying the original document, and determining a field area to be filled in the original document, wherein the field area to be filled in comprises a field to be filled and an information area to be filled corresponding to the field to be filled in;
determining the associated information of the information area to be filled in the field area to be filled in;
generating a form to be filled according to the field to be filled;
acquiring a filled form corresponding to the form to be filled, and extracting filled information in the filled form;
and associating the original document, the associated information and the filled-in information to generate a new document.
The embodiment of the invention also provides a format document generating device, which comprises:
the to-be-filled area determining module is used for acquiring an original document, identifying the original document and determining a to-be-filled field area in the original document, wherein the to-be-filled field area comprises a to-be-filled field and a to-be-filled information area corresponding to the to-be-filled field;
the associated information determining module is used for determining the associated information of the information area to be filled in the field area to be filled in;
the form generation module is used for generating a form to be filled according to the field to be filled;
the information extraction module is used for acquiring a filled form corresponding to the form to be filled and extracting filled information in the filled form;
and the new document generation module is used for associating the original document, the associated information and the filled information to generate a new document.
An embodiment of the present invention provides an apparatus, where the apparatus includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the above format document generation method.
The embodiment of the present invention further provides a computer storage medium, where at least one instruction, at least one program, a code set, or an instruction set is stored in the storage medium, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the above format document generation method.
The method comprises the steps of identifying an original document, and determining a field area to be filled in the original document; generating a form to be filled corresponding to the original document according to the field to be filled so that a user can fill related information according to the form to be filled; and acquiring the filled-in form, extracting filling information in the form, summarizing and analyzing the extracted filling information, filling the document according to the associated information and the filled-in information, and generating a complete format document. The form to be filled in is generated according to the original document, so that a filler can concentrate attention on a key position, and the filling of the format document is more accurate; the filling information in the filled form can be extracted, so that the filling information can be automatically summarized and analyzed; filling information can be filled in the document, and a complete format document is automatically generated.
Drawings
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present invention;
FIG. 2 is a diagram illustrating a format document generation method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a document identification method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a method for determining associated information according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a method for generating a form to be filled in according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a method for naming a form according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a method for generating a new document according to an embodiment of the present invention;
FIG. 8 is a first exemplary diagram provided by an embodiment of the present invention;
FIG. 9 is a second exemplary diagram provided by an embodiment of the present invention;
FIG. 10 is a third exemplary diagram provided by an embodiment of the present invention;
FIG. 11 is a fourth exemplary diagram provided by an embodiment of the present invention;
FIG. 12 is a fifth exemplary diagram provided by an embodiment of the present invention;
FIG. 13 is a sixth exemplary diagram provided by an embodiment of the present invention;
FIG. 14 is a diagram of a format document generating apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of a module for determining association information according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of a form generation module provided by an embodiment of the present invention;
FIG. 17 is a diagram of a new document generation module provided by an embodiment of the present invention;
fig. 18 is a schematic diagram of a to-be-filled area determining module according to an embodiment of the present invention;
FIG. 19 is a schematic diagram of a form generation module provided by an embodiment of the present invention;
fig. 20 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description of the present application are used for distinguishing similar objects, and are not necessarily used for describing a particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, a schematic diagram of an application environment provided by an embodiment of the present invention is shown, which may include at least one first terminal 110, at least one second terminal 120, and a server 130, where the first terminal 110 and the second terminal 120 are capable of communicating with the server 130 respectively.
The first terminal 110 sends an invitation for information filling, and sends the invitation to the user of the second terminal 120 through the server 130 for information filling; after the user of the second terminal 120 completes the filling of the information, the filled information is uploaded to the server 130, and the server 130 may perform the summarization and analysis of the information and the creation of a new document according to the collected filling information.
The first terminal 110 and the second terminal 120 may include physical devices such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, or a smart wearable device, and may also include software running on those devices, such as instant messaging software, a browser, or an application program capable of browsing and editing electronic documents. The operating systems running on the first terminal 110 and the second terminal 120 in the embodiment of the present invention may include, but are not limited to, Android, iOS, Linux, and Windows.
The server 130 is connected to the first terminal 110 and the second terminal 120 through wired or wireless communication, and the server 130 may include a server operating independently, or a distributed server, or a server cluster composed of multiple servers, where the server may be a cloud server.
In application scenarios where data must be collected and a document generated from it, the prior art generally generates a separate template, reads the data required for the document, and generates a new document. In addition, when data is collected, document information is extracted according to identifiers, so information can be extracted only if the corresponding identifiers have been preset. To collect and analyze data and generate new documents simply and efficiently, the embodiment of the invention provides a format document generation method.
Referring to fig. 2, a schematic diagram of a format document generation method is shown, the method includes:
S210, acquiring an original document, identifying the original document, and determining a field area to be filled in the original document, wherein the field area to be filled in comprises a field to be filled in and an information area to be filled corresponding to the field to be filled in.
The original document obtained here may be an imported existing document or a document created by current editing. In this embodiment, an AI (Artificial Intelligence) intelligent analysis based method may be adopted to identify an original document.
An embodiment of the present invention provides a document identification method, please refer to fig. 3, where the method includes:
S310, scanning the original document by taking a line or paragraph as a unit.
A document consists of a number of lines or paragraphs, so after the document is obtained it is first divided into lines or paragraphs, and scanning is performed line by line or paragraph by paragraph. The division can follow the text conventions of ordinary documents: lines can be taken directly from the lines of the document, and a paragraph can be detected by checking whether its first line is indented; if so, a new paragraph is considered to start at that line. The identified lines or paragraphs are numbered sequentially with line numbers or paragraph numbers. The division may be inaccurate in some cases, and the result can then be corrected by a suitable method to obtain a more accurate division. Any prior-art method capable of correcting line or paragraph division can be applied to this embodiment, and details are not described here.
S320, when information in a preset format is found while scanning the original document, determining the line or paragraph where the information in the preset format is located as the field area to be filled in.
While the document is scanned, when information in a preset format, such as a space, an underline, or a check box, is found, the line or paragraph containing it is preliminarily determined to be a field area to be filled in.
A space is detected by comparing the intervals between adjacent characters: when the interval between two adjacent characters is larger than the regular character spacing of the original document, a space is considered to exist between them. In addition, when no content follows a symbol such as a colon or a comma in the original document, a space can be considered to exist after that symbol.
An underline can be detected directly from whether the document contains one. However, an underline in the original document may already have content written on it, in which case it may only be emphasizing that content and cannot be treated as an underline to be filled in. Only an underline that is blank, or that has a blank area reserved on it, is determined to be an underline to be filled in in this embodiment.
A check box may be a multiple-choice or single-choice box, and may be square or circular. When a check box is found during scanning, it is checked whether the box has already been filled in; if not, it is determined to be information in the preset format of this embodiment.
S330, recording a line number or a paragraph number of the field area to be filled in the original document.
When one or more pieces of information in a preset format are found in the original document, the line number or paragraph number of each is determined and recorded. Taking paragraphs as the unit: assuming that an underline to be filled in exists in the 98th paragraph of the original document and a check box to be filled in exists in the 100th paragraph, the records "paragraph 98 → underline" and "paragraph 100 → check box" may be stored, and paragraphs 98 and 100 are each determined to be field areas to be filled in.
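Steps S310 to S330 can be illustrated with a minimal sketch. It assumes the document has already been split into a list of paragraph strings, and it stands in for the "preset format" checks with three hypothetical regular expressions (a blank underline, an unticked check box, and a colon with nothing after it). The function name find_fill_areas and the patterns are illustrative assumptions, not part of the described method, which may rely on AI-based analysis instead.

```python
import re

# Hypothetical patterns for the preset-format markers named in the text.
BLANK_UNDERLINE = re.compile(r"_{3,}")      # a run of underscores with nothing written on it
EMPTY_CHECKBOX = re.compile(r"[□○☐]")       # an unticked square or circular check box
TRAILING_COLON = re.compile(r"[:：]\s*$")    # no content follows a colon

def find_fill_areas(paragraphs):
    """Return {paragraph_number: [marker kinds]} for paragraphs that look fillable."""
    areas = {}
    for number, text in enumerate(paragraphs, start=1):  # S310: scan paragraph by paragraph
        kinds = []
        if BLANK_UNDERLINE.search(text):
            kinds.append("underline")
        if EMPTY_CHECKBOX.search(text):
            kinds.append("checkbox")
        if TRAILING_COLON.search(text):
            kinds.append("space")
        if kinds:                                        # S320/S330: record the paragraph number
            areas[number] = kinds
    return areas

sample = [
    "House purchase contract",
    "Party A name: ________",
    "Payment method: □ lump sum □ installments",
    "Both parties agree to the following terms.",
]
areas = find_fill_areas(sample)
```

Here `areas` maps paragraph numbers to the kind of marker found, mirroring the "paragraph 98 → underline" record in the example above.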
S220, determining the associated information of the information area to be filled in the field area to be filled in.
So that the relevant information can be filled accurately into the corresponding area when a new document is subsequently generated, the associated information of the field area to be filled in needs to be recorded first. Referring to fig. 4, a method for determining associated information is shown, where the associated information in this embodiment at least includes offset position information and context information, and specifically, the method includes:
S410, for each field to be filled, determining the offset position of the information area to be filled corresponding to the field to be filled relative to the field to be filled.
The offset position may specifically be character interval information between the information area to be filled and the field to be filled, or upper, lower, left, right position information between the information area to be filled and the field to be filled, or the like.
S420, determining the context information of the information area to be filled.
The context information here refers to the content before and the content after the information area to be filled.
In this embodiment, the associated information may include, in addition to the offset position information and context information described above, a mark such as an anchor point inserted in the document. Once the associated information of the area to be filled is accurately determined, the filled-in information can later be placed at exactly the right position in the document according to it, avoiding misplaced or badly aligned entries.
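As a rough illustration of S410 and S420, the sketch below computes a character offset and a few characters of surrounding context for the blank that follows a field. It assumes the blank is a run of underscores within the same paragraph; the function name associated_info, the dictionary keys, and the context_chars parameter are hypothetical.

```python
import re

def associated_info(paragraph, field, context_chars=10):
    """Locate the blank that follows `field` and describe it by offset and context."""
    field_end = paragraph.index(field) + len(field)
    blank = re.search(r"_{3,}", paragraph[field_end:])
    if blank is None:
        return None
    start = field_end + blank.start()
    end = field_end + blank.end()
    return {
        "field": field,
        "offset": blank.start(),  # S410: characters between the field and the blank
        "before": paragraph[max(0, start - context_chars):start],  # S420: preceding context
        "after": paragraph[end:end + context_chars],               # S420: following context
    }

info = associated_info("Party A name: ________ (hereinafter the buyer)", "Party A name")
```

Storing both the offset and the context gives two independent ways to find the same blank again when the copy of the document is filled in later.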
S230, generating a form to be filled according to the field to be filled.
For the person filling in, the content to be filled can be gathered in a dedicated form, so that information can be entered in one place and efficiently, without searching the whole document for the areas to be filled in and without omissions. To this end, an embodiment of the present invention provides a method for generating a to-be-filled form, please refer to fig. 5, where the method includes:
S510, generating a blank form.
When the form is generated according to each document, a blank form is generated firstly, and then the specific information obtained by identification is added into the blank form.
S520, traversing the fields to be filled, sequentially filling the fields to be filled into the blank form, generating a corresponding filling area and a corresponding format check attribute for each field to be filled, and generating the form to be filled.
The identified fields to be filled are added to the blank form, and a corresponding filling area is generated in the form for each of them. Filling areas for fields such as name and mobile phone number can be set as single-line text, and format check attributes can be preset; for example, a regular-expression check can be applied to mobile phone numbers. For selection boxes in the document, corresponding radio buttons or check boxes can be generated according to the field semantics.
S530, acquiring a document identifier of the original document, associating the form to be filled with the document identifier, and establishing a corresponding relation between the form to be filled and the document identifier.
Each document has a corresponding document identifier. After a form to be filled is generated from a document, the form is associated with that document's identifier so that the original document can be found from the form. The correspondence between the form to be filled and the document identifier needs to be stored so that it can be looked up easily.
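Steps S510 to S530 might be sketched as follows. The classes FormItem and Form, the widget names, and the phone-number pattern (an assumed rule of 11 digits starting with 1) are illustrative assumptions; the text only says that a filling area and a format check attribute are generated per field and that the form is linked to the document identifier.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

PHONE_PATTERN = r"1\d{10}"  # assumed check rule: 11 digits starting with 1

@dataclass
class FormItem:
    label: str
    widget: str = "single_line_text"
    pattern: Optional[str] = None  # optional regular-expression format check

    def is_valid(self, value: str) -> bool:
        return self.pattern is None or re.fullmatch(self.pattern, value) is not None

@dataclass
class Form:
    document_id: str                       # S530: link back to the original document
    items: list = field(default_factory=list)

def build_form(document_id, fields):
    form = Form(document_id=document_id)   # S510: start from a blank form
    for label, kind in fields:             # S520: traverse the fields to be filled
        if kind == "checkbox":
            form.items.append(FormItem(label, widget="radio"))
        elif "phone" in label.lower():
            form.items.append(FormItem(label, pattern=PHONE_PATTERN))
        else:
            form.items.append(FormItem(label))
    return form

form = build_form("doc-001", [
    ("Party A name", "underline"),
    ("Mobile phone number", "underline"),
    ("Payment method", "checkbox"),
])
```

Keeping the document identifier on the form itself is one simple way to store the form-to-document correspondence; a separate lookup table would serve equally well.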
In addition, the generated form to be filled can be named according to the fields to be filled. Referring to fig. 6, a form naming method is shown, and the method comprises the following steps:
S610, performing word segmentation on the field to be filled.
For example, the field to be filled "Party A name" can be segmented according to part of speech to obtain "Party A" and "name".
S620, when the segmented field to be filled comprises a subject-type word and an attributive corresponding to the subject-type word, naming the form to be filled according to the attributive.
The attributive of the subject-type word, here "Party A", is extracted and used in the name of the form, finally producing: "XXX house contract Party A fill-in form". Similar subject-type words include contact, employee, and so on; the attributive of such a phrase can likewise be extracted to generate the form name.
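The naming steps S610 and S620 can be approximated with a very small sketch. Real word segmentation and part-of-speech tagging are replaced here by a lookup against a hypothetical vocabulary (SUBJECT_WORDS); the function names and the vocabulary are assumptions made only for illustration.

```python
# Assumed vocabulary of leading words that identify who fills in the form.
SUBJECT_WORDS = ("Party A", "Party B", "Contact", "Employee")

def split_field(field_name):
    """Rough stand-in for word segmentation: peel a known leading word off the field."""
    for word in SUBJECT_WORDS:
        if field_name.lower().startswith(word.lower() + " "):
            return word, field_name[len(word):].strip()
    return None, field_name

def form_name(document_title, field_names):
    """Name the form after the first field that carries a known leading word."""
    for name in field_names:
        modifier, _ = split_field(name)
        if modifier:
            return f"{document_title} {modifier} fill-in form"
    return f"{document_title} fill-in form"

name = form_name("XXX house contract", ["Party A name", "Party A ID card"])
```

With the sample fields, `name` becomes "XXX house contract Party A fill-in form", matching the example in the text.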
S240, acquiring a filled form corresponding to the form to be filled, and extracting filled information in the filled form.
After the form to be filled is generated, it is sent to the filler, and the filled-in form that the filler uploads is collected. The filled-in information is extracted from the filled-in form and then collected and summarized, so that data analysis can be carried out without entering the data again; the extracted filled-in information can also be used to generate a new document.
The generated form to be filled can be reviewed and further modified by the user, and the data from such modifications can also be used as labeled data for machine learning, to train and optimize the model.
S250, associating the original document, the associated information and the filled information to generate a new document.
Specifically, please refer to fig. 7, which shows a method for generating a new document, the method comprising:
S710, determining the target document identifier corresponding to the filled-in form according to the correspondence between the form to be filled and the document identifier and the correspondence between the form to be filled and the filled-in form.
The original document corresponding to the current form is found according to the correspondence between the form and the document identifier of the original document.
S720, copying the original document corresponding to the target document identifier to obtain a copied document.
S730, filling the filled information into the information area to be filled of the copied document according to the offset position of the information area to be filled relative to the field to be filled and the context information of the information area to be filled, and generating the new document.
For each filled-in form, the filled-in information extracted from it is written into the copied document according to the offset position and the context information to form a complete new document. In addition, the name of the new document can be determined from information extracted from the form; for example, the "name" field is located, the corresponding filled-in information, that is, the name entered by the filler, is extracted, and that name is used to name the new document generated from the form.
The form is associated with the ID of the original document, and each form field is associated with the offset position and the preceding and following characters of the corresponding blank area in the document. The information collector invites the filler to fill in the form. After the filler submits the form, the document is copied according to the document ID associated with the form, and the blank areas of the copy are filled according to the associated information of the form fields, completing the creation of the new document. The new document is saved in the information collector's list, and the filler can choose to generate and save a complete document on submission. After the forms are filled in, the filled-in information of all fillers is gathered to generate an online table.
Filling in information according to the offset position and the context information ensures that it is placed accurately at the corresponding position, avoiding entries that are misplaced or that do not meet the format requirements.
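The filling step S730 can be sketched as a context-anchored replacement. The example assumes each blank is a run of underscores and that every entry carries the text immediately before and after the blank; the function name fill_document, the entry keys, and the sample values are hypothetical, and the offset position is omitted for brevity.

```python
import re

def fill_document(original_text, entries):
    """Fill blanks in a copy of the text; each entry has 'before', 'after' and 'value'."""
    new_text = original_text  # S720: strings are immutable, so the original stays untouched
    for entry in entries:
        # S730: find the blank by the context recorded around it.
        pattern = re.compile(
            re.escape(entry["before"]) + r"_{3,}" + re.escape(entry["after"])
        )
        replacement = entry["before"] + entry["value"] + entry["after"]
        # A function is used as the replacement so that backslashes in values stay literal.
        new_text = pattern.sub(lambda match, text=replacement: text, new_text, count=1)
    return new_text

original = "Party A name: ________ (buyer)\nParty A ID card: ________\n"
entries = [
    {"before": "name: ", "after": " (buyer)", "value": "Zhang San"},
    {"before": "ID card: ", "after": "", "value": "ID-0001"},
]
new_document = fill_document(original, entries)
```

Anchoring on the surrounding text rather than on absolute positions keeps later blanks findable even after earlier ones have been replaced by values of a different length.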
It should be noted that the form to be filled corresponding to one original document can be distributed to a plurality of users for filling without mutual interference, so that better privacy is achieved; real-time collaborative editing can be realized in an online document mode; and different new documents can be respectively generated on the basis of the original document according to different filling information.
The format document generation method in this embodiment may further include:
identifying at least one field to be filled in the field area to be filled in.
Specifically, one or more fields to be filled may be obtained by performing semantic recognition on the content of each field area to be filled in. With machine learning and a certain amount of labeled data, the corresponding fields to be filled can be determined.
The field area to be filled in may be a line or a paragraph, and comprises a field to be filled and an information area to be filled corresponding to that field. The field to be filled is a guiding description of the information to be entered, for example words such as name, mobile phone number, identity card, or contact person; the information area to be filled is where the information corresponding to the field is entered. Typically one field to be filled corresponds to one information area to be filled, but one field may also correspond to several.
The method provided by the embodiment can be applied to online documents and to real-time/collaborative editing scenarios. An online document is a document (and a way of organizing documents) that is stored on a cloud server and can be browsed and edited directly through a browser or a specific client. Real-time/collaborative editing means that one or more people can edit the online document directly and simultaneously after opening it, with the edits of each collaborator automatically saved and synchronized to the others. A specific example is described below.
The inviter first imports or creates a format document W containing the content shown in fig. 8. AI analysis such as semantic recognition is performed on the document, and a corresponding form to be filled is generated, as shown in fig. 9. In this XX house purchase contract there are two parties; the name and identity card of Party A need to be filled in, both as single-line text, the payment method is a single-choice drop-down box, and the signature does not need to be filled in. The fields and data check methods can be modified manually while the form is being generated.
After the form is set up, others can be invited to fill it in. The inviter sends the generated form shown in fig. 9 to the Party A user, for example by sending a link or a two-dimensional code. There may be several Party A users, that is, the inviter can send the form to different users at the same time. Each Party A user fills in the received form; the filled-in form is shown in fig. 10. After Party A submits the filled-in form, the background generates a complete document whose content is shown in fig. 11.
After several people have filled in the form, the collector's list contains several documents, each named with the name of the corresponding filler to tell them apart. In addition, the data filled in by the different people can form a summary table; fig. 12 shows such a collection result of form filling information.
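The summary table could be built along these lines. The sketch assumes each submission is a plain dictionary keyed by field label and writes the result as CSV text; the function name summarize and the sample data are illustrative only, and the described system generates an online table rather than a CSV file.

```python
import csv
import io

def summarize(submissions, columns):
    """Collect the filled-in information of all fillers into one CSV table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for submission in submissions:
        # Missing answers become empty cells; fields outside `columns` are left out.
        writer.writerow({column: submission.get(column, "") for column in columns})
    return buffer.getvalue()

submissions = [
    {"Party A name": "Zhang San", "Payment method": "lump sum"},
    {"Party A name": "Li Si", "Payment method": "installments"},
]
table = summarize(submissions, ["Party A name", "Payment method"])
```

Because the rows come straight from the submitted forms, nothing has to be re-entered from the generated contracts.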
The above specific implementation process can be seen in fig. 13, where a collector imports or creates a format document to enable a background to generate a corresponding form, the collector checks the generated form, determines whether secondary modification is needed, and finally sends the determined form to different users for filling; and after the user fills in the form, submitting the filled form, generating a corresponding complete format document by the background according to the submitted information, and performing data statistics.
According to the method, AI intelligent analysis is carried out on the original format document, content points which generally need to be filled and modified are identified, and a corresponding filling form is formed; generating a complete format document according to the contents filled by different users and the original format document; after being filled by a plurality of persons, the filling contents are extracted to generate a statistical form to help a manager to analyze and file.
The method provided by the embodiment can intelligently analyze the document content and automatically generate the form for information collection, and the form content is dynamic, is not limited to specific template documents and specific types of documents, and is suitable for most scenes of various industries; the filling of the standard document (contract) is more accurate, the attention can be focused on the key position, the efficiency can be well improved, the preview and the review can be carried out after the complete document is generated, the transparency of information data is ensured, and the document is automatically filed and stored; the data can be automatically summarized under the condition of filling in by a plurality of people, additional input is not needed, and the production efficiency is greatly improved.
Compared with the prior art, the method provided by the embodiment does not depend on a template generated in advance, provides new data collection capability, and is not limited to extraction and mining of the existing data; the form is clearer and more standard for the writer, and can be written by a plurality of persons at the same time without interference, so that better privacy is achieved; through intelligent analysis information collection point, provide information collection ability to the information that directly gathers the collection, it is simpler, efficient and accurate to information processing.
In addition, the method provided by the embodiment can be combined with authority management, electronic signature, third-party public certificate authority and a third-party public certificate platform, can perform online processing on general contract signing, and can perform third-party trusteeship on generated document and timestamp information to form a legal effect, so that the whole work flow is more complete.
The method provided by the embodiment can be split, and the existing paper documents of the user are scanned or the stock electronic documents are extracted for data statistics and analysis.
For generating the form part, the form part is not limited to a form filled by a single person, and an electronic form can be generated and filled by a person or a plurality of persons to generate a corresponding complete document. Or the existing form is associated with the area in the document, and the data in the form is directly utilized to generate a new document without additionally generating and editing a template.
Correspondingly, referring to fig. 14, the present embodiment further provides a format document generating device, where the device includes:
a to-be-filled region determining module 1410, configured to acquire an original document, identify the original document, and determine a to-be-filled field region in the original document, where the to-be-filled field region includes a to-be-filled field and a to-be-filled information region corresponding to the to-be-filled field.
An associated information determining module 1430, configured to determine the associated information of the information area to be filled in the field area to be filled.
A form generating module 1440, configured to generate a form to be filled according to the field to be filled.
An information extracting module 1450, configured to obtain a filled form corresponding to the to-be-filled form, and extract filled information in the filled form.
A new document generating module 1460, configured to associate the original document, the associated information, and the filled-in information, and generate a new document.
Further, the apparatus may further include:
a field identification module, configured to identify at least one field to be filled in the field area to be filled; specifically, it may perform semantic recognition on the content information in each field area to be filled to obtain one or more fields to be filled.
Referring to fig. 15, the association information determining module 1430 includes:
a first determining module 1510, configured to determine, for each field to be filled in, an offset position of the information area to be filled in corresponding to the field to be filled in with respect to the field to be filled in.
A second determining module 1520, configured to determine context information of the information area to be filled.
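A rough sketch of what the first and second determining modules might record is given below. It assumes that the information area to be filled is marked by a run of three or more underscores and that the context is a fixed-size window of characters on each side; the `Association` class, the `associate` function and the `window` parameter are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass

BLANK = re.compile(r"_{3,}")  # assumed marker for an area to be filled


@dataclass
class Association:
    field: str    # the field to be filled, e.g. "Party A name"
    offset: int   # characters from the end of the field to the blank
    length: int   # length of the blank marker
    before: str   # context immediately preceding the blank
    after: str    # context immediately following the blank


def associate(line: str, field: str, window: int = 10) -> Association:
    """Locate the blank that follows `field` and describe where it sits."""
    field_end = line.index(field) + len(field)
    match = BLANK.search(line, field_end)
    if match is None:
        raise ValueError(f"no blank found after field {field!r}")
    return Association(
        field=field,
        offset=match.start() - field_end,
        length=match.end() - match.start(),
        before=line[max(0, match.start() - window):match.start()],
        after=line[match.end():match.end() + window],
    )


info = associate("Party A name: ______ (buyer)", "Party A name")
```

Storing both the offset and the surrounding context gives two independent ways to find the same blank again when the filled-in value is later written back.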
Referring to fig. 16, the form generating module 1440 includes:
a blank form generating module 1610 configured to generate a blank form.
A field filling module 1620, configured to traverse the fields to be filled, fill them into the blank form in sequence, and generate a corresponding filling area and a corresponding format check attribute for each field to be filled, thereby generating the form to be filled.
A corresponding relationship establishing module 1630, configured to obtain a document identifier of the original document, associate the form to be filled with the document identifier, and establish a corresponding relationship between the form to be filled and the document identifier.
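The three sub-modules of the form generating module can be sketched together as follows. This is a minimal illustration, not the patented implementation: the `Form` and `FormItem` classes, the `CHECK_RULES` keyword table, the in-memory `FORM_TO_DOCUMENT` mapping and the use of regular expressions as format check attributes are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FormItem:
    label: str        # the field to be filled
    input_type: str   # kind of filling area, here always "text"
    check: str        # format check attribute, expressed as a regex


@dataclass
class Form:
    form_id: str
    items: List[FormItem] = field(default_factory=list)


# Hypothetical rules: keyword in the field label -> format check.
CHECK_RULES = {"ID card": r"\d{17}[\dXx]", "phone": r"\d{11}"}

# Correspondence between form identifiers and document identifiers.
FORM_TO_DOCUMENT: Dict[str, str] = {}


def generate_form(fields: List[str], document_id: str) -> Form:
    form = Form(form_id=uuid.uuid4().hex)      # start from a blank form
    for label in fields:                       # traverse fields in order
        check = next(
            (rx for key, rx in CHECK_RULES.items() if key in label),
            r".+",                             # default: any non-empty text
        )
        form.items.append(FormItem(label, "text", check))
    FORM_TO_DOCUMENT[form.form_id] = document_id
    return form


form = generate_form(["Party A name", "ID card number"], document_id="doc-W")
```

Recording the form-to-document correspondence at generation time is what later lets a submitted form be traced back to the original document it was derived from.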
Referring to fig. 17, the new document generation module 1460 includes:
a document identifier associating module 1710, configured to associate a target document identifier corresponding to the filled-in form according to the corresponding relationship between the to-be-filled-in form and the document identifier, and the corresponding relationship between the to-be-filled-in form and the filled-in form.
The copying module 1720 is configured to copy the original document corresponding to the target document identifier to obtain a copied document.
An information filling module 1730, configured to fill the filled-in information into the information area to be filled in of the copied document according to the offset position of the information area to be filled in with respect to the field to be filled in and the context information of the information area to be filled in, so as to generate the new document.
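The copy-and-fill step can be sketched as below. The sketch assumes the document is held as a list of text lines and that each field's associated information has been reduced to a line index, an offset and a blank length; the `Slot` type and `fill_document` function are hypothetical, and the lookup of the target document identifier and the use of context information are omitted for brevity.

```python
from typing import Dict, List, NamedTuple


class Slot(NamedTuple):
    line_index: int   # zero-based line that holds the field
    offset: int       # characters between end of field and start of blank
    length: int       # length of the blank to be replaced


def fill_document(original: List[str],
                  slots: Dict[str, Slot],
                  filled: Dict[str, str]) -> List[str]:
    """Return a filled-in copy; the original document is left untouched."""
    copied = list(original)                    # copy the original document
    for label, value in filled.items():
        slot = slots[label]
        line = copied[slot.line_index]
        # Locate the blank from the field position plus the stored offset.
        start = line.index(label) + len(label) + slot.offset
        copied[slot.line_index] = (
            line[:start] + value + line[start + slot.length:]
        )
    return copied


original = ["Party A name: ______", "ID card number: ______"]
slots = {"Party A name": Slot(0, 2, 6), "ID card number": Slot(1, 2, 6)}
filled = {"Party A name": "Zhang San",
          "ID card number": "110101199001011234"}
new_doc = fill_document(original, slots, filled)
```

Working on a copy means the same original document can be reused for every person who submits the form.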
Referring to fig. 18, the to-be-filled region determining module 1410 includes:
the scanning module 1810 is configured to scan the original document by taking a line or a paragraph as a unit.
A third determining module 1820, configured to determine, when there is information in a preset format in the original document, a line or a paragraph where the information in the preset format is located as the field area to be filled in.
The recording module 1830 is configured to record a line number or a paragraph number of the field area to be filled in the original document.
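The line-by-line scan can be sketched as follows. The text does not enumerate the preset formats here, so the pattern below (a run of three or more underscores, or empty round brackets) is an assumption, as are the names `BLANK_PATTERN` and `find_field_areas`.

```python
import re
from typing import List, Tuple

# Assumed "preset format": underscores or empty brackets that invite
# the reader to fill something in.
BLANK_PATTERN = re.compile(r"_{3,}|\(\s*\)")


def find_field_areas(document: str) -> List[Tuple[int, str]]:
    """Scan the document line by line and return (line_number, text)
    for every line that contains a blank marker. Numbering starts at 1."""
    areas = []
    for line_number, line in enumerate(document.splitlines(), start=1):
        if BLANK_PATTERN.search(line):
            areas.append((line_number, line))
    return areas


contract = (
    "XX House Purchase Contract\n"
    "Party A name: ______\n"
    "This contract has two parties.\n"
    "ID card number: ______\n"
)
areas = find_field_areas(contract)
```

The recorded line numbers are what allow the filled-in values to be placed back at the right position in a copy of the document.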
Referring to fig. 19, the form generating module 1440 further comprises:
a word segmentation module 1910 configured to perform word segmentation on the field to be filled in.
A naming module 1920, configured to name the form to be filled according to the fixed language when the segmented field to be filled includes a subject-type word and the fixed language corresponding to that subject-type word.
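One possible reading of the word segmentation and naming modules is sketched below. It rests on several assumptions: a simple regular-expression tokenizer stands in for a real word segmenter, "fixed language" is interpreted as a fixed phrase such as "Party A" that identifies who fills in the form, and the `FIXED_PHRASES` table and the resulting form name are hypothetical.

```python
import re
from typing import List, Optional

# Assumed lookup: subject-type word -> fixed phrases that may contain it.
FIXED_PHRASES = {"party": ["Party A", "Party B"]}


def segment(text: str) -> List[str]:
    """Tokenizer used here as a stand-in for a real word segmenter."""
    return re.findall(r"\w+", text)


def name_form(field_text: str) -> Optional[str]:
    """Name the form after the fixed phrase found in the field, if any."""
    tokens = [t.lower() for t in segment(field_text)]
    for subject, phrases in FIXED_PHRASES.items():
        if subject not in tokens:
            continue
        for phrase in phrases:
            wanted = [t.lower() for t in segment(phrase)]
            n = len(wanted)
            # Look for the phrase as a consecutive run of tokens.
            if any(tokens[i:i + n] == wanted
                   for i in range(len(tokens) - n + 1)):
                return f"{phrase} form"
    return None
```

For example, `name_form("Name of Party A:")` yields a name based on "Party A", while a field with no subject-type word, such as "Payment method", yields `None` and leaves the form name to be chosen some other way.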
The device provided in the above embodiments can execute the method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the method. For technical details not elaborated in the above embodiments, refer to the method provided in any embodiment of the invention.
The present embodiments also provide a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions that is loaded and executed by a processor to perform any of the methods described above in the present embodiments.
Referring to fig. 20, the apparatus 2000 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 2022 (e.g., one or more processors), a memory 2032, and one or more storage media 2030 (e.g., one or more mass storage devices) storing applications 2042 or data 2044. The memory 2032 and the storage medium 2030 may be transient storage or persistent storage. The program stored in the storage medium 2030 may include one or more modules (not shown in the drawings), each of which may include a series of instruction operations for the apparatus. Further, the central processing unit 2022 may be arranged to communicate with the storage medium 2030 and to execute, on the apparatus 2000, the series of instruction operations in the storage medium 2030. The apparatus 2000 may also include one or more power supplies 2026, one or more wired or wireless network interfaces 2050, one or more input-output interfaces 2058, and/or one or more operating systems 2041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc. Any of the methods described above in this embodiment can be implemented based on the apparatus shown in fig. 20.
Although this specification provides the method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on routine or non-inventive effort. The order of steps recited in the embodiments is only one of many possible orders and is not the only order of execution. When an actual system or terminal product executes the method, the steps may be performed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures.
The configurations shown in the present embodiment are only the partial configurations related to the present application and do not limit the devices to which the present application is applied; a specific device may include more or fewer components than those shown, combine some components, or arrange the components differently. It should be understood that the methods, apparatuses, and the like disclosed in the embodiments may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or unit modules.
Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for generating a formatted document, comprising:
acquiring an original document, identifying the original document, and determining a field area to be filled in the original document based on information in a preset format, wherein the field area to be filled in comprises a field to be filled and an information area to be filled corresponding to the field to be filled in; the information in the preset format indicates a universal document character with filling-in guide intention;
determining the associated information of the information area to be filled in the field area to be filled in;
generating a corresponding form to be filled according to filling object information carried by the field to be filled;
acquiring a filled form corresponding to the form to be filled, and extracting filled information in the filled form;
associating the original document, the associated information and the filled-in information to generate a new document;
generating a corresponding form to be filled according to the filling object information carried by the field to be filled further comprises:
performing word segmentation on the field to be filled;
and when the field to be filled after word segmentation comprises a subject word and a fixed language corresponding to the subject word, naming the form to be filled according to the fixed language, wherein the fixed language indicates the information of the filling object.
2. The method as claimed in claim 1, wherein the step of determining the associated information of the field area to be filled in comprises:
for each field to be filled, determining the offset position of the information area to be filled corresponding to the field to be filled relative to the field to be filled;
and determining the context information of the information area to be filled in.
3. The method as claimed in claim 2, wherein the generating the corresponding form to be filled according to the filling object information carried by the field to be filled comprises:
generating a blank form;
traversing the fields to be filled, sequentially filling the fields to be filled into the blank form, generating a corresponding filling area and a corresponding format check attribute for each field to be filled, and generating the form to be filled;
and acquiring a document identifier of the original document, associating the form to be filled with the document identifier, and establishing a corresponding relation between the form to be filled and the document identifier.
4. The method of claim 3, wherein associating the original document, the associated information, and the filled-in information to generate a new document comprises:
determining a target document identifier corresponding to the filled-in form according to the corresponding relationship between the to-be-filled-in form and the document identifier and the corresponding relationship between the to-be-filled-in form and the filled-in form;
copying an original document corresponding to the target document identification to obtain a copied document;
and filling the filled information into the information area to be filled of the copied document according to the offset position of the information area to be filled relative to the field to be filled and the context information of the information area to be filled, and generating the new document.
5. The method as claimed in claim 1, wherein the obtaining of the original document, the recognition of the original document, and the determining of the field area to be filled in based on the information in the preset format comprises:
scanning the original document by taking a line or a paragraph as a unit;
when the original document is scanned to have the information in the preset format, determining a line or paragraph where the information in the preset format is located as the field area to be filled;
and recording the line number or paragraph number of the field area to be filled in the original document.
6. The method of claim 1, further comprising:
and performing semantic recognition on the content information in each field area to be filled to obtain one or more fields to be filled.
7. A formatted document generation apparatus, comprising:
a to-be-filled area determining module, configured to acquire an original document, identify the original document, and determine a to-be-filled field area in the original document based on information in a preset format, wherein the to-be-filled field area comprises a to-be-filled field and a to-be-filled information area corresponding to the to-be-filled field; the information in the preset format indicates a universal document character with filling-in guide intention;
the associated information determining module is used for determining the associated information of the information area to be filled in the field area to be filled in;
the form generation module is used for generating a corresponding form to be filled according to filling object information carried by the field to be filled;
the information extraction module is used for acquiring a filled form corresponding to the form to be filled and extracting filled information in the filled form;
a new document generation module, configured to associate the original document, the association information, and the filled-in information, and generate a new document;
the form generation module is also used for segmenting the fields to be filled in; and when the field to be filled after word segmentation comprises a subject word and a fixed language corresponding to the subject word, naming the form to be filled according to the fixed language, wherein the fixed language indicates the information of the filling object.
8. An apparatus for formatted document generation, comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and wherein the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the formatted document generation method of any of claims 1 to 6.
9. A computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions that is loaded by a processor and that performs a formatted document generation method according to any one of claims 1 to 6.
CN201910527126.7A 2019-06-18 2019-06-18 Format document generation method, device, equipment and storage medium Active CN112183036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910527126.7A CN112183036B (en) 2019-06-18 2019-06-18 Format document generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112183036A CN112183036A (en) 2021-01-05
CN112183036B true CN112183036B (en) 2022-04-19

Family

ID=73914420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910527126.7A Active CN112183036B (en) 2019-06-18 2019-06-18 Format document generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112183036B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883012B (en) * 2021-02-08 2022-10-28 建信金融科技有限责任公司 Implementation method and device of Domino data table component
CN113296613A (en) * 2021-03-12 2021-08-24 阿里巴巴新加坡控股有限公司 Customs clearance information processing method and device and electronic equipment
CN112800763B (en) * 2021-04-14 2021-08-06 北京金山云网络技术有限公司 Data processing method, medical text data processing method and device and electronic equipment
CN113283224A (en) * 2021-06-09 2021-08-20 京东方科技集团股份有限公司 Form generation method and device, electronic equipment and storage medium
CN113434504B (en) * 2021-06-28 2023-10-24 青岛海尔科技有限公司 Method and device for storing death medical evidence table, storage medium and electronic device
CN113486637A (en) * 2021-07-07 2021-10-08 上海中通吉网络技术有限公司 Intelligent dynamic custom contract generation method and device
CN114936950A (en) * 2022-04-27 2022-08-23 上海乾臻信息科技有限公司 Electronic contract signing method and related device
CN115169302B (en) * 2022-09-08 2022-12-09 天津联想协同科技有限公司 Data collection method and device based on online form document and storage medium
CN116681042B (en) * 2023-08-01 2023-10-10 成都信通信息技术有限公司 Content summary generation method, system and medium based on keyword extraction
CN116663509B (en) * 2023-08-02 2023-09-29 四川享宇科技有限公司 Automatic information acquisition and filling robot for banking complex system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461507A (en) * 2014-11-10 2015-03-25 吴涛军 Organization, presentation and user response of information fragments and multi-information-fragment collaboration
CN105095168A (en) * 2015-07-17 2015-11-25 北京奇虎科技有限公司 Automatic generation method and device for contract files
CN108287927A (en) * 2018-03-05 2018-07-17 北京百度网讯科技有限公司 Method and device for obtaining information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207936B (en) * 2010-03-30 2013-10-23 国际商业机器公司 Method and system for indicating content change of electronic document
US9813670B2 (en) * 2014-08-20 2017-11-07 Liveoak Technologies, Inc. Online conference system with real-time document transaction platform

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461507A (en) * 2014-11-10 2015-03-25 吴涛军 Organization, presentation and user response of information fragments and multi-information-fragment collaboration
CN105095168A (en) * 2015-07-17 2015-11-25 北京奇虎科技有限公司 Automatic generation method and device for contract files
CN108287927A (en) * 2018-03-05 2018-07-17 北京百度网讯科技有限公司 Method and device for obtaining information

Also Published As

Publication number Publication date
CN112183036A (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40037757; Country of ref document: HK)
GR01 Patent grant