CN110688349B - Document sorting method, device, terminal and computer readable storage medium - Google Patents
- Publication number
- CN110688349B (application number CN201910820963.9A)
- Authority
- CN
- China
- Prior art keywords
- document
- content
- keywords
- sorted
- information corresponding
- Legal status
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/113—Details of archiving
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/144—Query formulation
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention discloses a document sorting method, a device, a terminal and a computer readable storage medium. The method comprises the following steps: determining a plurality of content keywords, and acquiring documents to be sorted according to a set target path; scanning the documents to be sorted, and extracting from them the information corresponding to each of the content keywords; and filling the extracted information corresponding to each content keyword into the position matched with that content keyword in a summary document. By implementing the method, documents can be sorted automatically according to rules set by the user, which avoids cumbersome and error-prone manual operation and improves working efficiency.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a document sorting method, device, terminal, and computer readable storage medium.
Background
With the rapid development of computing, electronic documents have gradually replaced traditional paper documents. Enterprises generate large numbers of electronic documents in the course of their work, such as financial documents and personnel documents, and with these electronic documents comes the task of sorting them. When documents need to be sorted, most enterprises do so manually.
At present, sorting documents means manually searching for each document, opening it, extracting its content, and copying and pasting that content into a target table document. This is cumbersome, time-consuming, and labor-intensive, and it is prone to errors, so working efficiency remains low.
Disclosure of Invention
The embodiments of the invention provide a document sorting method, device, terminal, and computer readable storage medium, which sort documents automatically according to rules set by the user, avoiding cumbersome and error-prone manual operation and improving working efficiency.
A first aspect of the embodiments of the invention discloses a document sorting method, which comprises the following steps:
determining a plurality of content keywords, and acquiring documents to be sorted according to a set target path;
scanning the documents to be sorted, and extracting from them the information corresponding to each of the content keywords;
and filling the extracted information corresponding to each content keyword into the position matched with that content keyword in a summary document.
A second aspect of the embodiments of the invention discloses a document sorting device, which comprises:
an acquisition module, configured to determine a plurality of content keywords and acquire documents to be sorted according to a set target path;
an extraction module, configured to scan the documents to be sorted and extract from them the information corresponding to each of the content keywords;
and a filling module, configured to fill the extracted information corresponding to each content keyword into the position matched with that content keyword in the summary document.
A third aspect of the embodiments of the present invention discloses a terminal, comprising a processor and a memory, the processor and the memory being connected to each other, wherein the memory is configured to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of the first aspect.
A fourth aspect of the embodiments of the present invention discloses a computer readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect described above.
In the embodiments of the invention, the terminal determines a plurality of content keywords, acquires documents to be sorted according to a set target path, scans the documents to be sorted, extracts the information corresponding to each content keyword from them, and fills the extracted information into the positions matched with each content keyword in the summary document. By implementing the method, documents can be sorted automatically according to rules set by the user, which avoids cumbersome and error-prone manual operation and improves working efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a document sorting method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another document sorting method according to an embodiment of the present invention;
FIG. 3 shows a sorting interface provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a document sorting device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a schematic flow chart of a document sorting method according to an embodiment of the present invention is shown. The document sorting method described in the present embodiment includes the steps of:
101: determining a plurality of content keywords, and acquiring the documents to be sorted according to the set target path.
The documents to be sorted may include one or more documents, such as intellectual property documents, financial documents, and personnel documents, and may be Word documents, Excel spreadsheets, PPT slides, and so on. The content keywords can be set according to the user's needs and may be, for example, content titles or dates. In patent-related documents, for instance, the application date, issue date, authorization date, application number, applicant, and inventor may be set as content keywords.
Specifically, when a user needs to sort documents such as intellectual property documents, the user sets a plurality of content keywords and the target path of the documents to be sorted. The terminal then obtains a document sorting request from the user, which includes the content keywords and the target path, and acquires the documents to be sorted according to the target path.
For example, as shown in FIG. 3, when a user needs to sort documents such as intellectual property documents, the terminal's display screen shows a sorting interface. The interface includes a parameter setting area, in which the user enters the target path, content keywords, and document keywords of the documents to be sorted, and a status indication area, which displays the progress of the sorting, for example as a percentage. The user enters the target path in the file-name search path input box of the parameter setting area, enters the content keywords in the content keyword input box, and clicks the corresponding "OK" button. The terminal then obtains a document sorting request containing the content keywords and target path entered in the parameter setting area, and acquires the documents to be sorted according to the target path.
It should be noted that the documents to be sorted may all be located under the same path or under different paths. The target path may be a path newly created by the user for the sorting, or the original path of the documents before sorting; in either case, the target path is set and selected by the user.
102: scanning the documents to be sorted, and extracting from them the information corresponding to each of the content keywords.
Specifically, after determining the documents to be sorted according to the target path, the terminal scans them and extracts the information corresponding to each content keyword. To do so, the terminal first obtains the name of each document to be sorted and extracts the information corresponding to the content keywords from that name. The terminal then checks whether any of the content keywords (the target content keywords) still have no extracted information. If such target content keywords exist, the terminal scans the content of the document and extracts the information corresponding to the target content keywords from it.
For example, if the content keyword set by the user is "application date" and the document to be sorted contains the text "application date: 2018.03.30", the terminal scans the document, locates the text following "application date", and extracts "2018.03.30" as the information corresponding to that keyword.
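The two-stage lookup described above (document name first, then the content for any keywords still missing) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes each value appears as "<keyword>: <value>" (with an ASCII or full-width colon) and runs to the end of that line, and the names `extract_value` and `extract_fields` are hypothetical.

```python
import re
from typing import Dict, List, Optional


def extract_value(text: str, keyword: str) -> Optional[str]:
    """Return the text after '<keyword>:' up to the end of that line, or None if absent."""
    # Assumed convention: keyword, optional spaces, ':' or full-width '：', then the value.
    # [ \t]* (not \s*) keeps the match from spilling onto the next line.
    pattern = re.escape(keyword) + r"[ \t]*[:：][ \t]*(.+)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


def extract_fields(doc_name: str, doc_content: str, keywords: List[str]) -> Dict[str, Optional[str]]:
    """Search the document name first; scan the content only for keywords still missing."""
    fields = {keyword: extract_value(doc_name, keyword) for keyword in keywords}
    # Target content keywords: those for which the name yielded no value.
    missing = [keyword for keyword, value in fields.items() if value is None]
    for keyword in missing:
        fields[keyword] = extract_value(doc_content, keyword)
    return fields
```

Checking the name first is cheap, so the full content is only scanned when some keyword is still unresolved; keywords found in neither place stay `None` and can be left blank in the summary.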
103: filling the extracted information corresponding to the content keywords into the positions matched with each content keyword in the summary document.
The summary document is a Word or Excel document used to collect the information of the documents to be sorted.
Specifically, the terminal may obtain a target table in the summary document, determine a target header associated with each content keyword from the headers of the target table, and fill, for each content keyword, the extracted information corresponding to the content keyword into a corresponding position of the target header associated with the content keyword.
For example, when a user sorts intellectual property documents, the content keywords preset by the user may be the file title, application date, application number, applicant, and inventor. Suppose Table 1 is the target table in the summary document, into which the terminal is to fill the information corresponding to the content keywords. Before filling, the terminal obtains Table 1 and determines, from its headers, the target header associated with each content keyword: File title, Filing date (for the application date), Application number, Applicant, and Inventor. Then, for each content keyword, the terminal fills the extracted information into the position under the associated target header. The Issue date and Authorization date columns of Table 1 are not associated with any content keyword and are left empty.
Table 1:
File title | Filing date | Issue date | Authorization date | Application number | Applicant | Inventor |
As another example, when the user sorts personnel files, the content keywords preset by the user may be employee name, date of birth, education, graduating institution, home address, contact information, and relatives' information. Suppose Table 2 is the target table in the summary document. Before filling, the terminal obtains Table 2 and determines, from its headers, the target header associated with each content keyword (here, all seven headers), and then, for each content keyword, fills the extracted information into the position under the associated target header.
Table 2:
Employee name | Date of birth | Education | Graduating institution | Home address | Contact information | Relatives' information |
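A minimal sketch of the header matching in step 103, assuming the target table is represented as a list of header strings and that each content keyword maps to the header with the same text unless an explicit mapping is supplied. `fill_row` and its `header_for_keyword` parameter are hypothetical; headers with no associated keyword, such as Issue date and Authorization date in Table 1, are left empty.

```python
from typing import Dict, List, Optional


def fill_row(
    headers: List[str],
    extracted: Dict[str, Optional[str]],
    header_for_keyword: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Place each extracted value under the header associated with its content keyword."""
    mapping = header_for_keyword or {}
    row = {header: "" for header in headers}  # headers without a keyword stay empty
    for keyword, value in extracted.items():
        # Assumed default: a keyword is associated with the header of the same text.
        target_header = mapping.get(keyword, keyword)
        if target_header in row and value is not None:
            row[target_header] = value
    # Return the cells in the table's column order.
    return [row[header] for header in headers]
```

The explicit mapping covers cases like "application date" going under "Filing date", where the keyword and header wording differ.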
In one implementation, the terminal scans the current document among the documents to be sorted, obtains the information corresponding to each content keyword from it, and stores that information in a cache space. The terminal then fills the cached information into the positions matched with each content keyword in the summary document and determines whether the current document is the last of the documents to be sorted. If it is not, the terminal scans the next document; if it is, scanning ends.
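The one-document-at-a-time loop can be sketched as below, assuming the documents arrive as an iterable of text and that the caller supplies a hypothetical `extract(document, keyword)` function returning one value or `None`. Because the cache is rebuilt for each document and written out before the next one is read, only a single document's values are held at any time.

```python
from typing import Callable, Dict, Iterable, List, Optional


def sort_documents(
    documents: Iterable[str],
    keywords: List[str],
    extract: Callable[[str, str], Optional[str]],
) -> List[List[Optional[str]]]:
    """Scan documents one by one, caching and flushing each document's values in turn."""
    summary_rows: List[List[Optional[str]]] = []
    # The loop ends after the last document, which corresponds to "scanning ends".
    for document in documents:
        # Cache space holding only the current document's values.
        cache: Dict[str, Optional[str]] = {kw: extract(document, kw) for kw in keywords}
        # Flush the cache into the summary before moving on to the next document.
        summary_rows.append([cache[kw] for kw in keywords])
    return summary_rows
```

Passing a generator as `documents` means files can also be opened lazily, one per iteration.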
It should be noted that the target table in the summary document is not limited to a table in an Excel or Word document. The summary document may contain multiple tables, and the target table is set and selected by the user; the embodiments of the present invention impose no limitation in this respect.
In the embodiments of the invention, the terminal determines a plurality of content keywords, acquires documents to be sorted according to a set target path, scans the documents to be sorted, extracts the information corresponding to each content keyword from them, and fills the extracted information into the positions matched with each content keyword in the summary document. By implementing the method, documents can be sorted automatically according to rules set by the user, which avoids cumbersome and error-prone manual operation and improves working efficiency.
Referring to FIG. 2, a schematic flow chart of another document sorting method according to an embodiment of the present invention is shown. The document sorting method described in this embodiment includes the following steps:
201: acquiring the target table in the summary document, and determining each header of the target table as a content keyword.
The target table in the summary document can be set by the user and is not limited to a table in an Excel or Word document.
Specifically, the terminal obtains the target table in the summary document and determines each of its headers as a content keyword. For example, if Table 1 is the target table in the summary document, its headers (File title, Filing date, Issue date, Authorization date, Application number, Applicant, and Inventor) are determined as content keywords.
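As a rough illustration of step 201, the sketch below reads the header row of a target table and returns it as the list of content keywords. For simplicity it assumes the target table has been exported as CSV text; reading an Excel or Word table directly would need a third-party library. `keywords_from_target_table` is a hypothetical name.

```python
import csv
import io
from typing import List


def keywords_from_target_table(csv_text: str) -> List[str]:
    """Return the non-empty cells of the first (header) row as content keywords."""
    reader = csv.reader(io.StringIO(csv_text))
    header_row = next(reader, [])  # an empty table yields no keywords
    return [cell.strip() for cell in header_row if cell.strip()]
```

Deriving the keywords from the table itself keeps the extraction and the filling in step with each other: every keyword has a header by construction.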
202: acquiring the documents to be sorted according to the set target path.
Specifically, the terminal obtains preset document keywords, which include one or more of a document type (txt, xls, xlsx, doc, docx, pptx, etc.), a document name, and a document editing time. The terminal scans all documents under the set target path, screens out the documents that match the document keywords, and determines the matching documents as the documents to be sorted.
For example, as shown in FIG. 3, the user enters the target path of the documents to be sorted in the file-name search path input box, the content keywords in the content keyword input box, and the document keywords in the document keyword input box, and then clicks the "OK" button. The terminal obtains a document sorting request containing the target path, content keywords, and document keywords entered in the parameter setting area, and checks each entry under the target path to determine whether it is a file or a folder. For each folder, the terminal continues searching inside it until no sub-folders remain and only files are left. The files found are then screened against the document keywords, and the files that match are determined as the documents to be sorted.
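The recursive folder search and document-keyword screening of step 202 might look like the following sketch. It assumes the document type is the file extension, that a name keyword matches when it appears as a substring of the file name, and that the document editing time is the file system's modification time; `find_documents_to_sort` and its parameters are hypothetical. `os.walk` descends into every sub-folder, which mirrors searching inside each folder until only files remain.

```python
import os
from datetime import datetime
from typing import List, Optional, Set


def find_documents_to_sort(
    target_path: str,
    types: Optional[Set[str]] = None,
    name_contains: Optional[str] = None,
    edited_after: Optional[datetime] = None,
) -> List[str]:
    """Return paths of all files under target_path that match every given document keyword."""
    matches = []
    # os.walk visits target_path and every sub-folder beneath it.
    for dirpath, _dirnames, filenames in os.walk(target_path):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            extension = os.path.splitext(filename)[1].lstrip(".").lower()
            if types is not None and extension not in types:
                continue  # document type does not match
            if name_contains is not None and name_contains not in filename:
                continue  # document name does not match
            if edited_after is not None:
                edited = datetime.fromtimestamp(os.path.getmtime(full_path))
                if edited <= edited_after:
                    continue  # not edited after the given time
            matches.append(full_path)
    return sorted(matches)
```

Each document keyword is optional, so omitting all of them simply returns every file under the target path.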
203: scanning the documents to be sorted, and extracting from them the information corresponding to each of the content keywords.
Specifically, for the implementation of step 203, reference may be made to the description of step 102 in the above embodiment, which is not repeated here.
204: filling the extracted information corresponding to the content keywords into the positions matched with each content keyword in the summary document.
Specifically, the terminal may obtain a target table in the summary document, determine a target header associated with each content keyword from the headers of the target table, and fill, for each content keyword, the extracted information corresponding to the content keyword into a corresponding position of the target header associated with the content keyword.
In one implementation, the terminal scans the full content of a document to be sorted according to the content keywords set by the user, obtains the information corresponding to those keywords, and stores it in a cache for later use. Rather than caching the information of all documents to be sorted at once, the terminal processes the documents one at a time: for each scanned document, it fills the cached information corresponding to each content keyword into the position under the matching target header in the summary document. For example, if the headers of Table 1 are File title, Filing date, Issue date, Authorization date, Application number, Applicant, and Inventor, the terminal scans one document, extracts the information corresponding to the content keywords, and fills it into the positions under the matching target headers in the summary table, instead of waiting until all documents have been scanned. This prevents the cache from running out of space when there are many files or the files are large. After all documents to be sorted have been searched and matched, the terminal notifies the user that the content search is complete; that is, the task status box in the status indication area shows 100%, as shown in FIG. 3, and the target table in the summary document has been filled.
205: adding an identifier to each document to be sorted, recording the sorting time of the summary document, and periodically acquiring the editing time of the documents under the target path.
The identifier marks the position of a sorted document's information in the summary document. For example, once the information in Table 1 has been filled in, the terminal adds an identifier to each sorted document to indicate where its information is located in the summary document: if the information of sorted document A is filled into the second row of Table 1, the identifier of document A is "second row of Table 1 of the summary document".
Specifically, the terminal may acquire the editing time of the documents under the target path at a fixed time every day, such as 17:00.
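For the fixed daily check, a scheduler only needs the next occurrence of the check time. The sketch below computes it, assuming the 17:00 example as the default; `next_check_time` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta


def next_check_time(now: datetime, check_at: time = time(17, 0)) -> datetime:
    """Return the next moment strictly after `now` at which the daily check should run."""
    candidate = datetime.combine(now.date(), check_at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's check time has passed: use tomorrow's
    return candidate
```

A scheduler loop could then sleep for `(next_check_time(now) - now).total_seconds()` seconds before each check.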
206: when there is a target document whose editing time is later than the sorting time, extracting target information corresponding to the content keywords from the target document.
207: replacing the information at the position corresponding to the identifier of the target document in the summary document with the target information.
Specifically, a sorted document may contain errors, for example in dates recorded in its content, and the user may then modify it. When that happens, the document's editing time (such as its modification date) changes. If the new editing time is later than the sorting time, the information filled into the summary document for that document may be wrong. The terminal therefore periodically acquires the editing time of the documents under the target path and checks whether it is later than the sorting time. When a target document whose editing time is later than the sorting time exists, the terminal extracts the target information corresponding to the content keywords from it and uses that target information to replace the information at the position corresponding to the target document's identifier in the summary document.
In one implementation, when the terminal detects a document under the target path that has no identifier and whose editing time is later than the sorting time, the terminal scans that document, extracts the information corresponding to the content keywords from it, fills the extracted information into the positions matched with each content keyword in the summary document, and adds an identifier to the document.
Because the summary document changes whenever its information is replaced, the recorded sorting time must be kept current. After recording the sorting time, each time the terminal detects a target document whose editing time is later than the sorting time, extracts its target information, and replaces the information at the position corresponding to the document's identifier, it updates the sorting time of the summary document. In other words, the terminal updates the sorting time every time it sorts information into the summary document.
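Steps 205 to 207, together with the handling of documents that have no identifier, can be sketched as follows. This is an illustrative model only: the summary document is represented as a list of rows, each document's identifier is assumed to be its row index, editing times are supplied by the caller as a mapping, and `SummaryDocument`, `refresh_summary`, and the `extract(path, keyword)` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Optional

Row = List[Optional[str]]


@dataclass
class SummaryDocument:
    """Hypothetical in-memory model of the summary document's target table."""
    rows: List[Row] = field(default_factory=list)
    # Identifier of each sorted document: the index of the row holding its information.
    identifiers: Dict[str, int] = field(default_factory=dict)
    sorting_time: datetime = datetime.min


def refresh_summary(
    summary: SummaryDocument,
    edit_times: Dict[str, datetime],
    keywords: List[str],
    extract: Callable[[str, str], Optional[str]],
    now: datetime,
) -> bool:
    """Re-sort documents edited after the sorting time; return True if anything changed."""
    changed = False
    for path, edited in sorted(edit_times.items()):
        if edited <= summary.sorting_time:
            continue  # not edited since the last sort: keep its row as is
        row = [extract(path, keyword) for keyword in keywords]
        if path in summary.identifiers:
            # Step 207: overwrite the row the identifier points to.
            summary.rows[summary.identifiers[path]] = row
        else:
            # No identifier yet: fill a new row and add an identifier for it.
            summary.identifiers[path] = len(summary.rows)
            summary.rows.append(row)
        changed = True
    if changed:
        summary.sorting_time = now  # the summary was re-sorted, so record the new time
    return changed
```

Updating `sorting_time` only after a change means a later check compares against the most recent re-sort rather than the original one.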
In one implementation, after the summary document has been sorted, the terminal may scan the content of the target table. If rows contain the same information under a set header, the terminal merges those rows and updates the corresponding document identifiers. The information under the set header should uniquely identify an item, so that identical values indicate the same item, for example an application number or an ID card number. The set header can be set by the user.
For example, in Table 1, suppose the user sets Application number as the set header. When scanning the target table, the terminal finds that the first and third rows have the same application number, so it merges the information in these two rows and fills the merged information into one of them; the user can choose whether it goes into the first or the third row. Because merging changes the position of a sorted document's information in Table 1, the identifiers must be updated accordingly: if the merged information is filled into the first row, the identifier of the sorted document whose information was in the third row is changed from the third row of Table 1 to the first row of Table 1.
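A minimal sketch of this merge, under several assumptions the text leaves open: rows are lists of cell values, identifiers map each document to a 0-based row index (the text counts rows from 1), the earlier row is kept, empty cells in the kept row are filled from the duplicate, and the duplicate row is then removed. `merge_duplicate_rows` is a hypothetical name. Removing a row shifts every later row up, so all identifiers are remapped, not only the duplicate's.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[str]]


def merge_duplicate_rows(
    rows: List[Row], identifiers: Dict[str, int], key_column: int
) -> Tuple[List[Row], Dict[str, int]]:
    """Merge rows sharing a value in key_column and remap document identifiers."""
    merged: List[Row] = []
    kept_index: Dict[str, int] = {}  # set-header value -> index of the row kept for it
    new_index: Dict[int, int] = {}   # old row index -> row index after merging
    for old_index, row in enumerate(rows):
        key = row[key_column]
        if key and key in kept_index:
            target = kept_index[key]
            # Assumed merge rule: fill empty cells of the kept row from the duplicate.
            merged[target] = [kept if kept else extra for kept, extra in zip(merged[target], row)]
            new_index[old_index] = target
        else:
            if key:
                kept_index[key] = len(merged)  # rows with an empty key are never merged
            new_index[old_index] = len(merged)
            merged.append(list(row))
    # Re-point every identifier at the row that now holds its document's information.
    remapped = {doc: new_index[index] for doc, index in identifiers.items()}
    return merged, remapped
```

Keeping the third row instead, as the text allows, would only change which index is recorded in `kept_index`.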
In the embodiments of the invention, the terminal acquires the target table in the summary document, determines each of its headers as a content keyword, acquires the documents to be sorted according to the set target path, scans them, extracts the information corresponding to each content keyword, and fills that information into the positions matched with each content keyword in the summary document. The terminal further adds an identifier to each document to be sorted, records the sorting time of the summary document, and periodically acquires the editing time of the documents under the target path. When a target document whose editing time is later than the sorting time exists, the terminal extracts the target information corresponding to the content keywords from it and uses that information to replace the information at the position corresponding to the target document's identifier in the summary document. By implementing the method, documents can be sorted automatically according to rules set by the user, which avoids cumbersome and error-prone manual operation and improves working efficiency.
FIG. 4 is a schematic structural diagram of a document sorting device according to an embodiment of the present invention. The document sorting device includes:
an obtaining module 401, configured to determine a plurality of content keywords and acquire the documents to be sorted according to a set target path;
an extracting module 402, configured to scan the documents to be sorted and extract from them the information corresponding to each of the plurality of content keywords;
and a filling module 403, configured to fill the extracted information corresponding to the plurality of content keywords into the positions matched with each content keyword in the summary document.
In one implementation, the extracting module 402 is specifically configured to:
acquiring the names of the documents to be sorted, and respectively extracting information corresponding to the content keywords from the names of the documents to be sorted;
scanning the content of the document to be sorted under the condition that the target content keywords which do not extract the corresponding information exist in the content keywords;
and extracting information corresponding to the target content keywords from the content of the document to be sorted.
In one implementation, the obtaining module 401 is specifically configured to:
acquiring preset document keywords, wherein the document keywords comprise one or more of document types, document names and document editing time;
scanning all documents under the set target path, and screening documents matched with the document keywords from all the documents;
and determining the documents matched with the document keywords as the documents to be sorted.
In one implementation, the filling module 403 is specifically configured to:
acquire a target table in the summary document, and determine, from the headers of the target table, the target header associated with each content keyword;
and, for each content keyword, fill the extracted information corresponding to that keyword into the position under its associated target header.
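One way to associate content keywords with table headers and fill a row is sketched below in Python. The summary table is modelled as a list of rows whose first row holds the headers, and the case-insensitive matching with an optional `aliases` map is an assumption; the patent only states that each keyword is associated with a target header. Both function names are hypothetical.

```python
from typing import Dict, List


def find_target_headers(headers: List[str], keywords: List[str],
                        aliases: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each content keyword to the column index of its associated header.

    A header is associated with a keyword if it equals the keyword or one of
    its aliases, ignoring case and surrounding spaces (assumed rule)."""
    mapping: Dict[str, int] = {}
    for keyword in keywords:
        wanted = {name.lower() for name in [keyword, *aliases.get(keyword, [])]}
        for column, header in enumerate(headers):
            if header.strip().lower() in wanted:
                mapping[keyword] = column
                break  # the first matching header wins
    return mapping


def fill_row(table: List[List[str]], info: Dict[str, str],
             mapping: Dict[str, int]) -> int:
    """Append one row to table, placing each keyword's information under its
    target header; unmatched columns stay empty. Returns the new row index."""
    row = [""] * len(table[0])
    for keyword, column in mapping.items():
        row[column] = info.get(keyword, "")
    table.append(row)
    return len(table) - 1
```

Computing the keyword-to-column mapping once and reusing it for every document keeps the per-document work to a single pass over the keywords.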
In one implementation, the obtaining module 401 is specifically configured to:
acquire a target table in the summary document;
and determine each header of the target table as a content keyword.
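If the summary document's target table is stored as CSV (an assumption; a spreadsheet file would need a spreadsheet library instead), reading its header row as the content keywords can be as short as this Python sketch. `keywords_from_summary` is a hypothetical name.

```python
import csv
import io
from typing import List


def keywords_from_summary(csv_text: str) -> List[str]:
    """Treat every non-empty header of the summary table as a content keyword."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])  # first row = table headers; [] if the table is empty
    return [h.strip() for h in headers if h.strip()]
```

Deriving the keywords from the table itself means the user changes what is extracted simply by editing the summary document's headers.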
In one implementation, the extracting module 402 is specifically configured to scan the current document among the documents to be sorted, acquire the information corresponding to the content keywords from the current document, and store the acquired information in a cache space;
the filling module 403 is specifically configured to fill the information for the current document held in the cache space into the positions in the summary document that match each content keyword, and to determine whether the current document is the last of the documents to be sorted: if not, the next document is scanned; if so, scanning ends.
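The document-by-document loop with a cache space can be sketched as follows in Python. The `extract` callback stands in for the extracting module and is hypothetical; the per-document dictionary plays the role of the cache space, so only one document's information is held at a time.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str, List[str]], Dict[str, str]]


def sort_documents(documents: List[str], keywords: List[str],
                   extract: Extractor) -> List[List[str]]:
    """Scan documents one at a time and return summary rows in keyword order."""
    summary_rows: List[List[str]] = []
    index = 0
    while index < len(documents):          # also handles an empty document list
        current = documents[index]
        cache = extract(current, keywords)  # cache space for the current document
        # Fill the cached information into the position matching each keyword.
        summary_rows.append([cache.get(kw, "") for kw in keywords])
        if index == len(documents) - 1:     # current is the last document
            break                           # scanning ends
        index += 1                          # otherwise scan the next document
    return summary_rows
```

The explicit last-document check mirrors the description; a plain `for` loop over `documents` would behave identically.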
In one implementation, the obtaining module 401 is further configured to add an identifier to each document to be sorted, the identifier marking the position of that document's information in the summary document, to record the sorting time of the summary document, and to periodically acquire the editing time of the documents under the target path;
the extracting module 402 is further configured to extract, when a target document whose editing time is later than the sorting time exists, the target information corresponding to the content keywords from the target document;
the filling module 403 is further configured to replace the information at the position in the summary document corresponding to the identifier of the target document with the target information.
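A minimal in-memory sketch of the identifier-based update follows, in Python. The `SummaryDocument` class is hypothetical: it uses the row index as the identifier and keeps it in a dictionary rather than writing it into the document itself, and it takes the file modification time (`os.path.getmtime`) as the editing time. Both choices are assumptions; scheduling the periodic check is left to the caller.

```python
import os
import time
from typing import Callable, Dict, Iterable, List, Optional


class SummaryDocument:
    """Hypothetical in-memory summary table whose rows are addressed by identifiers."""

    def __init__(self, keywords: List[str]) -> None:
        self.keywords = keywords
        self.rows: List[List[str]] = []
        self.identifiers: Dict[str, int] = {}  # document path -> row index (identifier)
        self.sorting_time = 0.0

    def add(self, path: str, info: Dict[str, str]) -> int:
        """Fill a new row and record the identifier linking the document to it."""
        self.rows.append([info.get(kw, "") for kw in self.keywords])
        self.identifiers[path] = len(self.rows) - 1
        return self.identifiers[path]

    def record_sorting_time(self, now: Optional[float] = None) -> None:
        self.sorting_time = time.time() if now is None else now

    def refresh(self, edit_times: Dict[str, float],
                extract: Callable[[str], Dict[str, str]],
                now: Optional[float] = None) -> List[str]:
        """Re-extract each known document edited after the sorting time and
        replace its row in place; return the paths that were updated."""
        updated: List[str] = []
        for path, edited in edit_times.items():
            row = self.identifiers.get(path)
            if row is not None and edited > self.sorting_time:
                info = extract(path)
                self.rows[row] = [info.get(kw, "") for kw in self.keywords]
                updated.append(path)
        self.record_sorting_time(now)
        return updated


def current_edit_times(paths: Iterable[str]) -> Dict[str, float]:
    """Poll the modification time of each existing path as its editing time."""
    return {p: os.path.getmtime(p) for p in paths if os.path.exists(p)}
```

Replacing a row in place through its identifier avoids rebuilding the whole summary when only a few documents change. Because the new sorting time is recorded at the end of a pass, an edit made while a pass is running could be missed; recording the time before polling would close that gap.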
It can be understood that the functions of the functional modules of the document sorting apparatus described in this embodiment may be implemented according to the method of the embodiment shown in fig. 1 or fig. 2; for the specific implementation process, refer to the description of that method embodiment, which is not repeated here.
In this embodiment of the present invention, the obtaining module 401 determines a plurality of content keywords and obtains the documents to be sorted from a set target path; the extracting module 402 scans the documents to be sorted and extracts the information corresponding to each content keyword from them; and the filling module 403 fills the extracted information into the positions in the summary document that match each content keyword. In this way, documents are sorted automatically according to rules set by the user, which avoids tedious, error-prone manual work and improves working efficiency.
Referring to fig. 5, a schematic structural diagram of a terminal is provided in an embodiment of the present invention. The terminal described in this embodiment includes: a processor 501 and a memory 502. The processor 501 and the memory 502 are connected via a bus.
The processor 501 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may include read-only memory and random access memory, and provides program instructions and data to the processor 501. Part of the memory 502 may also include non-volatile random access memory. When calling the program instructions, the processor 501 is configured to:
determine a plurality of content keywords, and acquire the documents to be sorted from a set target path;
scan the documents to be sorted, and extract the information corresponding to each of the content keywords from them;
and fill the extracted information corresponding to the content keywords into the positions in the summary document that match each content keyword.
In one implementation, the processor 501 is specifically configured to:
acquire the names of the documents to be sorted, and extract the information corresponding to the content keywords from those names;
when, among the content keywords, there is a target content keyword for which no corresponding information has been extracted, scan the content of the document to be sorted;
and extract the information corresponding to the target content keyword from that content.
In one implementation, the processor 501 is specifically configured to:
acquire preset document keywords, the document keywords comprising one or more of document type, document name and document editing time;
scan all documents under the set target path, and select the documents that match the document keywords;
and determine the documents that match the document keywords as the documents to be sorted.
In one implementation, the processor 501 is specifically configured to:
acquire a target table in the summary document, and determine, from the headers of the target table, the target header associated with each content keyword;
and, for each content keyword, fill the extracted information corresponding to that keyword into the position under its associated target header.
In one implementation, the processor 501 is specifically configured to:
acquire a target table in the summary document;
and determine each header of the target table as a content keyword.
In one implementation, the processor 501 is specifically configured to:
scan the current document among the documents to be sorted, and acquire the information corresponding to the content keywords from it;
store the acquired information corresponding to the content keywords in a cache space;
fill the information for the current document held in the cache space into the positions in the summary document that match each content keyword;
determine whether the current document is the last of the documents to be sorted; if not, scan the next document; if so, end the scanning.
In one implementation, the processor 501 is further configured to:
add an identifier to each document to be sorted, the identifier marking the position of that document's information in the summary document;
record the sorting time of the summary document, and periodically acquire the editing time of the documents under the target path;
when a target document whose editing time is later than the sorting time exists, extract the target information corresponding to the content keywords from it;
and replace the information at the position in the summary document corresponding to the identifier of the target document with the target information.
In a specific implementation, the processor 501 and the memory 502 described in this embodiment may carry out the implementations described for the document sorting method shown in fig. 1 or fig. 2, or the implementation of the document sorting apparatus shown in fig. 4; details are not repeated here.
In this embodiment of the present invention, the processor 501 may determine a plurality of content keywords, obtain the documents to be sorted from a set target path, scan them, extract the information corresponding to each content keyword, and fill the extracted information into the positions in the summary document that match each content keyword. In this way, documents are sorted automatically according to rules set by the user, which avoids tedious, error-prone manual work and improves working efficiency.
An embodiment of the invention further provides a computer storage medium storing program instructions which, when executed, may perform some or all of the steps of the document sorting method in the embodiment corresponding to fig. 1 or fig. 2.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The methods, apparatuses, terminals and computer readable storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used to explain the principles and implementations of the invention, and the description of the embodiments is intended only to help understand the methods and core ideas of the invention. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application based on the ideas of the invention. In summary, this description should not be construed as limiting the present invention.
Claims (8)
1. A document sorting method, characterized by comprising:
determining a plurality of content keywords, and acquiring the documents to be sorted from a set target path, wherein the content keywords comprise one or more of a document content title and a document date; acquiring the documents to be sorted from the set target path comprises: acquiring preset document keywords, wherein the document keywords comprise one or more of document type, document name and document editing time; scanning all documents under the set target path, and selecting the documents that match the document keywords; and determining the documents that match the document keywords as the documents to be sorted;
scanning the documents to be sorted, and extracting the information corresponding to each of the content keywords from them, which comprises: acquiring the names of the documents to be sorted, and extracting the information corresponding to the content keywords from those names; when, among the content keywords, there is a target content keyword for which no corresponding information has been extracted, scanning the content of the document to be sorted; and extracting the information corresponding to the target content keyword from that content;
and filling the extracted information corresponding to the content keywords into the positions in the summary document that match each content keyword.
2. The method according to claim 1, wherein filling the extracted information corresponding to the plurality of content keywords into the positions in the summary document that match each content keyword comprises:
acquiring a target table in the summary document, and determining, from the headers of the target table, the target header associated with each content keyword;
and, for each content keyword, filling the extracted information corresponding to that keyword into the position under its associated target header.
3. The method of claim 2, wherein the determining a plurality of content keywords comprises:
acquiring a target table in the summary document;
and determining each header of the target table as a content keyword.
4. The method according to claim 1, wherein scanning the documents to be sorted and extracting the information corresponding to the plurality of content keywords from them comprises:
scanning the current document among the documents to be sorted, and acquiring the information corresponding to the content keywords from the current document;
storing the acquired information corresponding to the content keywords in a cache space;
the filling of the extracted information corresponding to the content keywords into the positions in the summary document that match each content keyword comprises:
filling the information corresponding to the content keywords of the current document held in the cache space into the positions in the summary document that match each content keyword;
after filling the extracted information into the positions in the summary document that match each content keyword, the method further comprises:
determining whether the current document is the last of the documents to be sorted; if not, scanning the next document after the current document; if so, ending the scanning.
5. The method according to claim 1, wherein after filling the extracted information corresponding to the plurality of content keywords into the positions in the summary document that match each content keyword, the method further comprises:
adding an identifier to each document to be sorted, wherein the identifier is used to mark the position of that document's information in the summary document;
recording the sorting time of the summary document, and periodically acquiring the editing time of the documents under the target path;
when a target document whose editing time is later than the sorting time exists, extracting target information corresponding to the content keywords from the target document;
and replacing the information at the position in the summary document corresponding to the identifier of the target document with the target information.
6. A document sorting apparatus, the apparatus comprising:
the acquisition module, configured to determine a plurality of content keywords and acquire the documents to be sorted from a set target path, wherein the content keywords comprise one or more of a document content title and a document content date; acquiring the documents to be sorted from the set target path comprises: acquiring preset document keywords, wherein the document keywords comprise one or more of document type, document name and document editing time; scanning all documents under the set target path, and selecting the documents that match the document keywords; and determining the documents that match the document keywords as the documents to be sorted;
the extraction module, configured to scan the documents to be sorted and extract the information corresponding to each of the content keywords from them, which comprises: acquiring the names of the documents to be sorted, and extracting the information corresponding to the content keywords from those names; when, among the content keywords, there is a target content keyword for which no corresponding information has been extracted, scanning the content of the document to be sorted; and extracting the information corresponding to the target content keyword from that content;
and the filling module, configured to fill the extracted information corresponding to the content keywords into the positions in the summary document that match each content keyword.
7. A terminal comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is adapted to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-6.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820963.9A CN110688349B (en) | 2019-08-29 | 2019-08-29 | Document sorting method, device, terminal and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820963.9A CN110688349B (en) | 2019-08-29 | 2019-08-29 | Document sorting method, device, terminal and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110688349A CN110688349A (en) | 2020-01-14 |
CN110688349B true CN110688349B (en) | 2023-05-26 |
Family
ID=69108778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910820963.9A Active CN110688349B (en) | 2019-08-29 | 2019-08-29 | Document sorting method, device, terminal and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110688349B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111552666B (en) * | 2020-03-23 | 2021-02-26 | 苏州沁游网络科技有限公司 | Resource acquisition method, device, equipment and storage medium |
CN112269870A (en) * | 2020-11-03 | 2021-01-26 | 北京字跳网络技术有限公司 | Document sorting method and device, electronic equipment and computer readable storage medium |
CN112800761B (en) * | 2020-12-25 | 2024-09-13 | 讯飞智元信息科技有限公司 | Information backfilling method, related electronic equipment and storage medium |
CN113505580A (en) * | 2021-07-26 | 2021-10-15 | 京东科技控股股份有限公司 | Method and device for analyzing table file |
CN114939532B (en) * | 2022-07-11 | 2022-11-08 | 河北汇金集团股份有限公司 | Sorting method for disordered documents |
CN115757915B (en) * | 2023-01-09 | 2023-04-28 | 佰聆数据股份有限公司 | Online electronic file generation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159998A (en) * | 2015-09-08 | 2015-12-16 | 海南大学 | Keyword calculation method based on document clustering |
CN105608068A (en) * | 2014-11-17 | 2016-05-25 | 三星电子株式会社 | Display apparatus and method for summarizing of document |
CN106844328A (en) * | 2016-08-23 | 2017-06-13 | 华南师范大学 | New semantic analysis method and system for large-scale document topics |
CN108073616A (en) * | 2016-11-14 | 2018-05-25 | 北京航天长峰科技工业集团有限公司 | Method for fast keyword retrieval in massive documents based on big data technology |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273555A (en) * | 2017-08-18 | 2017-10-20 | 郑州云海信息技术有限公司 | Document information extraction device and method |
CN108038095A (en) * | 2017-12-15 | 2018-05-15 | 四川汉科计算机信息技术有限公司 | Automatic document creation method |
CN109284427A (en) * | 2018-08-30 | 2019-01-29 | 上海与德通讯技术有限公司 | Document structure tree method, apparatus, server and storage medium |
CN109831323B (en) * | 2019-01-15 | 2022-04-05 | 网宿科技股份有限公司 | Server information management method, management system and server |
- 2019-08-29: application CN201910820963.9A filed in CN, granted as CN110688349B; status Active
Non-Patent Citations (2)
Title |
---|
Hongxi Wei et al. A multiple instances approach to improving keyword spotting on historical Mongolian document images. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015, pp. 121-122. * |
秦代辉 (Qin Daihui) et al. Simulation research on automatic integration and retrieval of library book information. 《计算机仿真》 (Computer Simulation), 2018, Vol. 35, pp. 409-410. * |
Also Published As
Publication number | Publication date |
---|---|
CN110688349A (en) | 2020-01-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110688349B (en) | Document sorting method, device, terminal and computer readable storage medium | |
US7324998B2 (en) | Document search methods and systems | |
CN101673256B (en) | Method and system for automatically extracting article metadata information based on word flow | |
CN112052749A (en) | Archive filing method and device, electronic equipment and computer readable storage medium | |
CN106503930B (en) | A kind of Note Auditing method and device | |
CN104346415B (en) | Method for naming image document | |
CN109241003B (en) | File management method and device | |
CN110516220B (en) | Report data input method, system and related equipment | |
CN110619115A (en) | Template creating method and device, electronic equipment and storage medium | |
CN112381087B (en) | Image recognition method, device, computer equipment and medium combining RPA and AI | |
CN109460518B (en) | Book recommendation method based on user website access records | |
CN117194322A (en) | File classification management method, system and computing device | |
CN114021716A (en) | Model training method and system and electronic equipment | |
CN114971556A (en) | File information summarizing method and device, electronic equipment and storage medium | |
CN111079375B (en) | Information sorting method and device, computer storage medium and terminal | |
CN113821691A (en) | Document processing method and device, electronic equipment and readable storage medium | |
CN117493712B (en) | PDF document navigable directory extraction method and device, electronic equipment and storage medium | |
JP2003132332A (en) | Learning data construction support device | |
CN105653525B (en) | Method and system for importing data between account sets | |
CN111061863B (en) | Journal catalog display method, device and equipment | |
CN113821482A (en) | Information processing method and device, electronic equipment and readable storage medium | |
US9990420B2 (en) | Method of searching and generating a relevant search string | |
CN103902178A (en) | Multi-media file processing method and device based on Android system | |
US20230326225A1 (en) | System and method for machine learning document partitioning | |
CN111046629B (en) | Outline display method, device and equipment |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |