CN113094508A - Data detection method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN113094508A (application CN202110462398.0A)
- Authority
- CN
- China
- Prior art keywords
- document
- block
- specified
- target
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Operations Research (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to the field of data processing and provides a data detection method, a data detection apparatus, a computer device and a storage medium. The method comprises the following steps: acquiring a requirement document to be detected; matching the document objects in the requirement document against a check list, and generating an anchor container corresponding to the requirement document; dividing the requirement document into regions based on the anchor container to obtain a plurality of document blocks; acquiring a specified document block, calling a specified detection algorithm corresponding to the specified document block to automatically detect the specified document block, and generating a corresponding specified detection result; and acquiring the detection results respectively corresponding to the document blocks, and generating a detection result report corresponding to the requirement document based on all the detection results. The application can reduce the detection cost of the requirement document and improve the generation efficiency and accuracy of its detection result report. The application can also be applied to the field of blockchain, and the detection result report can be stored on a blockchain.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data detection method, apparatus, computer device, and storage medium.
Background
At present, system development within an enterprise usually involves professional software engineering and professional process management, and assuring software quality often depends on well-written requirement documents. Enterprises therefore spend considerable labor and material costs checking and monitoring whether these requirement documents meet the document QA (quality assurance) standards of the industry or the group. In the existing quality detection approach, the enterprise sets up a functional unit for process management, and a large amount of manpower is used to check the content of the requirement documents manually. However, manual checking occupies a large amount of human resources and takes a large amount of time, which easily leads to low detection efficiency, and manual detection has a high error rate, which affects the accuracy of the generated document detection results.
Disclosure of Invention
The main purpose of the present application is to provide a data detection method, apparatus, computer device and storage medium, aiming to solve the technical problems that the existing manual checking of requirement documents occupies a large amount of human resources and time, easily leads to low detection efficiency, and has a high error rate that affects the accuracy of the generated document detection results.
The application provides a data detection method, which comprises the following steps:
acquiring a requirement document to be detected;
matching the document objects in the requirement document against a preset check list, and generating an anchor container corresponding to the requirement document;
performing region division processing on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table data, paragraph type blocks containing text data, and picture type blocks containing picture data;
acquiring a specified document block, and calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block to generate a corresponding specified detection result; the specified document block is any one of the document blocks;
and acquiring the detection results respectively corresponding to the document blocks, and generating a detection result report corresponding to the requirement document based on all the detection results.
Optionally, the step of matching the document objects in the requirement document against a preset check list and generating an anchor container corresponding to the requirement document includes:
traversing the document objects of the requirement document line by line to obtain the object content of each line;
matching each object content against all anchor information contained in the check list to obtain a plurality of corresponding matching results; wherein the result content of each matching result is either matching success or matching failure;
screening out, from all the matching results, the specified matching results whose result content is matching success, and acquiring the specified object content and specified anchor information respectively corresponding to each specified matching result; acquiring address information of the specified object content in the requirement document, and generating specified anchor address information corresponding to the specified anchor information based on the address information;
creating an original anchor container;
and storing the specified anchor address information and the specified anchor information in correspondence in the original anchor container to generate the anchor container corresponding to the requirement document.
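The matching steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the document is assumed to be already available as a list of line strings, a match is assumed to mean that the stripped line equals an anchor keyword, and the address is assumed to be the zero-based line index. The names `CHECK_LIST` and `build_anchor_container` and the sample keywords are hypothetical.

```python
from typing import Dict, List

# Hypothetical check list of anchor information (standard keywords).
CHECK_LIST = ["Requirement background", "Requirement scope", "Business flow diagram"]


def build_anchor_container(lines: List[str], check_list: List[str]) -> Dict[int, str]:
    """Map the address (line index) of each matched line to its anchor information."""
    container: Dict[int, str] = {}  # plays the role of the original anchor container
    anchors = set(check_list)
    for address, content in enumerate(lines):  # line-by-line traversal
        text = content.strip()
        if text in anchors:  # matching success (assumed to mean exact equality)
            container[address] = text  # store address and anchor in correspondence
        # matching failure: the line is ordinary content and nothing is stored
    return container


doc_lines = [
    "Requirement background",
    "We need X.",
    "Requirement scope",
    "Module A",
    "Appendix",
]
anchor_container = build_anchor_container(doc_lines, CHECK_LIST)
```

Keying the container by address keeps each anchor paired with its position, which is what the later region division step consumes.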
Optionally, the step of performing region division processing on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks includes:
acquiring all anchor address information stored in the anchor container;
querying the requirement document based on the anchor address information, and finding, in the requirement document, the target line number information that is the same as each piece of anchor address information;
taking every two adjacent pieces of target line number information as division endpoints, and performing region division processing on the requirement document to obtain a plurality of corresponding document content regions;
and taking the document content regions as the document blocks.
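A minimal Python sketch of this division step follows. It assumes the anchor container is a mapping from line index to anchor keyword, and that the region after the last anchor runs to the end of the document; the text above does not specify how the last region is closed, so that choice is an assumption. The function name `split_into_blocks` is hypothetical.

```python
from typing import Dict, List, Tuple


def split_into_blocks(
    lines: List[str], anchor_container: Dict[int, str]
) -> List[Tuple[str, List[str]]]:
    """Cut the document into (anchor, content lines) regions between adjacent anchors."""
    addresses = sorted(anchor_container)  # target line numbers in document order
    # Assumption: the last region is closed by the end of the document.
    endpoints = addresses[1:] + [len(lines)]
    blocks = []
    for start, end in zip(addresses, endpoints):
        # The anchor line itself is the heading; the content is what follows it.
        blocks.append((anchor_container[start], lines[start + 1:end]))
    return blocks


sample_lines = ["Background", "We need X.", "Scope", "Module A", "Module B"]
sample_anchors = {0: "Background", 2: "Scope"}
sample_blocks = split_into_blocks(sample_lines, sample_anchors)
```

Because each region starts at one anchor and stops at the next, the resulting blocks cannot overlap.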
Optionally, the step of calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block and generate a corresponding specified detection result includes:
acquiring target anchor point information corresponding to the specified document block from the anchor point container;
judging whether the block type of the specified document block belongs to the table type block or not based on the target anchor point information;
if the block type of the specified document block belongs to the table type block, acquiring a specified table object in the table type block;
traversing the specified table object to acquire first data corresponding to a first cell of a target row; wherein the target row is any one of the rows contained in the specified table object;
judging whether the title of the first data contains first target content;
if the title of the first data contains the first target content, acquiring second data corresponding to a second cell of the target row;
judging whether the second data is a first preset value or not;
if the second data is not the first preset value, judging whether the second data contains a first target field;
if the second data does not contain the first target field, judging whether the second data contains second target content;
and if the second data contains the second target content, judging that the table type block passes the check, and generating a first detection result indicating that the check is passed.
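The table check above can be sketched as a chain of conditions in Python. The concrete values are not given in the text, so the preset value, target field and target contents are passed in as parameters, and the sample values used here are hypothetical. Treating every other outcome as a failed check is also an assumption, since the text only describes the passing path.

```python
from typing import List


def check_table_block(
    rows: List[List[str]],
    first_target: str,   # content expected in the title (first cell)
    preset_value: str,   # value that counts as "not filled in"
    target_field: str,   # field that must NOT appear in the second cell
    second_target: str,  # content that MUST appear in the second cell
) -> bool:
    """Return True when a row titled with first_target has an acceptable second cell."""
    for row in rows:
        if len(row) < 2 or first_target not in row[0]:
            continue  # not the row under inspection
        second = row[1].strip()
        if second == preset_value:
            return False  # assumed: preset (for example empty) value fails the check
        if target_field in second:
            return False  # assumed: a placeholder field fails the check
        return second_target in second
    return False  # assumed: no matching row means the check fails


passing_table = [["Security review result", "Review passed on first submission"]]
failing_table = [["Security review result", "TBD"]]
table_ok = check_table_block(passing_table, "Security review", "", "TBD", "passed")
table_bad = check_table_block(failing_table, "Security review", "", "TBD", "passed")
```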
Optionally, after the step of determining whether the block type of the specified document block belongs to the table type block based on the target anchor information, the method includes:
if the block type of the specified document block does not belong to the table type block, judging whether the block type of the specified document block belongs to the paragraph type block or not based on the target anchor point information;
if the block type of the specified document block belongs to the paragraph type block, splicing each line of texts in the paragraph type block to obtain a processed target paragraph object;
judging whether the target paragraph object is a second preset value or not;
if the target paragraph object is not the second preset value, judging whether the target paragraph object contains a second target field;
if the target paragraph object does not contain the second target field, determining whether the target paragraph object contains a third target field;
and if the target paragraph object contains the third target field, judging that the paragraph type block passes the check, and generating a second detection result indicating that the check is passed.
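A corresponding sketch for the paragraph check is given below. As with the table check, the preset value and the two target fields are not specified in the text, so they are parameters here and the sample values are hypothetical; returning a failure on every non-passing path is an assumption.

```python
from typing import List


def check_paragraph_block(
    lines: List[str],
    preset_value: str,    # value that counts as "not filled in"
    excluded_field: str,  # the second target field, which must be absent
    required_field: str,  # the third target field, which must be present
) -> bool:
    """Splice the lines into one paragraph object and apply the three conditions."""
    paragraph = "".join(lines)  # splice each line of text
    if paragraph.strip() == preset_value:
        return False  # assumed: preset (for example empty) paragraph fails
    if excluded_field in paragraph:
        return False  # assumed: a placeholder field fails the check
    return required_field in paragraph


paragraph_ok = check_paragraph_block(
    ["The system shall ", "export monthly reports."], "", "N/A", "shall"
)
paragraph_bad = check_paragraph_block(["N/A"], "", "N/A", "shall")
```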
Optionally, after the step of determining whether the block type of the specified document block belongs to the table type block based on the target anchor information, the method includes:
if the block type of the specified document block does not belong to the table type block, judging whether the block type of the specified document block belongs to the picture type block or not based on the target anchor point information;
if the block type of the specified document block belongs to the picture type block, acquiring the number of pictures in each row of pictures in the picture type block;
filling the number of the pictures into a preset container to obtain a target picture container;
calculating the total number of pictures of the target picture container, and judging whether the total number of the pictures is smaller than a preset number;
and if the total number of pictures in the target picture container is not less than the preset number, judging that the picture type block passes the check, and generating a third detection result indicating that the check is passed.
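The picture check reduces to counting, which the following Python sketch illustrates. It assumes the per-row picture counts have already been extracted from the block; how pictures are located inside a row depends on the document tool and is not shown. The function name and the sample numbers are hypothetical.

```python
from typing import List


def check_picture_block(pictures_per_row: List[int], preset_number: int) -> bool:
    """Pass when the block holds at least preset_number pictures in total."""
    picture_container: List[int] = []  # the preset container
    for count in pictures_per_row:
        picture_container.append(count)  # fill in the picture count of each row
    total = sum(picture_container)
    return total >= preset_number  # "not less than" the preset number


picture_ok = check_picture_block([0, 1, 0, 2], preset_number=1)
picture_bad = check_picture_block([0, 0], preset_number=1)
```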
Optionally, the step of acquiring the requirement document to be detected includes:
acquiring a folder path to be detected, and judging whether the folder path is empty or not;
if the folder path is not empty, traversing all the documents under the folder path, and screening out, from all the documents, first documents belonging to a specified file format;
judging whether a temporary file exists among the first documents;
if a temporary file exists among the first documents, removing the temporary file from the first documents to obtain second documents;
acquiring the file names of all the second documents, and screening out, from all the second documents, third documents whose file names contain a specified file name;
and taking the third documents as the requirement documents, and calling a preset tool to obtain the requirement documents.
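The screening steps above can be sketched with the Python standard library. The specified file format, the way temporary files are recognized, and the specified file name are not given in the text; the defaults below (`.docx`, the `~$` prefix that Microsoft Word uses for its owner files, and the keyword `requirement`) are assumptions chosen for illustration, as is the non-recursive traversal. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def find_requirement_documents(
    folder: str,
    suffix: str = ".docx",              # assumed specified file format
    temp_prefix: str = "~$",            # assumed marker of temporary files
    name_keyword: str = "requirement",  # assumed specified file name
) -> List[Path]:
    """Screen a folder down to the documents that should be detected."""
    if not folder:  # empty folder path: nothing to detect
        return []
    # First documents: files of the specified format (non-recursive, by assumption).
    first = [p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == suffix]
    # Second documents: temporary files removed.
    second = [p for p in first if not p.name.startswith(temp_prefix)]
    # Third documents: file name contains the specified name.
    third = [p for p in second if name_keyword in p.stem.lower()]
    return sorted(third)
```

Reading the content of each returned file is left to the preset tool mentioned in the last step.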
The present application further provides a data detection apparatus, including:
an acquisition module, used for acquiring a requirement document to be detected;
a first generation module, used for matching the document objects in the requirement document against a preset check list and generating an anchor container corresponding to the requirement document;
a first processing module, used for performing region division processing on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table data, paragraph type blocks containing text data, and picture type blocks containing picture data;
a second processing module, used for acquiring a specified document block, calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block, and generating a corresponding specified detection result; the specified document block is any one of the document blocks;
and a second generation module, used for acquiring the detection results respectively corresponding to the document blocks and generating a detection result report corresponding to the requirement document based on all the detection results.
The present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method.
The data detection method, the data detection device, the computer equipment and the storage medium have the following beneficial effects:
Unlike the existing approach of manually checking a requirement document, the data detection method, apparatus, computer device and storage medium provided by the present application generate, after the requirement document to be detected is acquired, an anchor container corresponding to the requirement document based on a preset check list; then divide the document objects in the requirement document based on the anchor container to obtain a plurality of corresponding document blocks; then call the detection algorithm corresponding to each document block to automatically detect that block and generate its detection result; and finally generate a detection result report corresponding to the requirement document based on the detection results, thereby realizing automatic detection of the requirement document. The whole data detection process for the requirement document is carried out automatically, without manual participation, so that a large amount of human resources and time can be saved, the detection cost of the requirement document is effectively reduced, the generation efficiency of the detection result report is improved, and the accuracy of the generated detection result report is ensured.
Drawings
FIG. 1 is a schematic flow chart of a data detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a data detection apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, a data detection method according to an embodiment of the present application includes:
S1: acquiring a requirement document to be detected;
S2: matching the document objects in the requirement document against a preset check list, and generating an anchor container corresponding to the requirement document;
S3: performing region division processing on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table data, paragraph type blocks containing text data, and picture type blocks containing picture data;
S4: acquiring a specified document block, and calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block to generate a corresponding specified detection result; the specified document block is any one of the document blocks;
S5: acquiring the detection results respectively corresponding to the document blocks, and generating a detection result report corresponding to the requirement document based on all the detection results.
As described in the above steps S1 to S5, the execution subject of this method embodiment is a data detection apparatus. In practical applications, the data detection apparatus may be implemented as a virtual apparatus, such as software code, or as a physical apparatus into which the relevant execution code is written or integrated, and may perform human-computer interaction with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device. The data detection apparatus in this embodiment can automatically perform the data detection process on the requirement document, thereby saving a large amount of human resources and time, effectively reducing the detection cost of the requirement document, and improving the generation efficiency and accuracy of the detection result report of the requirement document. Specifically, a requirement document to be detected is acquired first. The requirement document refers to a document that needs to be detected, and it can be screened out from all candidate documents based on a preset file format and a preset file name. Then, the document objects in the requirement document are matched against a preset check list, and an anchor container corresponding to the requirement document is generated.
The specific generation process of the anchor container may include: first, matching the object content of each line in the requirement document against all anchor information contained in the check list to obtain a plurality of corresponding matching results; then, screening out the specified matching results that are matching successes from all the matching results, selecting the specified anchor information corresponding to the specified matching results from all the anchor information, and selecting the specified object content corresponding to the specified matching results from all the document objects; next, generating specified anchor address information corresponding to each piece of specified anchor information based on the address information of the specified object content in the requirement document; and finally, storing the specified anchor address information and the specified anchor information in one-to-one correspondence in a pre-created original anchor container, thereby generating the anchor container.
Then, region division processing is performed on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks. The requirement document is written manually based on a preset requirement document template, so its content is relatively structured; the whole requirement document can therefore be divided into a plurality of document blocks, and the document blocks do not overlap. The block types of the document blocks may specifically include table type blocks, paragraph type blocks, and picture type blocks, where a table type block contains table data, a paragraph type block contains text data, and a picture type block contains picture data. For the division, all anchor address information stored in the anchor container can be acquired first; the requirement document is then queried based on the anchor address information to find the target line number information that is the same as each piece of anchor address information; and region division processing is performed on the requirement document with every two adjacent pieces of target line number information as division endpoints, so as to obtain the plurality of corresponding document blocks. After the document blocks are obtained, a specified document block is acquired, and a specified detection algorithm corresponding to the block type of the specified document block is called to perform automatic detection processing on the specified document block and generate a corresponding specified detection result. The specified document block is any one of the document blocks.
Corresponding detection algorithms are preset for the different types of document blocks, and calling the specified detection algorithm corresponding to the block type of the specified document block allows the specified detection result to be generated accurately. Finally, the detection results respectively corresponding to the document blocks are acquired, and a detection result report corresponding to the requirement document is generated based on all the detection results. A report template may be preset; the template type of the report template is not particularly limited and may be, for example, Excel. The output structure of the report template includes at least the detection block and the detection result. In addition, after the detection results are obtained, the block information corresponding to each detection result can be acquired at the same time, and all the block information and all the detection results are filled into the corresponding positions in the report template to generate the detection result report corresponding to the requirement document, so that the document detection result of the requirement document can be read clearly from the detection result report.
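The report generation described above can be sketched as follows. The text names Excel as one possible template type; to keep the example dependent only on the Python standard library, this sketch swaps in CSV text, which is a substitution and not the format named in the application. The two column headings follow the stated minimum output structure, and the function name `build_report` is hypothetical.

```python
import csv
import io
from typing import List, Tuple


def build_report(results: List[Tuple[str, bool]]) -> str:
    """Fill block information and detection results into a two-column report."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Detection block", "Detection result"])  # minimum output structure
    for block_name, passed in results:
        writer.writerow([block_name, "pass" if passed else "fail"])
    return buffer.getvalue()


report_text = build_report(
    [("Requirement background", True), ("Business flow diagram", False)]
)
```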
Unlike the existing approach of manually checking a requirement document, in this embodiment, after the requirement document to be detected is acquired, an anchor container corresponding to the requirement document is generated based on a preset check list; the document objects in the requirement document are then divided based on the anchor container to obtain a plurality of corresponding document blocks; the detection algorithm corresponding to each document block is then called to automatically detect that block and generate its detection result; and finally a detection result report corresponding to the requirement document is generated based on the detection results, so that automatic detection of the requirement document is realized. The whole data detection process for the requirement document is carried out automatically, without manual participation, so that a large amount of human resources and time can be saved, the detection cost of the requirement document is effectively reduced, the generation efficiency of the detection result report is improved, and the accuracy of the generated detection result report is ensured.
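Calling a detection algorithm according to block type can be sketched as a lookup table of functions in Python. The three detectors below are deliberately simplified placeholders, not the checks of the application, and all names and the block representation (a tuple of name, type and content) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def detect_table(content: List[List[str]]) -> bool:
    # Placeholder rule: every row has a non-empty second cell.
    return all(len(row) > 1 and bool(row[1].strip()) for row in content)


def detect_paragraph(content: List[str]) -> bool:
    # Placeholder rule: the spliced text is not blank.
    return bool("".join(content).strip())


def detect_picture(content: List[int]) -> bool:
    # Placeholder rule: at least one picture in the block.
    return sum(content) >= 1


# One preset detection algorithm per block type.
DETECTORS: Dict[str, Callable[..., bool]] = {
    "table": detect_table,
    "paragraph": detect_paragraph,
    "picture": detect_picture,
}


def detect_blocks(blocks: List[Tuple[str, str, object]]) -> List[Tuple[str, bool]]:
    """Run the detector matching each block's type and collect the results."""
    results = []
    for name, block_type, content in blocks:
        detector = DETECTORS[block_type]  # select the algorithm by block type
        results.append((name, detector(content)))
    return results


demo_results = detect_blocks([
    ("Requirement background", "paragraph", ["The system shall export reports."]),
    ("Security review result", "table", [["Result", ""]]),
    ("Business flow diagram", "picture", [1, 0]),
])
```

A lookup table keeps the per-type algorithms independent, so a new block type only needs a new entry.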
Further, in an embodiment of the present application, the step S2 includes:
S200: traversing the document objects of the requirement document line by line to obtain the object content of each line;
S201: matching each object content against all anchor information contained in the check list to obtain a plurality of corresponding matching results; wherein the result content of each matching result is either matching success or matching failure;
S202: screening out, from all the matching results, the specified matching results whose result content is matching success, and acquiring the specified object content and specified anchor information respectively corresponding to each specified matching result;
S203: acquiring address information of the specified object content in the requirement document, and generating specified anchor address information corresponding to the specified anchor information based on the address information;
S204: creating an original anchor container;
S205: storing the specified anchor address information and the specified anchor information in correspondence in the original anchor container to generate the anchor container corresponding to the requirement document.
As described in the foregoing steps S200 to S205, the step of matching the document objects in the requirement document against the preset check list to generate the anchor container corresponding to the requirement document may specifically include the following. First, the document objects of the requirement document are traversed line by line to obtain the object content of each line. Then, each object content is matched against all anchor information contained in the check list to obtain a plurality of corresponding matching results, where the result content of each matching result is either matching success or matching failure. The check list contains a plurality of pieces of anchor information placed in advance. Anchor information can also be called a standard keyword: it is a standard keyword used for partitioning the preset requirement document template, and its specific content can be set according to actual needs, such as the detection requirements of the requirement document. For example, the standard keywords may include requirement background, requirement scope, business flow diagram, interaction design, architecture, security review scheme results, and so on. Next, the specified matching results whose result content is matching success are screened out from all the matching results, and the specified object content and specified anchor information respectively corresponding to each specified matching result are acquired. The address information of the specified object content in the requirement document is then acquired, and specified anchor address information corresponding to the specified anchor information is generated based on the address information. The specified anchor address information and the address information are the same information.
The document object of the requirement document can be traversed line by line in a p = doc. In addition, by finding, among all the anchor information contained in the check list, all the specified anchor information that is the same as a document object in the requirement document, the document blocks that need to be subjected to data detection can be screened out of the requirement document based on the specified anchor information. Finally, an original anchor container is created, and the specified anchor address information and the specified anchor information are stored in correspondence in the original anchor container to generate the anchor container corresponding to the requirement document. That is, while the document objects of the requirement document are traversed, every piece of specified anchor information that appears in the check list and matches (is the same as) a document object in the requirement document is stored in the pre-created original anchor container, and the address information in the requirement document of the specified object content involved in each match is stored in the original anchor container at the same time as the specified anchor address information corresponding to that specified anchor information, thereby generating the required anchor container.
In this embodiment, the anchor container corresponding to the requirement document is generated based on the preset check list. The anchor container can then be used to divide the requirement document into regions and obtain the corresponding document blocks, and each document block can be automatically checked by the detection algorithm corresponding to its block type to generate a detection result. This makes it possible to generate the detection result report for the requirement document quickly and accurately from those results, completing the quality detection of the requirement document.
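Steps S200 to S205 can be illustrated with a minimal Python sketch. It assumes the requirement document has already been read into a list of text lines and that an anchor matches when a stripped line is identical to a check-list keyword. The function name build_anchor_container, the dict used as the container, and the sample keywords are illustrative assumptions, not names from the application.

```python
from typing import Dict, List


def build_anchor_container(lines: List[str], check_list: List[str]) -> Dict[str, int]:
    """Map each matched anchor keyword to the line number where it appears."""
    anchors = set(check_list)
    container: Dict[str, int] = {}  # plays the role of the "original anchor container"
    for line_no, content in enumerate(lines):
        text = content.strip()
        # Matching success: the line is identical to a check-list keyword.
        # Assumption: only the first occurrence of a keyword is recorded.
        if text in anchors and text not in container:
            container[text] = line_no
    return container


# Hypothetical sample document and check list.
lines = [
    "Requirement background",
    "Users need faster exports.",
    "Requirement scope",
    "Export module only.",
    "Business flow diagram",
]
check_list = [
    "Requirement background",
    "Requirement scope",
    "Business flow diagram",
    "Architecture",
]
container = build_anchor_container(lines, check_list)
print(container)
```

Keywords from the check list that never appear in the document (here "Architecture") are simply absent from the container.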
Further, in an embodiment of the present application, the step S3 includes:
s300: acquiring all anchor address information stored in the anchor container;
s301: querying the requirement document based on the anchor address information, and finding in the requirement document the target line number information that is the same as each piece of anchor address information;
s302: taking every two adjacent pieces of target line number information as dividing endpoints, and performing region division on the requirement document to obtain a plurality of corresponding document content regions;
s303: and taking the document content regions as the document blocks.
As described in steps S300 to S303, the step of performing region division on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks may specifically include the following. First, all anchor address information stored in the anchor container is acquired. The anchor address information corresponds to the specified anchor information stored in the anchor container, that is, the anchor information identical to a document object of the requirement document, and refers to the address information or line number information of that specified anchor information in the requirement document. Then, the requirement document is queried based on the anchor address information, and the target line number information that is the same as each piece of anchor address information is found in the requirement document. Next, every two adjacent pieces of target line number information are taken as dividing endpoints, and the requirement document is divided into a plurality of corresponding document content regions. A block can be constructed from any two adjacent pieces of target line number information in the requirement document. Such a block corresponds to a detection block that is meaningful for processing, and the descriptive content it contains is the part of the requirement document that needs data detection and verification. Finally, the document content regions are taken as the document blocks.
In this embodiment, the anchor container is used to divide the requirement document into regions and obtain the corresponding document blocks. Each document block can then be automatically checked by the detection algorithm corresponding to it to generate a detection result, so that the detection result report for the requirement document can be generated quickly and accurately and the quality detection of the requirement document completed.
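The region division of steps S300 to S303 can be sketched as follows, again treating the document as a list of lines and the anchor container as a keyword-to-line-number dict. Two details are assumptions made for illustration: the anchor line itself is excluded from its block, and the block of the last anchor runs to the end of the document. The name split_into_blocks is hypothetical.

```python
from typing import Dict, List


def split_into_blocks(lines: List[str], container: Dict[str, int]) -> Dict[str, List[str]]:
    """Cut the document into blocks bounded by adjacent anchor line numbers."""
    # Sort anchors by their line number so neighbours are adjacent endpoints.
    ordered = sorted(container.items(), key=lambda item: item[1])
    blocks: Dict[str, List[str]] = {}
    for i, (anchor, start) in enumerate(ordered):
        # Assumption: the last block extends to the end of the document.
        end = ordered[i + 1][1] if i + 1 < len(ordered) else len(lines)
        # Assumption: the anchor (heading) line is not part of the block body.
        blocks[anchor] = lines[start + 1:end]
    return blocks


# Hypothetical sample data.
lines = [
    "Requirement background",
    "Users need faster exports.",
    "Requirement scope",
    "Export module only.",
    "Business flow diagram",
    "[picture]",
]
container = {"Requirement background": 0, "Requirement scope": 2, "Business flow diagram": 4}
blocks = split_into_blocks(lines, container)
print(blocks)
```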
Further, in an embodiment of the present application, the step S4 includes:
s400: acquiring target anchor point information corresponding to the specified document block from the anchor point container;
s401: judging whether the block type of the specified document block belongs to the table type block or not based on the target anchor point information;
s402: if the block type of the specified document block belongs to the tabular block, obtaining a specified tabular object in the tabular block;
s403: traversing the specified table object to acquire first data corresponding to a first cell of a target row; wherein the target row is any one of all the rows contained in the specified table object;
s404: judging whether the title of the first data contains first target content;
s405: if the title of the first data contains the first target content, acquiring second data corresponding to a second cell of the target row;
s406: judging whether the second data is a first preset value or not;
s407: if the second data is not the first preset value, judging whether the second data contains a first target field;
s408: if the second data does not contain the first target field, judging whether the second data contains second target content;
s409: and if the second data contains the second target content, judging that the table type block passes the check, and generating a first detection result of the check.
As described in steps S400 to S409, the step of invoking the specified detection algorithm corresponding to the block type of the specified document block to automatically check the specified document block and generate a corresponding specified detection result may specifically include the following. First, the target anchor information corresponding to the specified document block is acquired from the anchor container. Then, whether the block type of the specified document block belongs to the table type block is judged based on the target anchor information. The target anchor information corresponds to the keywords that delimit the upper and lower boundaries of the specified document block. A keyword-to-block-type mapping table is preset, and it can be queried with the target anchor information to determine the block type of the document block. For example, the mapping table may store the following: requirement background and requirement scope correspond to paragraph type blocks; business flow diagram and interaction design correspond to picture type blocks; interaction design, architecture and security review scheme results correspond to table type blocks. In addition, the order in which the block types of the specified document block are judged is not particularly limited and can be set according to actual requirements.
Specifically, the embodiment of the present application may first judge whether the block type of the specified document block is a table type block, then whether it is a paragraph type block, and then whether it is a picture type block. Alternatively, it may first judge whether the block type is a table type block, then whether it is a picture type block, and finally whether it is a paragraph type block. If the block type of the specified document block belongs to the table type block, the specified table object in the table type block is obtained. The specified table object is then traversed to acquire the first data corresponding to the first cell of the target row, where the target row is any one of the rows contained in the specified table object. Whether the title of the first data contains the first target content is then judged. The specific content of the first target content is not limited and may be set according to actual requirements; for example, the first target content may be a UI modification or a US list. If the title of the first data does not contain the first target content, a detection result indicating that the verification fails is directly generated. If the title of the first data contains the first target content, the second data corresponding to the second cell of the target row is acquired, and whether the second data is a first preset value is judged. The specific content of the first preset value is not limited and may be set according to actual requirements; for example, the first preset value may be a null value or blank.
If the second data is the first preset value, a detection result indicating that the verification fails is directly generated. If the second data is not the first preset value, whether the second data contains a first target field is judged. The specific content of the first target field is not limited and may be set according to actual requirements; for example, the first target field may be a "TODO" field. If the second data contains the first target field, a detection result indicating that the verification fails is directly generated. If the second data does not contain the first target field, whether the second data contains second target content is judged. The specific content of the second target content is not limited and may be set according to actual requirements; for example, the second target content may be content beginning with "US" or a "no/no-refer" field. If the second data contains the second target content, the table type block is judged to pass the check, and a first detection result indicating a pass is generated. If the second data does not contain the second target content, a detection result indicating that the verification fails is directly generated. In this way, when the block type of the specified document block is judged to be a table type block, the block can be verified automatically and accurately by the detection algorithm corresponding to table type blocks, and a corresponding detection result is generated. The detection result report for the requirement document can then be generated quickly and accurately from this result together with the detection results of the other document blocks, completing the quality detection of the requirement document.
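The decision chain of steps S403 to S409 for a single target row can be sketched in Python as below. A row is modelled as a list of cell strings. The function name check_table_row and the concrete keyword values passed as defaults are illustrative assumptions; the text leaves all of them configurable.

```python
from typing import Sequence


def check_table_row(
    row: Sequence[str],
    title_keywords: Sequence[str] = ("UI modification", "US list"),  # first target content (assumed values)
    todo_marker: str = "TODO",                                       # first target field
    accepted_prefix: str = "US",                                     # second target content: "US" start
    accepted_values: Sequence[str] = ("none", "not involved"),       # second target content (assumed wording)
) -> bool:
    """Return True if the row passes the table-block check, False otherwise."""
    if len(row) < 2:
        return False
    first, second = row[0], row[1].strip()
    # S404: the title cell must contain the first target content.
    if not any(keyword in first for keyword in title_keywords):
        return False
    # S406: an empty second cell (the first preset value) fails.
    if second == "":
        return False
    # S407: a leftover placeholder such as "TODO" fails.
    if todo_marker in second:
        return False
    # S408/S409: pass only if the second target content is present.
    return second.startswith(accepted_prefix) or any(v in second for v in accepted_values)


print(check_table_row(["US list", "US-101 export"]))
print(check_table_row(["US list", "TODO"]))
```

Each failing branch returns early, which mirrors the text's "directly generate a detection result indicating that the verification fails".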
Further, in an embodiment of the present application, after the step S401, the method includes:
s4010: if the block type of the specified document block does not belong to the table type block, judging whether the block type of the specified document block belongs to the paragraph type block or not based on the target anchor point information;
s4011: if the block type of the specified document block belongs to the paragraph type block, splicing each line of texts in the paragraph type block to obtain a processed target paragraph object;
s4012: judging whether the target paragraph object is a second preset value or not;
s4013: if the target paragraph object is not the second preset value, judging whether the target paragraph object contains a second target field;
s4014: if the target paragraph object does not contain the second target field, determining whether the target paragraph object contains a third target field;
s4015: and if the target paragraph object comprises the third target field, judging that the paragraph type block passes the check, and generating a second detection result of the check.
As described in steps S4010 to S4015, the block type of the specified document block may be a paragraph type block rather than a table type block, and the detection process for a paragraph type block differs from that for a table type block. Specifically, after the step of judging whether the block type of the specified document block belongs to the table type block based on the target anchor information, the method may further include the following. If the block type of the specified document block does not belong to the table type block, whether it belongs to the paragraph type block is judged based on the target anchor information. If it belongs to the paragraph type block, each line of text in the paragraph type block is spliced together to obtain a processed target paragraph object. Whether the target paragraph object is a second preset value is then judged. The specific content of the second preset value is not limited and may be set according to actual requirements; for example, the second preset value may be a null value or blank. If the target paragraph object is not the second preset value, whether the target paragraph object contains a second target field is judged. The specific content of the second target field is not limited and may be set according to actual requirements; for example, the second target field may be a "TODO" field. If the target paragraph object contains the second target field, a detection result indicating that the verification fails is directly generated. If the target paragraph object does not contain the second target field, whether the target paragraph object contains a third target field is judged. The specific content of the third target field is not limited and may be set according to actual requirements; for example, the third target field may be a "none/nothing" field.
If the target paragraph object does not include the third target field, a detection result indicating that the verification fails is directly generated. If the target paragraph object contains the third target field, the paragraph type block is determined to pass the check, and a second detection result of the pass is generated. When the block type of the specified document block is judged to belong to the paragraph type block, the specified document block can be intelligently and accurately verified based on the detection algorithm corresponding to the paragraph type block, and a corresponding detection result is generated, so that a detection result report corresponding to the required document can be quickly and accurately generated based on the detection result and the detection results of the rest document blocks in the required document, and the quality detection of the required document can be further completed.
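The paragraph check of steps S4011 to S4015 can be sketched as follows. The sketch assumes, by analogy with the table check, that an empty paragraph fails and that the presence of the second target field (for example "TODO") also fails. The function name and the default marker values are illustrative assumptions.

```python
from typing import Sequence


def check_paragraph_block(
    lines: Sequence[str],
    todo_marker: str = "TODO",                              # second target field
    required_markers: Sequence[str] = ("none", "nothing"),  # third target field (assumed wording)
) -> bool:
    """Return True if the paragraph block passes the check, False otherwise."""
    # S4011: splice every line of text into one target paragraph object.
    text = "".join(line.strip() for line in lines)
    # S4012: an empty paragraph (the second preset value) fails (assumption by analogy).
    if text == "":
        return False
    # S4013: a leftover placeholder such as "TODO" fails.
    if todo_marker in text:
        return False
    # S4014/S4015: pass only if the third target field is present.
    return any(marker in text for marker in required_markers)


print(check_paragraph_block(["Scope: ", "none"]))
print(check_paragraph_block(["TODO fill in later"]))
```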
Further, in an embodiment of the present application, after the step S401, the method includes:
s4020: if the block type of the specified document block does not belong to the table type block, judging whether the block type of the specified document block belongs to the picture type block or not based on the target anchor point information;
s4021: if the block type of the specified document block belongs to the picture type block, acquiring the number of pictures in each row of pictures in the picture type block;
s4022: filling the number of the pictures into a preset container to obtain a target picture container;
s4023: calculating the total number of pictures of the target picture container, and judging whether the total number of the pictures is smaller than a preset number;
s4024: and if the total number of the pictures in the target picture container is not less than the preset number, judging that the picture type blocks pass the verification, and generating a third detection result of the passing verification.
As described in steps S4020 to S4024, the block type of the specified document block may be a picture type block rather than a table type block or a paragraph type block, and the detection process for a picture type block differs from that for a table type block or a paragraph type block. Specifically, after the step of judging whether the block type of the specified document block belongs to the table type block based on the target anchor information, the method may further include the following. If the block type of the specified document block does not belong to the table type block, whether it belongs to the picture type block is judged based on the target anchor information. If it belongs to the picture type block, the number of pictures in each row of pictures in the picture type block is acquired. These picture counts are then filled into a preset container to obtain a target picture container. The total number of pictures in the target picture container is then calculated, and whether the total is smaller than a preset number is judged. The preset number is not particularly limited and may be set according to the actual detection requirement; for example, it may be set to 1. If the total number of pictures in the target picture container is not less than the preset number, the picture type block is judged to pass the verification, and a third detection result indicating a pass is generated. If the total number of pictures in the target picture container is smaller than the preset number, a detection result indicating that the verification fails is directly generated.
When the block type of the specified document block is judged to belong to the picture type block, the specified document block can be intelligently and accurately verified based on the detection algorithm corresponding to the picture type block, and a corresponding detection result is generated, so that a detection result report corresponding to the required document can be rapidly and accurately generated based on the detection result and detection results of other document blocks in the required document, and further the quality detection of the required document is completed.
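The picture check of steps S4021 to S4024 reduces to counting. A minimal sketch, assuming the per-row picture counts have already been extracted from the block (the function name and the plain list used as the container are illustrative):

```python
from typing import Iterable


def check_picture_block(pictures_per_row: Iterable[int], min_pictures: int = 1) -> bool:
    """Return True if the block holds at least min_pictures pictures in total."""
    # S4022: fill the per-row counts into a container.
    picture_container = list(pictures_per_row)
    # S4023: total number of pictures across all rows.
    total = sum(picture_container)
    # S4024: pass when the total is not less than the preset number.
    return total >= min_pictures


print(check_picture_block([0, 1, 0]))
print(check_picture_block([0, 0]))
```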
Further, in an embodiment of the present application, the step S1 includes:
s100: acquiring a folder path to be detected, and judging whether the folder path is empty or not;
s101: if the folder path is not empty, traversing all the documents corresponding to the folder path, and screening out a first document belonging to a specified file format from all the documents;
s102: judging whether a temporary file exists in the first document;
s103: if the temporary file exists in the first document, the temporary file is removed from the first document to obtain a second document;
s104: acquiring the file names of all the second documents, and screening out, based on the file names, third documents containing a specified file name from all the second documents;
s105: and taking the third document as the requirement document, and calling a preset tool to obtain the requirement document.
As described in steps S100 to S105, the step of acquiring the requirement document to be detected may specifically include the following. First, a folder path to be detected is acquired, and whether the folder path is empty is judged. If the folder path is not empty, all documents under the folder path are traversed, and first documents belonging to a specified file format are screened out from them. The specified file format is not particularly limited and may be set according to actual requirements; for example, the specified file format may be docx. Then, whether temporary files exist among the first documents is judged. The format of the temporary file is not limited; for example, a temporary file may be a file containing a "-" character. If temporary files exist among the first documents, they are removed to obtain second documents. The file names of all the second documents are then acquired, and third documents containing a specified file name are screened out from the second documents based on the file names. The specified file name is not particularly limited and may be set according to actual requirements; for example, the specified file name may be "requirement document". Finally, the third document is taken as the requirement document, and a preset tool is invoked to obtain the requirement document. The preset tool may be the Word application, which can be invoked through the win32com client and used to open the requirement document, so that data such as the corresponding table objects, text objects and picture objects can be obtained from the requirement document once it is open.
According to the method and the device, the requirement documents needing to be subjected to data detection processing are screened out from all the documents, so that only the requirement documents meeting the detection requirements are subjected to detection processing subsequently, and all the documents are not subjected to detection processing, the accuracy and the efficiency of data detection processing are improved, the generation of useless power consumption is reduced, and the generation speed of generated detection results is improved. In addition, after the requirement document is obtained, the following matching processing of the document object in the requirement document based on a preset check list is facilitated, so that the anchor point container corresponding to the requirement document can be quickly generated.
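The screening in steps S100 to S104 can be sketched with the Python standard library. The sketch only selects candidate files and does not open them in Word. The function name, the "requirement" keyword, and the default temporary-file marker are assumptions: "~$" is the usual prefix of the owner files Word creates next to an open document, and the marker is left as a parameter because the text treats the temporary-file format as configurable.

```python
import tempfile
from pathlib import Path
from typing import List


def find_requirement_documents(
    folder: str,
    suffix: str = ".docx",              # specified file format
    temp_marker: str = "~$",            # assumed temporary-file marker
    name_keyword: str = "requirement",  # assumed specified file name
) -> List[Path]:
    """Return the documents in folder that qualify as requirement documents."""
    # S100: an empty or non-existent folder path yields nothing.
    if not folder or not Path(folder).is_dir():
        return []
    # S101: keep only files of the specified format ("first documents").
    first = [p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == suffix]
    # S102/S103: drop temporary files ("second documents").
    second = [p for p in first if temp_marker not in p.name]
    # S104: keep files whose name contains the specified file name ("third documents").
    third = [p for p in second if name_keyword in p.name.lower()]
    return sorted(third)


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("requirement_v1.docx", "~$requirement_v1.docx", "notes.docx", "requirement.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in find_requirement_documents(tmp)]
print(found)
```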
The data detection method in the embodiment of the present application may also be applied to the blockchain field; for example, data such as the detection results may be stored on a blockchain. Storing and managing the detection results on a blockchain effectively ensures their security and tamper resistance.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public and private key generation (account management), key management, and the correspondence between a user's real identity and blockchain address (authority management); where authorized, it also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage once consensus on them is reached; for a new service request, the basic service first performs interface adaptation, parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; developers can define contract logic in a programming language and publish it to the blockchain (contract registration), have it triggered by a key call or another event and executed according to the logic of the contract terms, and the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract setting and cloud adaptation during product release, and for the visual output of real-time status during product operation, such as alarms, monitoring of network conditions and monitoring of node device health.
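The tamper resistance mentioned above comes from linking each data block to its predecessor by a cryptographic hash. The following toy sketch illustrates only that linking idea with an in-memory list; it is not a blockchain platform, has no consensus or networking, and every name in it (make_block, verify_chain, the payload fields) is an illustrative assumption.

```python
import hashlib
import json
from typing import Any, Dict, List


def _digest(body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(index: int, payload: Dict[str, Any], prev_hash: str) -> Dict[str, Any]:
    """Create a block whose hash covers its payload and the previous block's hash."""
    body = {"index": index, "payload": payload, "prev_hash": prev_hash}
    return {**body, "hash": _digest(body)}


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        body = {key: block[key] for key in ("index", "payload", "prev_hash")}
        if block["hash"] != _digest(body):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Store two hypothetical detection results, then tamper with the first one.
chain = [make_block(0, {"block": "Requirement scope", "result": "pass"}, prev_hash="")]
chain.append(make_block(1, {"block": "Architecture", "result": "fail"}, prev_hash=chain[0]["hash"]))
valid_before = verify_chain(chain)
chain[0]["payload"]["result"] = "fail"
valid_after = verify_chain(chain)
print(valid_before, valid_after)
```

Changing a stored detection result invalidates the hash of its block, so the modification is detectable.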
Referring to fig. 2, an embodiment of the present application further provides a data detection apparatus, including:
the acquisition module 1 is used for acquiring a requirement document to be detected;
the first generation module 2 is used for matching the document objects in the requirement document based on a preset check list and generating an anchor point container corresponding to the requirement document;
the first processing module 3 is configured to perform region division on the requirement document based on the anchor container to obtain a plurality of corresponding document blocks; the block types of the document blocks include table type blocks whose internal data is tabular data, paragraph type blocks whose internal data is text data, and picture type blocks whose internal data is picture data;
the second processing module 4 is used for acquiring a specified document block, calling a specified detection algorithm corresponding to the block type of the specified document block to automatically detect the specified document block, and generating a corresponding specified detection result; the specified document block is any one block in all the document blocks;
and a second generating module 5, configured to obtain the detection results corresponding to the document blocks, and generate a detection result report corresponding to the requirement document based on all the detection results.
In this embodiment, the implementation processes of the functions and actions of the obtaining module 1, the first generating module 2, the first processing module 3, the second processing module 4, and the second generating module 5 in the data detection apparatus are specifically described in the implementation processes corresponding to steps S1 to S5 in the data detection method, and are not described herein again.
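How the second processing module and the second generating module might cooperate can be sketched as a lookup from anchor keyword to block type, a lookup from block type to detection function, and a loop that collects the results into a report. Everything named here (generate_report, the two dicts, the "pass"/"fail"/"skipped" labels, and the stand-in detectors) is a hypothetical illustration; the real detection algorithms are the ones described for each block type.

```python
from typing import Callable, Dict, List

Detector = Callable[[List[str]], bool]


def generate_report(
    blocks: Dict[str, List[str]],
    type_by_anchor: Dict[str, str],
    detectors: Dict[str, Detector],
) -> Dict[str, str]:
    """Run the detector matching each block's type and collect the outcomes."""
    report: Dict[str, str] = {}
    for anchor, content in blocks.items():
        # Keyword-to-block-type mapping, then block-type-to-algorithm dispatch.
        detector = detectors.get(type_by_anchor.get(anchor, ""))
        if detector is None:
            report[anchor] = "skipped"  # assumption: unknown block types are not checked
        else:
            report[anchor] = "pass" if detector(content) else "fail"
    return report


# Hypothetical mapping table and deliberately simple stand-in detectors.
type_by_anchor = {"Requirement background": "paragraph", "Business flow diagram": "picture"}
detectors: Dict[str, Detector] = {
    "paragraph": lambda content: any(line.strip() for line in content),
    "picture": lambda content: len(content) >= 1,
}
blocks = {
    "Requirement background": ["Users need faster exports."],
    "Business flow diagram": [],
    "Appendix": ["misc"],
}
report = generate_report(blocks, type_by_anchor, detectors)
print(report)
```

Keeping the dispatch table separate from the detectors means the order in which block types are tested does not matter, which is consistent with the text's statement that the judgment order is not limited.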
Further, in an embodiment of the present application, the first generating module 2 includes:
the first acquisition unit is used for traversing the document objects of the requirement document line by line and acquiring the object content of each line;
the matching unit is used for respectively matching each object content with all anchor point information contained in the check list to obtain a plurality of corresponding matching results; wherein, the result content of the matching result is matching success or matching failure;
the screening unit is used for screening out result contents from all the matching results as specified matching results which are successfully matched, and acquiring specified object contents and specified anchor point information which respectively correspond to each specified matching result;
the second acquisition unit is used for acquiring the address information of the specified object content in the requirement document and generating specified anchor point address information corresponding to the specified anchor point information based on the address information;
a creating unit for creating an original anchor container;
and the storage unit is used for correspondingly storing the designated anchor point address information and the designated anchor point information in the original anchor point container and generating the anchor point container corresponding to the requirement document.
In this embodiment, the implementation processes of the functions and actions of the first obtaining unit, the matching unit, the screening unit, the second obtaining unit, the creating unit and the storing unit in the data detection apparatus are specifically described in the implementation processes corresponding to steps S200 to S205 in the data detection method, and are not described herein again.
Further, in an embodiment of the present application, the first processing module 3 includes:
the third acquisition unit is used for acquiring all anchor point address information stored in the anchor point container;
the searching unit is used for querying the requirement document based on the anchor address information and finding in the requirement document the target line number information that is the same as each piece of anchor address information;
the dividing unit is used for taking every two adjacent pieces of target line number information as dividing endpoints, and performing region division on the requirement document to obtain a plurality of corresponding document content regions;
a first determining unit, configured to take the document content area as the document block.
In this embodiment, the implementation processes of the functions and actions of the third obtaining unit, the searching unit, the dividing unit and the first determining unit in the data detection apparatus are specifically described in the implementation processes corresponding to steps S300 to S303 in the data detection method, and are not described herein again.
Further, in an embodiment of the present application, the second processing module 4 includes:
a fourth obtaining unit, configured to obtain target anchor information corresponding to the specified document block from the anchor container;
a first judgment unit, configured to judge whether a block type of the specified document block belongs to the table type block based on the target anchor information;
a fifth obtaining unit, configured to obtain a specified table object in the table type block if the block type of the specified document block belongs to the table type block;
a sixth obtaining unit, configured to traverse the specified table object and obtain first data corresponding to a first cell of a target row; wherein the target row is any one of all the rows contained in the specified table object;
a second judging unit configured to judge whether or not a title of the first data includes a first target content;
a seventh obtaining unit, configured to obtain second data corresponding to a second cell of the target row if the title of the first data includes the first target content;
the third judging unit is used for judging whether the second data is a first preset value or not;
a fourth determining unit, configured to determine whether the second data includes a first target field if the second data is not the first preset value;
a fifth determining unit, configured to determine whether the second data includes second target content if the second data does not include the first target field;
and the first generating unit is used for judging that the table type block passes the check if the second data contains the second target content and generating a first detection result of the check.
In this embodiment, the implementation processes of the functions and actions of the fourth obtaining unit, the first judging unit, the fifth obtaining unit, the sixth obtaining unit, the second judging unit, the seventh obtaining unit, the third judging unit, the fourth determining unit, the fifth determining unit and the first generating unit in the data detection apparatus are specifically described in the implementation processes corresponding to steps S400 to S409 in the data detection method, and are not described herein again.
Further, in an embodiment of the present application, the second processing module 4 includes:
a fifth judging unit, configured to judge, if the block type of the specified document block does not belong to the table type block, whether the block type of the specified document block belongs to the paragraph type block based on the target anchor information;
the processing unit is used for splicing each line of texts in the paragraph type block to obtain a processed target paragraph object if the block type of the specified document block belongs to the paragraph type block;
a sixth judging unit, configured to judge whether the target paragraph object is a second preset value;
a seventh determining unit, configured to determine whether the target paragraph object includes a second target field if the target paragraph object is not the second preset value;
an eighth determining unit, configured to determine whether the target paragraph object includes a third target field if the target paragraph object does not include the second target field;
and a second generating unit, configured to determine that the paragraph type block passes the verification if the target paragraph object contains the third target field, and to generate a second detection result indicating that the verification is passed.
In this embodiment, the implementation processes of the functions and actions of the fifth judging unit, the processing unit, the sixth judging unit, the seventh determining unit, the eighth determining unit and the second generating unit in the data detection apparatus are specifically described in the implementation processes corresponding to steps S4010 to S4015 in the data detection method, and are not described herein again.
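The paragraph type check can be sketched in a similar way. In the minimal Python example below, the function name `check_paragraph_block` and the sample values are hypothetical. Joining the lines with a single space and treating the non-passing branches as "not passed" are assumptions that this passage does not spell out.

```python
from typing import Sequence


def check_paragraph_block(
    lines: Sequence[str],
    second_preset_value: str,
    second_target_field: str,
    third_target_field: str,
) -> bool:
    """Return True when the spliced paragraph passes the check."""
    # Splice every line of text in the block into one target paragraph object.
    paragraph = " ".join(line.strip() for line in lines)
    # Assumption: equality with the preset value means the check is not passed.
    if paragraph == second_preset_value:
        return False
    # Assumption: presence of the second target field means the check is not passed.
    if second_target_field in paragraph:
        return False
    # The block passes only when the third target field is present.
    return third_target_field in paragraph
```

With the hypothetical values `second_preset_value=""`, `second_target_field="TODO"` and `third_target_field="shall"`, the lines `["The system shall", "export reports."]` pass, and an empty block does not.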
Further, in an embodiment of the present application, the second processing module 4 includes:
a ninth judging unit, configured to judge whether the block type of the specified document block belongs to the picture-type block based on the target anchor information if the block type of the specified document block does not belong to the table-type block;
an eighth obtaining unit, configured to obtain, if the block type of the specified document block belongs to the picture type block, a number of pictures in each row of pictures in the picture type block;
the filling unit is used for filling the number of the pictures into a preset container to obtain a target picture container;
a tenth judging unit, configured to calculate a total number of pictures of the target picture container, and judge whether the total number of pictures is smaller than a preset number;
and a third generating unit, configured to determine that the picture type block passes the verification if the total number of pictures in the target picture container is not less than the preset number, and to generate a third detection result indicating that the verification is passed.
In this embodiment, the implementation processes of the functions and actions of the ninth determining unit, the eighth obtaining unit, the filling unit, the tenth determining unit and the third generating unit in the data detecting apparatus are specifically described in the implementation processes corresponding to steps S4020 to S4024 in the data detecting method, and are not described herein again.
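The picture type check reduces to counting. The following Python sketch assumes the "preset container" is a plain list and that the per-row picture counts have already been extracted; the function name `check_picture_block` is hypothetical.

```python
from typing import List, Sequence


def check_picture_block(pictures_per_row: Sequence[int], preset_number: int) -> bool:
    """Return True when the block holds at least `preset_number` pictures."""
    # Fill the per-row picture counts into a container (here: a list).
    target_picture_container: List[int] = []
    for count in pictures_per_row:
        target_picture_container.append(count)
    # The block passes when the total is not less than the preset number.
    total_pictures = sum(target_picture_container)
    return total_pictures >= preset_number
```

For instance, rows containing 1, 0 and 2 pictures pass a preset number of 2, while rows containing no pictures fail a preset number of 1.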
Further, in an embodiment of the present application, the obtaining module 1 includes:
a ninth acquiring unit, configured to acquire a folder path to be detected, and determine whether the folder path is empty;
the screening unit is used for traversing all the documents corresponding to the folder path and screening out a first document belonging to a specified file format from all the documents if the folder path is not empty;
an eleventh judging unit operable to judge whether or not a temporary file exists in the first document;
the removing unit is used for removing the temporary file from the first document to obtain a second document if the temporary file exists in the first document;
a tenth obtaining unit, configured to obtain file names of all the second documents, and screen out, from all the second documents, a third document that includes a specified file name based on the file names;
and the second determining unit is used for taking the third document as the requirement document and calling a preset tool to acquire the requirement document.
In this embodiment, the implementation processes of the functions and actions of the ninth acquiring unit, the screening unit, the eleventh judging unit, the removing unit, the tenth obtaining unit and the second determining unit in the data detection apparatus are specifically described in the implementation processes corresponding to steps S100 to S105 in the data detection method, and are not described herein again.
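The document screening performed by the obtaining module can be sketched with Python's standard `pathlib`. The concrete choices below are assumptions for illustration only: `.docx` as the specified file format, the `~$` prefix as the mark of a temporary file, `requirement` as the specified file name keyword, a recursive traversal, and an empty result for an empty folder path. The final step of calling a preset tool to read the document is not shown.

```python
from pathlib import Path
from typing import List


def find_requirement_documents(
    folder_path: str,
    file_format: str = ".docx",         # assumed specified file format
    temp_prefix: str = "~$",            # assumed marker of temporary files
    name_keyword: str = "requirement",  # assumed specified file name
) -> List[Path]:
    """Screen a folder for candidate requirement documents."""
    # Judge whether the folder path is empty (assumption: nothing to detect then).
    if not folder_path:
        return []
    folder = Path(folder_path)
    # First documents: files in the specified file format.
    first = [p for p in folder.rglob("*")
             if p.is_file() and p.suffix.lower() == file_format]
    # Second documents: first documents with temporary files removed.
    second = [p for p in first if not p.name.startswith(temp_prefix)]
    # Third documents: second documents whose name contains the specified file name.
    third = [p for p in second if name_keyword in p.stem.lower()]
    return sorted(third)
```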
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device comprises a processor, a memory, a network interface, a display screen, an input device and a database which are connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a storage medium and an internal memory. The storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the storage medium. The database of the computer device stores the requirement document, the check list, the anchor point container, the document blocks, the specified detection algorithm, the specified detection result and the detection result report. The network interface of the computer device communicates with an external terminal through a network connection. The display screen of the computer device is an image and text output device that converts digital signals into optical signals so that characters and graphics are displayed on the screen. The input device of the computer device is the main means of exchanging information between the computer and a user or other devices, and is used for transmitting data, instructions, flag information and the like to the computer. The computer program, when executed by the processor, implements a data detection method.
The processor executes the steps of the data detection method:
acquiring a requirement document to be detected;
matching the document objects in the requirement document based on a preset check list, and generating an anchor point container corresponding to the requirement document;
performing region division processing on the requirement document based on the anchor point container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table type data, paragraph type blocks containing text type data, and picture type blocks containing picture type data;
acquiring a specified document block, and calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block, so as to generate a corresponding specified detection result; the specified document block is any one of the document blocks;
and acquiring the detection results respectively corresponding to the document blocks, and generating a detection result report corresponding to the requirement document based on all the detection results.
Those skilled in the art will appreciate that the structure shown in fig. 3 is only a block diagram of a part of the structure related to the present application, and does not constitute a limitation to the apparatus and the computer device to which the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a data detection method, and specifically:
acquiring a requirement document to be detected;
matching the document objects in the requirement document based on a preset check list, and generating an anchor point container corresponding to the requirement document;
performing region division processing on the requirement document based on the anchor point container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table type data, paragraph type blocks containing text type data, and picture type blocks containing picture type data;
acquiring a specified document block, and calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block, so as to generate a corresponding specified detection result; the specified document block is any one of the document blocks;
and acquiring the detection results respectively corresponding to the document blocks, and generating a detection result report corresponding to the requirement document based on all the detection results.
In summary, unlike the existing approach of manually checking a requirement document, the data detection method, apparatus, computer device and storage medium provided in the embodiments of the present application, after acquiring a requirement document to be detected, generate an anchor point container corresponding to the requirement document based on a preset check list, and then perform region division processing on the requirement document based on the anchor point container to obtain a plurality of corresponding document blocks. The detection algorithm corresponding to the block type of each document block is then invoked to automatically detect that document block and generate its detection result, and finally a detection result report corresponding to the requirement document is generated based on all the detection results, thereby realizing automatic detection of the requirement document. The whole data detection process for the requirement document is carried out automatically without manual participation, so that a large amount of human resources and time can be saved, the detection cost of the requirement document is effectively reduced, the generation efficiency of the detection result report is improved, and the accuracy of the generated detection result report is ensured.
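The overall flow summarized above, in which each document block is routed to the detection algorithm for its block type and the results are collected into a report, might be sketched in Python as follows. The function name `generate_report`, the report fields and the toy detectors in the usage note are hypothetical stand-ins, and reporting an unknown block type as "not passed" is an assumption.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# A document block is modelled as (block_type, payload); this shape is an assumption.
Block = Tuple[str, Any]


def generate_report(
    blocks: Sequence[Block],
    detectors: Dict[str, Callable[[Any], bool]],
) -> List[Dict[str, Any]]:
    """Run the detector registered for each block type and collect the results."""
    report: List[Dict[str, Any]] = []
    for index, (block_type, payload) in enumerate(blocks):
        detector = detectors.get(block_type)
        if detector is None:
            # Assumption: a block without a registered detector is reported as not passed.
            report.append({"block": index, "type": block_type,
                           "passed": False, "note": "no detector registered"})
            continue
        report.append({"block": index, "type": block_type,
                       "passed": bool(detector(payload)), "note": ""})
    return report
```

A caller would register one detector per block type, for example `{"table": ..., "paragraph": ..., "picture": ...}`, and pass the blocks produced by the region division step.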
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. All equivalent structures or equivalent process modifications made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of protection of the present application.
Claims (10)
1. A method for data detection, comprising:
acquiring a requirement document to be detected;
matching the document objects in the requirement document based on a preset check list, and generating an anchor point container corresponding to the requirement document;
performing region division processing on the requirement document based on the anchor point container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table type data, paragraph type blocks containing text type data, and picture type blocks containing picture type data;
acquiring a specified document block, and calling a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block, so as to generate a corresponding specified detection result; the specified document block is any one of the document blocks;
and acquiring the detection results respectively corresponding to the document blocks, and generating a detection result report corresponding to the requirement document based on all the detection results.
2. The data detection method according to claim 1, wherein the step of matching the document object in the requirement document based on a preset check list to generate an anchor container corresponding to the requirement document comprises:
traversing the document objects of the requirement document line by line to obtain the object content of each line;
matching each object content with all anchor point information contained in the check list to obtain a plurality of corresponding matching results; wherein the result content of each matching result is either matching success or matching failure;
screening out, from all the matching results, specified matching results whose result content is matching success, and acquiring the specified object content and the specified anchor point information respectively corresponding to each specified matching result; acquiring address information of the specified object content in the requirement document, and generating specified anchor point address information corresponding to the specified anchor point information based on the address information;
creating an original anchor point container;
and correspondingly storing the specified anchor point address information and the specified anchor point information in the original anchor point container, so as to generate the anchor point container corresponding to the requirement document.
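By way of illustration only, the anchor point container generation recited above could be sketched in Python as below. The function name `build_anchor_container` is hypothetical. Using substring matching as the matching rule, a zero-based line index as the address information, and a dictionary as the container are all assumptions that the claim does not fix.

```python
from typing import Dict, Sequence


def build_anchor_container(lines: Sequence[str], check_list: Sequence[str]) -> Dict[str, int]:
    """Map each matched anchor in the check list to the line where it first appears."""
    # Create the original (empty) anchor point container.
    container: Dict[str, int] = {}
    # Traverse the document line by line and match each line against every anchor.
    for line_number, content in enumerate(lines):
        for anchor in check_list:
            matched = anchor in content  # assumed matching rule: substring match
            if matched and anchor not in container:
                # Store the anchor together with its address (here: the line index).
                container[anchor] = line_number
    return container
```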
3. The data detection method of claim 2, wherein the step of performing region division processing on the requirement document based on the anchor point container to obtain a plurality of corresponding document blocks comprises:
acquiring all anchor point address information stored in the anchor point container;
querying the requirement document based on the anchor point address information, and finding, in the requirement document, target line number information that is respectively the same as each piece of anchor point address information;
taking every two adjacent pieces of target line number information as dividing end points, and performing region division processing on the requirement document to obtain a plurality of corresponding document content regions;
and taking the document content regions as the document blocks.
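By way of illustration only, the region division recited above could be sketched in Python as follows. The function name `split_into_blocks` is hypothetical. Excluding the anchor line itself from its block and letting the last block run to the end of the document are assumptions, since the claim only describes regions bounded by two adjacent anchor lines.

```python
from typing import Dict, List, Sequence


def split_into_blocks(lines: Sequence[str], anchor_addresses: Dict[str, int]) -> Dict[str, List[str]]:
    """Divide the document into blocks bounded by adjacent anchor line numbers."""
    # Order the anchors by their line numbers so that neighbours are adjacent.
    ordered = sorted(anchor_addresses.items(), key=lambda item: item[1])
    blocks: Dict[str, List[str]] = {}
    for i, (anchor, start) in enumerate(ordered):
        # The next anchor's line is the end point; the last block runs to the end (assumption).
        end = ordered[i + 1][1] if i + 1 < len(ordered) else len(lines)
        # The anchor line itself is treated as a heading and left out (assumption).
        blocks[anchor] = list(lines[start + 1:end])
    return blocks
```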
4. The data detection method of claim 1, wherein the step of invoking a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block and generate a corresponding specified detection result comprises:
acquiring target anchor point information corresponding to the specified document block from the anchor point container;
judging whether the block type of the specified document block belongs to the table type block or not based on the target anchor point information;
if the block type of the specified document block belongs to the table type block, obtaining a specified table object in the table type block;
traversing the specified table object to acquire first data corresponding to a first cell of a target row; wherein the target row is any one of the rows contained in the specified table object;
judging whether the title of the first data contains first target content;
if the title of the first data contains the first target content, acquiring second data corresponding to a second cell of the target row;
judging whether the second data is a first preset value or not;
if the second data is not the first preset value, judging whether the second data contains a first target field;
if the second data does not contain the first target field, judging whether the second data contains second target content;
and if the second data contains the second target content, determining that the table type block passes the check, and generating a first detection result indicating that the check is passed.
5. The data detection method of claim 4, wherein the step of determining whether the block type of the specified document block belongs to the table type block based on the target anchor information is followed by:
if the block type of the specified document block does not belong to the table type block, judging whether the block type of the specified document block belongs to the paragraph type block or not based on the target anchor point information;
if the block type of the specified document block belongs to the paragraph type block, splicing each line of texts in the paragraph type block to obtain a processed target paragraph object;
judging whether the target paragraph object is a second preset value or not;
if the target paragraph object is not the second preset value, judging whether the target paragraph object contains a second target field;
if the target paragraph object does not contain the second target field, determining whether the target paragraph object contains a third target field;
and if the target paragraph object contains the third target field, determining that the paragraph type block passes the check, and generating a second detection result indicating that the check is passed.
6. The data detection method of claim 4, wherein the step of determining whether the block type of the specified document block belongs to the table type block based on the target anchor information is followed by:
if the block type of the specified document block does not belong to the table type block, judging whether the block type of the specified document block belongs to the picture type block or not based on the target anchor point information;
if the block type of the specified document block belongs to the picture type block, acquiring the number of pictures in each row of pictures in the picture type block;
filling the number of the pictures into a preset container to obtain a target picture container;
calculating the total number of pictures of the target picture container, and judging whether the total number of the pictures is smaller than a preset number;
and if the total number of pictures in the target picture container is not less than the preset number, determining that the picture type block passes the verification, and generating a third detection result indicating that the verification is passed.
7. The data detection method according to claim 1, wherein the step of acquiring the requirement document to be detected comprises:
acquiring a folder path to be detected, and judging whether the folder path is empty or not;
if the folder path is not empty, traversing all the documents corresponding to the folder path, and screening out a first document belonging to a specified file format from all the documents;
judging whether a temporary file exists in the first document;
if the temporary file exists in the first document, the temporary file is removed from the first document to obtain a second document;
acquiring file names of all the second documents, and screening out, from all the second documents, a third document containing a specified file name based on the file names;
and taking the third document as the requirement document, and calling a preset tool to obtain the requirement document.
8. A data detection apparatus, comprising:
an acquisition module, configured to acquire a requirement document to be detected;
a first generation module, configured to match the document objects in the requirement document based on a preset check list and generate an anchor point container corresponding to the requirement document;
a first processing module, configured to perform region division processing on the requirement document based on the anchor point container to obtain a plurality of corresponding document blocks; the block types of the document blocks comprise table type blocks containing table type data, paragraph type blocks containing text type data, and picture type blocks containing picture type data;
a second processing module, configured to acquire a specified document block, and call a specified detection algorithm corresponding to the block type of the specified document block to perform automatic detection processing on the specified document block and generate a corresponding specified detection result; the specified document block is any one of the document blocks;
and a second generation module, configured to acquire the detection results respectively corresponding to the document blocks and generate a detection result report corresponding to the requirement document based on all the detection results.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110462398.0A CN113094508A (en) | 2021-04-27 | 2021-04-27 | Data detection method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113094508A true CN113094508A (en) | 2021-07-09 |
Family
ID=76680472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110462398.0A Pending CN113094508A (en) | 2021-04-27 | 2021-04-27 | Data detection method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113094508A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080162425A1 (en) * | 2006-12-28 | 2008-07-03 | International Business Machines Corporation | Global anchor text processing |
CN109086756A (en) * | 2018-06-15 | 2018-12-25 | 众安信息技术服务有限公司 | A kind of text detection analysis method, device and equipment based on deep neural network |
WO2019238063A1 (en) * | 2018-06-15 | 2019-12-19 | 众安信息技术服务有限公司 | Text detection and analysis method and apparatus, and device |
CN110134929A (en) * | 2019-04-17 | 2019-08-16 | 深圳壹账通智能科技有限公司 | Document verification method, apparatus, computer equipment and storage medium |
CN111382184A (en) * | 2020-05-25 | 2020-07-07 | 浙江明度智控科技有限公司 | Method for verifying drug document and drug document verification system |
CN112579727A (en) * | 2020-12-16 | 2021-03-30 | 北京百度网讯科技有限公司 | Document content extraction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210709 |