
CN118093527B - Report quality inspection method and device and electronic equipment - Google Patents


Info

Publication number
CN118093527B
Authority
CN
China
Prior art keywords
quality inspection
text
report
category
language model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410495193.6A
Other languages
Chinese (zh)
Other versions
CN118093527A (en)
Inventor
石一磊
颜丙聪
胡敬良
牟立超
侯雨
陈咏虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maide Intelligent Technology Wuxi Co ltd
Original Assignee
Maide Intelligent Technology Wuxi Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maide Intelligent Technology Wuxi Co ltd filed Critical Maide Intelligent Technology Wuxi Co ltd
Priority to CN202410495193.6A
Publication of CN118093527A
Application granted
Publication of CN118093527B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a report quality inspection method, a report quality inspection device and electronic equipment, wherein the method comprises the following steps: performing text extraction on the report to be processed to obtain a report text; performing word segmentation on the report text to obtain a segmented text; and performing quality inspection on the segmented text by using a pre-trained language model to obtain a quality inspection category and the starting position and ending position of the quality inspection category. In this scheme, the report text extracted from the report to be processed is segmented to obtain the segmented text, and the segmented text is quality-inspected with a pre-trained language model; this alleviates the impact of human factors on quality inspection efficiency that arises when staff perform the inspection manually, and thereby improves the quality inspection efficiency of the report.

Description

Report quality inspection method and device and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence and deep learning, in particular to a report quality inspection method, a report quality inspection device and electronic equipment.
Background
Currently, quality inspection of reports is mainly performed by manual auditing, for example: an organization may assign specialized staff to review and revise reports, including but not limited to ultrasound reports, X-ray reports, electrocardiogram reports, pathology reports, or annual summary reports. In practice, it has been found that because an inspection report is usually written manually, human factors and subjective judgment are involved and errors such as wrongly written characters, punctuation errors, unit errors, non-standard expressions and common-sense errors are unavoidable; human factors also affect the staff's own quality inspection process, so the quality inspection efficiency of reports is low.
Disclosure of Invention
The embodiments of the present application aim to provide a report quality inspection method, a report quality inspection device and electronic equipment, which are used to address the problem of low report quality inspection efficiency.
The embodiment of the application provides a report quality inspection method, which comprises the following steps: performing text extraction on the report to be processed to obtain a report text; performing word segmentation on the report text to obtain a segmented text; and performing quality inspection on the segmented text by using a pre-trained language model to obtain a quality inspection category and the starting position and ending position of the quality inspection category. In this scheme, the report text extracted from the report to be processed is segmented to obtain the segmented text, and the segmented text is quality-inspected with a pre-trained language model; this alleviates the impact of human factors on quality inspection efficiency that arises when staff perform the inspection manually, and thereby improves the quality inspection efficiency of the report.
Optionally, in an embodiment of the present application, before using the pre-trained language model to perform quality inspection on the segmented text, the method further includes: acquiring a plurality of sample reports together with the quality inspection categories, starting positions and ending positions of the sample reports; and training the pre-trained language model by taking the plurality of sample reports as training data and the quality inspection categories, starting positions and ending positions of the sample reports as training labels. In this scheme, the pre-trained language model learns from contextual information how to analyze potential problems according to contextual relations, so that various errors in the report text (such as spelling errors, grammar errors and format errors) can be identified more accurately and their starting and ending positions can be marked precisely, which improves the detection accuracy for error types in the report that depend strongly on context.
Optionally, in an embodiment of the present application, using the pre-trained language model to perform quality inspection on the segmented text includes: extracting features from the segmented text by using the Transformer structure layer in the pre-trained language model to obtain text feature vectors; predicting on the text feature vectors with a first linear network layer in the pre-trained language model to obtain a starting position and the category probability corresponding to the starting position; predicting on the text feature vectors with a second linear network layer in the pre-trained language model to obtain an ending position and the category probability corresponding to the ending position; and determining the quality inspection category according to the category probability corresponding to the starting position and the category probability corresponding to the ending position. In this scheme, when the quality inspection category is determined from these two category probabilities, each linear network layer generates a probability distribution for every word vector or character vector in the text feature vectors; for error types with obvious boundary features (such as spelling errors and lexical-level grammar errors), the linear network layers can capture these features accurately, which improves the localization accuracy of the starting and ending positions.
Optionally, in an embodiment of the present application, determining the quality inspection category according to the category probability corresponding to the start position and the category probability corresponding to the end position includes: screening out the maximum probability from the class probability corresponding to the starting position and the class probability corresponding to the ending position; and determining the quality inspection category corresponding to the maximum probability from a plurality of preset quality inspection categories. In the implementation process of the scheme, the quality inspection category corresponding to the maximum probability is determined from a plurality of preset quality inspection categories, so that the quality inspection category with the maximum probability is effectively focused, the model can more accurately identify and report specific errors rather than provide a series of possible error categories, and the quality inspection efficiency of the report is improved.
Optionally, in an embodiment of the present application, after obtaining the quality inspection category and the start position and the end position of the quality inspection category, the method further includes: judging whether the quality inspection category is a preset error category or not; if yes, outputting a quality inspection category between the starting position and the ending position in the report to be processed. In the implementation process of the scheme, by accurately marking the starting and ending positions of errors or problems, more accurate error positioning can be provided, which is helpful for quickly identifying and correcting the problems, and unnecessary examination of the whole report is reduced, so that the quality inspection efficiency of the report is improved.
Optionally, in an embodiment of the present application, after determining whether the quality inspection category is a category in which no error exists, the method further includes: if the quality inspection category is not the preset error category, storing the report to be processed in a database or a file system, where the reports in the database or file system are used for printing or retrospective review. In this scheme, when the quality inspection category is not the preset error category, the report to be processed is stored in a database or a file system. High-quality reports that are not marked as erroneous can accumulate there as standard cases and be used later to further train and calibrate the pre-trained language model. Even if no error is found in a report at the current stage, the stored reports can serve as historical data for retrospective analysis as business requirements change or quality inspection standards are updated in the future, which helps to assess past working quality and the room for improvement, and further improves the accuracy of the pre-trained language model.
The embodiment of the application also provides a report quality inspection device, which comprises: the report text obtaining module is used for extracting text from the report to be processed to obtain a report text; the word segmentation text obtaining module is used for segmenting the report text to obtain segmented text; and the word segmentation text quality inspection module is used for inspecting the quality of the segmented text by using the pre-training language model to obtain quality inspection categories and the starting position and the ending position of the quality inspection categories.
Optionally, in an embodiment of the present application, the report quality inspection device further includes: the report data acquisition module is used for acquiring a plurality of sample reports and quality inspection categories, starting positions and ending positions of the sample reports; the language model training module is used for training the pre-training language model by taking a plurality of sample reports as training data and taking quality inspection categories, starting positions and ending positions of the sample reports as training labels.
Optionally, in an embodiment of the present application, the word segmentation text quality inspection module includes: the feature vector obtaining sub-module is used for extracting features of the segmented text by using a transducer structure layer in the pre-training language model to obtain text feature vectors; the initial position obtaining sub-module is used for predicting the text feature vector by using a first linear network layer in the pre-training language model to obtain an initial position and a category probability corresponding to the initial position; an end position obtaining sub-module, configured to predict a text feature vector by using a second linear network layer in the pre-training language model, to obtain an end position and a class probability corresponding to the end position; and the error category determination submodule is used for determining quality inspection categories according to category probabilities corresponding to the starting position and the ending position.
Optionally, in an embodiment of the present application, the error category determination submodule includes: the maximum probability screening unit is used for screening the maximum probability from the class probability corresponding to the starting position and the class probability corresponding to the ending position; and the quality inspection error determining unit is used for determining the quality inspection category corresponding to the maximum probability from a plurality of preset quality inspection categories.
Optionally, in an embodiment of the present application, the report quality inspection device further includes: the quality inspection category judging module is used for judging whether the quality inspection category is a preset error category or not; and the quality inspection category output module is used for outputting the quality inspection category between the starting position and the ending position in the report to be processed if the quality inspection category is a preset error category.
Optionally, in an embodiment of the present application, the report quality inspection device further includes: and the report storage printing module is used for storing the pending report into a database or a file system if the quality inspection category is not the preset error category, and the report in the database or the file system is used for printing or backtracking and checking.
The embodiment of the application also provides electronic equipment, which comprises: a processor and a memory storing machine-readable instructions executable by the processor, which when executed by the processor perform the method described above.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor performs the method described above.
The embodiment of the application also provides a computer program product, which comprises: computer programs or computer instructions which, when executed by a processor, perform the method described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application, and therefore should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a schematic flow chart of a report quality inspection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a processing procedure of a report to be processed according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a report quality inspection device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the embodiments of the present application are only for the purpose of illustration and description, and are not intended to limit the scope of the embodiments of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in the embodiments of the present application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the teachings of the embodiments of the present application.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Accordingly, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the claimed embodiments of the application, but is merely representative of selected embodiments of the application.
It will be appreciated that "first" and "second" in embodiments of the application are used to distinguish similar objects. It will be appreciated by those skilled in the art that the words "first," "second," etc. do not limit the number and order of execution, and that the words "first," "second," etc. do not necessarily differ. In the description of the embodiments of the present application, the term "and/or" is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. The term "plurality" refers to two or more (including two), and similarly, "plurality" refers to two or more (including two).
It should be noted that the report quality inspection method provided by the embodiment of the present application may be executed by an electronic device, where the electronic device refers to a device terminal or a server having the function of executing a computer program. The device terminal is, for example, a smart phone, a personal computer, a tablet computer, a personal digital assistant, or a mobile internet appliance. A server refers to a device that provides computing services over a network, such as an x86 server or a non-x86 server, where non-x86 servers include mainframes, minicomputers, and UNIX servers.
An example of an application scenario to which the report quality inspection method applies is described below. Written reports in some organizations (such as ultrasound examination reports, X-ray reports, electrocardiogram reports, pathology reports or year-end summary reports) are usually written manually, and because human factors and subjective judgment enter during writing, errors such as wrongly written characters, punctuation errors, unit errors, non-standard expressions and medical common-sense errors are unavoidable. In this case, the report quality inspection method can be used to inspect the quality of such written reports, which alleviates the impact of human factors on quality inspection efficiency during manual review, reduces the cost of manual inspection, and improves the quality inspection efficiency of the reports. Further, the report quality inspection method places no restriction on the format of the report to be processed and can be widely applied to various application scenarios, including but not limited to: medical report quality inspection scenarios in hospitals (such as ultrasound reports of any organ or tissue site), academic report quality inspection scenarios in schools, product quality inspection report scenarios in factories, quality inspection report scenarios for food and drugs, and quality inspection scenarios for company work reports or annual summary reports.
Please refer to FIG. 1, which illustrates a flowchart of a report quality inspection method according to an embodiment of the present application. The main idea of the report quality inspection method is to perform quality inspection with a pre-trained language model, so that not only the quality inspection category but also the starting position and ending position of the quality inspection category are obtained; this alleviates the impact of human factors on quality inspection efficiency during manual review and improves the quality inspection efficiency of the report. Embodiments of the report quality inspection method may include:
step S110: and extracting the text of the report to be processed to obtain a report text.
Pending reports, which refer to reports requiring quality testing, include, but are not limited to: medical reports of hospitals, academic reports of schools, quality inspection reports of products of factories, quality inspection reports of food and medicines, work reports of companies, annual end summary reports and the like.
Optionally, before text extraction is performed on the report to be processed, the report to be processed may also be preprocessed, including: converting the report to be processed into a unified data format (e.g., the txt text format), removing sensitive information contained in the report (e.g., names and phone numbers), and deleting duplicate, garbled, or incomplete reports.
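By way of illustration only, a minimal Python sketch of this preprocessing step is given below; the regular expressions, masking markers and the length threshold are assumptions of the sketch and are not prescribed by the embodiment.

import re

def preprocess_report(raw_text: str) -> str:
    """Normalize a pending report to plain text and mask obvious sensitive fields."""
    text = raw_text.replace("\r\n", "\n").strip()                     # unify line endings
    text = re.sub(r"1[3-9]\d{9}", "[PHONE]", text)                    # mask mainland-China style phone numbers (assumed pattern)
    text = re.sub(r"(姓名|Name)\s*[:：]\s*\S+", r"\1: [NAME]", text)   # mask name fields (assumed pattern)
    return text

def deduplicate(reports: list) -> list:
    """Drop verbatim duplicates and obviously incomplete (very short) reports."""
    seen, kept = set(), []
    for report in reports:
        if len(report) < 20 or report in seen:                        # 20-character threshold is an assumed heuristic
            continue
        seen.add(report)
        kept.append(report)
    return kept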
Step S120: and segmenting the report text to obtain segmented text.
It will be appreciated that since the report text cannot be directly input into the pre-trained language model, the report text may first be segmented (tokenization); specifically, an expanded version of BERT's built-in dictionary may be used to tokenize the report text.
Step S130: and performing quality inspection on the segmented text by using the pre-training language model to obtain a quality inspection category and a starting position and an ending position of the quality inspection category.
The pre-trained language model (Pre-trained Language Model) here is a neural network model obtained by modifying the network structure of a unidirectional feature representation model or a bidirectional feature representation model; specifically, two linear network layers may be bridged behind the unidirectional or bidirectional feature representation model, so that the two linear network layers output the starting position and the ending position of the quality inspection category respectively. The unidirectional feature representation models include, for example, the Embeddings from Language Models (ELMo) model, the Universal Language Model Fine-tuning (ULMFiT) model, and the Generative Pre-trained Transformer (GPT) model; the bidirectional feature representation models include, for example, the Bidirectional Encoder Representations from Transformers (BERT) model, the SpanBERT model, the ALBERT model, and the RoBERTa model.
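For illustration, the following minimal sketch (Python, using PyTorch and the Hugging Face transformers library) shows one way such a structure could be realized: a BERT encoder as the bidirectional feature representation model with two linear network layers bridged behind it. The model name, the number of categories and all hyperparameters are assumptions of the sketch, not requirements of the embodiment.

import torch
import torch.nn as nn
from transformers import BertModel

class ReportQualityInspector(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-chinese", num_categories: int = 6):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)   # Transformer structure layer
        hidden = self.encoder.config.hidden_size
        self.start_head = nn.Linear(hidden, num_categories)      # first linear network layer (start positions)
        self.end_head = nn.Linear(hidden, num_categories)        # second linear network layer (end positions)

    def forward(self, input_ids, attention_mask):
        # Text feature vectors: one hidden vector per token, shape [batch, N, hidden]
        features = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        start_logits = self.start_head(features)  # [batch, N, C]: category scores per starting position
        end_logits = self.end_head(features)      # [batch, N, C]: category scores per ending position
        return start_logits, end_logits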
In this scheme, the report text extracted from the report to be processed is segmented to obtain the segmented text, and the segmented text is quality-inspected with the pre-trained language model; this alleviates the impact of human factors on quality inspection efficiency during manual review and improves the quality inspection efficiency of the report.
As an alternative implementation of step S110 above: if the report to be processed is an electronic document, an executable program can be written to extract the report text directly from the electronic document; if the report to be processed is in a video format, images containing the report can first be extracted from the video, and character recognition is then performed on those images to obtain the report text; if the report to be processed is in a picture format, character recognition can be performed directly on the picture to obtain the report text; and if the report to be processed is a paper document, the paper document can be photographed to obtain a report in picture format, and character recognition is then performed on that picture to obtain the report text.
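A format-dispatch sketch of this step is shown below for illustration only; the choice of pytesseract and OpenCV for character recognition, the installed Chinese language data, the sampling stride and the supported file extensions are all assumptions of the sketch rather than part of the embodiment.

from pathlib import Path
from PIL import Image
import pytesseract

def extract_report_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":                                    # electronic document: read directly
        return Path(path).read_text(encoding="utf-8")
    if suffix in {".png", ".jpg", ".jpeg"}:                 # picture format, or a photographed paper report
        return pytesseract.image_to_string(Image.open(path), lang="chi_sim")
    if suffix in {".mp4", ".avi"}:                          # video format: sample frames, then recognize each frame
        import cv2
        cap = cv2.VideoCapture(path)
        pieces = []
        ok, frame = cap.read()
        while ok:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            pieces.append(pytesseract.image_to_string(Image.fromarray(rgb), lang="chi_sim"))
            for _ in range(30):                             # assumed sampling stride of 30 frames
                ok, frame = cap.read()
                if not ok:
                    break
        cap.release()
        return "\n".join(pieces)
    raise ValueError(f"Unsupported report format: {suffix}")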
As an alternative implementation of step S120 above, the report text may be segmented (tokenization); specifically, an expanded version of BERT's built-in dictionary may be used to tokenize the report text into a plurality of tokens, each token is then converted into a numeric ID, and the resulting sequence of numeric IDs is taken as the segmented text.
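For illustration, a tokenization sketch using the Hugging Face BERT tokenizer is given below; the added domain terms and the example sentence are assumptions standing in for the "expanded" dictionary entries described above.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
tokenizer.add_tokens(["乳腺结节", "BI-RADS"])              # assumed domain terms added to the vocabulary

report_text = "患者主诉头痛症状。"                          # illustrative report text
encoding = tokenizer(report_text, return_offsets_mapping=True, truncation=True, max_length=512)
tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"])   # the segmented tokens
token_ids = encoding["input_ids"]                                 # the numeric-ID sequence fed to the model
print(tokens)      # e.g. ['[CLS]', '患', '者', ..., '[SEP]']
print(token_ids)   # the sequence of numeric IDs described above

If new tokens are added to the vocabulary in this way, the encoder's embedding matrix must be resized accordingly (for example with resize_token_embeddings) before fine-tuning.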
As an alternative embodiment of the above step S130, before the quality inspection of the segmented text using the pre-training language model, fine-tuning (training) may also be performed on the pre-training language model, where the process of fine-tuning may include:
Step S131: a plurality of sample reports and quality class, start position and end position of the sample reports are obtained.
Step S132: and training the pre-training language model by taking a plurality of sample reports as training data and taking quality inspection types, starting positions and ending positions of the sample reports as training labels.
The embodiments of the above steps S131 to S132 are, for example: the pre-trained language model may also be trained prior to quality inspection of the segmented text using the pre-trained language model, for example: a training data set is acquired first, which may include: sample reports and sample labels, the sample labels are obtained by labeling the sample reports, and a specific labeling process is described below. The sample report can be obtained by directly reading from a local storage, or can be collected from various application scenes for training the model. Taking medical reports of hospitals as an example, in order to increase the diversity of training data, sample reports of various examination sites may be obtained from medical databases of a plurality of hospitals.
After the sample reports are obtained, they may be annotated. Taking a hospital's medical reports as an example, quality inspection categories (including common error types such as wrongly written characters, punctuation errors, unit errors, non-standard terminology, and medical common-sense errors) and location information may be annotated in each sample report, where the location information includes the starting position and the ending position of the error type. Of course, the quality inspection category may be either an error type or a correct type, where the correct type is the report type in which no error is found. The sample reports may be annotated by staff with a professional medical background; for example, for each medical report, staff use an annotation tool to mark the error type and the starting and ending positions of that error type in the report. Furthermore, to increase the diversity of the training data set, sample reports and sample labels may also be generated by construction, for example: generating sample reports containing wrongly written characters through homophone substitution, and then annotating those sample reports.
After the training data set is obtained, the plurality of sample reports in it are used as training data, and their quality inspection categories, starting positions and ending positions are used as training labels; the pre-trained language model is fine-tuned with an AdamW optimizer, where AdamW is a variant of the Adam optimizer that corrects how weight decay (L2 regularization) is applied in the original Adam optimizer, improving the generalization ability of deep learning models. The fine-tuning process specifically includes: fine-tuning the pre-trained language model with a warmup strategy and the AdamW optimizer, where the warmup strategy gradually increases the learning rate during the first few epochs of training so that the model stabilizes gradually, converges faster, and achieves a better final effect.
It will be appreciated that for each sample report of length N (meaning the sample report consists of N tokens), there is a true training label for the starting positions and a true training label for the ending positions, both of size N; that is, each token has a category label whose value lies within the set of preset error categories. Before fine-tuning the pre-trained language model, the dimensions of the true training labels are adjusted to match the model output: the labels are converted from shape [N] to shape [N, C] by One-Hot encoding, with the converted starting-position label sequence denoted Y_start and the converted ending-position label sequence denoted Y_end. Finally, loss values are calculated with the loss functions L_start = FocalLoss(P_start, Y_start) and L_end = FocalLoss(P_end, Y_end), and the network weight parameters of the pre-trained language model are updated according to these loss values, where P_start denotes the probabilities predicted by the pre-trained language model for the starting positions, P_end denotes the probabilities predicted for the ending positions, both probability sequences having dimension [N, C], N denotes the text length of the sample report, C denotes the total number of error types, Y_start denotes the One-Hot encoding of the converted starting-position label sequence, Y_end denotes the One-Hot encoding of the converted ending-position label sequence, L_start denotes the loss value of the starting positions, L_end denotes the loss value of the ending positions, and FocalLoss denotes a loss function of the Focal Loss type. The network weight parameters of the model are updated in this way until the accuracy no longer increases or the number of epochs exceeds a preset threshold, yielding the fine-tuned pre-trained language model; the preset threshold may be set according to the specific situation, for example to 100 or 1000.
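A fine-tuning sketch consistent with the description above is given below: One-Hot labels of shape [N, C], a Focal-Loss objective for each head, and an AdamW optimizer with a linear warmup schedule. The focusing parameter gamma, the learning rate, the warmup steps and the training-loop details are assumptions of the sketch, not values fixed by the embodiment.

import torch
import torch.nn.functional as F
from transformers import get_linear_schedule_with_warmup

def focal_loss(logits, one_hot_targets, gamma: float = 2.0):
    """Focal Loss over per-token category distributions; logits and one_hot_targets are [N, C]."""
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    return -((1.0 - p) ** gamma * one_hot_targets * log_p).sum(dim=-1).mean()

def fine_tune(model, dataloader, num_training_steps: int, device: str = "cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)   # assumed hyperparameters
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=500,
                                                num_training_steps=num_training_steps)
    model.train()
    for batch in dataloader:
        input_ids, attention_mask, y_start, y_end = (t.to(device) for t in batch)   # y_*: one-hot [batch, N, C]
        start_logits, end_logits = model(input_ids, attention_mask)                 # each [batch, N, C]
        loss = (focal_loss(start_logits.flatten(0, 1), y_start.flatten(0, 1)) +     # L_start
                focal_loss(end_logits.flatten(0, 1), y_end.flatten(0, 1)))          # L_end
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()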
Please refer to fig. 2, which illustrates a schematic diagram of a processing procedure of a pending report according to an embodiment of the present application; as an alternative embodiment of the step S130, the embodiment of performing quality inspection on the segmented text using the pre-trained language model may include:
Step S133: and extracting features of the segmented text by using a transducer structure layer in the pre-training language model to obtain text feature vectors.
The embodiment of step S133 described above is, for example: because the pre-training language model is obtained by bridging two linear network layers behind the unidirectional feature representation model or the bidirectional feature representation model, a transducer structure layer in the pre-training language model can adopt the unidirectional feature representation model or the bidirectional feature representation model. The text after word segmentation can be subjected to feature extraction by using a unidirectional feature representation model or a bidirectional feature representation model to obtain a text feature vector; wherein the unidirectional feature representation model is for example: ELMO model, generic language tuning model, GPT model, etc., bi-directional feature representation model such as: BERT model, spanBERT model, ALBERT model, roBERTa model, etc.
Step S134: and predicting the text feature vector by using a first linear network layer in the pre-training language model to obtain a starting position and a category probability corresponding to the starting position.
It will be appreciated that assuming that the length of the report text entered is N characters and the number of quality inspection categories is 6 (including 1 correct type and 5 error types), then the dimension of the output of each linear network layer is N x 6 and the output of each location is a vector of length 6. Wherein dimension 1 represents the probability that the location is error free, dimension 2 represents the probability that the location is of the first error type, dimension 3 represents the probability that the location is of the second error type, dimension 4 represents the probability that the location is of the third error type, dimension 5 represents the probability that the location is of the fourth error type, and dimension 6 represents the probability that the location is of the fifth error type.
Step S135: and predicting the text feature vector by using a second linear network layer in the pre-training language model to obtain the ending position and the class probability corresponding to the ending position.
Step S136: and determining the quality inspection category according to the category probability corresponding to the starting position and the category probability corresponding to the ending position.
The embodiments of steps S134 to S136 above are, for example: assume the input report text has a length of N characters and there are 6 quality inspection categories (1 correct type and 5 error types). The first linear network layer outputs, for each character, the probability that it is the starting position of an error type, and the second linear network layer outputs, for each character, the probability that it is the ending position of an error type. Taking, at each position, the category with the largest probability in each of the two outputs and then combining the two output sequences makes it possible to locate the specific error type and its position information accurately. Specifically, suppose the per-position argmax of the first linear network layer yields [0, 0, 1, 0, 2, 0] and that of the second linear network layer yields [0, 0, 0, 1, 0, 2]. Then the 3rd and 5th tokens are predicted as starting positions of errors of types 1 and 2 respectively, and the 4th and 6th tokens are predicted as ending positions of errors of types 1 and 2 respectively; combining the two outputs gives an error of type 1 spanning tokens 3 to 4 and an error of type 2 spanning tokens 5 to 6 of the text. That is, the outputs of the two linear network layers are matched to obtain the final result, and if there is no match, no error type is output.
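For illustration, a decoding sketch reproducing the example above is given next; it takes the per-token argmax of each head and pairs each predicted starting position with the first subsequent ending position of the same error type, which is one reasonable reading of the matching described above. Treating category 0 as "no error" is an assumption of the sketch.

import torch

def decode_spans(start_logits: torch.Tensor, end_logits: torch.Tensor):
    """start_logits, end_logits: [N, C] category scores for a single report."""
    start_labels = start_logits.argmax(dim=-1).tolist()   # e.g. [0, 0, 1, 0, 2, 0]
    end_labels = end_logits.argmax(dim=-1).tolist()       # e.g. [0, 0, 0, 1, 0, 2]
    spans = []
    for start_idx, category in enumerate(start_labels):
        if category == 0:                                  # category 0 assumed to mean "no error"
            continue
        # match this start with the first later (or same) end position of the same category
        for end_idx in range(start_idx, len(end_labels)):
            if end_labels[end_idx] == category:
                spans.append({"category": category, "start": start_idx, "end": end_idx})
                break
        # if no matching end position exists, the error type is not output
    return spans

For the example sequences above, this yields spans covering tokens 3 to 4 (type 1) and tokens 5 to 6 (type 2) when converted to 1-based numbering.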
As an alternative implementation manner of the step S136, the implementation manner of determining the quality inspection category according to the category probability corresponding to the start position and the category probability corresponding to the end position may include:
step S136a: and screening the maximum probability from the class probability corresponding to the starting position and the class probability corresponding to the ending position.
Step S136b: and determining the quality inspection category corresponding to the maximum probability from a plurality of preset quality inspection categories.
The embodiments of steps S136a to S136b above are, for example: assume the per-position argmax of the first linear network layer yields [0, 0, 1, 0, 2, 0] and that of the second linear network layer yields [0, 0, 0, 1, 0, 2]. Then the 3rd and 5th tokens are predicted as starting positions of errors of types 1 and 2 respectively, and the 4th and 6th tokens are predicted as ending positions of errors of types 1 and 2 respectively; combining the two outputs gives an error of type 1 spanning tokens 3 to 4 and an error of type 2 spanning tokens 5 to 6 of the text. That is, the outputs of the two linear network layers are matched to obtain the final result, and if there is no match, no error type is output.
As an optional implementation manner of the report quality inspection method, after obtaining the quality inspection category and the start position and the end position of the quality inspection category, whether to output the quality inspection category in the report to be processed may be further determined according to a preset error category, which may include:
step S140: and judging whether the quality inspection category is a preset error category or not.
Step S150: if the quality inspection category is a preset error category, outputting the quality inspection category between the starting position and the ending position in the report to be processed.
The embodiments of steps S140 to S150 above are, for example: an executable program compiled or interpreted in a preset programming language is used to judge whether the quality inspection category is a preset error category; usable programming languages include, for example, C, C++, Java, BASIC, JavaScript, LISP, Shell, Perl, Ruby, Python and PHP. Assume the original text of the report to be processed is "the patient presents with headdache symptoms" and the preset error category is spelling error; "headdache" can then be identified as a spelling error, and the error category together with its starting and ending positions is highlighted in the report to be processed (starting position: 10, ending position: 19), after which the report can be returned to the submitter of the pending report for modification. A correction algorithm can also use this feedback to quickly locate and fix the error, correcting the text to: "the patient presents with headache symptoms". In practical applications, different quality inspection categories may use different correction algorithms to locate errors and propose corrections, such as rule-based methods, statistical machine learning methods, or deep learning models.
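A sketch of this judgment and output step is given below; the mapping from category indices to category names and the output format are assumptions of the sketch, and the character offsets are assumed to come from the tokenizer's offset mapping rather than being mandated by the embodiment.

PRESET_ERROR_CATEGORIES = {1: "spelling error", 2: "punctuation error", 3: "unit error",
                           4: "non-standard expression", 5: "common-sense error"}   # assumed mapping

def report_errors(report_text: str, spans, offsets):
    """spans: decoded (category, start, end) token spans; offsets: per-token (char_start, char_end) pairs."""
    findings = []
    for span in spans:
        category = PRESET_ERROR_CATEGORIES.get(span["category"])
        if category is None:                       # not a preset error category: nothing to output
            continue
        char_start = offsets[span["start"]][0]
        char_end = offsets[span["end"]][1]
        findings.append({"category": category, "start": char_start, "end": char_end,
                         "text": report_text[char_start:char_end]})
    return findings   # e.g. [{'category': 'spelling error', 'start': 10, 'end': 19, 'text': 'headdache'}]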
As an optional implementation manner of the above method, after determining whether the quality inspection category is a category in which no error exists, the method further includes:
Step S160: if the quality inspection category is not the preset error category, the to-be-processed report is stored in a database or a file system, and the report in the database or the file system is used for printing or backtracking.
The embodiment of step S160 above is, for example: assume the original text of the report to be processed is "the patient presents with headache symptoms" and the preset error category is spelling error; the report to be processed clearly contains no error, so it can be stored in a database or a file system, and the reports in the database or file system are used for printing or retrospective review.
Please refer to fig. 3, which illustrates a flow chart of a report quality inspection apparatus according to an embodiment of the present application; the embodiment of the application provides a report quality inspection device 200, which comprises:
The report text obtaining module 210 is configured to extract text from the report to be processed, and obtain report text.
The word segmentation text obtaining module 220 is configured to segment the report text, and obtain the segmented text.
The word segmentation text quality inspection module 230 is configured to use a pre-training language model to perform quality inspection on the segmented text, and obtain a quality inspection category and a start position and an end position of the quality inspection category.
As an alternative embodiment of the above report quality testing device, the report quality testing device further includes:
and the report data acquisition module is used for acquiring a plurality of sample reports and quality inspection categories, starting positions and ending positions of the sample reports.
The language model training module is used for training the pre-training language model by taking a plurality of sample reports as training data and taking quality inspection categories, starting positions and ending positions of the sample reports as training labels.
As an optional implementation manner of the report quality inspection device, the word segmentation text quality inspection module includes:
And the feature vector obtaining sub-module is used for extracting features of the segmented text by using a transducer structure layer in the pre-training language model to obtain text feature vectors.
And the starting position obtaining sub-module is used for predicting the text feature vector by using a first linear network layer in the pre-training language model to obtain the starting position and the category probability corresponding to the starting position.
And the ending position obtaining sub-module is used for predicting the text feature vector by using a second linear network layer in the pre-training language model to obtain the ending position and the class probability corresponding to the ending position.
And the error category determination submodule is used for determining quality inspection categories according to category probabilities corresponding to the starting position and the ending position.
As an alternative embodiment of the above report quality inspection device, the error category determination submodule includes:
And the maximum probability screening unit is used for screening the maximum probability from the class probability corresponding to the starting position and the class probability corresponding to the ending position.
And the quality inspection error determining unit is used for determining the quality inspection category corresponding to the maximum probability from a plurality of preset quality inspection categories.
As an alternative embodiment of the above report quality testing device, the report quality testing device further includes:
the quality inspection category judging module is used for judging whether the quality inspection category is a preset error category or not.
And the quality inspection category output module is used for outputting the quality inspection category between the starting position and the ending position in the report to be processed if the quality inspection category is a preset error category.
As an alternative embodiment of the above report quality testing device, the report quality testing device further includes:
And the report storage printing module is used for storing the pending report into a database or a file system if the quality inspection category is not the preset error category, and the report in the database or the file system is used for printing or backtracking and checking.
It should be understood that, corresponding to the above report quality inspection method embodiment, the apparatus can perform the steps related to the above method embodiment, and specific functions of the apparatus may be referred to the above description, and detailed descriptions thereof are omitted herein as appropriate. The device includes at least one software functional module that can be stored in memory in the form of software or firmware (firmware) or cured in an Operating System (OS) of the device.
Please refer to fig. 4, which illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 300 provided in an embodiment of the present application includes: a processor 310 and a memory 320, the memory 320 storing machine-readable instructions executable by the processor 310, which when executed by the processor 310 perform the method as described above.
The embodiment of the present application also provides a computer readable storage medium 330, on which a computer program is stored which, when executed by the processor 310, performs the method above. The computer-readable storage medium 330 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The embodiment of the application also provides a computer program product, which comprises: computer programs or computer instructions which, when executed by a processor, perform the method described above.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described as different from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other. For the apparatus class embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference is made to the description of the method embodiments for relevant points.
In the embodiments of the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
In addition, the functional modules of the embodiments of the present application may be integrated together to form a single part, or the modules may exist separately, or two or more modules may be integrated to form a single part. Furthermore, in the description herein, the descriptions of the terms "one embodiment," "some embodiments," "examples," "specific examples," "some examples," and the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
The foregoing description is merely an optional implementation of the embodiment of the present application, but the scope of the embodiment of the present application is not limited thereto, and any person skilled in the art may easily think about changes or substitutions within the technical scope of the embodiment of the present application, and the changes or substitutions are covered by the scope of the embodiment of the present application.

Claims (8)

1. A method of reporting quality control, comprising:
Text extraction is carried out on the report to be processed, and a report text is obtained;
Word segmentation is carried out on the report text, and a text after word segmentation is obtained;
Performing quality inspection on the segmented text by using a pre-training language model to obtain a quality inspection category and a starting position and an ending position of the quality inspection category;
The quality inspection of the segmented text by using a pre-training language model comprises the following steps: extracting features of the segmented text by using a Transformer structural layer in the pre-training language model to obtain text feature vectors; predicting the text feature vector by using a first linear network layer in the pre-training language model to obtain a starting position and class probability corresponding to the starting position; predicting the text feature vector by using a second linear network layer in the pre-training language model to obtain an end position and class probability corresponding to the end position; determining the quality inspection category according to the category probability corresponding to the starting position and the category probability corresponding to the ending position, wherein the pre-training language model is obtained by bridging the first linear network layer and the second linear network layer behind a unidirectional characteristic representation model or a bidirectional characteristic representation model, the first linear network layer is used for outputting the category probability corresponding to the starting position and the starting position, and the second linear network layer is used for outputting the category probability corresponding to the ending position and the ending position;
Before the segmented text is subjected to quality inspection by using the pre-trained language model, the method further comprises the following steps: acquiring a plurality of sample reports and the quality inspection categories, starting positions and ending positions of the sample reports; taking the plurality of sample reports as training data and the quality inspection categories, starting positions and ending positions of the sample reports as training labels, calculating loss values with the loss functions L_start = FocalLoss(P_start, Y_start) and L_end = FocalLoss(P_end, Y_end), and updating the network weight parameters of the pre-trained language model according to the loss values until the accuracy no longer increases or the number of epochs is greater than a preset threshold, thereby obtaining the fine-tuned pre-trained language model; wherein P_start represents the probabilities predicted by the pre-trained language model for the starting positions, P_end represents the probabilities predicted by the pre-trained language model for the ending positions, the dimensions of both probability sequences being [N, C], where N represents the text length of the sample report and C represents the total number of error types; Y_start represents the One-Hot encoding of the converted starting-position label sequence, Y_end represents the One-Hot encoding of the converted ending-position label sequence, L_start represents the loss value of the starting positions, L_end represents the loss value of the ending positions, and FocalLoss represents a loss function of the Focal Loss type.
2. The method of claim 1, wherein the determining the quality inspection category according to the category probability corresponding to the start position and the category probability corresponding to the end position comprises:
Screening out the maximum probability from the class probability corresponding to the starting position and the class probability corresponding to the ending position;
and determining the quality inspection category corresponding to the maximum probability from a plurality of preset quality inspection categories.
3. The method of claim 1, further comprising, after obtaining the quality inspection category and the starting and ending positions of the quality inspection category:
judging whether the quality inspection category is a preset error category or not;
if yes, outputting the quality inspection category between the starting position and the ending position in the report to be processed.
4. The method according to claim 3, further comprising, after said determining whether the quality inspection category is the preset error category:
And if the quality inspection category is not the preset error category, storing the report to be processed into a database or a file system, wherein the report in the database or the file system is used for printing or backtracking.
5. A report quality inspection apparatus, comprising:
a report text obtaining module, used for performing text extraction on a report to be processed to obtain a report text;
a segmented text obtaining module, used for performing word segmentation on the report text to obtain a segmented text;
a segmented text quality inspection module, used for performing quality inspection on the segmented text by using a pre-trained language model to obtain a quality inspection category and a starting position and an ending position of the quality inspection category;
the quality inspection of the segmented text by using the pre-trained language model comprises the following steps: extracting features of the segmented text by using a Transformer structure layer in the pre-trained language model to obtain text feature vectors; performing prediction on the text feature vectors by using a first linear network layer in the pre-trained language model to obtain a starting position and a class probability corresponding to the starting position; performing prediction on the text feature vectors by using a second linear network layer in the pre-trained language model to obtain an ending position and a class probability corresponding to the ending position; and determining the quality inspection category according to the class probability corresponding to the starting position and the class probability corresponding to the ending position; wherein the pre-trained language model is obtained by connecting the first linear network layer and the second linear network layer after a unidirectional feature representation model or a bidirectional feature representation model, the first linear network layer is used for outputting the starting position and the class probability corresponding to the starting position, and the second linear network layer is used for outputting the ending position and the class probability corresponding to the ending position;
before the quality inspection of the segmented text by using the pre-trained language model, the method further comprises the following steps: acquiring a plurality of sample reports and the quality inspection category, starting position and ending position of each sample report; taking the sample reports as training data and the quality inspection category, starting position and ending position of each sample report as training labels, calculating a loss value by using the loss functions L_start = FL(P_start, Y_start) and L_end = FL(P_end, Y_end), and updating the network weight parameters of the pre-trained language model according to the loss value until the accuracy no longer increases or the number of training rounds exceeds a preset threshold value, so as to obtain the fine-tuned pre-trained language model; wherein P_start represents the start-position probability sequence predicted by the pre-trained language model, P_end represents the end-position probability sequence predicted by the pre-trained language model, the dimensions of both probability sequences are N × C, where N represents the text length of the sample report and C represents the total number of error types, Y_start represents the One-Hot encoding of the converted starting-position tag sequence, Y_end represents the One-Hot encoding of the converted ending-position tag sequence, L_start represents the loss value of the starting position, L_end represents the loss value of the ending position, and FL represents a loss function of the Focal Loss type.
6. An electronic device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, which, when executed by the processor, perform the method of any one of claims 1 to 4.
7. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the method of any of claims 1 to 4.
8. A computer program product, comprising: a computer program or computer instructions which, when executed by a processor, perform the method of any of claims 1 to 4.
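For orientation, the two-head start/end prediction recited in claims 1 and 2 can be illustrated with a minimal sketch. This is only an illustration under assumptions, not the patented implementation: the choice of a Hugging Face BERT-style encoder as the bidirectional feature representation model, the encoder name, the class count num_error_types, and all function and variable names are hypothetical.

```python
# Illustrative sketch (not the patented implementation) of a Transformer encoder with
# two linear heads that predict per-token class probabilities for the start and end
# positions of a quality inspection span. Encoder name and sizes are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel


class ReportQualityInspector(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-chinese", num_error_types: int = 10):
        super().__init__()
        # Bidirectional feature representation model (Transformer structure layer).
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # First linear network layer: start position and its class probabilities.
        self.start_head = nn.Linear(hidden, num_error_types)
        # Second linear network layer: end position and its class probabilities.
        self.end_head = nn.Linear(hidden, num_error_types)

    def forward(self, input_ids, attention_mask):
        # Text feature vectors, shape (batch, N, hidden).
        features = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Start/end probability sequences, each of shape (batch, N, C).
        start_probs = self.start_head(features).softmax(dim=-1)
        end_probs = self.end_head(features).softmax(dim=-1)
        return start_probs, end_probs


def decode_quality_inspection(start_probs, end_probs):
    """Pick the quality inspection category from the larger of the two maxima
    (the screening step of claim 2), plus the predicted start/end token positions."""
    s_max, s_idx = start_probs.max(dim=-1)   # best class per token, start head
    e_max, e_idx = end_probs.max(dim=-1)     # best class per token, end head
    s_prob, s_pos = s_max.max(dim=-1)        # most probable start token
    e_prob, e_pos = e_max.max(dim=-1)        # most probable end token
    category = torch.where(
        s_prob >= e_prob,
        s_idx.gather(-1, s_pos.unsqueeze(-1)).squeeze(-1),
        e_idx.gather(-1, e_pos.unsqueeze(-1)).squeeze(-1),
    )
    return category, s_pos, e_pos
```

Using two independent linear heads over shared Transformer features keeps the error type and error span coupled through the same feature vectors while letting the start and end positions be decoded independently.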
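Likewise, the training objective of claims 1 and 5 (a Focal-Loss term for the start-position probability sequence and one for the end-position probability sequence, each of dimension N × C, against One-Hot label sequences) can be sketched as follows. Combining the two terms by simple addition, the gamma value, and the function names are assumptions for illustration only.

```python
# Illustrative sketch of the Focal-Loss objective over start/end one-hot labels.
# Adding the two terms, gamma = 2.0, and all names are assumptions, not from the patent.
import torch


def focal_loss(probs: torch.Tensor, one_hot: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal Loss for an (N, C) probability sequence and its One-Hot labels."""
    eps = 1e-8
    # Probability assigned to the true class of each token.
    pt = (probs * one_hot).sum(dim=-1).clamp(min=eps)
    return (-(1.0 - pt) ** gamma * pt.log()).mean()


def training_loss(p_start, p_end, y_start, y_end):
    """Combine L_start and L_end; simple addition is an assumption here."""
    l_start = focal_loss(p_start, y_start)  # loss value of the starting position
    l_end = focal_loss(p_end, y_end)        # loss value of the ending position
    return l_start + l_end


# Example with a sample report of N = 128 tokens and C = 10 error types.
N, C = 128, 10
p_start = torch.rand(N, C).softmax(dim=-1)
p_end = torch.rand(N, C).softmax(dim=-1)
y_start = torch.nn.functional.one_hot(torch.randint(0, C, (N,)), C).float()
y_end = torch.nn.functional.one_hot(torch.randint(0, C, (N,)), C).float()
loss = training_loss(p_start, p_end, y_start, y_end)
```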
CN202410495193.6A 2024-04-24 2024-04-24 Report quality inspection method and device and electronic equipment Active CN118093527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410495193.6A CN118093527B (en) 2024-04-24 2024-04-24 Report quality inspection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410495193.6A CN118093527B (en) 2024-04-24 2024-04-24 Report quality inspection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN118093527A CN118093527A (en) 2024-05-28
CN118093527B true CN118093527B (en) 2024-08-16

Family

ID=91155591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410495193.6A Active CN118093527B (en) 2024-04-24 2024-04-24 Report quality inspection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN118093527B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118627470B (en) * 2024-08-12 2024-10-29 深圳华付技术股份有限公司 Lithium battery detection report quality inspection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559770A (en) * 2020-12-15 2021-03-26 北京邮电大学 Text data relation extraction method, device and equipment and readable storage medium
CN113268571A (en) * 2021-07-21 2021-08-17 北京明略软件系统有限公司 Method, device, equipment and medium for determining correct answer position in paragraph
CN116258136A (en) * 2023-01-10 2023-06-13 首都医科大学附属北京友谊医院 Error detection model training method, medical image report detection method, system and equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639489A (en) * 2020-05-15 2020-09-08 民生科技有限责任公司 Chinese text error correction system, method, device and computer readable storage medium
CN112420148A (en) * 2020-11-24 2021-02-26 北京一脉阳光医学信息技术有限公司 Medical image report quality control system, method and medium based on artificial intelligence
CN112836496B (en) * 2021-01-25 2024-02-13 之江实验室 Text error correction method based on BERT and feedforward neural network
CN114154485A (en) * 2021-11-05 2022-03-08 北京搜狗科技发展有限公司 Text error correction method and device
CN114610888A (en) * 2022-03-18 2022-06-10 中国科学院软件研究所 Automatic monitoring and synthesizing method for defect report of developer group chat
CN116136957B (en) * 2023-04-18 2023-07-07 之江实验室 Text error correction method, device and medium based on intention consistency

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559770A (en) * 2020-12-15 2021-03-26 北京邮电大学 Text data relation extraction method, device and equipment and readable storage medium
CN113268571A (en) * 2021-07-21 2021-08-17 北京明略软件系统有限公司 Method, device, equipment and medium for determining correct answer position in paragraph
CN116258136A (en) * 2023-01-10 2023-06-13 首都医科大学附属北京友谊医院 Error detection model training method, medical image report detection method, system and equipment

Also Published As

Publication number Publication date
CN118093527A (en) 2024-05-28

Similar Documents

Publication Publication Date Title
US11790171B2 (en) Computer-implemented natural language understanding of medical reports
CA3137096A1 (en) Computer-implemented natural language understanding of medical reports
EP3879475A1 (en) Method of classifying medical documents
US20220059200A1 (en) Deep-learning systems and methods for medical report generation and anomaly detection
CN112509661B (en) Methods, computing devices, and media for identifying physical examination reports
CN118093527B (en) Report quality inspection method and device and electronic equipment
CN110705596A (en) White screen detection method and device, electronic equipment and storage medium
CN113724819A (en) Training method, device, equipment and medium for medical named entity recognition model
CN113255583A (en) Data annotation method and device, computer equipment and storage medium
CN116611449A (en) Abnormality log analysis method, device, equipment and medium
CN112989256B (en) Method and device for identifying web fingerprint in response information
CN115082659A (en) Image annotation method and device, electronic equipment and storage medium
CN118013963B (en) Method and device for identifying and replacing sensitive words
CN118397642A (en) Bill information identification method, device and equipment based on OCR (optical character recognition) and storage medium
CN113761845A (en) Text generation method and device, storage medium and electronic equipment
CN110826616A (en) Information processing method and device, electronic equipment and storage medium
CN116258136A (en) Error detection model training method, medical image report detection method, system and equipment
CN113407719B (en) Text data detection method and device, electronic equipment and storage medium
CN113127635B (en) Data processing method, device and system, storage medium and electronic equipment
CN114781386A (en) Method and device for acquiring text error correction training corpus and electronic equipment
CN113901817A (en) Document classification method and device, computer equipment and storage medium
CN116991983B (en) Event extraction method and system for company information text
CN113469202A (en) Data processing method, electronic device and computer readable storage medium
US11978273B1 (en) Domain-specific processing and information management using machine learning and artificial intelligence models
US20240202551A1 (en) Visual Question Answering for Discrete Document Field Extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant