CN116312915A - Method and system for standardized association of drug terms in electronic medical records - Google Patents
- Publication number
- CN116312915A (application CN202310567874.4A)
- Authority
- CN
- China
- Prior art keywords
- drug
- term
- medicine
- terms
- electronic medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Public Health (AREA)
- Biophysics (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Primary Health Care (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method and a system for standardized association of drug terms in electronic medical records. The drug term library is updated through synonym mining to obtain a synonym-mining-updated drug term library, which addresses the low semantic similarity between standard drug terms in the library and external drug terms in electronic medical records. When external drug terms in the electronic medical record are associated with standard drug terms in the updated library, the semantic information includes the pinyin character sequence of each drug term in addition to its Chinese characters, and the graph structure information of both the drug term library and the external drug terms in the electronic medical record is fully used. An association prediction model based on semantic embedding and structural embedding is constructed, so that associations between external drug terms in real-world electronic medical records and standard drug terms in the drug term library are established accurately.
Description
Technical Field
The invention belongs to the technical field of medical information, and particularly relates to a method and a system for standardized association of drug terminology in an electronic medical record.
Background
With the development of information technology and its increasingly deep application in the healthcare industry, the industry has accumulated a large amount of data. This data typically includes knowledge bases (KBs), presented in a relatively standardized form, and electronic health records (EHRs), presented as real-world medical process data. A knowledge base is a technology used by computer systems to store complex structured and unstructured information, and a term base (TB) is a special type of knowledge base that stores term concepts and their related information. In the field of drug research, general drug term bases that have been built and are still being updated include DrugBank and WHODrug, and both academia and industry have also called for a Chinese drug term base. However, in real-world clinical practice, different regions, and even different hospitals and different doctors, may use a variety of names for the same drug, so an existing drug term library does not necessarily record every name of a drug. For example, the drug with DrugBank ID DB00736 has the English name "Esomeprazole magnesium"; its official Chinese generic name was revised, so the former and the current Chinese names differ although both correspond to "esomeprazole magnesium". An electronic health record system may record the drug under the former name before the revision and under the current name after it. When electronic health record data is used for real-world drug research, missing either name makes the retrieval incomplete, which leads to improper screening of the study population and misjudgment of drug use, and ultimately affects the quality of the research.
Therefore, when EHR data is used for real-world drug research, especially multi-center research involving multiple drugs, drug names in the EHRs must be associated with the corresponding drugs in a drug term library; this is an important precondition for ensuring research quality and reliable results. A drug term library is important information in medical research and engineering, and updating it in a timely manner is the basis for information exchange and technical progress in the field. Associating the drug term library with real-world electronic health record data provides underlying support for, and promotes, EHR-based research and engineering tasks such as natural language processing, artificial intelligence, expert systems, and real-world drug research.
Among existing drug association methods, "A medical standard term management system and method based on a general model" (publication number CN115080751A) concerns mapping medical record text to standard terms. The medical record text is first split with a sequence labeling model to obtain fine-grained text attributes; the similarity between the medical record text and each semantic standard word is then calculated, and the validity of a standardized mapping is judged by semantic similarity. A valid mapping is used directly as the mapping result; for an invalid one, other possible standardized mappings are recalculated and returned as algorithm-recommended results that require manual review. However, this scheme judges mapping validity by semantic similarity alone and ignores the structural characteristics of the drug term library.
"A method and a device for matching medicine names" (publication number CN112711642A) concerns drug matching across different electronic medical records. Word vectors are trained on an electronic medical record corpus, drug names are extracted based on the Unified Medical Language System, and drug entity word vectors are obtained. A neural network model yields component vectors, which are combined with engineered features to calculate the similarity between drug entities, and drugs are finally matched across electronic medical record systems. Provided the Unified Medical Language System is sufficiently complete, this scheme solves drug matching between different electronic health record data sets, but it offers only limited reference value for the problem addressed by the present invention, namely matching drug terms in electronic health records to a drug term library.
"The drug information matching method and system" (publication number 107103048B) concerns matching between drug products. Sub-information of the drug product to be matched is first obtained in multiple dimensions, such as drug name, preparation specification and dosage form. Association-degree identification is performed on the target sub-information and the standard sub-information, and when the result meets a preset association requirement, the target information and one or more pieces of standard information that meet the requirement are configured into one or more candidate information pairs. For each candidate pair, the similarity between the target information and the standard information is calculated on each dimension of sub-information, a comprehensive matching score is calculated from these similarities, and the standard information of the candidate pair with the highest comprehensive matching score is taken as the match for the target information.
"Medical drug matching methods, devices, electronic equipment and storage media" (publication number CN111798969A) concerns matching between a target drug product and a drug standard library. For the target drug product to be matched, several drug identifiers or specifications that characterize it are selected from the drug information as reference items, and each reference item is assigned a weight according to its importance. The reference items are matched against the standard items in a drug reference library to calculate comparison values, and the matching degree between the target drug product and each drug in the reference library is calculated from the comparison values and the weights, so as to establish a mapping between the identifier of the target drug product and its standard identifier stored in the reference library. Both of these schemes match a target drug product against a drug standard library. Compared with a drug term, a drug product carries more sub-information: besides the drug name, it has multi-dimensional information such as preparation specification, dosage form, manufacturer and approval number. These methods are not applicable here, because the present invention associates drug terms, for which the available text information is limited.
The limitations of the prior art are mainly as follows. First, only semantic similarity is used in the association process, and graph structure information is not used. Second, the semantic similarity does not use pinyin information. Drug names may be written with different characters that have the same pronunciation; if only the semantic similarity of the Chinese names is used, two such written variants of "cefradine" may receive only a moderate similarity score, although their pinyin shows clearly that they are the same drug, so the association result is inaccurate.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a method and a system for standardized association of drug terms in electronic medical records, which establish associations between external drug terms in electronic medical records and standard drug terms in a drug term library.
The object of the invention is achieved through the following technical scheme:
According to a first aspect of the present specification, there is provided a method for standardized association of drug terms in an electronic medical record, including:
S1, inputting a drug term library and obtaining the synonym set of each standard drug term;
S2, obtaining a synonym-mining-updated drug term library, comprising:
constructing a corpus for synonym mining, and acquiring a drug term list from the corpus;
training a synonym set classifier to obtain a classification prediction for each drug term in the drug term list against each synonym set in the drug term library, and obtaining all synonym-mining-updated synonym sets according to a preset probability threshold;
updating the drug term library according to all synonym-mining-updated synonym sets;
S3, training an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical record, comprising:
obtaining, through a pre-trained language model, the semantic embedded representation of each pair formed by an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: the external drug term and its pinyin character sequence, and the standard drug term and its pinyin character sequence, are combined with a start character and separator characters to form an associated drug term pair character sequence, which is input into the pre-trained language model to obtain the semantic embedded representation;
obtaining, through a graph convolutional neural network model, the structure embedded representation of each pair formed by an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: candidate association relationships between the external drug terms and the drug terms in the updated drug term library are established based on similarity calculation; the semantic embedded representations of the external drug terms and of the drug terms in the updated drug term library are used as the initial node embedded representations of the corresponding drug terms and input into the graph convolutional neural network model to obtain their node embedded representations; and the product of the node embedded representations of the external drug term and the standard drug term is taken as the structure embedded representation;
S4, using the association prediction model to predict the association between external drug terms in the electronic medical record and standard drug terms in the drug term library.
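The construction of the associated drug term pair character sequence in S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `[CLS]` and `[SEP]` symbols, the ordering of characters before pinyin, the trailing separator, the toy pinyin table and the two example Chinese strings are all hypothetical choices that do not come from the text above.

```python
from typing import Callable, Dict, List

# Hypothetical toy pinyin table (assumption); a real system would use a full converter.
TOY_PINYIN: Dict[str, str] = {
    "头": "tou", "孢": "bao", "拉": "la", "定": "ding", "啶": "ding",
}


def toy_pinyin(term: str) -> List[str]:
    """Map each character to a pinyin syllable; unknown characters pass through."""
    return [TOY_PINYIN.get(ch, ch) for ch in term]


def build_pair_sequence(
    external: str,
    standard: str,
    to_pinyin: Callable[[str], List[str]] = toy_pinyin,
    start: str = "[CLS]",   # assumed start symbol
    sep: str = "[SEP]",     # assumed separator symbol
) -> List[str]:
    """Combine both terms and their pinyin into one token sequence."""
    ext_tokens = list(external) + to_pinyin(external)
    std_tokens = list(standard) + to_pinyin(standard)
    return [start] + ext_tokens + [sep] + std_tokens + [sep]


# Illustrative strings only: two spellings that differ in the last character.
seq = build_pair_sequence("头孢拉啶", "头孢拉定")
same_sound = toy_pinyin("头孢拉啶") == toy_pinyin("头孢拉定")
```

In this toy case the two terms differ in one character but share an identical pinyin sequence, which is the kind of signal the pinyin portion of the input is meant to expose to the language model.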
Further, during training of the synonym set classifier, the probability that a drug term to be classified belongs to a synonym set is predicted from the change in the set uniformity score. The set uniformity score is calculated as follows: an embedded representation of each term in the set is calculated and input into a fully-connected neural network model to obtain a new term representation; all new term representations are summed to obtain an initial term set representation; and the initial term set representation is input into a fully-connected neural network model to obtain the set uniformity score.
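The set scoring described above can be sketched in plain Python. The sketch assumes a ReLU activation in the term-level layer, a single linear layer as the set-level network, and a sigmoid applied to the score difference to obtain the membership probability; none of these specifics is stated in the text, and the weights are toy values chosen only to make the example run.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector, Vector, float]


def dense_relu(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer followed by ReLU (activation is an assumption)."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def set_score(embs: List[Vector], w1: Matrix, b1: Vector,
              w2: Vector, b2: float) -> float:
    """Transform each term, sum over the set, then map the sum to a scalar."""
    transformed = [dense_relu(e, w1, b1) for e in embs]
    pooled = [sum(col) for col in zip(*transformed)]
    return sum(wi * pi for wi, pi in zip(w2, pooled)) + b2


def membership_probability(set_embs: List[Vector], term_emb: Vector,
                           params: Params) -> float:
    """Sigmoid of the score change caused by adding the term (assumed form)."""
    delta = set_score(set_embs + [term_emb], *params) - set_score(set_embs, *params)
    return 1.0 / (1.0 + math.exp(-delta))


# Toy parameters and embeddings, for illustration only.
params: Params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, -1.0], 0.0)
synset = [[1.0, 0.0], [1.0, 0.0]]
p_same = membership_probability(synset, [1.0, 0.0], params)
p_diff = membership_probability(synset, [0.0, 1.0], params)
```

Because the term representations are summed before the final layer, the score does not depend on the order in which the terms of a set are listed.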
Further, the training set of the synonym set classifier is generated as follows: one drug term is randomly extracted from a synonym set and paired with the set formed by the remaining drug terms of that synonym set to generate a positive training sample; each positive training sample is matched with a plurality of negative training samples, whose drug terms are extracted from the drug term library after the drug terms of that synonym set have been excluded.
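A minimal sketch of this sample generation is given below. The function name, the tuple layout `(remaining set, term, label)`, the number of negatives per positive and the placeholder term names are assumptions made for illustration.

```python
import random
from typing import FrozenSet, List, Set, Tuple

Sample = Tuple[FrozenSet[str], str, int]  # (remaining set, candidate term, label)


def make_samples(synsets: List[Set[str]], vocabulary: Set[str],
                 negatives_per_positive: int = 2, seed: int = 0) -> List[Sample]:
    """Build one positive and several negative samples per synonym set."""
    rng = random.Random(seed)
    samples: List[Sample] = []
    for terms in synsets:
        if len(terms) < 2:
            continue  # a single-term set leaves nothing to hold out against
        held_out = rng.choice(sorted(terms))
        rest = frozenset(terms - {held_out})
        samples.append((rest, held_out, 1))
        # Negatives come from the library after excluding this synonym set.
        pool = sorted(vocabulary - terms)
        k = min(negatives_per_positive, len(pool))
        for negative in rng.sample(pool, k):
            samples.append((rest, negative, 0))
    return samples


# Placeholder term names; the first letter marks the synonym set.
synsets = [{"A1", "A2", "A3"}, {"B1", "B2"}, {"C1"}]
vocabulary = {"A1", "A2", "A3", "B1", "B2", "C1", "D1", "D2"}
samples = make_samples(synsets, vocabulary)
```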
Further, the drug term library is updated according to all synonym-mining-updated synonym sets, specifically: if the updated synonym set belongs to a standard drug term that is a hypernym, a synonym association is established between each new synonym and that hypernym standard drug term, and a hyponym association is also established between each new synonym and all standard drug terms associated with that hypernym; if the updated synonym set belongs to a standard drug term that is not a hypernym, each new synonym is associated with that standard drug term.
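The update rule can be illustrated with a small in-memory structure. The class and method names are hypothetical, and the sketch reads "all standard drug terms associated with the hypernym" as the hyponyms of that hypernym, which is an interpretation rather than something the text states explicitly. The alias strings are placeholders.

```python
from typing import Dict, Set


class DrugTermLibrary:
    """Toy term library holding synonym and hypernym-to-hyponym links."""

    def __init__(self) -> None:
        self.synonyms: Dict[str, Set[str]] = {}
        self.hyponyms: Dict[str, Set[str]] = {}

    def add_hyponym(self, hypernym: str, hyponym: str) -> None:
        self.hyponyms.setdefault(hypernym, set()).add(hyponym)

    def add_mined_synonym(self, standard: str, synonym: str) -> None:
        """Attach a mined synonym; propagate hyponym links if the term is a hypernym."""
        self.synonyms.setdefault(standard, set()).add(synonym)
        children = self.hyponyms.get(standard, set())
        if children:
            # The new synonym inherits the hyponym links of the hypernym.
            self.hyponyms.setdefault(synonym, set()).update(children)


library = DrugTermLibrary()
library.add_hyponym("esomeprazole", "esomeprazole magnesium")
library.add_hyponym("esomeprazole", "esomeprazole sodium")
library.add_mined_synonym("esomeprazole", "alias-of-esomeprazole")
library.add_mined_synonym("esomeprazole sodium", "alias-of-esomeprazole-sodium")
```

After these calls the alias of the hypernym is linked to both salt forms, while the alias of the non-hypernym term only gains a synonym link.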
Further, the pre-trained language model is fine-tuned, specifically: the semantic embedded representation of the start character is the independent variable, and the dependent variable is a label indicating whether the external drug term in the electronic medical record and the drug term in the updated drug term library are semantically associated; a prediction based on the semantic embedded representation is obtained with a nonlinear activation function; positive training samples are obtained by manual labeling and negative training samples by random extraction to form the training set, on which the pre-trained language model is trained to optimize and adjust the semantic embedded representation.
Further, the candidate association relationship between the external drug terms and the drug terms in the updated drug term library is established as follows: the TF-IDF value of each word in a drug term is calculated to obtain a vector representation of each external drug term in the electronic medical record and of each drug term in the updated drug term library; the similarity between two drug terms is calculated; and if the similarity is greater than a preset similarity threshold, a candidate association relationship is established between the external drug term in the electronic medical record and the corresponding drug term in the drug term library.
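A minimal sketch of candidate generation follows. It assumes whitespace tokenization, a smoothed IDF of the form ln((1+N)/(1+df))+1, cosine similarity and a threshold of 0.3; the text above fixes none of these choices, and the term strings are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tfidf_vectors(terms: List[str],
                  tokenize: Callable[[str], List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector per term (smoothed IDF is an assumption)."""
    docs = [Counter(tokenize(t)) for t in terms]
    n = len(docs)
    df = Counter(tok for d in docs for tok in d)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: cnt / sum(d.values()) * idf[tok] for tok, cnt in d.items()}
            for d in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def candidate_edges(external: List[str], library: List[str], threshold: float = 0.3,
                    tokenize: Callable[[str], List[str]] = str.split
                    ) -> List[Tuple[str, str, float]]:
    """Keep (external, library, similarity) triples above the threshold."""
    vecs = tfidf_vectors(external + library, tokenize)
    ext_vecs, lib_vecs = vecs[:len(external)], vecs[len(external):]
    edges = []
    for e, ev in zip(external, ext_vecs):
        for l, lv in zip(library, lib_vecs):
            sim = cosine(ev, lv)
            if sim > threshold:
                edges.append((e, l, sim))
    return edges


edges = candidate_edges(
    ["esomeprazole magnesium enteric"],
    ["esomeprazole magnesium", "esomeprazole sodium", "cefradine"],
)
```

For Chinese drug names, a character-level or word-segmentation tokenizer would replace `str.split`; the rest of the sketch is unchanged.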
Further, the input of each layer of the graph convolutional neural network model has two parts: the first is a node embedded representation matrix and the second is an adjacency matrix. The output of each layer serves as the node embedded representation matrix of the next layer. The graph convolutional neural network model is obtained through a normalized graph Laplacian transformation and is optimized with a margin-based distance loss function.
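One common form of a margin-based distance loss is sketched below. The Euclidean distance, the triplet layout (an external term embedding, a correct standard term embedding and an incorrect one) and the margin value are assumptions; the text only states that the loss is margin-based.

```python
import math
from typing import List


def euclidean(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_loss(anchor: List[float], positive: List[float],
                negative: List[float], margin: float = 1.0) -> float:
    """Penalize cases where the positive is not closer than the negative by the margin."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))


# Toy node embeddings: the first triplet is already well separated, the second is not.
loss_ok = margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
loss_bad = margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```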
Further, the values of the adjacency matrix are set as follows: if there is an edge from one drug term to another within the updated drug term library, the corresponding value is 1, otherwise it is 0; if there is an edge from an external drug term in the electronic medical record to a drug term in the updated drug term library, the corresponding value is the similarity value of the candidate association.
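The adjacency values and one graph convolution layer can be sketched as follows. The sketch assumes the widely used propagation rule ReLU(D^(-1/2) (A + I) D^(-1/2) H W), with the adjacency matrix symmetrized and self-loops added before normalization; these details, the node names and the identity weights are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def build_adjacency(nodes: List[str], library_edges: List[Tuple[str, str]],
                    candidates: List[Tuple[str, str, float]]) -> Matrix:
    """Library edges get 1; external-to-library edges get their similarity."""
    idx = {n: i for i, n in enumerate(nodes)}
    a = [[0.0] * len(nodes) for _ in nodes]
    for src, dst in library_edges:
        a[idx[src]][idx[dst]] = 1.0
    for ext, lib, sim in candidates:
        a[idx[ext]][idx[lib]] = sim
    return a


def normalize(a: Matrix) -> Matrix:
    """Symmetrize, add self-loops, then apply D^(-1/2) A D^(-1/2) (assumed form)."""
    n = len(a)
    a_hat = [[max(a[i][j], a[j][i]) + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One layer: propagate over the graph, project, apply ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


nodes = ["ext", "std_a", "std_b"]
a = build_adjacency(nodes, [("std_a", "std_b")], [("ext", "std_a", 0.7)])
a_norm = normalize(a)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h1 = gcn_layer(a_norm, identity, identity)  # output feeds the next layer
```

In the method above, the initial node matrix would hold the semantic embedded representations rather than an identity matrix.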
Further, the semantic embedded representation and the structure embedded representation are concatenated, and the concatenated representation is input into a multi-layer perceptron comprising a plurality of fully-connected hidden layers and a single-node output layer; the output of the multi-layer perceptron is converted by a nonlinear activation function into a scalar, which gives the association probability between each external drug term in the electronic medical record and a standard drug term in the updated drug term library.
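The fusion step can be sketched in a few lines. The element-wise product used for the structure embedded representation, the ReLU hidden activation, the sigmoid output and all weights are assumptions chosen for a small runnable example.

```python
import math
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def association_probability(semantic: Vector, node_ext: Vector, node_std: Vector,
                            hidden_layers: List[Layer],
                            out_w: Vector, out_b: float) -> float:
    """Concatenate semantic and structural embeddings, run an MLP, apply sigmoid."""
    structural = [a * b for a, b in zip(node_ext, node_std)]  # element-wise product (assumed)
    x = semantic + structural                                  # concatenation
    for w, b in hidden_layers:
        x = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
             for row, bi in zip(w, b)]
    logit = sum(wi * xi for wi, xi in zip(out_w, x)) + out_b   # single-node output
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs and weights, for illustration only.
prob = association_probability(
    semantic=[1.0, 0.0],
    node_ext=[1.0, 2.0],
    node_std=[0.5, 0.5],
    hidden_layers=[([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0])],
    out_w=[1.0, -1.0],
    out_b=0.0,
)
```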
According to a second aspect of the present specification, there is provided a system for standardized association of drug terms in an electronic medical record, comprising:
an electronic medical record drug term input module, used for acquiring all external drug terms in the electronic medical record that are to be standardized;
a drug term library synonym mining update module, used for constructing a corpus and acquiring from it a drug term list for synonym mining; obtaining the synonym set of each standard drug term in the drug term library; training a synonym set classifier to obtain a classification prediction for each drug term in the drug term list against each synonym set in the drug term library; obtaining all synonym-mining-updated synonym sets according to a preset probability threshold; and updating the drug term library;
a candidate association relationship establishing module, used for establishing candidate association relationships between external drug terms in the electronic medical record and drug terms in the updated drug term library based on similarity calculation;
a semantic embedded representation module, used for taking the external drug terms in the electronic medical record and their pinyin character sequences, and the standard drug terms in the updated drug term library and their pinyin character sequences, combining them with a start character and separator characters to form associated drug term pair character sequences, and inputting these into a pre-trained language model to obtain semantic embedded representations;
a structure embedded representation module, used for taking the semantic embedded representations of the external drug terms in the electronic medical record and of the drug terms in the updated drug term library as the initial node embedded representations of the corresponding drug terms, inputting them into a graph convolutional neural network model to obtain the node embedded representations of the corresponding drug terms, and taking the product of the node embedded representations of the external drug term and the standard drug term as the structure embedded representation;
an association prediction module, used for training an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical record, and for using the association prediction model to predict the association between external drug terms in the electronic medical record and standard drug terms in the drug term library.
The beneficial effects of the invention are as follows: the invention enriches semantic information and graph structure information through synonym mining, and the association prediction model uses semantic information and graph structure information together. Specifically:
1) The drug term library is updated through synonym mining to obtain a synonym-mining-updated drug term library, which addresses the low semantic similarity between standard drug terms in the drug term library and external drug terms in the electronic medical record;
2) When external drug terms in the electronic medical record are associated with standard drug terms in the synonym-mining-updated drug term library, the semantic information includes the pinyin representation of each term in addition to its Chinese characters;
3) When external drug terms in the electronic medical record are associated with standard drug terms in the synonym-mining-updated drug term library, the graph structure information of the drug term library is fully used;
4) When external drug terms in the electronic medical record are associated with standard drug terms in the synonym-mining-updated drug term library, graph structure information for the external drug terms is obtained by linking them to drug terms in the drug term library;
5) Through the above, embedded representations of pairs of external drug terms in the electronic medical record and standard drug terms in the synonym-mining-updated drug term library are finally obtained and used for prediction by the association prediction model.
Drawings
FIG. 1 is a flowchart illustrating overall steps of a method for standardized association of drug terminology in an electronic medical record according to an exemplary embodiment;
FIG. 2 is a flowchart of a method for standardized association of drug terminology in electronic medical records provided in an exemplary embodiment;
FIG. 3 is a schematic diagram of a library of raw drug terms provided by an exemplary embodiment;
FIG. 4 is a diagram of a drug terminology library based on synonym mining updates, as provided by an example embodiment;
FIG. 5 is a block diagram of a system for standardizing association of drug terminology in electronic medical records provided in an exemplary embodiment.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
As shown in fig. 1 and fig. 2, the method for standardized association of drug terms in electronic medical records provided by the embodiment of the invention includes the following steps:
step S1: inputting a drug term library, the drug term library being expressed as $(E, R)$, wherein $E$ represents the set of drug terms, including standard drug terms that are upper-level or non-upper-level terms, and synonyms of standard drug terms; $R$ represents the set of relationships between drug terms, in particular $R = \{r_{\mathrm{hypo}}, r_{\mathrm{syn}}\}$, wherein $r_{\mathrm{hypo}}$ represents the lower-level relationship and $r_{\mathrm{syn}}$ represents the synonym relationship; a relationship between drug terms may be expressed as a triple $(h, r, t)$, where $(h, r_{\mathrm{syn}}, t)$ represents that the drug term $h$ is a synonym of $t$, and $(h, r_{\mathrm{hypo}}, t)$ represents that $t$ is a lower-level term of the drug term $h$; the drug terms whose relationship in the drug term library is $r_{\mathrm{syn}}$ are converted into synonym sets, obtaining the synonym set of each standard drug term;
FIG. 3 is an example of a drug term library, wherein the standard drug terms include esomeprazole, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate, esomeprazole magnesium trihydrate, gadopentetic acid, gadopentetic acid meglumine, latamoxef, and latamoxef sodium; the lower-level terms of esomeprazole include esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate and esomeprazole magnesium trihydrate, and each such relationship can be expressed as a triple, for example (esomeprazole, $r_{\mathrm{hypo}}$, esomeprazole strontium); the synonyms of latamoxef include an alternative Chinese name of the same drug, and its lower-level terms include latamoxef sodium; the lower-level terms of gadopentetic acid include gadopentetic acid meglumine, whose synonyms include alternative Chinese names of gadopentetic acid meglumine, and the relationship between gadopentetic acid meglumine and one of these synonyms can be expressed as (synonym, $r_{\mathrm{syn}}$, gadopentetic acid meglumine). The standard drug terms are the target association objects of the present embodiment.
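The conversion of synonym relationships into per-term synonym sets in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the relation labels `SYN` and `HYPO`, the function name `synonym_sets`, and the synonym "moxalactam" are assumptions introduced for the example.

```python
# Hypothetical relation labels; the source only distinguishes a synonym
# relationship and a lower-level relationship.
SYN, HYPO = "synonym", "hyponym"

def synonym_sets(standard_terms, triples):
    """Return {standard term: set holding the term and its synonyms}.

    A triple (h, SYN, t) reads "h is a synonym of t"; a triple
    (h, HYPO, t) reads "t is a lower-level term of h" and is ignored here.
    """
    sets = {term: {term} for term in standard_terms}
    for h, r, t in triples:
        if r == SYN and t in sets:
            sets[t].add(h)
    return sets

standard = ["latamoxef", "latamoxef sodium"]
triples = [
    ("moxalactam", SYN, "latamoxef"),         # illustrative synonym, not from the source
    ("latamoxef", HYPO, "latamoxef sodium"),
]
result = synonym_sets(standard, triples)
```

A standard term without synonyms ends up with a one-element set containing only itself, which matches the convention stated in step S21.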
Step S2: aiming at the problem of incomplete synonym relation caused by the problems of irregular translation, inaccurate manual labeling, untimely data updating and the like of an original drug term library, adopting a synonym mining method to perfect the synonym relation of the drug term library, and obtaining the drug term library based on synonym mining updating; the method specifically comprises the following substeps:
step S21: obtaining the Chinese abstracts and full texts of drug-related Chinese literature from literature retrieval platforms such as the Chinese knowledge network (CNKI) and mastership to form a synonym mining corpus $D$, and acquiring the drug terms used for synonym mining by a named entity recognition method, expressed as a drug term list $T = \{t_1, t_2, \dots, t_{|T|}\}$, wherein $t_i$ represents the $i$-th drug term and $|T|$ represents the number of drug terms in $T$; the named entity recognition method adopted in this embodiment is a conditional random field model;
All synonym sets in the drug term library are denoted $\mathcal{S} = \{ES_1, ES_2, \dots, ES_m\}$, wherein $ES_j$ is the synonym set of the $j$-th standard drug term and $m$ represents the number of synonym sets in $\mathcal{S}$; $m$ equals the number of all standard drug terms in the drug term library; if a standard drug term has no synonyms, its synonym set contains only one element, namely the standard drug term itself.
Step S22: training a synonym set classifier to obtain, for each drug term in the drug term list $T$, the classification prediction result against each synonym set in the drug term library; this specifically comprises the following substeps:
step S221: representing the synonym set classifier as $f(S, t)$, wherein $S$ represents a synonym set and $t$ represents a drug term to be classified into a synonym set;
step S222: predicting, based on the change in the set uniformity score, the probability that the drug term $t$ to be classified belongs to the synonym set $S$, which can be expressed as $\Pr(t \in S) = f(S, t) = \phi\big(q(S \cup \{t\}) - q(S)\big)$, where $\Pr$ represents the probability, $\phi$ is the sigmoid activation function, and $q$ is the set uniformity score function;
step S223: for a group of data $\{(S_i, t_i, y_i)\}_{i=1}^{N}$, wherein $(S_i, t_i)$ represents a synonym set $S_i$ and a drug term $t_i$ to be classified into $S_i$, and $y_i$ represents the label, with $y_i = 1$ representing $t_i \in S_i$ and $y_i = 0$ representing $t_i \notin S_i$; in this embodiment, the synonym set classifier uses a fully connected neural network model, and its loss function is the logarithmic loss, specifically $L(f) = -\sum_{i=1}^{N}\big[y_i \log f(S_i, t_i) + (1 - y_i)\log\big(1 - f(S_i, t_i)\big)\big]$.
The set uniformity score is computed as follows. First, for each term $s_k$ in a term set $S = \{s_1, \dots, s_n\}$, its embedded representation $\mathbf{x}_k$ is calculated by a text embedding method and used as the initialization parameter of the embedding layer; the text embedding method used in this embodiment is Word2Vec. The embedded representation is then input into the fully connected neural network model to obtain a new term representation $\mathbf{z}_k$ of the corresponding term. All new term representations of the term set $S$ are added and averaged, which ensures permutation invariance, giving the initialized term set representation $\mathbf{v}(S) = \frac{1}{n}\sum_{k=1}^{n}\mathbf{z}_k$. Finally, $\mathbf{v}(S)$ is input into the fully connected neural network model to obtain the set uniformity score $q(S)$ of the term set $S$, which measures the degree of similarity among all terms in the set.
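The scoring scheme described above (transform each term, average for permutation invariance, score the set, then pass the score difference through a sigmoid) can be sketched in plain Python. This is only an illustration under stated assumptions: the two-dimensional embeddings stand in for Word2Vec vectors, the weights are hand-picked rather than trained, and the tanh hidden activation and all function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(vec, weights, bias):
    """One fully connected layer with tanh activation (weights is a list of rows)."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

def set_score(term_set, embed, w1, b1, w2, b2):
    """Set uniformity score: transform each term, average, then read out a scalar."""
    transformed = [dense(embed[t], w1, b1) for t in term_set]
    dim = len(transformed[0])
    mean = [sum(vec[k] for vec in transformed) / len(transformed) for k in range(dim)]
    return sum(w * m for w, m in zip(w2, mean)) + b2

def membership_probability(term_set, term, embed, params):
    """Pr(t in S) = sigmoid(q(S with t added) - q(S))."""
    q_with = set_score(list(term_set) + [term], embed, *params)
    q_without = set_score(list(term_set), embed, *params)
    return sigmoid(q_with - q_without)

# Toy embeddings and untrained, hand-picked parameters (assumptions).
embed = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, -1.0], 0.0)

p_similar = membership_probability(["a"], "b", embed, params)
p_different = membership_probability(["a"], "c", embed, params)
```

With these toy values the term whose embedding is close to the set's member receives the higher probability, and reordering the set leaves the score unchanged because of the averaging step.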
In one embodiment, the training set is generated as follows:
first, the drug terms whose relationship in the drug term library is the synonym relationship are converted into synonym sets, and all synonym sets in the drug term library are denoted $\mathcal{S}$; each synonym set is denoted $ES = \{s_1, \dots, s_n\}$, wherein $s_k$ represents a drug term in the synonym set; one drug term $t_{\mathrm{pos}}$ is drawn at random from the set $ES$, and the remaining terms in $ES$ constitute the set $S_{\mathrm{pos}} = ES \setminus \{t_{\mathrm{pos}}\}$, giving a positive sample $(S_{\mathrm{pos}}, t_{\mathrm{pos}})$ for model training with label $y = 1$; each positive sample is matched with $K$ negative samples, denoted $(S_{\mathrm{pos}}, t_{\mathrm{neg}}^{k})$ for $k = 1, \dots, K$, with label $y = 0$; in this example $K = 5$; each $t_{\mathrm{neg}}^{k}$ is drawn from the drug term library after the drug terms in the synonym set $ES$ are excluded, specifically by mixing, in a set proportion, samples drawn in the following two ways: (1) completely random sampling; (2) random sampling restricted to drug terms that share at least one character with the drug terms in the set $S_{\mathrm{pos}}$; in this example, the proportion is set to 2:3.
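The sample generation described above can be sketched as follows. The sketch makes several assumptions that the source does not state: negatives are drawn with replacement, the 2:3 proportion is read as (fully random):(character-sharing), and when no outside term shares a character the function falls back to fully random sampling. The function name `make_samples` and the example library are hypothetical.

```python
import random

def make_samples(syn_set, library_terms, k=5, ratio=(2, 3), rng=None):
    """Build one positive and k negative (set, term, label) samples."""
    rng = rng or random.Random(0)
    members = sorted(syn_set)
    held_out = rng.choice(members)                      # term drawn from the set
    rest = frozenset(m for m in members if m != held_out)
    samples = [(rest, held_out, 1)]

    # Negative candidates: library terms outside the synonym set.
    outside = [t for t in library_terms if t not in syn_set]
    chars = set("".join(rest))
    similar = [t for t in outside if chars & set(t)]    # share at least one character

    n_random = k * ratio[0] // sum(ratio)               # 2 of 5 with the default ratio
    n_similar = k - n_random                            # 3 of 5 with the default ratio
    negatives = [rng.choice(outside) for _ in range(n_random)]
    pool = similar or outside                           # fallback is an assumption
    negatives += [rng.choice(pool) for _ in range(n_similar)]
    samples += [(rest, t, 0) for t in negatives]
    return samples

syn_set = {"埃索美拉唑钠", "艾司奥美拉唑钠"}
library = ["埃索美拉唑钠", "艾司奥美拉唑钠", "埃索美拉唑镁", "拉氧头孢", "钆喷酸葡胺"]
samples = make_samples(syn_set, library)
```

Restricting part of the negatives to terms that share characters with the positive set gives the classifier harder examples than purely random sampling would.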
Step S224: model training is carried out with the training set, and the trained model predicts the probability that each drug term in the drug term list $T$ belongs to each synonym set in the drug term library;
specifically, before synonym mining is performed, all synonym sets in the drug term library are denoted $\mathcal{S} = \{ES_1, \dots, ES_m\}$; for each drug term $t_i$ in $T$, the probability that it belongs to any synonym set $ES_j$ in $\mathcal{S}$ is calculated, expressed as $\Pr(t_i \in ES_j)$, and the synonym set with the maximum probability is taken as the candidate synonym set of the drug term $t_i$; a probability threshold $\theta$ is set; if the maximum probability is greater than $\theta$, the drug term $t_i$ is added to the corresponding synonym set, which is thereby updated; if the maximum probability is less than or equal to $\theta$, the drug term $t_i$ is put back into $T$; the next cycle then starts, and the process repeats until the probability that every drug term remaining in $T$ belongs to any synonym set is less than or equal to $\theta$; finally, all synonym sets based on synonym mining update are obtained, denoted $\mathcal{S}'$. In this embodiment, $\theta$ is a preset value.
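The iterative assignment loop can be sketched as below. The classifier is replaced by a toy character-overlap function so the example runs on its own; that function, the threshold value 0.3, the drug names, and the name `mine_synonyms` are illustrative assumptions rather than values from the source.

```python
def mine_synonyms(candidates, synonym_sets, prob, theta):
    """Repeatedly assign candidate terms to their best synonym set.

    prob(syn_set, term) stands in for the trained classifier; a term is
    added only when its best probability exceeds theta, otherwise it stays
    pending. Cycles continue until a full pass adds nothing.
    """
    sets = [set(s) for s in synonym_sets]
    pending = list(candidates)
    changed = True
    while pending and changed:
        changed = False
        still_pending = []
        for term in pending:
            scores = [prob(s, term) for s in sets]
            best = max(range(len(sets)), key=lambda j: scores[j])
            if scores[best] > theta:
                sets[best].add(term)
                changed = True
            else:
                still_pending.append(term)
        pending = still_pending
    return sets, pending

def char_overlap(syn_set, term):
    """Toy stand-in for the classifier: best character Jaccard overlap."""
    return max(len(set(term) & set(m)) / len(set(term) | set(m)) for m in syn_set)

sets, pending = mine_synonyms(
    ["艾司奥美拉唑", "阿司匹林"],
    [{"埃索美拉唑"}, {"拉氧头孢"}],
    char_overlap,
    theta=0.3,
)
```

Re-running the pass after each successful assignment matters because an enlarged synonym set can change the probabilities of the terms that were previously put back.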
Step S23: updating the drug term library according to all synonym sets based on synonym mining update to obtain a drug term library based on synonym mining update;
specifically, if the synonym set of a standard drug term that is an upper-level term is updated, a synonym association is established between the corresponding synonym and that standard drug term, expressed as (synonym, $r_{\mathrm{syn}}$, standard drug term); in addition, lower-level associations are established between the corresponding synonym and all non-upper-level standard drug terms associated with that standard drug term, expressed as (synonym, $r_{\mathrm{hypo}}$, lower-level standard drug term). If the synonym set of a non-upper-level standard drug term is updated, a synonym association is established between the corresponding synonym and that non-upper-level standard drug term, expressed as (synonym, $r_{\mathrm{syn}}$, standard drug term). In the example of FIG. 4, the synonym sets updated based on synonym mining are [{埃索美拉唑, 艾司奥美拉唑}, {埃索美拉唑钠, 艾司奥美拉唑钠}, …], where both names in the first set denote esomeprazole and both names in the second set denote esomeprazole sodium. The synonym 艾司奥美拉唑 is associated as a synonym with the upper-level standard drug term 埃索美拉唑 (esomeprazole), and lower-level associations are established between this synonym and the following standard drug terms associated with esomeprazole: esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate, and esomeprazole magnesium trihydrate; a synonym association is established between 艾司奥美拉唑钠 and the standard drug term 埃索美拉唑钠 (esomeprazole sodium); the drug term library based on synonym mining update is finally obtained.
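The two update rules can be illustrated with a small triple-set sketch. The relation labels, the function name `update_library`, and the triple direction (a lower-level triple `(h, HYPO, t)` meaning that `t` is a lower-level term of `h`) are assumptions made for the example; the Chinese names are two names for esomeprazole and esomeprazole sodium used purely as sample data.

```python
SYN, HYPO = "synonym", "hyponym"  # hypothetical relation labels

def update_library(triples, mined):
    """Return the triple set extended with newly mined synonyms.

    triples: iterable of (h, r, t); (h, HYPO, t) means t is a lower-level term of h.
    mined:   {standard term: set of newly mined synonyms}.
    """
    base = set(triples)
    updated = set(base)
    for standard, synonyms in mined.items():
        # Lower-level terms exist only when the standard term is an upper-level term.
        lower_terms = [t for h, r, t in base if h == standard and r == HYPO]
        for syn in synonyms:
            updated.add((syn, SYN, standard))
            for t in lower_terms:
                updated.add((syn, HYPO, t))
    return updated

library = {("埃索美拉唑", HYPO, "埃索美拉唑钠")}
mined = {"埃索美拉唑": {"艾司奥美拉唑"}, "埃索美拉唑钠": {"艾司奥美拉唑钠"}}
updated = update_library(library, mined)
```

Copying the lower-level associations onto each new synonym is what lets a mined synonym reach the same neighbors in the graph as the standard term it is attached to.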
Step S3: training a related prediction model based on semantic embedding and structural embedding according to the medicine term library based on synonym mining updating and external medicine terms in the real-world electronic medical record data; the method specifically comprises the following substeps:
step S31: obtaining, through a pre-trained language model, the semantic embedded representation of each pair formed by an external drug term in the electronic medical record and a standard drug term in the drug term library based on synonym mining update; the pre-trained language model adopted in this embodiment is a BERT model;
In particular, the set of external drug terms in the real-world electronic medical record is denoted $G$, and the set of standard drug terms in the drug term library based on synonym mining update is denoted $E$; the pinyin character sequence of any external drug term $g \in G$ is denoted $g^{\mathrm{py}}$, and the pinyin character sequence of any standard drug term $e \in E$ is denoted $e^{\mathrm{py}}$; $g$, $g^{\mathrm{py}}$, $e$ and $e^{\mathrm{py}}$ are combined with the start character [CLS] and the separator character [SEP], and the resulting associated drug term pair character sequence is denoted $\{[\mathrm{CLS}], g, g^{\mathrm{py}}, [\mathrm{SEP}], e, e^{\mathrm{py}}, [\mathrm{SEP}]\}$. Taking $g$ = 艾司奥美拉唑钠 and $e$ = 埃索美拉唑钠 (two Chinese names of esomeprazole sodium) as an example, the associated drug term pair character sequence can be expressed as {[CLS][艾][司][奥][美][拉][唑][钠][ai][si][ao][mei][la][zuo][na][SEP][埃][索][美][拉][唑][钠][ai][suo][mei][la][zuo][na][SEP]}; in this embodiment, a BERT model pre-trained on a Chinese corpus is adopted, the semantic embedded representation of the associated drug term pair character sequence is obtained through the multiple bidirectional Transformer encoding layers, and finally the semantic embedded representation of the start character [CLS] is used to represent the relationship between $g$ and $e$.
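Building the pair character sequence can be sketched as follows. The tiny character-to-pinyin table covers only the example terms and is an assumption; a real system would rely on a complete pinyin dictionary or library. The function names are hypothetical.

```python
# Hypothetical pinyin table covering only the characters of the example.
PINYIN = {"艾": "ai", "司": "si", "奥": "ao", "美": "mei", "拉": "la",
          "唑": "zuo", "钠": "na", "埃": "ai", "索": "suo"}

def term_tokens(term):
    """Chinese character tokens followed by the pinyin token of each character."""
    chars = list(term)
    return chars + [PINYIN[c] for c in chars]

def pair_sequence(external, standard):
    """[CLS] external + pinyin [SEP] standard + pinyin [SEP]."""
    return (["[CLS]"] + term_tokens(external) + ["[SEP]"]
            + term_tokens(standard) + ["[SEP]"])

seq = pair_sequence("艾司奥美拉唑钠", "埃索美拉唑钠")
```

The two terms share only four of their characters, but five of their pinyin tokens coincide (ai, mei, la, zuo, na), which is the kind of extra signal the pinyin tokens are meant to provide.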
In this embodiment, the BERT model is fine-tuned, specifically: the semantic embedded representation of the start character [CLS] is taken as the independent variable and denoted $\mathbf{h}_{\mathrm{sem}}$; the dependent variable is the label $y$ indicating whether the external drug term $g$ and the standard drug term $e$ in the drug term library based on synonym mining update are semantically associated, with $y = 1$ if a semantic association exists and $y = 0$ otherwise; a nonlinear activation function $\sigma$ yields the prediction result $\hat{y}$ based on the BERT semantic embedded representation; this embodiment uses the sigmoid activation function, expressed as $\sigma(x) = \frac{1}{1 + e^{-x}}$; the loss function is the binary cross-entropy loss, expressed as $L = -\big[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big]$; positive training samples are obtained by manual labeling and negative training samples by random sampling to form the training set, on which the BERT model is trained so as to optimize the BERT semantic embedded representation.
Step S32: obtaining, through a graph convolutional neural network model, the structure embedded representation of each pair formed by an external drug term in the electronic medical record and a standard drug term in the drug term library based on synonym mining update;
specifically, candidate associations are established between the external drug terms in the real-world electronic medical record and the drug terms in the drug term library based on synonym mining update: the TF-IDF value of each word in each drug term is calculated, giving a vector representation of each external drug term in the electronic medical record and of each drug term in the drug term library based on synonym mining update; the similarity between the two is calculated with cosine similarity, and a similarity threshold is set; if the similarity is greater than the similarity threshold, a candidate association is established between the external drug term in the electronic medical record and the corresponding drug term in the drug term library, expressed as an edge from the external drug term to the library drug term. As shown, for example, in FIG. 4, esomeprazole establishes a candidate association with esomeprazole magnesium, which establishes a candidate association with esomeprazole magnesium.
Each external drug term in the electronic medical record is converted into a single-term character sequence beginning with the start character [CLS], and each drug term in the drug term library based on synonym mining update is converted into a sequence of the same form; the BERT model trained in step S31 is used to calculate the semantic embedded representation of each sequence, and the semantic embedded representation corresponding to the start character [CLS] is taken as the initialized node embedded representation of the corresponding drug term;
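The candidate association step can be sketched with a small pure-Python TF-IDF and cosine similarity. Several choices here are assumptions not fixed by the source: each Chinese character is treated as one "word", the IDF uses a smoothed variant, document frequencies are computed over the external and library terms together, and the threshold 0.3 is arbitrary. All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(terms):
    """Character-level TF-IDF vectors with a smoothed IDF (an assumption)."""
    n = len(terms)
    df = Counter(ch for t in terms for ch in set(t))
    vectors = {}
    for t in terms:
        tf = Counter(t)
        vectors[t] = {ch: (c / len(t)) * (math.log((1 + n) / (1 + df[ch])) + 1.0)
                      for ch, c in tf.items()}
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(ch, 0.0) for ch, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def candidate_edges(external_terms, library_terms, threshold):
    """Map (external term, library term) to similarity when above the threshold."""
    vecs = tfidf_vectors(list(external_terms) + list(library_terms))
    edges = {}
    for g in external_terms:
        for e in library_terms:
            sim = cosine(vecs[g], vecs[e])
            if sim > threshold:
                edges[(g, e)] = sim
    return edges

edges = candidate_edges(["艾司奥美拉唑镁"], ["埃索美拉唑镁", "拉氧头孢"], threshold=0.3)
```

The similarity stored on each edge is kept because it is reused later as the edge weight between an external term and a library term in the adjacency matrix.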
The initialized node embedded representations are input into the graph convolutional neural network model. Specifically, the graph convolutional neural network model comprises $L$ layers, with $L = 10$ in this embodiment. The input of the $l$-th layer comprises two parts: the first part is the node embedded representation matrix $H^{(l)}$ of dimension $n \times d^{(l)}$, wherein $n$ represents the number of nodes, which is the total number of external drug terms in the electronic medical record and drug terms in the drug term library based on synonym mining update, and $d^{(l)}$ represents the node embedded representation dimension of the $l$-th layer; the second part is the adjacency matrix $A$ of dimension $n \times n$. The output of the $l$-th layer serves as the node embedded representation matrix of layer $l + 1$ and is obtained through the normalized graph Laplacian transformation:
$$H^{(l+1)} = \sigma\big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\big)$$
wherein $\sigma$ is a nonlinear activation function, for which a sigmoid activation function can be used; $\tilde{A} = A + I$, where $I$ is the identity matrix; $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$; and $W^{(l)}$ is the weight matrix of the $l$-th layer;
the adjacency matrix $A$ takes the following values: if there is an edge from a drug term $e_i$ to a drug term $e_j$ in the drug term library based on synonym mining update, then $A_{ij}$ is 1, otherwise it is 0; if there is an edge from an external drug term $g_i$ in the electronic medical record to a drug term $e_j$ in the drug term library based on synonym mining update, then $A_{ij}$ is the similarity value of the candidate association;
the output of the final ($L$-th) layer of the graph convolutional neural network model is taken as the node embedded representations of the drug terms in the drug term library based on synonym mining update and of the external drug terms in the electronic medical record; from these, the node embedded representation $\mathbf{h}_e$ of each standard drug term $e$ in the drug term library based on synonym mining update and the node embedded representation $\mathbf{h}_g$ of each external drug term $g$ in the electronic medical record are obtained, and the product of the two node embedded representations is taken as the structure embedded representation of the association between $e$ and $g$, denoted $\mathbf{h}_{\mathrm{str}}$;
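One propagation step of the graph convolution and the final product of node embeddings can be sketched in plain Python. This is a single-layer toy with an identity weight matrix, not the ten-layer trained model; the three-node graph, the edge weight 0.46, the use of row sums as degrees for a directed adjacency matrix, and the element-wise reading of "product" are all assumptions.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w):
    """One step: sigmoid(D^-1/2 (A + I) D^-1/2 H W), with D from row sums of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    z = matmul(matmul(a_hat, h), w)
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in z]

def structure_embedding(h_external, h_standard):
    """Element-wise product of two node embeddings (one reading of 'product')."""
    return [x * y for x, y in zip(h_external, h_standard)]

# Nodes 0 and 1 are library terms, node 2 is an external term.
adj = [[0.0, 1.0, 0.0],    # library edge from node 0 to node 1, weight 1
       [0.0, 0.0, 0.0],
       [0.0, 0.46, 0.0]]   # candidate association, weighted by similarity
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # stand-ins for [CLS] embeddings
w0 = [[1.0, 0.0], [0.0, 1.0]]               # untrained identity weights
h1 = gcn_layer(adj, h0, w0)
h_str = structure_embedding(h1[2], h1[1])
```

Adding the identity matrix before normalizing keeps each node's own features in the update and guarantees a nonzero degree for every node.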
The graph convolutional neural network model is optimized with a margin-based distance loss function, whose formula is as follows:
$$L = \sum_{(e, g) \in \Omega^{+}} \sum_{(e', g') \in \Omega^{-}} \max\big(0,\ \gamma + d(e, g) - d(e', g')\big)$$
wherein $d(e, g)$ represents the distance function of the structure embedded representations of a standard drug term $e$ in the drug term library based on synonym mining update and an external drug term $g$ in the electronic medical record; $\gamma$ is the margin hyperparameter that separates positive and negative samples; $\Omega^{+}$ and $\Omega^{-}$ represent the positive and negative sample sets respectively; the distance function used in this embodiment is the Euclidean distance, i.e. $d(e, g) = \lVert \mathbf{h}_e - \mathbf{h}_g \rVert_2$; in this embodiment, $\gamma$ takes a fixed value.
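A margin-based distance loss of this kind can be sketched as below. The pairing of every positive pair with every negative pair, the margin value 1.0, and the function names are assumptions; the source only states that the loss is margin-based and uses the Euclidean distance.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_loss(pos_pairs, neg_pairs, gamma):
    """Sum of max(0, gamma + d(positive) - d(negative)) over all pairings."""
    return sum(max(0.0, gamma + euclidean(*p) - euclidean(*q))
               for p in pos_pairs for q in neg_pairs)

pos = [([0.0, 0.0], [0.0, 1.0])]      # associated pair, distance 1.0
near_neg = [([0.0, 0.0], [0.0, 1.5])]  # unassociated pair, distance 1.5
far_neg = [([0.0, 0.0], [3.0, 4.0])]   # unassociated pair, distance 5.0

loss_near = margin_loss(pos, near_neg, gamma=1.0)
loss_far = margin_loss(pos, far_neg, gamma=1.0)
```

The loss is zero once a negative pair is farther apart than the positive pair by at least the margin, so training effort concentrates on negatives that are still too close.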
Step S33: the semantic embedded representation $\mathbf{h}_{\mathrm{sem}}$ output in step S31 and the structure embedded representation $\mathbf{h}_{\mathrm{str}}$ output in step S32 are concatenated, denoted $[\mathbf{h}_{\mathrm{sem}}; \mathbf{h}_{\mathrm{str}}]$, and used as the input of a multi-layer perceptron; the multi-layer perceptron comprises multiple fully connected hidden layers and a single-node output layer, whose output is denoted $o$; the output of the multi-layer perceptron is converted into a scalar by a nonlinear activation function $\sigma$, finally giving as output the association probability $\hat{p} = \sigma(o)$ between the external drug term $g$ in the electronic medical record and the standard drug term $e$ in the drug term library based on synonym mining update; this embodiment uses the sigmoid activation function, expressed as $\sigma(x) = \frac{1}{1 + e^{-x}}$; the loss function is the same binary cross-entropy loss as used for the BERT model, expressed as $L = -\big[y \log \hat{p} + (1 - y)\log(1 - \hat{p})\big]$, wherein $y$ is the label indicating whether the external drug term $g$ and the standard drug term $e$ are associated, with $y = 1$ if $g$ and $e$ are associated and $y = 0$ otherwise.
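The final prediction step (concatenate, pass through a multi-layer perceptron, apply a sigmoid, score with binary cross-entropy) can be sketched as follows. The single hidden layer, its ReLU activation, the hand-picked weights, and the tiny input vectors are assumptions for illustration; the source specifies only multiple fully connected hidden layers and a single-node output layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_probability(semantic, structural, hidden_w, hidden_b, out_w, out_b):
    """Concatenate both embeddings, apply one ReLU hidden layer, then a sigmoid output."""
    x = list(semantic) + list(structural)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)

def binary_cross_entropy(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Toy inputs and untrained, hand-picked weights (assumptions).
p = mlp_probability(
    semantic=[1.0, 0.0],
    structural=[0.5],
    hidden_w=[[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
    hidden_b=[0.0, 0.0],
    out_w=[1.0, -1.0],
    out_b=0.0,
)
loss = binary_cross_entropy(1, p)
```

Concatenation lets the perceptron weigh the semantic and structural signals against each other instead of fixing their relative importance in advance.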
Step S4: the association prediction model is used to predict the association result between the external drug terms in the electronic medical record and the standard drug terms in the drug term library, thereby establishing the association between the external drug terms in the real-world electronic medical record and the standard drug terms in the drug term library.
As shown in fig. 5, the present invention further provides an embodiment of a system for standardized association of drug terms in electronic medical records implemented based on the above method, where the system includes:
the electronic medical record drug term input module is used for acquiring all external drug terms to be subjected to drug term standardization in the electronic medical record;
the drug term library synonym mining and updating module is used for constructing a corpus and acquiring from it a drug term list for synonym mining; obtaining the synonym set of each standard drug term in the drug term library; training a synonym set classifier to obtain the classification prediction result of each drug term in the drug term list against the synonym sets in the drug term library; obtaining all synonym sets based on synonym mining update according to a preset probability threshold; and updating the drug term library;
The candidate association relation establishing module is used for establishing candidate association relation between external medicine terms in the electronic medical record and the medicine terms in the updated medicine term library based on similarity calculation;
the semantic embedded representation module is used for inputting external medicine terms and pinyin character sequences thereof in the electronic medical record, standard medicine terms and pinyin character sequences thereof in the updated medicine term library, combining initial characters and separation characters to form a related medicine term pair character sequence, and inputting a pre-training language model to obtain semantic embedded representation;
the structure embedding representation module is used for taking the semantic embedded representations of the external drug terms in the electronic medical record and of the drug terms in the updated drug term library as the initialized node embedded representations of the corresponding drug terms, inputting them into a graph convolutional neural network model to obtain the node embedded representations of the corresponding drug terms, and taking the product of the node embedded representations of an external drug term and a standard drug term as the structure embedded representation;
the association prediction module is used for training an association prediction model based on semantic embedding and structure embedding according to the updated drug term library and the external drug terms in the electronic medical record, and for predicting, with the association prediction model, the association result between the external drug terms in the electronic medical record and the standard drug terms in the drug term library.
Corresponding to the embodiment of the method for standardizing and associating the drug terminology in the electronic medical record, the invention also provides an embodiment of the device for standardizing and associating the drug terminology in the electronic medical record. The device for standardizing and associating the medicine terms in the electronic medical record provided by the embodiment of the invention comprises a memory and one or more processors, wherein executable codes are stored in the memory, and the processors are used for realizing the method for standardizing and associating the medicine terms in the electronic medical record in the embodiment when executing the executable codes.
The embodiment of the invention also provides a computer readable storage medium, wherein a program is stored in the computer readable storage medium, and when the program is executed by a processor, the method for standardizing and associating the drug terms in the electronic medical record in the embodiment is realized.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the previous embodiments. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer readable storage medium is used for storing the computer program and the other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
The foregoing description of the preferred embodiments is merely intended to illustrate the embodiments of the present invention and is not intended to limit the invention to the particular embodiments described.
Claims (10)
1. The standardized association method for the drug terminology in the electronic medical record is characterized by comprising the following steps of:
s1, inputting a drug term library to obtain a synonym set of each standard drug term;
s2, obtaining a drug term library based on synonym mining update, comprising:
constructing a corpus used for synonym mining, and acquiring a drug term list from the corpus;
training a synonym set classifier to obtain a classification prediction result of each drug term in the drug term list and the synonym set in the drug term library, and obtaining all synonym sets based on synonym mining update according to a preset probability threshold;
updating the drug term library according to all synonym sets based on synonym mining update;
s3, training a correlation prediction model based on semantic embedding and structural embedding according to the updated drug term library and external drug terms in the electronic medical record, wherein the method comprises the following steps:
The semantic embedded representation of standard drug term pairs in an external drug term and updated drug term library in the electronic medical record is obtained through a pre-training language model, and specifically comprises the following steps: the external medicine terms and their pinyin character sequences, the standard medicine terms and their pinyin character sequences are combined with the initial characters and the separation characters to form the related medicine term pair character sequences, and the related medicine term pair character sequences are input into a pre-training language model to obtain semantic embedded representations;
obtaining the structure embedded representation of the external medicine term and the standard medicine term pair in the updated medicine term library in the electronic medical record through a graph convolution neural network model, wherein the structure embedded representation specifically comprises the following steps: establishing candidate association relation between the external medicine term and the medicine term in the updated medicine term library based on similarity calculation, respectively taking semantic embedded representations of the external medicine term and the medicine term in the updated medicine term library as initialized node embedded representations of corresponding medicine terms, inputting a graph convolutional neural network model to obtain node embedded representations of the corresponding medicine terms, and taking the product of the node embedded representations of the external medicine term and the standard medicine term as a structure embedded representation;
S4, predicting and obtaining a correlation result of the external drug term and the standard drug term in the drug term library in the electronic medical record by using the correlation prediction model.
2. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein in the training process of the synonym set classifier, the probability that the drug terms to be categorized belong to the synonym set is predicted based on the change of the set uniformity score, and the method for calculating the set uniformity score is as follows: calculating an embedded representation of each term in the set, inputting the embedded representation into the fully-connected neural network model to obtain a new term representation, summing all the new term representations to obtain an initialized term set representation, and inputting the initialized term set representation into the fully-connected neural network model to obtain a set uniformity score.
3. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the training set generation mode of the synonym set classifier comprises: extracting a drug term from the synonym set in a random extraction mode, and generating a positive training sample by combining a set formed by the rest drug terms in the synonym set; for each positive training sample, matching a plurality of negative training samples, the negative training samples being extracted from a drug term library after the drug terms in the synonym set are excluded.
4. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the updating of the drug term library according to all synonym sets based on synonym mining update is specifically: if the synonym set of the standard drug term as the upper language is updated, establishing synonym association between the corresponding synonym and the standard drug term as the upper language, and simultaneously establishing lower language association between the corresponding synonym and all the standard drug terms associated with the standard drug term as the upper language; if the synonym set of the standard medicine terms of the non-upper languages is updated, corresponding synonyms are associated with the standard medicine terms of the non-upper languages.
5. The method for standardized association of drug terminology in electronic medical records according to claim 1, wherein the pre-training language model is adjusted, specifically: the semantic embedded representation of the initial character is used as an independent variable, and the dependent variable is a label of whether the semantic association of the external medicine term in the electronic medical record and the medicine term in the updated medicine term library is carried out; acquiring a prediction result based on semantic embedded representation by adopting a nonlinear activation function; and (3) acquiring a positive training sample by adopting a manual labeling mode, acquiring a negative training sample by adopting a random extraction mode, acquiring a training set, training a pre-training language model, and optimizing and adjusting semantic embedding representation.
6. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the candidate association relationship is established between the external drug terms and the drug terms in the updated drug term library, specifically: calculating the TF-IDF value of each word in the medicine term, obtaining vector representation of the external medicine term in the electronic medical record and each medicine term in the updated medicine term library, calculating the similarity between the two medicine terms, and if the similarity is larger than a preset similarity threshold, establishing a candidate association relationship between the external medicine term in the electronic medical record and the medicine term in the corresponding medicine term library.
7. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the input of each layer in the graph convolutional neural network model comprises two parts, the first part being a node embedded representation matrix and the second part being an adjacency matrix; the output of each layer, obtained through a normalized graph Laplacian transformation, is used as the node embedded representation matrix of the next layer; and the graph convolutional neural network model is optimized by using a margin-based distance loss function.
8. The method for standardized association of drug terms in electronic medical records according to claim 7, wherein the values of the adjacency matrix are specifically: if there is an edge from one drug term to another in the updated drug term library, the corresponding value is 1, otherwise the value is 0; if there is an edge from an external drug term in the electronic medical record to a drug term in the updated drug term library, the corresponding value is the similarity value of the candidate association relationship.
9. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the semantic embedded representation and the structural embedded representation are concatenated, the concatenated representation is input into a multi-layer perceptron comprising a plurality of fully connected hidden layers and a single-node output layer, and the output of the multi-layer perceptron is converted into a scalar by a nonlinear activation function to obtain the association probability between each external drug term in the electronic medical record and the standard drug terms in the updated drug term library.
10. A system for standardized association of drug terms in electronic medical records, implemented based on the method of any one of claims 1-9, comprising:
an electronic medical record drug term input module, used for acquiring all external drug terms in the electronic medical record that are to undergo drug term standardization;
a drug term library synonym mining and updating module, used for constructing a corpus and acquiring from the corpus a drug term list for synonym mining; obtaining the synonym set of each standard drug term in the drug term library; training a synonym set classifier to obtain a classification prediction result for each drug term in the drug term list against the synonym sets in the drug term library; obtaining, according to a preset probability threshold, all synonym sets updated by synonym mining; and updating the drug term library;
a candidate association relationship establishing module, used for establishing, based on similarity calculation, candidate association relationships between the external drug terms in the electronic medical record and the drug terms in the updated drug term library;
a semantic embedded representation module, used for taking as input the external drug terms in the electronic medical record and their pinyin character sequences, together with the standard drug terms in the updated drug term library and their pinyin character sequences, combining them with an initial character and separation characters to form the character sequence of an associated drug term pair, and inputting that sequence into the pre-trained language model to obtain the semantic embedded representation;
a structural embedded representation module, used for taking the semantic embedded representations of the external drug terms in the electronic medical record and of the drug terms in the updated drug term library as the initial node embedded representations of the corresponding drug terms, inputting them into the graph convolutional neural network model to obtain the node embedded representations of the corresponding drug terms, and taking the product of the node embedded representations of an external drug term and a standard drug term as the structural embedded representation;
an association prediction module, used for training an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical record, and for using the association prediction model to predict the association between the external drug terms in the electronic medical record and the standard drug terms in the drug term library.
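The library update rule of claim 4 can be illustrated with a minimal Python sketch. The `TermLibrary` class, its field names, and the example drug names are hypothetical and do not come from the patent. The sketch also assumes that "the standard drug terms associated with the hypernym" are the terms recorded under that hypernym, which the claim does not spell out.

```python
from collections import defaultdict


class TermLibrary:
    """Toy drug term library; this structure is an assumption for illustration."""

    def __init__(self):
        self.synonyms = defaultdict(set)       # standard term -> its synonyms
        self.hyponyms = defaultdict(set)       # hypernym term -> associated standard terms
        self.hyponym_links = defaultdict(set)  # mined synonym -> standard terms linked as hyponyms

    def is_hypernym(self, term):
        # .get() avoids creating an empty entry for unknown terms.
        return bool(self.hyponyms.get(term))

    def apply_mined_synonyms(self, term, mined):
        """Apply the claim 4 rule to the newly mined synonyms of one standard term."""
        for synonym in mined:
            # Both branches of the rule record the synonym association.
            self.synonyms[term].add(synonym)
            if self.is_hypernym(term):
                # Hypernym branch: also link the synonym to every standard
                # term associated with the hypernym.
                self.hyponym_links[synonym] |= self.hyponyms[term]


# Hypothetical example data.
lib = TermLibrary()
lib.hyponyms["penicillin"] = {"amoxicillin", "ampicillin"}
lib.apply_mined_synonyms("penicillin", ["penicillin-class antibiotic"])
lib.apply_mined_synonyms("amoxicillin", ["amoxil"])
```

Keeping the hyponym links in a separate mapping makes the two branches easy to tell apart: a synonym mined for a non-hypernym term only ever appears in `synonyms`.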
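The candidate association step of claim 6 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: whitespace tokenization, a smoothed IDF variant, cosine similarity, and the default threshold of 0.5 are all choices made here, since the claim only requires TF-IDF vectors, a similarity measure, and a preset threshold. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(term):
    # Assumption: lowercase whitespace tokens; Chinese terms would need a segmenter.
    return term.lower().split()


def tfidf_vectors(terms):
    """Return one sparse TF-IDF vector (a dict) per term."""
    docs = [Counter(tokenize(t)) for t in terms]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct token once.
    df = Counter(tok for doc in docs for tok in doc)
    # Smoothed IDF (an assumed variant) keeps every weight positive.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    return [
        {tok: (cnt / sum(doc.values())) * idf[tok] for tok, cnt in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_associations(external_terms, library_terms, threshold=0.5):
    """Return (external, library, similarity) triples whose similarity exceeds the threshold."""
    vectors = tfidf_vectors(external_terms + library_terms)
    ext_vecs = vectors[: len(external_terms)]
    lib_vecs = vectors[len(external_terms):]
    pairs = []
    for ext, ext_vec in zip(external_terms, ext_vecs):
        for std, std_vec in zip(library_terms, lib_vecs):
            sim = cosine(ext_vec, std_vec)
            if sim > threshold:
                pairs.append((ext, std, sim))
    return pairs
```

Calling `candidate_associations(["amoxicillin capsule"], ["amoxicillin capsule", "ibuprofen tablet"])` keeps only the first library term, because the second shares no token with the external term and therefore has similarity 0.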
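Claims 7 and 8 describe the graph convolution layer and its adjacency matrix. The sketch below is one possible reading, written in plain Python so that it runs without dependencies. The form D^-1/2 (A + I) D^-1/2 with self-loops, the ReLU activation, the Euclidean distance, and every function name are assumptions: the claims only mention a normalized graph Laplacian transformation and a margin-based distance loss.

```python
import math


def build_adjacency(nodes, library_edges, candidate_edges):
    """Adjacency values per claim 8: 1 for library edges, the similarity for candidate edges."""
    index = {name: i for i, name in enumerate(nodes)}
    a = [[0.0] * len(nodes) for _ in nodes]
    for src, dst in library_edges:
        a[index[src]][index[dst]] = 1.0
    for src, dst, sim in candidate_edges:
        a[index[src]][index[dst]] = sim
    return a


def normalize(a):
    """D^-1/2 (A + I) D^-1/2, with D the row-sum degree of A + I (assumed form)."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Self-loops guarantee every row sum is at least 1, so the root is defined.
    d = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[d[i] * a_hat[i][j] * d[j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(h, a_norm, w):
    """One layer: ReLU(A_norm @ H @ W); the output serves as H for the next layer."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def euclidean(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def margin_loss(anchor, positive, negative, margin=1.0):
    """Margin-based distance loss: max(0, margin + d(anchor, pos) - d(anchor, neg))."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))


# Hypothetical three-node graph: one external term and two library terms.
nodes = ["ext", "std_a", "std_b"]
adjacency = build_adjacency(nodes, [("std_a", "std_b")], [("ext", "std_a", 0.8)])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
output = gcn_layer(identity, normalize(adjacency), identity)
```

In the method itself the initial node embeddings would be the semantic embedded representations rather than an identity matrix; the identity is used here only to keep the example small.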
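The fusion step of claim 9 can be sketched as a small forward pass. The claim specifies concatenation, fully connected hidden layers, a single-node output layer, and a nonlinear activation that yields a scalar probability. The ReLU hidden activation, the sigmoid output, the function names, and the hand-picked weights below are assumptions for illustration only.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def association_probability(semantic, structural, hidden_layers, out_weights, out_bias):
    """Concatenate both embeddings, run the hidden layers, and squash the single output."""
    x = list(semantic) + list(structural)  # concatenated representation
    for weights, bias in hidden_layers:
        x = [max(0.0, v) for v in dense(x, weights, bias)]  # ReLU (assumed)
    logit = dense(x, [out_weights], [out_bias])[0]  # single-node output layer
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid (assumed)


# Hypothetical embeddings and weights, chosen so the logit is exactly 0.
semantic_embedding = [1.0, 0.0]
structural_embedding = [0.5]
hidden = [([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]], [0.0, 0.0])]
probability = association_probability(
    semantic_embedding, structural_embedding, hidden, [1.0, -1.0], 0.0
)
```

A sigmoid is a natural choice here because the result is read as a probability, but any activation that maps the output into the interval from 0 to 1 would satisfy the wording of the claim.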
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310567874.4A CN116312915B (en) | 2023-05-19 | 2023-05-19 | Method and system for standardized association of drug terms in electronic medical records |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310567874.4A CN116312915B (en) | 2023-05-19 | 2023-05-19 | Method and system for standardized association of drug terms in electronic medical records |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116312915A (en) | 2023-06-23 |
CN116312915B (en) | 2023-09-19 |
Family
ID=86781981
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310567874.4A Active CN116312915B (en) | 2023-05-19 | 2023-05-19 | Method and system for standardized association of drug terms in electronic medical records |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116312915B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118210960A (en) * | 2023-12-13 | 2024-06-18 | 西湖大学 | Construction and use method of natural medicinal material special domain knowledge base |
CN118227776A (en) * | 2024-05-23 | 2024-06-21 | 四川省肿瘤医院 | Disease science popularization method and system based on artificial intelligence |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544383A (en) * | 2013-10-10 | 2014-01-29 | 中国中医科学院 | Standard-term-based fast EMR (electronic medical record) entry system |
US20140149103A1 (en) * | 2010-05-26 | 2014-05-29 | Warren Daniel Child | Modular system and method for managing chinese, japanese, and korean linguistic data in electronic form |
US9436760B1 (en) * | 2016-02-05 | 2016-09-06 | Quid, Inc. | Measuring accuracy of semantic graphs with exogenous datasets |
US20170024461A1 (en) * | 2015-07-23 | 2017-01-26 | International Business Machines Corporation | Context sensitive query expansion |
CN106383853A (en) * | 2016-08-30 | 2017-02-08 | 刘勇 | Realization method and system for electronic medical record post-structuring and auxiliary diagnosis |
CN106897568A (en) * | 2017-02-28 | 2017-06-27 | 北京大数医达科技有限公司 | The treating method and apparatus of case history structuring |
CN107665190A (en) * | 2017-09-29 | 2018-02-06 | 李晓妮 | A kind of method for automatically constructing and device of text proofreading mistake dictionary |
CN111460175A (en) * | 2020-04-08 | 2020-07-28 | 福州数据技术研究院有限公司 | SNOMED-CT-based medical noun dictionary construction and expansion method |
KR20200097949A (en) * | 2019-02-11 | 2020-08-20 | 네이버 주식회사 | Method and system for extracting synonym by using keyword relation structure |
CN111986759A (en) * | 2020-08-31 | 2020-11-24 | 平安医疗健康管理股份有限公司 | Method and system for analyzing electronic medical record, computer equipment and readable storage medium |
CN113657109A (en) * | 2021-08-31 | 2021-11-16 | 平安医疗健康管理股份有限公司 | Method, apparatus and computer device for standardization of model-based clinical terminology |
CN114091425A (en) * | 2021-11-25 | 2022-02-25 | 北京富通东方科技有限公司 | Medical entity alignment method and device |
US20220108188A1 (en) * | 2020-10-01 | 2022-04-07 | International Business Machines Corporation | Querying knowledge graphs with sub-graph matching networks |
CN114417809A (en) * | 2021-12-27 | 2022-04-29 | 北京滴普科技有限公司 | Entity alignment method based on combination of graph structure information and text semantic model |
WO2022088672A1 (en) * | 2020-10-29 | 2022-05-05 | 平安科技(深圳)有限公司 | Machine reading comprehension method and apparatus based on bert, and device and storage medium |
CN114444501A (en) * | 2022-01-24 | 2022-05-06 | 荃豆数字科技有限公司 | Method and device for searching traditional Chinese medicine decoction pieces, electronic equipment and storage medium |
CN115374792A (en) * | 2022-09-14 | 2022-11-22 | 山东省计算中心(国家超级计算济南中心) | Policy text labeling method and system combining pre-training and graph neural network |
WO2023065858A1 (en) * | 2021-10-19 | 2023-04-27 | 之江实验室 | Medical term standardization system and method based on heterogeneous graph neural network |
Non-Patent Citations (3)
Title |
---|
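SHWETA TANEJA et al.: "A Text Preprocessing Approach for Efficacious Information Retrieval", Smart Innovations in Communication and Computational Sciences, vol. 669, pages 13 * |
ZHANG Jian; FENG Fei; LIU Yu; MA Hongye: "Research on a Web Page Ranking Algorithm Based on Ontology Concept Similarity", Journal of the China Society for Scientific and Technical Information (情报学报), no. 11, pages 56 - 65 * |
ZHAO Mengyue: "A Corpus-Comparison-Based Study of the Acquisition of Marked Adversative Complex Sentences by Native English Speakers", China Master's Theses Full-text Database, Philosophy and Humanities, no. 11, pages 084 - 699 * |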
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118210960A (en) * | 2023-12-13 | 2024-06-18 | 西湖大学 | Construction and use method of natural medicinal material special domain knowledge base |
CN118227776A (en) * | 2024-05-23 | 2024-06-21 | 四川省肿瘤医院 | Disease science popularization method and system based on artificial intelligence |
CN118227776B (en) * | 2024-05-23 | 2024-07-23 | 四川省肿瘤医院 | Disease science popularization method and system based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN116312915B (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112214995B (en) | Hierarchical multitasking term embedded learning for synonym prediction | |
CN111382272B (en) | Electronic medical record ICD automatic coding method based on knowledge graph | |
CN110210037B (en) | Syndrome-oriented medical field category detection method | |
CN111540468B (en) | ICD automatic coding method and system for visualizing diagnostic reasons | |
CN106776711B (en) | Chinese medical knowledge map construction method based on deep learning | |
CN111834014A (en) | Medical field named entity identification method and system | |
CN116312915B (en) | Method and system for standardized association of drug terms in electronic medical records | |
CN111274790B (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
CN111858940B (en) | Multi-head attention-based legal case similarity calculation method and system | |
CN112232065B (en) | Method and device for mining synonyms | |
CN111950283B (en) | Chinese word segmentation and named entity recognition system for large-scale medical text mining | |
WO2017193685A1 (en) | Method and device for data processing in social network | |
CN113378970B (en) | Sentence similarity detection method and device, electronic equipment and storage medium | |
CN111881292B (en) | Text classification method and device | |
CN113707299A (en) | Auxiliary diagnosis method and device based on inquiry session and computer equipment | |
CN115293161A (en) | Reasonable medicine taking system and method based on natural language processing and medicine knowledge graph | |
CN113707339A (en) | Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases | |
CN111582506A (en) | Multi-label learning method based on global and local label relation | |
CN113657105A (en) | Medical entity extraction method, device, equipment and medium based on vocabulary enhancement | |
CN115858886B (en) | Data processing method, device, equipment and readable storage medium | |
CN112735584A (en) | Malignant tumor diagnosis and treatment auxiliary decision generation method and device | |
CN118468061B (en) | Automatic algorithm matching and parameter optimizing method and system | |
CN111597330A (en) | Intelligent expert recommendation-oriented user image drawing method based on support vector machine | |
CN112861538A (en) | Entity linking method based on context semantic relation and document consistency constraint | |
CN111782818A (en) | Device, method and system for constructing biomedical knowledge graph and memory | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |