CN116312915A

CN116312915A - Method and system for standardized association of drug terms in electronic medical records

Info

Publication number: CN116312915A
Application number: CN202310567874.4A
Authority: CN
Inventors: 李劲松; 马爽; 杨宗峰; 王昱
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-05-19
Filing date: 2023-05-19
Publication date: 2023-06-23
Anticipated expiration: 2043-05-19
Also published as: CN116312915B

Abstract

The invention discloses a method and a system for the standardized association of drug terms in electronic medical records. A drug term library is updated by synonym mining to obtain a synonym-mining-updated drug term library, which addresses the low semantic similarity between standard drug terms in the library and external drug terms in electronic medical records. When external drug terms in the electronic medical record are associated with standard drug terms in the updated library, the semantic information includes, in addition to the Chinese character tokens, the pinyin character sequence of each drug term, and the graph structure information of both the drug term library and the external drug terms is fully used. An association prediction model based on semantic embedding and structural embedding is constructed, so that associations between external drug terms in real-world electronic medical records and standard drug terms in the drug term library are established accurately.

Description

Method and system for standardized association of drug terms in electronic medical records

Technical Field

The invention belongs to the technical field of medical information, and particularly relates to a method and a system for standardized association of drug terminology in an electronic medical record.

Background

With the development of information technology and its ever deeper application in the healthcare industry, the industry has accumulated a large amount of data. Typical examples are knowledge bases (KBs), which are presented in a relatively standardized form, and electronic health records (EHRs), which record real-world medical process data. A knowledge base is a technology used by computer systems to store complex structured and unstructured information, and a term base (TB) is a special type of knowledge base that stores term concepts and their related information. In drug research, general-purpose drug term bases that have been built and are still being updated include DrugBank and WHODrug, and academia and industry also have a demand for an established Chinese drug term base. However, in real-world clinical practice, different regions, and even different hospitals or different doctors, may use a variety of names for the same drug, so an existing drug term library does not necessarily record every name of a drug. For example, the drug with DrugBank ID DB00736 has the English name "Esomeprazole magnesium", and its former Chinese name differs from its current Chinese name (both translate to "esomeprazole magnesium"). In an electronic health record system, the drug may be recorded under the former name before the generic name was revised and under the current name afterwards. When electronic health record data are used for real-world drug research, missing either name leads to incomplete data retrieval, which in turn results in unreasonable screening of the study population and misjudgment of medication status, and ultimately affects the quality of the research.

Therefore, when EHR data are used to conduct real-world drug research, especially multi-center real-world research involving multiple drugs, the drug names in the EHRs must be associated with the corresponding drugs in a drug term library; this is an important precondition for ensuring research quality and reliable results. The drug term library is important information in medical research and engineering, and updating it in a timely manner is the basis for promoting information exchange and even technical progress in the field. Associating the drug term library with real-world electronic health record data allows it to provide underlying support for, and to advance, EHR-based research and engineering tasks in natural language processing, artificial intelligence, expert systems, real-world drug research, and similar areas.

Among existing drug association methods, a medical standard term management system and method based on a general model (publication number CN115080751A) concerns the mapping between medical record text and standard terms. The medical record text is first split by a sequence labeling model to obtain fine-grained text attributes; the similarity between the medical record text and each semantic standard word is then calculated, and the validity of a standardized mapping is judged by this semantic similarity. A valid mapping is used directly as the mapping result; for an invalid mapping, other possible standardized mappings are recalculated and offered as algorithm-recommended results that require manual review. However, this technical solution judges mapping validity by semantic similarity alone and ignores the structural features of the drug term library.

A method and a device for matching drug names (publication number CN112711642A) concerns drug matching between different electronic medical records. Word vectors for an electronic medical record corpus are obtained by training on electronic medical record data; drug names are extracted based on the Unified Medical Language System to obtain drug entity word vectors; a neural network model is used to obtain component vectors; and, combined with engineered features, the similarity between drug entities is calculated, which finally achieves drug matching between different electronic medical record systems. Provided that the Unified Medical Language System is sufficiently complete, this technical solution solves the problem of matching drugs between different sets of electronic health record data, but it is of limited reference value for the problem addressed by the present invention, namely matching drug terms in electronic health records to a drug term library.

A drug information matching method and system (publication number 107103048B) concerns matching between drug products. Sub-information of the drug to be matched is first obtained in multiple dimensions, such as drug name, preparation specification and dosage form. Association degree identification is performed between the target sub-information and the standard sub-information, and when the identification result meets a preset association requirement, the target information is paired with one or more pieces of standard information to form one or more candidate information pairs. For each candidate pair, the similarity between the target information and the standard information is calculated on the sub-information of each dimension, a comprehensive matching score is calculated from these similarities, and the standard information of the candidate pair with the highest comprehensive matching score is determined to be the match for the target information.

A medical drug matching method, device, electronic equipment and storage medium (publication number CN111798969A) concerns matching between a target drug and a drug standard library. For a target drug to be matched, several drug identifiers or specifications that characterize the target drug are selected from the drug information as reference items, and each reference item is assigned a weight according to its importance. The reference items are matched against the standard items in a drug reference library to calculate comparison values, and the degree of matching between the target drug and each drug in the reference library is calculated from the comparison values and weights, so that a mapping is established between the drug identifier of the target drug and the standard identifier stored in the reference library. Both technical solutions address the matching of target drug products to a drug standard library. Compared with drugs, drug products carry more sub-information: besides the drug name they include multi-dimensional information such as preparation specification, dosage form, manufacturer and approval number. These methods are not applicable here, because the problem addressed by the present invention is the association of drugs, for which the available text information is limited.

The limitations of the prior art are mainly the following. Only semantic similarity is used in the association process, and graph structure information is not used. In addition, the semantic similarity does not use pinyin information. Drug names may be written with different characters but pronounced identically, so if only the semantic similarity of the Chinese names is used, two homophonous spellings of "cefradine", for example, may be scored as merely similar, although their pinyin makes it obvious that they denote the same drug; the association result is therefore inaccurate.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a standardized association method and a standardized association system for drug terms in electronic medical records, which realize the association between external drug terms and standard drug terms in a drug term library in the electronic medical records.

The object of the invention is achieved through the following technical solution:

According to a first aspect of the present specification, there is provided a method for standardized association of drug terms in electronic medical records, including:

S1: inputting a drug term library and obtaining a synonym set for each standard drug term;

S2: obtaining a drug term library updated by synonym mining, including:

constructing a corpus for synonym mining, and obtaining a drug term list from the corpus;

training a synonym set classifier to obtain a classification prediction result for each drug term in the drug term list against the synonym sets in the drug term library, and obtaining all synonym sets updated by synonym mining according to a preset probability threshold;

updating the drug term library according to all synonym sets updated by synonym mining;

S3: training an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical record, including:

obtaining, through a pre-trained language model, a semantic embedded representation of each pair consisting of an external drug term in the electronic medical record and a standard drug term in the updated drug term library, specifically: the external drug term and its pinyin character sequence and the standard drug term and its pinyin character sequence are combined with a start character and separator characters to form the character sequence of the drug term pair to be associated, which is input into the pre-trained language model to obtain the semantic embedded representation;

obtaining, through a graph convolutional neural network model, a structural embedded representation of each such pair, specifically: candidate associations between the external drug terms and the drug terms in the updated drug term library are established based on similarity calculation; the semantic embedded representations of the external drug terms and of the drug terms in the updated library are used as the initial node embedded representations of the corresponding drug terms and input into the graph convolutional neural network model to obtain their node embedded representations; and the product of the node embedded representations of the external drug term and the standard drug term is used as the structural embedded representation;

S4: using the association prediction model to predict the association between the external drug terms in the electronic medical record and the standard drug terms in the drug term library.
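The pinyin-augmented input of step S3 can be illustrated with a minimal sketch. The `[CLS]` and `[SEP]` markers, the order of the four segments, the toy character-to-pinyin table, and the two example spellings (头孢拉定 and 头孢拉啶, two homophonous ways of writing "cefradine") are assumptions made for illustration; the specification only states that a start character and separator characters are combined with the terms and their pinyin sequences.

```python
# Toy character-to-pinyin table (an assumption); a real system would use a pinyin library.
PINYIN = {"头": "tou", "孢": "bao", "拉": "la", "定": "ding", "啶": "ding"}


def to_pinyin(term):
    """Return the space-separated pinyin sequence of a Chinese drug term."""
    return " ".join(PINYIN[ch] for ch in term)


def build_pair_sequence(external, standard, start="[CLS]", sep="[SEP]"):
    """Join both terms and their pinyin into one character sequence for a language model.

    The segment order and marker strings are hypothetical choices.
    """
    parts = [external, to_pinyin(external), standard, to_pinyin(standard)]
    return start + " " + (" " + sep + " ").join(parts) + " " + sep


sequence = build_pair_sequence("头孢拉啶", "头孢拉定")
same_pronunciation = to_pinyin("头孢拉啶") == to_pinyin("头孢拉定")
```

The two spellings differ in one character, yet their pinyin segments are identical, which is the signal the pinyin sequence is meant to expose to the language model.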

Further, during training of the synonym set classifier, the probability that a drug term to be classified belongs to a synonym set is predicted from the change in the set uniformity score. The set uniformity score is calculated as follows: an embedded representation is computed for each term in the set and input into a fully connected neural network model to obtain a new term representation; all new term representations are summed to obtain an initial term set representation; and the initial term set representation is input into a fully connected neural network model to obtain the set uniformity score.

Further, the training set of the synonym set classifier is generated as follows: one drug term is drawn at random from a synonym set and combined with the set formed by the remaining drug terms of that synonym set to generate a positive training sample; each positive training sample is matched with several negative training samples, which are drawn from the drug term library after the drug terms of that synonym set have been excluded.

Further, the drug term library is updated according to all synonym sets updated by synonym mining, specifically: if the synonym set of a standard drug term that is a hypernym (upper-level term) is updated, a synonym association is established between the new synonym and that standard drug term, and hyponym (lower-level term) associations are established between the new synonym and all standard drug terms associated with that hypernym; if the synonym set of a standard drug term that is not a hypernym is updated, a synonym association is established between the new synonym and that standard drug term.

Further, the pre-trained language model is fine-tuned, specifically: the semantic embedded representation of the start character is used as the independent variable, and the dependent variable is a label indicating whether the external drug term in the electronic medical record and the drug term in the updated drug term library are semantically associated; a nonlinear activation function is applied to obtain a prediction from the semantic embedded representation; positive training samples are obtained by manual labeling and negative training samples by random sampling to form a training set; and the pre-trained language model is trained on it to optimize the semantic embedded representation.

Further, the candidate associations between the external drug terms and the drug terms in the updated drug term library are established as follows: the TF-IDF value of each word in a drug term is calculated to obtain a vector representation of each external drug term in the electronic medical record and of each drug term in the updated drug term library; the similarity between two drug terms is calculated from these vectors; and if the similarity is greater than a preset similarity threshold, a candidate association is established between the external drug term and the corresponding drug term in the library.
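As a rough illustration of this candidate association step, the sketch below builds TF-IDF vectors in plain Python and keeps the pairs whose cosine similarity exceeds a threshold. Cosine similarity, the smoothed IDF formula, whitespace tokenization, the example term "esomeprazole magnesium tablets" and the threshold 0.3 are all assumptions; the specification names only TF-IDF vectors, a similarity, and a preset threshold.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF; the exact IDF variant is an assumption.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: (cnt / len(doc)) * idf[tok] for tok, cnt in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def candidate_edges(external_terms, library_terms, threshold, tokenize=str.split):
    """Return (external term, library term, similarity) for pairs above the threshold."""
    vecs = tfidf_vectors([tokenize(t) for t in external_terms + library_terms])
    ext, lib = vecs[:len(external_terms)], vecs[len(external_terms):]
    edges = []
    for i, u in enumerate(ext):
        for j, v in enumerate(lib):
            sim = cosine(u, v)
            if sim > threshold:
                edges.append((external_terms[i], library_terms[j], sim))
    return edges


edges = candidate_edges(
    ["esomeprazole magnesium tablets"],              # hypothetical external term
    ["esomeprazole magnesium", "latamoxef sodium"],  # library terms
    threshold=0.3,                                   # hypothetical threshold
)
```

For Chinese drug names a character-level or word-segmentation tokenizer would replace `str.split`; the similarity values kept in `edges` are the ones later written into the adjacency matrix.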

Further, the input of each layer of the graph convolutional neural network model consists of two parts: the first is a node embedded representation matrix and the second is an adjacency matrix. The output of each layer serves as the node embedded representation matrix of the next layer. The graph convolutional neural network model is obtained through the normalized graph Laplacian transformation and is optimized with a margin-based distance loss function.

Further, the entries of the adjacency matrix take the following values: if the updated drug term library contains an edge from one drug term to another, the corresponding entry is 1, otherwise 0; if there is an edge from an external drug term in the electronic medical record to a drug term in the updated drug term library, the corresponding entry is the similarity value of that candidate association.
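The adjacency values and one propagation step can be sketched as follows. This is a simplified illustration, not the patented model: the added self-loops, the use of row sums as degrees for a directed matrix, the ReLU activation, and the node names are assumptions, and the margin-based loss is omitted.

```python
import math


def build_adjacency(nodes, library_edges, candidate_edges):
    """Entries are 1 for library edges and the similarity value for candidate edges."""
    index = {n: i for i, n in enumerate(nodes)}
    a = [[0.0] * len(nodes) for _ in nodes]
    for src, dst in library_edges:
        a[index[src]][index[dst]] = 1.0
    for ext, lib, sim in candidate_edges:
        a[index[ext]][index[lib]] = sim
    return a


def normalize(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2; the self-loops are an assumption."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One layer: H' = ReLU(A_norm * H * W), with H the node embedding matrix."""
    n, d = len(h), len(h[0])
    ah = [[sum(a_norm[i][k] * h[k][c] for k in range(n)) for c in range(d)] for i in range(n)]
    return [[max(0.0, sum(row[c] * w[c][o] for c in range(d))) for o in range(len(w[0]))]
            for row in ah]


nodes = ["esomeprazole", "esomeprazole sodium", "external term"]
adj = build_adjacency(
    nodes,
    library_edges=[("esomeprazole", "esomeprazole sodium")],
    candidate_edges=[("external term", "esomeprazole sodium", 0.7)],
)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = gcn_layer(normalize(adj), identity, identity)
```

In the described method the initial rows of `h` would be the semantic embedded representations of the terms rather than an identity matrix, and the output of each layer would feed the next.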

Further, the semantic embedded representation and the structural embedded representation are concatenated and input into a multi-layer perceptron comprising several fully connected hidden layers and a single-node output layer; the output of the multi-layer perceptron is converted into a scalar by a nonlinear activation function, which gives the association probability between each external drug term in the electronic medical record and each standard drug term in the updated drug term library.
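A minimal sketch of this prediction head is given below. Treating the "product" of node embeddings as an element-wise product, using ReLU in the hidden layers and a sigmoid at the output, and the toy weights are all assumptions; the specification states only concatenation, fully connected hidden layers, a single-node output and a nonlinear activation.

```python
import math


def structure_embedding(node_ext, node_std):
    """Element-wise product of two node embeddings (the product type is an assumption)."""
    return [a * b for a, b in zip(node_ext, node_std)]


def mlp_probability(semantic, structure, hidden_layers, out_w, out_b):
    """Concatenate both embeddings, apply the hidden layers, and squash to a probability."""
    x = list(semantic) + list(structure)
    for weights, bias in hidden_layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, bias)]
    logit = sum(w * v for w, v in zip(out_w, x)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


structure = structure_embedding([1.0, 2.0], [3.0, 4.0])
# One hidden layer with two units over the 4-d concatenated input; weights are untrained toys.
hidden = [([[0.1, 0.0, 0.0, 0.1], [0.0, 0.2, 0.1, 0.0]], [0.0, 0.0])]
prob = mlp_probability([0.5, -0.5], structure, hidden, out_w=[1.0, -1.0], out_b=0.0)
```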

According to a second aspect of the present specification, there is provided a system for standardized association of drug terms in electronic medical records, including:

an electronic medical record drug term input module, configured to obtain all external drug terms in the electronic medical record that are to be standardized;

a drug term library synonym mining and update module, configured to construct a corpus and obtain from it a drug term list for synonym mining; obtain the synonym set of each standard drug term in the drug term library; train a synonym set classifier to obtain a classification prediction result for each drug term in the drug term list against the synonym sets in the drug term library; obtain all synonym sets updated by synonym mining according to a preset probability threshold; and update the drug term library;

a candidate association establishing module, configured to establish, based on similarity calculation, candidate associations between the external drug terms in the electronic medical record and the drug terms in the updated drug term library;

a semantic embedded representation module, configured to combine an external drug term in the electronic medical record and its pinyin character sequence and a standard drug term in the updated drug term library and its pinyin character sequence with a start character and separator characters to form the character sequence of the drug term pair to be associated, and to input this sequence into a pre-trained language model to obtain the semantic embedded representation;

a structural embedded representation module, configured to use the semantic embedded representations of the external drug terms in the electronic medical record and of the drug terms in the updated drug term library as the initial node embedded representations of the corresponding drug terms, input them into a graph convolutional neural network model to obtain the node embedded representations of the corresponding drug terms, and use the product of the node embedded representations of the external drug term and the standard drug term as the structural embedded representation;

an association prediction module, configured to train an association prediction model based on semantic embedding and structural embedding according to the updated drug term library and the external drug terms in the electronic medical record, and to use the association prediction model to predict the association between the external drug terms in the electronic medical record and the standard drug terms in the drug term library.

The beneficial effects of the invention are as follows: the invention enriches the semantic information and the graph structure information through synonym mining, and the association prediction model uses semantic information and graph structure information together. Specifically:

1) The drug term library is updated by synonym mining to obtain a drug term library updated by synonym mining, which addresses the low semantic similarity between standard drug terms in the library and external drug terms in the electronic medical record;

2) When external drug terms in the electronic medical record are associated with standard drug terms in the updated drug term library, the semantic information includes the pinyin tokens of the corresponding terms in addition to the Chinese character tokens;

3) When external drug terms in the electronic medical record are associated with standard drug terms in the updated drug term library, the graph structure information of the drug term library is fully used;

4) When external drug terms in the electronic medical record are associated with standard drug terms in the updated drug term library, graph structure information for the external drug terms is obtained by linking them to the drug terms in the drug term library;

5) Through the above, embedded representations of the pairs of external drug terms in the electronic medical record and standard drug terms in the updated drug term library are finally obtained and used for prediction by the association prediction model.

Drawings

FIG. 1 is a flowchart illustrating overall steps of a method for standardized association of drug terminology in an electronic medical record according to an exemplary embodiment;

FIG. 2 is a flowchart of a method for standardized association of drug terminology in electronic medical records provided in an exemplary embodiment;

FIG. 3 is a schematic diagram of a library of raw drug terms provided by an exemplary embodiment;

FIG. 4 is a diagram of a drug terminology library based on synonym mining updates, as provided by an example embodiment;

FIG. 5 is a block diagram of a system for standardizing association of drug terminology in electronic medical records provided in an exemplary embodiment.

Detailed Description

In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.

As shown in FIG. 1 and FIG. 2, the method for standardized association of drug terms in electronic medical records provided by the embodiment of the invention includes the following steps:

Step S1: A drug term library is input, expressed as $G = (E, R)$, where $E$ denotes the set of drug terms, including standard drug terms (which may or may not be hypernyms, i.e. upper-level terms) and synonyms of standard drug terms, and $R$ denotes the set of relations between drug terms, specifically $R = \{r_{hypo}, r_{syn}\}$, where $r_{hypo}$ denotes the hyponym (lower-level term) relation and $r_{syn}$ denotes the synonym relation. A relation between drug terms can be expressed as a triple: $(h, r_{syn}, t)$ means that the drug term $h$ is a synonym of $t$, and $(h, r_{hypo}, t)$ means that the hyponym of the drug term $h$ is $t$. The drug terms in the drug term library whose relation is $r_{syn}$ are converted into synonym sets, which gives the synonym set of each standard drug term;
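The conversion of synonym relations into synonym sets in step S1 can be sketched as below. The relation labels `"hypo"` and `"syn"`, the triple layout `(h, r, t)`, and the placeholder string for the variant name are illustrative assumptions rather than the library's actual storage format.

```python
# Hypothetical relation labels for the hyponym and synonym relations.
R_HYPO = "hypo"
R_SYN = "syn"


def synonym_sets(standard_terms, triples):
    """Group every standard term with the terms linked to it by a synonym relation.

    Each triple is (h, r, t); (h, R_SYN, t) reads "h is a synonym of t".
    A standard term without synonyms keeps a one-element set containing itself.
    """
    sets = {term: {term} for term in standard_terms}
    for h, r, t in triples:
        if r == R_SYN and t in sets:
            sets[t].add(h)
    return sets


standard = ["latamoxef", "latamoxef sodium"]
triples = [
    ("latamoxef (variant name)", R_SYN, "latamoxef"),  # placeholder synonym
    ("latamoxef", R_HYPO, "latamoxef sodium"),
]
sets = synonym_sets(standard, triples)
```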

FIG. 3 shows an example drug term library, in which the standard drug terms include esomeprazole, esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate, esomeprazole magnesium trihydrate, gadopentetic acid, gadopentetic acid meglumine, latamoxef, and latamoxef sodium. The hyponyms of esomeprazole include esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate and esomeprazole magnesium trihydrate, and each of these is recorded as a hyponym relation from esomeprazole to the respective term. Latamoxef has a synonym (a variant Chinese name) and the hyponym latamoxef sodium. The hyponyms of gadopentetic acid include gadopentetic acid meglumine, which in turn has two synonyms (variant Chinese names), each recorded as a synonym relation to gadopentetic acid meglumine. The standard drug terms are the target association objects of this embodiment.

Step S2: aiming at the problem of incomplete synonym relation caused by the problems of irregular translation, inaccurate manual labeling, untimely data updating and the like of an original drug term library, adopting a synonym mining method to perfect the synonym relation of the drug term library, and obtaining the drug term library based on synonym mining updating; the method specifically comprises the following substeps:

Step S21: Chinese abstracts and full texts of drug-related Chinese literature are obtained from literature retrieval platforms such as CNKI (China National Knowledge Infrastructure) and Wanfang to form the synonym mining corpus $D$. The drug terms used for synonym mining are obtained by a named entity recognition method and expressed as a drug term list $T = \{t_1, t_2, \dots, t_n\}$, where $t_i$ denotes the $i$-th drug term and $n$ denotes the number of drug terms in $T$. The named entity recognition method used in this embodiment is a conditional random field model;

All synonym sets in the drug term library are denoted $\mathcal{S} = \{ES_1, ES_2, \dots, ES_m\}$, where $ES_j$ is the synonym set of the $j$-th standard drug term and $m$ denotes the number of synonym sets in $\mathcal{S}$; $m$ equals the number of standard drug terms in the drug term library. If a standard drug term has no synonyms, its synonym set contains only one element, namely the standard drug term itself.

Step S22: A synonym set classifier is trained to obtain the classification prediction result of each drug term in the drug term list $T$ against the synonym sets in the drug term library. This specifically includes the following sub-steps:

Step S221: The synonym set classifier is expressed as $f(S, t)$, where $S$ denotes a synonym set and $t$ denotes a drug term to be classified into the synonym set;

Step S222: The probability that the drug term $t$ to be classified belongs to the synonym set $S$ is predicted from the change in the set uniformity score, which can be expressed as $\Pr(t \in S) = f(S, t) = \sigma\big(q(S \cup \{t\}) - q(S)\big)$, where $\Pr$ denotes probability, $\sigma$ is the sigmoid activation function, and $q(\cdot)$ is the set uniformity score function;

Step S223: Consider a group of data $\{(X_i, y_i)\}_{i=1}^{N}$, where $X_i = (S_i, t_i)$ denotes a synonym set $S_i$ and a drug term $t_i$ to be classified into $S_i$, and $y_i$ denotes the label: $y_i = 1$ means $t_i \in S_i$ and $y_i = 0$ means $t_i \notin S_i$. In this embodiment, the synonym set classifier is a fully connected neural network model and its loss function is the logarithmic loss, expressed in the following form:

$$L(f) = -\sum_{i=1}^{N} \Big[ y_i \log f(S_i, t_i) + (1 - y_i) \log\big(1 - f(S_i, t_i)\big) \Big].$$

In one embodiment, the set uniformity score $q(S)$ is estimated as follows:

First, for each term $t$ in a term set $S$, its embedded representation $x_t$ is calculated by a text embedding method and used as the initialization parameter of the embedding layer; the text embedding method used in this embodiment is Word2Vec. The embedded representation is then input into a fully connected neural network model $g_1$ to obtain a new term representation $g_1(x_t)$ of the corresponding term. The new term representations of all terms in $S$ are added and then averaged, which guarantees permutation invariance, to obtain the initial term set representation, namely $v(S) = \frac{1}{|S|} \sum_{t \in S} g_1(x_t)$. Finally, the term set representation $v(S)$ is input into a fully connected neural network model $g_2$ to obtain the final uniformity score $q(S) = g_2(v(S))$ of the term set $S$, which measures the degree of similarity among all terms in $S$.
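The set uniformity score and the resulting membership probability can be sketched in plain Python as follows. The tanh activation, the single-layer networks, the linear scoring layer, the 2-dimensional toy embeddings and the untrained weights are assumptions for illustration; in the embodiment the embeddings come from Word2Vec and the networks are trained, so the numbers here carry no meaning beyond showing the data flow.

```python
import math


def dense(vec, weights, bias):
    """One fully connected layer with tanh activation (activation is an assumption)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def set_score(terms, embed, w1, b1, w2, b2):
    """Transform each term embedding, mean-pool over the set, then score the pooled vector."""
    transformed = [dense(embed[t], w1, b1) for t in terms]
    dim = len(transformed[0])
    pooled = [sum(v[i] for v in transformed) / len(transformed) for i in range(dim)]
    return sum(w * x for w, x in zip(w2, pooled)) + b2


def membership_prob(syn_set, term, embed, params):
    """Sigmoid of the score change caused by adding `term` to `syn_set`."""
    before = sorted(set(syn_set))
    after = sorted(set(syn_set) | {term})
    delta = set_score(after, embed, *params) - set_score(before, embed, *params)
    return 1.0 / (1.0 + math.exp(-delta))


# Toy embeddings and untrained weights, for illustration only.
EMBED = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
PARAMS = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, -1.0], 0.0)

p_similar = membership_prob({"a"}, "b", EMBED, PARAMS)
p_different = membership_prob({"a"}, "c", EMBED, PARAMS)
```

Because the pooled vector is a mean over the set, the score does not depend on the order in which terms are listed, which is the permutation invariance the text refers to.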

In one embodiment, the training set is generated as follows:

First, the drug terms in the drug term library that are linked by the synonym relation are converted into synonym sets, and all synonym sets in the library are denoted $\mathcal{S}$. Each synonym set is denoted $ES = \{e_1, e_2, \dots, e_k\}$, where each $e_i$ is a drug term in the synonym set. One drug term $e_p$ is drawn at random from the set $ES$, and the remaining terms of $ES$ form the set $ES \setminus \{e_p\}$, which gives a positive sample $(ES \setminus \{e_p\}, e_p)$ for model training with label $y = 1$. Each positive sample is matched with $K$ negative samples, denoted $(ES \setminus \{e_p\}, e_{neg})$ with label $y = 0$; in this embodiment $K = 5$. Each $e_{neg}$ is drawn from the drug term library after the drug terms of the synonym set $ES$ have been excluded. Specifically, the negative terms are obtained by mixing, in a set ratio, samples drawn in the following two ways: (1) completely random sampling; (2) random sampling restricted to drug terms that share characters with the drug terms in the set $ES \setminus \{e_p\}$. In this embodiment the ratio is set to 2:3.
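The sample generation just described might look like the following sketch. Reading the 2:3 ratio as (completely random):(shared-character) negatives, the fixed random seed, and the two-letter toy terms are assumptions; the specification gives only the ratio and K = 5.

```python
import random


def make_samples(syn_set, vocabulary, k=5, ratio=(2, 3), rng=None):
    """Return one positive and up to k negative (set, term, label) samples."""
    rng = rng or random.Random(0)
    members = sorted(syn_set)
    held_out = rng.choice(members)
    rest = frozenset(members) - {held_out}
    samples = [(rest, held_out, 1)]

    # Negatives come from the vocabulary with the synonym set's own terms excluded.
    pool = [t for t in vocabulary if t not in syn_set]
    rest_chars = set("".join(rest))
    hard_pool = [t for t in pool if rest_chars & set(t)]  # terms sharing a character

    n_random = round(k * ratio[0] / sum(ratio))
    n_hard = k - n_random
    negatives = rng.sample(pool, min(n_random, len(pool)))
    remaining_hard = [t for t in hard_pool if t not in negatives]
    negatives += rng.sample(remaining_hard, min(n_hard, len(remaining_hard)))
    samples += [(rest, t, 0) for t in negatives]
    return samples


syn_set = {"AB", "AC"}  # toy terms; each letter stands in for a Chinese character
vocabulary = ["AB", "AC", "AD", "AE", "AF", "AG", "AH", "XY", "ZW"]
samples = make_samples(syn_set, vocabulary)
```

The shared-character negatives are harder for the classifier than random ones, which is presumably why the two sources are mixed.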

Step S224: model training is carried out using the training set, and the trained model is used to predict the probability that each drug term in the drug term list belongs to each synonym set;

specifically, before synonym mining is performed, all synonym sets in the drug term library are represented as $\{ES_1, ES_2, \ldots\}$. For each drug term $v$ in the drug term list, the probability that it belongs to an arbitrary synonym set $ES_j$ is calculated, expressed as $P(v \in ES_j)$, and the maximum probability $\max_j P(v \in ES_j)$ is taken as the probability that the drug term $v$ belongs to a synonym set. A probability threshold $\theta$ is set. If the maximum probability is greater than the probability threshold $\theta$, the corresponding synonym set is updated to include $v$; if the maximum probability is less than or equal to the probability threshold $\theta$, the drug term $v$ is put back into the drug term list. The next cycle is then started, and the process repeats until the probability that any drug term in the drug term list belongs to any synonym set is less than or equal to the probability threshold $\theta$. Finally, all synonym sets based on synonym mining update are obtained, expressed as $\{ES'_1, ES'_2, \ldots\}$. In this embodiment, the probability threshold $\theta$ is set to a fixed value.
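The iterative assignment loop of step S224 can be illustrated with a short sketch. The names `mine_synonyms` and `toy_score` are hypothetical; `toy_score` is only a character-overlap stand-in for the trained synonym set classifier, and the threshold value is left to the caller.

```python
def toy_score(term, syn_set):
    # Hypothetical stand-in for the trained classifier:
    # best character-set Jaccard overlap with any member of the set.
    return max(len(set(term) & set(m)) / len(set(term) | set(m)) for m in syn_set)

def mine_synonyms(term_list, synonym_sets, score, threshold):
    """Add terms to their best-scoring synonym set until no score exceeds the threshold."""
    sets = [set(s) for s in synonym_sets]
    pending = list(term_list)
    if not sets:
        return sets, pending
    changed = True
    while changed and pending:
        changed = False
        still_pending = []
        for term in pending:
            probs = [score(term, s) for s in sets]
            best = max(range(len(sets)), key=probs.__getitem__)
            if probs[best] > threshold:
                sets[best].add(term)          # update the corresponding synonym set
                changed = True
            else:
                still_pending.append(term)    # put back for the next cycle
        pending = still_pending
    return sets, pending
```

Re-scoring the pending terms in each cycle matters because a synonym set that has just grown may now score higher for terms that were previously rejected.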

Step S23: updating the drug term library according to all synonym sets based on synonym mining update to obtain a drug term library based on synonym mining update;

specifically, if the synonym set of a standard drug term that is a hypernym (upper-level term) is updated, a synonym association is established between the corresponding synonym and that standard drug term. In addition, the corresponding synonym is also associated, as a hypernym, with all non-hypernym standard drug terms that have that standard drug term as their hypernym, so that hyponym (lower-level term) associations are established. If the synonym set of a non-hypernym standard drug term is updated, a synonym association is established between the corresponding synonym and that non-hypernym standard drug term. In the example of FIG. 4, the synonym sets updated by synonym mining are [{Esomeprazole, esomeprazole}, {Esomeprazole sodium, esomeprazole sodium}, …], where the two entries of each set are distinct names of the same drug. The mined synonym of esomeprazole is associated as a synonym with the standard drug term esomeprazole, which is a hypernym, and hyponym associations are established between that synonym and the following standard drug terms whose hypernym is esomeprazole: esomeprazole strontium, esomeprazole strontium hydrate, esomeprazole sodium, esomeprazole magnesium dihydrate, esomeprazole magnesium trihydrate. A synonym association is also established between the mined synonym of esomeprazole sodium and the standard drug term esomeprazole sodium. The updated drug term library based on synonym mining is finally obtained.
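As a rough sketch of the update rule in step S23, the term library can be modelled as a set of (head, relation, tail) triples. The triple layout and the relation names `synonym_of` and `hypernym_of` are assumptions made for illustration; the patent's own relation notation is not reproduced here.

```python
def apply_synonym_update(relations, standard_term, new_synonyms):
    """Return a copy of `relations` with the mined synonyms attached.

    relations: set of (head, relation, tail) triples (hypothetical layout).
    """
    updated = set(relations)
    # Hyponyms of the standard term; empty when the term is not a hypernym.
    hyponyms = [t for (h, r, t) in relations
                if h == standard_term and r == "hypernym_of"]
    for syn in new_synonyms:
        updated.add((syn, "synonym_of", standard_term))
        for hypo in hyponyms:
            # The synonym inherits the hypernym role of the standard term.
            updated.add((syn, "hypernym_of", hypo))
    return updated
```

Because a non-hypernym term has no hyponyms, the same function covers both cases described above: only the synonym association is added for such terms.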

Step S3: training an association prediction model based on semantic embedding and structural embedding according to the drug term library based on synonym mining update and the external drug terms in the real-world electronic medical record data; the method specifically comprises the following substeps:

step S31: acquiring, through a pre-trained language model, semantic embedded representations of pairs of external drug terms in the electronic medical record and standard drug terms in the drug term library based on synonym mining update, wherein the pre-trained language model adopts a BERT model in this embodiment;

In particular, the set of external drug terms in the real-world electronic medical record is represented as $G = \{g_1, g_2, \ldots\}$, and the set of standard drug terms in the drug term library based on synonym mining update is denoted $E = \{e_1, e_2, \ldots\}$. For any external drug term $g_i$ in the set G, its Chinese character sequence followed by its pinyin character sequence is expressed as $X_{g_i}$; likewise, any standard drug term $e_j$ in the set E is expressed as $X_{e_j}$. $X_{g_i}$ and $X_{e_j}$ are combined with the start character [CLS] and the separator character [SEP], and the character sequence of the associated drug term pair is denoted $X_{ij} = \{[\mathrm{CLS}], X_{g_i}, [\mathrm{SEP}], X_{e_j}, [\mathrm{SEP}]\}$. Taking $g_i$ = "艾司奥美拉唑钠" and $e_j$ = "埃索美拉唑钠" (two names of esomeprazole sodium) as an example, the character sequence of the associated drug term pair can be expressed as {[CLS][艾][司][奥][美][拉][唑][钠][ai][si][ao][mei][la][zuo][na][SEP][埃][索][美][拉][唑][钠][ai][suo][mei][la][zuo][na][SEP]}. In this embodiment, a BERT model pre-trained on a Chinese corpus is adopted; the semantic embedded representation of the character sequence of the associated drug term pair is obtained through multiple bidirectional Transformer encoding layers, and finally the semantic embedded representation of the start character [CLS] is used to represent the relationship between $g_i$ and $e_j$.
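The construction of the paired input sequence can be sketched as below. This is an illustration under assumptions: the two Chinese names (艾司奥美拉唑钠 and 埃索美拉唑钠) are inferred from the pinyin tokens of the esomeprazole sodium example, and the small `PINYIN` lookup table and the function `pair_sequence` are hypothetical stand-ins for a real character-to-pinyin converter.

```python
# Hypothetical character-to-pinyin table covering only the example terms.
PINYIN = {
    "艾": "ai", "司": "si", "奥": "ao", "美": "mei", "拉": "la",
    "唑": "zuo", "钠": "na", "埃": "ai", "索": "suo",
}

def pair_sequence(external, standard, pinyin):
    """Build [CLS] chars+pinyin [SEP] chars+pinyin [SEP] for a term pair."""
    def tokens(term):
        chars = list(term)
        # Chinese characters first, then the pinyin of each character.
        return chars + [pinyin[c] for c in chars]
    return ["[CLS]"] + tokens(external) + ["[SEP]"] + tokens(standard) + ["[SEP]"]

seq = pair_sequence("艾司奥美拉唑钠", "埃索美拉唑钠", PINYIN)
```

The two names differ in their first two characters but share the pinyin "ai" at the first position, which is the kind of phonetic overlap the pinyin tokens expose to the model.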

In this embodiment, the BERT model is fine-tuned, specifically: the semantic embedded representation of the start character [CLS] is used as the independent variable and is denoted $h^{\mathrm{sem}}_{ij}$; the dependent variable is the label $y_{ij}$ indicating whether the external drug term $g_i$ and the standard drug term $e_j$ in the drug term library based on synonym mining update are semantically associated, with $y_{ij} = 1$ if a semantic association exists and $y_{ij} = 0$ otherwise. A prediction result $\hat{y}_{ij}$ based on the BERT semantic embedded representation is obtained through a nonlinear activation function $\sigma$. In this embodiment, a sigmoid activation function is used, expressed as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The loss function is the binary cross-entropy loss, expressed as

$$\mathcal{L} = -\left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log (1 - \hat{y}_{ij}) \right]$$

Positive training samples are acquired by manual labeling and negative training samples by random extraction to obtain a training set; the BERT model is trained on this set, and the BERT semantic embedded representation is thereby optimized.
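The classification head used for fine-tuning can be illustrated numerically. In this sketch the linear projection (`weights`, `bias`) that maps the [CLS] embedding to a single logit, and the averaging of the loss over samples, are assumptions; the text only specifies a sigmoid activation and a binary cross-entropy loss.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(cls_embedding, weights, bias):
    """Map a [CLS] embedding to an association probability (assumed linear head)."""
    logit = sum(w * h for w, h in zip(weights, cls_embedding)) + bias
    return sigmoid(logit)

def bce_loss(labels, predictions, eps=1e-12):
    """Binary cross-entropy, averaged over samples (averaging is an assumption)."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

An uninformative prediction of 0.5 for every pair yields a loss of ln 2 per sample, which is a convenient baseline when monitoring training.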

Step S32: obtaining external drug terms in the electronic medical record and structural embedded representations of standard drug term pairs in a drug term library based on synonym mining update through a graph convolution neural network model;

specifically, candidate association relations are established between external drug terms in the real-world electronic medical record and drug terms in the drug term library based on synonym mining update: the TF-IDF value of each word in each drug term is calculated to obtain a vector representation of each external drug term in the electronic medical record and of each drug term in the drug term library based on synonym mining update; the similarity between the two is calculated using cosine similarity, and a similarity threshold is set; if the similarity is greater than the similarity threshold, a candidate association relation is established between the external drug term in the electronic medical record and the corresponding drug term in the drug term library. For example, as shown in FIG. 4, esomeprazole establishes a candidate association with esomeprazole magnesium, and esomeprazole magnesium under one name establishes a candidate association with esomeprazole magnesium under its other name.
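Candidate generation by TF-IDF and cosine similarity can be sketched with the standard library alone. The sketch assumes character-level tokens, a smoothed IDF of the form ln((1+N)/(1+df)) + 1, and a caller-supplied threshold; none of these specifics are stated in the text, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(terms):
    """Character-level TF-IDF vectors (sparse dicts), one per term."""
    docs = [Counter(t) for t in terms]
    n = len(docs)
    df = Counter(ch for d in docs for ch in d)
    idf = {ch: math.log((1 + n) / (1 + df[ch])) + 1.0 for ch in df}
    return [{ch: cnt / sum(d.values()) * idf[ch] for ch, cnt in d.items()}
            for d in docs]

def cosine(u, v):
    dot = sum(w * v.get(ch, 0.0) for ch, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def candidate_links(external_terms, library_terms, threshold):
    """Return (external, library, similarity) triples above the threshold."""
    vecs = tfidf_vectors(list(external_terms) + list(library_terms))
    ext, lib = vecs[:len(external_terms)], vecs[len(external_terms):]
    links = []
    for g, u in zip(external_terms, ext):
        for e, v in zip(library_terms, lib):
            s = cosine(u, v)
            if s > threshold:
                links.append((g, e, s))
    return links
```

For instance, `candidate_links(["艾司奥美拉唑镁"], ["埃索美拉唑镁", "阿司匹林"], 0.3)` keeps only the pair of esomeprazole magnesium names, since they share four of their characters while the aspirin entry shares one.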

Each external drug term in the electronic medical record is converted into a character sequence, and each drug term in the drug term library based on synonym mining update is likewise converted into a character sequence; the BERT model trained in step S31 is used to calculate the semantic embedded representation of each sequence, and the semantic embedded representation corresponding to the start character [CLS] is taken as the initialized node embedded representation of the corresponding drug term;

the initialized node embedded representations are input into the graph convolutional neural network model. Specifically, the graph convolutional neural network model comprises L layers; in this embodiment, L=10. The input of the l-th layer comprises two parts. The first part is the node embedded representation matrix $H^{(l)}$ of dimension $n \times d^{(l)}$, where n represents the number of nodes, which is the sum of the number of external drug terms in the electronic medical record and the number of drug terms in the drug term library based on synonym mining update, and $d^{(l)}$ represents the node embedded representation dimension of the l-th layer. The second part is the adjacency matrix A of dimension $n \times n$. The output of the l-th layer, which is used as the node embedded representation matrix of the (l+1)-th layer, is obtained through the normalized graph Laplacian transformation, with the formula:

$$H^{(l+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)$$

where $\sigma$ is a nonlinear activation function, for which a sigmoid activation function can be used; $\tilde{A} = A + I$, where I is the identity matrix; $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$; and $W^{(l)}$ is the weight matrix of the l-th layer;
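A single graph convolution layer of the kind described (self-loops added, symmetric degree normalization, sigmoid activation) can be written in plain Python for small matrices. The function names are hypothetical, and nested lists stand in for a proper tensor library purely for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w):
    """One layer: sigmoid(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops: A~ = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Degree of each node in A~ (row sums).
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization.
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    z = matmul(matmul(a_hat, h), w)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in z]
```

Stacking L such layers, each with its own weight matrix, lets a node's representation absorb information from neighbours up to L edges away.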

the values of the adjacency matrix A are as follows: if there is an edge from drug term $e_i$ to drug term $e_j$ in the drug term library based on synonym mining update, then $A_{e_i e_j}$ takes the value 1, otherwise it takes the value 0; if there is an edge from external drug term $g_i$ in the electronic medical record to drug term $e_j$ in the drug term library based on synonym mining update, then $A_{g_i e_j}$ takes the similarity value of the candidate association relation;
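The adjacency matrix can be assembled as in the following sketch. The node ordering (library terms first, then external terms), the directed treatment of edges, and the name `build_adjacency` are assumptions; the text does not say whether the matrix is symmetrized, and term names are assumed to be unique across both lists.

```python
def build_adjacency(library_terms, external_terms, library_edges, candidate_links):
    """Return (nodes, adjacency matrix as a list of rows).

    library_edges: (source, target) pairs inside the term library -> weight 1.
    candidate_links: (external, library, similarity) triples -> weight = similarity.
    """
    nodes = list(library_terms) + list(external_terms)
    index = {term: i for i, term in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for src, dst in library_edges:
        adj[index[src]][index[dst]] = 1.0
    for ext, lib, sim in candidate_links:
        adj[index[ext]][index[lib]] = sim
    return nodes, adj
```

Weighting the external-to-library edges by similarity lets weak candidate links contribute less to a node's aggregated representation than curated library relations.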

the output of the final (L-th) layer of the graph convolutional neural network model is used as the node embedded representations of the drug terms in the drug term library based on synonym mining update and of the external drug terms in the electronic medical record. From this output, the node embedded representation $z_{e_j}$ of each standard drug term $e_j$ in the drug term library based on synonym mining update and the node embedded representation $z_{g_i}$ of each external drug term $g_i$ in the electronic medical record are obtained, and the product of the two node embedded representations is used to represent the structural embedded representation of the association between $g_i$ and $e_j$, denoted $h^{\mathrm{str}}_{ij}$;

The graph convolutional neural network model is optimized with a margin-based distance loss function, with the formula:

$$\mathcal{L}_{\mathrm{str}} = \sum_{(g_i, e_j) \in S^{+}} \sum_{(g_{i'}, e_{j'}) \in S^{-}} \max\left(0,\; d(g_i, e_j) + \gamma - d(g_{i'}, e_{j'})\right)$$

where $d(g_i, e_j)$ represents the distance function between the structural embedded representations of the standard drug term $e_j$ in the drug term library based on synonym mining update and the external drug term $g_i$ in the electronic medical record; $\gamma$ is the hyperparameter giving the margin that separates positive and negative samples; and $S^{+}$ and $S^{-}$ represent the positive and negative sample sets, respectively. The distance function used in this embodiment is the Euclidean distance, i.e. $d(g_i, e_j) = \lVert z_{g_i} - z_{e_j} \rVert_2$, where $z_{g_i}$ and $z_{e_j}$ are the node embedded representations of $g_i$ and $e_j$. In this embodiment, $\gamma$ is set to a fixed value.
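A margin-based distance loss with Euclidean distance can be sketched as follows. The pairing of every positive pair with every negative pair is an assumed (common) form of such a loss, and the function names and the margin values used when calling it are illustrative only.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_loss(positive_pairs, negative_pairs, margin):
    """Sum of max(0, d(pos) + margin - d(neg)) over all positive/negative pairings.

    Each pair is (external_embedding, standard_embedding).
    """
    total = 0.0
    for g_pos, e_pos in positive_pairs:
        d_pos = euclidean(g_pos, e_pos)
        for g_neg, e_neg in negative_pairs:
            total += max(0.0, d_pos + margin - euclidean(g_neg, e_neg))
    return total
```

The loss is zero once every negative pair is farther apart than every positive pair by at least the margin, so training pulls associated terms together and pushes unassociated terms apart.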

Step S33: the semantic embedded representation $h^{\mathrm{sem}}_{ij}$ output in step S31 and the structural embedded representation $h^{\mathrm{str}}_{ij}$ output in step S32 are spliced together, denoted $h_{ij} = [h^{\mathrm{sem}}_{ij}; h^{\mathrm{str}}_{ij}]$. This representation is used as the input of a multi-layer perceptron comprising a plurality of fully connected hidden layers and a single-node output layer, whose output is denoted $o_{ij}$. The output of the multi-layer perceptron is converted into a scalar by a nonlinear activation function $\sigma$, finally giving as output the probability $\hat{p}_{ij}$ that the external drug term $g_i$ in the electronic medical record is associated with the standard drug term $e_j$ in the drug term library based on synonym mining update. In this embodiment a sigmoid activation function is adopted, expressed as

$$\hat{p}_{ij} = \sigma(o_{ij}) = \frac{1}{1 + e^{-o_{ij}}}$$

The loss function is the same binary cross-entropy loss as used for the BERT model, expressed as

$$\mathcal{L} = -\left[ y_{ij} \log \hat{p}_{ij} + (1 - y_{ij}) \log (1 - \hat{p}_{ij}) \right]$$

where $y_{ij}$ is the label indicating whether the external drug term $g_i$ in the electronic medical record and the standard drug term $e_j$ in the drug term library based on synonym mining update are associated: $y_{ij} = 1$ if $g_i$ and $e_j$ are associated, and $y_{ij} = 0$ otherwise.
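The fusion step can be illustrated with a tiny multi-layer perceptron over the concatenated embeddings. The ReLU activation in the hidden layers and all names in this sketch are assumptions; the text specifies only fully connected hidden layers, a single-node output layer, and a sigmoid on the output.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def association_probability(h_sem, h_str, hidden_layers, out_w, out_b):
    """Concatenate both embeddings, apply the MLP, and squash to a probability.

    hidden_layers: list of (weights, bias); ReLU between layers is an assumption.
    """
    h = list(h_sem) + list(h_str)            # spliced representation
    for weights, bias in hidden_layers:
        h = [max(0.0, v) for v in dense(h, weights, bias)]
    logit = sum(w * v for w, v in zip(out_w, h)) + out_b   # single output node
    return 1.0 / (1.0 + math.exp(-logit))

# Toy usage with hand-picked weights (illustrative values only).
p = association_probability(
    [1.0, 2.0], [3.0],
    hidden_layers=[([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])],
    out_w=[1.0, -1.0], out_b=1.0,
)
```

Concatenation keeps the semantic and structural signals separate until the perceptron learns how to weigh them against each other.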

Step S4: the association prediction model is used to predict the association results between external drug terms in the electronic medical record and standard drug terms in the drug term library, thereby establishing the association between external drug terms in the real-world electronic medical record and standard drug terms in the drug term library.

As shown in fig. 5, the present invention further provides an embodiment of a system for standardized association of drug terms in electronic medical records implemented based on the above method, where the system includes:

Corresponding to the foregoing embodiment of the method for standardized association of drug terms in electronic medical records, the invention also provides an embodiment of a device for standardized association of drug terms in electronic medical records. The device provided by this embodiment comprises a memory and one or more processors, wherein executable code is stored in the memory, and the processors, when executing the executable code, implement the method for standardized association of drug terms in electronic medical records of the foregoing embodiment.

The embodiment of the invention also provides a computer readable storage medium, wherein a program is stored in the computer readable storage medium, and when the program is executed by a processor, the method for standardizing and associating the drug terms in the electronic medical record in the embodiment is realized.

The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device having data processing capability described in any of the previous embodiments. The computer readable storage medium may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any device having data processing capability. The computer readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.

The foregoing description of the preferred embodiment(s) is (are) merely intended to illustrate the embodiment(s) of the present invention, and it is not intended to limit the embodiment(s) of the present invention to the particular embodiment(s) described.

Claims

1. The standardized association method for the drug terminology in the electronic medical record is characterized by comprising the following steps of:

s2, obtaining a drug term library based on synonym mining update, comprising:

2. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein in the training process of the synonym set classifier, the probability that the drug terms to be categorized belong to the synonym set is predicted based on the change of the set uniformity score, and the method for calculating the set uniformity score is as follows: calculating an embedded representation of each term in the set, inputting the embedded representation into the fully-connected neural network model to obtain a new term representation, summing all the new term representations to obtain an initialized term set representation, and inputting the initialized term set representation into the fully-connected neural network model to obtain a set uniformity score.

3. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the training set generation mode of the synonym set classifier comprises: extracting a drug term from the synonym set in a random extraction mode, and generating a positive training sample by combining a set formed by the rest drug terms in the synonym set; for each positive training sample, matching a plurality of negative training samples, the negative training samples being extracted from a drug term library after the drug terms in the synonym set are excluded.

4. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the updating of the drug term library according to all synonym sets based on synonym mining update is specifically: if the synonym set of a standard drug term that is a hypernym (upper-level term) is updated, establishing a synonym association between the corresponding synonym and that standard drug term, and simultaneously establishing hyponym (lower-level term) associations between the corresponding synonym and all standard drug terms that have that standard drug term as their hypernym; if the synonym set of a non-hypernym standard drug term is updated, establishing a synonym association between the corresponding synonym and that non-hypernym standard drug term.

5. The method for standardized association of drug terminology in electronic medical records according to claim 1, wherein the pre-trained language model is fine-tuned, specifically: the semantic embedded representation of the start character is used as the independent variable, and the dependent variable is the label indicating whether the external drug term in the electronic medical record and the drug term in the updated drug term library are semantically associated; a prediction result based on the semantic embedded representation is obtained using a nonlinear activation function; positive training samples are acquired by manual labeling and negative training samples by random extraction to obtain a training set, the pre-trained language model is trained on it, and the semantic embedded representation is thereby optimized.

6. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the candidate association relationship is established between the external drug terms and the drug terms in the updated drug term library, specifically: calculating the TF-IDF value of each word in the medicine term, obtaining vector representation of the external medicine term in the electronic medical record and each medicine term in the updated medicine term library, calculating the similarity between the two medicine terms, and if the similarity is larger than a preset similarity threshold, establishing a candidate association relationship between the external medicine term in the electronic medical record and the medicine term in the corresponding medicine term library.

7. The method for standardized association of drug terms in electronic medical records according to claim 1, wherein the input of each layer in the graph convolutional neural network model comprises two parts, the first part being a node embedded representation matrix and the second part being an adjacency matrix; the output of each layer, which is used as the node embedded representation matrix of the next layer, is obtained through normalized graph Laplacian transformation; and the graph convolutional neural network model is optimized using a margin-based distance loss function.

8. The method for standardized association of drug terminology in electronic medical records according to claim 7, wherein the values of the adjacency matrix are specifically: if there is an edge from one drug term to another in the updated drug term library, the corresponding value is 1, otherwise the value is 0; if there is an edge from the external drug term in the electronic medical record to the updated drug term in the drug term library, the corresponding value is the similarity value in the candidate association.

9. The standardized association method of drug terms in electronic medical records according to claim 1, wherein semantic embedded representations and structural embedded representations are spliced, the spliced representations are input into a multi-layer perceptron, the multi-layer perceptron comprises a plurality of fully-connected hidden layers and a single-node output layer, and the output of the multi-layer perceptron is converted into scalar quantities through a nonlinear activation function to obtain the association probability of external drug terms in each electronic medical record and standard drug terms in an updated drug term library.

10. A system for standardizing association of drug terminology in electronic medical records implemented based on the method of any one of claims 1-9, comprising: