CN114580389B - Chinese medical field causal relation extraction method integrating radical information
- Publication number
- CN114580389B
- Authority
- CN
- China
- Prior art keywords
- radical
- text
- character
- chinese
- medical field
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Machine Translation (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
Description
Technical Field
The present invention relates to causal relation extraction in the medical field, and in particular to a causal relation extraction method for the Chinese medical field that integrates radical information.
Background Art
At present, informatization of the medical field is progressing steadily, and modern medical information systems have accumulated massive amounts of medical data. As this data keeps growing, using natural language processing and deep learning to mine the rich information contained in medical text has become a research hotspot at the intersection of medicine and artificial intelligence. Medical text data holds a large number of records of medical activity, including diseases, drugs, examinations, and treatment outcomes. This information is important clinical data; analysing and mining it accurately and efficiently can provide theoretical and technical support for building medical knowledge bases and clinical diagnosis and treatment systems. However, medical text differs from conventional text in several respects: for example, it contains many English entity names, and its semantics are highly correlated with radicals. These characteristics pose new challenges for causal relation extraction. A causal relation extraction method that can integrate radical information and enrich the semantic information of the text is therefore needed.
Current research on radical information is concentrated mainly in named entity recognition. A single Chinese character can form a word on its own, and the radical of a character often carries important information. Such research mainly uses models such as conditional random fields and bidirectional long short-term memory (Bi-LSTM) networks to obtain radical features and merge them into character features, which enriches the semantic information of the text and yields character feature vectors that incorporate radical information.
The resulting character feature representation, fused with radical information, must then be fed into a causal relation extraction model to obtain the causal entities. Research on causal relation extraction commonly uses machine-learning-based and deep-learning-based methods. Machine learning methods first model the task as a multi-class classification problem, extract feature vectors, and then apply a supervised classifier to perform event extraction. With the surge of research on neural networks, applying neural network models to causal relation extraction can improve its accuracy and efficiency. However, existing methods rarely take the radical features of characters into account, so the semantic information they capture is incomplete, which introduces risk when causal relation extraction models are applied in the medical field. The present invention performs causal relation extraction on medical text data by integrating radical information, improving extraction accuracy.
Summary of the Invention
To solve the above problems, the object of the present invention is to provide a causal relation extraction method for the Chinese medical field that integrates radical information.
To achieve this object, the method provided by the present invention is carried out in the following steps:
Step 1: Data acquisition. Obtain a set of Chinese medical text data D = {D_1, D_2, ..., D_n}, where D_i denotes the i-th text, 1 ≤ i ≤ n, and n is the total number of texts in D;
Step 2: Preprocess the acquired text data, as follows:
Step 2.1: Remove stop words, web page tags, and similar noise from the text, then perform word segmentation;
Step 2.2: Extract the text into structured data and load it into a database;
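As an illustration of Step 2, the following is a minimal sketch rather than the patented implementation. The stop-word list, the character-level split (which stands in for a real word segmenter), and the names `preprocess` and `to_record` are all assumptions introduced here; the source does not name a segmenter or a database schema.

```python
import re

# Hypothetical stop-word list; a real pipeline would load a full Chinese list.
STOP_WORDS = {"的", "了", "和"}

# Matches web page tags such as <p> or </div>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def preprocess(text: str) -> list[str]:
    """Strip markup, split into character tokens, and drop stop words."""
    without_tags = TAG_PATTERN.sub("", text)
    # Assumption: a character-level split stands in for a real word segmenter.
    tokens = [ch for ch in without_tags if not ch.isspace()]
    return [tok for tok in tokens if tok not in STOP_WORDS]


def to_record(doc_id: int, text: str) -> dict:
    """Package one cleaned document as a structured row (Step 2.2)."""
    return {"id": doc_id, "tokens": preprocess(text)}
```

Each record returned by `to_record` could then be written to whichever database the deployment uses; that choice is left open in the source.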
Step 3: Convert the English technical terms in the text data into Chinese, as follows:
Step 3.1: Use ASCII code values to locate the English technical terms in the data set;
Step 3.2: Use a translation interface to convert the English technical terms into Chinese, yielding a data set that contains only Chinese characters;
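Step 3 can be sketched as follows, under stated assumptions. English terms are located as runs of ASCII letters, which is one way to apply the ASCII code value test of Step 3.1. The `GLOSSARY` dictionary is a hypothetical stand-in for the translation interface, which the source does not identify.

```python
import re

# Runs of ASCII letters (optionally with digits or hyphens) mark English terms.
ENGLISH_TERM = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Hypothetical glossary standing in for the translation interface.
GLOSSARY = {"CT": "计算机断层扫描", "aspirin": "阿司匹林"}


def translate_term(term: str) -> str:
    """Look the term up; return it unchanged when no translation is known."""
    return GLOSSARY.get(term, GLOSSARY.get(term.lower(), term))


def replace_english_terms(text: str) -> str:
    """Locate English terms by their ASCII range and substitute translations."""
    return ENGLISH_TERM.sub(lambda m: translate_term(m.group(0)), text)
```

Terms missing from the glossary are left in place in this sketch, so a real pipeline would need its own policy for untranslated terms before the data set can be said to contain only Chinese characters.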
Step 4: Obtain radical features, as follows:
Step 4.1: Look up the radical of every character in the data set by querying an online Xinhua Dictionary; for a Chinese character that has no radical, treat the character itself as the word;
Step 4.2: Treat each radical as a word and feed it to the Word2Vec architecture, training incrementally on the radicals to obtain radical feature vectors;
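The radical lookup of Step 4.1 might look like the sketch below. The `RADICALS` table is a small hypothetical excerpt used only for illustration, since the method itself queries an online Xinhua Dictionary, and the function names are assumptions. A character missing from the table falls back to itself, mirroring the rule for characters without a radical.

```python
# Hypothetical excerpt of a character-to-radical table; the method itself
# obtains radicals by querying an online Xinhua Dictionary.
RADICALS = {"病": "疒", "痛": "疒", "咳": "口", "嗽": "口", "肝": "月", "炎": "火"}


def radical_of(char: str) -> str:
    """Return the radical, or the character itself when none is listed."""
    return RADICALS.get(char, char)


def radical_sequences(sentences: list[str]) -> list[list[str]]:
    """Turn each sentence into a list of radical tokens for embedding training."""
    return [[radical_of(ch) for ch in sentence] for sentence in sentences]
```

The resulting token lists have the shape a Word2Vec implementation expects for Step 4.2. One option, not specified by the source, would be the `Word2Vec` class from gensim.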
Step 5: Extract causal relations in the Chinese medical field with radical information integrated, as follows:
Step 5.1: Input layer. For the raw Chinese medical text data, feed each sentence into the BERT model to obtain character-level features, and at the same time feed the radicals into Word2Vec for incremental training to obtain the radical feature representation;
Step 5.2: Receive the character features and radical features and output two embedding matrices by looking them up in the embedding dictionaries. The vector dimensions of characters and radicals are set to the same size, so that each Chinese character is represented by two vector sequences: a character sequence and a radical sequence;
Step 5.3: The representation layer combines character information with radical information to produce a comprehensive representation of the input text. A bidirectional long short-term memory (Bi-LSTM) network captures the preceding and following context and therefore the semantic dependencies in both directions. The radical feature is concatenated as a row vector after the character feature, which encodes the radical information into the character feature vector: the text is passed through the BERT model and the Word2Vec architecture separately to obtain the character features and radical features, and these two independent feature vectors are then concatenated to give a text feature vector representation fused with radical information;
Step 5.4: Take the final hidden states of the Bi-LSTM in the representation layer as output and concatenate them into a combined representation. Feed this representation into a conditional random field (CRF) model, which uses the Softmax function as the activation function to map each word to a conditional probability. Finally, label the output text with the BIO sequence labeling scheme to obtain the final extraction result;
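Two pieces of Step 5 can be illustrated in isolation: the concatenation of Step 5.3 and the Softmax mapping of Step 5.4. This is a simplified sketch, not the full model. Plain Python lists stand in for the BERT and Word2Vec outputs, the Bi-LSTM and the CRF transition scoring are omitted, and the function names and toy scores are assumptions.

```python
import math

LABELS = ["O", "B-cause", "I-cause", "B-effect", "I-effect"]


def fuse(char_vecs, radical_vecs):
    """Concatenate each radical vector after its character vector."""
    if len(char_vecs) != len(radical_vecs):
        raise ValueError("one radical vector is required per character")
    return [list(c) + list(r) for c, r in zip(char_vecs, radical_vecs)]


def softmax(scores):
    """Map raw label scores to probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_labels(score_rows):
    """Pick the highest-probability label for each character independently."""
    result = []
    for row in score_rows:
        probs = softmax(row)
        result.append(LABELS[probs.index(max(probs))])
    return result
```

Choosing each label independently, as `most_likely_labels` does, ignores the dependencies between neighbouring labels that a CRF layer models; it is shown only to make the probability mapping concrete.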
Step 6: Sequence labeling. Extracting causal relations by sequence labeling requires assigning a label to every word in the sentence: B-cause marks the beginning of a cause event, B-effect marks the beginning of an effect event, I-cause marks a middle or final word of a cause event, I-effect marks a middle or final word of an effect event, and O marks a word that belongs to neither a cause event nor an effect event. Probabilities are computed for the sentences at the prediction layer to obtain the causal label of each character, from which the causal entities are obtained.
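To show how the labels of Step 6 turn into causal entities, the sketch below groups BIO-labelled characters into cause and effect spans. The function name `decode_spans` and the decision to ignore an I- label with no preceding B- label are assumptions made here; the source does not describe the decoding procedure.

```python
def decode_spans(chars, labels):
    """Group BIO-labelled characters into (role, text) spans."""
    spans = []
    role, buffer = None, []
    for ch, label in zip(chars, labels):
        prefix, _, kind = label.partition("-")
        if prefix == "I" and kind == role:
            # Continuation of the span that is currently open.
            buffer.append(ch)
            continue
        # Any other label closes the span that was open, if there was one.
        if role is not None:
            spans.append((role, "".join(buffer)))
        if prefix == "B":
            role, buffer = kind, [ch]
        else:
            # "O", or an "I-" label with no matching open span, starts nothing.
            role, buffer = None, []
    if role is not None:
        spans.append((role, "".join(buffer)))
    return spans
```

For the sentence 吸烟导致肺癌 ("smoking causes lung cancer") labelled B-cause, I-cause, O, O, B-effect, I-effect, this yields one cause span 吸烟 and one effect span 肺癌.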
The advantages and positive effects of the present invention are as follows: the method integrates character radical information to enrich the semantic information of the text and uses the result as the input of the causal relation extraction model, which improves extraction accuracy. In the medical field it can be used for tasks such as building medical knowledge bases and online consultation platforms.
Brief Description of the Drawings
To illustrate the technical solution of the present invention more clearly, the drawings used are briefly introduced below.
FIG. 1 is a flow chart of the causal relation extraction method for the Chinese medical field integrating radical information provided by the present invention;
FIG. 2 is a structural block diagram of the acquisition of Chinese character radical features provided by the present invention;
FIG. 3 is a structural block diagram of the causal relation extraction provided by the present invention.
Detailed Description
The present invention is further described below:
The object of the present invention is to provide a causal relation extraction method for the Chinese medical field that integrates radical information. Building on existing causal relation extraction, the method integrates Chinese character radical information to enrich the semantic information of the text and thereby achieves better extraction results.
With reference to FIGS. 1, 2 and 3, the method of the present invention is carried out in Steps 1 to 6 exactly as set out in the Summary of the Invention above.
In addition, the above embodiments serve only to illustrate specific implementations of the present invention and do not limit it. Those skilled in the art should understand that some of the technical features may be replaced by equivalents, and such modifications and replacements also fall within the scope of protection of the present invention.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210220870.4A CN114580389B (en) | 2022-03-08 | 2022-03-08 | Chinese medical field causal relation extraction method integrating radical information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210220870.4A CN114580389B (en) | 2022-03-08 | 2022-03-08 | Chinese medical field causal relation extraction method integrating radical information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114580389A (en) | 2022-06-03 |
CN114580389B (en) | 2024-08-20 |
Family
ID=81772878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210220870.4A Active CN114580389B (en) | 2022-03-08 | 2022-03-08 | Chinese medical field causal relation extraction method integrating radical information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114580389B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033203A (en) * | 2021-02-05 | 2021-06-25 | 浙江大学 | Structured information extraction method oriented to medical instruction book text |
CN113903422A (en) * | 2021-09-09 | 2022-01-07 | 北京邮电大学 | Method, device and equipment for entity extraction of medical image diagnosis report |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021000362A1 (en) * | 2019-07-04 | 2021-01-07 | 浙江大学 | Deep neural network model-based address information feature extraction method |
Also Published As
Publication number | Publication date |
---|---|
CN114580389A (en) | 2022-06-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083710B (en) | A Word Definition Generation Method Based on Recurrent Neural Network and Latent Variable Structure | |
CN109992664B (en) | Dispute focus label classification method and device, computer equipment and storage medium | |
CN110609983B (en) | Structured decomposition method for policy file | |
CN115599901B (en) | Machine Question Answering Method, Device, Equipment and Storage Medium Based on Semantic Prompts | |
CN111143571B (en) | Entity labeling model training method, entity labeling method and device | |
CN112541337B (en) | Document template automatic generation method and system based on recurrent neural network language model | |
CN113312478A (en) | Viewpoint mining method and device based on reading understanding | |
CN112183030A (en) | Event extraction method and device based on preset neural network, computer equipment and storage medium | |
Sifa et al. | Towards contradiction detection in german: a translation-driven approach | |
CN112580329B (en) | Text noise data identification method, device, computer equipment and storage medium | |
CN113742733A (en) | Reading comprehension vulnerability event trigger word extraction and vulnerability type identification method and device | |
CN115713085A (en) | Document theme content analysis method and device | |
CN115545021A (en) | Clinical term identification method and device based on deep learning | |
CN117520561A (en) | Entity relation extraction method and system for knowledge graph construction in helicopter assembly field | |
CN116070632A (en) | Informal text entity tag identification method and device | |
CN117115505A (en) | Emotion enhancement continuous training method combining knowledge distillation and contrast learning | |
CN114330350A (en) | Named entity identification method and device, electronic equipment and storage medium | |
CN114580389B (en) | Chinese medical field causal relation extraction method integrating radical information | |
CN117172241B (en) | Tibetan language syntax component labeling method | |
CN112949311A (en) | Named entity identification method fusing font information | |
CN117828024A (en) | Plug-in retrieval method, device, storage medium and equipment | |
CN117725458A (en) | Method and device for obtaining threat information sample data generation model | |
CN114842301A (en) | Semi-supervised training method of image annotation model | |
CN112241630A (en) | Method and system for analyzing transformer variable-research standard vocabulary entry based on natural language processing | |
CN112632985A (en) | Corpus processing method and device, storage medium and processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |