Leveraging Dual Gloss Encoders in Chinese Biomedical Entity Linking
Abstract
1 Introduction
2 Related Work
2.1 WSD Models
2.2 WSD Language Resources
3 DGE
3.1 Context-aware Gloss Encoder
3.2 Lexical Gloss Encoder
3.3 Scoring
Sentence: 腺瘤性 息肉癌變機率較高, 建議即時切除。 (Adenomatous polyps are more likely to become cancerous, so prompt excision is recommended.) Target Entity: 息肉 (polyp) |
---|
Context-Aware Gloss Encoder |
X1: [CLS] 腺瘤性#息肉#癌變機率較高…… [SEP] 息肉是突出於黏膜的組織… [SEP] |
X2: [CLS] 腺瘤性#息肉#癌變機率較高…… [SEP] 共同服用的兩種形式中的… [SEP] |
Lexical Gloss Encoder |
g1: [CLS] 息肉是突出於黏膜的組織異常生長… [SEP] (A small vascular growth on the surface of a mucous membrane.) |
g2: [CLS] 共同服用的兩種形式中的一種 (例如, Hydra或珊瑚) … [SEP] (One of two forms that coelenterates take (e.g. a hydra or coral) …) |
Dot product Score |
\({X}_1 \cdot {g}_1\) 58.60 (the highest score: the correct sense g1) |
\({X}_1 \cdot {g}_2\) 26.08 |
\({X}_2 \cdot {g}_1\) 57.87 (the second-highest score: a mismatched context-gloss pair, but the correct gloss) |
\({X}_2 \cdot {g}_2\) 24.67 |
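
The pairing shown in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `mark_target`, `build_context_gloss_input`, and `pick_sense` are hypothetical, the `#` marking of the first occurrence of the target and the rule of taking the gloss from the highest-scoring pair are assumptions read off the example, and the scores are copied from the table rather than computed by any encoder.

```python
# Hypothetical helpers; the scores are copied from the example table,
# not produced by a real encoder.

def mark_target(sentence: str, target: str) -> str:
    """Wrap the first occurrence of the target entity in '#' markers."""
    if target not in sentence:
        raise ValueError("target entity not found in sentence")
    return sentence.replace(target, f"#{target}#", 1)


def build_context_gloss_input(sentence: str, target: str, gloss: str) -> str:
    """Assumed input layout of the context-aware gloss encoder."""
    return f"[CLS] {mark_target(sentence, target)} [SEP] {gloss} [SEP]"


def pick_sense(scores: dict[tuple[int, int], float]) -> int:
    """Return the gloss index j of the highest-scoring (X_i, g_j) pair."""
    (_, best_gloss), _ = max(scores.items(), key=lambda item: item[1])
    return best_gloss


# Dot-product scores X_i . g_j, keyed by (i, j), from the table.
scores = {(1, 1): 58.60, (1, 2): 26.08, (2, 1): 57.87, (2, 2): 24.67}
predicted_gloss = pick_sense(scores)  # gloss index of the top-scoring pair

# Truncated sentence and gloss exactly as they appear in row X1.
example_input = build_context_gloss_input(
    "腺瘤性息肉癌變機率較高……", "息肉", "息肉是突出於黏膜的組織…"
)
```

Under these assumptions both of the two top-scoring pairs point at gloss g1, so the prediction does not change even when the second-best pair is a mismatched context-gloss combination.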
4 Experiments for Performance Evaluation
4.1 Datasets
Type | Target Entities | #Gloss | #Sent | #Token |
---|---|---|---|---|
2 glosses | 前臂 (forearm)、皮毛 (coat)、*部位 (component)、黏液 (mucus)、心臟病 (heart disease)、乳管 (lactiferous duct)、薄膜 (biological membrane)、鼓膜 (eardrum)、胚囊 (gestational sac)、分泌物 (exudate)、組織 (tissue)、手足 (hands and feet)、多巴胺 (dopamine)、卵巢 (ovary)、超音波 (ultrasound)、顯影劑 (contrast medium)、息肉 (polyp)、炭疽病 (anthrax)、白斑 (vitiligo)、*緊身衣 (skin-tight garment)、*檢測 (assay)、*鏡頭 (camera lens)、呼吸管 (snorkel)、牙套 (gumshield)、神經衰弱 (neurasthenia)、閉鎖 (atresia)、過敏反應 (allergy)、脂肪 (adipose tissue)、石膏 (gypsum) | 58 | 7,409 | 393,726 |
3 glosses | 眨眼 (blink)、結節 (nodule)、黑眼圈 (black eye)、乳房 (udder)、*香料 (spice)、穿刺 (centesis)、*手套 (glove) | 21 | 1,780 | 92,676 |
4 glosses | 隔膜 (diaphragm)、眼睛 (eye)、導管 (catheter)、眼罩 (blindfold) | 16 | 1,029 | 49,931 |
Total | 40 distinct entities | 95 | 10,218 | 536,333 |
4.2 Settings
Datasets | #Sent | #Token |
---|---|---|
Training | 7,109 | 373,152 |
Validation | 979 | 50,320 |
Test | 2,130 | 112,861 |
All | 10,218 | 536,333 |
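
As a quick consistency check on the split table above, the following sketch recomputes the totals and the share of sentences in each split. The counts are copied from the table; the percentages are derived here and are not stated in the source.

```python
# Sentence and token counts copied from the split table.
splits = {
    "Training": (7109, 373152),
    "Validation": (979, 50320),
    "Test": (2130, 112861),
}

# Totals should match the "All" row of the table.
total_sent = sum(sent for sent, _ in splits.values())
total_tok = sum(tok for _, tok in splits.values())

# Share of sentences per split, in percent, rounded to one decimal place.
sentence_share = {
    name: round(100 * sent / total_sent, 1)
    for name, (sent, _) in splits.items()
}
```

The recomputed totals agree with the "All" row, and the sentence shares come out close to a 70/10/20 partition.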
4.3 Results
4.4 In-depth Analysis
DGE Model | F1-score |
---|---|
Context-aware Gloss Encoder (weak + target + average) | 97.81 |
– weak | 97.34 |
– weak, target → [CLS] | 93.13 |
– weak, average → last | 96.89 |
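
The size of each ablation effect in the table above can be read off with simple subtraction. The sketch below assumes the rows are cumulative, that is, the last two rows also remove the weak component and are therefore best compared against the "– weak" row; that reading is an assumption, and the F1-scores are copied from the table.

```python
# F1-scores copied from the ablation table.
full_model = 97.81
without_weak = 97.34
ablations = {
    "target -> [CLS]": 93.13,
    "average -> last": 96.89,
}

# Drop caused by removing the weak component from the full model.
weak_drop = round(full_model - without_weak, 2)

# Further drop of each variant relative to the "- weak" row (assumed baseline).
further_drop = {
    name: round(without_weak - f1, 2) for name, f1 in ablations.items()
}
```

On this reading, replacing the target representation with [CLS] costs far more F1 than either of the other two changes.
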
Biomedical Gloss | Number of Target Entities | F1-score |
---|---|---|
Yes | 34 | 97.87 |
No | 6 | 97.43 |
Gloss Number | Number of Target Entities | F1-score |
---|---|---|
2 glosses | 29 | 98.38 |
3 glosses | 7 | 95.73 |
4 glosses | 4 | 97.3 |
4.5 Error Analysis
4.6 Discussion
5 Conclusions
References
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Funding Sources
- National Science and Technology Council, Taiwan