CN109977391A - Information extraction method and device for text data - Google Patents
Information extraction method and device for text data
- Publication number
- CN109977391A
- Authority
- CN
- China
- Prior art keywords
- phrase
- text data
- phrases
- data
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711458887.9A CN109977391B (zh) | 2017-12-28 | 2017-12-28 | Information extraction method and device for text data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711458887.9A CN109977391B (zh) | 2017-12-28 | 2017-12-28 | Information extraction method and device for text data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977391A (zh) | 2019-07-05 |
CN109977391B (zh) | 2020-12-08 |
Family
ID=67074603
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201711458887.9A CN109977391B (zh) | Active | 2017-12-28 | 2017-12-28 | Information extraction method and device for text data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977391B (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
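CN110347803A (zh) * | 2019-07-18 | 2019-10-18 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for obtaining reading comprehension material, electronic equipment and readable medium |
WO2021170085A1 (zh) * | 2020-02-27 | 2021-09-02 | BOE Technology Group Co., Ltd. | Annotation method, relation extraction method, storage medium and computing device |
CN113836902A (zh) * | 2021-08-25 | 2021-12-24 | Guangdong University of Foreign Studies | Method, device, equipment and storage medium for constructing a phrase corpus |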
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070067285A1 (en) * | 2005-09-22 | 2007-03-22 | Matthias Blume | Method and apparatus for automatic entity disambiguation |
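CN102968432A (zh) * | 2012-09-19 | 2013-03-13 | East China Normal University | Control method for verifying tuples based on confidence |
CN103268339A (zh) * | 2013-05-17 | 2013-08-28 | Institute of Computing Technology, Chinese Academy of Sciences | Method and system for named entity recognition in microblog messages |
CN104156352A (zh) * | 2014-08-15 | 2014-11-19 | Soochow University | Method and system for processing Chinese events |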
Non-Patent Citations (2)
Title |
---|
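Anthony Fader et al.: "Identifying Relations for Open Information Extraction", https://dl.acm.org/doi/abs/10.5555/2145432.2145596 * |
Deng Bo et al.: "Credibility evaluation of entity relation patterns in information extraction", 《情报理论与实践》 (Information Studies: Theory & Application) * |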
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
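CN110347803A (zh) * | 2019-07-18 | 2019-10-18 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for obtaining reading comprehension material, electronic equipment and readable medium |
WO2021170085A1 (zh) * | 2020-02-27 | 2021-09-02 | BOE Technology Group Co., Ltd. | Annotation method, relation extraction method, storage medium and computing device |
US12026453B2 (en) | 2020-02-27 | 2024-07-02 | Boe Technology Group Co., Ltd. | Annotation method, relation extraction method, storage medium and computing device |
CN113836902A (zh) * | 2021-08-25 | 2021-12-24 | Guangdong University of Foreign Studies | Method, device, equipment and storage medium for constructing a phrase corpus |
CN113836902B (zh) * | 2021-08-25 | 2024-04-26 | Guangdong University of Foreign Studies | Method, device, equipment and storage medium for constructing a phrase corpus |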
Also Published As
Publication number | Publication date |
---|---|
CN109977391B (zh) | 2020-12-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Singh et al. | A decision tree based word sense disambiguation system in Manipuri language | |
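CN110727796A (zh) | Multi-scale difficulty vector classification method for graded reading materials | |
CN106570180A (zh) | Artificial intelligence based voice search method and device | |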
Mohammed | Using machine learning to build POS tagger for under-resourced language: the case of Somali | |
CN109977391B (zh) | Information extraction method and device for text data | |
Almanea | Automatic methods and neural networks in Arabic texts diacritization: a comprehensive survey | |
Mezghanni et al. | CrimAr: A criminal Arabic ontology for a benchmark based evaluation | |
Chennoufi et al. | Impact of morphological analysis and a large training corpus on the performances of Arabic diacritization | |
Singha et al. | Part of speech tagging in Manipuri with hidden markov model | |
Sheng et al. | Chinese prosodic phrasing with extended features | |
Kapočiūtė-Dzikienė et al. | Improving topic classification for highly inflective languages | |
Hirpassa | Information extraction system for Amharic text | |
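KR101869362B1 (ko) | Apparatus for determining sentence plagiarism using entailment sentence generation technology, and program and recording medium for implementing the same | |
CN112071304B (zh) | Semantic analysis method and device | |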
Shekhar et al. | Computational linguistic retrieval framework using negative bootstrapping for retrieving transliteration variants | |
Maulud et al. | Towards a Complete Kurdish NLP Pipeline: Challenges and Opportunities | |
Mesfar | Towards a cascade of morpho-syntactic tools for arabic natural language processing | |
Gavhal et al. | Sentence Compression Using Natural Language Processing | |
Aparna et al. | A review on different approaches of pos tagging in NLP | |
Gast | 11 Comparing Annotation Types and n-Gram Sizes | |
Abdelkader | HMM Based Part of Speech Tagging for Hadith Isnad | |
CHEN | Syntax Error Detection in English Text Images Based on Sparse Representation | |
Rhazi et al. | A Complex Annotation-Based Approach for the Arabic Syntactic Analyzer using NooJ Text Annotation Structures | |
Jaf et al. | Towards the development of a hybrid parser for natural languages | |
Mekki et al. | TTK: A toolkit for Tunisian linguistic analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP01 | Change in the name or title of a patent holder | Address after: 100032 No. 29, Finance Street, Xicheng District, Beijing. Patentee after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.; CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd. Address before: 100032 No. 29, Finance Street, Xicheng District, Beijing. Patentee before: CHINA MOBILE COMMUNICATIONS Corp.; CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd. |
| | TR01 | Transfer of patent right | Effective date of registration: 20220704. Address after: 610041 China (Sichuan) Pilot Free Trade Zone, Chengdu, Sichuan. Patentee after: China Mobile (Chengdu) Information and Communication Technology Co.,Ltd.; CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd.; CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd. Address before: 100032 No. 29, Finance Street, Xicheng District, Beijing. Patentee before: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.; CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd. |