CN107766323A - A text feature extraction method based on mutual information and association rules - Google Patents
- Publication number
- CN107766323A (application CN201710796425.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- feature
- word
- collection
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710796425.1A CN107766323B (zh) | 2017-09-06 | 2017-09-06 | A text feature extraction method based on mutual information and association rules |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710796425.1A CN107766323B (zh) | 2017-09-06 | 2017-09-06 | A text feature extraction method based on mutual information and association rules |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107766323A (zh) | 2018-03-06 |
CN107766323B (zh) | 2021-08-31 |
Family
ID=61265086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710796425.1A Active CN107766323B (zh) | A text feature extraction method based on mutual information and association rules | 2017-09-06 | 2017-09-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107766323B (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
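CN109240258A (zh) * | 2018-07-09 | 2019-01-18 | Shanghai Wanxing Information Technology Co., Ltd. (上海万行信息科技有限公司) | Intelligent assisted diagnosis method and system for automobile faults based on word vectors |
CN109684462A (zh) * | 2018-12-30 | 2019-04-26 | Guangxi University of Finance and Economics | Method for mining association rules between text words based on weight comparison and chi-square analysis |
CN109739953A (zh) * | 2018-12-30 | 2019-05-10 | Guangxi University of Finance and Economics | Text retrieval method based on a chi-square analysis-confidence framework and consequent expansion |
CN109857866A (zh) * | 2019-01-14 | 2019-06-07 | Institute of Information Engineering, Chinese Academy of Sciences | Keyword extraction method for event query suggestion, event query suggestion generation method, and retrieval system |
CN112818146A (zh) * | 2021-01-26 | 2021-05-18 | Shanxi Sanyouhe Smart Information Technology Co., Ltd. (山西三友和智慧信息技术股份有限公司) | Recommendation method based on product image style |
CN113704447A (zh) * | 2021-03-03 | 2021-11-26 | Tencent Technology (Shenzhen) Co., Ltd. | Text information recognition method and related apparatus |
CN113807456A (zh) * | 2021-09-26 | 2021-12-17 | Dalian Jiaotong University | Mutual-information-based feature screening and association-rule multi-label classification algorithm |
CN116644184A (zh) * | 2023-07-27 | 2023-08-25 | Zhejiang Houxue Network Technology Co., Ltd. (浙江厚雪网络科技有限公司) | Human resource information management system based on data clustering |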
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
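CN103279478A (zh) * | 2013-04-19 | 2013-09-04 | State Grid Corporation of China | Document feature extraction method based on distributed mutual information |
CN103678274A (zh) * | 2013-04-15 | 2014-03-26 | Nanjing University of Posts and Telecommunications | Feature extraction method for text classification based on improved mutual information and entropy |
CN105335785A (zh) * | 2015-10-30 | 2016-02-17 | Xihua University | Association rule mining method based on vector operations |
CN105631462A (zh) * | 2014-10-28 | 2016-06-01 | Beijing Jiaotong University | Behavior recognition method based on spatio-temporal context, combining confidence and contribution |
CN105701084A (zh) * | 2015-12-28 | 2016-06-22 | SYSU-CMU Shunde International Joint Research Institute (Guangdong) | Feature extraction method for text classification based on mutual information |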
2017
- 2017-09-06: CN application CN201710796425.1A, patent CN107766323B (zh), status Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
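CN103678274A (zh) * | 2013-04-15 | 2014-03-26 | Nanjing University of Posts and Telecommunications | Feature extraction method for text classification based on improved mutual information and entropy |
CN103279478A (zh) * | 2013-04-19 | 2013-09-04 | State Grid Corporation of China | Document feature extraction method based on distributed mutual information |
CN105631462A (zh) * | 2014-10-28 | 2016-06-01 | Beijing Jiaotong University | Behavior recognition method based on spatio-temporal context, combining confidence and contribution |
CN105335785A (zh) * | 2015-10-30 | 2016-02-17 | Xihua University | Association rule mining method based on vector operations |
CN105701084A (zh) * | 2015-12-28 | 2016-06-22 | SYSU-CMU Shunde International Joint Research Institute (Guangdong) | Feature extraction method for text classification based on mutual information |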
Non-Patent Citations (6)
Title |
---|
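MARINONI A et al.: "Unsupervised Data Driven Feature Extraction by Means of Mutual Information Maximization", IEEE Transactions on Computational Imaging * |
Ren Jianhua (任建华) et al.: "Document Clustering Based on Association Relationships Between Terms", Computer Engineering and Applications (计算机工程与应用) * |
Xiong Yun (熊赟) et al.: "Big Data Mining" (大数据挖掘), 30 April 2016 * |
Hu Keyun (胡可云) et al.: "Data Mining Theory and Applications" (数据挖掘理论与应用), 30 April 2008 * |
Chen Min (陈敏): "Introduction to Cognitive Computing" (认知计算导论), 30 June 2017 * |
Gao Dingguo (高定国): "Principles and Applications of Tibetan Information Processing" (藏文信息处理的原理与应用), 30 December 2014 * |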
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
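CN109240258A (zh) * | 2018-07-09 | 2019-01-18 | Shanghai Wanxing Information Technology Co., Ltd. (上海万行信息科技有限公司) | Intelligent assisted diagnosis method and system for automobile faults based on word vectors |
CN109739953B (zh) * | 2018-12-30 | 2021-07-20 | Guangxi University of Finance and Economics | Text retrieval method based on a chi-square analysis-confidence framework and consequent expansion |
CN109684462A (zh) * | 2018-12-30 | 2019-04-26 | Guangxi University of Finance and Economics | Method for mining association rules between text words based on weight comparison and chi-square analysis |
CN109739953A (zh) * | 2018-12-30 | 2019-05-10 | Guangxi University of Finance and Economics | Text retrieval method based on a chi-square analysis-confidence framework and consequent expansion |
CN109684462B (zh) * | 2018-12-30 | 2022-12-06 | Guangxi University of Finance and Economics | Method for mining association rules between text words based on weight comparison and chi-square analysis |
CN109857866A (zh) * | 2019-01-14 | 2019-06-07 | Institute of Information Engineering, Chinese Academy of Sciences | Keyword extraction method for event query suggestion, event query suggestion generation method, and retrieval system |
CN109857866B (zh) * | 2019-01-14 | 2021-05-25 | Institute of Information Engineering, Chinese Academy of Sciences | Keyword extraction method for event query suggestion, event query suggestion generation method, and retrieval system |
CN112818146B (zh) * | 2021-01-26 | 2022-12-02 | Shanxi Sanyouhe Smart Information Technology Co., Ltd. (山西三友和智慧信息技术股份有限公司) | Recommendation method based on product image style |
CN112818146A (zh) * | 2021-01-26 | 2021-05-18 | Shanxi Sanyouhe Smart Information Technology Co., Ltd. (山西三友和智慧信息技术股份有限公司) | Recommendation method based on product image style |
CN113704447A (zh) * | 2021-03-03 | 2021-11-26 | Tencent Technology (Shenzhen) Co., Ltd. | Text information recognition method and related apparatus |
CN113704447B (zh) * | 2021-03-03 | 2024-05-03 | Tencent Technology (Shenzhen) Co., Ltd. | Text information recognition method and related apparatus |
CN113807456A (zh) * | 2021-09-26 | 2021-12-17 | Dalian Jiaotong University | Mutual-information-based feature screening and association-rule multi-label classification algorithm |
CN113807456B (zh) * | 2021-09-26 | 2024-04-09 | Dalian Jiaotong University | Mutual-information-based feature screening and association-rule multi-label classification method |
CN116644184A (zh) * | 2023-07-27 | 2023-08-25 | Zhejiang Houxue Network Technology Co., Ltd. (浙江厚雪网络科技有限公司) | Human resource information management system based on data clustering |
CN116644184B (zh) * | 2023-07-27 | 2023-10-20 | Zhejiang Houxue Network Technology Co., Ltd. (浙江厚雪网络科技有限公司) | Human resource information management system based on data clustering |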
Also Published As
Publication number | Publication date |
---|---|
CN107766323B (zh) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
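CN107766323A (zh) | | A text feature extraction method based on mutual information and association rules |
CN107220295B (zh) | | Method for searching civil dispute mediation cases and recommending mediation strategies |
CN102929873B (zh) | | Method and device for extracting search value words based on contextual search |
CN110442760A (zh) | | Synonym mining method and device for a question-answering retrieval system |
CN103309862B (zh) | | Web page type identification method and system |
CN108280114A (zh) | | Deep-learning-based method for analyzing users' literature reading interests |
CN101609450A (zh) | | Web page classification method based on a training set |
CN101763431A (zh) | | PL clustering processing method based on massive online public opinion information |
CN104199822A (zh) | | Method and system for identifying the demand category corresponding to a search |
CN110532480B (zh) | | Knowledge graph construction method for recommending human-readable threat intelligence, and threat intelligence recommendation method |
CN114880486A (zh) | | Industry chain identification method and system based on NLP and knowledge graphs |
Mashuri | | Sentiment analysis in twitter using lexicon based and polarity multiplication |
CN105205163B (zh) | | Incremental-learning multi-level binary classification method for science and technology news |
CN107506472A (zh) | | Method for classifying web pages browsed by students |
CN104778157A (zh) | | Method for generating multi-document summary sentences |
KR20140049680A (ko) | | Sentiment classification system using rule-based multi-agents and method thereof |
CN114997288A (zh) | | Design resource association method |
Zhou et al. | | Attention calibration for transformer-based sequential recommendation |
CN112492606A (zh) | | Spam SMS classification and identification method, device, computer equipment, and storage medium |
WO2021060967A1 (en) | | A system and method for predictive analytics of articles |
CN112464668A (zh) | | Method and system for extracting dynamic information on the smart home industry |
CN108932247A (zh) | | Method and device for optimizing text search |
Majdabadi et al. | | Twitter trend extraction: a graph-based approach for tweet and hashtag ranking, utilizing no-hashtag tweets |
CN109871429A (zh) | | Short text retrieval method fusing Wikipedia categories and explicit semantic features |
Saranya et al. | | Word Cloud Generation on Clothing Reviews using Topic Model |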
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 2018-03-06; Assignee: Fanyun software (Nanjing) Co., Ltd.; Assignor: HUAIYIN INSTITUTE OF TECHNOLOGY; Contract record no.: X2021980010526; Denomination of invention: A text feature extraction method based on mutual information and association rules; Granted publication date: 2021-08-31; License type: Common License; Record date: 2021-10-11 |
| | TR01 | Transfer of patent right | Effective date of registration: 2024-05-06; Address after: 230000 b-1018, Woye Garden commercial office building, 81 Ganquan Road, Shushan District, Hefei City, Anhui Province; Patentee after: HEFEI WISDOM DRAGON MACHINERY DESIGN Co., Ltd.; Country or region after: China; Address before: 223005 Jiangsu Huaian economic and Technological Development Zone, 1 East Road.; Patentee before: HUAIYIN INSTITUTE OF TECHNOLOGY; Country or region before: China |
| | TR01 | Transfer of patent right | Effective date of registration: 2024-05-10; Address after: Room 212, Building 3, No. 2959 Gudai Road, Minhang District, Shanghai, 201199; Patentee after: Shanghai Zhutong Information Technology Co., Ltd.; Country or region after: China; Address before: 230000 b-1018, Woye Garden commercial office building, 81 Ganquan Road, Shushan District, Hefei City, Anhui Province; Patentee before: HEFEI WISDOM DRAGON MACHINERY DESIGN Co., Ltd.; Country or region before: China |
| | EC01 | Cancellation of recordation of patent licensing contract | Assignee: Fanyun software (Nanjing) Co., Ltd.; Assignor: HUAIYIN INSTITUTE OF TECHNOLOGY; Contract record no.: X2021980010526; Date of cancellation: 2024-05-16 |