CN108804595B - Short text representation method based on word2vec - Google Patents
- Publication number: CN108804595B (application CN201810525103.8A)
- Authority: CN (China)
- Legal status: Expired - Fee Related (the legal status is an assumption based on Google Patents data and is not a legal conclusion)
Classification: Information Retrieval, Db Structures And Fs Structures Therefor
Abstract
The invention relates to a word2vec-based short text representation method comprising the following steps. S1: input a training text set that has undergone text preprocessing, set the word2vec method parameters, and train to obtain the word vector set corresponding to the training text set. S2: for each word in each document, calculate the cosine distance between word vectors to obtain a series of similar words for that word across the whole training text set. S3: calculate the cosine distance between each document's similar words and the document. S4: sort by cosine distance in descending order and finally select the first n similar words, together with their corresponding cosine distances, to form the n similar words and cosine measures of the document. S5: calculate the weights, within each document, of the document's words and the selected n similar words to form the new text representation, and output the improved word2vec-based vector space representation of each document.
Description
Technical Field
The invention relates to the field of computer science and technology, and in particular to a word2vec-based short text representation method.
Background
In the text mining process, a machine that reads sample information must first pass through a text representation step that converts each sample into numerical form. With the ever-expanding scope of natural language processing and the development of computer technology, how to use numerical values to better represent the semantic information carried by a text has always been one of the crucial research questions in text processing, because it directly influences the text mining result. For short text mining in particular, an effective text feature representation method is a difficult research problem: short texts generated on social platforms not only exhibit the traditional problems of feature sparseness, incomplete semantics, polysemy (one word with many meanings), and synonymy (many words with one meaning), but are also characterized by casual expression, misuse of neologisms, and sheer volume.
Commonly used text representation models include Boolean models, probabilistic models, and vector space models, the most widely used being the Vector Space Model (VSM) proposed by Gerard Salton et al. The basic idea of the vector space model is to represent a text as a vector: partial feature words are selected from the training set, and each feature word serves as one dimension of the vector space coordinate system, so that each text becomes a vector in a multi-dimensional space, that is, a point in n-dimensional space, and the similarity between texts can be measured by the angle or the distance between their vectors (Tai Deyi, Wang. An improved feature weight algorithm for text classification [J]. Computer Engineering, 2010, 36(9): 197-). However, the vector space model has the defect that its data space is sparse and it ignores the semantic information between words, which makes its representation of short texts somewhat weak. Some researchers have attempted to correct these defects. For example, Wang B K et al. propose a strong feature thesaurus (SFT) built from the results of latent Dirichlet allocation (LDA) and information gain (IG), combining LDA and IG to increase vocabulary weights and thereby select feature words with stronger semantic information (Wang B K, Huang Y F, Yang W X, et al. Short text classification based on strong feature thesaurus [J]. Journal of Zhejiang University-SCIENCE C (Computers & Electronics), 2012, 13(9): 649-). Yang Lili et al. propose a semantic extension method combining the words and semantic features of short text, which uses Wikipedia as a background knowledge base to obtain the semantic features of words and recalculates feature word weights based on combinations of words and semantics (Yang L, Li C, Ding Q, et al. Combining Lexical and Semantic Features for Short Text Classification [J]. Procedia Computer Science, 2013, 22(0): 78-86.).
In 2013, Google's Tomas Mikolov team released word2vec, an open-source word vector generation tool based on deep learning (Mikolov T, Le Q V, Sutskever I. Exploiting similarities among languages for machine translation [J]. arXiv preprint arXiv:1309.4168, 2013; Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space [J]. arXiv preprint arXiv:1301.3781, 2013.). The algorithm can learn high-quality word vectors from a large-scale real document corpus in a short time, and these vectors make it convenient to compute the semantic similarity between words. word2vec can not only uncover semantic information among words but also offers a new way to address the sparseness of the vector space model on short texts.
Disclosure of Invention
The invention aims to provide a word2vec-based short text representation method that addresses the data space sparseness and missing semantics of the Vector Space Model (VSM); clustering short texts represented by this method allows knowledge topics to be extracted more effectively.
In order to realize the purpose, the technical scheme is as follows:
A word2vec-based short text representation method comprises the following steps:
s1: inputting a training text set that has undergone text preprocessing, setting the word2vec method parameters, and training to obtain the word vector set corresponding to the training text set;
s2: for each word in each document, calculating the cosine distance between word vectors to obtain a series of similar words for that word across the whole training text set;
s3: calculating the cosine distance between the similar words of each document and the document;
s4: sorting by cosine distance in descending order, and finally selecting the first n similar words and their corresponding cosine distances to form the n similar words and cosine measures of the document;
s5: calculating the weights, within each document, of the document's words and the selected n similar words to form the new text representation, and outputting the improved word2vec-based vector space representation of each document.
Preferably, the preprocessing of the training text set in step S1 comprises:
s1.1: constructing a user dictionary and performing word segmentation and part-of-speech tagging on the training text;
s1.2: removing stop words according to an existing stop word list, and removing pronouns, prepositions, and locative words according to part of speech;
s1.3: performing feature selection with a method such as TF, IDF, or TF-IDF, thereby reducing the feature dimension.
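A minimal sketch of step S1.3 in plain Python (toy English tokens stand in for the segmented, POS-tagged Chinese text produced by S1.1 and S1.2; the stop-word list is a placeholder, not the one used in the patent):

```python
import math
from collections import Counter

STOP = {"the", "a", "of"}  # placeholder stop-word list (S1.2)

def tf_idf(docs):
    """TF-IDF weights per document; docs are token lists, assumed
    already segmented and POS-filtered as in steps S1.1-S1.2."""
    n = len(docs)
    df = Counter()                 # document frequency per term
    for d in docs:
        df.update(set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({t: (tf[t] / len(d)) * math.log(n / df[t])
                        for t in tf})
    return weights

# toy corpus; the real input would be segmented Chinese microblog text
docs = [["policy", "the", "child", "policy"], ["child", "care", "a"]]
docs = [[t for t in d if t not in STOP] for d in docs]
w = tf_idf(docs)  # "child" occurs in every document, so its IDF is 0
```

Terms whose weight falls below a threshold would then be dropped to reduce the feature dimension.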
Preferably, the specific calculation process of step S3 is as follows:
If several words in the document share a common similar word, the cosine distances of that shared similar word are added together to form the cosine distance between the similar word and the document; otherwise, the original similar word and its cosine distance to the corresponding word in the document are retained:

s(t,d) = s(t,t1) + s(t,t2) + s(t,t3) + … + s(t,tn)  (1)

where t1, t2, t3, …, tn are the words in document d, s(t, ti) denotes the cosine distance between the similar word t and word ti in document d, and s(t, d) denotes the cosine measure of word t and document d.
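Formula (1) can be sketched directly: for each word of document d we have a list of (similar word, cosine distance) pairs, and distances of a similar word proposed by several document words are summed (the word strings below are hypothetical examples):

```python
def cosine_measure(similar_lists):
    """similar_lists: for each word of document d, a list of
    (similar_word, cosine_distance) pairs from word2vec.
    Per formula (1), distances of a repeated similar word are summed."""
    s = {}
    for pairs in similar_lists:
        for t, dist in pairs:
            s[t] = s.get(t, 0.0) + dist
    return s

measure = cosine_measure([
    [("fertility", 0.8), ("policy", 0.6)],  # similar words of word 1
    [("fertility", 0.7)],                   # similar words of word 2
])
# "fertility" was proposed twice, so s(fertility, d) = 0.8 + 0.7
```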
Preferably, the specific process in step S5 of calculating the weights of the document's words and of the selected n similar words in the document is as follows:
w(t, nd) is the weight of word t in the document nd formed by adding the n similar words, obtained by the TF-IDF feature weight calculation method; s(t, d) denotes the cosine measure of word t and document d.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention provides a word2vec-based short text representation method: word2vec is used to find the similar words of each word in a text, and the similar words of the text are then computed and used to extend the text's features in the vector space model.
(2) Experimental results show that in both the text clustering and text classification stages of the experiment, the word2vec-based short text representation method performs markedly better than the traditional vector space model: the average DB_index in the clustering stage drops by 0.704, and the average classification accuracy in the classification stage rises by 4.614%. The method thus improves the clustering effect both technically and in application, and better extracts the knowledge topics in the corpus.
Drawings
FIG. 1 Process of representing short text by the word2vec-based improved vector space model method
FIG. 2 is a DB _ index line graph of text represented by a traditional vector space model method and changing with feature dimensions under different clustering numbers
FIG. 3 is a DB _ index line graph showing text variation with feature dimensions under different cluster numbers based on the method of the present invention
FIG. 4 is a histogram of clustered DB _ index values of a text represented by a conventional vector space model method and the method of the present invention
FIG. 5 is a histogram of classification accuracy of text as a function of feature dimensions based on a conventional vector space model method and methods described herein
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the invention is further illustrated below with reference to the figures and examples.
Example 1
The invention and its features and advantages are described in more detail below with reference to the accompanying drawings, taking a comprehensive two-child policy short text corpus as an example.
The acquisition and preprocessing process of the training text set is as follows:
The comprehensive two-child policy short text corpus used in the experiments was obtained by crawling Sina Weibo; after the captured data were cleaned and filtered as necessary, 102300 usable records remained as the experimental corpus. After Chinese word segmentation and part-of-speech tagging were completed with the Java edition of the NLPIR2016 word segmentation system, the Harbin Institute of Technology stop word list was imported to remove stop words, and words without practical meaning (pronouns, prepositions, locative words, and the like) were removed according to part of speech. For feature selection, the main unsupervised methods currently available are TF, IDF, and TF-IDF; this embodiment uses the TF-IDF method, thereby reducing the feature dimension.
(1) Short text presentation process
As shown in fig. 1, the process of representing a short text with the word2vec-based short text representation method comprises the following specific steps:
s1: before a word2 vec-based short text representation method is adopted, a Google word2vec open source tool is used for generating a word vector document for input data in a Linux environment. The text data after the text preprocessing of the training set is used as a data set for word2vec word vector generation, parameters are set to be-cbow 0-size 200-window 5-negative 0-hs 1-sample 1e-3-threads 12-binary 1, a Skip-Gram model is used, the size of a training window is 5, and a 200-dimensional word vector is generated.
S2: a series of similar words of the words in the text in the whole training text set are obtained by utilizing a word2vec method according to word vector calculation, and a table 1 shows the words in one text, the corresponding similar words and cosine distance values of the words in the one text.
Table 1 Similar words and cosine distances of some words obtained by the word2vec method
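As a minimal illustration of S2, the similar words of a word are ranked by cosine similarity between word vectors (toy 3-dimensional vectors and invented words stand in for the real 200-dimensional word2vec vectors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similar_words(word, vectors, topn=3):
    """Rank every other vocabulary word by cosine similarity to `word`."""
    scores = [(w, cosine(vectors[word], v))
              for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda x: -x[1])[:topn]

vectors = {  # toy "word vectors"; real ones come from the S1 training
    "policy": [1.0, 0.0, 0.1],
    "regulation": [0.9, 0.1, 0.1],
    "panda": [0.0, 1.0, 0.0],
}
top = similar_words("policy", vectors, topn=2)
```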
S3: and (3) calculating the cosine distance between the similar words of each document and the document according to the formula (1).
S4: the selection of the number n of the similar words of the document is considered, if the value of n is too small, the number of the similar words considered to participate in calculation of each document after the feature selection is too small; if the value of n is too large, the calculation amount and the running time of a text representation link are greatly increased, the invention sets n to be 50, namely, the first 50 similar words and the corresponding cosine distance of the document are selected as the expansion characteristics of the document.
S5: and (3) calculating the weights of the words in the documents and the selected n similar words in the documents according to the formula (2), forming a new text representation, and outputting a vector space representation of each document after improvement based on word2 vec.
(2) Evaluation method
Using the K-means clustering method, DB_index is calculated for documents represented by the conventional method and by the word2vec-based short text representation method at the different selected feature dimensions, and the cluster number is determined by seeking the minimum DB_index.
A line graph of DB_index values versus feature dimension at different cluster numbers under the conventional vector space model method is shown in fig. 2; the corresponding graph for the word2vec-based short text representation method is shown in fig. 3.
As can be seen from fig. 2 and 3, under both the conventional vector space model method and the word2vec-based short text representation method, when the cluster number is 13 the intra-class dispersion and inter-class separation remain relatively stable across feature dimensions and DB_index reaches its minimum, so 13 is selected as the optimal cluster number.
(1) DB_index

DB_index = (1/k) Σ_{i=1}^{k} max_{j≠i} (S_i + S_j) / d_{ij}

where k is the number of clusters, d_{ij} is the distance between the centers of classes i and j, and S_i is the average distance from the samples in class i to their class center.
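A small self-contained sketch of this measure, using the standard Davies-Bouldin definition consistent with the variable descriptions above (toy 2-D points and hand-picked centers; a real run would use the K-means output):

```python
import math

def db_index(points, labels, centers):
    """Davies-Bouldin index: mean over clusters of the worst-case
    (S_i + S_j) / d_ij ratio; smaller means tighter, better-separated
    clusters."""
    k = len(centers)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # S_i: average distance of class-i samples to the class center
    S = []
    for i in range(k):
        members = [p for p, l in zip(points, labels) if l == i]
        S.append(sum(dist(p, centers[i]) for p in members) / len(members))

    total = 0.0
    for i in range(k):
        total += max((S[i] + S[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 0.5), (10.0, 10.5)]
score = db_index(pts, labels, centers)  # small: compact, far-apart clusters
```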
Because the optimal clustering effect is obtained with 13 clusters, the clustering effects of the two text representation methods are compared at that cluster number. FIG. 4 is a histogram of DB_index values for the two text representation methods at different feature dimensions with 13 clusters. As can be seen from FIG. 4, with 13 clusters and feature dimensions between 200 and 2000, the word2vec-based short text representation method achieves lower DB_index values than the conventional vector space model method. Representing text with the word2vec-based short text representation method therefore represents the text better, yielding tighter intra-cluster aggregation and greater inter-cluster separation during clustering.
(2) Interpretation of clustering results
Since DB_index reaches its minimum value of 1.168 when the feature dimension of the word2vec-based short text representation method is 200, the clustering result at that point is interpreted, as shown in Table 2.
As can be seen from Table 2, after the comprehensive two-child policy took effect, there are immediate livelihood problems requiring attention and solution, including category 1 (education and medicine), category 4 (late marriage and late childbirth), and category 11 (female employment). These are the problems the public fed back directly and first after the policy opened; they are negative effects of the policy and should prompt the relevant agencies to pay attention and take corresponding measures. Categories 2, 6, and 9 reflect the economic and living pressure the policy brings: people need higher personal income and quality of life before considering a second child, which pushes the government to implement more welfare guarantees and more comprehensive employment and other measures to raise income levels; otherwise, despite the opening of the comprehensive two-child policy, real-life pressure keeps the public's overall fertility intention low, and the problem of population aging cannot be relieved. Categories 3, 8, and 13 mainly concern the family problems the comprehensive two-child policy may bring; although these have no direct relation to policy implementation, they worry everyone weighing whether to respond to the policy, and people can be trusted to judge and act appropriately.
Categories 5, 10, and 12 mainly reflect the public's opinions and feelings toward the comprehensive two-child policy; most of these three categories express support and expectation, from which the policy's reception and the public's needs can be assessed.
TABLE 2 different clustering corresponds to characteristic words and examples of text within classes
The interpretation of each category in the clustering result shows that the categories formed by clustering short texts represented by the proposed method are well interpretable, and the knowledge topics within the clusters are easier to extract.
(3) Text classification accuracy rate using clustering result as training corpus
The test-set documents are manually classified according to the feature words and category interpretations, giving the test set its category labels. With the manually labeled documents as the test set and the documents whose classes were obtained by text clustering as the training set, the accuracy of the training corpus automatically constructed by clustering is checked. Text is represented with the traditional vector space model method and with the word2vec-based short text representation method respectively, using the same TF-IDF feature selection as in the clustering stage; the classification results of different classifiers at different feature dimensions are shown in Table 3.
TABLE 3 accuracy of different classifiers with respect to feature dimension under different text representation methods
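The evaluation flow above (train on cluster-labeled documents, test on manually labeled ones, report accuracy) can be sketched with a simple nearest-centroid classifier standing in for the SVM and other classifiers used in the experiments; all vectors and labels below are toy examples:

```python
from collections import defaultdict

def nearest_centroid(train, test_points):
    """train: list of (vector, label). Compute one centroid per label,
    then assign each test point to the label of the closest centroid."""
    groups = defaultdict(list)
    for v, l in train:
        groups[l].append(v)
    centers = {l: tuple(sum(c) / len(vs) for c in zip(*vs))
               for l, vs in groups.items()}

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(centers, key=lambda l: d2(p, centers[l]))
            for p in test_points]

def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

# cluster-labeled "training corpus" (toy 2-D document vectors)
train = [((0.0, 0.0), "support"), ((0.1, 0.0), "support"),
         ((5.0, 5.0), "pressure"), ((5.1, 5.0), "pressure")]
# manually labeled test set
pred = nearest_centroid(train, [(0.2, 0.1), (4.9, 5.2)])
acc = accuracy(pred, ["support", "pressure"])
```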
To compare the classification performance of the two text representation methods more intuitively, a histogram of classification accuracy versus feature dimension under the different text representation methods can be drawn from Table 3, as shown in fig. 5.
As can be seen from fig. 5, under the word2vec-based short text representation method, the training corpus automatically constructed by clustering achieves a classification accuracy above 80% except at feature dimension 100 (where the dimensions may be too few to supply enough feature words for distinguishing the categories). Moreover, across the different feature dimensions and classifiers, the classification accuracy of the word2vec-based short text representation method is consistently higher than that of the traditional vector space model method: with the SVM classifier, the improvement is only 2.38% at feature dimension 500 and ranges from 3.16% to 6.87% in all other cases. This indicates that a corpus constructed by clustering with the proposed text representation method better distinguishes the knowledge topics in the corpus and obtains better results in application.
It should be understood that the above-described embodiments are merely examples intended to illustrate the invention clearly, not to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the protection scope of its claims.
Claims (3)
1. A word2vec-based short text representation method, characterized in that the method comprises the following steps:
s1: inputting a training text set that has undergone text preprocessing, setting the word2vec method parameters, and training to obtain the word vector set corresponding to the training text set;
s2: for each word in each document, calculating the cosine distance between word vectors to obtain a series of similar words for that word across the whole training text set;
s3: calculating the cosine distance between the similar words of each document and the document;
s4: sorting by cosine distance in descending order, and finally selecting the first n similar words and their corresponding cosine distances to form the n similar words and cosine measures of the document;
s5: calculating the weights, within each document, of the document's words and the selected n similar words to form the new text representation, and outputting the improved word2vec-based vector space representation of each document;
the specific calculation process of step S3 is as follows:
if several words in the document share a common similar word, the cosine distances of that shared similar word are added together to form the cosine distance between the similar word and the document; otherwise, the original similar word and its cosine distance to the corresponding word in the document are retained:

s(t,d) = s(t,t1) + s(t,t2) + s(t,t3) + … + s(t,tn)  (1)

where t1, t2, t3, …, tn are the words in document d, s(t, ti) denotes the cosine distance between the similar word t and word ti in document d, and s(t, d) denotes the cosine measure of word t and document d.
2. The word2vec-based short text representation method according to claim 1, characterized in that the preprocessing of the training text set in step S1 comprises:
s1.1: constructing a user dictionary and performing word segmentation and part-of-speech tagging on the training text;
s1.2: removing stop words according to an existing stop word list, and removing pronouns, prepositions, and locative words according to part of speech;
s1.3: performing feature selection with a method such as TF, IDF, or TF-IDF, thereby reducing the feature dimension.
3. The word2vec-based short text representation method according to claim 2, characterized in that the specific process in step S5 of calculating the weights of the document's words and of the selected n similar words in the document is as follows:
w(t, nd) is the weight of word t in the document nd formed by adding the n similar words, obtained by the TF-IDF feature weight calculation method; s(t, d) denotes the cosine measure of word t and document d.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810525103.8A CN108804595B (en) | 2018-05-28 | 2018-05-28 | Short text representation method based on word2vec |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108804595A CN108804595A (en) | 2018-11-13 |
CN108804595B true CN108804595B (en) | 2021-07-27 |
Family
ID=64090655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810525103.8A Expired - Fee Related CN108804595B (en) | 2018-05-28 | 2018-05-28 | Short text representation method based on word2vec |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804595B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162620B (en) * | 2019-01-10 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Method and device for detecting black advertisements, server and storage medium |
CN110232128A (en) * | 2019-06-21 | 2019-09-13 | 华中师范大学 | Topic file classification method and device |
CN110442873A (en) * | 2019-08-07 | 2019-11-12 | 云南电网有限责任公司信息中心 | A kind of hot spot work order acquisition methods and device based on CBOW model |
CN110705304B (en) * | 2019-08-09 | 2020-11-06 | 华南师范大学 | Attribute word extraction method |
CN111177401A (en) * | 2019-12-12 | 2020-05-19 | 西安交通大学 | Power grid free text knowledge extraction method |
CN112257431A (en) * | 2020-10-30 | 2021-01-22 | 中电万维信息技术有限责任公司 | NLP-based short text data processing method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105279288A (en) * | 2015-12-04 | 2016-01-27 | 深圳大学 | Online content recommending method based on deep neural network |
CN107102989A (en) * | 2017-05-24 | 2017-08-29 | 南京大学 | A kind of entity disambiguation method based on term vector, convolutional neural networks |
CN107590218A (en) * | 2017-09-01 | 2018-01-16 | 南京理工大学 | The efficient clustering method of multiple features combination Chinese text based on Spark |
- 2018-05-28 CN CN201810525103.8A patent/CN108804595B/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
Document classification method based on Word2vec; Chen Jie et al.; Computer Systems & Applications; 2017-11-15; pp. 159-164 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant | Granted publication date: 20210727
| CF01 | Termination of patent right due to non-payment of annual fee |