CN106446147A - Emotion analysis method based on structuring features - Google Patents
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses a sentiment analysis method based on structured features. The method comprises: collecting Twitter text data and building a Twitter text database; collecting existing sentiment polarity lexicons; manually building the related auxiliary dictionaries; preprocessing the Twitter text database; defining a sentiment score impact factor, extracting linguistic features from the preprocessed text, and updating the value of the impact factor each time a feature is extracted; and computing the sentiment polarity value of the Twitter text data from the sentiment polarity lexicons and the impact factor. The method avoids the large amount of labelled data that supervised methods need to train a classifier, whose results are hard to analyse and generalise, and it reduces the overhead in CPU processing, memory and training time.
Description
Technical field
The present invention relates to a sentiment analysis method, and more particularly to an unsupervised sentiment analysis method based on structured features.
Background
With the rise and spread of social media, more and more users share their opinions, or simply express their emotions and moods, on social media platforms. Among these platforms, Twitter has become one of the most popular: according to 2016 statistics it has more than 645 million registered users, who post more than 190 million tweets per day on average. The Twitter API gives access to a large volume of rich data that can be explored and mined, which makes it a good opportunity for sentiment analysis. The results help infer public opinion on all kinds of subjects and support better-informed predictions and choices, so sentiment analysis of Twitter text data has naturally become an active research topic.
Sentiment analysis of Twitter text data mainly involves natural language processing, opinion mining and sentiment classification. Two kinds of methods are currently used. The first is the lexicon-based unsupervised approach, which relies on sentiment lexicons that carry polarity information for a large number of words, such as LIWC[1], ANEW[2], AFINN[3], VADER[4] and SentiWordNet[5]. The second is the supervised approach, which uses machine learning to extract features from a large amount of labelled data and train a classifier, such as an SVM (Support Vector Machine), Bayes or Decision Tree classifier. The most commonly used features are the presence or frequency of n-grams (sequences of 1, 2, 3 or more consecutive independent text units in the text). However, this approach needs a large amount of labelled data in the training stage, so its cost in CPU processing, memory and training time is high. In addition, for a considerable share of the data, the decision score predicted by a supervised classifier lies very close to the decision boundary, which means the classifier is very uncertain about which class the text belongs to; the label assigned to such data is therefore either completely wrong or right only by chance[6]. For these reasons, the lexicon-based unsupervised approach is chosen here for sentiment analysis.
The main challenges facing sentiment analysis of Twitter text data stem from the characteristics of the text itself. A tweet is limited to 140 characters, so the information it provides is limited. Besides its irregular language structure and grammar, a tweet may contain many abbreviations, emoticons, hashtags, slang terms, URLs and so on, which makes sentiment extraction and opinion mining difficult. Conventional natural language processing (NLP) techniques such as tokenization, normalization and part-of-speech tagging work well on normally written standard text but are no longer suitable for Twitter data.
Summary of the invention
The technical problem to be solved by the present invention is to provide a sentiment analysis method based on structured features that avoids the need, in supervised methods, for a large amount of labelled data to train a classifier.
The technical solution adopted by the present invention is a sentiment analysis method based on structured features, comprising the following steps:
1) collect Twitter text data and build a Twitter text database;
2) collect existing sentiment polarity lexicons, giving preference to manually created sentiment lexicons;
3) manually build the related auxiliary dictionaries, including a standard word dictionary, a negation word dictionary, an intensifying modifier dictionary, a weakening modifier dictionary and an Internet slang dictionary;
4) preprocess the Twitter text database, including:
(1) tokenizing the data in the Twitter text database;
(2) normalizing it;
(3) part-of-speech tagging the text (Part-of-Speech Tagging, POS Tagging);
5) define a sentiment score impact factor and extract linguistic features from the preprocessed output of step 4); the linguistic features include word-level, phrase-level and sentence-level features, and the value of the impact factor is updated each time a feature is extracted;
6) compute a sentiment polarity value for every tweet using the sentiment polarity lexicons from step 2) and the impact factors from step 5).
The sentiment polarity lexicons of step 2) include three manually created sentiment lexicons, AFINN, SentiStrength and VADER, and one automatically generated sentiment lexicon, Opinion Observer.
The tokenization of step 4)(1) splits the Twitter text data into the smallest meaningful independent text units (tokens) and labels the type of each unit.
The normalization of step 4)(2) uses a standard English dictionary to restore independent text units written with repeated letters to their canonical form, recognizes the emoticons, emoji and emotext in the Twitter text data, and determines and labels their sentiment polarity.
The part-of-speech tagging of step 4)(3) labels the part-of-speech category of each independent text unit.
Defining the sentiment score impact factor in step 5) means introducing, for each independent text unit $t$, a sentiment score impact factor $IF_t$ with $IF_t \ge 0$ and initial value 1, which reflects how much the linguistic features strengthen or weaken the sentiment intensity of the unit. The impact factor is updated by the following formula:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update, $p$ is a single feature, and $P$ is the set of all features that can affect the impact factor.
The word-level feature extraction of step 5) includes:
If all letters of an independent text unit are capitals, its all-caps flag is set to $s_{AllCaps}=1$, otherwise $s_{AllCaps}=0$, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update;
If an independent text unit uses repeated letters, an elongation factor is assigned to it:
[formula not reproduced in the source text]
where $t_{orig}$ is the original independent text unit and $t_{norm}$ is the unit after normalization, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
The phrase-level feature extraction of step 5) includes:
Using the negation word dictionary built manually in step 3), the start of a negated phrase is determined; periods, question marks, exclamation marks and non-standard independent text units are taken as the end of a negated phrase; and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where $t$ is an independent text unit within the scope of the negation.
Using the intensifying and weakening modifier dictionaries built manually in step 3), all modifiers in the Twitter text data are found, and the stretch factor of each modifier is computed by the following formula:
[formula not reproduced in the source text]
where $m$ is a modifier and MDM is the set of modifiers. If all letters of modifier $m$ are capitals, its all-caps flag is set to 1, otherwise to 0; if modifier $m$ uses repeated letters, its repeated-letter flag is set to 1, otherwise to 0.
The impact factor $IF_t$ is then updated:
[formula not reproduced in the source text]
The sentence-level feature extraction of step 5) includes:
Using the sentence structure "X but Y", tweets that use an adversative conjunction (such as but or yet) are identified, the part X before the conjunction and the part Y after it are marked, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where the value applied depends on whether independent text unit $t$ lies in X or in Y.
Using the three sentence structures "If X, Y", "If X () then Y" and "Y, if X.", tweets that use a conditional sentence are identified, where X is the conditional clause and Y is the result clause, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where independent text unit $t$ lies in X.
Computing the sentiment polarity value in step 6) includes:
(1) computing the base sentiment polarity value of each independent text unit $t$: let $L$ be the set of sentiment polarity lexicons used and $L_t = \{\, l \in L \mid t \in l \,\}$ the subset of lexicons that contain $t$; the base polarity value $s_t$ of each unit is obtained by the following formula:
[formula not reproduced in the source text]
where $score(l, t)$ is the base polarity value that lexicon $l$ gives for unit $t$, and $|L_t|$ is the number of sentiment polarity lexicons that contain $t$;
(2) for each independent text unit $t$, updating its base polarity value $s_t$ with the impact factor $IF_t$ obtained in step 5):
[formula not reproduced in the source text]
(3) computing the overall sentiment score $S_T$ of each tweet $T$:
$$S_T = \sum_{t \in T} s_t$$
The sentiment analysis method based on structured features of the present invention avoids the large amount of labelled data that supervised methods need to train a classifier, whose results are hard to analyse and generalise, and reduces the overhead in CPU processing, memory and training time. Specifically, the beneficial effects of the invention are:
1. It avoids supervised machine-learning methods, so sentiment analysis does not depend on a large amount of labelled data for training a classifier;
2. It uses a fine-grained, sentiment-aware preprocessor that handles informal social media text effectively, improving the efficiency and classification accuracy of subsequent processing;
3. It proposes a structured way of extracting features, so that the sentiment score impact factor defined here can be updated easily, which improves the computation of the sentiment score.
Brief description of the drawings
Fig. 1 is a flow chart of the sentiment analysis method based on structured features of the present invention.
Detailed description of the embodiments
The sentiment analysis method based on structured features of the present invention is described in detail below with reference to an embodiment and the accompanying drawing.
As shown in Fig. 1, the sentiment analysis method based on structured features of the present invention comprises the following steps:
1) collect Twitter text data and build a Twitter text database;
2) collect existing sentiment polarity lexicons, giving preference to manually created sentiment lexicons. The lexicons include three manually created sentiment lexicons, AFINN, SentiStrength and VADER, and one automatically generated sentiment lexicon, Opinion Observer. Table 1 gives an overview of the sentiment polarity lexicons and their characteristics.
Table 1. Overview of the sentiment lexicons
3) manually build the related auxiliary dictionaries, including a standard word dictionary, a negation word dictionary, an intensifying modifier dictionary, a weakening modifier dictionary and an Internet slang dictionary. Table 2 summarizes the dictionaries used.
Table 2. Overview of the auxiliary dictionaries
4) preprocess the Twitter text database, including:
(1) Tokenization of the data in the Twitter text database. Tokenization splits the Twitter text data into the smallest meaningful independent text units (tokens) and labels the type of each unit, such as word, hashtag, emoticon or URL. Regular expressions are used to match the different types of independent text units and attach the corresponding labels.
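The tokenization step can be illustrated with a minimal Python sketch. The type labels and the regular expressions below are assumptions chosen for illustration; the description only states that regular expressions are used to match and label the different unit types.

```python
import re

# Ordered (label, pattern) pairs; earlier patterns take precedence.
# Labels and patterns are illustrative assumptions, not the patent's rules.
TOKEN_PATTERNS = [
    ("URL", r"https?://\S+"),
    ("HASHTAG", r"#\w+"),
    ("MENTION", r"@\w+"),
    ("EMOTICON", r"[:;=][-o]?[)(DPp]"),
    ("WORD", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("PUNCT", r"[^\w\s]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Split one tweet into (unit, type) pairs; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for unit, kind in tokenize("Loving #python :) see https://t.co/abc @bob!!"):
        print(kind, unit)
```

Putting the URL, hashtag and emoticon patterns before the generic word and punctuation patterns keeps those units from being split into smaller pieces.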
(2) Normalization. Normalization uses a standard English dictionary to restore independent text units written with repeated letters to their canonical form, recognizes the emoticons, emoji and emotext in the Twitter text data, and determines and labels their sentiment polarity. Specifically:
A. Letter elongation. Letter elongation means repeating letters to strengthen the expressive force of a word. Words are first looked up in the dictionaries $D_{en}$ and $D_{slang}$ through an index based on phonetic encoding. If the normalizer encounters a word that is in neither dictionary, it uses the phonetic encoding to find candidate matches, computes the Levenshtein distance between the input and each candidate to measure their similarity, and returns the best match.
B. Emoticons. An emoticon is a graphical representation of a facial expression made up of punctuation marks or letters, such as :-), :) and :O). Positive and negative emoticons are normalized to [EMOTICON+] and [EMOTICON-] respectively.
C. Emoji. Since 2010, more and more emoji have been added to the Unicode standard (Unicode 8.0). As with emoticons, all emoji are normalized to predefined independent text units such as [EMOJI+], [EMOJI0] and [EMOJI-].
D. Emotext. Finally, emotext such as haha, hehe and xixi is normalized. A regular expression that matches at least k repeated letters (currently k=2) finds these expressions, and each one is then normalized to its core form; for example, hhahahah becomes haha.
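Two of the normalization rules above, letter elongation and emotext, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: candidate words are found by collapsing repeated letters rather than by the phonetic encoding the description mentions, the `vocabulary` argument stands in for the dictionaries, and "at least k repeated letters" is read as a consonant-vowel syllable repeated at least k times. The function names are hypothetical.

```python
import re


def levenshtein(a, b):
    """Dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def collapse(word):
    """Squeeze every run of a repeated character down to one character."""
    return re.sub(r"(.)\1+", r"\1", word)


def normalize_elongated(word, vocabulary):
    """Return the vocabulary entry closest to an elongated word.

    Assumption: candidates share the same collapsed form; the patent
    uses phonetic encoding for this step instead.
    """
    low = word.lower()
    if low in vocabulary:
        return low
    candidates = [v for v in vocabulary if collapse(v) == collapse(low)]
    if not candidates:
        return low
    return min(candidates, key=lambda v: (levenshtein(low, v), v))


def normalize_emotext(token, k=2):
    """Reduce laughter-like emotext (hhahahah) to its core form (haha)."""
    pattern = rf"([hx])\1*([aei])\2*(?:\1+\2+){{{k - 1},}}\1*"
    m = re.fullmatch(pattern, token.lower())
    return (m.group(1) + m.group(2)) * 2 if m else token


if __name__ == "__main__":
    print(normalize_elongated("coooool", {"cool", "col", "good"}))
    print(normalize_emotext("hhahahah"))
```

Ties in edit distance are broken alphabetically so that the result does not depend on set iteration order.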
(3) Part-of-speech tagging (Part-of-Speech Tagging, POS Tagging) of the text. Part-of-speech tagging labels the part-of-speech category of each independent text unit, such as noun, adjective or verb.
5) define a sentiment score impact factor and extract linguistic features from the preprocessed output of step 4). The linguistic features include word-level, phrase-level and sentence-level features, and the value of the impact factor is updated each time a feature is extracted. Defining the impact factor means introducing, for each independent text unit $t$, a sentiment score impact factor $IF_t$ with $IF_t \ge 0$ and initial value 1, which reflects how much the linguistic features strengthen or weaken the sentiment intensity of the unit. The impact factor is updated by the following formula:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update, $p$ is a single feature, and $P$ is the set of all features that can affect the impact factor.
Word-level feature extraction includes:
If all letters of an independent text unit are capitals, its all-caps flag is set to $s_{AllCaps}=1$, otherwise $s_{AllCaps}=0$, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update.
If an independent text unit uses repeated letters, an elongation factor is assigned to it:
[formula not reproduced in the source text]
where $t_{orig}$ is the original independent text unit and $t_{norm}$ is the unit after normalization, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where the formula again relates the impact factor of unit $t$ after the update to its value before the update.
Phrase-level feature extraction includes:
Using the negation word dictionary built manually in step 3), the start of a negated phrase is determined; periods, question marks, exclamation marks and non-standard independent text units are taken as the end of a negated phrase; and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where $t$ is an independent text unit within the scope of the negation, and the formula relates the impact factor of unit $t$ after the update to its value before the update.
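The negation-scope rule can be sketched in Python as follows. The sketch only marks which units fall inside a negation scope; how the impact factor changes for those units is left out because the formula is not available in this text. The small `NEGATIONS` set and the type labels `WORD` and `PUNCT` are assumptions standing in for the negation word dictionary and the tokenizer output.

```python
# Illustrative stand-in for the manually built negation word dictionary.
NEGATIONS = {"not", "no", "never", "cannot", "don't", "isn't", "won't"}
SCOPE_END_PUNCT = {".", "?", "!"}
STANDARD_TYPES = {"WORD", "PUNCT"}  # other types count as non-standard units


def negation_scope(tokens):
    """Return one bool per (unit, type) pair: True if inside a negation scope.

    A scope opens after a negation word and closes at a period, question
    mark, exclamation mark or non-standard unit (hashtag, URL, emoticon...).
    """
    in_scope = False
    flags = []
    for text, kind in tokens:
        if text in SCOPE_END_PUNCT or kind not in STANDARD_TYPES:
            in_scope = False
            flags.append(False)
        elif text.lower() in NEGATIONS:
            in_scope = True
            flags.append(False)  # the negation word itself only opens the scope
        else:
            flags.append(in_scope and kind == "WORD")
    return flags


if __name__ == "__main__":
    sample = [("I", "WORD"), ("do", "WORD"), ("not", "WORD"), ("like", "WORD"),
              ("it", "WORD"), (".", "PUNCT"), ("Great", "WORD")]
    print(negation_scope(sample))
```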
Using the intensifying and weakening modifier dictionaries built manually in step 3), all modifiers in the Twitter text data are found, and the stretch factor of each modifier is computed by the following formula:
[formula not reproduced in the source text]
where $m$ is a modifier and MDM is the set of modifiers. If all letters of modifier $m$ are capitals, its all-caps flag is set to 1, otherwise to 0; if modifier $m$ uses repeated letters, its repeated-letter flag is set to 1, otherwise to 0.
The impact factor $IF_t$ is then updated:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update.
Sentence-level feature extraction includes:
Using the sentence structure "X but Y", tweets that use an adversative conjunction (such as but or yet) are identified, the part X before the conjunction and the part Y after it are marked, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where the value applied depends on whether independent text unit $t$ lies in X or in Y, and the formula relates the impact factor of unit $t$ after the update to its value before the update.
Using the three sentence structures "If X, Y", "If X () then Y" and "Y, if X.", tweets that use a conditional sentence are identified, where X is the conditional clause and Y is the result clause, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where independent text unit $t$ lies in X.
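The two sentence-level structures can be located with a short Python sketch. It only identifies the parts X and Y; the values applied to the impact factor are not available in this text and are left out. The function names are hypothetical, the input is assumed to be a list of words and commas without final punctuation, and the handling of "yet" as a conjunction is a simplification.

```python
ADVERSATIVES = {"but", "yet"}


def split_adversative(words):
    """Return (X, Y) around the first adversative conjunction, else None."""
    for i, word in enumerate(words):
        # Require text on both sides so a trailing "not yet" is ignored.
        if word.lower() in ADVERSATIVES and 0 < i < len(words) - 1:
            return words[:i], words[i + 1:]
    return None


def conditional_clause(words):
    """Return the conditional clause X, or None if the tweet has no "if".

    Covers "If X, Y", "If X then Y" and "Y, if X".
    """
    lower = [w.lower() for w in words]
    if "if" not in lower:
        return None
    start = lower.index("if") + 1
    if start == 1:
        # Sentence opens with "if": X ends at the first comma or "then".
        for end in range(start, len(lower)):
            if lower[end] in {",", "then"}:
                return words[start:end]
    # "Y, if X" (or no delimiter found): X runs to the end of the sentence.
    return words[start:]


if __name__ == "__main__":
    print(split_adversative("the food was good but the service was awful".split()))
    print(conditional_clause(["If", "it", "rains", ",", "we", "stay"]))
```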
6) compute a sentiment polarity value for every tweet using the sentiment polarity lexicons from step 2) and the impact factors from step 5). Computing the sentiment polarity value includes:
(1) computing the base sentiment polarity value of each independent text unit $t$: let $L$ be the set of sentiment polarity lexicons used and $L_t = \{\, l \in L \mid t \in l \,\}$ the subset of lexicons that contain $t$; the base polarity value $s_t$ of each unit is obtained by the following formula:
[formula not reproduced in the source text]
where $score(l, t)$ is the base polarity value that lexicon $l$ gives for unit $t$, and $|L_t|$ is the number of sentiment polarity lexicons that contain $t$.
(2) for each independent text unit $t$, updating its base polarity value $s_t$ with the impact factor $IF_t$ obtained in step 5):
[formula not reproduced in the source text]
(3) computing the overall sentiment score $S_T$ of each tweet $T$:
$$S_T = \sum_{t \in T} s_t$$
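The three scoring steps can be put together in a minimal Python sketch. Two points are assumptions, because the corresponding formulas are not available in this text: the base polarity is taken as the average of $score(l, t)$ over the $|L_t|$ lexicons that contain the unit, and the impact factor is applied by multiplication. The final sum follows $S_T = \sum_{t \in T} s_t$. The toy lexicons are invented for the example and assumed to share one score scale.

```python
def base_polarity(token, lexicons):
    """Assumed s_t: mean score over the lexicons that contain the token."""
    scores = [lexicon[token] for lexicon in lexicons if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def tweet_score(tokens, impact_factors, lexicons):
    """S_T: sum of impact-adjusted polarity values over the tweet's units.

    impact_factors maps a unit position to IF_t; missing positions keep
    the initial value 1. Multiplying s_t by IF_t is an assumption.
    """
    return sum(
        impact_factors.get(i, 1.0) * base_polarity(token, lexicons)
        for i, token in enumerate(tokens)
    )


if __name__ == "__main__":
    toy_lexicons = [{"good": 3.0, "bad": -3.0}, {"good": 1.0}]
    print(tweet_score(["good", "table"], {0: 1.5}, toy_lexicons))
```

Units that appear in no lexicon contribute 0, so they do not change the overall score.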
Those skilled in the art will appreciate that the accompanying drawing is a schematic diagram of a preferred embodiment, and that the numbering of the embodiments is for description only and does not indicate their relative merit.
The above is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
The references cited in the background section are as follows:
[1] Pennebaker J W, Francis M E, Booth R J. Linguistic inquiry and word count: LIWC 2001[J]. Mahwah: Lawrence Erlbaum Associates, 2001, 71: 2001.
[2] Bradley M M, Lang P J. Affective norms for English words (ANEW): Instruction manual and affective ratings[R]. Technical report C-1, The Center for Research in Psychophysiology, University of Florida, 1999.
[3] Nielsen F Å. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs[J]. arXiv preprint arXiv:1103.2903, 2011.
[4] Hutto C J, Gilbert E. VADER: A parsimonious rule-based model for sentiment analysis of social media text[C]//Eighth International AAAI Conference on Weblogs and Social Media. 2014.
[5] Baccianella S, Esuli A, Sebastiani F. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining[C]//LREC. 2010, 10: 2200-2204.
[6] Chikersal P, Poria S, Cambria E. SeNTU: sentiment analysis of tweets by combining a rule-based classifier with supervised learning[J]. SemEval-2015, 2015: 647.
Claims (10)
1. A sentiment analysis method based on structured features, characterized by comprising the following steps:
1) collect Twitter text data and build a Twitter text database;
2) collect existing sentiment polarity lexicons, giving preference to manually created sentiment lexicons;
3) manually build the related auxiliary dictionaries, including a standard word dictionary, a negation word dictionary, an intensifying modifier dictionary, a weakening modifier dictionary and an Internet slang dictionary;
4) preprocess the Twitter text database, including:
(1) tokenizing the data in the Twitter text database;
(2) normalizing it;
(3) part-of-speech tagging the text (Part-of-Speech Tagging, POS Tagging);
5) define a sentiment score impact factor and extract linguistic features from the preprocessed output of step 4), the linguistic features including word-level, phrase-level and sentence-level features, the value of the impact factor being updated each time a feature is extracted;
6) compute a sentiment polarity value for every tweet using the sentiment polarity lexicons from step 2) and the impact factors from step 5).
2. The sentiment analysis method based on structured features according to claim 1, characterized in that the sentiment polarity lexicons of step 2) include three manually created sentiment lexicons, AFINN, SentiStrength and VADER, and one automatically generated sentiment lexicon, Opinion Observer.
3. The sentiment analysis method based on structured features according to claim 1, characterized in that the tokenization of step 4)(1) splits the Twitter text data into the smallest meaningful independent text units and labels the type of each unit.
4. The sentiment analysis method based on structured features according to claim 1, characterized in that the normalization of step 4)(2) uses a standard English dictionary to restore independent text units written with repeated letters to their canonical form, recognizes the emoticons, emoji and emotext in the Twitter text data, and determines and labels their sentiment polarity.
5. The sentiment analysis method based on structured features according to claim 1, characterized in that the part-of-speech tagging of step 4)(3) labels the part-of-speech category of each independent text unit.
6. The sentiment analysis method based on structured features according to claim 1, characterized in that defining the sentiment score impact factor in step 5) means introducing, for each independent text unit $t$, a sentiment score impact factor $IF_t$ with $IF_t \ge 0$ and initial value 1, which reflects how much the linguistic features strengthen or weaken the sentiment intensity of the unit, the impact factor being updated by the following formula:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update, $p$ is a single feature, and $P$ is the set of all features that can affect the impact factor.
7. The sentiment analysis method based on structured features according to claim 1, characterized in that the word-level feature extraction of step 5) includes:
if all letters of an independent text unit are capitals, its all-caps flag is set to $s_{AllCaps}=1$, otherwise $s_{AllCaps}=0$, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
where the formula relates the impact factor of unit $t$ after the update to its value before the update;
if an independent text unit uses repeated letters, an elongation factor is assigned to it:
[formula not reproduced in the source text]
where $t_{orig}$ is the original independent text unit and $t_{norm}$ is the unit after normalization, and the impact factor $IF_t$ is updated:
[formula not reproduced in the source text]
8. a kind of sentiment analysis method based on structured features according to claim 1 is it is characterised in that step 5) in
Inclusion is extracted to the language feature of phrase rank:
Using step 3) the manual negative word dictionary set up, determine the beginning containing negative content phrase, by fullstop, question mark, sense
Exclamation and off-gauge text-independent unit are defined as the end mark containing negative content phrase, and update the impact of emotion score value
Factor formula IFt:
Wherein t is the text-independent unit within the scope of negatives;
Using step 3) the enhancing qualifier dictionary set up manually and weaken qualifier dictionary, find out Twitter text data institute
Some qualifiersCalculate the stretching, extension factor of qualifier according to the following formula:
Wherein m represents certain qualifier, MDMRepresent and modify set of words, if the alphabetical all Caps of certain qualifier m, qualifier
All Caps markOtherwiseIf certain qualifier m employs repetitive letter, using repetitive letter
MarkOtherwise
And update emotion score value factor of influence formula IFt:
9. The sentiment analysis method based on structured features according to claim 1, characterized in that extracting the sentence-level language features in step 5) comprises:
Using the sentence structure "X but Y", identify the Twitter text data that uses an adversative conjunction (such as but, yet, etc.), mark the part X before the conjunction and the part Y after the conjunction, and update the emotion score influence factor $IF_t$:
where one value applies if text-independent unit t is in X and another if t is in Y;
Using the following three sentence structures: "If X, Y", "If X () then Y", and "Y, if X.", identify the Twitter text data that uses a conditional sentence, in which X is the conditional clause and Y is the result clause, and update the emotion score influence factor $IF_t$:
where text-independent unit t is in X.
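The sentence-level patterns in this claim can be sketched in Python as below. The sketch only splits a token list into the parts X and Y; the function names and the token-level matching are assumptions, and the values assigned to $IF_t$ for each part are not shown because the formulas are not reproduced in this text.

```python
ADVERSATIVES = {"but", "yet"}

def split_adversative(tokens):
    """Return (X, Y) around the first adversative conjunction, or None."""
    for i, tok in enumerate(tokens):
        if tok.lower() in ADVERSATIVES and 0 < i < len(tokens) - 1:
            return tokens[:i], tokens[i + 1:]
    return None

def _trim(seq):
    """Drop a leading or trailing comma token."""
    if seq and seq[0] == ",":
        seq = seq[1:]
    if seq and seq[-1] == ",":
        seq = seq[:-1]
    return seq

def split_conditional(tokens):
    """Return (X, Y) = (condition, result) for the three structures, or None."""
    low = [t.lower() for t in tokens]
    if "if" not in low:
        return None
    i = low.index("if")
    if i == 0:
        # "If X, Y" or "If X then Y"
        if "then" in low:
            j = low.index("then")
        elif "," in low:
            j = low.index(",")
        else:
            return None
        return _trim(tokens[1:j]), _trim(tokens[j + 1:])
    # "Y, if X"
    return _trim(tokens[i + 1:]), _trim(tokens[:i])
```

Matching on the bare token `yet` will also catch its adverbial use, so a fuller implementation would likely need part-of-speech information.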
10. The sentiment analysis method based on structured features according to claim 1, characterized in that calculating the emotion polarity value in step 6) comprises:
(1) calculating the basic emotion polarity value of each text-independent unit t: let L be the set of emotion polarity dictionaries used, and let $L_t = \{l \in L \mid t \in l\}$ denote the subset of dictionaries that contain text-independent unit t; the basic emotion polarity value $s_t$ of each text-independent unit t is obtained by the following formula:
where score(l, t) is the basic emotion polarity value of text-independent unit t given in dictionary l, and $|L_t|$ is the number of emotion polarity dictionaries that contain text-independent unit t;
(2) for each text-independent unit t, using the emotion score influence factor $IF_t$ obtained in step 5) to update the basic emotion polarity value $s_t$ of that text-independent unit:
(3) calculating the overall emotion score $S_T$ for each item of Twitter text data T:
$S_T = \sum_{t \in T} s_t$ (11).
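The three scoring steps can be sketched in Python as follows. Only the final sum matches formula (11) as stated in the claim. Two points are assumptions: that $s_t$ is the arithmetic mean of score(l, t) over $L_t$ (suggested by the definition of $|L_t|$, but the formula is not reproduced in this text), and that $IF_t$ is applied as a multiplier. The function names and the dictionary layout are hypothetical.

```python
def basic_polarity(t, lexicons):
    """Assumed form of s_t: mean of score(l, t) over the lexicons containing t."""
    scores = [lex[t] for lex in lexicons if t in lex]   # one score per l in L_t
    return sum(scores) / len(scores) if scores else 0.0

def overall_score(tokens, lexicons, influence=None):
    """S_T = sum of s_t over the tokens of T, per formula (11).

    `influence` maps a token position to its IF_t; missing positions default
    to 1.0. Multiplying s_t by IF_t is an assumption of this sketch.
    """
    influence = influence or {}
    return sum(
        influence.get(i, 1.0) * basic_polarity(t, lexicons)
        for i, t in enumerate(tokens)
    )
```

A short usage example with two toy lexicons:

```python
lexicons = [{"good": 1.0, "bad": -1.0}, {"good": 0.5}]
print(overall_score(["good", "bad"], lexicons))             # -0.25
print(overall_score(["good", "bad"], lexicons, {1: -1.0}))  # 1.75
```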
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610839375.6A CN106446147A (en) | 2016-09-20 | 2016-09-20 | Emotion analysis method based on structuring features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610839375.6A CN106446147A (en) | 2016-09-20 | 2016-09-20 | Emotion analysis method based on structuring features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106446147A (en) | 2017-02-22 |
Family
ID=58166213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610839375.6A Pending CN106446147A (en) | 2016-09-20 | 2016-09-20 | Emotion analysis method based on structuring features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446147A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980650A (en) * | 2017-03-01 | 2017-07-25 | 平顶山学院 | A kind of emotion enhancing word insertion learning method towards Twitter opinion classifications |
CN108681532A (en) * | 2018-04-08 | 2018-10-19 | 天津大学 | A kind of sentiment analysis method towards Chinese microblogging |
CN109697657A (en) * | 2018-12-27 | 2019-04-30 | 厦门快商通信息技术有限公司 | A kind of dining recommending method, server and storage medium |
CN111046136A (en) * | 2019-11-13 | 2020-04-21 | 天津大学 | Method for calculating multi-dimensional emotion intensity value by fusing emoticons and short text |
CN111046137A (en) * | 2019-11-13 | 2020-04-21 | 天津大学 | Multidimensional emotion tendency analysis method |
CN111143564A (en) * | 2019-12-27 | 2020-05-12 | 北京百度网讯科技有限公司 | Unsupervised multi-target chapter-level emotion classification model training method and unsupervised multi-target chapter-level emotion classification model training device |
CN111312394A (en) * | 2020-01-15 | 2020-06-19 | 东北电力大学 | Psychological health condition evaluation system based on combined emotion and processing method thereof |
CN117521813A (en) * | 2023-11-20 | 2024-02-06 | 中诚华隆计算机技术有限公司 | Scenario generation method, device, equipment and chip based on knowledge graph |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663046A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院自动化研究所 | Sentiment analysis method oriented to micro-blog short text |
JP2012226747A (en) * | 2011-04-21 | 2012-11-15 | Palo Alto Research Center Inc | Incorporation of glossary knowledge in svm learning for improvement in feeling classification |
CN104008091A (en) * | 2014-05-26 | 2014-08-27 | 上海大学 | Sentiment value based web text sentiment analysis method |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012226747A (en) * | 2011-04-21 | 2012-11-15 | Palo Alto Research Center Inc | Incorporation of glossary knowledge in svm learning for improvement in feeling classification |
CN102663046A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院自动化研究所 | Sentiment analysis method oriented to micro-blog short text |
CN104008091A (en) * | 2014-05-26 | 2014-08-27 | 上海大学 | Sentiment value based web text sentiment analysis method |
Non-Patent Citations (1)
Title |
---|
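Wang Zhitao et al.: "Sentiment analysis of Chinese microblogs based on dictionaries and rule sets", Computer Engineering and Applications (《计算机工程与应用》) * |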
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980650A (en) * | 2017-03-01 | 2017-07-25 | 平顶山学院 | A kind of emotion enhancing word insertion learning method towards Twitter opinion classifications |
CN108681532A (en) * | 2018-04-08 | 2018-10-19 | 天津大学 | A kind of sentiment analysis method towards Chinese microblogging |
CN108681532B (en) * | 2018-04-08 | 2022-03-25 | 天津大学 | Sentiment analysis method for Chinese microblog |
CN109697657A (en) * | 2018-12-27 | 2019-04-30 | 厦门快商通信息技术有限公司 | A kind of dining recommending method, server and storage medium |
CN111046136A (en) * | 2019-11-13 | 2020-04-21 | 天津大学 | Method for calculating multi-dimensional emotion intensity value by fusing emoticons and short text |
CN111046137A (en) * | 2019-11-13 | 2020-04-21 | 天津大学 | Multidimensional emotion tendency analysis method |
CN111143564A (en) * | 2019-12-27 | 2020-05-12 | 北京百度网讯科技有限公司 | Unsupervised multi-target chapter-level emotion classification model training method and unsupervised multi-target chapter-level emotion classification model training device |
CN111312394A (en) * | 2020-01-15 | 2020-06-19 | 东北电力大学 | Psychological health condition evaluation system based on combined emotion and processing method thereof |
CN111312394B (en) * | 2020-01-15 | 2023-09-29 | 东北电力大学 | Psychological health assessment system based on combined emotion and processing method thereof |
CN117521813A (en) * | 2023-11-20 | 2024-02-06 | 中诚华隆计算机技术有限公司 | Scenario generation method, device, equipment and chip based on knowledge graph |
CN117521813B (en) * | 2023-11-20 | 2024-05-28 | 中诚华隆计算机技术有限公司 | Scenario generation method, device, equipment and chip based on knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
CN108763326B (en) | Emotion analysis model construction method of convolutional neural network based on feature diversification | |
CN106446147A (en) | Emotion analysis method based on structuring features | |
Saha et al. | Proposed approach for sarcasm detection in twitter | |
CN109710770A (en) | A kind of file classification method and device based on transfer learning | |
CN107305539A (en) | A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries | |
CN110287319B (en) | Student evaluation text analysis method based on emotion analysis technology | |
CN112001187A (en) | Emotion classification system based on Chinese syntax and graph convolution neural network | |
CN106484664A (en) | Similarity calculating method between a kind of short text | |
CN107590134A (en) | Text sentiment classification method, storage medium and computer | |
CN108388554B (en) | Text emotion recognition system based on collaborative filtering attention mechanism | |
CN101520802A (en) | Question-answer pair quality evaluation method and system | |
CN103646088A (en) | Product comment fine-grained emotional element extraction method based on CRFs and SVM | |
CN105975454A (en) | Chinese word segmentation method and device of webpage text | |
CN106598940A (en) | Text similarity solution algorithm based on global optimization of keyword quality | |
CN107122349A (en) | A kind of feature word of text extracting method based on word2vec LDA models | |
CN110502626A (en) | A kind of aspect grade sentiment analysis method based on convolutional neural networks | |
CN106227768B (en) | A kind of short text opining mining method based on complementary corpus | |
CN105740382A (en) | Aspect classification method for short comment texts | |
CN111339772B (en) | Russian text emotion analysis method, electronic device and storage medium | |
Bansal et al. | Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection | |
CN110134934A (en) | Text emotion analysis method and device | |
CN107357785A (en) | Theme feature word abstracting method and system, feeling polarities determination methods and system | |
Zhang et al. | Exploring deep recurrent convolution neural networks for subjectivity classification | |
CN107818173B (en) | Vector space model-based Chinese false comment filtering method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170222 |
WD01 | Invention patent application deemed withdrawn after publication |