CN106649519A - Method of digging and assessing product features - Google Patents
- Publication number
- CN106649519A (application CN201610903523.6A)
- Authority
- CN
- China
- Prior art keywords
- word
- emotion
- product feature
- product
- dictionary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Abstract
The invention discloses a method for mining and evaluating product features. The method comprises: randomly crawling multiple consumers' text reviews of products and using them to train a sentiment dictionary and a product feature dictionary; determining a target product and crawling, from an e-commerce platform, multiple text reviews of the target product written by different consumers; based on the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each text review in turn and using these pairs to iteratively update the two dictionaries until all the text reviews have been processed; and counting all the extracted product feature-sentiment word pairs to obtain the product features of the target product and their sentiment evaluation. The method can mine and evaluate product features and perform statistical analysis on Chinese-language text, providing data support for the overall evaluation of products in Chinese e-commerce.
Description
Technical field
The present invention relates to the field of e-commerce, and in particular to a method for mining and evaluating product features.
Background
With the emergence and rapid development of Web 2.0 applications, e-commerce has become a thriving business model, and feedback on goods is much easier to obtain: customers can leave reviews after a purchase, and later customers can decide whether to buy a product based on those reviews. If consumers have a clear picture of the proportions of positive and negative reviews of a product, they can make a better decision about whether to buy it. It is also helpful to tell consumers which features of a product people evaluate most. At the same time, these results can be reported to manufacturers and e-commerce retailers to help them improve their products and services. Integrating the information available on the web, analysing it statistically, and then giving consumers an overall evaluation has therefore become extremely important.
For obtaining market information, the prior art uses Twitter as the corpus for mining market information, including topic detection and sentiment classification. In Chinese, however, microblogs, which contain large amounts of user-generated content, include too many fake reviews and advertisements, making it difficult to build a suitable data set. Meanwhile, sentiment analysis is usually treated as a text classification problem. Unlike formal text such as news articles and scientific papers, user-generated content (UGC) does not always follow strict grammar, but it can carry additional information. For example, emoticons such as ":-)" on Twitter are used as polarity features, and tongue-out emoticons and the like can also serve as indicators of positive and negative sentiment. In addition, how to extract product features from product reviews is a major problem in the prior art. Work on mining English product features extracts features using nouns and noun phrases, and uses the Apriori algorithm to generate frequent feature sets and prune the preliminary set, but this approach is not suitable for Chinese.
The prior-art methods for mining and evaluating English text reviews cannot be applied to Chinese, and no effective solution to this problem has yet been proposed.
Summary of the invention
In view of this, an object of the present invention is to provide a method for mining and evaluating product features that can mine and evaluate product features and perform statistical analysis on Chinese-language text, providing data support for the overall evaluation of products in Chinese e-commerce.
To achieve the above object, the present invention provides the following technical solution:
The invention provides a method for mining and evaluating product features, comprising:
randomly crawling multiple consumers' text reviews of products, and using them to train a sentiment dictionary and a product feature dictionary;
determining a target product, and crawling, from an e-commerce platform, multiple text reviews of the target product by different consumers;
based on the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each text review in turn, and using the pairs to iteratively update the sentiment dictionary and the product feature dictionary until all the text reviews have been processed;
counting all the extracted product feature-sentiment word pairs to obtain the product features of the target product and their sentiment evaluation.
Training the sentiment dictionary on the randomly crawled consumer text reviews comprises:
determining whether each randomly crawled text review is a positive or a negative review;
performing word segmentation on each randomly crawled text review, and part-of-speech tagging each word obtained by segmentation;
for each adjective obtained by segmentation, using the naive Bayes method to calculate, from the adjective's occurrence counts, its prior probabilities in positive reviews and in negative reviews;
classifying the adjective as a positive word or a negative word according to these two prior probabilities, and adding it to the sentiment dictionary.
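The adjective-classification step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes reviews have already been segmented and POS-tagged into (word, tag) tuples with adjectives tagged "a", estimates each adjective's per-class probability from its occurrence counts with add-one (Laplace) smoothing (the smoothing is our assumption), and labels the word with whichever class gives the higher probability. The function name build_sentiment_dictionary and the input format are hypothetical.

```python
from collections import Counter

def build_sentiment_dictionary(labelled_reviews):
    """Classify adjectives as positive or negative from labelled reviews.

    labelled_reviews: list of (tokens, label) pairs, where tokens is a list of
    (word, pos_tag) tuples and label is "pos" or "neg" (hypothetical format).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labelled_reviews:
        for word, tag in tokens:
            if tag == "a":  # assumed adjective tag; emoticons/idioms would also carry it
                counts[label][word] += 1

    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    dictionary = {}
    for word in vocab:
        # Laplace-smoothed P(word | class), estimated from occurrence counts
        p_pos = (counts["pos"][word] + 1) / (totals["pos"] + len(vocab))
        p_neg = (counts["neg"][word] + 1) / (totals["neg"] + len(vocab))
        if p_pos > p_neg:
            dictionary[word] = "positive"
        elif p_neg > p_pos:
            dictionary[word] = "negative"
        # ties are left out of the dictionary
    return dictionary
```

Ties are simply skipped here, since the text does not say how they should be handled.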
Further, when performing word segmentation on each randomly crawled text review, emoticons and punctuation marks are also treated as words; when part-of-speech tagging the words obtained by segmentation, emoticons and punctuation marks are likewise treated as words and are tagged as adjectives.
Likewise, when performing word segmentation on each randomly crawled text review, idioms and sentence templates are also treated as words; when part-of-speech tagging, idioms and sentence templates are likewise treated as words and are tagged as adjectives.
Training the product feature dictionary on the randomly crawled consumer text reviews comprises:
performing word segmentation on each randomly crawled text review, and part-of-speech tagging each word obtained by segmentation;
extracting every standalone noun obtained by segmentation and adding it to the product feature dictionary;
extracting every compound word formed by two or more directly adjacent nouns, and adding each compound word as a whole to the product feature dictionary as a single noun;
extracting every phrase formed by nouns connected with the particle "的" ("of"), and adding each phrase as a whole to the product feature dictionary as a single noun.
Extracting product feature-sentiment word pairs from each text review in turn, based on the sentiment dictionary and the product feature dictionary, comprises:
designating each text review in turn, and pre-processing the designated text review;
extracting, from the pre-processed designated text review, the words that match entries in the sentiment dictionary, as the sentiment vocabulary of the designated text review;
extracting, from the pre-processed designated text review, the words that match entries in the product feature dictionary, as the product feature vocabulary of the designated text review;
extracting, according to the sentiment vocabulary and the product feature vocabulary, multiple product feature-sentiment word pairs from the pre-processed designated text review using the "product feature-sentiment" model.
Further, pre-processing the designated text review comprises:
segmenting the designated text review into a sequence of words;
part-of-speech tagging each word.
Further, extracting, according to the extracted sentiment vocabulary and product feature vocabulary, multiple product feature-sentiment word pairs from the pre-processed designated text review using the "product feature-sentiment" model comprises:
designating each word of the sentiment vocabulary in turn; based on the position of the designated sentiment word in the pre-processed designated text review, extracting all words tagged as nouns within a preset length before that position, and pairing the designated sentiment word with each of these nouns to form product feature-sentiment word pairs, until every sentiment word has been designated;
designating each word of the product feature vocabulary in turn; based on the position of the designated product feature word in the pre-processed designated text review, extracting all words tagged as adjectives within a preset length after that position, and pairing the designated product feature word with each of these adjectives to form product feature-sentiment word pairs, until every product feature word has been designated.
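The window-based pairing can be illustrated with a short sketch. It assumes a pre-processed review is a list of (word, tag) tuples with nouns tagged "n" and adjectives tagged "a"; the "preset length" is modelled as a hypothetical window parameter counted in tokens, and its default of 3 is an arbitrary illustrative value. The function name extract_pairs is likewise hypothetical.

```python
def extract_pairs(tokens, sentiment_words, feature_words, window=3):
    """Extract (feature, sentiment) pairs from one pre-processed review.

    tokens: list of (word, pos_tag) tuples in review order.
    window: the "preset length", in tokens (value assumed for illustration).
    """
    pairs = []
    for i, (word, _) in enumerate(tokens):
        if word in sentiment_words:
            # nouns within `window` tokens before the sentiment word
            for w, tag in tokens[max(0, i - window):i]:
                if tag == "n":
                    pairs.append((w, word))
        if word in feature_words:
            # adjectives within `window` tokens after the feature word
            for w, tag in tokens[i + 1:i + 1 + window]:
                if tag == "a":
                    pairs.append((word, w))
    return list(dict.fromkeys(pairs))  # drop duplicates, keep first-seen order
```

Either direction can discover the same pair, which is why duplicates are removed before returning.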
In addition, iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs comprises:
adding the sentiment word of each product feature-sentiment word pair to the sentiment dictionary;
adding the product feature word of each product feature-sentiment word pair to the product feature dictionary.
As can be seen from the above, the technical solution provided by the present invention trains a sentiment dictionary and a product feature dictionary, extracts product feature-sentiment word pairs from multiple consumers' crawled text reviews of a target product, uses these pairs to iteratively update the two dictionaries, and counts the pairs to obtain the product features of the target product and their sentiment evaluation. It can thus mine and evaluate product features and perform statistical analysis on Chinese-language text, providing data support for the overall evaluation of products in Chinese e-commerce.
Brief description of the drawings
To describe the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for mining and evaluating product features according to an embodiment of the present invention;
Fig. 2 is an architecture diagram of a system applying the method for mining and evaluating product features according to an embodiment of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly, completely and in detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments fall within the scope of protection of the present invention.
According to an embodiment of the present invention, a method for mining and evaluating product features is provided.
As shown in Fig. 1, the method for mining and evaluating product features provided by an embodiment of the present invention comprises:
Step S101: randomly crawling multiple consumers' text reviews of products, and using them to train a sentiment dictionary and a product feature dictionary;
Step S103: determining a target product, and crawling, from an e-commerce platform, multiple text reviews of the target product by different consumers;
Step S105: based on the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each text review in turn, and using the pairs to iteratively update the sentiment dictionary and the product feature dictionary until all the text reviews have been processed;
Step S107: counting all the extracted product feature-sentiment word pairs to obtain the product features of the target product and their sentiment evaluation.
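Step S107 does not specify which statistics are computed. One plausible minimal sketch, assuming each sentiment word's polarity is looked up in the sentiment dictionary, tallies positive and negative mentions per product feature; the function summarise_features and its input formats are hypothetical.

```python
from collections import Counter, defaultdict

def summarise_features(pairs, polarity):
    """Count positive/negative mentions per product feature.

    pairs: iterable of (feature, sentiment_word) tuples from all reviews.
    polarity: dict mapping sentiment words to "positive" or "negative".
    """
    summary = defaultdict(Counter)
    for feature, word in pairs:
        label = polarity.get(word)
        if label is not None:  # skip sentiment words with unknown polarity
            summary[feature][label] += 1
    return {feature: dict(counts) for feature, counts in summary.items()}
```

From these counts, the share of positive mentions per feature (or the features mentioned most often) can be reported to consumers and retailers.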
Training the sentiment dictionary on the randomly crawled consumer text reviews comprises:
determining whether each randomly crawled text review is a positive or a negative review;
performing word segmentation on each randomly crawled text review, and part-of-speech tagging each word obtained by segmentation;
for each adjective obtained by segmentation, using the naive Bayes method to calculate, from the adjective's occurrence counts, its prior probabilities in positive reviews and in negative reviews;
classifying the adjective as a positive word or a negative word according to these two prior probabilities, and adding it to the sentiment dictionary.
Further, when performing word segmentation on each randomly crawled text review, emoticons and punctuation marks are also treated as words; when part-of-speech tagging the words obtained by segmentation, emoticons and punctuation marks are likewise treated as words and are tagged as adjectives.
Likewise, when performing word segmentation on each randomly crawled text review, idioms and sentence templates are also treated as words; when part-of-speech tagging, idioms and sentence templates are likewise treated as words and are tagged as adjectives.
Training the product feature dictionary on the randomly crawled consumer text reviews comprises:
performing word segmentation on each randomly crawled text review, and part-of-speech tagging each word obtained by segmentation;
extracting every standalone noun obtained by segmentation and adding it to the product feature dictionary;
extracting every compound word formed by two or more directly adjacent nouns, and adding each compound word as a whole to the product feature dictionary as a single noun;
extracting every phrase formed by nouns connected with the particle "的" ("of"), and adding each phrase as a whole to the product feature dictionary as a single noun.
Extracting product feature-sentiment word pairs from each text review in turn, based on the sentiment dictionary and the product feature dictionary, comprises:
designating each text review in turn, and pre-processing the designated text review;
extracting, from the pre-processed designated text review, the words that match entries in the sentiment dictionary, as the sentiment vocabulary of the designated text review;
extracting, from the pre-processed designated text review, the words that match entries in the product feature dictionary, as the product feature vocabulary of the designated text review;
extracting, according to the sentiment vocabulary and the product feature vocabulary, multiple product feature-sentiment word pairs from the pre-processed designated text review using the "product feature-sentiment" model.
Further, pre-processing the designated text review comprises:
segmenting the designated text review into a sequence of words;
part-of-speech tagging each word.
Further, extracting, according to the extracted sentiment vocabulary and product feature vocabulary, multiple product feature-sentiment word pairs from the pre-processed designated text review using the "product feature-sentiment" model comprises:
designating each word of the sentiment vocabulary in turn; based on the position of the designated sentiment word in the pre-processed designated text review, extracting all words tagged as nouns within a preset length before that position, and pairing the designated sentiment word with each of these nouns to form product feature-sentiment word pairs, until every sentiment word has been designated;
designating each word of the product feature vocabulary in turn; based on the position of the designated product feature word in the pre-processed designated text review, extracting all words tagged as adjectives within a preset length after that position, and pairing the designated product feature word with each of these adjectives to form product feature-sentiment word pairs, until every product feature word has been designated.
In addition, iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs comprises:
adding the sentiment word of each product feature-sentiment word pair to the sentiment dictionary;
adding the product feature word of each product feature-sentiment word pair to the product feature dictionary.
The technical features of the present invention are further described below with reference to a specific embodiment.
Fig. 2 shows a system applying the method for mining and evaluating product features according to an embodiment of the present invention. As shown in Fig. 2, in addition to using a crawler to collect reviews from an e-commerce platform, the system includes dictionary training (i.e., the training of the product feature dictionary described above), classifier training (i.e., the training of the sentiment dictionary described above), and review processing (i.e., the product feature-sentiment word pair processing described above). E-commerce reviews contain relatively few fake reviews and advertisements, so they form a suitable data set.
For classifier training, the embodiment of the present invention uses the naive Bayes (NB) method to determine efficiently and simply whether the sentiment orientation of a review is positive or negative. The NB method is known for its stability and allows custom features and prior probabilities to be introduced.
The naive Bayes method is a simple model that performs text classification well. In this statistical model, the predicted class c* is the class c that maximizes the probability P(c | d):
$$c^* = \arg\max_{c} P(c \mid d)$$
By Bayes' theorem, P(c | d) can be computed as:
$$P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)}$$
where P(c) is the prior probability of a class; here there are two classes, positive and negative. P(d) is the probability of the review d occurring. In practice only the numerator matters, because the denominator does not depend on c and is therefore constant. If a balanced corpus is used, that is, the positive and negative training sets are chosen to be the same size, then P(c) can also be ignored.
Let d = {f1, f2, ..., fn}, where f1, f2, ..., fn are the features of the review. Naive Bayes assumes that these features are conditionally independent given the class, which gives:
$$P_{NB}(c \mid d) = \frac{P(c)\prod_{i=1}^{n} P(f_i \mid c)}{P(d)}$$
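A compact multinomial naive Bayes classifier shows how this decision rule is applied in practice. The sketch works in log space to avoid numerical underflow, never computes P(d) because it does not depend on c, and uses add-one smoothing for P(f_i | c); the smoothing and the class name NaiveBayes are our assumptions rather than details from the text. Each document is simply a list of features (words, plus any emoticon, idiom, or template features).

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes over feature lists, scored in log space."""

    def __init__(self):
        self.class_counts = Counter()  # number of training reviews per class
        self.feature_counts = {}       # class -> Counter of feature occurrences
        self.vocab = set()

    def fit(self, docs, labels):
        for features, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            counts = self.feature_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_c / total)  # log P(c)
            for f in features:
                if f in self.vocab:  # features never seen in training are ignored
                    # log P(f_i | c) with add-one smoothing
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best
```

With a balanced training corpus, the log P(c) term is the same for both classes and could be dropped, matching the simplification described above.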
Each word in a review is used as a feature. The following two methods compensate for the fact that some short reviews lack enough indicative feature words.
Some emoticons and punctuation marks in a review can also serve as indicator words for classification. For example, "^_^" expresses positive sentiment, while "QAQ" expresses negative sentiment. As for punctuation, we found that some combinations of punctuation marks express sentiment beyond the text itself, so these symbols can also be used as feature words for classification. For example, the punctuation mark "~" (repeated one or more times) indicates positive sentiment, and "？" (the full-width Chinese question mark, repeated two or more times) indicates negative sentiment. Table 1 gives a sample list of emoticons, and Table 2 gives a sample list of punctuation marks.
Table 1. Emoticons used in Chinese reviews
Table 2. Punctuation marks that indicate sentiment
Some idioms in Chinese reviews, such as "nothing to complain about" or "goes without saying", imply a positive attitude toward the product. We collected a number of idioms used in reviews for classification; they are listed in Table 3.
Table 3. Idioms in Chinese reviews
Meanwhile, many reviews are short and contain no complex expressions, so templates can be used for classification. For example, a consumer may write "not very satisfied" to express that the product did not meet expectations. The embodiment of the present invention provides several templates based on regular expressions, in which <positive evaluation word> stands for words such as "satisfied" and <negative evaluation word> stands for words such as "poor".
Negative:
not (is)*(too|very)*(makes one)*(<positive evaluation word>)
(really is|simply)*(too)*(<negative evaluation word>)()*
Positive:
(really is)*(too)*(<positive evaluation word>)()*
not (.)* so (<negative evaluation word>)
Here, * denotes direct adjacency between words, () marks an optional word, | denotes a choice of one among several words, and . denotes any word. Every match in a review (of the emoticons and punctuation marks, idioms, and templates above) becomes a feature in the learning algorithm, so even a short review yields enough features for the NB algorithm.
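The templates translate naturally into Python regular expressions. The sketch assumes Chinese surface forms for the glossed words (for example 不 for "not" and 那么 for "so") and uses tiny hypothetical lists of positive and negative evaluation words; the trailing optional "()*" group is omitted because its word is not recoverable and, being optional, does not change what matches. The names TEMPLATES and template_features are hypothetical.

```python
import re

# Hypothetical sample word lists; a real system would draw these from the
# sentiment dictionary.
POSITIVE_WORDS = ["满意", "好"]  # e.g. "satisfied", "good"
NEGATIVE_WORDS = ["差"]          # e.g. "poor"
POS = "(" + "|".join(POSITIVE_WORDS) + ")"
NEG = "(" + "|".join(NEGATIVE_WORDS) + ")"

# Assumed Chinese surface forms for the glossed templates:
# 不 = not, 是 = is, 太 = too, 很 = very, 令人 = makes one,
# 真是 = really is, 简直 = simply, 那么 = so.
TEMPLATES = {
    "tpl_neg_1": re.compile("不(是)*(太|很)*(令人)*" + POS),
    "tpl_neg_2": re.compile("(真是|简直)*(太)*" + NEG),
    "tpl_pos_1": re.compile("(真是)*(太)*" + POS),
    "tpl_pos_2": re.compile("不(.)*那么" + NEG),
}

def template_features(text):
    """Each template that matches the review becomes one NB feature."""
    return [name for name, pattern in TEMPLATES.items() if pattern.search(text)]
```

Because every prefix group is optional, tpl_neg_2 and tpl_pos_1 also fire on a bare evaluation word, so "不太满意" matches both tpl_neg_1 and tpl_pos_1. That is acceptable because each match is only a feature for the NB classifier, which learns how much weight each template deserves.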
The feature extraction model is used to mine product features from user reviews; its main method is grammatical analysis. Because sentiment words and feature words usually appear in pairs in reviews, we traverse each review in the training set with a window to find sentiment-feature word pairs. While traversing the training set, the model maintains a sentiment dictionary and a feature dictionary. New feature words and sentiment words found in each review update the dictionaries and are used when mining subsequent reviews.
The basic "dictionary-window" model (DWM) is based on "feature-sentiment" pairs in reviews. The "feature-sentiment" pattern matches users' reviewing habits.
Example: The phone quality is good.
In this review, the feature word "phone quality" and the sentiment word "good" form a "feature-sentiment" word pair, with the feature word preceding the sentiment word.
A manual comparison on the collected data shows that reviews containing the "feature-sentiment" pattern account for 84% of the data.
To identify sentiment words more accurately, some pre-processing is needed first. After a review is segmented, merging certain nouns yields more accurate feature words.
Example: The screen resolution is very high.
In this review, the segmentation tool tags both the word "screen" and the word "resolution" as nouns. Treating each of these words separately as a feature word would clearly be inaccurate; "screen resolution" is the accurate feature word in this review. That is, during feature extraction it is more appropriate to merge the two words into a single noun phrase and treat it as one feature word.
The embodiment of the present invention defines three review forms that can yield noun phrases usable as product feature words.
1. A single noun
Example: The price is extremely cheap.
In this review, "price" is a single word after segmentation, and we use "price" as a product feature word.
2. Two adjacent nouns
Example: The screen resolution is very high.
We merge "screen" and "resolution" into "screen resolution" and use it as a product feature word.
3. Noun + "的" + noun
Example: The color of the shell is plain.
In this review, we use "color of the shell" as the product feature rather than a single word such as "shell" or "color".
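The three noun-phrase forms can be produced by a single merging pass over the tagged tokens. This sketch assumes nouns are tagged "n" and that the connecting word is "的"; it merges runs of adjacent nouns and noun + 的 + noun sequences into one token tagged "n", leaving single nouns (form 1) unchanged. The function name merge_noun_phrases is hypothetical.

```python
def merge_noun_phrases(tokens):
    """Merge adjacent nouns, and noun + 的 + noun, into single noun tokens.

    tokens: list of (word, pos_tag) tuples; nouns are assumed to be tagged "n".
    """
    merged = []
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag != "n":
            merged.append((word, tag))
            i += 1
            continue
        phrase = word
        i += 1
        while i < len(tokens):
            if tokens[i][1] == "n":  # form 2: directly adjacent noun
                phrase += tokens[i][0]
                i += 1
            elif (tokens[i][0] == "的" and i + 1 < len(tokens)
                  and tokens[i + 1][1] == "n"):  # form 3: noun + 的 + noun
                phrase += "的" + tokens[i + 1][0]
                i += 2
            else:
                break
        merged.append((phrase, "n"))
    return merged
```

The noun tokens in the merged output are the product feature candidates.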
The model finds "sentiment-feature" word pairs in reviews and iteratively updates the dictionaries during training. We call each word a unit; if there are n units between a sentiment word and a product feature word, we say they fall within a window of size n. We define a dictionary as a document containing feature words or sentiment words.
First, initialize the sentiment word dictionary and the initial feature word dictionary.
Second, process each review c in the training set as follows:
pre-process c, including word segmentation and part-of-speech tagging;
put all sentiment words that appear both in the sentiment word dictionary and in c into a set S;
put all feature words that appear both in the feature word dictionary and in c into a set F;
initialize (empty) the set of newly found sentiment words and the set of newly found feature words;
for each sentiment word s in S, if there is a noun f within m units before s, regard f as a feature word and add it to the set of newly found feature words;
for each feature word f in F, if there is an adjective s within m units after f, regard s as a sentiment word and add it to the set of newly found sentiment words;
merge the newly found sentiment words into the sentiment word dictionary and the newly found feature words into the feature word dictionary.
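The DWM training loop can be sketched in Python. The sketch assumes pre-processed reviews as lists of (word, tag) tuples and reads "within m units" as "among the m tokens immediately before (or after) the word", which is one reasonable interpretation of the window definition; the function name dwm and the default m are hypothetical. S and F are computed from the dictionaries as they stand before the current review is processed, so newly found words take effect from the next review on.

```python
def dwm(reviews, sentiment_seed, feature_seed, m=3):
    """Dictionary-window model: bootstrap both dictionaries over the training set.

    reviews: pre-processed reviews, each a list of (word, pos_tag) tuples.
    m: window size in tokens (value assumed for illustration).
    """
    sentiment_dict, feature_dict = set(sentiment_seed), set(feature_seed)
    for c in reviews:
        # positions of known sentiment words (S) and feature words (F) in c
        S = [i for i, (w, _) in enumerate(c) if w in sentiment_dict]
        F = [i for i, (w, _) in enumerate(c) if w in feature_dict]
        new_features, new_sentiments = set(), set()  # reset for every review
        for i in S:
            # nouns within m tokens before a sentiment word become features
            new_features.update(w for w, tag in c[max(0, i - m):i] if tag == "n")
        for i in F:
            # adjectives within m tokens after a feature word become sentiment words
            new_sentiments.update(w for w, tag in c[i + 1:i + 1 + m] if tag == "a")
        sentiment_dict |= new_sentiments
        feature_dict |= new_features
    return sentiment_dict, feature_dict
```

Starting from the single seed word "不错", three short reviews are enough to learn "质量" and "续航" as features and "稳定" as a new sentiment word, which illustrates how the two dictionaries bootstrap each other.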
Experiments demonstrate the effectiveness of the invention. The experiment used 300 positive comments and 300 negative comments as the corpus for training the classifier. After word segmentation and part-of-speech tagging with ICTCLAS, the system counted the occurrences of each word and computed its prior probability in the positive and negative sets.

The experiment also processed 50,000 comments as input with the DWM model. This produced 3,732 noun phrases, some of them meaningless. Words with a frequency below 300 were removed. Some words that occur many times but do not represent a product feature, such as "family members" and "friends", were also removed. This left 283 feature words in the feature lexicon.

For each comment, the experiment performs dictionary-based feature extraction and then word segmentation and part-of-speech tagging for classification. Many comments mention more than one product feature and express a specific emotion toward each one. The experiment therefore splits sentences at punctuation marks and whitespace and uses the resulting clauses as input.
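The clause splitting described above can be sketched with a single regular expression. This is an illustration only. The delimiter set (common Chinese and ASCII punctuation plus whitespace) is an assumption, since the text does not list the exact marks used.

```python
import re
from typing import List

# Assumed delimiter set: common Chinese and ASCII punctuation plus any whitespace.
CLAUSE_DELIMS = re.compile(r"[，。！？；、,.!?;\s]+")

def split_clauses(comment: str) -> List[str]:
    """Split a comment into clauses so that each clause tends to carry one feature opinion."""
    return [part for part in CLAUSE_DELIMS.split(comment) if part]
```

For example, `split_clauses("屏幕很清晰，电池不耐用！ 价格实惠")` returns three clauses, one per feature.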
To verify the usability of the system, experiments were carried out on real data sets. The two models were tested separately, in a sentiment analysis experiment and a product feature extraction experiment, and the results are shown in the tables.
Sentiment analysis | Classified positive | Classified negative |
---|---|---|
Labeled positive | 1443 | 357 |
Labeled negative | 257 | 1543 |
Table 4: Sentiment analysis experimental results
A set of 1,800 positive and 1,800 negative comments extracted from Jingdong (JD.com) was labeled by hand; the evaluation results are shown in the table above. For the positive class, recall is 80% and precision is 84%. For the negative class, recall is 85% and precision is 81%.
Product feature extraction | Feature | Non-feature |
---|---|---|
Extracted | 884 | 165 |
Not extracted | 91 | n/a |
Table 5: Product feature extraction experimental results
Feature words were selected by hand in 400 comments from Jingdong (JD.com); the evaluation results are shown in the table above. Recall is 90% and precision is 84%.
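The reported figures can be reproduced from the confusion counts in Tables 4 and 5. The sketch below assumes the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The helper name `precision_recall` is hypothetical.

```python
from typing import Tuple

def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) for one class from confusion-matrix counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Table 4, positive class: 1443 correct, 357 labeled-positive comments classified
# negative (FN), 257 labeled-negative comments classified positive (FP).
pos_p, pos_r = precision_recall(tp=1443, fp=257, fn=357)
# Table 4, negative class: the roles of 357 and 257 swap.
neg_p, neg_r = precision_recall(tp=1543, fp=357, fn=257)
# Table 5: 884 true features extracted, 165 non-features extracted, 91 features missed.
feat_p, feat_r = precision_recall(tp=884, fp=165, fn=91)

for name, p, r in [("positive", pos_p, pos_r), ("negative", neg_p, neg_r), ("features", feat_p, feat_r)]:
    print(f"{name}: precision {p:.2%}, recall {r:.2%}")
```

Running it gives the following values:

- Positive class: precision 84.88%, recall 80.17%.
- Negative class: precision 81.21%, recall 85.72%.
- Feature extraction: precision 84.27%, recall 90.67%.

The whole-percent figures in the text match these values when truncated rather than rounded. For example, 84.88% is reported as 84%.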
In summary, the above technical solution of the present invention works in five steps:

- It trains an emotion dictionary and a product feature dictionary.
- It crawls text comments from multiple different consumers on a target product.
- It extracts product feature-emotion word pairs from those comments.
- It uses these pairs to iteratively update the emotion dictionary and the product feature dictionary.
- It counts the pairs to obtain the product's features together with their emotion evaluations.

This makes it possible to mine and statistically analyze evaluated product features in the Chinese-language domain. It also provides data support for the overall evaluation of products in Chinese e-commerce.
Those of ordinary skill in the art should understand that the foregoing is only a specific embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
1. A product feature mining and evaluation method, characterized by comprising:
randomly crawling text comments of a plurality of consumers on products to train an emotion dictionary and a product feature dictionary;
determining a target product, and crawling, from an e-commerce platform, text comments of a plurality of different consumers on the target product;
according to the emotion dictionary and the product feature dictionary, extracting product feature-emotion word pairs from each text comment in turn, and iteratively updating the emotion dictionary and the product feature dictionary with the product feature-emotion word pairs until all of the text comments have been processed;
counting all of the extracted product feature-emotion word pairs to obtain the product features of the product and their emotion evaluations.
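The final step of claim 1, counting the extracted pairs, can be illustrated as follows. This is a minimal sketch, not the claimed implementation. It assumes each emotion word carries a "pos"/"neg" label from the emotion dictionary, as produced by the classification in claim 2. The function name `summarize` and the "unknown" bucket for unlabeled emotion words are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def summarize(pairs: Iterable[Tuple[str, str]],
              polarity: Dict[str, str]) -> Dict[str, Counter]:
    """For each product feature, count how often it is paired with positive or negative words."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for feature, emotion in pairs:
        # Emotion words absent from the labeled lexicon are counted as "unknown".
        summary[feature][polarity.get(emotion, "unknown")] += 1
    return dict(summary)
```

A per-feature tally such as `{"屏幕": Counter({"pos": 2, "neg": 1})}` is one straightforward reading of "product features with emotion evaluations". Ratios or scores could be derived from it, but the claims do not specify a formula.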
2. The method according to claim 1, characterized in that randomly crawling text comments of a plurality of consumers on products to train the emotion dictionary comprises:
determining whether each randomly crawled text comment is a positive evaluation or a negative evaluation;
performing word segmentation on each randomly crawled text comment, and performing part-of-speech tagging on each word obtained by the word segmentation;
for each adjective obtained by the word segmentation, calculating with the naive Bayes method, from the adjective's occurrence counts, the prior probabilities of the adjective in positive evaluations and in negative evaluations;
classifying the adjective as a positive word or a negative word according to its prior probabilities in positive and negative evaluations, and adding it to the emotion dictionary.
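The adjective classification in claim 2 can be sketched as follows. The sketch compares each adjective's class-conditional probability in positive and negative comments, estimated from occurrence counts. It is an illustration under stated assumptions, not the claimed implementation:

- "a" is the adjective tag.
- Labels are "pos" or "neg".
- Laplace smoothing (`alpha`) is added to avoid zero counts; the claim does not mention smoothing.
- The name `build_emotion_lexicon` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Token = Tuple[str, str]  # (word, POS); "a" = adjective is an assumed tag

def build_emotion_lexicon(docs: Iterable[Tuple[List[Token], str]],
                          alpha: float = 1.0) -> Dict[str, str]:
    """docs: (tagged comment, label) pairs with label "pos" or "neg".
    Returns adjective -> "pos"/"neg" by comparing smoothed P(word | class)."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in docs:
        counts[label].update(w for w, pos in tokens if pos == "a")
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {c: sum(counts[c].values()) for c in counts}
    lexicon: Dict[str, str] = {}
    for w in vocab:
        # Laplace-smoothed class-conditional probability (the smoothing is an assumption).
        p = {c: (counts[c][w] + alpha) / (totals[c] + alpha * len(vocab)) for c in counts}
        if p["pos"] != p["neg"]:  # ties are left out of the lexicon
            lexicon[w] = "pos" if p["pos"] > p["neg"] else "neg"
    return lexicon
```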
3. The method according to claim 2, characterized in that, when word segmentation is performed on each randomly crawled text comment, emoticons and punctuation marks are also treated as words; and when part-of-speech tagging is performed on each word obtained by the word segmentation, emoticons and punctuation marks are also treated as words and tagged as adjectives.
4. The method according to claim 2, characterized in that, when word segmentation is performed on each randomly crawled text comment, idioms and sentence templates are also treated as words; and when part-of-speech tagging is performed on each word obtained by the word segmentation, idioms and sentence templates are also treated as words and tagged as adjectives.
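Claims 3 and 4 treat emoticons, punctuation marks, idioms and sentence templates as single words and tag them as adjectives. This lets the adjective-based step of claim 2 score them as well. The sketch below is a minimal illustration. It assumes the segmenter already emits each such item as one token (for example, through a user dictionary) and that "a" is the adjective tag. The function `retag_as_adjective` and the special-item set are hypothetical.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS); "a" = adjective is an assumed tag

def retag_as_adjective(tokens: List[Token], special: Set[str]) -> List[Token]:
    """Keep the segmentation, but relabel listed emoticons, punctuation marks,
    idioms and sentence templates with the adjective tag "a"."""
    return [(w, "a") if w in special else (w, pos) for w, pos in tokens]
```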
5. The method according to claim 1, characterized in that randomly crawling text comments of a plurality of consumers on products to train the product feature dictionary comprises:
performing word segmentation on each randomly crawled text comment, and performing part-of-speech tagging on each word obtained by the word segmentation;
extracting every standalone noun obtained by the word segmentation and adding it to the product feature dictionary;
extracting every compound word formed by multiple directly adjacent nouns obtained by the word segmentation, and adding the compound word as a whole to the product feature dictionary as a single noun;
extracting every phrase formed by multiple nouns connected by "的", and adding the phrase as a whole to the product feature dictionary as a single noun.
6. The method according to claim 1, characterized in that extracting product feature-emotion word pairs from each text comment in turn according to the emotion dictionary and the product feature dictionary comprises:
designating each text comment in turn, and preprocessing the designated text comment;
extracting, from the preprocessed designated text comment, the words that match entries in the emotion dictionary, as the emotion vocabulary of the designated text comment;
extracting, from the preprocessed designated text comment, the words that match entries in the product feature dictionary, as the product feature vocabulary of the designated text comment;
according to the emotion vocabulary and the product feature vocabulary, extracting multiple product feature-emotion word pairs from the preprocessed designated text comment by means of a "product feature-emotion" model.
7. The method according to claim 6, characterized in that preprocessing the designated text comment comprises:
dividing the designated text comment into multiple words connected in sequence;
performing part-of-speech tagging on each of the words.
8. The method according to claim 7, characterized in that extracting multiple product feature-emotion word pairs from the preprocessed designated text comment by means of the "product feature-emotion" model, according to the extracted emotion vocabulary and product feature vocabulary, comprises:
designating each emotion word in turn; according to the position of the designated emotion word in the preprocessed designated text comment, extracting all words tagged as nouns within a preset length before that position, and pairing the designated emotion word with each such noun, one by one, to form product feature-emotion word pairs, until every emotion word has been designated;
designating each product feature word in turn; according to the position of the designated product feature word in the preprocessed designated text comment, extracting all words tagged as adjectives within a preset length after that position, and pairing the designated product feature word with each such adjective, one by one, to form product feature-emotion word pairs, until every product feature word has been designated.
9. The method according to claim 1, characterized in that iteratively updating the emotion dictionary and the product feature dictionary with the product feature-emotion word pairs comprises:
incorporating the emotion word of each product feature-emotion word pair into the emotion dictionary;
incorporating the product feature word of each product feature-emotion word pair into the product feature dictionary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610903523.6A CN106649519B (en) | 2016-10-17 | 2016-10-17 | Product characteristic mining and evaluating method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649519A true CN106649519A (en) | 2017-05-10 |
CN106649519B CN106649519B (en) | 2020-11-27 |
Family ID: 58856101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610903523.6A Expired - Fee Related CN106649519B (en) | 2016-10-17 | 2016-10-17 | Product characteristic mining and evaluating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649519B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103399916A (en) * | 2013-07-31 | 2013-11-20 | 清华大学 | Internet comment and opinion mining method and system on basis of product features |
CN104731770A (en) * | 2015-03-23 | 2015-06-24 | 中国科学技术大学苏州研究院 | Chinese microblog emotion analysis method based on rules and statistical model |
US20160171386A1 (en) * | 2014-12-15 | 2016-06-16 | Xerox Corporation | Category and term polarity mutual annotation for aspect-based sentiment analysis |
CN105868185A (en) * | 2016-05-16 | 2016-08-17 | 南京邮电大学 | Part-of-speech-tagging-based dictionary construction method applied in shopping comment emotion analysis |
Non-Patent Citations (2)
Title |
---|
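Lin Qinhe et al.: "Commodity Review Analysis System Based on Affective Computing", Computer Applications and Software * |
Luo Fan: "Research and Implementation of a Product Review System Based on Opinion Mining", China Master's Theses Full-text Database (electronic journal) * |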
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109598528B (en) * | 2017-09-30 | 2023-05-23 | 北京国双科技有限公司 | Advertisement information processing method and device |
CN109598528A (en) * | 2017-09-30 | 2019-04-09 | 北京国双科技有限公司 | Advertisement information processing method and device |
CN107861946A (en) * | 2017-11-03 | 2018-03-30 | 北京奇艺世纪科技有限公司 | A kind of fine-grained evaluation information method for digging and system |
CN108959247A (en) * | 2018-06-19 | 2018-12-07 | 深圳市元征科技股份有限公司 | A kind of data processing method, server and computer-readable medium |
CN108959247B (en) * | 2018-06-19 | 2022-09-09 | 深圳市元征科技股份有限公司 | Data processing method, server and computer readable medium |
CN109299460A (en) * | 2018-09-18 | 2019-02-01 | 北京三快在线科技有限公司 | Analyze method, apparatus, electronic equipment and the storage medium of the evaluation data in shop |
CN109299460B (en) * | 2018-09-18 | 2022-07-12 | 北京三快在线科技有限公司 | Method and device for analyzing evaluation data of shop, electronic device and storage medium |
CN109684641A (en) * | 2018-12-26 | 2019-04-26 | 广东工业大学 | A kind of data extraction device, method, electronic equipment and storage medium |
CN109684641B (en) * | 2018-12-26 | 2023-04-07 | 广东工业大学 | Data extraction device and method, electronic equipment and storage medium |
CN109902229A (en) * | 2019-02-01 | 2019-06-18 | 中森云链(成都)科技有限责任公司 | A kind of interpretable recommended method based on comment |
CN109902229B (en) * | 2019-02-01 | 2019-12-24 | 中森云链(成都)科技有限责任公司 | Comment-based interpretable recommendation method |
CN110825876A (en) * | 2019-11-07 | 2020-02-21 | 上海德拓信息技术股份有限公司 | Movie comment viewpoint emotion tendency analysis method |
CN111027328A (en) * | 2019-11-08 | 2020-04-17 | 广州坚和网络科技有限公司 | Method for judging emotion positive and negative and emotional color of comments through corpus training |
US11537792B2 (en) * | 2019-12-30 | 2022-12-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Pre-training method for sentiment analysis model, and electronic device |
US20210200949A1 (en) * | 2019-12-30 | 2021-07-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Pre-training method for sentiment analysis model, and electronic device |
CN111324745A (en) * | 2020-02-18 | 2020-06-23 | 深圳市一面网络技术有限公司 | Word stock generation method and device |
CN112364170A (en) * | 2021-01-13 | 2021-02-12 | 北京智慧星光信息技术有限公司 | Data emotion analysis method and device, electronic equipment and medium |
CN113158669A (en) * | 2021-04-28 | 2021-07-23 | 河北冀联人力资源服务集团有限公司 | Method and system for identifying positive and negative comments of employment platform |
CN113158669B (en) * | 2021-04-28 | 2023-03-28 | 河北冀联人力资源服务集团有限公司 | Method and system for identifying positive and negative comments of employment platform |
KR102609681B1 (en) * | 2023-01-09 | 2023-12-05 | 트리톤 주식회사 | Method for determining product planning reflecting user feedback and Apparatus thereof |
Also Published As
Publication number | Publication date |
---|---|
CN106649519B (en) | 2020-11-27 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-11-27; termination date: 2021-10-17 |