CN106649519A

CN106649519A - Method for mining and evaluating product features

Info

Publication number: CN106649519A
Application number: CN201610903523.6A
Authority: CN
Inventors: 孙鹏飞; 吴国仕; 许可
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2016-10-17
Filing date: 2016-10-17
Publication date: 2017-05-10
Anticipated expiration: 2036-10-17
Also published as: CN106649519B

Abstract

The invention discloses a method for mining and evaluating product features. The method comprises: randomly crawling multiple text reviews of products by consumers to train a sentiment dictionary and a product feature dictionary; determining a target product and crawling, from an e-commerce platform, multiple text reviews of the target product by different consumers; based on the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each text review in turn, and using them to iteratively update the sentiment dictionary and the product feature dictionary until all of the text reviews have been processed; and counting the extracted product feature-sentiment word pairs to obtain the product features and sentiment evaluation of the product. The method can mine and evaluate product features and perform statistical analysis for Chinese-language text, and provides data support for the overall evaluation of Chinese e-commerce.

Description

A method for mining and evaluating product features

Technical field

The present invention relates to the field of e-commerce, and in particular to a method for mining and evaluating product features.

Background technology

With the emergence and rapid development of Web 2.0 applications, e-commerce has grown into a thriving business model, and feedback on products has become easier to obtain: customers can leave comments after a purchase, and later customers can decide whether to buy the product based on those comments. If consumers have a clear picture of the proportions of positive and negative reviews of a product, they can make a better decision about whether to buy it. It is also helpful to show consumers the product features that people review most often. These results can likewise be reported to manufacturers and e-commerce retailers to help them improve their products and services. Integrating the information available on the network, analyzing it statistically, and then presenting consumers with an overall evaluation has therefore become very important.

For obtaining market information, the prior art uses Twitter as the corpus for mining market information, including topic detection and sentiment classification. In Chinese, however, microblogs carrying large amounts of user-generated content (UGC) contain too many fake reviews and advertisements, which makes it difficult to build a suitable data set. Sentiment analysis is usually treated as a text classification problem. Unlike formal text such as news articles and scientific papers, UGC does not always follow grammar strictly, but it does carry some additional information. On Twitter, emoticons such as ":-)" have been used as features for polarity detection; tone and similar cues can also serve as indicators of positive and negative sentiment. In addition, how to extract product features from product reviews is a major problem in the prior art. Authors working on English product feature mining extract features mainly from nouns and noun phrases, and use the Apriori algorithm to generate frequent feature sets and prune the preliminary set, but this approach is not suitable for Chinese.

For the problem that prior-art methods for mining and evaluating text reviews, which were developed for English, are not applicable to Chinese, no effective solution has yet been proposed.

Summary of the invention

In view of this, the object of the present invention is to propose a method for mining and evaluating product features that can mine and evaluate product features and perform statistical analysis for Chinese-language text, providing data support for the overall evaluation of Chinese e-commerce.

To this end, the technical solution provided by the present invention is as follows:

The invention provides a method for mining and evaluating product features, comprising:

randomly crawling multiple text reviews of products by consumers to train a sentiment dictionary and a product feature dictionary;

determining a target product, and crawling from an e-commerce platform multiple text reviews of the target product by different consumers;

according to the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each text review in turn, and iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs, until all of the text reviews have been processed;

counting all extracted product feature-sentiment word pairs to obtain the product features and sentiment evaluation of the product.

Randomly crawling multiple text reviews of products by consumers to train the sentiment dictionary comprises:

determining whether each randomly crawled text review is a positive review or a negative review;

performing word segmentation on each randomly crawled text review, and part-of-speech tagging each word obtained by segmentation;

using the naive Bayes method, calculating for each adjective obtained by segmentation, from the number of times the adjective occurs, its prior probability in positive reviews and in negative reviews;

classifying the adjective as a positive word or a negative word according to its prior probabilities in positive reviews and negative reviews, and adding it to the sentiment dictionary.

Furthermore, when each randomly crawled text review is segmented, emoticons and punctuation marks are also treated as words and segmented; when each word obtained by segmentation is part-of-speech tagged, emoticons and punctuation marks are also treated as words, tagged, and labeled as adjectives.

Likewise, when each randomly crawled text review is segmented, idioms and sentence templates are also treated as words and segmented; when each word obtained by segmentation is part-of-speech tagged, idioms and sentence templates are also treated as words, tagged, and labeled as adjectives.
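The dictionary training steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `train_sentiment_dictionary`, the tag `"a"` for adjectives, the labels `"pos"`/`"neg"`, the add-one smoothing parameter `alpha`, and the rule that ties go to the positive class are all assumptions introduced here.

```python
from collections import Counter


def train_sentiment_dictionary(reviews, alpha=1.0):
    """Classify each adjective as a positive or negative word.

    reviews: iterable of (tagged_tokens, label), where tagged_tokens is a
    list of (word, pos_tag) and label is "pos" or "neg" (assumed encoding).
    Emoticons, punctuation marks, idioms and templates are expected to
    arrive already tagged "a", as the text above describes.
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in reviews:
        counts[label].update(word for word, tag in tokens if tag == "a")

    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {label: sum(c.values()) for label, c in counts.items()}

    dictionary = {}
    for word in vocab:
        # Smoothed relative frequency of the adjective in each class
        # (the smoothing is an assumption, not stated in the source).
        p_pos = (counts["pos"][word] + alpha) / (totals["pos"] + alpha * len(vocab))
        p_neg = (counts["neg"][word] + alpha) / (totals["neg"] + alpha * len(vocab))
        dictionary[word] = "pos" if p_pos >= p_neg else "neg"
    return dictionary


sample = [
    ([("质量", "n"), ("好", "a")], "pos"),
    ([("好", "a"), ("快", "a")], "pos"),
    ([("差", "a")], "neg"),
    ([("慢", "a"), ("差", "a")], "neg"),
]
sentiment_dictionary = train_sentiment_dictionary(sample)
```

Nouns such as "质量" are ignored here because only adjective-tagged tokens enter the sentiment dictionary.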

Randomly crawling multiple text reviews of products by consumers to train the product feature dictionary comprises:

extracting all standalone nouns obtained by word segmentation and adding them to the product feature dictionary;

extracting all compound words formed by multiple directly adjacent nouns obtained by word segmentation, and adding each compound word as a whole to the product feature dictionary as a single noun;

extracting all phrases formed by multiple nouns obtained by word segmentation that are joined by "的", and adding each phrase as a whole to the product feature dictionary as a single noun.

Extracting product feature-sentiment word pairs from each text review in turn according to the sentiment dictionary and the product feature dictionary comprises:

designating each text review in turn, and preprocessing the designated text review;

extracting from the preprocessed designated text review the words that match entries in the sentiment dictionary, as the sentiment vocabulary of the designated text review;

extracting from the preprocessed designated text review the words that match entries in the product feature dictionary, as the product feature vocabulary of the designated text review;

according to the sentiment vocabulary and the product feature vocabulary, extracting multiple product feature-sentiment word pairs from the preprocessed designated text review by means of the "product feature-sentiment" model.

Preprocessing the designated text review comprises:

dividing the designated text review into multiple words connected in sequence;

part-of-speech tagging each word.

Extracting multiple product feature-sentiment word pairs from the preprocessed designated text review by means of the "product feature-sentiment" model, according to the extracted sentiment vocabulary and product feature vocabulary, comprises:

designating each sentiment word in turn; according to the position of the designated sentiment word in the preprocessed designated text review, extracting all words tagged as nouns within a predetermined length before that position, and pairing the designated sentiment word with each of those nouns to form product feature-sentiment word pairs, until every sentiment word has been designated;

designating each product feature word in turn; according to the position of the designated product feature word in the preprocessed designated text review, extracting all words tagged as adjectives within a predetermined length after that position, and pairing the designated product feature word with each of those adjectives to form product feature-sentiment word pairs, until every product feature word has been designated.

In addition, iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs comprises:

merging the sentiment word of each product feature-sentiment word pair into the sentiment dictionary;

merging the product feature word of each product feature-sentiment word pair into the product feature dictionary.

As can be seen from the above, the technical solution provided by the present invention trains a sentiment dictionary and a product feature dictionary, crawls multiple text reviews of a target product by different consumers, extracts product feature-sentiment word pairs from them to iteratively update the sentiment dictionary and the product feature dictionary, and counts the pairs to obtain the product features and sentiment evaluation of the product. It can thereby mine and evaluate product features and perform statistical analysis for Chinese-language text, providing data support for the overall evaluation of Chinese e-commerce.

Description of the drawings

In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a method for mining and evaluating product features according to an embodiment of the present invention;

Fig. 2 is an architecture diagram of a system to which the method for mining and evaluating product features according to an embodiment of the present invention is applied.

Specific embodiment

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly, completely and in detail with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.

According to an embodiment of the present invention, a method for mining and evaluating product features is provided.

As shown in Fig. 1, the method for mining and evaluating product features provided according to an embodiment of the present invention comprises:

Step S101: randomly crawling multiple text reviews of products by consumers to train a sentiment dictionary and a product feature dictionary;

Step S103: determining a target product, and crawling from an e-commerce platform multiple text reviews of the target product by different consumers;

Step S105: according to the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each text review in turn, and iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs, until all of the text reviews have been processed;

Step S107: counting all extracted product feature-sentiment word pairs to obtain the product features and sentiment evaluation of the product.
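The counting in step S107 might look like the sketch below. The source does not specify the form of the statistics, so the per-feature tally of positive and negative mentions, the function name `summarize`, and the `sentiment_polarity` lookup table are assumptions made for illustration; the sample words are English stand-ins for Chinese tokens.

```python
from collections import Counter, defaultdict


def summarize(pairs, sentiment_polarity):
    """Tally, for each product feature, how often it is paired with a
    positive or negative sentiment word.

    pairs: iterable of (feature_word, sentiment_word).
    sentiment_polarity: dict mapping a sentiment word to "positive" or
    "negative" (a hypothetical structure for this sketch).
    """
    summary = defaultdict(Counter)
    for feature, sentiment in pairs:
        polarity = sentiment_polarity.get(sentiment, "unknown")
        summary[feature][polarity] += 1
    return {feature: dict(tally) for feature, tally in summary.items()}


pairs = [
    ("screen", "clear"),
    ("screen", "blurry"),
    ("screen", "clear"),
    ("battery", "weak"),
]
polarity = {"clear": "positive", "blurry": "negative", "weak": "negative"}
report = summarize(pairs, polarity)
```

A report of this shape gives a consumer or retailer the proportion of positive and negative opinions for each feature, which is the kind of overall evaluation the background section calls for.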


The technical features of the present invention are further described below with reference to specific embodiments.

Fig. 2 shows a system to which the method for mining and evaluating product features according to an embodiment of the present invention is applied. As shown in Fig. 2, in addition to using a crawler to crawl reviews from the e-commerce platform, the system includes dictionary training (the aforementioned training of the product feature dictionary), classifier training (the aforementioned training of the sentiment dictionary) and review processing (the aforementioned processing of product feature-sentiment word pairs). E-commerce reviews contain relatively few fake reviews and advertisements, which makes them a suitable data set.

For classifier training, the embodiment of the present invention uses the naive Bayes method (Naive Bayesian, NB) to determine, efficiently and conveniently, whether the sentiment orientation of a review is positive or negative. The NB method is known for its stability and allows custom features and prior probabilities to be introduced.

The naive Bayes method is a simple model that performs well on text classification. In this statistical model, the class c* assigned to a review d is the class c that maximizes the probability P(c | d):

$$c^* = \arg\max_{c} P(c \mid d)$$

According to Bayes' theorem, P(c | d) can be calculated as:

$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$

Here P(c) is the probability of a class, which in this case is either positive or negative, and P(d) is the probability that the review occurs. In practice only the numerator of this fraction matters, because the denominator is a constant that does not depend on c. If a balanced corpus is used, that is, if the positive training set is the same size as the negative training set, then P(c) can also be ignored.

Let d = {f1, f2, ..., fn}, where f1, f2, ..., fn are the features in the review. Naive Bayes assumes that they are all conditionally independent, which gives:

$$P(c \mid d) = \frac{P(c)\,\prod_{i=1}^{n} P(f_i \mid c)}{P(d)}$$
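The classification rule above can be sketched as a small Python class. This is an illustrative sketch only: the class name `NaiveBayes`, the add-one smoothing parameter `alpha`, the use of log-probabilities, and the choice to skip features never seen in training are assumptions that the source does not state.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over lists of feature strings."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # smoothing constant (assumed)
        self.feature_counts = {}      # class label -> Counter of features
        self.doc_counts = Counter()   # class label -> number of reviews
        self.vocab = set()

    def fit(self, docs, labels):
        for features, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(features)
            self.vocab.update(features)

    def log_score(self, features, label):
        # log P(c) + sum_i log P(f_i | c); P(d) is dropped because it
        # does not depend on the class.
        counts = self.feature_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for f in features:
            if f not in self.vocab:
                continue  # ignore features never seen in training
            score += math.log(
                (counts[f] + self.alpha) / (total + self.alpha * len(self.vocab))
            )
        return score

    def predict(self, features):
        return max(self.doc_counts, key=lambda c: self.log_score(features, c))


classifier = NaiveBayes()
classifier.fit(
    [["好", "满意"], ["不错", "好"], ["差", "失望"], ["差", "坏"]],
    ["positive", "positive", "negative", "negative"],
)
```

Because the toy corpus is balanced, the prior term contributes equally to both classes, which matches the observation that P(c) can be ignored in that case.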

Each word in a review is used as a feature. The following two methods compensate for the fact that some short reviews lack enough indicative feature words.

Some emoticons and punctuation marks in reviews can also serve as indicator words for classification. For example, "^_^" expresses a positive sentiment, while "QAQ" expresses a negative one. As for punctuation marks, we have found that some combinations of punctuation marks can express sentiment beyond the text itself, so these symbols can be treated as feature words for classification. For instance, the punctuation mark "～" (occurring once or repeated) is a positive sentiment indicator, and "？" (the Chinese question mark, repeated two or more times) is a negative sentiment indicator. A sample list of emoticons is given in Table 1, and a sample list of punctuation marks is given in Table 2.
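A minimal Python sketch of turning such symbols into classifier features is shown below. Only the four symbols named in the text are covered, because Tables 1 and 2 are not reproduced in this document. The feature names, the function `symbol_features`, and the reading of the negative punctuation mark as the full-width question mark "？" are assumptions.

```python
import re

# Emoticons named in the text, mapped to hypothetical feature names.
EMOTICON_FEATURES = {"^_^": "EMO_POS", "QAQ": "EMO_NEG"}

# Punctuation runs: one or more full-width tildes, or two or more
# full-width question marks (the latter is an assumed reading).
PUNCTUATION_PATTERNS = [
    (re.compile("～+"), "PUNCT_POS"),
    (re.compile("？{2,}"), "PUNCT_NEG"),
]


def symbol_features(text):
    """Return one feature per emoticon occurrence and per punctuation run."""
    features = []
    for emoticon, name in EMOTICON_FEATURES.items():
        features.extend([name] * text.count(emoticon))
    for pattern, name in PUNCTUATION_PATTERNS:
        features.extend([name] * len(pattern.findall(text)))
    return features


positive_example = symbol_features("很好用～～ ^_^")
negative_example = symbol_features("怎么回事？？ QAQ")
neutral_example = symbol_features("还行？")
```

A single question mark produces no feature here, since the text treats only a repeated mark as a negative indicator.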

Table 1. Emoticons used in Chinese reviews

Table 2. Punctuation marks that indicate sentiment

Some idioms found in Chinese reviews, with meanings along the lines of "nothing to fault" or "nothing more to say", imply a positive sentiment toward the product. We collected a number of idioms used in reviews for classification; they are listed in Table 3 below.

Table 3. Idioms in Chinese reviews

In addition, there are often short reviews without complex expressions that can be matched against templates for classification. For example, a consumer may say "not very satisfied" to express that the product did not meet their expectations. The embodiment of the present invention provides some templates based on regular expressions, in which a positive word stands for words such as "satisfied" and a negative word stands for words such as "poor".

Negative:

not (is)* (too | very)* (makes one)* (<positive word>)

(really is | simply)* (too)* (<negative word>) ()*

Positive:

(really is)* (too)* (<positive word>) ()*

not (.)* so (<negative word>)

Here, * denotes direct concatenation between words, () denotes an optional word, | denotes a choice of one among several words, and . denotes any word. Every match in a review (the emoticons and punctuation marks, idioms and templates described above) becomes a feature in the learning algorithm, so even from a short review we can generate enough features for the NB algorithm.
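As a rough illustration, two of these templates can be expressed as Python regular expressions. The Chinese tokens below (不, 是, 太, 很, 令人, 真是) are an assumed back-translation of the English glosses and may not match the original wording; the word list, the feature names and the function `template_features` are likewise hypothetical.

```python
import re

# Hypothetical seed list; in practice it would come from the sentiment dictionary.
POSITIVE_WORDS = ["满意", "好"]


def alternatives(words):
    """Build a non-capturing alternation group from a word list."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


# Negative template: not (is) (too | very) (makes one) <positive word>
NEGATED_POSITIVE = re.compile(
    "不(?:是)?(?:太|很)?(?:令人)?" + alternatives(POSITIVE_WORDS)
)

# Positive template: (really is) (too) <positive word>
PLAIN_POSITIVE = re.compile("(?:真是)?(?:太)?" + alternatives(POSITIVE_WORDS))


def template_features(text):
    """Return a template feature for the text, checking negation first."""
    if NEGATED_POSITIVE.search(text):
        return ["TEMPLATE_NEGATIVE"]
    if PLAIN_POSITIVE.search(text):
        return ["TEMPLATE_POSITIVE"]
    return []
```

The negated template is checked first because any text matching it also contains a positive word and would otherwise be counted as positive.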

The feature extraction model is used to mine product features from user reviews; its main method is syntactic analysis. Because sentiment words and feature words usually appear in pairs in reviews, we traverse every review in the training set with a window to find sentiment-feature word pairs. While traversing the training set, the model maintains a sentiment dictionary and a feature dictionary. New feature words and sentiment words found in each review update the dictionaries and are used when mining later reviews.

The "dictionary-window" basic model (DWM) is based on the "feature-sentiment" pairs in reviews. The "feature-sentiment" model matches users' reviewing habits.

Example: The mobile phone quality is pretty good.

In this review, the feature word "mobile phone quality" and the sentiment word "good" form a "feature-sentiment" word pair, with the feature word preceding the sentiment word.

Data were collected and compared manually, leading to the conclusion that reviews following the "feature-sentiment" model account for 84% of the data.

To identify sentiment words more accurately, some preprocessing is needed first. After a review has been segmented, merging some of the nouns yields more accurate feature words.

Example: The screen resolution is very high.

In this review, the word segmentation tool tags both "screen" and "resolution" as nouns. Using only these two words individually as feature words is clearly not accurate enough. "Screen resolution" is the accurate feature word in this review. That is, during feature extraction it is more appropriate to merge the two words into a noun phrase and use it as a single feature word.

The embodiment of the present invention defines three review forms that can yield noun phrases usable as product feature words.

1. A single word

Example: The price is extremely cheap.

In this review, "price" is a single word after segmentation. We use "price" as a product feature.

2. Two adjacent words

Example: The screen resolution is very high.

We merge "screen" and "resolution" into "screen resolution" and use it as a product feature word.

3. Noun + "的" + noun

Example: The color of the shell is plain.

In this review, we use "color of the shell" as the product feature, rather than a single word such as "shell" or "color".
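The three forms can be handled by a single pass over the tagged tokens, as in the Python sketch below. The tag `"n"` for nouns, the function name `noun_phrases`, and the greedy merging of longer chains (more than two nouns, or several "的" links) are assumptions that go slightly beyond the three forms listed; the connective "的" is also an assumed reading of the noun + particle + noun form.

```python
def noun_phrases(tagged_tokens):
    """Merge nouns into candidate product feature phrases.

    tagged_tokens: list of (word, pos_tag) pairs from a segmenter,
    where the tag "n" marks a noun (assumed tag set).
    """
    phrases = []
    i = 0
    while i < len(tagged_tokens):
        word, tag = tagged_tokens[i]
        if tag != "n":
            i += 1
            continue
        parts = [word]
        j = i + 1
        while j < len(tagged_tokens):
            next_word, next_tag = tagged_tokens[j]
            if next_tag == "n":
                # Form 2: directly adjacent nouns
                parts.append(next_word)
                j += 1
            elif (
                next_word == "的"
                and j + 1 < len(tagged_tokens)
                and tagged_tokens[j + 1][1] == "n"
            ):
                # Form 3: noun + "的" + noun
                parts.extend(["的", tagged_tokens[j + 1][0]])
                j += 2
            else:
                break
        phrases.append("".join(parts))
        i = j
    return phrases


single = noun_phrases([("价格", "n"), ("便宜", "a")])
adjacent = noun_phrases([("屏幕", "n"), ("分辨率", "n"), ("很", "d"), ("高", "a")])
linked = noun_phrases([("外壳", "n"), ("的", "u"), ("颜色", "n"), ("朴素", "a")])
```

Merging before dictionary lookup means that "屏幕分辨率" enters the feature dictionary as one entry rather than as two less precise ones.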

The model finds "sentiment-feature" word pairs in reviews and iteratively updates the dictionaries during training. We call each word a unit; if there are n units between a sentiment word and a product feature word, we call this a window of size n. We define a dictionary as a document containing feature words or sentiment words.

First, the sentiment word dictionary and the initial feature word dictionary are initialized.

Second, each review c in the training set is processed as follows:

c is preprocessed, including word segmentation and part-of-speech tagging;

all sentiment words contained in both the sentiment word dictionary and c are put into a set S;

all feature words contained in both the feature word dictionary and c are put into a set F;

the sentiment word merge set and the feature word merge set are initialized and cleared;

for each sentiment word s in S, if there is a noun f within the m units before s, f is considered a feature word and is added to the feature word merge set;

for each feature word f in F, if there is an adjective s within the m units after f, s is considered a sentiment word and is added to the sentiment word merge set;

the sentiment word merge set and the feature word merge set are merged into the sentiment word dictionary and the initial feature word dictionary respectively.
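The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `dwm_extract`, the tags `"n"` and `"a"`, the default window of 3, and the reading of "within m units" as "among the m tokens immediately before or after" are all choices made here. The sample tokens are English stand-ins for segmented Chinese words.

```python
from collections import Counter


def dwm_extract(reviews, sentiment_dict, feature_dict, window=3):
    """Extract (feature, sentiment) pairs and grow both dictionaries.

    reviews: list of reviews, each a list of (word, pos_tag) pairs.
    sentiment_dict, feature_dict: sets of known words, updated in place.
    Returns a Counter of (feature_word, sentiment_word) pairs.
    """
    pair_counts = Counter()
    for tokens in reviews:
        new_features, new_sentiments, pairs = set(), set(), set()
        for i, (word, _tag) in enumerate(tokens):
            if word in sentiment_dict:
                # Look back: nouns within `window` tokens before the sentiment word.
                for prev_word, prev_tag in tokens[max(0, i - window):i]:
                    if prev_tag == "n":
                        new_features.add(prev_word)
                        pairs.add((prev_word, word))
            if word in feature_dict:
                # Look ahead: adjectives within `window` tokens after the feature word.
                for next_word, next_tag in tokens[i + 1:i + 1 + window]:
                    if next_tag == "a":
                        new_sentiments.add(next_word)
                        pairs.add((word, next_word))
        # Merge this review's discoveries so later reviews can use them.
        sentiment_dict |= new_sentiments
        feature_dict |= new_features
        pair_counts.update(pairs)
    return pair_counts


sentiments = {"good"}
features = set()
reviews = [
    [("screen", "n"), ("very", "d"), ("good", "a")],
    [("screen", "n"), ("clear", "a")],
    [("battery", "n"), ("clear", "a")],
]
pair_counts = dwm_extract(reviews, sentiments, features)
```

In this run the first review teaches the model the feature "screen", the second uses it to learn the sentiment word "clear", and the third uses "clear" to learn "battery", which shows how a small seed dictionary can grow during the traversal.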

Experiments demonstrate the effectiveness of the present invention. The experiment uses 300 positive reviews and 300 negative reviews as the corpus for training the classifier. After word segmentation and part-of-speech tagging with ICTCLAS, the prior probability of each word in the positive and negative sets is computed from the number of times each word occurs. The experiment also uses 50000 reviews as input to the DWM model, which yields 3732 noun phrases, including some meaningless words. After removing words that occur fewer than 300 times, and also removing some words that occur many times but do not represent a product feature, such as "family" and "friend", 283 feature words remain in the feature dictionary. For each review, the experiment performs dictionary-based feature extraction and then word segmentation and part-of-speech tagging for classification. Many reviews mention more than one product feature and contain specific sentiments corresponding to those features, so the experiment splits sentences on punctuation marks and whitespace and uses the resulting clauses as input.

To demonstrate the usability of the system, the experiments were carried out on real data sets. The two models were tested separately, in a sentiment analysis experiment and a product feature extraction experiment, and the results are shown in the tables below.

Sentiment analysis	Classified positive	Classified negative
Labeled positive	1443	357
Labeled negative	257	1543

Table 4. Sentiment analysis experimental results

1800 positive reviews and 1800 negative reviews extracted from JD.com were labeled by hand; the evaluation results are shown in the table above. The recall and precision for the positive class are 80% and 84%, and the recall and precision for the negative class are 85% and 81%.
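The percentages can be checked against the confusion matrix with a few lines of Python. The sketch assumes the usual definitions of precision and recall, which the source does not spell out; the function name `precision_recall` is introduced here for illustration.

```python
def precision_recall(tp, fn, fp):
    """Return (precision, recall) from true positives, false negatives
    and false positives, using the standard definitions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Positive class: 1443 labeled-positive reviews classified positive,
# 357 missed, and 257 labeled-negative reviews wrongly classified positive.
pos_precision, pos_recall = precision_recall(tp=1443, fn=357, fp=257)

# Negative class: the roles of the off-diagonal cells swap.
neg_precision, neg_recall = precision_recall(tp=1543, fn=257, fp=357)
```

Under these definitions the computed values are consistent with the reported figures if the percentages are truncated to whole numbers.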

Product feature extraction	Feature	Non-feature
Extracted	884	165
Not extracted	91	—

Table 5. Product feature extraction experimental results

Feature words were selected by hand from 400 reviews from JD.com; the evaluation results are shown in the table above. The recall is 90% and the precision is 84%.
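Assuming the usual definitions of recall and precision, which the source does not state explicitly, these two figures follow from the counts of extracted features (884), extracted non-features (165) and missed features (91):

$$\text{recall} = \frac{884}{884 + 91} \approx 0.907, \qquad \text{precision} = \frac{884}{884 + 165} \approx 0.843$$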

In summary, the above technical solution of the present invention trains a sentiment dictionary and a product feature dictionary, crawls multiple text reviews of a target product by different consumers, extracts product feature-sentiment word pairs from them to iteratively update the sentiment dictionary and the product feature dictionary, and counts the pairs to obtain the product features and sentiment evaluation of the product. It can thereby mine and evaluate product features and perform statistical analysis for Chinese-language text, providing data support for the overall evaluation of Chinese e-commerce.

A person of ordinary skill in the art should understand that the foregoing describes only specific embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for mining and evaluating product features, characterized by comprising:

determining a target product, and crawling from an e-commerce platform multiple text reviews of the target product by different consumers;

according to the sentiment dictionary and the product feature dictionary, extracting product feature-sentiment word pairs from each of the text reviews in turn, and iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs, until all of the text reviews have been processed;

2. The method according to claim 1, characterized in that randomly crawling multiple text reviews of products by consumers to train the sentiment dictionary comprises:

determining whether each of the randomly crawled text reviews is a positive review or a negative review;

performing word segmentation on each of the randomly crawled text reviews, and part-of-speech tagging each word obtained by segmentation;

using the naive Bayes method, calculating for each adjective obtained by segmentation, from the number of times the adjective occurs, its prior probability in positive reviews and in negative reviews;

classifying the adjective as a positive word or a negative word according to its prior probabilities in positive reviews and negative reviews, and adding it to the sentiment dictionary.

3. The method according to claim 2, characterized in that, when each of the randomly crawled text reviews is segmented, emoticons and punctuation marks are also treated as words and segmented; and when each word obtained by segmentation is part-of-speech tagged, emoticons and punctuation marks are also treated as words, tagged, and labeled as adjectives.

4. The method according to claim 2, characterized in that, when each of the randomly crawled text reviews is segmented, idioms and sentence templates are also treated as words and segmented; and when each word obtained by segmentation is part-of-speech tagged, idioms and sentence templates are also treated as words, tagged, and labeled as adjectives.

5. The method according to claim 1, characterized in that randomly crawling multiple text reviews of products by consumers to train the product feature dictionary comprises:

extracting all standalone nouns obtained by word segmentation and adding them to the product feature dictionary;

extracting all compound words formed by multiple directly adjacent nouns obtained by word segmentation, and adding each compound word as a whole to the product feature dictionary as a single noun;

extracting all phrases formed by multiple nouns obtained by word segmentation that are joined by "的", and adding each phrase as a whole to the product feature dictionary as a single noun.

6. The method according to claim 1, characterized in that extracting product feature-sentiment word pairs from each of the text reviews in turn according to the sentiment dictionary and the product feature dictionary comprises:

designating each of the text reviews in turn, and preprocessing the designated text review;

extracting from the preprocessed designated text review the words that match entries in the sentiment dictionary, as the sentiment vocabulary of the designated text review;

extracting from the preprocessed designated text review the words that match entries in the product feature dictionary, as the product feature vocabulary of the designated text review;

according to the sentiment vocabulary and the product feature vocabulary, extracting multiple product feature-sentiment word pairs from the preprocessed designated text review by means of the "product feature-sentiment" model.

7. The method according to claim 6, characterized in that preprocessing the designated text review comprises:

dividing the designated text review into multiple words connected in sequence;

part-of-speech tagging each of the words.

8. The method according to claim 7, characterized in that extracting multiple product feature-sentiment word pairs from the preprocessed designated text review by means of the "product feature-sentiment" model, according to the extracted sentiment vocabulary and product feature vocabulary, comprises:

designating each of the sentiment words in turn; according to the position of the designated sentiment word in the preprocessed designated text review, extracting all words tagged as nouns within a predetermined length before that position, and pairing the designated sentiment word with each of the words tagged as nouns to form product feature-sentiment word pairs, until every sentiment word has been designated;

designating each of the product feature words in turn; according to the position of the designated product feature word in the preprocessed designated text review, extracting all words tagged as adjectives within a predetermined length after that position, and pairing the designated product feature word with each of the words tagged as adjectives to form product feature-sentiment word pairs, until every product feature word has been designated.

9. The method according to claim 1, characterized in that iteratively updating the sentiment dictionary and the product feature dictionary with the product feature-sentiment word pairs comprises:

merging the sentiment word of each product feature-sentiment word pair into the sentiment dictionary;

merging the product feature word of each product feature-sentiment word pair into the product feature dictionary.