CN106649519A - Method of digging and assessing product features - Google Patents

Info

Publication number
CN106649519A
Authority
CN
China
Prior art keywords
word
emotion
product feature
product
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610903523.6A
Other languages
Chinese (zh)
Other versions
CN106649519B (en)
Inventor
孙鹏飞
吴国仕
许可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201610903523.6A
Publication of CN106649519A
Application granted
Publication of CN106649519B
Legal status: Expired - Fee Related
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for mining and evaluating product features. The method comprises: randomly crawling multiple consumer text reviews of products and using them to train an emotion dictionary and a product feature dictionary; determining a target product and crawling multiple text reviews of the target product by different consumers from an e-commerce platform; extracting, based on the emotion dictionary and the product feature dictionary, product feature-emotion word pairs from each text review in turn, and using the extracted pairs to iteratively update the emotion dictionary and the product feature dictionary until all of the text reviews have been processed; and counting the extracted product feature-emotion word pairs to obtain the product features of the product and their emotion evaluations. The method can mine and evaluate product features in Chinese text and analyze them statistically, providing data support for the overall evaluation of Chinese-language e-commerce.

Description

A method for mining and evaluating product features
Technical field
The present invention relates to the field of e-commerce, and in particular to a method for mining and evaluating product features.
Background
With the emergence and rapid development of Web 2.0 applications, e-commerce has grown into a thriving business model, and feedback on goods has become much easier to obtain. Customers can leave reviews after a purchase, and later customers can decide whether to buy the goods based on those reviews. If consumers have a clear picture of the proportion of positive and negative reviews of an item, they can make a better purchase decision. Knowing which features of the goods are reviewed most often is also helpful to consumers. These results can also be reported to manufacturers and e-commerce retailers to help them improve their goods and services. Integrating the information available online, analyzing it statistically, and presenting consumers with an overall evaluation is therefore very important.
For obtaining market information, the prior art uses Twitter as a corpus for mining market information, including topic detection and sentiment classification. In Chinese, however, microblogs, which carry a large amount of user-generated content, contain too many fake reviews and advertisements, so it is difficult to build a suitable data set. Sentiment analysis is usually treated as a text classification problem. Unlike formal text such as news articles and scientific papers, user-generated content (UGC) does not always follow grammar strictly, but it carries some additional information. On Twitter, emoticons such as ":-)" are used as features for polarity detection; colloquial expressions and the like can also serve as indicators of positive and negative sentiment. In addition, how to extract product features from reviews of goods is a main problem of the prior art. Authors working on English product feature mining extract features from nouns and noun phrases and use the Apriori algorithm to generate frequent feature sets and prune the preliminary set, but this approach is not suitable for Chinese.
No effective solution has yet been proposed for the problem that prior-art methods for mining and evaluating English text reviews are not applicable to Chinese.
Summary of the invention
In view of this, the object of the invention is to propose a method for mining and evaluating product features that can mine and evaluate product features in Chinese text and analyze them statistically, providing data support for the overall evaluation of Chinese e-commerce.
To this end, the technical solution provided by the invention is as follows:
The invention provides a method for mining and evaluating product features, comprising:
Randomly crawling multiple consumer text reviews of products and using them to train an emotion dictionary and a product feature dictionary;
Determining a target product, and crawling multiple text reviews of the target product by different consumers from an e-commerce platform;
Extracting, according to the emotion dictionary and the product feature dictionary, product feature-emotion word pairs from each text review in turn, and iteratively updating the emotion dictionary and the product feature dictionary with the extracted pairs, until all of the text reviews have been processed;
Counting all extracted product feature-emotion word pairs to obtain the product features of the product and their emotion evaluations.
Training the emotion dictionary from the randomly crawled consumer text reviews comprises:
Determining whether each randomly crawled text review is a positive review or a negative review;
Segmenting each randomly crawled text review into words, and tagging each resulting word with its part of speech;
Using the naive Bayes method to compute, for each adjective obtained by segmentation, its prior probability in positive reviews and in negative reviews from the number of times the adjective occurs;
Classifying each adjective as a positive word or a negative word according to its prior probabilities in positive and negative reviews, and adding it to the emotion dictionary.
Moreover, when each randomly crawled text review is segmented, emoticons and punctuation marks are also treated as words; when the resulting words are tagged with parts of speech, emoticons and punctuation marks are tagged as well and labeled as adjectives.
Likewise, when each randomly crawled text review is segmented, idioms and sentence templates are also treated as words; when the resulting words are tagged with parts of speech, idioms and sentence templates are tagged as well and labeled as adjectives.
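The emotion dictionary training steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the reviews have already been segmented and tagged by an external tool, that the tag "a" marks adjectives (with emoticons, punctuation marks, idioms and templates already tagged as adjectives), and that add-one smoothing is used. The function name and the data layout are hypothetical.

```python
from collections import Counter

def train_emotion_dictionary(labeled_reviews):
    """Build a word -> polarity map from labeled, pre-tagged reviews.

    labeled_reviews: iterable of (label, tokens), where label is "pos" or
    "neg" and tokens is a list of (word, pos_tag) pairs (assumed layout).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for label, tokens in labeled_reviews:
        for word, tag in tokens:
            if tag == "a":  # adjectives and symbols tagged as adjectives
                counts[label][word] += 1

    totals = {c: sum(counts[c].values()) for c in counts}
    vocab = set(counts["pos"]) | set(counts["neg"])
    dictionary = {}
    for word in vocab:
        # Add-one smoothed frequency of the word in each class (assumption).
        p_pos = (counts["pos"][word] + 1) / (totals["pos"] + len(vocab))
        p_neg = (counts["neg"][word] + 1) / (totals["neg"] + len(vocab))
        dictionary[word] = "positive" if p_pos >= p_neg else "negative"
    return dictionary
```

Words that are not tagged as adjectives never enter the dictionary, which mirrors the restriction to adjectives in the steps above.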
Training the product feature dictionary from the randomly crawled consumer text reviews comprises:
Segmenting each randomly crawled text review into words, and tagging each resulting word with its part of speech;
Extracting every standalone noun obtained by segmentation and adding it to the product feature dictionary;
Extracting every compound formed by several directly adjacent nouns obtained by segmentation, and adding the compound as a whole to the product feature dictionary as a single noun;
Extracting every phrase formed by several nouns obtained by segmentation that are joined by "的" ("of"), and adding the phrase as a whole to the product feature dictionary as a single noun.
Extracting product feature-emotion word pairs from each text review in turn, according to the emotion dictionary and the product feature dictionary, comprises:
Designating each text review in turn, and preprocessing the designated text review;
Extracting from the preprocessed designated text review the words that match entries in the emotion dictionary, as the emotion words of the designated text review;
Extracting from the preprocessed designated text review the words that match entries in the product feature dictionary, as the product feature words of the designated text review;
Extracting, according to the emotion words and product feature words, multiple product feature-emotion word pairs from the preprocessed designated text review by means of the "product feature-emotion" model.
Preprocessing the designated text review comprises:
Splitting the designated text review into multiple words connected in order;
Tagging each word with its part of speech.
Extracting multiple product feature-emotion word pairs from the preprocessed designated text review by means of the "product feature-emotion" model, according to the extracted emotion words and product feature words, comprises:
Designating each emotion word in turn; based on the position of the designated emotion word in the preprocessed designated text review, extracting all words tagged as nouns within a pre-specified length before that position, and forming a product feature-emotion word pair between the designated emotion word and each such noun, until every emotion word has been designated;
Designating each product feature word in turn; based on the position of the designated product feature word in the preprocessed designated text review, extracting all words tagged as adjectives within a pre-specified length after that position, and forming a product feature-emotion word pair between the designated product feature word and each such adjective, until every product feature word has been designated.
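The two window scans described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the review is already a list of (word, tag) pairs, the tags "n" and "a" mark nouns and adjectives, and the pre-specified length is passed as `window`. The function name and the tag labels are hypothetical.

```python
def extract_pairs(tokens, emotion_dict, feature_dict, window=3):
    """Return the set of (feature, emotion) pairs found in one review.

    tokens: list of (word, pos_tag) pairs in review order (assumed layout).
    emotion_dict, feature_dict: collections of known words.
    """
    pairs = set()
    for i, (word, _tag) in enumerate(tokens):
        if word in emotion_dict:
            # Nouns within `window` tokens before the emotion word.
            for w, t in tokens[max(0, i - window):i]:
                if t == "n":
                    pairs.add((w, word))
        if word in feature_dict:
            # Adjectives within `window` tokens after the feature word.
            for w, t in tokens[i + 1:i + 1 + window]:
                if t == "a":
                    pairs.add((word, w))
    return pairs
```

A set is used so that a pair found by both scans is recorded once; whether duplicates should be kept for counting is a design choice the text leaves open.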
In addition, iteratively updating the emotion dictionary and the product feature dictionary with the product feature-emotion word pairs comprises:
Merging the emotion word of each product feature-emotion word pair into the emotion dictionary;
Merging the product feature word of each product feature-emotion word pair into the product feature dictionary.
As can be seen from the above, the technical solution provided by the invention trains an emotion dictionary and a product feature dictionary, crawls multiple text reviews of a target product by different consumers, extracts product feature-emotion word pairs from them, iteratively updates the emotion dictionary and the product feature dictionary, and counts the pairs to obtain the product features of the product and their emotion evaluations. It can thereby mine and evaluate product features in Chinese text and analyze them statistically, providing data support for the overall evaluation of Chinese e-commerce.
Description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for mining and evaluating product features according to an embodiment of the invention;
Fig. 2 is a system architecture diagram of a system applying the method for mining and evaluating product features according to an embodiment of the invention.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the invention clearer, the technical solutions in the embodiments of the invention are described below clearly, completely and in detail with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments fall within the scope of protection of the invention.
According to an embodiment of the invention, a method for mining and evaluating product features is provided.
As shown in Fig. 1, the method for mining and evaluating product features provided according to the embodiment of the invention comprises:
Step S101: randomly crawling multiple consumer text reviews of products and using them to train an emotion dictionary and a product feature dictionary;
Step S103: determining a target product, and crawling multiple text reviews of the target product by different consumers from an e-commerce platform;
Step S105: extracting, according to the emotion dictionary and the product feature dictionary, product feature-emotion word pairs from each text review in turn, and iteratively updating the emotion dictionary and the product feature dictionary with the extracted pairs, until all of the text reviews have been processed;
Step S107: counting all extracted product feature-emotion word pairs to obtain the product features of the product and their emotion evaluations.
The technical features of the invention are elaborated further below with reference to a specific embodiment.
Fig. 2 shows a system applying the method for mining and evaluating product features according to the embodiment of the invention. As shown in Fig. 2, in addition to crawling reviews from the e-commerce platform with a crawler, the system includes dictionary training (the training of the product feature dictionary described above), classifier training (the training of the emotion dictionary described above), and review processing (the product feature-emotion word pair processing described above). E-commerce reviews contain relatively few fake reviews and advertisements, which makes them a suitable data set.
For classifier training, the embodiment uses the naive Bayes (NB) method to judge efficiently and conveniently whether the sentiment of a review is positive or negative. The NB method is known for its stability and allows custom features and prior probabilities to be introduced.
Naive Bayes is a simple model that performs well on text classification. In this statistical model, the class c* assigned to a review d is the class c that maximizes the probability P(c | d):
$$c^{*} = \arg\max_{c} P(c \mid d)$$
According to Bayes' theorem, P(c | d) can be computed as:
$$P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)}$$
Here P(c) is the probability of a class; there are two classes, positive and negative. P(d) is the probability that the review occurs. In practice only the numerator of this fraction matters, because the denominator is a constant that does not depend on c. If a balanced corpus is used, that is, the positive and negative training sets are chosen to be the same size, P(c) can be ignored as well.
Let d = {f1, f2, ..., fn}, where f1, f2, ..., fn are the features in the review. Naive Bayes assumes that they are conditionally independent given the class, which gives:
$$P(c \mid d) = \frac{P(c)\prod_{i=1}^{n} P(f_i \mid c)}{P(d)}$$
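The decision rule above can be sketched in a few lines. This is a minimal sketch, not the embodiment's classifier: it works in log space to avoid underflow, drops the constant P(d), and uses add-one smoothing so that unseen features do not zero out a class. The smoothing, the function names and the data layout are assumptions.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (label, features); returns the counts the classifier needs."""
    class_counts = Counter()
    feat_counts = {}
    vocab = set()
    for label, feats in docs:
        class_counts[label] += 1
        feat_counts.setdefault(label, Counter()).update(feats)
        vocab.update(feats)
    return class_counts, feat_counts, vocab

def classify_nb(feats, model):
    """Return argmax_c of log P(c) + sum_i log P(f_i | c)."""
    class_counts, feat_counts, vocab = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in sorted(class_counts):
        total = sum(feat_counts[c].values())
        score = math.log(class_counts[c] / n_docs)  # log P(c)
        for f in feats:
            # Add-one smoothed estimate of P(f | c) (assumption).
            score += math.log((feat_counts[c][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

With a balanced corpus the log P(c) term is the same for both classes and has no effect on the result, which matches the remark that P(c) can be ignored.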
Each word in a review is used as a feature. The following two techniques compensate for the fact that some short reviews lack enough indicative feature words.
Some emoticons and punctuation marks in reviews can also serve as indicator words for classification. For example, "^_^" expresses positive sentiment, while "QAQ" expresses negative sentiment. As for punctuation marks, some combinations of punctuation marks express sentiment beyond the text itself, so these symbols can be used as feature words for classification. For example, the punctuation mark "~" (occurring one or more times) is a positive sentiment indicator, and "？" (the Chinese question mark, repeated two or more times) is a negative sentiment indicator. Table 1 gives a sample list of emoticons, and Table 2 gives a sample list of punctuation marks.
Table 1. Emoticons used in Chinese reviews
Table 2. Punctuation marks that indicate sentiment
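One way to turn such symbols into classifier features is sketched below. Only the four symbols named in the text are used; matching the ASCII "?" and the full-width "～" as well is an assumption, and the names `EMOTICON_FEATURES`, `PUNCT_PATTERNS` and `symbol_features` are hypothetical.

```python
import re

# Emoticons named in the text; a fuller list would come from Table 1.
EMOTICON_FEATURES = {"^_^": "positive", "QAQ": "negative"}

# "~" one or more times is positive; the question mark repeated two or
# more times is negative. ASCII/full-width variants are an assumption.
PUNCT_PATTERNS = [
    (re.compile(r"[~～]+"), "positive"),
    (re.compile(r"[?？]{2,}"), "negative"),
]

def symbol_features(text):
    """Return a list of (matched symbol, polarity) features for a review."""
    feats = []
    for emoticon, polarity in EMOTICON_FEATURES.items():
        feats += [(emoticon, polarity)] * text.count(emoticon)
    for pattern, polarity in PUNCT_PATTERNS:
        for match in pattern.finditer(text):
            feats.append((match.group(), polarity))
    return feats
```

A single question mark produces no feature, since the text treats only repeated question marks as a negative indicator.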
Some idioms in Chinese reviews, with meanings along the lines of "nothing more to be said" or "no complaints", imply a positive sentiment toward the product. We collected some idioms used in reviews for classification. Table 3 below lists these idioms.
Table 3. Idioms in Chinese reviews
There are also often short reviews without complex expressions, which can be matched against templates for classification. For example, a consumer may say "not very satisfied" to express that the goods did not meet their expectations. The embodiment of the invention gives some templates based on regular expressions, in which a positive word stands for words such as "satisfied" and a negative word stands for words such as "poor".
Negative:
not (is)* (too | very)* (makes one)* (<positive word>)
(really is | simply)* (too)* (<negative word>) ()*
Positive:
(really is)* (too)* (<positive word>) ()*
not (.)* so (<negative word>)
Here * denotes direct adjacency between words, () denotes an optional word, | denotes a choice of one among several words, and . stands for any word. Each match in a review (the emoticons and punctuation marks, idioms, and templates above) becomes a feature in the learning algorithm, so even a short review yields enough features for the naive Bayes algorithm.
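As a rough illustration of how such templates map onto regular expressions, the sketch below encodes English glosses of the first negative template and the first positive template. The original templates operate on Chinese text, so the English word lists, the patterns and the function name here are illustrative assumptions only.

```python
import re

# Illustrative English stand-ins for the "positive word" slot.
POSITIVE_WORDS = ["satisfied", "good"]

def alternation(words):
    """Build a regex alternation that matches any of the given words."""
    return "|".join(re.escape(w) for w in words)

# Gloss of: not (too | very)* (<positive word>)  -> negative
NEGATIVE_TEMPLATE = re.compile(
    r"\bnot\s+(?:(?:too|very)\s+)*(?:%s)\b" % alternation(POSITIVE_WORDS))

# Gloss of: (really)* (too)* (<positive word>)  -> positive
POSITIVE_TEMPLATE = re.compile(
    r"\b(?:really\s+)?(?:too\s+)?(?:%s)\b" % alternation(POSITIVE_WORDS))

def template_polarity(text):
    """Return "negative", "positive", or None if no template matches."""
    # The negated form is checked first because it also contains a positive word.
    if NEGATIVE_TEMPLATE.search(text):
        return "negative"
    if POSITIVE_TEMPLATE.search(text):
        return "positive"
    return None
```

Checking the negated template before the plain positive one keeps "not very satisfied" from being counted as positive.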
The feature extraction model mines product features from user reviews; its main method is syntactic analysis. Because emotion words and feature words usually occur in pairs in reviews, a window is moved over each review in the training set to find emotion-feature word pairs. While the training set is traversed, the model maintains an emotion dictionary and a feature dictionary. New feature words and emotion words found in each review update the dictionaries and are used when mining later reviews.
The basic "dictionary-window" model (DWM) is based on "feature-emotion" pairs in reviews. The "feature-emotion" pattern matches users' reviewing habits.
Example: The phone quality is pretty good.
In this review, the feature word "phone quality" and the emotion word "good" form a "feature-emotion" word pair, with the feature word preceding the emotion word.
Based on collected data and manual comparison, reviews that follow the "feature-emotion" pattern account for 84% of the data.
To identify emotion words more accurately, some preprocessing is required first. After a review is segmented, merging some nouns gives more accurate feature words.
Example: The screen resolution is very high.
In this review, the segmentation tool tags "screen" and "resolution" as separate nouns. Using only these two words individually as feature words is clearly not accurate; "screen resolution" is the accurate feature word in this review. That is, during feature extraction it is more appropriate to merge the two words into a noun phrase and treat it as one feature word.
The embodiment of the invention defines three forms in reviews that can produce noun phrases usable as product feature words.
1. A single word
Example: The price is extremely cheap.
In this review, "price" is a single word after segmentation. We use "price" as a product feature.
2. Two adjacent words
Example: The screen resolution is very high.
We merge "screen" and "resolution" into "screen resolution" and use it as a product feature word.
3. Noun + "的" ("of") + noun
Example: The color of the shell is plain.
In this review, we use "color of the shell" as the product feature rather than a single word such as "shell" or "color".
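The three forms above can be handled by a single left-to-right merge over the tagged tokens, as in the sketch below. It assumes the tag "n" marks nouns and that the connector is "的"; the Chinese tokens in the usage note are back-translations of the English examples and are assumptions, as is the function name.

```python
def merge_noun_phrases(tokens, connector="的"):
    """Merge nouns into candidate feature phrases.

    tokens: list of (word, pos_tag) pairs in review order (assumed layout).
    Returns the phrases in order of appearance.
    """
    phrases = []
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag != "n":
            i += 1
            continue
        parts = [word]  # form 1: a single noun
        j = i + 1
        while j < len(tokens):
            w, t = tokens[j]
            if t == "n":
                parts.append(w)  # form 2: directly adjacent nouns
                j += 1
            elif (w == connector and j + 1 < len(tokens)
                  and tokens[j + 1][1] == "n"):
                parts.extend([w, tokens[j + 1][0]])  # form 3: noun + connector + noun
                j += 2
            else:
                break
        phrases.append("".join(parts))
        i = j
    return phrases
```

For instance, the tokens 屏幕/n 分辨率/n 很/d 高/a would give the single phrase 屏幕分辨率, and 外壳/n 的/u 颜色/n would give 外壳的颜色.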
Model can find " emotion-feature " word pair from comment, and can iteration more neologisms in the training process Allusion quotation.Our word is called a unit, if there is n unit between emotion word and product feature word, then Wo Menye It is called the window that size is n.We are a document comprising Feature Words or emotion word dictionary definition.
First, the emotion word dictionary and the initial feature word dictionary are initialized.
Second, each comment c in the training set is processed as follows:
Preprocess c, including word segmentation and part-of-speech tagging;
Put all emotion words that appear both in the emotion word dictionary and in c into a set S;
Put all feature words that appear both in the feature word dictionary and in c into a set F;
Initialize and clear the emotion word merge set and the feature word merge set;
For each emotion word s in S, if a noun f occurs within m units before s, f is regarded as a feature word and added to the feature word merge set;
For each feature word f in F, if an adjective s occurs within m units after f, s is regarded as an emotion word and added to the emotion word merge set;
Merge the emotion word merge set and the feature word merge set into the emotion word dictionary and the initial feature word dictionary, respectively.
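The steps above can be sketched as follows. This is only an illustration under stated assumptions: comments are already segmented and tagged, "n" and "a" are assumed tags for nouns and adjectives, "within m units" is read as the m units immediately before or after a word, and the name `update_dictionaries` is hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); "n" noun, "a" adjective (assumed)


def update_dictionaries(comments: List[List[Token]],
                        emotion_dict: Set[str],
                        feature_dict: Set[str],
                        m: int = 2) -> Counter:
    """Extract (feature, emotion) pairs and grow both dictionaries in place."""
    pair_counts: Counter = Counter()
    for tokens in comments:
        # Per-comment sets of newly found words, reset for every comment.
        new_features: Set[str] = set()
        new_emotions: Set[str] = set()
        pairs = set()
        for i, (word, _tag) in enumerate(tokens):
            if word in emotion_dict:
                # Nouns within m units before a known emotion word.
                for f, f_tag in tokens[max(0, i - m):i]:
                    if f_tag == "n":
                        new_features.add(f)
                        pairs.add((f, word))
            if word in feature_dict:
                # Adjectives within m units after a known feature word.
                for s, s_tag in tokens[i + 1:i + 1 + m]:
                    if s_tag == "a":
                        new_emotions.add(s)
                        pairs.add((word, s))
        # Merge the per-comment sets into the dictionaries after the scan.
        emotion_dict |= new_emotions
        feature_dict |= new_features
        pair_counts.update(pairs)
    return pair_counts


emotions = {"good"}
features = {"phone"}
counts = update_dictionaries(
    [
        [("phone", "n"), ("quality", "n"), ("good", "a")],
        [("screen", "n"), ("very", "d"), ("good", "a")],
        [("screen", "n"), ("clear", "a")],
    ],
    emotions,
    features,
)
```

In this toy run "screen" enters the feature dictionary through the known emotion word "good" in the second comment, which in turn lets the third comment contribute the new emotion word "clear". This is the iterative effect the steps describe.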
Experiments demonstrate the effectiveness of the invention. The experiment uses 300 positive comments and 300 negative comments as the corpus for training the classifier. After word segmentation and part-of-speech tagging with ICTCLAS, the prior probability of each word in the positive set and in the negative set is computed from the number of times the word occurs. The experiment also takes 50000 comments as input to the DWM model, which yields 3732 noun phrases, some of them meaningless. After removing words that occur fewer than 300 times, and eliminating some words that occur many times yet do not represent a product feature, such as "family" and "friend", 283 feature words remain in the feature dictionary. For each comment, the experiment performs dictionary-based feature extraction and then word segmentation and part-of-speech tagging for classification. Many comments mention more than one product feature and contain a specific emotion for each feature, so the experiment splits sentences on punctuation marks and whitespace and uses the resulting clauses as input.
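As an illustration of how word probabilities might be derived from occurrence counts in the positive and negative corpora, the sketch below classifies each adjective by comparing its relative frequency in the two sets. The add-one smoothing, the "a" tag for adjectives, and the name `build_emotion_dictionary` are assumptions; the text does not specify the exact formula used.

```python
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); "a" marks an adjective (assumed)


def build_emotion_dictionary(positive: List[List[Token]],
                             negative: List[List[Token]]) -> Dict[str, str]:
    """Label each adjective as "positive" or "negative" from its counts."""
    pos_counts = Counter(w for c in positive for w, tag in c if tag == "a")
    neg_counts = Counter(w for c in negative for w, tag in c if tag == "a")
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    vocab = set(pos_counts) | set(neg_counts)
    polarity = {}
    for word in vocab:
        # Add-one smoothing is an assumption made for this sketch.
        p_pos = (pos_counts[word] + 1) / (pos_total + len(vocab))
        p_neg = (neg_counts[word] + 1) / (neg_total + len(vocab))
        polarity[word] = "positive" if p_pos >= p_neg else "negative"
    return polarity


polarity = build_emotion_dictionary(
    positive=[[("quality", "n"), ("good", "a")],
              [("screen", "n"), ("good", "a")],
              [("price", "n"), ("cheap", "a")]],
    negative=[[("battery", "n"), ("bad", "a")],
              [("screen", "n"), ("bad", "a")],
              [("case", "n"), ("good", "a")]],
)
```

Here "good" appears in both corpora but more often in the positive one, so it is labeled positive.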
To demonstrate the usability of the system, experiments were carried out on real data sets. The two models were tested separately and the results are shown in tables, covering a sentiment analysis experiment and a product feature extraction experiment.
Sentiment analysis | Classified positive | Classified negative
Labeled positive | 1443 | 357
Labeled negative | 257 | 1543
Table 4: Sentiment analysis experiment results
1800 positive comments and 1800 negative comments extracted from Jingdong (JD.com) were labeled by hand, and the assessment results are shown in the table above. The positive recall and precision are 80% and 84%, and the negative recall and precision are 85% and 81%.
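The reported percentages can be checked against the counts in Table 4 with a short calculation. In this sketch, the first figure in each pair is computed as recall and the second as precision; the function name `recall_precision` is hypothetical.

```python
def recall_precision(tp: int, fn: int, fp: int):
    """Return (recall, precision) for one class of a 2x2 confusion matrix."""
    recall = tp / (tp + fn)        # share of labeled items that were found
    precision = tp / (tp + fp)     # share of predictions that were correct
    return recall, precision


# Positive class: 1443 labeled positive and classified positive,
# 357 labeled positive but classified negative,
# 257 labeled negative but classified positive.
pos_recall, pos_precision = recall_precision(tp=1443, fn=357, fp=257)

# Negative class: the roles of the off-diagonal cells swap.
neg_recall, neg_precision = recall_precision(tp=1543, fn=257, fp=357)

# Percentages truncated to whole numbers, as they appear in the text.
summary = [int(100 * v) for v in
           (pos_recall, pos_precision, neg_recall, neg_precision)]
```

Truncated to whole percentages, the four values agree with the figures stated for the sentiment analysis experiment.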
Product feature extraction | Feature | Non-feature
Extracted | 884 | 165
Not extracted | 91 |
Table 5: Product feature extraction experiment results
Feature words were selected by hand from 400 comments from Jingdong (JD.com), and the assessment results are shown in the table above. The recall is 90% and the precision is 84%.
In summary, the above technical solution of the present invention trains an emotion dictionary and a product feature dictionary, captures text comment information of multiple different consumers about a target product, extracts product feature-emotion word pairs from it while iteratively updating the emotion dictionary and the product feature dictionary, and counts the pairs to obtain the product features and emotion evaluation of the product. In this way, product features and their evaluations can be mined and statistically analyzed for Chinese-language text, providing data support for the comprehensive evaluation of Chinese e-commerce.
Those of ordinary skill in the art should understand that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A method for mining and evaluating product features, characterized by comprising:
randomly capturing text comment information of multiple consumers about products to train an emotion dictionary and a product feature dictionary;
determining a target product, and capturing from an e-commerce platform text comment information of multiple different consumers about the target product;
extracting, according to the emotion dictionary and the product feature dictionary, product feature-emotion word pairs from each piece of the text comment information in turn, and iteratively updating the emotion dictionary and the product feature dictionary with the product feature-emotion word pairs, until all of the text comment information has been processed;
counting all extracted product feature-emotion word pairs to obtain the product features and emotion evaluation of the product.
2. The method according to claim 1, characterized in that randomly capturing text comment information of multiple consumers about products to train the emotion dictionary comprises:
determining whether each piece of the randomly captured text comment information is a positive evaluation or a negative evaluation;
performing word segmentation on each piece of the randomly captured text comment information, and performing part-of-speech tagging on each word obtained by the word segmentation;
for each adjective obtained by the word segmentation, calculating, using the naive Bayes method and according to the number of occurrences of the adjective, the prior probability of the adjective in positive evaluations and in negative evaluations;
classifying the adjective as a positive word or a negative word according to its prior probabilities in positive evaluations and negative evaluations, and adding it to the emotion dictionary.
3. The method according to claim 2, characterized in that, when word segmentation is performed on each piece of the randomly captured text comment information, emoticons and punctuation marks are also treated as words and segmented; and when part-of-speech tagging is performed on each word obtained by the word segmentation, emoticons and punctuation marks are also treated as words, tagged, and labeled as adjectives.
4. The method according to claim 2, characterized in that, when word segmentation is performed on each piece of the randomly captured text comment information, idioms and sentence templates are also treated as words and segmented; and when part-of-speech tagging is performed on each word obtained by the word segmentation, idioms and sentence templates are also treated as words, tagged, and labeled as adjectives.
5. The method according to claim 1, characterized in that randomly capturing text comment information of multiple consumers about products to train the product feature dictionary comprises:
performing word segmentation on each piece of the randomly captured text comment information, and performing part-of-speech tagging on each word obtained by the word segmentation;
extracting all independent nouns obtained by the word segmentation and adding them to the product feature dictionary;
extracting all compound words formed by multiple directly adjacent nouns obtained by the word segmentation, and adding each compound word as a whole to the product feature dictionary as a single noun;
extracting all phrases formed by multiple nouns obtained by the word segmentation that are connected by " ", and adding each phrase as a whole to the product feature dictionary as a single noun.
6. The method according to claim 1, characterized in that extracting, according to the emotion dictionary and the product feature dictionary, product feature-emotion word pairs from each piece of the text comment information in turn comprises:
designating each piece of the text comment information in turn, and preprocessing the designated text comment information;
extracting from the preprocessed designated text comment information the words that match entries in the emotion dictionary, as the emotion vocabulary of the designated text comment information;
extracting from the preprocessed designated text comment information the words that match entries in the product feature dictionary, as the product feature vocabulary of the designated text comment information;
extracting, according to the emotion vocabulary and the product feature vocabulary, multiple product feature-emotion word pairs from the preprocessed designated text comment information by means of the "product feature-emotion" model.
7. The method according to claim 6, characterized in that preprocessing the designated text comment information comprises:
dividing the designated text comment information into multiple words connected in a certain order;
performing part-of-speech tagging on each of the words.
8. The method according to claim 7, characterized in that extracting, according to the extracted emotion vocabulary and product feature vocabulary, multiple product feature-emotion word pairs from the preprocessed designated text comment information by means of the "product feature-emotion" model comprises:
designating each emotion word in turn; according to the position of the designated emotion word in the preprocessed designated text comment information, extracting all words tagged as nouns within a pre-specified length before that position, and pairing the designated emotion word with each of those nouns one by one to form product feature-emotion word pairs, until every emotion word has been designated;
designating each product feature word in turn; according to the position of the designated product feature word in the preprocessed designated text comment information, extracting all words tagged as adjectives within a pre-specified length after that position, and pairing the designated product feature word with each of those adjectives one by one to form product feature-emotion word pairs, until every product feature word has been designated.
9. The method according to claim 1, characterized in that iteratively updating the emotion dictionary and the product feature dictionary with the product feature-emotion word pairs comprises:
merging the emotion word of each product feature-emotion word pair into the emotion dictionary;
merging the product feature word of each product feature-emotion word pair into the product feature dictionary.
CN201610903523.6A 2016-10-17 2016-10-17 Product characteristic mining and evaluating method Expired - Fee Related CN106649519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610903523.6A CN106649519B (en) 2016-10-17 2016-10-17 Product characteristic mining and evaluating method

Publications (2)

Publication Number Publication Date
CN106649519A true CN106649519A (en) 2017-05-10
CN106649519B CN106649519B (en) 2020-11-27

Family

ID=58856101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610903523.6A Expired - Fee Related CN106649519B (en) 2016-10-17 2016-10-17 Product characteristic mining and evaluating method

Country Status (1)

Country Link
CN (1) CN106649519B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861946A (en) * 2017-11-03 2018-03-30 北京奇艺世纪科技有限公司 A kind of fine-grained evaluation information method for digging and system
CN108959247A (en) * 2018-06-19 2018-12-07 深圳市元征科技股份有限公司 A kind of data processing method, server and computer-readable medium
CN109299460A (en) * 2018-09-18 2019-02-01 北京三快在线科技有限公司 Analyze method, apparatus, electronic equipment and the storage medium of the evaluation data in shop
CN109598528A (en) * 2017-09-30 2019-04-09 北京国双科技有限公司 Advertisement information processing method and device
CN109684641A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of data extraction device, method, electronic equipment and storage medium
CN109902229A (en) * 2019-02-01 2019-06-18 中森云链(成都)科技有限责任公司 A kind of interpretable recommended method based on comment
CN110825876A (en) * 2019-11-07 2020-02-21 上海德拓信息技术股份有限公司 Movie comment viewpoint emotion tendency analysis method
CN111027328A (en) * 2019-11-08 2020-04-17 广州坚和网络科技有限公司 Method for judging emotion positive and negative and emotional color of comments through corpus training
CN111324745A (en) * 2020-02-18 2020-06-23 深圳市一面网络技术有限公司 Word stock generation method and device
CN112364170A (en) * 2021-01-13 2021-02-12 北京智慧星光信息技术有限公司 Data emotion analysis method and device, electronic equipment and medium
US20210200949A1 (en) * 2019-12-30 2021-07-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Pre-training method for sentiment analysis model, and electronic device
CN113158669A (en) * 2021-04-28 2021-07-23 河北冀联人力资源服务集团有限公司 Method and system for identifying positive and negative comments of employment platform
KR102609681B1 (en) * 2023-01-09 2023-12-05 트리톤 주식회사 Method for determining product planning reflecting user feedback and Apparatus thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system on basis of product features
CN104731770A (en) * 2015-03-23 2015-06-24 中国科学技术大学苏州研究院 Chinese microblog emotion analysis method based on rules and statistical model
US20160171386A1 (en) * 2014-12-15 2016-06-16 Xerox Corporation Category and term polarity mutual annotation for aspect-based sentiment analysis
CN105868185A (en) * 2016-05-16 2016-08-17 南京邮电大学 Part-of-speech-tagging-based dictionary construction method applied in shopping comment emotion analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林钦和等: "《基于情感计算的商品评论分析系统》", 《计算机应用与软件》 *
罗帆: "《基于意见挖掘的产品评论系统研究与实现》", 《中国优秀硕士学位论文全文数据库(电子期刊)》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598528B (en) * 2017-09-30 2023-05-23 北京国双科技有限公司 Advertisement information processing method and device
CN109598528A (en) * 2017-09-30 2019-04-09 北京国双科技有限公司 Advertisement information processing method and device
CN107861946A (en) * 2017-11-03 2018-03-30 北京奇艺世纪科技有限公司 A kind of fine-grained evaluation information method for digging and system
CN108959247A (en) * 2018-06-19 2018-12-07 深圳市元征科技股份有限公司 A kind of data processing method, server and computer-readable medium
CN108959247B (en) * 2018-06-19 2022-09-09 深圳市元征科技股份有限公司 Data processing method, server and computer readable medium
CN109299460A (en) * 2018-09-18 2019-02-01 北京三快在线科技有限公司 Analyze method, apparatus, electronic equipment and the storage medium of the evaluation data in shop
CN109299460B (en) * 2018-09-18 2022-07-12 北京三快在线科技有限公司 Method and device for analyzing evaluation data of shop, electronic device and storage medium
CN109684641A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of data extraction device, method, electronic equipment and storage medium
CN109684641B (en) * 2018-12-26 2023-04-07 广东工业大学 Data extraction device and method, electronic equipment and storage medium
CN109902229A (en) * 2019-02-01 2019-06-18 中森云链(成都)科技有限责任公司 A kind of interpretable recommended method based on comment
CN109902229B (en) * 2019-02-01 2019-12-24 中森云链(成都)科技有限责任公司 Comment-based interpretable recommendation method
CN110825876A (en) * 2019-11-07 2020-02-21 上海德拓信息技术股份有限公司 Movie comment viewpoint emotion tendency analysis method
CN111027328A (en) * 2019-11-08 2020-04-17 广州坚和网络科技有限公司 Method for judging emotion positive and negative and emotional color of comments through corpus training
US11537792B2 (en) * 2019-12-30 2022-12-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Pre-training method for sentiment analysis model, and electronic device
US20210200949A1 (en) * 2019-12-30 2021-07-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Pre-training method for sentiment analysis model, and electronic device
CN111324745A (en) * 2020-02-18 2020-06-23 深圳市一面网络技术有限公司 Word stock generation method and device
CN112364170A (en) * 2021-01-13 2021-02-12 北京智慧星光信息技术有限公司 Data emotion analysis method and device, electronic equipment and medium
CN113158669A (en) * 2021-04-28 2021-07-23 河北冀联人力资源服务集团有限公司 Method and system for identifying positive and negative comments of employment platform
CN113158669B (en) * 2021-04-28 2023-03-28 河北冀联人力资源服务集团有限公司 Method and system for identifying positive and negative comments of employment platform
KR102609681B1 (en) * 2023-01-09 2023-12-05 트리톤 주식회사 Method for determining product planning reflecting user feedback and Apparatus thereof

Also Published As

Publication number Publication date
CN106649519B (en) 2020-11-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (Granted publication date: 20201127; Termination date: 20211017)