CN103970801B - Microblogging advertisement blog article recognition methods and device - Google Patents
- Publication number
- CN103970801B (application CN201310046176.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention discloses a method and device for recognizing advertisement posts in microblogs. The method includes: creating a microblog filter using known advertisement posts and non-advertisement posts as samples; and performing advertisement recognition on a current microblog post based on the microblog filter and a Bayesian algorithm. Based on the Bayesian algorithm, and using known advertisement posts and non-advertisement posts as samples, the invention obtains an advertisement or non-advertisement microblog filter and uses it to estimate the probability that the current microblog post is an advertisement post. Advertisement posts in microblogs are thereby recognized effectively, and the valid-data recall rate of a search engine is improved. In addition, the microblog filter can be updated through continued training on new advertisement post and normal (non-advertisement) post samples, which makes recognition more effective for microblog text, a medium with strong real-time characteristics.
Description
Technical field
The present invention relates to the field of Internet technology, and more particularly to a method and device for recognizing advertisement posts in microblogs.
Background
On the Internet, recognizing advertisement posts in a microblog community is an important part of anti-advertising and anti-cheating work. At present, advertisement microblogs are mainly recognized as follows: advertisement posts are collected manually, a keyword table for advertisement recognition is generated from them, and the current post is then judged against the keywords in that table. Fig. 1 shows an example; the microblog in Fig. 1 is an advertisement microblog.
During recognition, suppose the keyword table contains the keywords "number, member, presell", and the recognition rule is: if a microblog contains these keywords, it is considered an advertisement. The microblog in Fig. 1 is then recognized as an advertisement microblog.
However, the existing method for recognizing advertisement microblogs has the following disadvantages:
1. The advertisement keyword table is difficult to maintain. Keywords must be identified and collected manually, which is inefficient, and it is hard to collect a comprehensive set of advertisement posts by hand. A comprehensive keyword table therefore cannot be generated; it can only be accumulated passively, so the recall of advertisement post recognition is insufficient. In addition, selecting keywords from manually discovered advertisement posts is itself difficult. Take the two posts in Fig. 2 and Fig. 3 as an example. The post in Fig. 2 is an advertisement; judging from this post alone, the word "Taobao" could be added to the advertisement keyword table. Judging from Fig. 3, however, the word "Taobao" should not be added.
2. Judging whether a post is an advertisement from the keyword table alone makes accuracy hard to control, because a word's occurrence depends heavily on its context, and the same word can differ greatly in function and meaning across contexts. (Unless long text fragments are themselves used as vocabulary entries, the accuracy of judging a post to be an advertisement from vocabulary is open to question.) For example, "cheap" is a common word in advertisement posts but may also appear in normal posts. Compare the two posts in Fig. 4 and Fig. 5: Fig. 4 is a normal post, and Fig. 5 is an advertisement post.
3. Microblogs are a highly time-sensitive community medium. Collecting vocabulary is slow to update and yields few updates, so cheating posts cannot be found in time.
Summary of the invention
The main purpose of the present invention is to provide a method and device for recognizing advertisement posts in microblogs, so as to recognize advertisement posts in microblogs effectively and improve the valid-data recall rate of a search engine.
To achieve the above purpose, the present invention proposes a method for recognizing advertisement posts in microblogs, comprising:
creating a microblog filter using known advertisement posts and non-advertisement posts as samples;
performing advertisement recognition on a current microblog post based on the microblog filter and a Bayesian algorithm.
The present invention also proposes a device for recognizing advertisement posts in microblogs, comprising:
a creation module, configured to create a microblog filter using known advertisement posts and non-advertisement posts as samples;
an identification module, configured to perform advertisement recognition on a current microblog post based on the microblog filter and a Bayesian algorithm.
The method and device proposed by the present invention are based on a Bayesian algorithm. Using known advertisement posts and non-advertisement posts as samples, they obtain an advertisement or non-advertisement microblog filter and use it to estimate the probability that the current microblog post is an advertisement post. Advertisement posts in microblogs are thereby recognized effectively, and the valid-data recall rate of a search engine is improved. In addition, the microblog filter can be updated through continued training on new advertisement post and normal (non-advertisement) post samples, which makes recognition more effective for microblog text, a medium with strong real-time characteristics.
Brief description of the drawings
Fig. 1 is a schematic diagram of a first existing microblog example;
Fig. 2 is a schematic diagram of a second existing microblog example;
Fig. 3 is a schematic diagram of a third existing microblog example;
Fig. 4 is a schematic diagram of a fourth existing microblog example;
Fig. 5 is a schematic diagram of a fifth existing microblog example;
Fig. 6 is a flow diagram of a first embodiment of the microblog advertisement post recognition method of the present invention;
Fig. 7 is a flow diagram of one way of creating a microblog filter, using known advertisement posts and non-advertisement posts as samples, in the first embodiment of the method;
Fig. 8 is a flow diagram of another way of creating a microblog filter, using known advertisement posts and non-advertisement posts as samples, in the first embodiment of the method;
Fig. 9 is a flow diagram of a second embodiment of the microblog advertisement post recognition method of the present invention;
Fig. 10 is a structural diagram of a first embodiment of the microblog advertisement post recognition device of the present invention;
Fig. 11 is a structural diagram of the creation module in the first embodiment of the device;
Fig. 12 is a structural diagram of the identification module in the first embodiment of the device;
Fig. 13 is a structural diagram of a second embodiment of the microblog advertisement post recognition device of the present invention.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the drawings.
Detailed description of the embodiments
The solution of the embodiments of the present invention is mainly as follows: based on a Bayesian algorithm, and using known advertisement posts and non-advertisement posts as samples, an advertisement or non-advertisement microblog filter is obtained, and the filter is used to estimate the probability that the current microblog post is an advertisement post, so that advertisement posts in microblogs are recognized effectively.
As shown in Fig. 6, a first embodiment of the present invention proposes a method for recognizing advertisement posts in microblogs, comprising:
Step S101: create a microblog filter using known advertisement posts and non-advertisement posts as samples.
The present invention recognizes microblog advertisement posts on the basis of Bayesian theory.
To determine whether a microblog is an advertisement microblog or a normal microblog (in this embodiment, a non-advertisement microblog), this embodiment first creates a microblog filter from known advertisement posts and non-advertisement posts, and then uses the created filter to perform advertisement recognition on the current microblog.
Microblog filters can be divided into advertisement microblog filters and non-advertisement microblog filters. An advertisement microblog filter outputs the probability that the current microblog is an advertisement microblog; a non-advertisement microblog filter outputs the probability that it is a non-advertisement microblog.
The filter creation process and the advertisement post recognition process correspond, respectively, to the offline part and the online part of the overall recognition system of this embodiment.
In the offline part, either an advertisement microblog filter or a non-advertisement microblog filter may be created, or both.
Step S102: perform advertisement recognition on the current microblog post based on the microblog filter and a Bayesian algorithm.
After the microblog filter has been created, the online part of this embodiment recognizes the current microblog post, that is, judges whether it is an advertisement post or a non-advertisement post.
Specifically, the current microblog post is first segmented into words and converted into a vector. The vector is then input into the microblog filter created in step S101, and the probability that the current post is an advertisement post is calculated using the Bayesian algorithm together with the total probability formula.
An advertisement microblog filter directly outputs the probability that the current microblog is an advertisement microblog. A non-advertisement microblog filter outputs the probability that the current microblog is a non-advertisement microblog, and this result is then converted into the probability that the current microblog is an advertisement microblog.
The resulting probability is then compared with a predetermined threshold; if it exceeds the threshold, the microblog post is determined to be an advertisement post.
The predetermined threshold can be obtained by computing statistics over the known advertisement post set and non-advertisement post set.
More specifically, as shown in Fig. 7, taking the creation of an advertisement post filter as an example, step S101 of creating a microblog filter using known advertisement posts and non-advertisement posts as samples may include:
Step S1010: collect a number of known advertisement posts and non-advertisement posts to form an advertisement post set and a non-advertisement post set, respectively, as samples;
Step S1011: segment each post in the advertisement post set and the non-advertisement post set into words to obtain the word sequence of each post;
Step S1012: calculate, for each word in the advertisement post set, the probability that it occurs in the advertisement post set; and calculate, for each word in the non-advertisement post set, the probability that it occurs in the non-advertisement post set;
Step S1013: from the calculated probabilities, build correspondence hash tables that map each word in the advertisement post set and the non-advertisement post set to its probability of occurrence in the respective set;
Step S1014: based on the correspondence hash tables, use the Bayesian algorithm to build a mapping hash table that maps each word in the advertisement post set to the probability that a post is an advertisement post given that the word occurs, thereby obtaining the advertisement post filter.
As shown in Fig. 8, taking the creation of a non-advertisement post filter as an example, step S101 is similar to the steps shown in Fig. 7, except that step S1014 of Fig. 7 is replaced by step S1015:
Step S1015: based on the correspondence hash tables, use the Bayesian algorithm to build a mapping hash table that maps each word in the non-advertisement post set to the probability that a post is a non-advertisement post given that the word occurs, thereby obtaining the non-advertisement post filter.
The specific implementation of this embodiment is described in detail below with an example.
Offline part of this embodiment (taking the creation of an advertisement post filter as an example):
1. Accumulate a large normal post set (non-advertisement post set) and a large advertisement post set, denoted SET_GOOD and SET_BAD respectively.
2. For both sets, the word sequence obtained by segmenting any post D can be represented as a vector, D = (W1, W2, ..., Wn), where n is the number of words after segmentation. SET_GOOD and SET_BAD can therefore each be regarded as a collection of individual words. Any word Wi in SET_GOOD is written Wi ∈ SET_GOOD, and the probability that a word chosen at random from SET_GOOD is Wi (that is, the probability that Wi occurs in SET_GOOD) is written Pi_good. Pi_good is given by either of two formulas (not reproduced in this text) in terms of TF(Wi) and N, where TF(Wi) is the word frequency of Wi and N is the number of distinct words in SET_GOOD. Pi_bad (the probability that Wi occurs in SET_BAD) is calculated in the same way.
3. From the results of step 2, generate correspondence hash tables for SET_GOOD and SET_BAD (the tables are not reproduced in this text): GoodHashtable maps each word Wi in SET_GOOD to its probability of occurrence Pi_good in SET_GOOD, and BadHashtable maps each word Wi in SET_BAD to its probability of occurrence Pi_bad in SET_BAD.
4. For any post, let the vector after segmentation be D = (W1, W2, ..., Wn), where n is the number of words after segmentation, and let P(i_bad | Wi) be the probability that the post is an advertisement given that word Wi occurs in it. Using the correspondence hash tables GoodHashtable and BadHashtable, the value of P(i_bad | Wi) can be calculated according to the Bayesian algorithm. Then, for each word in SET_BAD, that is, Wi ∈ SET_BAD, a mapping hash table is built and stored (the table is not reproduced in this text): BadProbabilityHashtable, which is the advertisement post filter of this embodiment, maps each word Wi in SET_BAD to the probability that a post D is an advertisement when Wi appears in D.
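Steps 2 to 4 of the offline part can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: posts are assumed to be already segmented into word lists; a word's probability is taken as its relative frequency (its count divided by the total word count of the set), which is only one plausible reading of the TF-based formula; and advertisement and normal posts are assumed equally likely a priori unless `prior_bad` is changed. The function names `word_probabilities` and `build_ad_filter` are hypothetical.

```python
from collections import Counter


def word_probabilities(posts):
    """Map each word to its relative frequency over a set of segmented posts."""
    counts = Counter(word for post in posts for word in post)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def build_ad_filter(bad_table, good_table, prior_bad=0.5):
    """Map each word in SET_BAD to P(post is an ad | word occurs) via Bayes' rule.

    prior_bad is an assumed prior probability of an ad post (0 < prior_bad < 1).
    """
    ad_filter = {}
    for word, p_bad in bad_table.items():
        p_good = good_table.get(word, 0.0)  # 0.0 if never seen in normal posts
        numerator = p_bad * prior_bad
        ad_filter[word] = numerator / (numerator + p_good * (1.0 - prior_bad))
    return ad_filter


# Toy data; real sets would be large and produced by a word segmenter.
set_bad = [["cheap", "sale", "buy"], ["sale", "now"]]
set_good = [["cheap", "lunch", "today"], ["rain", "today"]]

bad_hashtable = word_probabilities(set_bad)      # plays the role of BadHashtable
good_hashtable = word_probabilities(set_good)    # plays the role of GoodHashtable
bad_probability_hashtable = build_ad_filter(bad_hashtable, good_hashtable)
```

In this toy data "cheap" is equally frequent in both sets, so it carries no evidence either way, while "sale" occurs only in advertisement posts and receives the maximum score.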
Online part of this embodiment:
For any post, let the vector after segmentation be D = (W1, W2, ..., Wn), where n is the number of words after segmentation, and let the probability that the post is an advertisement be P(bad | W1, W2, ..., Wn). Using the BadProbabilityHashtable (advertisement post filter) obtained in the offline part, this probability can be calculated according to the Bayesian algorithm and the total probability formula. When P(bad | W1, W2, ..., Wn) exceeds a threshold θ, the post is considered an advertisement post.
To set the threshold θ, the probability that each post is an advertisement post can be calculated for the accumulated advertisement post set SET_BAD and normal post set SET_GOOD in the same way as in the online part, and θ can be obtained by observing the resulting statistics. In theory, when P(bad | W1, W2, ..., Wn) is greater than 0.5, the post tends to be an advertisement post.
Here 0 ≤ θ ≤ 1. The larger θ is set, the higher the precision of the judgment and the lower the recall of advertisement posts; conversely, the smaller θ is set, the lower the precision and the higher the recall. θ should therefore be set according to the actual situation so as to balance precision and recall.
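The trade-off between precision and recall when choosing θ can be illustrated with a short Python sketch. It assumes that each labeled post already has a score P(bad | W1, W2, ..., Wn); the scores, labels, and the function name `precision_recall` are hypothetical illustration values, not data from the source.

```python
def precision_recall(scores, labels, theta):
    """Return (precision, recall) when posts with score > theta are called ads.

    labels[i] is True when post i is a known advertisement post.
    """
    predicted = [score > theta for score in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical scores for five labeled posts (three ads, two normal posts).
scores = [0.95, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]

low_theta = precision_recall(scores, labels, 0.3)   # more posts flagged
high_theta = precision_recall(scores, labels, 0.7)  # fewer posts flagged
```

On this toy data the lower threshold catches every advertisement at the cost of one false alarm, while the higher threshold has no false alarms but misses one advertisement, which matches the behavior described for θ.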
In addition, if a non-advertisement post filter is created in the offline part, the specific implementation is as follows:
For any post, let the vector after segmentation be D = (W1, W2, ..., Wn), where n is the number of words after segmentation, and let P(i_good | Wi) be the probability that the post is a normal post given that word Wi occurs in it. Using the GoodHashtable and BadHashtable obtained in step 3 of the offline part, the value of P(i_good | Wi) can be calculated according to the Bayesian algorithm. Then, for each word in SET_GOOD, that is, Wi ∈ SET_GOOD, a mapping hash table is built and stored (the table is not reproduced in this text): GoodProbabilityHashtable, which is the non-advertisement post filter of this embodiment, maps each word Wi in SET_GOOD to the probability that a post D is a normal post when Wi appears in D.
In addition, this embodiment may adopt the following optimization strategies:
In the offline statistics and online prediction described above, when posts are segmented, unrepresentative words (such as stop words) are removed; or only representative words (such as nouns and verbs) are kept; or the two approaches are combined.
Furthermore, the following consideration can be added to the online part. For any post, let the vector after segmentation be D = (W1, W2, ..., Wn), where n is the number of words after segmentation. When calculating the probability P(bad | W1, W2, ..., Wn) that a post is an advertisement post, if some word Wi appears in neither BadProbabilityHashtable nor GoodProbabilityHashtable, the current filter has no ability to judge that word. Its effect on the result can therefore be ignored, which reduces misjudgment.
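The online calculation, including the skipping of unknown words, might look like the following Python sketch. The source names the Bayesian algorithm and the total probability formula but does not give the combining formula, so the sketch assumes the common naive-Bayes combination with conditionally independent words and equal priors, P = Πpi / (Πpi + Π(1 - pi)), where pi = P(i_bad | Wi). The clamping bounds 0.01 and 0.99, the fallback value 0.5, and the function name `ad_probability` are also assumptions.

```python
import math


def ad_probability(tokens, ad_filter, default=0.5):
    """Combine per-word P(ad | word) values into an estimate of P(ad | post)."""
    # Words missing from the filter carry no evidence and are skipped.
    known = [ad_filter[word] for word in tokens if word in ad_filter]
    if not known:
        return default  # no evidence at all: fall back to an assumed neutral score
    # Clamp so that a single 0.0 or 1.0 entry cannot decide the result alone.
    clamped = [min(max(p, 0.01), 0.99) for p in known]
    # Work in log space to avoid underflow on long posts.
    diff = sum(math.log(p) - math.log(1.0 - p) for p in clamped)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Hypothetical filter contents for illustration.
ad_filter = {"sale": 0.9, "cheap": 0.8, "weather": 0.1}
score = ad_probability(["sale", "cheap", "unseen_word"], ad_filter)
is_ad = score > 0.5  # 0.5 stands in for the threshold θ
```

Here "unseen_word" does not change the score, so the result equals the combination of the two known words alone.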
Further, when segmenting posts, the segmentation sequence can be changed to N-grams. Some words in microblogs occur frequently in both advertisement posts and normal posts and may be split into single characters during segmentation, such as "robbing" and "only". On their own, these have little power to distinguish advertisement posts from normal posts, but combined with their neighbouring words they distinguish well: a post containing a combination such as "crazy to rob" or "only selling" is very likely to be an advertisement. Therefore, in both the offline part and the online part, the single words obtained from segmentation are combined with their context into 2-grams or longer N-grams before the subsequent calculation. This can reduce the size of the advertisement post filter and improve the accuracy of discrimination.
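The context combination described above can be sketched as a sliding window over the segmented words. The function name `ngrams` and the `sep` parameter are hypothetical; for Chinese text the pieces would normally be joined without a separator, and an underscore is used here only to keep the English placeholder tokens readable.

```python
def ngrams(tokens, n=2, sep=""):
    """Join every run of n adjacent tokens into one combined token."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Placeholder tokens standing in for single characters produced by segmentation.
tokens = ["crazy", "grab", "only", "sell"]
bigrams = ngrams(tokens, n=2, sep="_")
trigrams = ngrams(tokens, n=3, sep="_")
```

The combined tokens would then replace the single words as keys in the hash tables during both the offline and the online part; a post shorter than n simply yields no N-grams.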
In addition, when the microblog filter is used to recognize the current microblog post, the result can be combined with certain rules.
Specifically, although the above scheme achieves high precision (90%+) and recall (90%+) in recognizing advertisement posts, some content-rich advertisements can be let through in order to reduce the harm caused by misjudgment, that is, the possible harm of judging normal text to be an advertisement. Certain rules can be applied for this purpose. For example, a microblog with video can be considered a non-advertisement post; and a post that the advertisement post filter identifies as an advertisement but that contains no obvious advertising words can be considered a weak advertisement, so that a post of this kind can still be recalled normally.
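The two example rules can be layered on top of the filter score as in the following Python sketch. The label names, the `obvious_ad_words` set, and the function name `apply_rules` are hypothetical; the source gives the rules only as examples and does not specify how they are encoded.

```python
def apply_rules(ad_probability, theta, has_video, tokens, obvious_ad_words):
    """Post-process the filter score with the two example rules."""
    if ad_probability <= theta:
        return "normal"    # the filter itself does not flag the post
    if has_video:
        return "normal"    # rule 1: posts with video are treated as non-ads
    if not any(token in obvious_ad_words for token in tokens):
        return "weak_ad"   # rule 2: flagged, but no obvious ad words present
    return "ad"


obvious_ad_words = {"sale", "discount"}  # assumed word list for illustration
label_ad = apply_rules(0.9, 0.5, False, ["big", "sale"], obvious_ad_words)
label_video = apply_rules(0.9, 0.5, True, ["big", "sale"], obvious_ad_words)
label_weak = apply_rules(0.9, 0.5, False, ["new", "phone"], obvious_ad_words)
```

A search engine could then recall "normal" and "weak_ad" posts as usual and apply its advertisement policy only to posts labeled "ad".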
In this way, this embodiment can effectively recognize whether a microblog post is an advertisement post. During full-text microblog search, advertisement posts can be recalled according to a chosen policy (not recalled at all, or recalled selectively), which improves the valid-data recall rate of the search engine and the user's search experience.
Compared with the prior art, this embodiment has the following advantages:
1. Microblog advertisement posts are recognized on the basis of Bayesian theory: a microblog filter is obtained using known advertisement posts and non-advertisement posts as samples and is used to estimate the probability that a newly arrived post is an advertisement post. Unlike the prior-art recognition technique based on a cheating vocabulary, the present invention relies on statistics over a large amount of data. By learning from a given data set, it captures the difference between advertisement posts and non-advertisement posts. This difference is expressed as probabilities and can be applied automatically to later detection, so maintenance is automated and recall is greatly improved.
2. The whole content of a post is analysed, not only particular keywords in it. For example, a post containing words such as "cheap" or "selling" is not necessarily an advertisement post, and prior-art keyword filtering clearly struggles to achieve a good result in such cases. The method of the present invention considers both the probability that these words occur in advertisement posts and the probability that they occur in normal posts. By weighing these factors together it can strike a balance between "good" and "bad", and its accuracy is clearly better than that of all-or-nothing (0 or 1) static filtering.
3. The microblog filter created in this embodiment is hard to deceive. A skilled sender of advertisement posts may reduce advertisement vocabulary (such as "cheap" or "price") and add benign vocabulary (such as news terms or trending words) to get past ordinary content checks. Because the advertisement post filter is personalized, however, bypassing it would require studying the writing preferences of every microblogger, which is hardly feasible. In addition, by learning from a "special" training set of posts, the Bayesian recognition method can obtain a "special" filter, so a particular class of advertisement posts can be recognized effectively.
As shown in Fig. 9, a second embodiment of the present invention proposes a method for recognizing advertisement posts in microblogs which, on the basis of the first embodiment, further comprises after step S102:
Step S103: re-learn from the recognized advertisement posts and non-advertisement posts, and update the microblog filter.
This embodiment differs from the first in that it can also re-learn from the recognized advertisement posts and non-advertisement posts and periodically update the microblog filter.
Specifically, at regular intervals the offline part is repeated on the advertisement posts and normal posts recognized by the online part, training a new BadProbabilityHashtable (advertisement post filter) and GoodProbabilityHashtable (non-advertisement post filter), which are then deployed to the online part.
The Bayesian recognition method of this embodiment is thus adaptive: by learning new advertisement post and normal post samples and continually retraining, the microblog filter keeps updating itself. When new posts arrive, the newly obtained advertisement post filter or non-advertisement post filter can be used to counter the latest advertisement posts, which makes advertisement recognition more effective for microblogs, a medium with strong real-time characteristics.
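One way to organize the periodic update is to keep running word counts and rebuild the filter whenever a new batch of recognized posts arrives, as in the Python sketch below. Keeping cumulative counts instead of re-reading all posts is a design choice of this sketch, not something the source specifies; the class name `FilterTrainer`, the relative-frequency estimate, and the equal priors are likewise assumptions.

```python
from collections import Counter


class FilterTrainer:
    """Accumulates word counts and rebuilds the ad filter on demand."""

    def __init__(self):
        self.bad_counts = Counter()   # word counts over advertisement posts
        self.good_counts = Counter()  # word counts over normal posts

    def add_posts(self, ad_posts, normal_posts):
        """Fold a new batch of segmented, recognized posts into the counts."""
        for post in ad_posts:
            self.bad_counts.update(post)
        for post in normal_posts:
            self.good_counts.update(post)

    def rebuild(self):
        """Return a fresh word -> P(ad | word) table, assuming equal priors."""
        bad_total = sum(self.bad_counts.values())
        good_total = sum(self.good_counts.values())
        ad_filter = {}
        for word, count in self.bad_counts.items():
            p_bad = count / bad_total
            p_good = self.good_counts[word] / good_total if good_total else 0.0
            ad_filter[word] = p_bad / (p_bad + p_good)
        return ad_filter


trainer = FilterTrainer()
trainer.add_posts([["sale", "cheap"]], [["cheap", "rain"]])
first_filter = trainer.rebuild()
# Later period: fold in newly recognized posts and swap in the new filter.
trainer.add_posts([["cheap", "cheap", "deal"]], [])
second_filter = trainer.rebuild()
```

After the second batch the filter knows the new word "deal" and scores "cheap" higher than before, which is the self-updating behavior the embodiment describes.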
As shown in Figure 10, first embodiment of the invention proposes a kind of microblogging advertisement blog article identification device, comprising: creation module
201 and identification module 202, in which:
Creation module 201, for creating microblogging filter using known advertisement blog article and non-advertisement blog article as sample;
Identification module 202, for carrying out advertisement to current microblogging blog article based on the microblogging filter and bayesian algorithm
Identification.
The identification of microblogging advertisement blog article is realized the present invention is based on bayesian theory.
In order to identify that microblogging is advertisement microblogging or normal microblogging (the present embodiment refers to non-advertisement microblogging), the present embodiment is first
Microblogging filter is created by known advertisement blog article and non-advertisement blog article by creation module 201, is then led to by identification module 202
The microblogging filter for crossing creation carries out advertisement identification to current microblogging.
Wherein, microblogging filter can be divided into advertisement microblogging filter and non-advertisement microblogging filter, the filtering of advertisement microblogging
Device output result is the probability that current microblogging is advertisement microblogging, and non-advertisement microblogging filter output result is that current microblogging is non-wide
Accuse the probability of microblogging.
It is rich that above-mentioned microblogging filter creation process and the identification process of advertisement blog article respectively correspond the present embodiment microblogging advertisement
The offline segmentation scheme and online segmentation scheme of text identification total system.
In offline segmentation scheme, creation advertisement microblogging filter can choose, also can choose the non-advertisement microblogging of creation
Filter, the two select one, or comprehensive two kinds of selections to implement.
After the microblogging filter has been created, the online part of this embodiment identifies the current microblog blog article, that is, determines whether it is an advertisement blog article or a non-advertisement blog article.
Specifically, the current microblog blog article is first segmented into words and converted into a vector. The resulting vector is then input into the created microblogging filter, and the probability that the current microblog blog article is an advertisement blog article is calculated using the Bayesian algorithm together with the total probability formula.
The advertisement microblogging filter directly outputs the probability that the current microblog is an advertisement microblog. The non-advertisement microblogging filter outputs the probability that the current microblog is a non-advertisement microblog, and this result is then converted into the probability that the current microblog is an advertisement microblog.
The probability so obtained is then compared with a predetermined threshold; if it exceeds the predetermined threshold, the microblog blog article is determined to be an advertisement blog article.
The predetermined threshold may be obtained by statistical calculation over a known advertisement blog article set and a known non-advertisement blog article set.
More specifically, as shown in Figure 11, taking the creation of an advertisement blog article filter as an example, the creation module 201 may include a collection unit 2011, a word segmentation unit 2012, a first calculation unit 2013, a first establishing unit 2014 and a second establishing unit 2015, wherein:
The collection unit 2011 is configured to collect a number of known advertisement blog articles and non-advertisement blog articles, which form an advertisement blog article set and a non-advertisement blog article set, respectively, to serve as samples;
The word segmentation unit 2012 is configured to segment each blog article in the advertisement blog article set and the non-advertisement blog article set into words, obtaining the word sequence of each blog article;
The first calculation unit 2013 is configured to calculate the probability with which each word in the advertisement blog article set occurs in that set, and the probability with which each word in the non-advertisement blog article set occurs in that set;
The first establishing unit 2014 is configured to establish, from the calculated probabilities, correspondence hash tables that map each word in the advertisement blog article set or the non-advertisement blog article set to the probability with which the word occurs in that set;
The second establishing unit 2015 is configured to establish, based on the correspondence hash tables and according to the Bayesian algorithm, a mapping hash table that maps each word in the advertisement blog article set to the probability that a blog article in which the word occurs is an advertisement blog article, thereby obtaining the advertisement blog article filter.
When a non-advertisement blog article filter is created, the second establishing unit 2015 is further configured to:
establish, based on the correspondence hash tables and according to the Bayesian algorithm, a mapping hash table that maps each word in the non-advertisement blog article set to the probability that a blog article in which the word occurs is a non-advertisement blog article, thereby obtaining the non-advertisement blog article filter.
As shown in Figure 12, the identification module 202 may include a segmentation and conversion unit 2021, a second calculation unit 2022 and a judging unit 2023, wherein:
The segmentation and conversion unit 2021 is configured to segment the current microblog blog article into words and convert it into a vector;
The second calculation unit 2022 is configured to input the resulting vector into the microblogging filter and to calculate, using the Bayesian algorithm together with the total probability formula, the probability that the current microblog blog article is an advertisement blog article;
The judging unit 2023 is configured to determine that the microblog blog article is an advertisement blog article if the probability that the current microblog blog article is an advertisement blog article exceeds the predetermined threshold.
The specific implementation of this embodiment is described in detail below by way of example.
For the offline part of this embodiment (taking the creation of an advertisement blog article filter as an example):
1. Accumulate a massive normal blog article set (non-advertisement blog article set) and a massive advertisement blog article set, denoted SET_GOOD and SET_BAD, respectively.
2. For the normal blog article set and the advertisement blog article set, the word sequence obtained by segmenting any blog article D can be represented as a vector, D = (W1, W2, ..., Wn), where n is the number of words after segmentation. SET_GOOD and SET_BAD can therefore each be regarded as a collection of words. Any word Wi in SET_GOOD is written Wi ∈ SET_GOOD, and the probability that a word picked at random from SET_GOOD is Wi (that is, the probability with which Wi occurs in SET_GOOD) is written Pi_good. Pi_good is obtained from the word frequencies, presumably as Pi_good = TF(Wi) / (TF(W1) + TF(W2) + ... + TF(WN)) (the text gives two alternative formulas at this point, neither of which survives), where TF(Wi) is the word frequency of Wi and N is the number of distinct words in SET_GOOD. Pi_bad (that is, the probability with which Wi occurs in SET_BAD) can be calculated in the same way.
3. Based on the results of step 2, the following correspondence hash tables are generated for SET_GOOD and SET_BAD, respectively:
GoodHashtable: Wi → Pi_good
BadHashtable: Wi → Pi_bad
Here, GoodHashtable records, for any word Wi in SET_GOOD, the probability Pi_good with which the word occurs in SET_GOOD, and BadHashtable records, for any word Wi in SET_BAD, the probability Pi_bad with which the word occurs in SET_BAD.
4. For any blog article, the vector after segmentation is D = (W1, W2, ..., Wn), where n is the number of words after segmentation. Let P(i_bad | Wi) be the probability that the blog article is an advertisement when the word Wi occurs in it. Using the correspondence hash tables GoodHashtable and BadHashtable above, the value of P(i_bad | Wi) can be calculated according to the Bayesian algorithm. Then, for each word in SET_BAD, that is, each Wi ∈ SET_BAD, the following mapping hash table is established and stored:
BadProbabilityHashtable: Wi → P(i_bad | Wi)
Here, BadProbabilityHashtable is the advertisement blog article filter referred to in this embodiment. For any word Wi in SET_BAD, it gives the probability that a blog article D is an advertisement when Wi occurs in D.
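The offline steps 1 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names `word_probabilities` and `build_bad_filter` are hypothetical, blog articles are assumed to be already segmented into word lists, the word probability is taken as term frequency divided by the total term count, and the Bayes step assumes equal prior probabilities for advertisement and normal blog articles, which the text does not specify.

```python
from collections import Counter

def word_probabilities(posts):
    """Map each word W_i to TF(W_i) divided by the sum of TF over all words."""
    tf = Counter(word for post in posts for word in post)
    total = sum(tf.values())
    return {word: count / total for word, count in tf.items()}

def build_bad_filter(good_table, bad_table):
    """Map each word in SET_BAD to P(i_bad | W_i).

    Assumption (not stated in the text): equal priors, so that
    P(i_bad | W_i) = P_i_bad / (P_i_bad + P_i_good).
    """
    return {
        word: p_bad / (p_bad + good_table.get(word, 0.0))
        for word, p_bad in bad_table.items()
    }

# Toy, already-segmented samples (hypothetical data).
SET_GOOD = [["weather", "nice", "today"], ["cheap", "flight", "delayed", "today"]]
SET_BAD = [["cheap", "sale", "buy"], ["sale", "cheap", "discount"]]

good_hashtable = word_probabilities(SET_GOOD)  # plays the role of GoodHashtable
bad_hashtable = word_probabilities(SET_BAD)    # plays the role of BadHashtable
bad_probability_hashtable = build_bad_filter(good_hashtable, bad_hashtable)
```

In this toy data a word that never occurs in SET_GOOD receives the value 1.0; a practical system would probably smooth such extreme values, which is a design choice outside the text.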
For the online part of this embodiment:
For any blog article, the vector after segmentation is D = (W1, W2, ..., Wn), where n is the number of words after segmentation. Let the probability that the blog article is an advertisement be P(bad | W1, W2, ..., Wn). Using the BadProbabilityHashtable (advertisement blog article filter) obtained in the offline part, the probability P(bad | W1, W2, ..., Wn) that the blog article is an advertisement blog article can be calculated according to the Bayesian algorithm and the total probability formula. When P(bad | W1, W2, ..., Wn) exceeds a threshold θ, the blog article is considered to be an advertisement blog article.
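The online calculation can be illustrated with the sketch below. The text does not give the combining formula, so the sketch assumes the common naive Bayes combination with conditionally independent words and equal priors, P = prod(p_i) / (prod(p_i) + prod(1 - p_i)); the function names `ad_probability` and `is_advertisement` and the sample filter values are hypothetical.

```python
import math

def ad_probability(words, bad_filter):
    """Combine per-word P(i_bad | W_i) into P(bad | W_1, ..., W_n).

    Assumption: words are conditionally independent and priors are equal.
    Words missing from the filter are skipped.
    """
    probs = [bad_filter[w] for w in words if w in bad_filter]
    if not probs:
        return 0.5  # no evidence either way
    p_ad = math.prod(probs)
    p_normal = math.prod(1.0 - p for p in probs)
    if p_ad + p_normal == 0.0:
        return 0.5  # contradictory extreme evidence
    return p_ad / (p_ad + p_normal)

def is_advertisement(words, bad_filter, theta=0.5):
    """Apply the threshold test: advertisement if the probability exceeds theta."""
    return ad_probability(words, bad_filter) > theta

# Hypothetical filter values for illustration only.
bad_filter = {"cheap": 0.7, "sale": 0.9, "today": 0.2}
p = ad_probability(["cheap", "sale", "today"], bad_filter)
```

With these values the products are 0.126 and 0.024, so `p` is 0.84 and the blog article is flagged for any θ below that value.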
As for setting the threshold θ: for the accumulated advertisement blog article set SET_BAD and normal blog article set SET_GOOD, a procedure similar to the online part can be used to calculate the probability that each blog article is an advertisement blog article, and the threshold θ can then be obtained by observing the statistics. In theory, when the probability P(bad | W1, W2, ..., Wn) that a blog article is an advertisement blog article is greater than 0.5, the blog article tends to be an advertisement blog article.
Here 0 ≤ θ ≤ 1. The larger θ is set, the higher the precision of the judgment and the lower the recall of advertisement blog articles; conversely, the smaller θ is set, the lower the precision and the higher the recall. θ should therefore be set according to the actual situation so as to balance precision and recall.
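The trade-off between precision and recall when choosing θ can be observed with a small sweep over labeled scores. The sketch below is illustrative only: `precision_recall` is a hypothetical helper, and the score lists stand in for the probabilities that would be computed for the blog articles of SET_BAD and SET_GOOD.

```python
def precision_recall(ad_scores, normal_scores, theta):
    """Precision and recall of the rule 'score > theta means advertisement'."""
    true_pos = sum(s > theta for s in ad_scores)
    false_pos = sum(s > theta for s in normal_scores)
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 1.0
    recall = true_pos / len(ad_scores) if ad_scores else 0.0
    return precision, recall

# Hypothetical advertisement probabilities for labeled samples.
ad_scores = [0.95, 0.80, 0.60, 0.40]      # blog articles from SET_BAD
normal_scores = [0.70, 0.30, 0.10, 0.05]  # blog articles from SET_GOOD

for theta in (0.3, 0.5, 0.75):
    precision, recall = precision_recall(ad_scores, normal_scores, theta)
    print(f"theta={theta:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, raising θ from 0.3 to 0.75 lowers recall while precision ends higher, which matches the direction described above.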
In addition, if a non-advertisement blog article filter is created in the offline part, the specific implementation is as follows:
For any blog article, the vector after segmentation is D = (W1, W2, ..., Wn), where n is the number of words after segmentation. Let P(i_good | Wi) be the probability that the blog article is a normal blog article when the word Wi occurs in it. Using the GoodHashtable and BadHashtable obtained in step 3 of the offline part, the value of P(i_good | Wi) can be calculated according to the Bayesian algorithm. Then, for each word in SET_GOOD, that is, each Wi ∈ SET_GOOD, the following mapping hash table is established and stored:
GoodProbabilityHashtable: Wi → P(i_good | Wi)
Here, GoodProbabilityHashtable is the non-advertisement blog article filter referred to in this embodiment. For any word Wi in SET_GOOD, it gives the probability that a blog article D is a normal blog article when Wi occurs in D.
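A non-advertisement blog article filter can be sketched symmetrically. As before, this is only an illustration: equal priors are assumed, `build_good_filter` and `to_ad_probability` are hypothetical names, and the table values are made up. The second function shows the conversion of a "normal" probability into an advertisement probability as its complement.

```python
def build_good_filter(good_table, bad_table):
    """Map each word in SET_GOOD to P(i_good | W_i), assuming equal priors."""
    return {
        word: p_good / (p_good + bad_table.get(word, 0.0))
        for word, p_good in good_table.items()
    }

def to_ad_probability(p_normal):
    """Convert the probability of a normal blog article into that of an advertisement."""
    return 1.0 - p_normal

# Hypothetical correspondence tables for illustration only.
good_hashtable = {"today": 0.4, "cheap": 0.1, "weather": 0.5}
bad_hashtable = {"cheap": 0.3, "sale": 0.7}

good_probability_hashtable = build_good_filter(good_hashtable, bad_hashtable)
```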
In addition, this embodiment may adopt the following optimization strategies:
In the offline statistics and the online prediction described above, when the blog articles are segmented, unrepresentative words (such as stop words) are removed; or representative words (such as nouns and verbs) are selected; or both approaches are combined.
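Word selection of this kind can be sketched as a filter over tagged words. The stop-word list, the part-of-speech tags ("n" for noun, "v" for verb) and the function name `select_words` are all assumptions made for illustration; the text does not name a segmenter or a tag set.

```python
STOP_WORDS = {"the", "a", "of", "is"}  # hypothetical stop-word list
KEPT_TAGS = {"n", "v"}                 # hypothetical tags: noun, verb

def select_words(tagged_words, stop_words=STOP_WORDS, kept_tags=KEPT_TAGS):
    """Keep words that are not stop words and whose tag is in kept_tags."""
    return [
        word
        for word, tag in tagged_words
        if word not in stop_words and tag in kept_tags
    ]

# Hypothetical segmenter output: (word, part-of-speech tag) pairs.
tagged = [("the", "u"), ("shop", "n"), ("is", "v"), ("selling", "v"), ("cheaply", "d")]
selected = select_words(tagged)
```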
In addition, the following consideration may be added to the online part. For any blog article, the vector after segmentation is D = (W1, W2, ..., Wn), where n is the number of words after segmentation. When the probability P(bad | W1, W2, ..., Wn) that a blog article is an advertisement blog article is calculated, if some word Wi appears neither in BadProbabilityHashtable nor in GoodProbabilityHashtable, the current filter has no ability to recognize that word; the contribution of this word to the result can therefore be ignored so as to reduce misjudgment.
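One way to realize this rule is sketched below. The lookup order is an assumption: a word found only in the non-advertisement filter is given the complement of its "normal" probability, and a word found in neither filter is dropped. The names `word_ad_probability` and `known_probabilities` and the sample values are hypothetical.

```python
def word_ad_probability(word, bad_filter, good_filter):
    """Return P(bad | word), or None when neither filter knows the word."""
    if word in bad_filter:
        return bad_filter[word]
    if word in good_filter:
        # Assumption: use the complement of P(good | word).
        return 1.0 - good_filter[word]
    return None  # unknown word: contributes nothing to the result

def known_probabilities(words, bad_filter, good_filter):
    """Collect per-word probabilities, ignoring words unknown to both filters."""
    probs = (word_ad_probability(w, bad_filter, good_filter) for w in words)
    return [p for p in probs if p is not None]

bad_filter = {"sale": 0.9}      # hypothetical BadProbabilityHashtable
good_filter = {"weather": 0.8}  # hypothetical GoodProbabilityHashtable
probs = known_probabilities(["sale", "weather", "zzz"], bad_filter, good_filter)
```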
Further, when blog articles are segmented, the segmentation sequence may also be changed to N-grams. Some words in microblogs occur frequently in both advertisement blog articles and normal blog articles and may be split into single words during segmentation, such as "grab" and "only". Taken alone, such words do not discriminate well between advertisement and normal blog articles, but they discriminate well once combined with their context words, as in "crazy grab" and "only selling": a blog article containing these combinations is very likely to be an advertisement. Therefore, in both the offline part and the online part, the single words obtained by segmentation are combined with their context into 2-grams or longer N-grams before the subsequent calculation, which can reduce the size of the advertisement blog article filter and improve the accuracy of discrimination.
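The context combination can be sketched as a sliding window over the segmented words. The function name `ngrams` and the separator are hypothetical; for Chinese text the separator would presumably be empty. Whether the single words are kept alongside the combinations is not stated in the text, so this sketch returns the combinations only.

```python
def ngrams(words, n=2, sep=" "):
    """Join every run of n adjacent words into one combined token."""
    return [sep.join(words[i:i + n]) for i in range(len(words) - n + 1)]

bigrams = ngrams(["only", "selling", "today"])
trigrams = ngrams(["only", "selling", "today"], n=3)
```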
In addition, when the current microblog blog article is identified using the microblogging filter, certain rules may also be applied in the judgment.
Specifically, although the above scheme of this embodiment achieves high precision (90%+) and high recall (90%+) in identifying advertisement blog articles, some content-rich advertisements may be let through in order to reduce the harm caused by misjudgment, which mitigates to some extent the harm of misjudging normal blog articles as advertisements. For example, certain rules may be combined: a microblog with video may be regarded as a non-advertisement blog article; and a blog article that the advertisement blog article filter identifies as an advertisement but that contains no obvious advertising words may be regarded as a weak advertisement, so that this blog article of advertising nature can still be recalled normally.
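The two example rules can be layered on top of the filter output as in the sketch below. The function name `final_label`, the label strings and the list of obvious advertising words are hypothetical; only the rules themselves (video implies non-advertisement, no obvious advertising words implies weak advertisement) come from the text.

```python
def final_label(p_ad, has_video, words, theta=0.5,
                ad_words=frozenset({"cheap", "price"})):
    """Combine the filter probability with the two example rules."""
    if has_video:
        return "normal"  # rule: a microblog with video is treated as non-advertisement
    if p_ad <= theta:
        return "normal"  # the filter itself does not flag the blog article
    if not any(word in ad_words for word in words):
        return "weak_advertisement"  # flagged, but no obvious advertising words
    return "advertisement"

label = final_label(0.9, False, ["new", "shoes"])
```

A search engine could then recall blog articles labeled `weak_advertisement` normally while withholding those labeled `advertisement`.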
Through the above scheme, this embodiment can effectively identify whether a microblog blog article is an advertisement blog article. During microblog full-text search, advertisement blog articles are recalled according to a chosen strategy (not recalled, or selectively recalled), which improves the search engine's recall of valid data and improves the user's search experience.
Compared with the prior art, this embodiment has the following advantages:
1. Microblog advertisement blog articles are identified on the basis of Bayesian theory: known advertisement blog articles and non-advertisement blog articles are used as samples to obtain a microblogging filter, which is then used to judge the probability that a newly arrived blog article is an advertisement blog article. Unlike the prior-art identification technique based on a cheating vocabulary, the present invention relies on a large amount of data: it learns from a given data set the differences between advertisement blog articles and non-advertisement blog articles, expresses those differences as probabilities, and applies them automatically to subsequent detection. Maintenance is automated and recall is greatly improved.
2. The entire content of a blog article is analyzed, not merely certain keywords in it. For example, a blog article containing words such as "cheap" or "sell" is not necessarily an advertisement blog article, and the prior-art keyword filtering technique clearly has difficulty achieving a satisfactory result in such cases. The method of the present invention considers both the probability with which these words occur in advertisement blog articles and the probability with which they occur in normal blog articles, and judges by weighing these factors together. It can thus strike a balance between "good" and "bad", and its accuracy is clearly better than that of static filtering techniques whose answer is strictly either 1 or 0.
3. The microblogging filter created in this embodiment is difficult to deceive. A skilled sender of advertisement blog articles can bypass ordinary content inspection by reducing advertising vocabulary (such as "cheap" and "price") and adding benign vocabulary (such as news terms and trending words) to a blog article. However, because the advertisement blog article filter is personalized, successfully bypassing it would require studying the posting preferences of each blogger, which is hardly feasible. In addition, by learning from a "special" blog article training set, the Bayesian microblog advertisement blog article identification method can yield a "special" filter, so that a particular class of advertisement blog articles can be identified effectively.
As shown in Figure 13, the second embodiment of the present invention provides a microblog advertisement blog article identification device which further includes:
An update module 203, configured to relearn according to the identified advertisement blog articles and non-advertisement blog articles and to update the microblogging filter.
This embodiment differs from the above embodiment in that it can also relearn according to the identified advertisement blog articles and non-advertisement blog articles and update the microblogging filter periodically.
Specifically, at regular intervals, the offline part is repeated on the advertisement blog articles and normal blog articles identified by the online part, training a new BadProbabilityHashtable (advertisement blog article filter) and a new GoodProbabilityHashtable (non-advertisement blog article filter), which are then deployed to the online part.
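The periodic update can be sketched as re-running the offline training on the samples extended with newly identified blog articles. This is a minimal sketch under the same assumptions as before (term-frequency probabilities, equal priors); `train_bad_filter` and `retrain` are hypothetical names, and scheduling of the update is left out.

```python
from collections import Counter

def word_probabilities(posts):
    """Map each word to its term frequency divided by the total term count."""
    tf = Counter(word for post in posts for word in post)
    total = sum(tf.values())
    return {word: count / total for word, count in tf.items()}

def train_bad_filter(set_good, set_bad):
    """Rebuild the advertisement filter from both sample sets (equal priors assumed)."""
    good = word_probabilities(set_good)
    bad = word_probabilities(set_bad)
    return {w: p / (p + good.get(w, 0.0)) for w, p in bad.items()}

def retrain(set_good, set_bad, new_good, new_bad):
    """Extend the samples with newly identified blog articles and retrain."""
    set_good = set_good + new_good
    set_bad = set_bad + new_bad
    return set_good, set_bad, train_bad_filter(set_good, set_bad)

# Toy samples (hypothetical data).
set_good = [["weather", "today"]]
set_bad = [["sale", "cheap"]]
old_filter = train_bad_filter(set_good, set_bad)

set_good, set_bad, new_filter = retrain(
    set_good, set_bad,
    new_good=[["cheap", "lunch"]],
    new_bad=[["groupbuy", "sale"]],
)
```

After the update, the word "cheap" is no longer treated as a certain advertisement signal, and the new word "groupbuy" has entered the filter.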
The Bayesian microblog advertisement blog article identification method of this embodiment is adaptive: by learning from new advertisement blog article samples and normal blog article samples and retraining continually, the microblogging filter keeps updating itself. When a new blog article arrives, the newly obtained advertisement blog article filter or non-advertisement blog article filter can be used to counter the latest advertisement blog articles, which makes advertisement identification more effective for a highly real-time medium such as microblogs.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.
Claims (14)
1. A microblog advertisement blog article identification method, characterized by comprising:
creating a microblogging filter using known advertisement blog articles and non-advertisement blog articles as samples;
performing advertisement identification on a current microblog blog article based on the microblogging filter and a Bayesian algorithm;
wherein, after the advertisement identification is performed on the current microblog blog article based on the microblogging filter and the Bayesian algorithm, the method further comprises: relearning according to the identified advertisement blog articles and non-advertisement blog articles, and updating the microblogging filter;
wherein the step of creating a microblogging filter using known advertisement blog articles and non-advertisement blog articles as samples comprises: collecting a number of known advertisement blog articles and non-advertisement blog articles to form an advertisement blog article set and a non-advertisement blog article set, respectively, as samples; segmenting each blog article in the advertisement blog article set and the non-advertisement blog article set into words to obtain the word sequence of each blog article; calculating the probability with which each word in the advertisement blog article set occurs in the advertisement blog article set; calculating the probability with which each word in the non-advertisement blog article set occurs in the non-advertisement blog article set; establishing, according to the calculated probabilities, for each word in the advertisement blog article set and the non-advertisement blog article set, a correspondence hash table between the word and the probability with which it occurs in the advertisement blog article set, or a correspondence hash table between the word and the probability with which it occurs in the non-advertisement blog article set; and establishing, according to the Bayesian algorithm and based on the correspondence hash table of the probabilities with which words occur in the advertisement blog article set and the correspondence hash table of the probabilities with which words occur in the non-advertisement blog article set, a mapping hash table between each word in the advertisement blog article set and the probability that a blog article in which the word occurs is an advertisement blog article, thereby obtaining an advertisement blog article filter; or
the step of creating a microblogging filter using known advertisement blog articles and non-advertisement blog articles as samples comprises: collecting a number of known advertisement blog articles and non-advertisement blog articles to form an advertisement blog article set and a non-advertisement blog article set, respectively, as samples; segmenting each blog article in the advertisement blog article set and the non-advertisement blog article set into words to obtain the word sequence of each blog article; calculating the probability with which each word in the advertisement blog article set occurs in the advertisement blog article set; calculating the probability with which each word in the non-advertisement blog article set occurs in the non-advertisement blog article set; establishing, according to the calculated probabilities, for each word in the advertisement blog article set and the non-advertisement blog article set, a correspondence hash table between the word and the probability with which it occurs in the advertisement blog article set, or a correspondence hash table between the word and the probability with which it occurs in the non-advertisement blog article set; and establishing, according to the Bayesian algorithm and based on the correspondence hash table of the probabilities with which words occur in the advertisement blog article set and the correspondence hash table of the probabilities with which words occur in the non-advertisement blog article set, a mapping hash table between each word in the non-advertisement blog article set and the probability that a blog article in which the word occurs is a non-advertisement blog article, thereby obtaining a non-advertisement blog article filter.
2. The method according to claim 1, characterized in that the step of performing advertisement identification on the current microblog blog article based on the microblogging filter and the Bayesian algorithm comprises:
segmenting the current microblog blog article into words and converting it into a vector;
inputting the resulting vector into the microblogging filter, and calculating, using the Bayesian algorithm together with the total probability formula, the probability that the current microblog blog article is an advertisement blog article;
if the probability that the current microblog blog article is an advertisement blog article exceeds a predetermined threshold, determining that the microblog blog article is an advertisement blog article.
3. The method according to claim 2, characterized in that the step of setting the predetermined threshold comprises:
obtaining the predetermined threshold by statistical calculation over a known advertisement blog article set and a known non-advertisement blog article set.
4. The method according to claim 1, characterized by further comprising:
when the corresponding blog article is segmented, removing words that do not meet a predetermined condition and/or selecting specific words.
5. The method according to claim 4, characterized by further comprising:
after the corresponding blog article is segmented, combining the words obtained by segmentation with their context into N-grams.
6. The method according to claim 2, characterized in that the step of performing advertisement identification on the current microblog blog article based on the microblogging filter and the Bayesian algorithm further comprises:
when the probability that the current microblog blog article is an advertisement blog article is calculated, if a word in the current microblog blog article does not appear in the microblogging filter, ignoring that word in the calculation.
7. The method according to claim 2, characterized in that the step of performing advertisement identification on the current microblog blog article based on the microblogging filter and the Bayesian algorithm further comprises:
identifying, in combination with a predetermined rule, whether the current microblog blog article is an advertisement blog article.
8. A microblog advertisement blog article identification device, characterized by comprising:
a creation module, configured to create a microblogging filter using known advertisement blog articles and non-advertisement blog articles as samples;
an identification module, configured to perform advertisement identification on a current microblog blog article based on the microblogging filter and a Bayesian algorithm;
wherein the device further comprises: an update module, configured to relearn according to the identified advertisement blog articles and non-advertisement blog articles and to update the microblogging filter;
wherein the creation module comprises: a collection unit, configured to collect a number of known advertisement blog articles and non-advertisement blog articles to form an advertisement blog article set and a non-advertisement blog article set, respectively, as samples; a word segmentation unit, configured to segment each blog article in the advertisement blog article set and the non-advertisement blog article set into words to obtain the word sequence of each blog article; a first calculation unit, configured to calculate the probability with which each word in the advertisement blog article set occurs in the advertisement blog article set, and to calculate the probability with which each word in the non-advertisement blog article set occurs in the non-advertisement blog article set; a first establishing unit, configured to establish, according to the calculated probabilities, for each word in the advertisement blog article set and the non-advertisement blog article set, a correspondence hash table between the word and the probability with which it occurs in the advertisement blog article set, or a correspondence hash table between the word and the probability with which it occurs in the non-advertisement blog article set; and a second establishing unit, configured to establish, according to the Bayesian algorithm and based on the correspondence hash table of the probabilities with which words occur in the advertisement blog article set and the correspondence hash table of the probabilities with which words occur in the non-advertisement blog article set, a mapping hash table between each word in the advertisement blog article set and the probability that a blog article in which the word occurs is an advertisement blog article, thereby obtaining an advertisement blog article filter.
9. The device according to claim 8, characterized in that, when a non-advertisement blog article filter is created, the second establishing unit is further configured to:
establish, according to the Bayesian algorithm and based on the correspondence hash table of the probabilities with which words occur in the advertisement blog article set and the correspondence hash table of the probabilities with which words occur in the non-advertisement blog article set, a mapping hash table between each word in the non-advertisement blog article set and the probability that a blog article in which the word occurs is a non-advertisement blog article, thereby obtaining a non-advertisement blog article filter.
10. The device according to claim 8, characterized in that the identification module comprises:
a segmentation and conversion unit, configured to segment the current microblog blog article into words and convert it into a vector;
a second calculation unit, configured to input the resulting vector into the microblogging filter and to calculate, using the Bayesian algorithm together with the total probability formula, the probability that the current microblog blog article is an advertisement blog article;
a judging unit, configured to determine that the microblog blog article is an advertisement blog article if the probability that the current microblog blog article is an advertisement blog article exceeds a predetermined threshold.
11. The device according to claim 8 or 9, characterized in that the word segmentation unit is further configured to, when the corresponding blog article is segmented, remove words that do not meet a predetermined condition and/or select specific words; and/or, after the corresponding blog article is segmented, combine the words obtained by segmentation with their context into N-grams.
12. The device according to claim 10, characterized in that the segmentation and conversion unit is further configured to, when the current microblog blog article is segmented, remove words that do not meet a predetermined condition and/or select specific words; and/or, after the current microblog blog article is segmented, combine the words obtained by segmentation with their context into N-grams.
13. The device according to claim 10, characterized in that the second calculation unit is further configured to, when the probability that the current microblog blog article is an advertisement blog article is calculated, ignore in the calculation any word in the current microblog blog article that does not appear in the microblogging filter.
14. The device according to claim 10, characterized in that the judging unit is further configured to identify, in combination with a predetermined rule, whether the current microblog blog article is an advertisement blog article.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310046176.6A CN103970801B (en) | 2013-02-05 | 2013-02-05 | Microblogging advertisement blog article recognition methods and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103970801A CN103970801A (en) | 2014-08-06 |
CN103970801B true CN103970801B (en) | 2019-03-26 |
Family
ID=51240313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310046176.6A Active CN103970801B (en) | 2013-02-05 | 2013-02-05 | Microblogging advertisement blog article recognition methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103970801B (en) |
JP2014532220A (en) | Net comment collection method and system | |
CN109213852B (en) | Tourist destination picture recommendation method | |
CN109255019A (en) | A kind of online exam pool and its application method based on artificial intelligence | |
KR20200052786A (en) | Method for determining user's opinion in social network service and system thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2021-09-27. Address after: 35th Floor, Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen, Guangdong Province, 518000. Patentees after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.; TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd. Address before: Room 403, East Block 2, SEG Science and Technology Park, Zhenxing Road, Futian District, Shenzhen, Guangdong, 518044. Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd. |