CN114091586A - Account identification model determining method, device, equipment and medium - Google Patents
- Publication number
- CN114091586A CN202111326003.0A
- Authority
- CN
- China
- Prior art keywords
- account
- classifier
- determining
- base
- calculation formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
An embodiment of the invention discloses a method, a device, equipment and a medium for determining an account identification model. The method comprises: acquiring account data and extracting features from the account data to obtain account features; training on the account data, through a preset training model, according to the acquisition time of the account data and the account features, to obtain at least two base classifiers; and determining a classifier calculation formula corresponding to each base classifier according to a preset time-varying forgetting factor, and determining an account identification model according to the classifier calculation formulas. Zombie accounts are usually identified with a machine learning model, but zombie accounts are strongly time-sensitive while the model is usually fixed, so detection accuracy tends to decline gradually over time. The technical scheme provided by the embodiment addresses this problem and improves the accuracy of account identification.
Description
Technical Field
The embodiment of the invention relates to computer technology, in particular to a method, a device, equipment and a medium for determining an account identification model.
Background
As society develops, social networks have changed people's lives, but they have also brought a flood of zombie accounts. In the field of social networks, zombie accounts generally refer to personal accounts with low activity or fake accounts operated in batches by bots, so such low-value accounts need to be filtered out.
In the prior art, zombie accounts are usually identified with a machine learning model. However, zombie accounts are strongly time-sensitive, while the machine learning model is usually fixed, so the detection accuracy tends to decline gradually over time.
Disclosure of Invention
The embodiment of the invention provides an account identification model determining method, device, equipment and medium, and aims to improve accuracy of account identification.
In a first aspect, an embodiment of the present invention provides an account identification model determining method, where the method includes:
acquiring account data, and extracting the account data characteristics to obtain account characteristics;
training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determining an account identification model according to each classifier calculation formula.
In a second aspect, an embodiment of the present invention further provides an account identification model determining apparatus, where the account identification model determining apparatus includes:
the account characteristic extraction module is used for acquiring account data and extracting the account data characteristics to obtain account characteristics;
the base classifier acquisition module is used for training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and the account recognition model determining module is used for determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determining an account recognition model according to each classifier calculation formula.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the account identification model determining method described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the account identification model determining method described above.
In the embodiments of the invention, account data is acquired and features are extracted from it to obtain account features; the account data is trained on, through a preset training model, according to its acquisition time and the account features, to obtain at least two base classifiers; and a classifier calculation formula corresponding to each base classifier is determined according to a preset time-varying forgetting factor, and an account identification model is determined according to the classifier calculation formulas. This addresses the problem that, when zombie accounts are identified with a machine learning model, detection accuracy tends to decline gradually over time because zombie accounts are strongly time-sensitive while the model is usually fixed, and it thereby improves the accuracy of account identification.
Drawings
Fig. 1 is a flowchart of an account identification model determining method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an account identification model determining method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an account identification model determining apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an account identification model determining method according to an embodiment of the present invention, where the embodiment is applicable to a case of determining a zombie account identification model, and the method may be executed by an account identification model determining apparatus provided in an embodiment of the present invention, and the apparatus may be implemented in a software and/or hardware manner. Referring to fig. 1, the account identification model determining method provided in this embodiment includes:
Step 110: acquiring account data, and extracting features from the account data to obtain account features.
The account data may be obtained by crawling by a web crawler or by calling an application program interface corresponding to a website, which is not limited in this embodiment.
For example, account data is collected with a distributed web crawler based on the Scrapy framework. Because Scrapy provides highly customizable and asynchronous, multithreaded crawler functions, a crawler program developed with Scrapy is deployed on multiple hosts that crawl cooperatively: the hosts do not each maintain a separate crawl queue but share a single queue, and they execute the same crawl task at the same time. This distributed crawling greatly improves the performance and efficiency of data acquisition.
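The shared-queue idea can be sketched without any crawler framework. In the following minimal Python sketch, threads stand in for hosts and an in-process `queue.Queue` stands in for the shared crawl queue; the function names, the placeholder URLs, and the record layout are assumptions made for illustration, and a real multi-host deployment would keep the queue in an external store.

```python
import queue
import threading


def crawl_worker(host_name, shared_queue, results, lock):
    """Pull URLs from the single shared queue until it is empty."""
    while True:
        try:
            url = shared_queue.get_nowait()
        except queue.Empty:
            return
        # Placeholder for the real fetch-and-parse step (hypothetical).
        record = {"url": url, "fetched_by": host_name}
        with lock:
            results.append(record)
        shared_queue.task_done()


def run_distributed_crawl(urls, host_names):
    """Run one worker per simulated host against one shared queue."""
    shared_queue = queue.Queue()
    for url in urls:
        shared_queue.put(url)
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=crawl_worker,
                         args=(name, shared_queue, results, lock))
        for name in host_names
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


urls = [f"https://example.invalid/account/{i}" for i in range(20)]
results = run_distributed_crawl(urls, ["host-a", "host-b", "host-c"])
```

Because every worker takes its next task from the same queue, each URL is fetched exactly once regardless of how many workers run.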
Features are extracted from the account data to obtain the account features. The account features may be features that correlate strongly with the characteristics of zombie accounts, for example the degree of similarity among the text contents posted by the account.
Step 120: training on the account data, through a preset training model, according to the acquisition time of the account data and the account features, to obtain at least two base classifiers.
The acquisition times of the account data may be separated by a preset interval; for example, account data is acquired in units of days. The account data $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_t$ obtained at the different acquisition times are trained on separately to obtain the corresponding base classifiers $D(\alpha_1), D(\alpha_2), \ldots, D(\alpha_t)$.
The base classifier may be a naive Bayes classifier, which has good noise tolerance. With a naive Bayes classifier, the relationship separating normal accounts from zombie accounts is treated as a random variable and can be analyzed using conditional probability.
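As an illustration of training one base classifier per acquisition time, the sketch below implements a small Gaussian naive Bayes classifier in plain Python and fits one instance per time slice. The Gaussian likelihood, the class and function names, and the toy feature values are assumptions; the text only states that the base classifier may be a naive Bayes classifier.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes over numeric feature rows."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, members in grouped.items():
            self.priors[label] = len(members) / len(rows)
            self.stats[label] = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                self.stats[label].append((mean, var))
        return self

    def log_posterior(self, row):
        """Unnormalized log P(label | row) for every label."""
        scores = {}
        for label, prior in self.priors.items():
            score = math.log(prior)
            for value, (mean, var) in zip(row, self.stats[label]):
                score -= 0.5 * math.log(2 * math.pi * var)
                score -= (value - mean) ** 2 / (2 * var)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)


def train_base_classifiers(slices):
    """slices maps acquisition time t -> (rows, labels); returns t -> D(alpha_t)."""
    return {t: GaussianNaiveBayes().fit(rows, labels)
            for t, (rows, labels) in sorted(slices.items())}


# Toy data: [following count, duplicate post count] per account (made up).
slices = {
    1: ([[10, 1], [12, 2], [900, 40], [950, 45]],
        ["normal", "normal", "zombie", "zombie"]),
    2: ([[15, 0], [20, 3], [700, 30], [800, 38]],
        ["normal", "normal", "zombie", "zombie"]),
}
base_classifiers = train_base_classifiers(slices)
```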
A classifier calculation formula corresponding to each base classifier is then determined according to a preset time-varying forgetting factor, that is, a factor whose value decreases as time increases. Substituting the forgetting factor into the calculation formula originally corresponding to the base classifier makes the base classifier time-dependent.
Optionally, the samples of the base classifiers, ordered by acquisition time, are denoted $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_t$, and the base classifier is denoted $D(\alpha_t)$, subject to the following conditions:
(1) for $T < i \le T$, $\alpha_t = 0$, where $T$ is the current time; that is, samples that have not yet been obtained do not affect the classifier.
(2) When the value of $\eta_t$ is less than a preset value, the classifier is discarded.
When the forgetting factor falls below the preset value, the corresponding classifier is discarded. This acts as a sliding window over time and keeps an excessive number of classifiers from degrading the efficiency of the algorithm.
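The sliding-window effect can be sketched as follows. This sketch assumes that the time argument of the forgetting factor is the age of a classifier (current time minus acquisition time), which is one reading of the text rather than something it states; the function names and the constants are likewise illustrative.

```python
import math


def forgetting_factor(age, c0, rho):
    """Exponentially decaying factor C0 * exp(-rho * age)."""
    return c0 * math.exp(-rho * age)


def prune_classifiers(classifiers, current_time, c0, rho, threshold):
    """Keep classifiers whose forgetting factor is still at least threshold.

    classifiers maps acquisition time -> classifier object.
    """
    kept = {}
    for acquired_at, classifier in classifiers.items():
        age = current_time - acquired_at
        if age < 0:
            # Data not yet obtained must not influence the model.
            continue
        if forgetting_factor(age, c0, rho) >= threshold:
            kept[acquired_at] = classifier
    return kept


classifiers = {t: f"D{t}" for t in range(1, 12)}
window = prune_classifiers(classifiers, current_time=10,
                           c0=1.0, rho=0.5, threshold=0.2)
```

With these constants the factor drops below 0.2 once a classifier is more than three time units old, so only the four most recent classifiers up to the current time remain.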
In this embodiment, optionally, determining the classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change includes:
determining the classifier calculation formula corresponding to each base classifier according to the following formula:
$$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$$
where $D(\alpha_t)$ is the base classifier trained on the account data $\alpha_t$ acquired at time $t$; $\eta_t$ is the time-varying forgetting factor; and $D(\alpha_{t-1})$ is the base classifier trained on the account data acquired at time $t-1$;
the time-varying forgetting factor is determined according to the following formula:
$$\eta_t = C_0 \times e^{-\rho t}$$
where $C_0$ and $\rho$ are preset constants.
The base classifier calculation formula at time t is thus determined by the forgetting factor at time t. Because the value of the forgetting factor decreases as t increases, the proportion of the corresponding base classifier's result in the subsequent account identification model is reduced, so the account identification model changes dynamically over time and the accuracy of current account identification improves.
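To make the two formulas concrete, the following sketch evaluates the forgetting factor for successive values of t and unrolls the recursion on a scalar score. Treating the classifier output as a single scalar, and the values chosen for the constants, are assumptions made only for illustration.

```python
import math


def forgetting_factor(t, c0, rho):
    """eta_t = C0 * exp(-rho * t)."""
    return c0 * math.exp(-rho * t)


def apply_recursion(initial_score, num_steps, c0, rho):
    """Unroll D(alpha_t) = (1 - eta_t) * D(alpha_{t-1}) for t = 1..num_steps.

    Returns the list [D(alpha_0), D(alpha_1), ..., D(alpha_num_steps)].
    """
    score = initial_score
    history = [score]
    for t in range(1, num_steps + 1):
        score = (1.0 - forgetting_factor(t, c0, rho)) * score
        history.append(score)
    return history


etas = [forgetting_factor(t, c0=0.5, rho=0.1) for t in range(1, 6)]
history = apply_recursion(initial_score=1.0, num_steps=5, c0=0.5, rho=0.1)
```

For constants with $0 < C_0 \le 1$ and $\rho > 0$, the values in `etas` fall monotonically, the multiplier $1 - \eta_t$ rises toward 1, and the unrolled score never increases.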
In this embodiment, optionally, determining an account identification model according to each classifier calculation formula includes:
determining the account identification model according to the following formula:
[formula not reproduced in the source text]
where $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
The results obtained by the base classifiers are aggregated to obtain the account identification model, which improves the accuracy with which the model is determined. Because the account identification model changes continuously over time, the accuracy of identifying current zombie accounts also improves.
According to the technical scheme of this embodiment, the classifier calculation formulas determined by the forgetting factor for the base classifiers are substituted into the formula for the account identification model; that is, the results of the base classifiers are aggregated into the overall account identification model. Because the forgetting factor decays exponentially, a classifier trained on account data acquired longer ago has less influence on the final result, so the account identification model changes dynamically over time and the accuracy of current account identification improves.
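The aggregation formula itself is not available in this text, so the following is only one plausible reading of "aggregating the results of the base classifiers": a normalized weighted average in which each base classifier's score is weighted by an exponentially decaying function of its age. The normalization, the use of age as the time argument, and all names and constants are assumptions, not the formula of the embodiment.

```python
import math


def ensemble_score(base_scores, current_time, c0, rho, c=1.0):
    """Time-decayed weighted average of base classifier scores.

    base_scores maps acquisition time -> score in [0, 1] that one account
    is a zombie account, as produced by the classifier trained at that time.
    """
    if not base_scores:
        raise ValueError("at least one base classifier score is required")
    weighted_total = 0.0
    weight_sum = 0.0
    for acquired_at, score in base_scores.items():
        age = current_time - acquired_at
        weight = c0 * math.exp(-rho * age)  # older classifiers count less
        weighted_total += weight * score
        weight_sum += weight
    return c * weighted_total / weight_sum


# An old classifier says "normal" (0.0), a fresh one says "zombie" (1.0).
recent_biased = ensemble_score({1: 0.0, 10: 1.0}, current_time=10,
                               c0=1.0, rho=0.5)
unweighted = ensemble_score({1: 0.0, 10: 1.0}, current_time=10,
                            c0=1.0, rho=0.0)
```

With decay enabled, the combined score follows the most recent classifier closely; with `rho=0.0` every classifier counts equally and the result is the plain mean.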
Example two
Fig. 2 is a flowchart of an account identification model determining method according to a second embodiment of the present invention. This technical scheme supplements the process that follows obtaining the at least two base classifiers. Compared with the above scheme, this scheme is specifically refined in that, after the at least two base classifiers are obtained, the method further comprises:
obtaining the degree of difference between two base classifiers;
judging whether the degree of difference is greater than the average degree of difference; if so, retaining the base classifier group whose degree of difference is greater than the average, and determining the base classifiers in that group as target base classifiers;
correspondingly, a classifier calculation formula corresponding to each base classifier is determined according to a preset forgetting factor based on time change, and the method comprises the following steps:
and determining a classifier calculation formula corresponding to each target base classifier according to a preset forgetting factor based on time change. Specifically, a flowchart of the account identification model determining method is shown in Fig. 2.
In this embodiment, optionally, the account characteristics include: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
Early zombie accounts had a single behavior pattern, little information, low following and follower counts, few posts and a short survival time, because they were easily cleared automatically by the platform hosting the account. Zombie accounts later evolved: they often generate avatars and fill in personal profiles to evade the platform's detection system and post meaningless tweets at regular intervals. Some zombie accounts even have high following and follower counts, with most of their followers being zombie accounts as well. Some zombie accounts send advertisements or malicious links, and some hijack trending topics to act as public-opinion manipulation bots. The characteristics of current zombie accounts are therefore relatively complex.
The account characteristics selected in this embodiment include: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
The user interest count is the number of accounts that an account follows. A normal user follows only a limited number of accounts that the user can actually keep up with, whereas zombie accounts follow a large number of other accounts for economic benefit, sometimes approaching the upper limit allowed by the platform.
The follower count is the number of followers of an account. In terms of influence, the more followers an account has, the higher its influence. Operating a zombie account with a high follower count requires more resources, so the follower count of a zombie account is usually not very high, and most of its followers are also zombie accounts.
The user endorsed count is the number of likes an account has received.
The user text transmission count is the number of posts sent by an account and is an important indicator of whether a user is active. A normal user has often posted a large amount of content between registration and the present, and a carefully maintained zombie account also has many tweets. A zombie account that is no longer maintained often has few tweets, while an advertising zombie account may post tweets at high frequency for promotion.
The account survival time is how long the account has existed; a normal account often survives longer than a zombie account.
The content duplication count is the number of similar posts published by the account. An account that publishes a large number of tweets with repeated content is likely to be a zombie account.
The account associated address is the location information filled in by the user. A typical zombie account does not fill in much account information, so the location can reflect, to some extent, whether the account is a zombie account.
The account profile is the account's personal profile. Official and long-standing accounts generally have a profile, whereas zombie accounts are more likely to have no profile or one whose content is meaningless.
By extracting account features with high correlation with zombie accounts, the accuracy of an account identification model obtained through subsequent training is improved, and therefore the accuracy of subsequent zombie account identification is improved.
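The listed account features could be collected into a single record per account, as in the minimal sketch below. The field names, the dictionary keys of the raw account, and the rule that two posts are duplicates when their case-folded, whitespace-normalized texts match are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class AccountFeatures:
    following_count: int       # user interest count
    likes_received: int        # user endorsed count
    post_count: int            # user text transmission count
    survival_days: int         # account survival time
    duplicate_post_count: int  # content duplication count
    has_location: bool         # account associated address filled in
    has_profile: bool          # account profile filled in


def count_duplicate_posts(posts):
    """Count posts whose normalized text repeats an earlier post."""
    normalized = (" ".join(text.lower().split()) for text in posts)
    counts = Counter(normalized)
    return sum(n - 1 for n in counts.values())


def extract_features(account):
    """Build AccountFeatures from a raw account dict (hypothetical keys)."""
    posts = account.get("posts", [])
    return AccountFeatures(
        following_count=account.get("following", 0),
        likes_received=account.get("likes", 0),
        post_count=len(posts),
        survival_days=(account["observed_on"] - account["created_on"]).days,
        duplicate_post_count=count_duplicate_posts(posts),
        has_location=bool(account.get("location", "").strip()),
        has_profile=bool(account.get("profile", "").strip()),
    )


sample = {
    "following": 4800,
    "likes": 3,
    "posts": ["Buy now!", "buy  now!", "hello"],
    "created_on": date(2021, 1, 1),
    "observed_on": date(2021, 1, 31),
    "location": "",
    "profile": "  ",
}
features = extract_features(sample)
```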
Step 230: obtaining the degree of difference between two base classifiers.
The degree of difference measures how much the results differ when different base classifiers predict on the same data.
Two base classifiers can each make predictions on the same test set to obtain the degree of difference. For a test set Q, the difference metric $d_{mn}$ is determined according to the following formula:
[formula not reproduced in the source text]
where $Q_{11}$ indicates that both base classifiers predicted successfully; $Q_{00}$ indicates that both base classifiers failed to predict; $Q_{01}$ indicates that base classifier 1 failed and base classifier 2 succeeded; and $Q_{10}$ indicates that base classifier 1 succeeded and base classifier 2 failed.
$d_{mn} \in [0, 1]$; the larger the value, the greater the difference between the base classifiers and the more representative they are. Alternatively, a single base classifier can be used as a reference and the difference between each other base classifier and that reference obtained, which is not limited in this embodiment.
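The formula for the difference metric is not available in this text. A common choice that uses exactly these four counts and lies in [0, 1] is the disagreement measure, the fraction of test samples on which exactly one of the two classifiers is correct. The sketch below assumes that measure; it is not confirmed to be the formula of the embodiment, and the function name is illustrative.

```python
def disagreement(predictions_m, predictions_n, truth):
    """Disagreement measure (Q01 + Q10) / (Q11 + Q10 + Q01 + Q00)."""
    if not truth:
        raise ValueError("the test set must not be empty")
    q11 = q10 = q01 = q00 = 0
    for pred_m, pred_n, label in zip(predictions_m, predictions_n, truth):
        m_correct = pred_m == label
        n_correct = pred_n == label
        if m_correct and n_correct:
            q11 += 1
        elif m_correct:
            q10 += 1  # classifier m succeeded, classifier n failed
        elif n_correct:
            q01 += 1  # classifier m failed, classifier n succeeded
        else:
            q00 += 1
    return (q01 + q10) / (q11 + q10 + q01 + q00)


truth = [1, 0, 1, 0]
d_mn = disagreement([1, 0, 0, 0], [1, 1, 1, 0], truth)
d_same = disagreement([1, 0, 0, 0], [1, 0, 0, 0], truth)
```

Two classifiers that succeed and fail on the same samples score 0, and two that are never correct on the same sample score 1.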
After all the obtained degrees of difference are averaged, the base classifier groups whose degree of difference is greater than the average are determined, the two base classifiers in each such group are determined as target base classifiers, and duplicate target base classifiers are removed.
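The screening step can be sketched as follows: average the pairwise degrees of difference, keep the pairs above the average, and collect their members without duplicates. The input layout (a dictionary keyed by classifier pairs) and the names are assumptions for illustration.

```python
def select_target_classifiers(pairwise_difference):
    """Return the base classifiers that belong to an above-average pair.

    pairwise_difference maps a pair (m, n) of classifier identifiers
    to their degree of difference d_mn.
    """
    if not pairwise_difference:
        return []
    mean = sum(pairwise_difference.values()) / len(pairwise_difference)
    selected = set()  # a set removes duplicate target base classifiers
    for (m, n), difference in pairwise_difference.items():
        if difference > mean:
            selected.update((m, n))
    return sorted(selected)


differences = {
    ("D1", "D2"): 0.1,
    ("D1", "D3"): 0.6,
    ("D2", "D3"): 0.2,
}
targets = select_target_classifiers(differences)
```

Here the average difference is 0.3, so only the pair (D1, D3) is retained and D2 is screened out as redundant.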
Step 250: determining a classifier calculation formula corresponding to each target base classifier according to a preset forgetting factor based on time change, and determining an account identification model according to each classifier calculation formula.
A classifier calculation formula corresponding to each target base classifier is determined according to the preset time-varying forgetting factor, and the results of the target base classifiers are aggregated to obtain the overall account identification model.
In this embodiment, the base classifiers are screened to obtain representative base classifiers with stronger classification capability and to remove redundant ones, and the account identification model is determined from the screened base classifiers. The account identification model thus continually selects the time slices of data with the better classification effect, which improves the accuracy with which the model is determined.
Example three
Fig. 3 is a schematic structural diagram of an account identification model determining apparatus according to a third embodiment of the present invention. The device can be realized in a hardware and/or software mode, can execute the account identification model determination method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. As shown in fig. 3, the apparatus includes:
the account feature extraction module 310 is configured to obtain account data and extract features of the account data to obtain account features;
a base classifier obtaining module 320, configured to train, through a preset training model, the account data according to the account data obtaining time and the account features to obtain at least two base classifiers;
the account identification model determining module 330 is configured to determine a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determine an account identification model according to each classifier calculation formula.
According to the technical scheme of this embodiment, the classifier calculation formulas determined by the forgetting factor for the base classifiers are substituted into the formula for the account identification model; that is, the results of the base classifiers are aggregated into the overall account identification model. Because the forgetting factor decays exponentially, a classifier trained on account data acquired longer ago has less influence on the final result, so the account identification model changes dynamically over time and the accuracy of current account identification improves.
On the basis of the above technical solutions, optionally, the account identification model determining module includes:
a first classifier calculation formula determining unit, configured to determine a classifier calculation formula corresponding to each base classifier according to the following formula:
$$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$$
where $D(\alpha_t)$ is the base classifier trained on the account data $\alpha_t$ acquired at time $t$; $\eta_t$ is the time-varying forgetting factor; and $D(\alpha_{t-1})$ is the base classifier trained on the account data acquired at time $t-1$;
the time-varying forgetting factor is determined according to the following formula:
$$\eta_t = C_0 \times e^{-\rho t}$$
where $C_0$ and $\rho$ are preset constants.
On the basis of the above technical solutions, optionally, the account identification model determining module includes:
the account recognition model determining unit is used for determining an account recognition model according to the following formula:
where $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
On the basis of the above technical solutions, optionally, the apparatus further includes:
a difference degree obtaining module, configured to obtain the degree of difference between two base classifiers before the account identification model determining module operates;
a target base classifier determining module, configured to judge whether the degree of difference is greater than the average degree of difference; if so, to retain the base classifier group whose degree of difference is greater than the average, and to determine the base classifiers in that group as target base classifiers;
correspondingly, the account identification model determining module includes:
and the second classifier calculation formula determining unit is used for determining a classifier calculation formula corresponding to each target base classifier according to a preset forgetting factor based on time change.
On the basis of the above technical solutions, optionally, the account characteristics include: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
Example four
Fig. 4 is a schematic structural diagram of an electronic apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the electronic apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of the processors 40 in the electronic device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the electronic apparatus may be connected by a bus or other means, and the bus connection is exemplified in fig. 4.
The memory 41 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the account identification model determination method in the embodiment of the present invention. The processor 40 executes various functional applications and data processing of the electronic device by executing software programs, instructions and modules stored in the memory 41, that is, implements the account identification model determination method described above.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a method for determining an account identification model, and the method includes:
acquiring account data, and extracting the account data characteristics to obtain account characteristics;
training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determining an account identification model according to each classifier calculation formula.
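As a non-limiting illustration, the first two steps above (feature extraction, then training one base classifier per acquisition time) may be sketched as follows. The record fields, the choice of two features, and the nearest-centroid learner standing in for the "preset training model" are all assumptions made for this sketch; the embodiment does not prescribe them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


@dataclass
class AccountRecord:
    """One acquired account sample (all field names are hypothetical)."""
    acquired_at: int       # acquisition time step t
    post_count: int        # number of texts sent by the account
    survival_days: float   # account survival time
    is_abnormal: bool      # training label


def extract_features(rec: AccountRecord) -> Features:
    """Step 1: turn raw account data into a numeric feature vector."""
    return (float(rec.post_count), rec.survival_days)


class CentroidClassifier:
    """Assumed stand-in for the preset training model: nearest class centroid."""

    def fit(self, xs: Sequence[Features], ys: Sequence[bool]) -> "CentroidClassifier":
        self.centroids: Dict[bool, Features] = {}
        for label in set(ys):
            rows = [x for x, y in zip(xs, ys) if y == label]
            # Column-wise mean of all rows carrying this label.
            self.centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return self

    def predict(self, x: Features) -> bool:
        def sq_dist(c: Features) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def train_base_classifiers(records: List[AccountRecord]) -> Dict[int, CentroidClassifier]:
    """Step 2: group records by acquisition time and train one base classifier per group."""
    by_time: Dict[int, List[AccountRecord]] = defaultdict(list)
    for rec in records:
        by_time[rec.acquired_at].append(rec)
    return {
        t: CentroidClassifier().fit(
            [extract_features(r) for r in group],
            [r.is_abnormal for r in group],
        )
        for t, group in sorted(by_time.items())
    }
```

Account data acquired at two different times would yield two base classifiers under this sketch, which matches the "at least two base classifiers" requirement of the method.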
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the above-described method operations, and may also perform related operations in the account identification model determination method provided in any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the account identification model determining apparatus, each included unit and module are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. An account identification model determination method is characterized by comprising the following steps:
acquiring account data, and extracting the account data characteristics to obtain account characteristics;
training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determining an account identification model according to each classifier calculation formula.
2. The method according to claim 1, wherein determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation comprises:
determining a classifier calculation formula corresponding to each base classifier according to the following formula:
$$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$$
wherein $D(\alpha_t)$ is the base classifier obtained by training on the account data $\alpha_t$ whose acquisition time is time $t$; $\eta_t$ is the forgetting factor based on time variation; and $D(\alpha_{t-1})$ is the base classifier obtained by training on the account data whose acquisition time is time $t-1$;
the forgetting factor based on the time variation is determined according to the following formula:
$$\eta_t = C_0 \times e^{-\rho t}$$
wherein $C_0$ and $\rho$ are preset constants.
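As a non-limiting numerical illustration of the two formulas of this claim, the forgetting factor (the constant C0 multiplied by e raised to the power of minus ρ times t) and the recursion that scales the previous value by one minus the forgetting factor may be computed as follows. Treating the base classifier term as a scalar value, the starting value `d0`, and the function names are assumptions made for this sketch only.

```python
import math
from typing import List


def forgetting_factor(t: float, c0: float, rho: float) -> float:
    """eta_t = C0 * exp(-rho * t), with C0 and rho as preset constants."""
    return c0 * math.exp(-rho * t)


def decayed_values(d0: float, steps: int, c0: float, rho: float) -> List[float]:
    """Apply D_t = (1 - eta_t) * D_{t-1} for t = 1..steps, starting from D_0 = d0.

    D is treated here as a plain number (an assumption for illustration).
    """
    values = [d0]
    for t in range(1, steps + 1):
        values.append((1.0 - forgetting_factor(t, c0, rho)) * values[-1])
    return values
```

For a positive ρ the forgetting factor shrinks as t grows, so each later step removes a smaller fraction of the previous value.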
3. The method of claim 2, wherein determining an account identification model from each of the classifier computational formulas comprises:
determining an account identification model according to the following formula:
wherein $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
4. The method of claim 1, after obtaining at least two base classifiers, further comprising:
obtaining the degree of difference between each pair of base classifiers;
determining whether the degree of difference is greater than the average degree of difference; if so, retaining the base classifier group corresponding to that degree of difference, and determining the base classifiers in the retained base classifier group as target base classifiers;
correspondingly, a classifier calculation formula corresponding to each base classifier is determined according to a preset forgetting factor based on time change, and the method comprises the following steps:
and determining a classifier calculation formula corresponding to each target base classifier according to a preset forgetting factor based on time change.
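As a non-limiting illustration of the selection described in this claim, the sketch below computes a degree of difference for every pair of base classifiers, compares it with the average over all pairs, and keeps the classifiers belonging to pairs above that average. Measuring the degree of difference as the fraction of validation samples on which two classifiers disagree is an assumption; the claim does not fix a particular measure, and the function and variable names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Set


def disagreement(pred_a: Sequence[bool], pred_b: Sequence[bool]) -> float:
    """Assumed difference measure: share of samples where two classifiers disagree."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_target_classifiers(predictions: Dict[str, Sequence[bool]]) -> Set[str]:
    """Keep classifiers that appear in at least one pair whose difference exceeds the mean.

    `predictions` maps a classifier name to its outputs on a shared validation set.
    """
    if len(predictions) < 2:
        raise ValueError("at least two base classifiers are required")
    pair_diff = {
        (a, b): disagreement(predictions[a], predictions[b])
        for a, b in combinations(sorted(predictions), 2)
    }
    mean_diff = sum(pair_diff.values()) / len(pair_diff)
    kept: Set[str] = set()
    for (a, b), diff in pair_diff.items():
        if diff > mean_diff:   # retain only pairs above the average difference
            kept.update((a, b))
    return kept
```

Under this sketch, if every pair has the same degree of difference, no pair exceeds the average and the returned set is empty; a practical implementation would need to decide how to handle that case.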
5. The method of claim 1, wherein the account features comprise: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
6. An account identification model determination apparatus, comprising:
an account characteristic extraction module, configured to acquire account data and extract the account data characteristics to obtain account characteristics;
a base classifier acquisition module, configured to train the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and an account identification model determining module, configured to determine a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and to determine an account identification model according to each classifier calculation formula.
7. The apparatus of claim 6, wherein the account identification model determining module comprises:
a first classifier calculation formula determining unit, configured to determine a classifier calculation formula corresponding to each base classifier according to the following formula:
$$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$$
wherein $D(\alpha_t)$ is the base classifier obtained by training on the account data $\alpha_t$ whose acquisition time is time $t$; $\eta_t$ is the forgetting factor based on time variation; and $D(\alpha_{t-1})$ is the base classifier obtained by training on the account data whose acquisition time is time $t-1$;
the forgetting factor based on the time variation is determined according to the following formula:
$$\eta_t = C_0 \times e^{-\rho t}$$
wherein $C_0$ and $\rho$ are preset constants.
8. The apparatus of claim 6, wherein the account identification model determining module comprises:
an account identification model determining unit, configured to determine an account identification model according to the following formula:
wherein $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the account identification model determination method of any of claims 1-5.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the account identification model determination method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111326003.0A CN114091586A (en) | 2021-11-10 | 2021-11-10 | Account identification model determining method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111326003.0A CN114091586A (en) | 2021-11-10 | 2021-11-10 | Account identification model determining method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114091586A (en) | 2022-02-25 |
Family
ID=80299585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111326003.0A Pending CN114091586A (en) | 2021-11-10 | 2021-11-10 | Account identification model determining method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114091586A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117879863A (en) * | 2023-11-30 | 2024-04-12 | 电子科技大学长三角研究院(湖州) | Multi-layer coping system for botnet based on elastic complex social network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106886518A (en) * | 2015-12-15 | 2017-06-23 | 国家计算机网络与信息安全管理中心 | A kind of method of microblog account classification |
CN110070060A (en) * | 2019-04-26 | 2019-07-30 | 天津开发区精诺瀚海数据科技有限公司 | A kind of method for diagnosing faults of bearing apparatus |
US20200366698A1 (en) * | 2019-05-13 | 2020-11-19 | Feedzai-Consultadoria e Inovação Tecnologica, S.A. | Automatic model monitoring for data streams |
Non-Patent Citations (3)
Title |
---|
KRAWCZYK, B et al.: "Weighted Naïve Bayes Classifier with Forgetting for Drifting Data Streams", 2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, 9 October 2015 (2015-10-09) * |
吕林涛; 姬娜; 张九龙: "Suspicious transaction monitoring model based on RBF neural network", Computer Engineering and Applications, no. 03, 21 January 2010 (2010-01-21) * |
詹天晟; 陈德华; 乐嘉锦; 王梅: "User interest model based on massive search history data", Journal of Computer Applications, no. 2, 15 December 2014 (2014-12-15) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103336766B (en) | Short text garbage identification and modeling method and device | |
Adewole et al. | SMSAD: a framework for spam message and spam account detection | |
US8874663B2 (en) | Comparing similarity between documents for filtering unwanted documents | |
US7882192B2 (en) | Detecting spam email using multiple spam classifiers | |
US10178115B2 (en) | Systems and methods for categorizing network traffic content | |
JP5990284B2 (en) | Spam detection system and method using character histogram | |
US9876742B2 (en) | Techniques to select and prioritize application of junk email filtering rules | |
US10637826B1 (en) | Policy compliance verification using semantic distance and nearest neighbor search of labeled content | |
CN110149266B (en) | Junk mail identification method and device | |
US10789537B2 (en) | Machine learning and validation of account names, addresses, and/or identifiers | |
CN106874314B (en) | Information recommendation method and device | |
CN112070120A (en) | Threat information processing method, device, electronic device and storage medium | |
CN107729520B (en) | File classification method and device, computer equipment and computer readable medium | |
JP4742619B2 (en) | Information processing system, program, and information processing method | |
CN112039874B (en) | Malicious mail identification method and device | |
Liubchenko et al. | Research Application of the Spam Filtering and Spammer Detection Algorithms on Social Media. | |
Hosseinpour et al. | An ensemble learning approach for sms spam detection | |
US9332031B1 (en) | Categorizing accounts based on associated images | |
CN114091586A (en) | Account identification model determining method, device, equipment and medium | |
CN108804501B (en) | Method and device for detecting effective information | |
CN110972086A (en) | Short message processing method and device, electronic equipment and computer readable storage medium | |
CN109145115B (en) | Product public opinion discovery method, device, computer equipment and storage medium | |
CN109587248B (en) | User identification method, device, server and storage medium | |
CN115391674B (en) | Method, device, equipment and storage medium for efficiently suppressing false information of network community | |
US10671654B2 (en) | Estimating probability of spreading information by users on micro-weblogs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||