CN114091586A - Account identification model determining method, device, equipment and medium - Google Patents

Info

Publication number
CN114091586A
Authority
CN
China
Prior art keywords
account
classifier
determining
base
calculation formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111326003.0A
Other languages
Chinese (zh)
Inventor
刘纯彰
古毅伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202111326003.0A priority Critical patent/CN114091586A/en
Publication of CN114091586A publication Critical patent/CN114091586A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method, apparatus, device, and medium for determining an account identification model. The method comprises: acquiring account data and extracting features from the account data to obtain account features; training on the account data, by means of a preset training model, according to the acquisition time of the account data and the account features to obtain at least two base classifiers; and determining a classifier calculation formula corresponding to each base classifier according to a preset time-varying forgetting factor, and determining the account identification model according to the classifier calculation formulas. The technical solution addresses the following problem: zombie accounts are identified using a machine learning model, but zombie accounts are highly time-sensitive while the machine learning model is usually fixed, so detection accuracy tends to decline gradually over time. The solution thereby improves the accuracy of account identification.

Description

Account identification model determining method, device, equipment and medium
Technical Field
The embodiment of the invention relates to computer technology, in particular to a method, a device, equipment and a medium for determining an account identification model.
Background
As society develops, social networks have changed people's lives; however, this has been accompanied by a flood of zombie accounts. In the field of social networks, zombie accounts generally refer to personal accounts with low activity or fake accounts operated in batches by bots, so such low-value accounts need to be filtered out.
In the prior art, zombie accounts are usually identified using a machine learning model. However, zombie accounts are highly time-sensitive while the machine learning model is usually fixed, so detection accuracy tends to decline gradually over time.
Disclosure of Invention
The embodiment of the invention provides an account identification model determining method, device, equipment and medium, and aims to improve accuracy of account identification.
In a first aspect, an embodiment of the present invention provides an account identification model determining method, where the method includes:
acquiring account data, and extracting the account data characteristics to obtain account characteristics;
training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determining an account identification model according to each classifier calculation formula.
In a second aspect, an embodiment of the present invention further provides an account identification model determining apparatus, where the account identification model determining apparatus includes:
the account characteristic extraction module is used for acquiring account data and extracting the account data characteristics to obtain account characteristics;
the base classifier acquisition module is used for training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers;
and the account recognition model determining module is used for determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determining an account recognition model according to each classifier calculation formula.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the account identification model determination method as described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the account identification model determining method described above.
In embodiments of the invention, account data is acquired and features are extracted from it to obtain account features; the account data is trained on, by means of a preset training model, according to its acquisition time and the account features to obtain at least two base classifiers; and a classifier calculation formula corresponding to each base classifier is determined according to a preset time-varying forgetting factor, with the account identification model determined according to the classifier calculation formulas. This solves the problem that, when zombie accounts are identified using a machine learning model, detection accuracy tends to decline gradually over time because zombie accounts are highly time-sensitive while the model is usually fixed, and thereby improves the accuracy of account identification.
Drawings
Fig. 1 is a flowchart of an account identification model determining method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an account identification model determining method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an account identification model determining apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an account identification model determining method according to an embodiment of the present invention, where the embodiment is applicable to a case of determining a zombie account identification model, and the method may be executed by an account identification model determining apparatus provided in an embodiment of the present invention, and the apparatus may be implemented in a software and/or hardware manner. Referring to fig. 1, the account identification model determining method provided in this embodiment includes:
Step 110, acquiring account data, and extracting features from the account data to obtain account features.
The account data may be obtained by crawling by a web crawler or by calling an application program interface corresponding to a website, which is not limited in this embodiment.
Illustratively, account data is collected using a distributed web crawler based on the Scrapy framework. Because Scrapy provides highly customizable, asynchronous multithreaded crawling, a crawler program developed with Scrapy is deployed on multiple hosts that crawl cooperatively: no host maintains its own crawl queue; instead, all hosts share a single crawl queue and execute the same crawl task simultaneously. This distributed crawling greatly improves the performance and efficiency of data acquisition.
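The shared crawl queue described above can be illustrated with a minimal sketch. It does not use the crawler framework named in the text: threads stand in for hosts, an in-process `queue.Queue` stands in for the shared crawl queue, and the fetch step is a placeholder. The function names and the `example.com` URLs are hypothetical.

```python
import queue
import threading


def crawl_worker(name, shared_queue, results, lock):
    """Pull URLs from the single shared queue until it is empty."""
    while True:
        try:
            url = shared_queue.get_nowait()
        except queue.Empty:
            return
        # Placeholder for the real fetch-and-parse step (assumption).
        record = {"url": url, "worker": name}
        with lock:
            results.append(record)
        shared_queue.task_done()


def run_shared_queue_crawl(urls, worker_count=3):
    """Run several workers that share one queue instead of owning one each."""
    shared_queue = queue.Queue()
    for url in urls:
        shared_queue.put(url)
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(
            target=crawl_worker, args=(f"host-{i}", shared_queue, results, lock)
        )
        for i in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


# Hypothetical URLs, for illustration only.
urls = [f"https://example.com/account/{i}" for i in range(5)]
crawled = run_shared_queue_crawl(urls)
```

Because every worker takes its next URL from the same queue, each URL is processed exactly once regardless of how many workers run. In a real multi-host deployment the queue would have to live in a store that all hosts can reach.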
Features are then extracted from the account data to obtain account features. The account features may be features that correlate strongly with zombie-account behavior; illustratively, one such feature is the degree of similarity among the text content posted by the account.
Step 120, training on the account data, by means of a preset training model, according to the acquisition time of the account data and the account features to obtain at least two base classifiers.
The acquisition times of the account data may be separated by a preset interval; for example, account data is acquired once per day. The account data $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_t$ obtained at the different acquisition times is trained on separately to obtain the corresponding base classifiers $D(\alpha_1), D(\alpha_2), \ldots, D(\alpha_t)$.
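As a rough sketch of how account data might be split by acquisition time before one base classifier is trained per batch, the following groups records by day. The record layout `(acquisition_date, features, label)` and the function name are assumptions made for illustration, not something the source specifies.

```python
from collections import defaultdict
from datetime import date


def group_by_acquisition_day(records):
    """Group (acquisition_date, features, label) records into per-day batches."""
    batches = defaultdict(list)
    for acquired_on, features, label in records:
        batches[acquired_on].append((features, label))
    # Sort by day so that the batch index grows with acquisition time.
    return [batches[day] for day in sorted(batches)]


# Hypothetical records: label 1 marks a zombie account, 0 a normal account.
records = [
    (date(2021, 11, 2), {"follower_count": 3}, 1),
    (date(2021, 11, 1), {"follower_count": 120}, 0),
    (date(2021, 11, 2), {"follower_count": 95}, 0),
]
daily_batches = group_by_acquisition_day(records)
```

Each element of `daily_batches` would then be passed to the training routine to produce one base classifier for that day.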
The base classifier may be a naive Bayes classifier. A naive Bayes classifier has good noise tolerance; with it, the distinction between normal accounts and zombie accounts is treated as a random variable, and the relationship can be analyzed using conditional probability.
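To make the conditional-probability idea concrete, here is a minimal naive Bayes sketch over binary features with add-one (Laplace) smoothing. The class name, the choice of binary features, and the toy data are assumptions; the source does not state which naive Bayes variant is used.

```python
import math


class BernoulliNaiveBayes:
    """Tiny naive Bayes over binary (0/1) features with add-one smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        n_features = len(rows[0])
        self.total = len(labels)
        self.class_count = {c: 0 for c in self.classes}
        self.feature_count = {c: [0] * n_features for c in self.classes}
        for row, label in zip(rows, labels):
            self.class_count[label] += 1
            for j, value in enumerate(row):
                self.feature_count[label][j] += value
        return self

    def log_posterior(self, row, c):
        # log P(c) + sum_j log P(x_j | c), up to a constant shared by all classes.
        score = math.log(self.class_count[c] / self.total)
        for j, value in enumerate(row):
            p = (self.feature_count[c][j] + 1) / (self.class_count[c] + 2)
            score += math.log(p if value else 1.0 - p)
        return score

    def predict(self, row):
        return max(self.classes, key=lambda c: self.log_posterior(row, c))


# Hypothetical binary features: [has_no_profile, posts_many_duplicates].
rows = [[1, 1], [1, 0], [0, 0], [0, 0]]
labels = [1, 1, 0, 0]  # 1 = zombie account, 0 = normal account
model = BernoulliNaiveBayes().fit(rows, labels)
```

With this toy data, `model.predict([1, 1])` returns 1 and `model.predict([0, 0])` returns 0, since each class's posterior is the class prior multiplied by the per-feature conditional probabilities.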
Step 130, determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation, and determining an account identification model according to each classifier calculation formula.
A classifier calculation formula corresponding to each base classifier is determined according to a preset time-varying forgetting factor, that is, a factor whose value decreases as time increases. The forgetting factor is substituted into the classifier calculation formula that originally corresponds to the base classifier, so that the base classifier becomes time-dependent.
Optionally, the samples of the base classifiers are denoted $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_t$ in order of acquisition time, and the base classifier is denoted $D(\alpha_t)$, subject to the following conditions:
(1) For T < i ≦ T, $\alpha_t = 0$, where T is the current time; that is, samples that have not yet been obtained do not affect the classifier.
(2) When the value of $\eta_t$ is less than a preset value, the classifier is discarded.
Discarding a classifier once its forgetting factor falls below the preset value produces a sliding window over time, which limits the cost that an excessive number of classifiers would impose on the efficiency of the algorithm.
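The sliding-window effect of the two conditions can be sketched as follows. This sketch assumes that the exponent of the forgetting factor uses the age of a classifier (current time minus acquisition time), which is one possible reading of the text; the function names, constants, and threshold are illustrative.

```python
import math


def forgetting_factor(age, c0=1.0, rho=0.5):
    """C0 * exp(-rho * age); the value shrinks as the age grows (assumed form)."""
    return c0 * math.exp(-rho * age)


def prune_classifiers(classifiers, current_time, threshold, c0=1.0, rho=0.5):
    """Keep (acquired_at, classifier) pairs whose factor is at least threshold."""
    kept = []
    for acquired_at, clf in classifiers:
        if acquired_at > current_time:
            continue  # condition (1): data not yet acquired has no influence
        if forgetting_factor(current_time - acquired_at, c0, rho) >= threshold:
            kept.append((acquired_at, clf))  # condition (2) not triggered
    return kept


# Placeholder strings stand in for trained base classifiers.
classifiers = [(1, "D1"), (2, "D2"), (3, "D3"), (4, "D4"), (6, "D6")]
window = prune_classifiers(classifiers, current_time=5, threshold=0.3)
```

With these illustrative constants only the classifiers acquired at times 3 and 4 remain, so the number of classifiers to evaluate stays bounded as new batches arrive.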
In this embodiment, optionally, determining the classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change includes:
determining a classifier calculation formula corresponding to each base classifier according to the following formula:
$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$
where $D(\alpha_t)$ is the base classifier obtained by training on the account data $\alpha_t$ acquired at time $t$; $\eta_t$ is the time-varying forgetting factor; and $D(\alpha_{t-1})$ is the base classifier obtained by training on the account data acquired at time $t-1$;
the time-varying forgetting factor is determined according to the following formula:
$\eta_t = C_0 \times e^{-\rho t}$
where $C_0$ and $\rho$ are preset constants.
The base classifier calculation formula at time $t$ is thus obtained. It is determined by the forgetting factor at time $t$, whose value decreases as $t$ increases, so the weight of the corresponding base classifier's result in the subsequent account identification model is reduced. The account identification model therefore changes dynamically over time, which improves the accuracy of current account identification.
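A short numerical sketch of the two formulas above may help. It evaluates the forgetting factor, C0 multiplied by exp(-rho * t), and the multiplier (1 - eta_t) that appears in the classifier formula. The constants C0 = 0.5 and rho = 0.3 are illustrative values chosen here, not values from the source.

```python
import math


def eta(t, c0, rho):
    """Forgetting factor at time t: C0 * exp(-rho * t)."""
    return c0 * math.exp(-rho * t)


def scale_factors(num_steps, c0, rho):
    """Return (t, eta_t, 1 - eta_t) for t = 1..num_steps."""
    return [(t, eta(t, c0, rho), 1.0 - eta(t, c0, rho))
            for t in range(1, num_steps + 1)]


# Illustrative constants (assumptions, not taken from the source).
for t, eta_t, multiplier in scale_factors(5, c0=0.5, rho=0.3):
    print(f"t={t}  eta_t={eta_t:.4f}  1-eta_t={multiplier:.4f}")
```

The printed table shows the factor decaying exponentially with t while the multiplier stays strictly between 0 and 1 for these constants.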
In this embodiment, optionally, determining an account identification model according to each classifier calculation formula includes:
determining an account recognition model according to the following formula:
Figure BDA0003347143900000061
where $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
The results obtained by the base classifiers are collected to obtain an account identification model, so that the accuracy of determining the account identification model is improved, and the accuracy of identifying the current zombie account is improved as the account identification model changes continuously along with the lapse of time.
According to the technical solution provided by this embodiment, the classifier calculation formulas corresponding to the base classifiers, as determined by the forgetting factor, are substituted into the formula for determining the account identification model; that is, the results of the base classifiers are aggregated to obtain the overall account identification model. Because the forgetting factor decays exponentially, the influence on the final result of a classifier trained on account data that was acquired longer ago is reduced, so the account identification model changes dynamically over time and the accuracy of current account identification is improved.
Example two
Fig. 2 is a flowchart of an account identification model determining method according to a second embodiment of the present invention. This technical solution supplements the process that follows the obtaining of at least two base classifiers. Compared with the above solution, this solution is specifically optimized in that, after at least two base classifiers are obtained, the method further comprises the following steps:
obtaining the difference degree between the two base classifiers;
judging whether the difference degree is larger than the average difference degree value or not; if so, reserving a base classifier group corresponding to the difference degree larger than the average value of the difference degrees, and determining a base classifier corresponding to the base classifier group as a target base classifier;
correspondingly, a classifier calculation formula corresponding to each base classifier is determined according to a preset forgetting factor based on time change, and the method comprises the following steps:
and determining a classifier calculation formula corresponding to each target base classifier according to a preset forgetting factor based on time change. Specifically, a flowchart of the account identification model determining method is shown in fig. 2:
step 210, account data is obtained, and account data features are extracted to obtain account features.
In this embodiment, optionally, the account characteristics include: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
Early zombie accounts had a single behavior pattern, little information, low following and follower counts, few posts, and short survival times, because they were easily cleared automatically by the platform hosting them. Zombie accounts have since evolved: they often generate avatars and personal profiles to evade the platform's detection system and regularly post meaningless tweets, and some even have high following and follower counts, although most of their followers are also zombie accounts. Some zombie accounts send advertisements or malicious links, and some hijack trending topics and act as bots that manipulate public opinion. The characteristics of current zombie accounts are therefore relatively complex.
The account features selected in this embodiment include at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
The user interest count is the number of accounts that an account follows. A normal user follows only a limited number of accounts that they can keep up with, whereas a zombie account follows a large number of other accounts for economic gain, sometimes approaching the follow limit allowed by the platform.
The follower count is the number of followers (fans) of an account. In terms of influence, the more followers an account has, the greater its influence. Operating a zombie account with a large number of followers requires more resources, so the follower count of a zombie account is usually not very high, and most of its followers are themselves zombie accounts.
The user endorsed count is the number of endorsements (likes) that an account has received.
The user text transmission count is the number of pieces of content sent by an account and is an important indicator of whether a user is active. A normal user has usually sent a large amount of content between registration and the present, and a carefully maintained zombie account may also have a large number of tweets. A zombie account that is no longer maintained usually has few tweets, while an advertising zombie account may send tweets at high frequency to promote advertisements.
The account survival time is the length of time the account has existed; the survival time of a normal account is usually longer than that of a zombie account.
The content duplication count is the number of similar pieces of content posted by the account. If an account posts a large number of tweets with repeated content, it is likely to be a zombie account.
The account associated address is the location information filled in by the user. A typical zombie account does not fill in much account information, so this feature can indicate to some extent whether the account is a zombie account.
The account profile is the personal profile of the account. Official and long-standing accounts generally have a personal profile, whereas zombie accounts are more likely to have no profile or a profile with no meaningful content.
By extracting account features with high correlation with zombie accounts, the accuracy of an account identification model obtained through subsequent training is improved, and therefore the accuracy of subsequent zombie account identification is improved.
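A minimal sketch of turning a raw account record into the features described above follows. The record keys (`following_count`, `likes_received`, `posts`, `created_on`, `location`, `profile`) are hypothetical, and content duplication is approximated by exact matches after normalizing case and whitespace; the source speaks of "similar" content, so a real system might use a similarity measure instead.

```python
from collections import Counter
from datetime import date


def content_duplication_count(posts):
    """Count posts whose normalized text repeats an earlier post."""
    normalized = [" ".join(post.lower().split()) for post in posts]
    return sum(n - 1 for n in Counter(normalized).values() if n > 1)


def extract_features(account, today):
    """Map a raw account record (hypothetical schema) to a feature dict."""
    return {
        "interest_count": account["following_count"],
        "endorsed_count": account["likes_received"],
        "text_transmission_count": len(account["posts"]),
        "survival_days": (today - account["created_on"]).days,
        "content_duplication_count": content_duplication_count(account["posts"]),
        "has_associated_address": bool(account["location"].strip()),
        "has_profile": bool(account["profile"].strip()),
    }


# Hypothetical account record, for illustration only.
account = {
    "following_count": 1800,
    "likes_received": 2,
    "posts": ["Buy now!", "buy  now!", "Hello world", "Buy now!"],
    "created_on": date(2021, 10, 2),
    "location": "",
    "profile": " ",
}
features = extract_features(account, today=date(2021, 11, 1))
```

For this record the duplication count is 2 and the survival time is 30 days; the empty location and the blank profile both map to `False`.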
Step 220, training the account data according to the acquisition time of the account data and the account characteristics through a preset training model to obtain at least two base classifiers.
And step 230, acquiring the difference degree between the two base classifiers.
The difference degree is the difference degree of results obtained by predicting the same prediction data by different base classifiers.
The degree of difference can be obtained by having two base classifiers predict on the same test set. For a test set $Q$, the difference metric $d_{mn}$ is determined according to the following formula:
Figure BDA0003347143900000091
where $Q_{11}$ indicates that both base classifiers predicted correctly; $Q_{00}$ indicates that both base classifiers predicted incorrectly; $Q_{01}$ indicates that base classifier 1 predicted incorrectly and base classifier 2 predicted correctly; and $Q_{10}$ indicates that base classifier 1 predicted correctly and base classifier 2 predicted incorrectly.
$d_{mn} \in [0,1]$. The larger the value, the greater the difference between the base classifiers and the more representative they are. Alternatively, a single base classifier may be used as a reference, and the degree of difference between each other base classifier and that reference obtained; this embodiment does not limit this.
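The formula image for the difference metric is not reproduced in this text. The sketch below therefore assumes the common disagreement measure, (Q01 + Q10) / (Q11 + Q10 + Q01 + Q00), which lies in [0, 1] and grows as the two classifiers disagree more; whether the source uses exactly this form is an assumption. The function name and the toy predictions are hypothetical.

```python
def disagreement(pred_m, pred_n, truth):
    """Assumed disagreement measure: share of samples where exactly one
    of the two classifiers predicts correctly."""
    if not truth:
        raise ValueError("test set must not be empty")
    q01 = q10 = 0
    for a, b, y in zip(pred_m, pred_n, truth):
        correct_m, correct_n = a == y, b == y
        if correct_m and not correct_n:
            q10 += 1  # classifier m correct, classifier n wrong
        elif correct_n and not correct_m:
            q01 += 1  # classifier m wrong, classifier n correct
    # Q11 + Q10 + Q01 + Q00 equals the size of the test set.
    return (q01 + q10) / len(truth)


truth = [1, 0, 1, 0]
pred_m = [1, 0, 0, 0]
pred_n = [1, 1, 1, 0]
d_mn = disagreement(pred_m, pred_n, truth)
```

In this toy case the classifiers disagree in correctness on two of the four samples, so `d_mn` is 0.5.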
Step 240, judging whether the difference degree is larger than the difference degree average value; if so, reserving a base classifier group corresponding to the difference degree larger than the average value of the difference degrees, and determining a base classifier corresponding to the base classifier group as a target base classifier.
After all the obtained degrees of difference are averaged, the base classifier groups whose degree of difference exceeds the average are determined, the two base classifiers in each such group are taken as target base classifiers, and duplicate target base classifiers are removed.
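The screening step can be sketched as follows, assuming the pairwise degrees of difference have already been computed. The dictionary layout keyed by `frozenset` pairs and the classifier names are illustrative choices, not part of the source.

```python
from itertools import combinations


def select_target_classifiers(names, difference):
    """Keep classifiers that belong to a pair whose degree of difference
    exceeds the average over all pairs; duplicates are removed."""
    pairs = list(combinations(names, 2))
    mean_difference = sum(difference[frozenset(p)] for p in pairs) / len(pairs)
    targets = []
    for a, b in pairs:
        if difference[frozenset((a, b))] > mean_difference:
            for name in (a, b):
                if name not in targets:  # drop repeated target classifiers
                    targets.append(name)
    return targets


# Hypothetical pairwise degrees of difference.
difference = {
    frozenset(("D1", "D2")): 0.125,
    frozenset(("D1", "D3")): 0.5,
    frozenset(("D2", "D3")): 0.25,
}
targets = select_target_classifiers(["D1", "D2", "D3"], difference)
```

Here the average is about 0.29, only the pair (D1, D3) exceeds it, and the resulting target base classifiers are D1 and D3.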
Step 250, determining a classifier calculation formula corresponding to each target base classifier according to a preset time-varying forgetting factor, and determining an account identification model according to each classifier calculation formula.
A classifier calculation formula corresponding to each target base classifier is determined according to the preset time-varying forgetting factor, and the results of the target base classifiers are aggregated to obtain the overall account identification model.
According to the embodiment of the invention, the representative base classifier with stronger classification capability is obtained by screening the base classifier, the redundant base classifier is removed, and the account identification model is determined according to the screened base classifier, so that the account identification model continuously selects a part of time data with better classification effect, and the accuracy of determining the account identification model is improved.
Example three
Fig. 3 is a schematic structural diagram of an account identification model determining apparatus according to a third embodiment of the present invention. The device can be realized in a hardware and/or software mode, can execute the account identification model determination method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. As shown in fig. 3, the apparatus includes:
the account feature extraction module 310 is configured to obtain account data and extract features of the account data to obtain account features;
a base classifier obtaining module 320, configured to train, through a preset training model, the account data according to the account data obtaining time and the account features to obtain at least two base classifiers;
the account identification model determining module 330 is configured to determine a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time change, and determine an account identification model according to each classifier calculation formula.
According to the technical solution provided by this embodiment, the classifier calculation formulas corresponding to the base classifiers, as determined by the forgetting factor, are substituted into the formula for determining the account identification model; that is, the results of the base classifiers are aggregated to obtain the overall account identification model. Because the forgetting factor decays exponentially, the influence on the final result of a classifier trained on account data that was acquired longer ago is reduced, so the account identification model changes dynamically over time and the accuracy of current account identification is improved.
On the basis of the above technical solutions, optionally, the account identification model determining module includes:
a first classifier calculation formula determining unit, configured to determine a classifier calculation formula corresponding to each base classifier according to the following formula:
$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$
where $D(\alpha_t)$ is the base classifier obtained by training on the account data $\alpha_t$ acquired at time $t$; $\eta_t$ is the time-varying forgetting factor; and $D(\alpha_{t-1})$ is the base classifier obtained by training on the account data acquired at time $t-1$;
the time-varying forgetting factor is determined according to the following formula:
$\eta_t = C_0 \times e^{-\rho t}$
where $C_0$ and $\rho$ are preset constants.
On the basis of the above technical solutions, optionally, the account identification model determining module includes:
the account recognition model determining unit is used for determining an account recognition model according to the following formula:
Figure BDA0003347143900000111
where $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
On the basis of the above technical solutions, optionally, the apparatus further includes:
the difference degree obtaining module is used for obtaining the difference degree between the two base classifiers before the account number identification model determining module;
the target base classifier determining module is used for judging whether the difference degree is greater than the difference degree average value or not; if so, reserving a base classifier group corresponding to the difference degree larger than the average value of the difference degrees, and determining a base classifier corresponding to the base classifier group as a target base classifier;
correspondingly, the account identification model determining module includes:
and the second classifier calculation formula determining unit is used for determining a classifier calculation formula corresponding to each target base classifier according to a preset forgetting factor based on time change.
On the basis of the above technical solutions, optionally, the account characteristics include: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
Example four
Fig. 4 is a schematic structural diagram of an electronic apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the electronic apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of the processors 40 in the electronic device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the electronic apparatus may be connected by a bus or other means, and the bus connection is exemplified in fig. 4.
The memory 41 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the account identification model determination method in the embodiment of the present invention. The processor 40 executes various functional applications and data processing of the electronic device by executing software programs, instructions and modules stored in the memory 41, that is, implements the account identification model determination method described above.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
EXAMPLE five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform an account identification model determination method, the method including:
acquiring account data, and extracting features from the account data to obtain account features;
training on the account data through a preset training model, according to the acquisition time of the account data and the account features, to obtain at least two base classifiers;
and determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation, and determining an account identification model according to each classifier calculation formula.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the above-described method operations, and may also perform related operations in the account identification model determination method provided in any embodiment of the present invention.
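As a rough illustration of the time-varying forgetting factor used in the method above, the following Python sketch evaluates the factor in the form given in claim 2, η_t = C0 × e^(−ρt), and unrolls the recursion D(α_t) = (1 − η_t) × D(α_{t−1}) as a scalar scale factor. Treating the recursion as a scalar scaling is an assumption made only for this illustration; the function names and the constants `c0=0.5` and `rho=0.3` are hypothetical and do not come from the patent.

```python
import math


def forgetting_factor(t, c0, rho):
    """Time-varying forgetting factor: eta_t = C0 * exp(-rho * t)."""
    return c0 * math.exp(-rho * t)


def cumulative_retention(t_max, c0, rho):
    """Unroll D(alpha_t) = (1 - eta_t) * D(alpha_{t-1}) for t = 1..t_max.

    Assumption for illustration: the recursion is read as a scalar scale
    factor applied to D(alpha_0). Entry t of the returned list is the
    product of (1 - eta_k) for k = 1..t; entry 0 is 1.0.
    """
    scale = 1.0
    scales = [scale]
    for t in range(1, t_max + 1):
        scale *= 1.0 - forgetting_factor(t, c0, rho)
        scales.append(scale)
    return scales


if __name__ == "__main__":
    # Illustrative constants only; the patent leaves C0 and rho as presets.
    for t, s in enumerate(cumulative_retention(5, c0=0.5, rho=0.3)):
        print(t, round(forgetting_factor(t, 0.5, 0.3), 4), round(s, 4))
```

With a positive `rho`, the forgetting factor shrinks as `t` grows, so each additional step removes a smaller share than the step before it.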
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the account identification model determining apparatus, each included unit and module are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An account identification model determination method, characterized by comprising the following steps:
acquiring account data, and extracting features from the account data to obtain account features;
training on the account data through a preset training model, according to the acquisition time of the account data and the account features, to obtain at least two base classifiers;
and determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation, and determining an account identification model according to each classifier calculation formula.
2. The method according to claim 1, wherein determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation comprises:
determining a classifier calculation formula corresponding to each base classifier according to the following formula:
$$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$$
wherein $D(\alpha_t)$ is the base classifier obtained by training on the account data $\alpha_t$ whose acquisition time is time $t$; $\eta_t$ is the forgetting factor based on time variation; and $D(\alpha_{t-1})$ is the base classifier obtained by training on the account data whose acquisition time is time $t-1$;
the forgetting factor based on the time variation is determined according to the following formula:
$$\eta_t = C_0 \times e^{-\rho t}$$
wherein $C_0$ and $\rho$ are preset constants.
3. The method of claim 2, wherein determining an account identification model from each of the classifier computational formulas comprises:
determining the account identification model according to the following formula:
[Formula image FDA0003347143890000011; not reproduced in the text]
wherein $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
4. The method of claim 1, after obtaining at least two base classifiers, further comprising:
obtaining the degree of difference between each pair of base classifiers;
determining whether each degree of difference is greater than the average degree of difference; if so, retaining the base classifier group corresponding to the degree of difference greater than the average, and determining the base classifiers in that group as target base classifiers;
correspondingly, determining a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation comprises:
determining a classifier calculation formula corresponding to each target base classifier according to the preset forgetting factor based on time variation.
5. The method of claim 1, wherein the account features comprise: at least one of a user interest count, a user endorsed count, a user text transmission count, an account survival time, a content duplication count, an account associated address, and an account profile.
6. An account identification model determination apparatus, comprising:
an account feature extraction module, configured to acquire account data and extract features from the account data to obtain account features;
a base classifier acquisition module, configured to train on the account data through a preset training model, according to the acquisition time of the account data and the account features, to obtain at least two base classifiers;
and an account identification model determining module, configured to determine a classifier calculation formula corresponding to each base classifier according to a preset forgetting factor based on time variation, and to determine an account identification model according to each classifier calculation formula.
7. The apparatus of claim 6, wherein the account identification model determining module comprises:
a first classifier calculation formula determining unit, configured to determine a classifier calculation formula corresponding to each base classifier according to the following formula:
$$D(\alpha_t) = (1 - \eta_t) \times D(\alpha_{t-1})$$
wherein $D(\alpha_t)$ is the base classifier obtained by training on the account data $\alpha_t$ whose acquisition time is time $t$; $\eta_t$ is the forgetting factor based on time variation; and $D(\alpha_{t-1})$ is the base classifier obtained by training on the account data whose acquisition time is time $t-1$;
the forgetting factor based on the time variation is determined according to the following formula:
$$\eta_t = C_0 \times e^{-\rho t}$$
wherein $C_0$ and $\rho$ are preset constants.
8. The apparatus of claim 6, wherein the account identification model determining module comprises:
an account identification model determining unit, configured to determine the account identification model according to the following formula:
[Formula image FDA0003347143890000031; not reproduced in the text]
wherein $W_t$ is the account identification model; $C$ is a preset constant; and $T$ is the current time.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the account identification model determination method of any one of claims 1-5.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the account identification model determination method according to any one of claims 1 to 5.
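The screening of base classifiers described in claim 4 can be sketched roughly as follows in Python. The claims do not state how the degree of difference is measured, so this sketch assumes a simple disagreement rate (the fraction of samples on which two classifiers predict different labels) and assumes the difference is computed for every pair of base classifiers. The function names, the classifier names, and the sample predictions are all hypothetical.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Assumed difference degree: share of samples where predictions differ."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equal in length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_target_classifiers(predictions):
    """Keep classifiers that belong to a pair whose difference exceeds the mean.

    predictions maps a (hypothetical) classifier name to its predicted labels
    on a shared evaluation set. Returns the retained names in sorted order.
    """
    if len(predictions) < 2:
        raise ValueError("at least two base classifiers are required")
    pair_diff = {
        (i, j): disagreement(predictions[i], predictions[j])
        for i, j in combinations(sorted(predictions), 2)
    }
    mean_diff = sum(pair_diff.values()) / len(pair_diff)
    kept = set()
    for pair, diff in pair_diff.items():
        if diff > mean_diff:  # retain only pairs above the average difference
            kept.update(pair)
    return sorted(kept)


if __name__ == "__main__":
    example = {
        "clf_a": [1, 1, 0, 0],
        "clf_b": [0, 0, 1, 1],
        "clf_c": [1, 1, 1, 1],
    }
    print(select_target_classifiers(example))
```

In this example only the pair (`clf_a`, `clf_b`) differs by more than the average, so `clf_c` is dropped. If every pair has the same difference, no pair exceeds the average and the sketch returns an empty list; how the patented method handles that case is not stated in the claims.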
CN202111326003.0A 2021-11-10 2021-11-10 Account identification model determining method, device, equipment and medium Pending CN114091586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111326003.0A CN114091586A (en) 2021-11-10 2021-11-10 Account identification model determining method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111326003.0A CN114091586A (en) 2021-11-10 2021-11-10 Account identification model determining method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114091586A true CN114091586A (en) 2022-02-25

Family

ID=80299585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111326003.0A Pending CN114091586A (en) 2021-11-10 2021-11-10 Account identification model determining method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114091586A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117879863A (en) * 2023-11-30 2024-04-12 电子科技大学长三角研究院(湖州) Multi-layer coping system for botnet based on elastic complex social network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886518A (en) * 2015-12-15 2017-06-23 国家计算机网络与信息安全管理中心 A kind of method of microblog account classification
CN110070060A (en) * 2019-04-26 2019-07-30 天津开发区精诺瀚海数据科技有限公司 A kind of method for diagnosing faults of bearing apparatus
US20200366698A1 (en) * 2019-05-13 2020-11-19 Feedzai-Consultadoria e Inovação Tecnologica, S.A. Automatic model monitoring for data streams

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KRAWCZYK, B. et al.: "Weighted Naïve Bayes Classifier with Forgetting for Drifting Data Streams", 2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, 9 October 2015 (2015-10-09) *
吕林涛; 姬娜; 张九龙: "Suspicious transaction monitoring model based on RBF neural network", Computer Engineering and Applications (计算机工程与应用), no. 03, 21 January 2010 (2010-01-21) *
詹天晟; 陈德华; 乐嘉锦; 王梅: "User interest model based on massive search history data", Journal of Computer Applications (计算机应用), no. 2, 15 December 2014 (2014-12-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination