WO2020257991A1 - User identification method and related product - Google Patents
- Publication number
- WO2020257991A1 (PCT/CN2019/092592)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- target user
- identified
- input
- groups
Classifications
- G06Q30/02: Marketing; Price estimation or determination; Fundraising
- G06Q30/06: Buying, selling or leasing transactions
- Both fall under G (Physics) > G06 (Computing; calculating or counting) > G06Q (Information and communication technology [ICT] specially adapted for administrative, commercial, financial, managerial or supervisory purposes; systems or methods specially adapted for administrative, commercial, financial, managerial or supervisory purposes, not otherwise provided for) > G06Q30/00 (Commerce).
Definitions
- This application relates to the field of communication technology, and specifically to a user identification method and related products.
- The more prominent the position in which useful resources are pushed to users, the greater the value users perceive and the better the resources perform.
- A resource displayed in a better position keeps accumulating better indicators, such as user clicks or other user behavior.
- To obtain such positions, some content producers resort to brushing, that is, artificially inflating clicks, downloads, or other user behavior.
- Brushing lets them obtain better positions for their resources on the one hand and more exposure to real users on the other. From the perspective of the resource display platform, however, it makes the platform unfair to resources and causes users to distrust the platform. How to identify brushing users has therefore become an urgent problem.
- Current approaches identify brushing users one at a time, for example by detecting one account logged in on multiple mobile phones, multiple accounts registered and logged in on one mobile phone, or one mobile phone continuously accessing the same URL or visiting it far more often than normal. The accuracy of such identification is low.
- The embodiments of the present application provide a user identification method and related products that can improve the accuracy of identifying brushing users.
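- In a first aspect, an embodiment of the present application provides a user identification method, including:
- when a target user ID needs to be identified, determining whether there are N identified brushing groups, where the N brushing groups are classified according to group user rules, the number of brushing user IDs contained in any one of the N brushing groups is greater than a preset number threshold, and N is a positive integer;
- if there are, obtaining input characteristics of the target user ID, the input characteristics including user location characteristics, user APP usage characteristics, user equipment usage characteristics, and user click-through rate (CTR) characteristics;
- identifying, based on the input characteristics of the target user ID, the similarity between the target user ID and each of the N brushing groups; and
- if, among the N brushing groups, there is a brushing group whose similarity with the target user ID is greater than a preset similarity threshold, determining that the target user ID is a brushing user ID.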
- an embodiment of the present application provides a user identification device, the user identification device including a first determining unit, an acquiring unit, an identifying unit, and a second determining unit, wherein:
- the first determining unit is configured to determine, when the target user ID needs to be identified, whether there are N identified brushing groups, where the N brushing groups are classified according to group user rules, the number of brushing user IDs contained in any one of the N brushing groups is greater than the preset number threshold, and N is a positive integer;
- the acquiring unit is configured to acquire the input characteristics of the target user ID when the first determining unit determines that there are N identified brushing groups, the input characteristics including user location characteristics, user APP usage characteristics, user equipment usage characteristics, and user CTR characteristics;
- the identifying unit is configured to identify, based on the input characteristics of the target user ID, the similarity between the target user ID and each of the N brushing groups; and
- the second determining unit is configured to determine that the target user ID is a brushing user ID when, among the N brushing groups, there is a brushing group whose similarity with the target user ID is greater than a preset similarity threshold.
- An embodiment of the present application provides a server, including a processor and a memory. The memory stores one or more programs configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiments of the present application.
- An embodiment of the present application provides a computer-readable storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
- An embodiment of the present application provides a computer program product, including a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
- the computer program product may be a software installation package.
- Specifically, the user identification method described in the embodiments of the present application includes the following steps. When the target user ID needs to be identified, it is determined whether there are N identified brushing groups; the N brushing groups are classified according to group user rules, any one of them contains more brushing user IDs than a preset number threshold, and N is a positive integer. If such groups exist, the input characteristics of the target user ID are obtained, including user location characteristics, user APP usage characteristics, user equipment usage characteristics, and user click-through rate (CTR) characteristics. Based on these input characteristics, the similarity between the target user ID and each of the N brushing groups is identified. If any of the N brushing groups has a similarity with the target user ID greater than a preset similarity threshold, the target user ID is determined to be a brushing user ID.
- In the embodiments, when a target user ID is identified, it is compared for similarity against the identified brushing groups. If the similarity is greater than the preset similarity threshold, the target user ID can be directly identified as a brushing user ID. Because brushing users tend to share the characteristics of a brushing group, similarity matching against brushing groups can quickly and accurately determine whether the target user ID is a brushing user ID, improving the accuracy of brushing-user identification.
- FIG. 1 is a schematic flowchart of a user identification method disclosed in an embodiment of the present application.
- FIG. 2 is a schematic flowchart of another user identification method disclosed in an embodiment of the present application.
- FIG. 3 is a schematic flowchart of an algorithm for identifying brushing users disclosed in an embodiment of the present application.
- FIG. 4 is a schematic flowchart of another user identification method disclosed in an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a user identification device disclosed in an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a server disclosed in an embodiment of the present application.
- The mobile terminals involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
- FIG. 1 is a schematic flowchart of a user identification method disclosed in an embodiment of the present application. As shown in FIG. 1, the user identification method includes the following steps.
- When the target user ID needs to be identified, the server determines whether there are N identified brushing groups.
- The N brushing groups are classified according to the group user rules, the number of brushing user IDs contained in any one of the N brushing groups is greater than the preset number threshold, and N is a positive integer.
- The server serves clients; the service includes providing resources to clients and storing client data.
- The server is a service program, and the device running it may also be called a server.
- The server can establish connections with, and provide services to, multiple clients at the same time.
- The server can be used to identify brushing user IDs.
- the client, content provider, and server can form a content distribution system.
- the client is a content distribution client.
- the client can provide a display interface for displaying various content resources. Different content resources occupy different positions of the display interface.
- The content distribution system counts the clicks or downloads of each content resource on each client and, according to those counts, determines the position on the client's display interface at which each content resource is displayed.
- the server displays the content of the content provider on the display interface of the client.
- The content resources can be application resources, audio and video resources, and so on. APP resources are used as the example below.
- The content distribution system usually counts the clicks or downloads of various APPs, displays the APPs at different positions of the content distribution platform (that is, the client) based on these statistics, recommends APPs with many downloads to users, and allocates dedicated resources, such as ranking lists, for operation.
- APP producers (that is, content providers) or APP publishers may use a brushing application to inflate the clicks or downloads of an APP.
- The APP publisher issues a brushing task request through the brushing application, and a terminal on which the brushing application is installed obtains the request. The terminal then uses the brushing application to generate users who do not really exist, that is, brushing users.
- APPs recommended on the basis of such fake click or download data may not be high-quality APPs, which undermines users' trust in the content distribution platform.
- The content distribution platform therefore needs to identify which of the users who click or view a given APP are brushing users.
- To identify whether the target user ID is a brushing user ID, the server first determines whether there are N identified brushing groups.
- The N brushing groups are classified according to the group user rules.
- The group user rules can be determined based on the location of the device corresponding to the user ID, the time series of use of the application corresponding to the user ID, the cumulative usage time of that application, its usage frequency, and the ratio of its usage time to the usage time of all applications on the client.
- For example, the rules may require that the device locations are the same, the time series are similar, the cumulative usage time is greater than a duration threshold (for example, 2 hours), the usage frequency is greater than a frequency threshold (for example, 100 times), and the ratio of the usage time of the application corresponding to the user ID to that of all applications on the client is greater than a ratio threshold (for example, 80%).
- The number of brushing user IDs included in a brushing group is greater than the preset number threshold, which can be set in advance and stored in the memory of the server (for example, non-volatile memory).
- the preset number threshold may be an integer greater than or equal to 2, for example, the preset number threshold may be set to 5.
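The per-ID usage conditions and the group-size condition can be expressed as plain predicates. This is a minimal sketch assuming the example values given above (2 hours, 100 uses, 80%, and a preset number threshold of 5); the names `UsageStats`, `matches_usage_rule`, and `is_brushing_group` are hypothetical, and the location and time-series conditions are left to the grouping step.

```python
from dataclasses import dataclass

# Example thresholds from the text; real values are deployment-specific assumptions.
DURATION_THRESHOLD_H = 2.0    # cumulative usage time of the app, in hours
FREQUENCY_THRESHOLD = 100     # number of times the app was used
RATIO_THRESHOLD = 0.8         # share of the app in all app usage time on the client
PRESET_NUMBER_THRESHOLD = 5   # a group needs MORE than this many brushing IDs


@dataclass
class UsageStats:
    user_id: str
    cumulative_hours: float
    use_count: int
    usage_ratio: float  # app usage time / usage time of all apps on the client


def matches_usage_rule(s: UsageStats) -> bool:
    """Per-ID usage conditions of the group user rule: all must exceed their thresholds."""
    return (s.cumulative_hours > DURATION_THRESHOLD_H
            and s.use_count > FREQUENCY_THRESHOLD
            and s.usage_ratio > RATIO_THRESHOLD)


def is_brushing_group(candidates: list) -> bool:
    """A candidate group counts as a brushing group only if it holds more
    brushing user IDs than the preset number threshold."""
    brushing = [c for c in candidates if matches_usage_rule(c)]
    return len(brushing) > PRESET_NUMBER_THRESHOLD


if __name__ == "__main__":
    members = [UsageStats(f"u{i}", 3.0, 150, 0.9) for i in range(6)]
    print(is_brushing_group(members))      # True: 6 > 5
    print(is_brushing_group(members[:5]))  # False: 5 is not > 5
```

The strict `>` comparisons mirror the wording "greater than" in the text.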
- the group user rule is determined based on the location of the device corresponding to the user ID and the time sequence used by the application corresponding to the user ID.
- Specifically, among the identified brushing user IDs, the server classifies those whose corresponding device positions are less than a preset distance threshold apart and whose application usage time series fall within a first preset time period into a first type of brushing group.
- Before determining whether there are N identified brushing groups, the server can use the group user rules to classify the multiple identified brushing user IDs: brushing user IDs whose corresponding device positions are less than the preset distance threshold apart and whose application usage time series fall within the first preset time period are classified into the same type of brushing group.
- The application usage time series is APP usage data carrying time labels: a timestamp is recorded for each APP operation to record when it occurred. Because group users concentrate their activity in a certain period, the APP usage time series of users in the same group are highly similar.
- By classifying brushing user IDs according to the distance between their device positions and the similarity of their application usage time series, the embodiment of the present application improves the accuracy of brushing-group classification.
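One way to realize this classification is to link every pair of identified brushing user IDs whose devices are closer than the distance threshold and whose APP timestamps all fall within the first preset time period, then take connected components as groups. This is a sketch under stated assumptions: the haversine distance, union-find grouping, hour-valued timestamps, and the function names are illustrative choices, not something the application specifies.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def in_window(timestamps, start, end):
    """True if every APP operation timestamp lies in [start, end)."""
    return all(start <= t < end for t in timestamps)


def group_brushing_ids(users, dist_km, start, end, min_size):
    """users maps user_id -> ((lat, lon), [timestamps]).
    Returns groups (sorted ID lists) with more than min_size members."""
    ids = [u for u, (_, ts) in users.items() if in_window(ts, start, end)]
    parent = {u: u for u in ids}

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Link every pair of IDs whose devices are within the distance threshold.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_km(users[a][0], users[b][0]) < dist_km:
                parent[find(a)] = find(b)

    groups = {}
    for u in ids:
        groups.setdefault(find(u), []).append(u)
    return [sorted(g) for g in groups.values() if len(g) > min_size]
```

Single-linkage grouping like this can chain distant devices through intermediate ones; a production system might prefer a density-based clustering, which is a design choice outside what the text describes.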
- the server obtains the input characteristics of the target user ID.
- the input characteristics include user location characteristics, user APP usage characteristics, user equipment usage characteristics, and user click-through rate CTR characteristics.
- the server acquiring the input feature of the target user ID may specifically be: the server extracting the input feature of the target user ID from the historical behavior data of the target user ID.
- The historical behavior data of the target user ID may include the location information of the device on which the target user ID logged in within a preset time period, the APP usage information of the target user ID, the usage information of that device, and the CTR characteristics of the target user ID.
- the embodiment of the present application adds the location characteristics of the user as one of the considerations.
- Although a brushing application simulates normal terminal operation, because it executes brushing tasks, the open rate and usage time of the brushing-task content are higher, while other APPs are used less.
- The embodiment therefore examines the frequency and duration of use of the APPs commonly used on the terminal and the time distribution of APP use across the whole terminal, adding the user APP usage characteristics as one of the considerations.
- The operation behavior of the terminal also differs, for example whether there is call history, whether a card (such as a SIM card) is inserted, and whether short messages are received; the user equipment usage characteristics are therefore added as one of the considerations. Since the final success indicator of brushing is the exposure click-through rate, the download rate, or the success rate of a certain behavior, brushing users click CTR-related tasks markedly more than other users; the user CTR characteristics are therefore also used as one of the considerations.
- The user location characteristics include the location characteristics of the device on which the target user ID is logged in, such as the device location at login and the range over which the device location changes.
- The user APP usage characteristics include the usage time, the usage frequency, and the usage time distribution of the target APP on which the user ID is logged in.
- The longer the usage time of the target APP, the higher its usage frequency, and the more concentrated its usage time distribution, the greater the probability that the user ID is a brushing user ID.
- The user equipment usage characteristics include the usage characteristics of the device on which the target user ID is logged in (for example, whether, during the login of the target user ID, the device has call records, has a card inserted, or receives short messages). Generally speaking, if the device has no call history, no card inserted, and no short message reception during the login of the target user ID, the user ID is more likely to be a brushing user ID.
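To feed the four kinds of characteristics into a model they have to be flattened into one numeric vector. The sketch below assumes one possible encoding (the share of operations in the busiest hour as a measure of how concentrated the usage time distribution is, and 0/1 flags for the equipment signals); `InputFeatures`, its field names, and `usage_concentration` are hypothetical, since the application does not fix a feature layout.

```python
from collections import Counter
from dataclasses import astuple, dataclass


def usage_concentration(timestamps_h):
    """Share of APP operations falling in the busiest hour of the day (0..1).
    Higher values mean a more concentrated usage time distribution."""
    if not timestamps_h:
        return 0.0
    counts = Counter(int(t) % 24 for t in timestamps_h)
    return max(counts.values()) / len(timestamps_h)


@dataclass
class InputFeatures:
    # User location characteristics
    login_lat: float
    login_lon: float
    location_range_km: float   # how far the device location moves
    # User APP usage characteristics (target APP)
    target_app_hours: float
    target_app_uses: int
    concentration: float       # e.g. from usage_concentration()
    # User equipment usage characteristics: 1 = observed during login, 0 = not
    has_call_history: int
    has_card: int
    has_sms: int
    # User CTR characteristics
    ctr: float

    def to_vector(self):
        """Flatten to a list of floats for the downstream models."""
        return [float(v) for v in astuple(self)]
```

For example, operations at hours 20.1, 20.5, 20.9 and 9.0 give a concentration of 0.75, because three of the four fall in hour 20.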
- CTR originates from search engines: after a user enters keywords, the search engine ranks relevant web pages according to factors such as bidding, and the user clicks into the websites of interest. Taking the total number of times a website is shown in search results as the total, the ratio of the number of times users click into the website to that total is called the click-through rate.
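The definition above reduces to a single ratio. A minimal sketch, assuming the click and impression counts are already available per user or per task; the function name is hypothetical.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions (times shown); 0.0 when nothing was shown."""
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if clicks > impressions:
        raise ValueError("clicks cannot exceed impressions")
    return clicks / impressions if impressions else 0.0


# A website shown 1000 times and clicked 30 times has a CTR of 3%.
print(f"{click_through_rate(30, 1000):.1%}")  # 3.0%
```

Returning 0.0 for zero impressions is an assumption that avoids a division by zero for brand-new IDs.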
- The server identifies the similarity between the target user ID and each of the N brushing groups based on the input characteristics of the target user ID.
- Each of the N brushing groups has characteristics common to the group.
- The common characteristics of a group include similar group positions and similar application usage time series.
- The server can calculate the location similarity between the user location characteristics of the target user ID and the group location characteristics of each of the N brushing groups, and the time similarity between the application usage time series of the target user ID and the group application usage time series of each brushing group. Based on the location similarity and the time similarity for each brushing group, the server determines the similarity between the target user ID and that brushing group.
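The application does not say how the location similarity and the time similarity are computed or combined. The sketch below assumes an exponential decay over haversine distance for location, cosine similarity of 24-bin hour-of-day histograms for the time series, and a weighted sum; `w_loc`, `scale_km`, and all function names are hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def hour_histogram(timestamps_h):
    """24-bin histogram of APP operation times (hour of day)."""
    hist = [0.0] * 24
    for t in timestamps_h:
        hist[int(t) % 24] += 1.0
    return hist


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def group_similarity(user_loc, user_ts, group_loc, group_ts, w_loc=0.5, scale_km=5.0):
    """Weighted sum of a location similarity and a time-series similarity, both in [0, 1].
    group_loc could be the centroid of the group's device positions and group_ts
    the pooled timestamps of its members."""
    loc_sim = math.exp(-haversine_km(user_loc, group_loc) / scale_km)
    time_sim = cosine(hour_histogram(user_ts), hour_histogram(group_ts))
    return w_loc * loc_sim + (1.0 - w_loc) * time_sim
```

A user at the group's location who is active in the same hours scores close to 1, while a distant user active at different hours scores close to 0.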
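- If, among the N brushing groups, there is a brushing group whose similarity with the target user ID is greater than the preset similarity threshold, the server determines the target user ID as a brushing user ID.
- That is, the target user ID is classified into the matching (target) brushing group and is determined to be a brushing user ID.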
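The decision of the FIG. 1 flow can be sketched as a small function that takes any similarity measure. When several groups exceed the threshold, the text does not say which one becomes the target brushing group; this sketch assumes the most similar one. `match_brushing_group` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence, TypeVar

F = TypeVar("F")  # type of the target user's input characteristics
G = TypeVar("G")  # type of a brushing group's description


def match_brushing_group(target: F, groups: Sequence[G],
                         similarity: Callable[[F, G], float],
                         threshold: float) -> Optional[int]:
    """Return the index of the brushing group the target user ID joins,
    or None if group similarity cannot classify it as a brushing user ID."""
    if not groups:                      # no identified brushing groups yet
        return None
    scores = [similarity(target, g) for g in groups]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > threshold else None
```

A `None` result means the target user ID is not settled by group matching alone.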
- FIG. 2 is a schematic flowchart of another user identification method disclosed in an embodiment of the present application.
- FIG. 2 further optimizes the method of FIG. 1.
- the user identification method includes the following steps.
- When the target user ID needs to be identified, the server determines whether there are N identified brushing groups.
- The N brushing groups are classified according to group user rules, the number of brushing user IDs contained in any one of the N brushing groups is greater than the preset number threshold, and N is a positive integer.
- If so, the server obtains the input characteristics of the target user ID.
- The input characteristics include user location characteristics, user APP usage characteristics, user equipment usage characteristics, and user CTR characteristics.
- The server identifies the similarity between the target user ID and each of the N brushing groups based on the input characteristics of the target user ID.
- If there is a brushing group whose similarity with the target user ID is greater than the preset similarity threshold, the server determines that the target user ID is a brushing user ID.
- For steps 201 to 204 of this embodiment, refer to the description of steps 101 to 104 shown in FIG. 1; details are not repeated here.
- If none of the N brushing groups has a similarity with the target user ID greater than the preset similarity threshold, the server inputs the input characteristics of the target user ID into a trained binary classification model to obtain a preliminary classification result of the input characteristics.
- The server inputs the preliminary classification result into a trained classifier to obtain an intermediate calculation result, and inputs the intermediate calculation result into a trained neural network model to obtain the identification result of the target user ID.
- The binary classification model can fuse multiple algorithms.
- Specifically, it can include one or a combination of binary classifiers based on the k-nearest neighbor (KNN) algorithm, the logistic regression (LR) algorithm, and the support vector machine (SVM) algorithm.
- The classifier may include an eXtreme Gradient Boosting (XGBoost) classifier or a random forest classifier.
- FIG. 3 is a schematic flowchart of an algorithm for identifying brushing users disclosed in an embodiment of the present application.
- As shown in FIG. 3, the input features of the target user are first fed into the binary classification models.
- The KNN, LR, and SVM algorithms in the binary classification models are single algorithms used to classify the input features of the target user.
- The intermediate results of this classification are input to the classifier.
- XGBoost and random forest in the classifier are fusion algorithms that perform a further calculation on the intermediate results output by the binary classification models; the outputs of the classifier are then input into the neural network model, which finally produces the identification result of the target user.
- The identification result for the target user has only two possible values: brushing user or non-brushing user.
- The identification process for the target user ID in the embodiments of this application applies a single algorithm, a fusion algorithm, and a neural network in succession.
- The single algorithms preliminarily classify the input features, reducing the computational complexity of the subsequent fusion algorithm.
- The fusion algorithm takes into account the possibility that the user is a brushing user, as indicated by the single algorithms, which helps ensure the accuracy of its calculation results.
- The neural network model further reduces the possibility of misjudgment, thereby improving the accuracy of the identification result for the target user ID.
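The three-stage cascade (single algorithms, then fusion, then a final model) is a form of stacked generalization. The pure-Python sketch below only illustrates the data flow and makes loud substitutions: SVM is omitted, two logistic regressions stand in for the XGBoost and random forest fusion stage, and a single sigmoid neuron (again a logistic regression) stands in for the neural network. The half-and-half split used to train the upper levels on held-out outputs is also an assumption. `KNN`, `LogReg`, and `Cascade` are hypothetical names.

```python
import math


class KNN:
    """Level 1 single algorithm: k-nearest neighbours, returns P(brushing)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_proba(self, x):
        ranked = sorted(zip(self.X, self.y), key=lambda p: math.dist(x, p[0]))
        nearest = ranked[: self.k]
        return sum(label for _, label in nearest) / len(nearest)


class LogReg:
    """Logistic regression fitted by per-sample gradient descent on log-loss."""

    def __init__(self, lr=0.1, epochs=300):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                g = self.predict_proba(xi) - yi  # d(log-loss)/d(logit)
                self.w = [w - self.lr * g * v for w, v in zip(self.w, xi)]
                self.b -= self.lr * g
        return self

    def predict_proba(self, x):
        z = sum(w * v for w, v in zip(self.w, x)) + self.b
        z = max(-60.0, min(60.0, z))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))


class Cascade:
    """Single algorithms -> fusion level -> final model; each level is
    trained on the outputs of the level before it."""

    def __init__(self, level1, level2, final):
        self.level1, self.level2, self.final = level1, level2, final

    @staticmethod
    def _outputs(models, x):
        return [m.predict_proba(x) for m in models]

    def fit(self, X, y):
        # Level 1 learns on the first half; the higher levels learn on level-1
        # outputs for the held-out second half, never on in-sample predictions.
        half = len(X) // 2
        for m in self.level1:
            m.fit(X[:half], y[:half])
        z1 = [self._outputs(self.level1, x) for x in X[half:]]
        for m in self.level2:
            m.fit(z1, y[half:])
        z2 = [self._outputs(self.level2, z) for z in z1]
        self.final.fit(z2, y[half:])
        return self

    def predict(self, x):
        """1 = brushing user, 0 = non-brushing user."""
        z2 = self._outputs(self.level2, self._outputs(self.level1, x))
        return int(self.final.predict_proba(z2) >= 0.5)
```

For example, `Cascade([KNN(3), LogReg()], [LogReg(), LogReg(lr=0.05)], LogReg())` mirrors the three stages. In practice, library implementations of SVM, XGBoost, random forest, and a multilayer network would replace the stand-ins, and out-of-fold predictions would make better use of the data than a single split.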
- Optionally, before step 205, the following steps may be performed:
- the server extracts the input features of a first user ID, where the first user ID is any one of M user IDs to be identified and M is a positive integer;
- the server uses the single-user rules to identify, among the M user IDs to be identified, P brushing user IDs and the remaining non-brushing user IDs, where P is a positive integer less than or equal to M;
- the server inputs the input features of the M user IDs to be identified into an initial binary classification model for training and obtains M training results;
- the server determines the trained initial binary classification model as the trained binary classification model.
- Each of the M user IDs to be identified can be classified by the single-user rules as a brushing user or not.
- The single-user rules can include: (1) the same user ID logs in on multiple terminals (for example, mobile phones) within a short time; (2) multiple user IDs are registered and logged in on one terminal at the same time; (3) one terminal continuously accesses the same URL, or its number of visits far exceeds that of ordinary users.
- Each of the M user IDs to be identified either satisfies all three single-user rules or satisfies none of them.
- A user ID that satisfies all three single-user rules is a brushing user ID.
- A user ID that satisfies none of the three single-user rules is a non-brushing user ID.
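Labeling the training samples with the three single-user rules can be sketched as follows. The per-ID counters and the thresholds (`max_terminals`, `max_ids`, `normal_visits`) are hypothetical placeholders, the "short time" window is assumed to be applied when the counters are computed, and IDs that satisfy only some rules are simply left out, which is an assumption since the text states that every ID satisfies either all rules or none.

```python
from dataclasses import dataclass


@dataclass
class LoginRecord:
    user_id: str
    terminals_in_window: int  # distinct terminals this ID logged in on within a short window
    ids_on_terminal: int      # distinct IDs registered/logged in on its terminal at once
    same_url_visits: int      # visits by its terminal to a single URL


def rule_hits(r: LoginRecord, max_terminals=1, max_ids=1, normal_visits=50):
    """Evaluate the three single-user rules; the thresholds are placeholders."""
    return (r.terminals_in_window > max_terminals,
            r.ids_on_terminal > max_ids,
            r.same_url_visits > normal_visits)


def label_samples(records):
    """Black samples satisfy all three rules, white samples satisfy none.
    IDs that satisfy only some rules are left out of the training set."""
    black, white = [], []
    for r in records:
        hits = rule_hits(r)
        if all(hits):
            black.append(r.user_id)
        elif not any(hits):
            white.append(r.user_id)
    return black, white
```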
- The brushing user IDs among the M user IDs to be identified serve as black samples for training the binary classification model, and the non-brushing user IDs serve as white samples. This ensures the accuracy of the initial training data and improves the training effect of the binary classification model.
- the value of M can be as large as possible.
- This embodiment of the application provides a method for training the binary classification model.
- The single-user rules are used to identify brushing users; the more reliably identified brushing users serve as black samples, and other normal users serve as white samples.
- The binary classification model makes predictions, and the accuracy of the predictions is measured.
- If a training result is wrong, the binary classification model is adjusted accordingly so that it does not make the same error next time.
- When the accuracy of the binary classification model reaches a first preset accuracy threshold, training stops, and the trained initial binary classification model is determined as the trained binary classification model.
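The "adjust on every wrong prediction, stop at an accuracy threshold" loop described above matches the classic perceptron update rule. The sketch uses a perceptron purely as a stand-in for the binary classification model (the text's model fuses KNN, LR, and SVM); `threshold` plays the role of the first preset accuracy threshold, and all names are hypothetical.

```python
def predict(w, b, x):
    """Linear threshold unit: 1 = brushing, 0 = normal."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def accuracy(w, b, X, y):
    return sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)


def train_until_accurate(X, y, threshold=0.95, lr=1.0, max_epochs=100):
    """Perceptron training: adjust the weights only when a prediction is wrong,
    and stop as soon as training accuracy reaches `threshold`."""
    w, b, acc = [0.0] * len(X[0]), 0.0, 0.0
    for epoch in range(1, max_epochs + 1):
        for x, t in zip(X, y):
            err = t - predict(w, b, x)  # -1, 0 or +1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        acc = accuracy(w, b, X, y)
        if acc >= threshold:
            return w, b, acc, epoch
    return w, b, acc, max_epochs
```

The `max_epochs` cap guards against data that never reaches the threshold, for example when black and white samples are not linearly separable.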
- Optionally, before step 206, the following steps may be performed:
- the server inputs the M training results into an initial classifier for calculation and obtains M intermediate calculation results;
- the server determines the trained initial classifier as the trained classifier.
- This embodiment provides a method for training the classifier: training with the previously identified, more reliable brushing users as black samples and other normal users as white samples yields a classifier with higher accuracy.
- Optionally, before step 206, the following steps may also be performed:
- the server inputs the M intermediate calculation results into an initial neural network model for training and obtains M recognition results;
- the server determines the trained initial neural network model as the trained neural network model.
- This embodiment provides a method for training the neural network model: training with the previously identified, more reliable brushing users as black samples and other normal users as white samples yields a neural network model with higher accuracy.
- FIG. 4 is a schematic flowchart of another user identification method disclosed in an embodiment of the present application.
- FIG. 4 further optimizes the method of FIG. 2.
- the user identification method includes the following steps.
- When the target user ID needs to be identified, the server determines whether there are N identified brushing groups.
- The N brushing groups are classified according to group user rules, the number of brushing user IDs contained in any one of the N brushing groups is greater than the preset number threshold, and N is a positive integer.
- If so, the server obtains the input characteristics of the target user ID.
- The input characteristics include user location characteristics, user APP usage characteristics, user equipment usage characteristics, and user CTR characteristics.
- The server identifies the similarity between the target user ID and each of the N brushing groups based on the input characteristics of the target user ID.
- If there is a brushing group whose similarity with the target user ID is greater than the preset similarity threshold, the server determines that the target user ID is a brushing user ID.
- If there is no such brushing group, the server inputs the input characteristics of the target user ID into the trained binary classification model to obtain a preliminary classification result of the input characteristics.
- The server inputs the preliminary classification result into the trained classifier to obtain an intermediate calculation result, and inputs the intermediate calculation result into the trained neural network model to obtain the identification result of the target user ID.
- For steps 401 to 406, refer to steps 201 to 206 shown in FIG. 2; details are not repeated here.
- The server determines whether there are multiple identified brushing user IDs.
- If so, the server identifies the similarity between the target user ID and each of the identified brushing user IDs.
- In step 409, if among the multiple brushing user IDs there is one whose similarity with the target user ID is greater than the preset similarity threshold, the server adds a brushing-user-related feature to the input features of the target user ID and then executes step 405.
- In step 405, the server inputs the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result of the input features of the target user ID.
- In other words, similarity can be calculated between the target user ID and each individual identified brushing user ID.
- A similarity analysis algorithm can be used to assess the user: the similarity between the target user ID and the brushing user IDs is calculated and used to extend the input characteristics of the target user ID, which improves the accuracy of identifying the target user ID and further determines whether the target user is a genuine brushing user.
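The feature extension can be sketched as appending one extra value to the target's feature vector. The text says only that "a brushing-user-related feature" is added; this sketch assumes that feature is the highest cosine similarity to any identified brushing user ID, and appends 0.0 when nothing exceeds the threshold so the vector length stays fixed for the trained models. `augment_features` and the default threshold are hypothetical.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def augment_features(target_vec, brushing_vecs, threshold=0.9):
    """Append one brushing-related feature: the highest cosine similarity to an
    identified brushing user ID if it exceeds the threshold, else 0.0."""
    best = max((cosine(target_vec, v) for v in brushing_vecs), default=0.0)
    extra = best if best > threshold else 0.0
    return list(target_vec) + [extra]
```

Because cosine similarity is sensitive to feature scale, the input vectors would normally be standardized first.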
- The embodiments of the present application may also use an unsupervised algorithm to perform group brushing identification, using a clustering algorithm or an isolation forest algorithm to identify abnormal users in a group.
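A compact isolation forest shows the unsupervised option: points that random axis-aligned splits isolate quickly receive scores near 1. This is a pure-Python sketch of the standard algorithm, not the application's implementation; the defaults (100 trees, subsample of 256) follow common practice and are assumptions, as are all function names.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup among
    n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + c(node[1])  # add expected depth of the unbuilt subtree
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per point in (0, 1]; values near 1 suggest an abnormal user."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
            for x in data]
```

Applied to the feature vectors of one brushing group, the highest-scoring members are the users who deviate most from the rest of the group.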
- the server includes hardware structures and/or software modules corresponding to each function.
- the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of the present invention.
- the embodiment of the present application may divide the server side into functional units according to the foregoing method examples.
- each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
- FIG. 5 is a schematic structural diagram of a user identification device disclosed in an embodiment of the present application.
- the user identification device 500 includes a first determination unit 501, an acquisition unit 502, an identification unit 503, and a second determination unit 504, wherein:
- the first determining unit 501 is configured to determine, when user identification needs to be performed on the target user ID, whether there are N identified fake-traffic groups.
- the N fake-traffic groups are classified according to group user rules, each of the N fake-traffic groups contains more fake-traffic user IDs than a preset number threshold, and N is a positive integer;
- the acquiring unit 502 is configured to acquire the input features of the target user ID when the first determining unit 501 determines that there are N identified fake-traffic groups, the input features including a user location feature, a user APP usage feature, a user device usage feature, and a user click-through rate (CTR) feature;
- the identification unit 503 is configured to identify, based on the input features of the target user ID, the similarity between the target user ID and each of the N fake-traffic groups;
- the second determining unit 504 is configured to determine that the target user ID is a fake-traffic user ID when the identification unit 503 recognizes that one of the N fake-traffic groups has a similarity to the target user ID greater than a preset similarity threshold.
- the user identification device 500 may further include a processing unit 505.
- the processing unit 505 is configured to: if the identification unit 503 recognizes that none of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, input the input features of the target user ID into a trained binary classification model to obtain a preliminary classification result for those input features;
- the processing unit 505 is further configured to input the preliminary classification result into the trained classifier to obtain an intermediate calculation result, and to input the intermediate calculation result into the trained neural network model to obtain the identification result of the target user ID.
- the processing unit 505 is further configured to, before inputting the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result, extract the first user ID
- the first user ID is any one of the M user IDs to be identified, and M is a positive integer
- the single-user rule is used to identify the fake-traffic user IDs and the non-fake-traffic user IDs among the M user IDs to be identified
- the trained initial binary classification model is determined to be the trained binary classification model.
- the processing unit 505 is further configured to, before inputting the preliminary classification result into the trained classifier to obtain the intermediate calculation result, input the M training results into an initial classifier to obtain M intermediate calculation results; when the accuracy of the M intermediate calculation results reaches a second preset accuracy threshold, the trained initial classifier is determined to be the trained classifier.
- the processing unit 505 is further configured to, before inputting the intermediate calculation result into the trained neural network model to obtain the identification result of the target user ID, input the M intermediate calculation results into an initial neural network model for training to obtain M identification results;
- the initial neural network model after training is determined to be the trained neural network model.
- the group user rule is determined based on the location of the device corresponding to each user ID and the time series of application usage corresponding to each user ID, and the processing unit 505 is further configured to determine whether there is an existing user ID in the first determining unit 501.
- the distance between the device positions corresponding to the multiple identified fake-traffic user IDs is smaller than the preset distance threshold, and the time series of application usage of the multiple fake-traffic user IDs
- the fake-traffic user IDs within the first preset time period are classified into the first type of fake-traffic group.
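One way to apply the location-and-time-series group rule is a pairwise check followed by union-find. The sketch rests on three assumptions:

- positions are planar coordinates, for example metres in a local projection;
- usage time series are equal-length hourly counts over the first preset time period;
- "similar" means a Pearson correlation at or above a hypothetical threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length usage series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def group_by_location_and_usage(
    positions: Dict[str, Tuple[float, float]],
    usage: Dict[str, Sequence[float]],
    max_distance: float,     # hypothetical preset distance threshold
    min_correlation: float,  # hypothetical "similar time series" threshold
) -> List[List[str]]:
    ids = sorted(positions)
    parent = {uid: uid for uid in ids}

    def find(uid: str) -> str:
        while parent[uid] != uid:
            parent[uid] = parent[parent[uid]]  # path halving
            uid = parent[uid]
        return uid

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            close = math.dist(positions[a], positions[b]) < max_distance
            if close and pearson(usage[a], usage[b]) >= min_correlation:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for uid in ids:
        groups.setdefault(find(uid), []).append(uid)
    # Largest groups first; ties broken by member IDs for a stable order.
    return sorted(groups.values(), key=lambda g: (-len(g), g))
```

Union-find makes grouping transitive (single linkage). Two IDs may therefore share a group through an intermediate ID even when they do not satisfy the rule directly. A stricter variant would require every pair in a group to satisfy it.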
- the processing unit 505 is further configured to perform the following steps when the first determining unit 501 determines that there are no N identified fake-traffic groups:
  - determine whether there are multiple identified fake-traffic user IDs;
  - if there are, identify the similarity between the target user ID and each of those identified fake-traffic user IDs;
  - if any of them has a similarity to the target user ID greater than the preset similarity threshold, add fake-traffic-related features to the input features of the target user ID;
  - input the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result for those input features.
- the first determining unit 501, acquiring unit 502, identifying unit 503, second determining unit 504, and processing unit 505 in FIG. 5 may be processors.
- when user identification is performed on the target user ID, the similarity between the target user and the identified fake-traffic groups can be evaluated. If the similarity is greater than the preset similarity threshold, the target user ID can be directly determined to be a fake-traffic user ID.
- because fake-traffic users usually act as groups, matching against identified fake-traffic groups can quickly and accurately determine whether the target user ID is a fake-traffic user ID, thereby improving the accuracy of identifying fake-traffic users.
- FIG. 6 is a schematic structural diagram of a server disclosed in an embodiment of the present application.
- the server 600 includes a processor 601 and a memory 602.
- the server 600 may also include a bus 603.
- the processor 601 and the memory 602 may be connected to each other through the bus 603.
- the bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the bus 603 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used to represent the bus in FIG. 6.
- the server 600 may further include an input communication interface 604, and the communication interface 604 may obtain data from an external device (for example, other servers or databases).
- the memory 602 is used to store one or more programs containing instructions; the processor 601 is used to call the instructions stored in the memory 602 to execute some or all of the method steps in FIGS. 1 to 4.
- when user identification is performed on the target user ID, the similarity between the target user and the identified fake-traffic groups can be evaluated. If the similarity is greater than the preset similarity threshold, the target user ID can be directly determined to be a fake-traffic user ID.
- because fake-traffic users usually act as groups, matching against identified fake-traffic groups can quickly and accurately determine whether the target user ID is a fake-traffic user ID, thereby improving the accuracy of identifying fake-traffic users.
- An embodiment of the present application also provides a computer storage medium that stores a computer program for electronic data exchange, where the computer program causes a computer to execute some or all of the steps of any user identification method described in the above method embodiments.
- the embodiments of the present application also provide a computer program product.
- the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
- the computer program is operable to cause a computer to execute some or all of the steps of any user identification method described in the foregoing method embodiments.
- the disclosed device may be implemented in other ways.
- the device embodiments described above are only illustrative.
- for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory.
- the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in each embodiment of the present invention.
- the aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
- the program can be stored in a computer-readable memory, and the memory can include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Engineering & Computer Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Disclosed in embodiments of the present application are a user identification method and a related product. The method comprises: when user identification on a target user ID is required, determining whether there are N fake traffic creating populations that have been identified, the N fake traffic creating populations being classified according to population user rules, and the number of fake traffic creating user IDs in any one of the N fake traffic creating populations being greater than a preset number threshold; if yes, obtaining input features of the target user ID, the input features comprising a user location feature, a user APP usage feature, a user equipment usage feature, and a user click-through rate (CTR) feature; identifying a similarity between the target user ID and each of the N fake traffic creating populations on the basis of the input features of the target user ID; and if there is a fake traffic creating population in the N fake traffic creating populations whose similarity to the target user ID is greater than a preset similarity threshold, determining that the target user ID is a fake traffic creating user ID. The embodiments of the present application can improve the identification accuracy of fake traffic creating users.
Description
This application relates to the field of communication technology, and specifically to a user identification method and related products.
On a resource display platform, pushing useful resources to users in prominent positions increases the value users perceive and improves the effectiveness of the resources. As display slots become increasingly limited, the metrics of a resource shown in a better position keep improving as it accumulates user clicks and other user behavior. To obtain more clicks, some content producers resort to fake traffic (volume brushing). On the one hand, this secures better display positions for themselves; on the other hand, it gains them more exposure to real users. From the platform's perspective, however, it makes resource allocation unfair, and users come to distrust the platform. How to identify fake-traffic users has therefore become an urgent problem.
Current fake-traffic identification mainly examines users one at a time. Examples include detecting one account logged in on multiple mobile phones, multiple accounts registered and logged in on one mobile phone, or one mobile phone accessing the same URL continuously or more often than ordinary users. The accuracy of current fake-traffic user identification is low.
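For contrast, the per-user rules described above can be sketched as simple counting heuristics over login and access logs. The log format and all three thresholds are hypothetical; the source gives no values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def flag_single_user_rules(
    logins: Iterable[Tuple[str, str]],  # (account_id, device_id) login records
    visits: Iterable[Tuple[str, str]],  # (device_id, url) access records
    max_devices_per_account: int = 3,   # hypothetical threshold
    max_accounts_per_device: int = 3,   # hypothetical threshold
    max_visits_per_url: int = 200,      # hypothetical threshold
) -> Tuple[Set[str], Set[str]]:
    devices_per_account: Dict[str, Set[str]] = defaultdict(set)
    accounts_per_device: Dict[str, Set[str]] = defaultdict(set)
    for account, device in logins:
        devices_per_account[account].add(device)
        accounts_per_device[device].add(account)

    visit_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for device, url in visits:
        visit_counts[(device, url)] += 1

    # Rule 1: one account logged in on many phones.
    flagged_accounts = {a for a, ds in devices_per_account.items() if len(ds) > max_devices_per_account}
    # Rule 2: one phone used to register or log in many accounts.
    flagged_devices = {d for d, accts in accounts_per_device.items() if len(accts) > max_accounts_per_device}
    # Rule 3: one phone hitting the same URL far more often than an ordinary user.
    flagged_devices |= {d for (d, _url), n in visit_counts.items() if n > max_visits_per_url}
    return flagged_accounts, flagged_devices
```

Each rule looks at a single account or device, which matches the "one by one" identification that the background describes.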
Summary of the invention
The embodiments of the present application provide a user identification method and related products, which can improve the accuracy of identifying fake-traffic users.
In a first aspect, an embodiment of the present application provides a user identification method, including:
when user identification needs to be performed on a target user ID, determining whether there are N identified fake-traffic groups, where the N fake-traffic groups are classified according to group user rules, each of the N fake-traffic groups contains more fake-traffic user IDs than a preset number threshold, and N is a positive integer;
if there are, obtaining input features of the target user ID, the input features including a user location feature, a user APP usage feature, a user device usage feature, and a user click-through rate (CTR) feature;
identifying, based on the input features of the target user ID, the similarity between the target user ID and each of the N fake-traffic groups;
if one of the N fake-traffic groups has a similarity to the target user ID greater than a preset similarity threshold, determining that the target user ID is a fake-traffic user ID.
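A minimal sketch of the first-aspect method. It assumes cosine similarity between the target's feature vector and each group's centroid; the source does not name a similarity measure. The thresholds are hypothetical. The three outcomes mirror the branches in the text:

- a group matches, so the target is a fake-traffic user ID;
- groups exist but none matches, so the model-based path applies;
- there are no identified groups, so the single-ID fallback applies.

```python
import math
from enum import Enum
from typing import Dict, List, Sequence


class Outcome(Enum):
    FAKE_TRAFFIC = "fake_traffic"      # a group is similar enough: fake-traffic user ID
    NO_GROUP_MATCH = "no_group_match"  # groups exist, none similar enough
    NO_GROUPS = "no_groups"            # no identified groups to compare against


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify_target(
    target: Sequence[float],                   # [location, APP usage, device usage, CTR], normalized (assumed)
    groups: Dict[str, List[Sequence[float]]],  # group id -> member feature vectors (hypothetical layout)
    min_group_size: int = 5,                   # preset number threshold
    similarity_threshold: float = 0.9,         # hypothetical preset similarity threshold
) -> Outcome:
    # Only groups with more than min_group_size members count as identified fake-traffic groups.
    identified = [members for members in groups.values() if len(members) > min_group_size]
    if not identified:
        return Outcome.NO_GROUPS
    for members in identified:
        if cosine_similarity(target, centroid(members)) > similarity_threshold:
            return Outcome.FAKE_TRAFFIC
    return Outcome.NO_GROUP_MATCH
```

Comparing against a centroid is one simple choice. Taking the maximum similarity to any group member would be another.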
In a second aspect, an embodiment of the present application provides a user identification device, which includes a first determining unit, an acquiring unit, an identifying unit, and a second determining unit, wherein:
The first determining unit is configured to determine, when user identification needs to be performed on a target user ID, whether there are N identified fake-traffic groups, where the N fake-traffic groups are classified according to group user rules, each of the N fake-traffic groups contains more fake-traffic user IDs than a preset number threshold, and N is a positive integer;
The acquiring unit is configured to acquire the input features of the target user ID when the first determining unit determines that there are N identified fake-traffic groups, the input features including a user location feature, a user APP usage feature, a user device usage feature, and a user click-through rate (CTR) feature;
The identifying unit is configured to identify, based on the input features of the target user ID, the similarity between the target user ID and each of the N fake-traffic groups;
The second determining unit is configured to determine that the target user ID is a fake-traffic user ID when the identifying unit identifies that one of the N fake-traffic groups has a similarity to the target user ID greater than a preset similarity threshold.
In a third aspect, an embodiment of the present application provides a server, including a processor and a memory. The memory is configured to store one or more programs, the one or more programs are configured to be executed by the processor, and the programs include instructions for executing the steps of the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program for electronic data exchange, where the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program. The computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that the user identification method described in the embodiments of the present application includes the following steps:
- when user identification needs to be performed on a target user ID, determining whether there are N identified fake-traffic groups, where the N groups are classified according to group user rules, each contains more fake-traffic user IDs than a preset number threshold, and N is a positive integer;
- if there are, obtaining the input features of the target user ID, including a user location feature, a user APP usage feature, a user device usage feature, and a user click-through rate (CTR) feature;
- identifying, based on these input features, the similarity between the target user ID and each of the N fake-traffic groups;
- if one of the N groups has a similarity to the target user ID greater than a preset similarity threshold, determining that the target user ID is a fake-traffic user ID.
In the embodiments of the present application, when user identification is performed on a target user ID, the similarity between the target user and the identified fake-traffic groups can be evaluated. If the similarity is greater than the preset similarity threshold, the target user ID can be directly determined to be a fake-traffic user ID. Because fake-traffic users usually act as groups, matching against identified fake-traffic groups can quickly and accurately determine whether the target user ID is a fake-traffic user ID, thereby improving the accuracy of identifying fake-traffic users.
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for the description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and persons of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a user identification method disclosed in an embodiment of the present application;
FIG. 2 is a schematic flowchart of another user identification method disclosed in an embodiment of the present application;
FIG. 3 is a schematic flowchart of an algorithm for identifying fake-traffic users disclosed in an embodiment of the present application;
FIG. 4 is a schematic flowchart of another user identification method disclosed in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a user identification device disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a server disclosed in an embodiment of the present application.
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", and so on in the specification, claims, and drawings of the present invention are used to distinguish different objects rather than to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes unlisted steps or units, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The mobile terminals involved in the embodiments of this application may include various handheld devices, vehicle-mounted devices, wearable devices, and computing devices with wireless communication functions, or other processing devices connected to wireless modems. They also include various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on. For ease of description, these devices are collectively referred to as mobile terminals.
The embodiments of the present application are described in detail below.
Refer to FIG. 1, which is a schematic flowchart of a user identification method disclosed in an embodiment of the present application. As shown in FIG. 1, the user identification method includes the following steps.
101. When user identification needs to be performed on a target user ID, the server determines whether there are N identified fake-traffic groups. The N fake-traffic groups are classified according to group user rules, each of them contains more fake-traffic user IDs than a preset number threshold, and N is a positive integer.
In the embodiments of the present application, the server serves clients, for example by providing resources to them and storing their data. The server is a dedicated service program, and a device running it may be called a server. The server can establish connections with multiple clients and serve them simultaneously. In the embodiments of the present application, the server can be used to identify fake-traffic user IDs.
The client, the content provider, and the server can form a content distribution system. The client is a content distribution client that provides a display interface for various content resources, with different content resources occupying different positions in the interface. The content distribution system counts the clicks or downloads of each content resource on each client and decides where in the client's display interface each resource is shown according to those counts. The content provider supplies content resources, and the server displays the provider's content on the client's display interface. There may be multiple content providers, multiple clients, and multiple servers. Content resources may be application (APP) resources, audio and video resources, and so on. APP resources are used as the example below.
The content distribution system usually counts the clicks or downloads of various APPs. Based on these statistics, it displays the APPs in different positions of the content distribution platform (that is, the client), recommends APPs with high download counts to users, and builds ranking lists for dedicated resources. APP producers (that is, content providers) therefore want their APPs to get more clicks or downloads, so that the APPs earn better display positions or are recommended to users by the platform. To obtain more clicks or downloads, an APP publisher may use a fake-traffic application to inflate them. The publisher sends a fake-traffic task request through the fake-traffic application. Terminals with the application installed receive the request and use it to generate users who do not really exist, namely fake-traffic users. These users click or watch the target APP, inflating its clicks or downloads. Decisions based on such inflated click or download counts often harm the content distribution platform. APPs recommended on the basis of fake traffic data may not be high-quality APPs, which erodes users' trust in the platform.
To reduce the negative impact of inflated APP clicks or downloads, the content distribution platform needs to identify which of the users who click or watch a given APP are fake-traffic users.
To identify whether a target user ID is a fake-traffic user ID, the server first determines whether there are N already-identified fake-traffic groups, which are obtained by classification according to group user rules. The group user rules may be determined based on the device location corresponding to the user ID, the time series of application usage corresponding to the user ID, the cumulative usage time of that application, the usage frequency of that application, and the ratio of that application's usage time to the usage time of all applications on the client. For example, user IDs whose devices are at the same location, whose time series are similar, whose cumulative usage time exceeds a duration threshold (for example, 2 hours), whose usage frequency exceeds a frequency threshold (for example, 100 times), and whose ratio of that application's usage time to the usage time of all applications on the client exceeds a ratio threshold (for example, 80%) may be placed in the same fake-traffic group.
The number of fake-traffic user IDs in a fake-traffic group is greater than a preset number threshold, which can be set in advance and stored in the server's memory (for example, non-volatile memory). The preset number threshold may be an integer greater than or equal to 2; for example, it may be set to 5.
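The threshold-based group user rule above can be sketched as a filter followed by grouping. This is a minimal illustration under stated assumptions, not the method as claimed: the record fields, the use of a shared location key (for example, a geohash cell) to stand in for "same device location", and the default thresholds (taken from the example values in the text) are assumptions, and the time-series similarity condition is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    user_id: str
    location: str       # assumption: coarse location key such as a geohash cell
    app_hours: float    # cumulative usage time of the promoted app, in hours
    app_launches: int   # usage frequency of the promoted app
    app_share: float    # promoted-app usage time / usage time of all apps on the client


def group_by_rules(records, min_hours=2.0, min_launches=100, min_share=0.8, min_group_size=5):
    """Group suspicious user IDs that share a device location (hypothetical helper)."""
    groups = defaultdict(list)
    for r in records:
        # All three usage thresholds must be exceeded for a record to count.
        if r.app_hours > min_hours and r.app_launches > min_launches and r.app_share > min_share:
            groups[r.location].append(r.user_id)
    # Keep only groups with more members than the preset number threshold.
    return {loc: ids for loc, ids in groups.items() if len(ids) > min_group_size}
```

Grouping by an exact location key keeps the sketch simple; a distance-based variant appears in the optional rule that follows in the text.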
Optionally, the group user rules are determined based on the device location corresponding to the user ID and the time series of application usage corresponding to the user ID. Before step 101 is performed, the following step may also be performed:
The server places into a first type of fake-traffic group those identified fake-traffic user IDs whose corresponding device locations are less than a preset distance threshold apart and whose application-usage time series fall within a first preset time period.
In this embodiment of the application, before determining whether there are N identified fake-traffic groups, the server may use the group user rules to classify the multiple identified fake-traffic user IDs: those whose device locations are less than the preset distance threshold apart and whose application-usage time series fall within the first preset time period are placed in the same type of fake-traffic group.
The application-usage time series is time-stamped APP usage data: every APP operation records a time-series label indicating when the operation occurred. Because fake-traffic users in a group tend to inflate traffic during the same period, the APP usage time series of users in the same group are highly similar. This embodiment can therefore classify fake-traffic user IDs according to the distance between their device locations and the similarity of their application-usage time series, which improves the accuracy of that classification.
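The distance-plus-time-window rule can be sketched with a haversine distance and a union-find merge. This is a hypothetical illustration: comparing the median operation timestamps of two users is an assumed stand-in for "time series within the first preset time period", and the 1 km, 1 hour, and group-size defaults are illustrative values, not values from the text.

```python
import math
from statistics import median


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points p and q."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_swipers(users, max_km=1.0, window_s=3600, min_group_size=5):
    """users: dict user_id -> ((lat, lon), [operation timestamps in seconds]).

    Returns sorted member lists of groups larger than min_group_size.
    """
    ids = list(users)
    parent = {u: u for u in ids}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (pa, ta), (pb, tb) = users[a], users[b]
            # Assumption: "same time period" means median operation times within window_s.
            if haversine_km(pa, pb) < max_km and abs(median(ta) - median(tb)) <= window_s:
                parent[find(a)] = find(b)

    groups = {}
    for u in ids:
        groups.setdefault(find(u), []).append(u)
    return [sorted(g) for g in groups.values() if len(g) > min_group_size]
```

Union-find lets groups form transitively, so a chain of nearby, co-active IDs ends up in one group even if its two ends are slightly farther apart than the threshold.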
102. If there are N identified fake-traffic groups, the server obtains the input features of the target user ID. The input features include a user location feature, a user APP usage feature, a user equipment usage feature, and a user click-through rate (CTR) feature.
In this embodiment of the application, the server may obtain the input features of the target user ID by extracting them from the target user ID's historical behavior data.
The historical behavior data of the target user ID may include, within a preset time period, the location information of the devices on which the target user ID logged in, the APP usage information of the target user ID, the usage information of those devices, and the CTR data of the target user ID.
Fake-traffic users tend to cluster geographically, favor certain locations, and move little, so this embodiment includes the user's location feature as one consideration. When a device is used to inflate traffic, its operation is made to resemble normal terminal use, but because of the fake-traffic task, the open rate and usage time of the task content are higher while the usage of other APPs is lower. This embodiment therefore examines how often and how long the terminal uses common APPs, as well as the time distribution of all APP usage on the terminal, which is why the user APP usage feature is included. Because fake-traffic operation serves a specific purpose, the terminal's operating behavior also differs, for example in whether there are call records, whether a SIM card is inserted, and whether text messages are received; the user equipment usage feature is included for this reason. Finally, because the ultimate measure of successful traffic inflation is the exposure click-through rate, the download rate, or the success rate of some behavior, a fake-traffic user's click counts on CTR-related tasks are markedly higher than those of other users, so the user CTR feature is also included.
The user location feature comprises location features of the device on which the target user ID is logged in (including the device's location when the user ID logs in, the range of change of that location, and so on). In general, the smaller the change in device location while the user ID is logged in, the more likely the user ID is a fake-traffic user.
The user APP usage feature includes the usage time, usage frequency, and usage time distribution of the target APP logged in to with the user ID. In general, the longer the usage time, the higher the usage frequency, and the more concentrated the usage time distribution of the target APP, the more likely the user ID is a fake-traffic user.
The user equipment usage feature comprises usage features of the device on which the target user ID is logged in (for example, whether the device has call records, whether a SIM card is inserted, and whether text messages are received while the target user ID is logged in). In general, if the device has no call records, no inserted card, and no received text messages while the target user ID is logged in, the user ID is more likely to be a fake-traffic user.
CTR originates in search: after a keyword is entered into a search engine, relevant web pages are ranked by factors such as bidding, and the user clicks into the sites that interest them. Taking the number of times a site appears in search results as the total, the ratio of the number of times users click into the site to that total is the click-through rate. In general, the higher the CTR of the target user ID, the more likely the user ID is a fake-traffic user.
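The four feature families described above can be turned into a flat feature vector roughly as follows. This is a hypothetical sketch: the History fields, the use of positional standard deviation as "range of location change", the count of distinct active hours as a concentration measure, and the per-user CTR (clicks on promoted content divided by its impressions, adapted from the search-engine definition) are all assumptions.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class History:
    positions: list            # [(lat, lon), ...] recorded at each login (assumption)
    target_app_minutes: float  # total usage time of the target APP
    target_app_launches: int   # usage frequency of the target APP
    active_hours: list         # hour of day (0-23) of each target-APP session
    has_calls: bool            # device has call records during login
    has_sim: bool              # SIM card inserted
    has_sms: bool              # text messages received
    impressions: int           # times promoted content was shown to the user
    clicks: int                # times the user clicked promoted content


def extract_features(h: History) -> list:
    """Build [location, APP usage x3, equipment usage, CTR] features (hypothetical layout)."""
    # Location: small spread of login positions suggests a stationary device.
    location_spread = pstdev([p[0] for p in h.positions]) + pstdev([p[1] for p in h.positions])
    # APP usage: fewer distinct hours means a more concentrated time distribution.
    distinct_hours = len(set(h.active_hours))
    # Equipment usage: count of missing "normal phone" signals, 0..3.
    device_silence = sum(not flag for flag in (h.has_calls, h.has_sim, h.has_sms))
    ctr = h.clicks / h.impressions if h.impressions else 0.0
    return [location_spread, h.target_app_minutes, h.target_app_launches,
            distinct_hours, device_silence, ctr]
```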
103. The server identifies, based on the input features of the target user ID, the similarity between the target user ID and each of the N fake-traffic groups.
In this embodiment of the application, each of the N fake-traffic groups has features common to the group, including similar group locations and similar group application-usage time series.
The server may compute a location similarity between the user location feature of the target user ID and the group location feature of each of the N fake-traffic groups, and a time similarity between the target user ID's application-usage time series and the group application-usage time series of each of the N groups. From the location similarity and the time similarity for each group, the server determines the similarity between the target user ID and each of the N fake-traffic groups.
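One way to combine the location similarity and the time similarity into a single score is sketched below. The text does not specify the similarity functions, so everything here is an assumption: location similarity decays exponentially with distance, the time series are summarized as 24-bin hour-of-day histograms compared by cosine similarity, and the two scores are blended with an illustrative weight `w_loc`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def location_similarity(p, q, scale_km=1.0):
    """1.0 at the same spot, decaying with distance (equirectangular approximation)."""
    dx = math.radians(q[1] - p[1]) * math.cos(math.radians((p[0] + q[0]) / 2))
    dy = math.radians(q[0] - p[0])
    d_km = 6371.0 * math.hypot(dx, dy)
    return math.exp(-d_km / scale_km)


def group_similarity(user, group, w_loc=0.5):
    """user/group: dicts with 'pos' (lat, lon) and 'hours' (24-bin activity histogram).

    The weighting scheme is a hypothetical choice, not taken from the text.
    """
    s_loc = location_similarity(user["pos"], group["pos"])
    s_time = cosine(user["hours"], group["hours"])
    return w_loc * s_loc + (1 - w_loc) * s_time
```

For a group, `pos` could be the centroid of member locations and `hours` the summed member histograms; both are assumptions.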
104. If any of the N fake-traffic groups has a similarity to the target user ID greater than a preset similarity threshold, the server determines that the target user ID is a fake-traffic user ID.
In this embodiment of the application, if any of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, the target user ID belongs to the target fake-traffic group, that is, the group among the N with the greatest similarity to it. The target user ID is then placed in that target fake-traffic group and determined to be a fake-traffic user ID.
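The decision in step 104 reduces to taking the most similar group and comparing it against the threshold. A minimal sketch, where the 0.8 default threshold and the dict-of-scores input are assumptions:

```python
def assign_to_group(similarities, threshold=0.8):
    """similarities: dict group_id -> similarity of the target user ID to that group.

    Returns the most similar group ID if its similarity exceeds the threshold
    (the target is then treated as a fake-traffic user ID), otherwise None.
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    return best if similarities[best] > threshold else None
```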
In this embodiment of the application, when the target user ID is to be identified, its similarity to the identified fake-traffic groups can be computed; if the similarity exceeds the preset similarity threshold, the target user ID can be directly deemed a fake-traffic user ID. Because fake-traffic users usually operate in groups, measuring similarity against fake-traffic groups can quickly and accurately determine whether the target user ID is a fake-traffic user ID, which improves the accuracy of fake-traffic user identification.
Refer to FIG. 2, which is a schematic flowchart of another user identification method disclosed in an embodiment of the present application. FIG. 2 further refines the method of FIG. 1. As shown in FIG. 2, the user identification method includes the following steps.
201. When user identification needs to be performed on a target user ID, the server determines whether there are N identified fake-traffic groups. The N groups are obtained by classification according to the group user rules, each of them contains more fake-traffic user IDs than the preset number threshold, and N is a positive integer.
202. If there are N identified fake-traffic groups, the server obtains the input features of the target user ID, which include a user location feature, a user APP usage feature, a user equipment usage feature, and a user click-through rate (CTR) feature.
203. The server identifies, based on the input features of the target user ID, the similarity between the target user ID and each of the N fake-traffic groups.
204. If any of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, the server determines that the target user ID is a fake-traffic user ID.
For the specific implementation of steps 201 to 204 in this embodiment, refer to the description of steps 101 to 104 shown in FIG. 1; details are not repeated here.
205. If none of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, the server inputs the input features of the target user ID into a trained binary classification model to obtain a preliminary classification result for those features.
206. The server inputs the preliminary classification result into a trained classifier to obtain an intermediate calculation result, and then inputs the intermediate calculation result into a trained neural network model to obtain the identification result of the target user ID.
In this embodiment of the application, if none of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, the target user ID does not belong to any of the N groups, and it must instead be identified using the trained binary classification model, the trained classifier, and the trained neural network model.
The binary classification model may fuse multiple algorithms; for example, it may be built from one or a combination of the k-nearest neighbor (KNN) classification algorithm, the logistic regression (LR) algorithm, and the support vector machine (SVM) algorithm.
The classifier may be an eXtreme Gradient Boosting (XGBoost) classifier or a random forest classifier.
For example, refer to FIG. 3, which is a schematic flowchart of an algorithm for identifying fake-traffic users disclosed in an embodiment of the present application. As shown in FIG. 3, the input features of the target user are first fed into the binary classifier, in which KNN, LR, and SVM act as single algorithms that classify the input features. The intermediate results of the binary classifier are then fed into the classifier, in which XGBoost and random forest act as fusion algorithms that perform a preliminary calculation on those results. The classifier's intermediate results are then fed into the neural network model, which produces the final identification result for the target user. The identification result takes one of two values: the target user is, or is not, a fake-traffic user.
In this embodiment, the target user ID is identified by a single algorithm, a fusion algorithm, and a neural network in turn. The single algorithms perform a preliminary classification of the input features, reducing the computational complexity of the subsequent fusion algorithm; the fusion algorithm accounts for the many ways fake-traffic users can behave, which ensures the accuracy of its results; and the final neural network model reduces the likelihood of misjudgment, improving the accuracy of the fake-traffic identification result for the target user ID.
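The three-stage cascade of FIG. 3 (single algorithms, then a fusion algorithm, then a neural network) resembles stacking and can be sketched with scikit-learn. This is an illustrative reconstruction, not the patented pipeline: it assumes scikit-learn is available, uses a random forest rather than XGBoost (the text allows either), uses a small MLP as the "neural network model", and trains later stages on out-of-fold probabilities, which is a design choice of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


class SwipeCascade:
    """Stage 1: KNN / LR / SVM -> Stage 2: random forest -> Stage 3: small MLP."""

    def __init__(self, seed=0):
        self.base = [KNeighborsClassifier(n_neighbors=5),
                     LogisticRegression(max_iter=1000),
                     SVC(probability=True, random_state=seed)]
        self.fusion = RandomForestClassifier(n_estimators=100, random_state=seed)
        self.net = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                                 max_iter=2000, random_state=seed)

    def _stage1(self, X):
        # One column per single algorithm: probability of "fake-traffic user".
        return np.column_stack([m.predict_proba(X)[:, 1] for m in self.base])

    def fit(self, X, y):
        # Out-of-fold probabilities keep each later stage from training on overfit outputs.
        s1 = np.column_stack([cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
                              for m in self.base])
        s2 = cross_val_predict(self.fusion, s1, y, cv=5, method="predict_proba")[:, [1]]
        for m in self.base:
            m.fit(X, y)
        self.fusion.fit(s1, y)
        self.net.fit(s2, y)
        return self

    def predict(self, X):
        s2 = self.fusion.predict_proba(self._stage1(X))[:, [1]]
        return self.net.predict(s2)  # 1 = fake-traffic user, 0 = genuine user
```

Labels here are assumed to be 1 for black samples and 0 for white samples, matching the training scheme described next in the text.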
Optionally, before step 205 is performed, the following steps may also be performed:
(11) The server extracts the input features of a first user ID, where the first user ID is any one of M user IDs to be identified and M is a positive integer;
(12) The server uses single-user rules to identify the P fake-traffic user IDs and the non-fake-traffic user IDs among the M user IDs to be identified, where P is a positive integer less than or equal to M;
(13) The server inputs the input features of the M user IDs to be identified into an initial binary classification model for training, obtaining M training results;
(14) When the accuracy of the M training results reaches a first preset accuracy threshold, the server takes the trained initial binary classification model as the trained binary classification model.
The M user IDs to be identified are user IDs that can be identified by the single-user rules; that is, each of them can be classified as a fake-traffic user or not by those rules. The single-user rules may include: (1) the same user ID logs in on multiple terminals (for example, mobile phones) within a short time; (2) multiple user IDs register and log in on one terminal at the same time; (3) a terminal accesses the same URL continuously and without interruption, or far more often than ordinary users.
Each of the M user IDs to be identified either satisfies all three single-user rules or satisfies none of them. User IDs that satisfy all three rules are fake-traffic user IDs, and user IDs that satisfy none of them are non-fake-traffic user IDs; in other words, every one of the M user IDs can be classified by the single-user rules. The fake-traffic user IDs among the M serve as black samples for training the binary classification model, and the non-fake-traffic user IDs serve as white samples, which ensures the accuracy of the initial training data and thus improves the training of the binary classification model. To improve training further, M should be as large as possible.
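The three single-user rules and the "all three or none" labelling can be sketched as follows. The thresholds (more than 3 devices, more than 3 IDs per device, more than 1000 visits) are hypothetical, the login log is assumed to be pre-filtered to the "short time" window, and user IDs that hit only some rules are returned as None so they can be excluded from the training set, since the text only defines labels for the all-or-none cases.

```python
from collections import defaultdict


def label_by_single_user_rules(logins, visits, max_devices=3, max_ids_per_device=3, max_visits=1000):
    """logins: list of (user_id, device_id, timestamp), assumed already limited to a short window.
    visits: dict user_id -> requests to the same URL in that window.

    Returns dict user_id -> 1 (black sample), 0 (white sample), or None (ambiguous).
    """
    devices_per_user = defaultdict(set)
    users_per_device = defaultdict(set)
    for user, device, _ts in logins:
        devices_per_user[user].add(device)
        users_per_device[device].add(user)

    labels = {}
    for user, devices in devices_per_user.items():
        rule1 = len(devices) > max_devices                                      # one ID, many terminals
        rule2 = any(len(users_per_device[d]) > max_ids_per_device for d in devices)  # many IDs, one terminal
        rule3 = visits.get(user, 0) > max_visits                                # abnormal repeat visits
        hits = rule1 + rule2 + rule3
        labels[user] = 1 if hits == 3 else 0 if hits == 0 else None
    return labels
```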
This embodiment of the application provides a method for training the binary classification model. Single-user rules are first used to identify fake-traffic users; some reliably identified fake-traffic users are taken as black samples and other normal users as white samples, and the task is treated as a binary classification problem. The accuracy of the predictions is tracked, and whenever a training result is wrong, the model is adjusted so that it does not make the same error again. Training stops when the model's accuracy reaches the first preset accuracy threshold, and the trained initial binary classification model is then taken as the trained binary classification model.
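The "adjust on errors until accuracy reaches the first preset threshold" loop can be illustrated with a plain logistic regression trained by batch gradient descent. This is a stand-in chosen for brevity, not the KNN/LR/SVM fusion model itself; the learning rate, epoch cap, and 0.95 target accuracy are assumptions.

```python
import math


def train_until_accurate(X, y, target_acc=0.95, lr=0.1, max_epochs=1000):
    """Logistic regression trained until accuracy on the labelled samples reaches target_acc.

    X: list of feature vectors; y: list of labels (1 = black sample, 0 = white sample).
    Returns (weights, bias, epochs_used, accuracy).
    """
    w = [0.0] * len(X[0])
    b = 0.0

    def prob(x):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def accuracy():
        return sum((prob(x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)

    for epoch in range(max_epochs):
        acc = accuracy()
        if acc >= target_acc:  # first preset accuracy threshold reached: stop training
            return w, b, epoch, acc
        # Prediction errors drive the adjustment via the log-loss gradient.
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = prob(x) - t
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(y)
    return w, b, max_epochs, accuracy()
```

The same accuracy-gated stopping rule applies to the classifier and neural network training steps below, with their own thresholds.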
Optionally, before step 206 is performed, the following steps may also be performed:
(21) The server inputs the M training results into an initial classifier for calculation, obtaining M intermediate calculation results;
(22) When the accuracy of the M intermediate calculation results reaches a second preset accuracy threshold, the server takes the trained initial classifier as the trained classifier.
This embodiment of the application provides a method for training the classifier: training with the previously identified, reliable fake-traffic users as black samples and other normal users as white samples yields a classifier with high accuracy.
Optionally, before step 206 is performed, the following steps may also be performed:
(31) The server inputs the M intermediate calculation results into an initial neural network model for training, obtaining M identification results;
(32) When the accuracy of the M identification results reaches a third preset accuracy threshold, the server takes the trained initial neural network model as the trained neural network model.
This embodiment of the application provides a method for training the neural network model: training with the previously identified, reliable fake-traffic users as black samples and other normal users as white samples yields a neural network model with high accuracy.
Refer to FIG. 4, which is a schematic flowchart of another user identification method disclosed in an embodiment of the present application. FIG. 4 further refines the method of FIG. 2. As shown in FIG. 4, the user identification method includes the following steps.
401. When user identification needs to be performed on a target user ID, the server determines whether there are N identified fake-traffic groups. The N groups are obtained by classification according to the group user rules, each of them contains more fake-traffic user IDs than the preset number threshold, and N is a positive integer.
402. If there are N identified fake-traffic groups, the server obtains the input features of the target user ID, which include a user location feature, a user APP usage feature, a user equipment usage feature, and a user click-through rate (CTR) feature.
403. The server identifies, based on the input features of the target user ID, the similarity between the target user ID and each of the N fake-traffic groups.
404. If any of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, the server determines that the target user ID is a fake-traffic user ID.
405. If none of the N fake-traffic groups has a similarity to the target user ID greater than the preset similarity threshold, the server inputs the input features of the target user ID into the trained binary classification model to obtain a preliminary classification result for those features.
406. The server inputs the preliminary classification result into the trained classifier to obtain an intermediate calculation result, and then inputs the intermediate calculation result into the trained neural network model to obtain the identification result of the target user ID.
For the specific implementation of steps 401 to 406, refer to steps 201 to 206 shown in FIG. 2; details are not repeated here.
407. If there are no N identified fake-traffic groups, the server determines whether there are multiple identified fake-traffic user IDs.
408. If there are multiple identified fake-traffic user IDs, the server identifies the similarity between the target user ID and each of them.
409. If any of these fake-traffic user IDs has a similarity to the target user ID greater than the preset similarity threshold, the server adds a fake-traffic-user association feature to the input features of the target user ID, and then performs the part of step 405 in which the server inputs the input features of the target user ID into the trained binary classification model to obtain a preliminary classification result.
In this embodiment of the application, if there is no identified fake-traffic group, the similarity between the target user ID and individual identified fake-traffic user IDs can be computed instead. When the target user is found to be associated with an individual fake-traffic user, a similarity analysis algorithm can be used to evaluate the user: the similarity between the target user ID and that fake-traffic user ID is added to the target user ID's input features, which improves identification accuracy and helps determine whether the target user really is a fake-traffic user.
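Steps 408 and 409 can be sketched as computing the best similarity to any single known fake-traffic user ID and appending it as an extra feature. Cosine similarity and the 0.9 threshold are assumptions, and appending 0.0 when no known ID is similar enough is a hypothetical choice that keeps the feature vector a fixed length for the binary classification model; the text only describes the case where the feature is added.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def add_association_feature(target, known_swipers, threshold=0.9):
    """Return a new feature vector with a fake-traffic-user association feature appended.

    target: feature vector of the target user ID.
    known_swipers: feature vectors of individually identified fake-traffic user IDs.
    """
    best = max((cosine(target, s) for s in known_swipers), default=0.0)
    return target + [best] if best > threshold else target + [0.0]
```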
Optionally, this embodiment of the application may also use an unsupervised algorithm to identify fake-traffic groups, using a clustering algorithm or the isolation forest algorithm to find abnormal users within groups.
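As one illustration of the unsupervised option, scikit-learn's `IsolationForest` can flag user IDs whose feature vectors are isolated unusually quickly. This sketch assumes scikit-learn is available; the contamination rate is an illustrative value, and the feature matrix is assumed to hold one row of input features per user ID.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def flag_anomalies(features, contamination=0.06, seed=0):
    """Return a boolean mask that is True for user IDs the isolation forest marks as outliers."""
    forest = IsolationForest(contamination=contamination, random_state=seed)
    return forest.fit_predict(features) == -1  # fit_predict returns -1 for outliers, 1 for inliers
```

Isolation forest needs no black or white labels, which makes it a fit for the case where no single-user rule applies; its flags could then be reviewed or fed back as training samples.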
The foregoing describes the solutions of the embodiments of the present application mainly from the perspective of the method's execution process. It can be understood that, to implement the above functions, the server includes hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments of the present application, the server may be divided into functional units according to the above method examples; for example, each function may be assigned its own functional unit, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in hardware or as a software functional unit. Note that the division of units in the embodiments of the present application is illustrative and is only a logical functional division; other divisions are possible in actual implementation.
请参阅图5,图5是本申请实施例公开的一种用户识别装置的结构示意图。如图5所示,该用户识别装置500包括第一确定单元501、获取单元502、识别单元503和第二确定单元504,其中:Please refer to FIG. 5, which is a schematic structural diagram of a user identification device disclosed in an embodiment of the present application. As shown in FIG. 5, the user identification device 500 includes a first determination unit 501, an acquisition unit 502, an identification unit 503, and a second determination unit 504, wherein:
The first determining unit 501 is configured to: when user identification needs to be performed on a target user ID, determine whether there are N identified brushing groups (groups of users who inflate traffic metrics), where the N brushing groups are obtained by classification according to a group user rule, each of the N brushing groups contains more brushing user IDs than a preset number threshold, and N is a positive integer;
The acquiring unit 502 is configured to acquire the input features of the target user ID when the first determining unit 501 determines that N identified brushing groups exist, where the input features include a user location feature, a user app usage feature, a user device usage feature, and a user click-through rate (CTR) feature;
The identification unit 503 is configured to identify, based on the input features of the target user ID, the similarity between the target user ID and each of the N brushing groups;
The second determining unit 504 is configured to determine that the target user ID is a brushing user ID when the identification unit 503 identifies, among the N brushing groups, a brushing group whose similarity to the target user ID is greater than a preset similarity threshold.
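A minimal Python sketch of the check performed by units 501 to 504 follows. The patent does not specify how the four feature families are encoded or how similarity is measured; the fixed-length numeric feature vector, the comparison against each group's mean vector (centroid), cosine similarity, and the names `match_brushing_group` and `min_group_size` are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumed encoding: each user ID is a fixed-length numeric vector built from
# the four feature families (location, app usage, device usage, CTR).
FeatureVector = Sequence[float]


def cosine_similarity(a: FeatureVector, b: FeatureVector) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[FeatureVector]) -> List[float]:
    """Element-wise mean of a group's member vectors (assumed group summary)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def match_brushing_group(
    target: FeatureVector,
    groups: Dict[str, List[FeatureVector]],
    similarity_threshold: float,
    min_group_size: int,
) -> Optional[str]:
    """Return the ID of the first brushing group more similar than the threshold.

    Only groups with more than `min_group_size` members count (the preset
    number threshold). None means either no qualifying group exists or no
    group is similar enough; the latter case goes to processing unit 505.
    """
    for group_id, members in groups.items():
        if len(members) <= min_group_size:
            continue  # too small to qualify as a brushing group
        if cosine_similarity(target, centroid(members)) > similarity_threshold:
            return group_id  # unit 504: the target is a brushing user ID
    return None
```

For example, with `groups = {"g1": [[1.0, 0.0, 0.9, 0.1], [0.9, 0.1, 1.0, 0.0]]}`, `min_group_size=1`, and a threshold of 0.95, the target vector `[1.0, 0.05, 0.95, 0.05]` matches `"g1"`. Comparing against a centroid is only one choice; a nearest-member or average-pairwise similarity would fit the same flow.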
Optionally, the user identification device 500 may further include a processing unit 505.
The processing unit 505 is configured to: when the identification unit 503 identifies no brushing group among the N brushing groups whose similarity to the target user ID is greater than the preset similarity threshold, input the input features of the target user ID into a trained binary classification model to obtain a preliminary classification result for the input features of the target user ID;
The processing unit 505 is further configured to input the preliminary classification result into a trained classifier for calculation to obtain an intermediate calculation result, and to input the intermediate calculation result into a trained neural network model for processing to obtain the identification result of the target user ID.
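The three-stage fallback can be sketched as a simple cascade. The patent does not name the model types, the shape of the intermediate results, or how the final output is thresholded; treating each stage as a plain callable, reading the first element of the network's output as a brushing score, and the 0.5 decision threshold are assumptions, and the toy stages below are placeholders rather than trained models.

```python
from typing import Callable, List, Sequence

# Each stage maps a numeric vector to a numeric vector. In practice these
# would be the trained binary model, classifier, and neural network; here
# they are hypothetical plain callables.
Stage = Callable[[Sequence[float]], List[float]]


def run_cascade(
    features: Sequence[float],
    binary_model: Stage,
    classifier: Stage,
    neural_net: Stage,
    decision_threshold: float = 0.5,
) -> bool:
    """Return True if the cascade flags the target user ID as brushing."""
    preliminary = binary_model(features)    # preliminary classification result
    intermediate = classifier(preliminary)  # intermediate calculation result
    score = neural_net(intermediate)[0]     # assumed: first output = brushing score
    return score >= decision_threshold


# Toy stand-in stages, for illustration only.
def toy_binary(x: Sequence[float]) -> List[float]:
    return [sum(x) / len(x)]


def toy_classifier(p: Sequence[float]) -> List[float]:
    return [p[0], 1.0 - p[0]]


def toy_net(z: Sequence[float]) -> List[float]:
    return [0.8 * z[0] + 0.1]
```

With these toy stages, `run_cascade([0.9, 0.8, 0.7, 0.6], toy_binary, toy_classifier, toy_net)` yields a score of about 0.7 and returns `True`, while `[0.1, 0.2, 0.1, 0.0]` yields about 0.18 and returns `False`.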
Optionally, before inputting the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result, the processing unit 505 is further configured to: extract the input features of a first user ID, where the first user ID is any one of M user IDs to be identified and M is a positive integer; use a single-user rule to identify the brushing user IDs and non-brushing user IDs among the M user IDs to be identified; input the input features of the M user IDs to be identified into an initial binary classification model for training to obtain M training results; and when the accuracy of the M training results reaches a first preset accuracy threshold, determine the trained initial binary classification model as the trained binary classification model.
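The training step for the binary classification model can be sketched as follows: labels come from a single-user rule, and training stops once accuracy on the M samples reaches the first preset accuracy threshold. The patent defines neither the single-user rule nor the model family; the CTR-based rule (feature index 3 above 0.5), the logistic-regression model trained by per-sample gradient descent, and all parameter values are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def single_user_rule(features: Sequence[float]) -> int:
    """Hypothetical single-user rule: label 1 (brushing) if the CTR feature,
    assumed to sit at index 3, exceeds 0.5; otherwise 0 (non-brushing)."""
    return 1 if features[3] > 0.5 else 0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_model(
    samples: List[Sequence[float]],
    accuracy_threshold: float,
    lr: float = 0.5,
    max_epochs: int = 1000,
) -> Tuple[List[float], float, bool]:
    """Train logistic regression on rule-derived labels.

    Returns (weights, accuracy, accepted); the last weight is the bias, and
    `accepted` is True once accuracy reaches `accuracy_threshold`.
    """
    labels = [single_user_rule(x) for x in samples]
    dim = len(samples[0])
    weights = [0.0] * (dim + 1)

    def predict(x: Sequence[float]) -> float:
        # zip stops at len(x), so the bias weights[-1] is added separately
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + weights[-1])

    accuracy = 0.0
    for _ in range(max_epochs):
        for x, y in zip(samples, labels):
            error = predict(x) - y  # gradient of log-loss w.r.t. the logit
            for i in range(dim):
                weights[i] -= lr * error * x[i]
            weights[-1] -= lr * error
        correct = sum((predict(x) >= 0.5) == (y == 1) for x, y in zip(samples, labels))
        accuracy = correct / len(samples)
        if accuracy >= accuracy_threshold:
            return weights, accuracy, True
    return weights, accuracy, False
```

Checking accuracy against the rule-derived labels mirrors the text's "accuracy of the M training results"; a held-out split would be a more conservative choice in practice.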
Optionally, before inputting the preliminary classification result into the trained classifier for calculation to obtain the intermediate calculation result, the processing unit 505 is further configured to: input the M training results into an initial classifier for calculation to obtain M intermediate calculation results; and when the accuracy of the M intermediate calculation results reaches a second preset accuracy threshold, determine the trained initial classifier as the trained classifier.
Optionally, before inputting the intermediate calculation result into the trained neural network model to obtain the identification result of the target user ID, the processing unit 505 is further configured to: input the M intermediate calculation results into an initial neural network model for training to obtain M identification results;
and when the accuracy of the M identification results reaches a third preset accuracy threshold, determine the trained initial neural network model as the trained neural network model.
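The staged acceptance described above, where each model's M outputs become the next model's M inputs and each stage must reach its own preset accuracy threshold, can be sketched as below. This is a simplification under stated assumptions: the stages are treated as already fitted callables (the fitting itself is omitted), the first element of each output is assumed to be the brushing score, and `validate_chain` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical convention: each stage returns a vector whose first element
# is its brushing score in [0, 1].
Stage = Callable[[Sequence[float]], List[float]]


def stage_accuracy(outputs: List[List[float]], labels: List[int]) -> float:
    """Share of outputs whose thresholded score matches the rule-derived label."""
    hits = sum((out[0] >= 0.5) == (y == 1) for out, y in zip(outputs, labels))
    return hits / len(labels)


def validate_chain(
    inputs: List[Sequence[float]],
    labels: List[int],
    stages: List[Tuple[Stage, float]],
) -> Tuple[bool, List[float]]:
    """Feed the M outputs of each stage into the next stage.

    `stages` pairs each stage with its preset accuracy threshold (first,
    second, third). Stops at the first stage that misses its threshold.
    """
    current: List[List[float]] = [list(x) for x in inputs]
    accuracies: List[float] = []
    for stage, threshold in stages:
        current = [stage(x) for x in current]  # M results -> next stage's M inputs
        acc = stage_accuracy(current, labels)
        accuracies.append(acc)
        if acc < threshold:
            return False, accuracies
    return True, accuracies
```

This layered arrangement resembles stacking; the sketch only shows how the M results flow between stages and how each acceptance gate is applied.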
Optionally, the group user rule is determined based on the device location corresponding to each user ID and the time series of application usage corresponding to each user ID. Before the first determining unit 501 determines whether N identified brushing groups exist, the processing unit 505 is further configured to classify into a first-type brushing group those identified brushing user IDs whose corresponding device locations are less than a preset distance threshold apart and whose application-usage time series fall within a first preset time period.
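One way to apply the group user rule is sketched below. The patent does not say how distances are measured, how usage time series are compared, or how groups are seeded; the great-circle (haversine) distance, the use of a single app-usage start timestamp per user in place of a full time series, the greedy comparison against each group's first member (an approximation of the pairwise distance condition), and all names are assumptions. The final filter keeps only groups larger than the preset number threshold.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class BrushingUser:
    user_id: str
    lat: float          # device latitude, degrees
    lon: float          # device longitude, degrees
    usage_start: float  # app-usage start, seconds (stand-in for the time series)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius, metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding


def group_brushing_users(
    users: List[BrushingUser],
    max_distance_m: float,
    window_s: float,
    min_size: int,
) -> List[List[str]]:
    """Greedily group identified brushing users by proximity and usage time.

    A user joins the first group whose seed (first member) is closer than
    `max_distance_m` and started using the app within `window_s` seconds of it.
    Only groups with more than `min_size` members are returned.
    """
    groups: List[List[BrushingUser]] = []
    for user in sorted(users, key=lambda u: u.usage_start):
        for group in groups:
            seed = group[0]
            close = haversine_m(seed.lat, seed.lon, user.lat, user.lon) < max_distance_m
            same_window = abs(user.usage_start - seed.usage_start) <= window_s
            if close and same_window:
                group.append(user)
                break
        else:  # no existing group matched: start a new one
            groups.append([user])
    return [[u.user_id for u in g] for g in groups if len(g) > min_size]
```

Comparing only against each group's seed keeps the pass linear in the number of groups; checking every member pairwise would follow the rule's wording more strictly at a higher cost.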
Optionally, when the first determining unit 501 determines that no N identified brushing groups exist, the processing unit 505 is further configured to: determine whether multiple identified brushing user IDs exist; if they exist, identify the similarity between the target user ID and each of the identified brushing user IDs; if any of those brushing user IDs has a similarity to the target user ID greater than the preset similarity threshold, add a brushing-user association feature to the input features of the target user ID; and input the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result for the input features of the target user ID.
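The fallback path, used when no brushing groups have been identified, can be sketched as follows. How the similarity to individual brushing user IDs is measured and how the association feature is encoded are not specified; cosine similarity, a single appended 0/1 feature, and the function name `add_association_feature` are assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def add_association_feature(
    target: Sequence[float],
    known_brushing: List[Sequence[float]],
    similarity_threshold: float,
) -> List[float]:
    """Append a hypothetical brushing-user association feature to `target`.

    The feature is 1.0 if any identified brushing user ID is more similar to
    the target than the threshold, else 0.0 (also when none are identified).
    The result is what would be fed to the trained binary classification model.
    """
    linked = any(cosine_similarity(target, v) > similarity_threshold
                 for v in known_brushing)
    return [float(x) for x in target] + [1.0 if linked else 0.0]
```

Appending 0.0 when there is no match, rather than omitting the feature, keeps the vector length fixed, which a trained binary classification model would typically require.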
The first determining unit 501, the acquiring unit 502, the identification unit 503, the second determining unit 504, and the processing unit 505 in FIG. 5 may be processors.
With the user identification device shown in FIG. 5, when user identification is performed on a target user ID, the similarity between the target user and the identified brushing groups can be evaluated. If the similarity is greater than the preset similarity threshold, the target user ID can be directly determined to be a brushing user ID. Because brushing users tend to act in groups, similarity matching against brushing groups can quickly and accurately determine whether the target user ID is a brushing user ID, thereby improving the accuracy of brushing-user identification.
Refer to FIG. 6, which is a schematic structural diagram of a server disclosed in an embodiment of this application. As shown in FIG. 6, the server 600 includes a processor 601 and a memory 602. The server 600 may further include a bus 603, through which the processor 601 and the memory 602 may be interconnected. The bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in FIG. 6, but this does not mean that there is only one bus or one type of bus. The server 600 may further include an input communication interface 604, which may obtain data from an external device (for example, another server or a database). The memory 602 is configured to store one or more programs containing instructions, and the processor 601 is configured to invoke the instructions stored in the memory 602 to perform some or all of the method steps in FIG. 1 to FIG. 4.
With the server shown in FIG. 6, when user identification is performed on a target user ID, the similarity between the target user and the identified brushing groups can likewise be evaluated, and if the similarity is greater than the preset similarity threshold, the target user ID can be directly determined to be a brushing user ID. Because brushing users tend to act in groups, this quickly and accurately determines whether the target user ID is a brushing user ID, thereby improving the accuracy of brushing-user identification.
An embodiment of this application further provides a computer storage medium that stores a computer program for electronic data exchange; the computer program causes a computer to perform some or all of the steps of any user identification method described in the foregoing method embodiments.
An embodiment of this application further provides a computer program product, including a non-transitory computer-readable storage medium storing a computer program; the computer program is operable to cause a computer to perform some or all of the steps of any user identification method described in the foregoing method embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is expressed as a series of combined actions. However, a person skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. In addition, a person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For any part not described in detail in one embodiment, refer to the related descriptions in other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing memory includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art can understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of this application are described in detail above, and specific examples are used herein to explain the principles and implementations of the present invention. The descriptions of the foregoing embodiments are merely intended to help understand the method and core idea of the present invention. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope based on the idea of the present invention. In conclusion, the content of this specification shall not be construed as a limitation on the present invention.
Claims (10)
- A user identification method, comprising: when user identification needs to be performed on a target user ID, determining whether there are N identified brushing groups, where the N brushing groups are obtained by classification according to a group user rule, each of the N brushing groups contains more brushing user IDs than a preset number threshold, and N is a positive integer; if so, acquiring input features of the target user ID, the input features comprising a user location feature, a user app usage feature, a user device usage feature, and a user click-through rate (CTR) feature; identifying, based on the input features of the target user ID, a similarity between the target user ID and each of the N brushing groups; and if, among the N brushing groups, there is a brushing group whose similarity to the target user ID is greater than a preset similarity threshold, determining that the target user ID is a brushing user ID.
- The method according to claim 1, further comprising: if, among the N brushing groups, there is no brushing group whose similarity to the target user ID is greater than the preset similarity threshold, inputting the input features of the target user ID into a trained binary classification model to obtain a preliminary classification result for the input features of the target user ID; and inputting the preliminary classification result into a trained classifier for calculation to obtain an intermediate calculation result, and inputting the intermediate calculation result into a trained neural network model for processing to obtain an identification result of the target user ID.
- The method according to claim 2, wherein before the inputting the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result, the method further comprises: extracting input features of a first user ID, where the first user ID is any one of M user IDs to be identified and M is a positive integer; identifying, by using a single-user rule, brushing user IDs and non-brushing user IDs among the M user IDs to be identified; inputting the input features of the M user IDs to be identified into an initial binary classification model for training to obtain M training results; and when an accuracy of the M training results reaches a first preset accuracy threshold, determining the trained initial binary classification model as the trained binary classification model.
- The method according to claim 3, wherein before the inputting the preliminary classification result into the trained classifier for calculation to obtain the intermediate calculation result, the method further comprises: inputting the M training results into an initial classifier for calculation to obtain M intermediate calculation results; and when an accuracy of the M intermediate calculation results reaches a second preset accuracy threshold, determining the trained initial classifier as the trained classifier.
- The method according to claim 4, wherein before the inputting the intermediate calculation result into the trained neural network model to obtain the identification result of the target user ID, the method further comprises: inputting the M intermediate calculation results into an initial neural network model for training to obtain M identification results; and when an accuracy of the M identification results reaches a third preset accuracy threshold, determining the trained initial neural network model as the trained neural network model.
- The method according to any one of claims 1 to 5, wherein the group user rule is determined based on a device location corresponding to a user ID and a time series of application usage corresponding to the user ID, and before the determining whether there are N identified brushing groups, the method further comprises: classifying into a first-type brushing group those of multiple identified brushing user IDs whose corresponding device locations are less than a preset distance threshold apart and whose application-usage time series fall within a first preset time period.
- The method according to any one of claims 2 to 6, further comprising: if there are no N identified brushing groups, determining whether there are multiple identified brushing user IDs; if there are multiple identified brushing user IDs, identifying a similarity between the target user ID and the multiple identified brushing user IDs; if, among the multiple brushing user IDs, there is a brushing user ID whose similarity to the target user ID is greater than the preset similarity threshold, adding a brushing-user association feature to the input features of the target user ID; and performing the step of inputting the input features of the target user ID into the trained binary classification model to obtain the preliminary classification result for the input features of the target user ID.
- A user identification device, comprising a first determining unit, an acquiring unit, an identification unit, and a second determining unit, wherein: the first determining unit is configured to, when user identification needs to be performed on a target user ID, determine whether there are N identified brushing groups, where the N brushing groups are obtained by classification according to a group user rule, each of the N brushing groups contains more brushing user IDs than a preset number threshold, and N is a positive integer; the acquiring unit is configured to acquire input features of the target user ID when the first determining unit determines that there are N identified brushing groups, the input features comprising a user location feature, a user app usage feature, a user device usage feature, and a user click-through rate (CTR) feature; the identification unit is configured to identify, based on the input features of the target user ID, a similarity between the target user ID and each of the N brushing groups; and the second determining unit is configured to determine that the target user ID is a brushing user ID when the identification unit identifies, among the N brushing groups, a brushing group whose similarity to the target user ID is greater than a preset similarity threshold.
- A server, comprising a processor and a memory, wherein the memory is configured to store one or more programs, the one or more programs are configured to be executed by the processor, and the programs comprise instructions for performing the method according to any one of claims 1 to 7.
- A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program for electronic data exchange, and the computer program causes a computer to perform the method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980091203.7A CN113383362B (en) | 2019-06-24 | 2019-06-24 | User identification method and related product |
PCT/CN2019/092592 WO2020257991A1 (en) | 2019-06-24 | 2019-06-24 | User identification method and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/092592 WO2020257991A1 (en) | 2019-06-24 | 2019-06-24 | User identification method and related product |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020257991A1 | 2020-12-30 |
Family ID: 74061199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/092592 WO2020257991A1 (en) | 2019-06-24 | 2019-06-24 | User identification method and related product |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113383362B (en) |
WO (1) | WO2020257991A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704566B (en) * | 2021-10-29 | 2022-01-18 | 贝壳技术有限公司 | Identification number body identification method, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100100486A1 (en) * | 2008-10-17 | 2010-04-22 | At&T Mobility Ii Llc | User terminal and wireless item-based credit card authorization servers, systems, methods and computer program products |
CN106651475A (en) * | 2017-02-22 | 2017-05-10 | 广州万唯邑众信息科技有限公司 | Method and system for identifying false traffic of mobile video advertisement |
CN107169769A (en) * | 2016-03-08 | 2017-09-15 | 广州市动景计算机科技有限公司 | The brush amount recognition methods of application program, device |
CN109241343A (en) * | 2018-07-27 | 2019-01-18 | 北京奇艺世纪科技有限公司 | A kind of brush amount user identifying system, method and device |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294508B (en) * | 2015-06-10 | 2020-02-11 | 深圳市腾讯计算机系统有限公司 | Brushing amount tool detection method and device |
CN104932966B (en) * | 2015-06-19 | 2017-09-15 | 广东欧珀移动通信有限公司 | Detect that application software downloads the method and device of brush amount |
CN106612202A (en) * | 2015-10-27 | 2017-05-03 | 网易(杭州)网络有限公司 | Method and system for pre-estimate and judgment of amount brushing of online game channel |
CN106022834B (en) * | 2016-05-24 | 2020-04-07 | 腾讯科技(深圳)有限公司 | Advertisement anti-cheating method and device |
CN107634952B (en) * | 2017-09-22 | 2020-12-08 | Oppo广东移动通信有限公司 | Method and device for determining brushing amount resource, service equipment, mobile terminal and storage medium |
CN108921581B (en) * | 2018-07-18 | 2021-07-02 | 北京三快在线科技有限公司 | Method and device for identifying bill-swiping operation and computer-readable storage medium |
CN109525595B (en) * | 2018-12-25 | 2021-04-16 | 广州方硅信息技术有限公司 | Black product account identification method and equipment based on time flow characteristics |
2019
- 2019-06-24: CN201980091203.7A, patent CN113383362B/en, status Active
- 2019-06-24: PCT/CN2019/092592, patent WO2020257991A1/en, status Active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111930995A (en) * | 2020-08-18 | 2020-11-13 | 湖南快乐阳光互动娱乐传媒有限公司 | Data processing method and device |
CN112819527A (en) * | 2021-01-29 | 2021-05-18 | 百果园技术(新加坡)有限公司 | User grouping processing method and device |
CN112819527B (en) * | 2021-01-29 | 2024-05-24 | 百果园技术(新加坡)有限公司 | User grouping processing method and device |
CN113947139A (en) * | 2021-10-13 | 2022-01-18 | 咪咕视讯科技有限公司 | User identification method, device and equipment |
CN114466214A (en) * | 2022-02-09 | 2022-05-10 | 上海哔哩哔哩科技有限公司 | Method and device for counting people in live broadcast room |
CN114679600A (en) * | 2022-03-24 | 2022-06-28 | 上海哔哩哔哩科技有限公司 | Data processing method and device |
CN114926221A (en) * | 2022-05-31 | 2022-08-19 | 北京奇艺世纪科技有限公司 | Cheating user identification method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113383362B (en) | 2022-05-13 |
CN113383362A (en) | 2021-09-10 |
Similar Documents
Publication | Title |
---|---|
WO2020257991A1 (en) | User identification method and related product |
US11138381B2 (en) | Method, computer device and readable medium for user's intent mining |
CN104091276B (en) | The method of on-line analysis clickstream data and relevant apparatus and system |
CN106919661B (en) | Emotion type identification method and related device |
CN110442712B (en) | Risk determination method, risk determination device, server and text examination system |
CN106339507B (en) | Streaming Media information push method and device |
WO2020093289A1 (en) | Resource recommendation method and apparatus, electronic device and storage medium |
CN109509010B (en) | Multimedia information processing method, terminal and storage medium |
CN104281622A (en) | Information recommending method and information recommending device in social media |
WO2015120798A1 (en) | Method for processing network media information and related system |
CN103336766A (en) | Short text garbage identification and modeling method and device |
CN104750760B (en) | A kind of implementation method and device for recommending application software |
CN108112038B (en) | Method and device for controlling access flow |
CN113111264B (en) | Interface content display method and device, electronic equipment and storage medium |
WO2023000491A1 (en) | Application recommendation method, apparatus and device, and computer-readable storage medium |
CN111523035B (en) | Recommendation method, device, server and medium for APP browsing content |
CN113127746A (en) | Information pushing method based on user chat content analysis and related equipment thereof |
CN113505272B (en) | Control method and device based on behavior habit, electronic equipment and storage medium |
CN112884529A (en) | Advertisement bidding method, device, equipment and medium |
US20200394448A1 (en) | Methods for more effectively moderating one or more images and devices thereof |
CN110460593B (en) | Network address identification method, device and medium for mobile traffic gateway |
CN111507471B (en) | Model training method, device, equipment and storage medium |
CN113010785A (en) | User recommendation method and device |
CN110309406A (en) | Clicking rate predictor method, device, equipment and storage medium |
CN110837739A (en) | Service processing method and device and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19934765; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | EP: Public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 15/02/2022) |
122 | EP: PCT application non-entry in European phase | Ref document number: 19934765; Country of ref document: EP; Kind code of ref document: A1 |