
CN117575595A - Payment risk identification method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN117575595A
Authority
CN
China
Prior art keywords
payment
risk
risk identification
sample set
identification result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310754485.2A
Other languages
Chinese (zh)
Inventor
黄自豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310754485.2A
Publication of CN117575595A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a payment risk identification method and device, a computer device, and a storage medium, and relate to the field of payment. The method comprises the following steps: extracting payment features corresponding to a payment behavior; inputting the payment features into each of a plurality of base classifiers in a first risk identification model to obtain a plurality of sub-risk identification results, wherein the base classifiers include one-class support vector machines obtained through unsupervised learning; determining a first risk identification result of the payment behavior based on the plurality of sub-risk identification results, the first risk identification result being used to characterize whether the payment behavior has a payment risk; and processing the payment behavior based on a target payment processing policy corresponding to the first risk identification result. The method of the embodiments reduces labeling time and monetary cost and avoids the problem of noisy labeled data; integrating multiple base classifiers improves the robustness, stability, and generalization capability of the model, reduces overfitting, and makes identification more accurate.

Description

Payment risk identification method, device, computer equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of payment, and in particular, to a payment risk identification method and device, a computer device, and a storage medium.
Background
In online payment scenarios, the payment program evaluates whether a transaction carries a risk of fraud, in order to reduce fraud events and the likelihood of fund losses.
In the related art, risk identification models are usually trained with supervised machine learning to predict the likelihood that a transaction is risky.
However, in fraud identification for social payments, training a risk identification model faces several problems. For example, labeled data is scarce in social payment scenarios, and collecting and labeling training samples is time-consuming and expensive; at the same time, the labels may be impure, i.e., inaccurate or noisy, which degrades the performance and generalization ability of the model.
Disclosure of Invention
The embodiment of the application provides a payment risk identification method, a payment risk identification device, computer equipment and a storage medium. The technical scheme is as follows:
In one aspect, an embodiment of the present application provides a payment risk identification method, where the method includes:
extracting payment features corresponding to a payment behavior;
inputting the payment features into each of a plurality of base classifiers in a first risk identification model to obtain sub-risk identification results output by the plurality of base classifiers, wherein the base classifiers include one-class support vector machines obtained through unsupervised learning;
determining a first risk identification result of the payment behavior based on the plurality of sub-risk identification results, wherein the first risk identification result is used to characterize whether the payment behavior has a payment risk;
and processing the payment behavior based on a target payment processing strategy corresponding to the first risk identification result.
In another aspect, an embodiment of the present application provides a payment risk identification device, including:
the extraction module is used for extracting payment characteristics corresponding to the payment behaviors;
the identification module is configured to input the payment features into each of a plurality of base classifiers in a first risk identification model to obtain sub-risk identification results output by the plurality of base classifiers, wherein the base classifiers include one-class support vector machines obtained through unsupervised learning;
the identification module is further configured to determine a first risk identification result of the payment behavior based on a plurality of the sub-risk identification results, where the first risk identification result is used to characterize whether the payment behavior has a payment risk;
and the processing module is used for processing the payment behavior based on a target payment processing strategy corresponding to the first risk identification result.
In another aspect, embodiments of the present application provide a computer device comprising a processor and a memory; the memory stores at least one instruction for execution by the processor to implement the method of the above aspect.
In another aspect, embodiments of the present application provide a computer-readable storage medium having at least one instruction stored therein, the instructions being loaded and executed by a processor to implement a method as described in the above aspects.
In another aspect, embodiments of the present application provide a computer program product comprising computer instructions stored in a computer-readable storage medium. The processor of the terminal reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the terminal performs the methods provided in the various alternative implementations of the above aspect.
In the embodiments of the present application, on the one hand, the payment features are input into each of the plurality of base classifiers in the first risk identification model to obtain a plurality of sub-risk identification results; because the base classifiers include one-class support vector machines obtained through unsupervised learning, the training data does not need to be labeled, which reduces labeling time and monetary cost and avoids the problem of noisy labeled data. On the other hand, the first risk identification result of the payment behavior is determined based on the multiple sub-risk identification results, so multiple base classifiers are integrated, which improves the robustness, stability, and generalization capability of the model and reduces overfitting, allowing normal and abnormal payment behaviors to be distinguished more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a payment risk identification method provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of training a first risk identification model provided in one exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of training one-class support vector machines through unsupervised learning according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of voting results based on a plurality of sub-risk identification results and voting weights provided in an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of three stages of a payment risk identification method provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a payment risk identification method provided by an exemplary embodiment of the present application;
FIG. 8 is a block diagram of a payment risk identification device provided in an exemplary embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Machine Learning (ML) is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied across all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
FIG. 1 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
As shown in FIG. 1, the implementation environment includes a terminal 11 and a server 12.
The terminal 11 is an electronic device capable of running application programs. The electronic device may be a mobile terminal such as a smartphone, a tablet computer, a laptop, or a vehicle-mounted terminal, or a terminal such as a desktop computer or a projection computer. In this specification, the terminal 11 is described as a smartphone, but this is not limiting.
In some embodiments, the terminal 11 runs a payment application (e.g., a social payment application) in which the user can perform payment behaviors in various ways (e.g., transfers to friends, QR-code scan transfers, or sending red packets).
The server 12 may be the backend server of the payment application. It may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), big data, and artificial intelligence platforms.
In some embodiments, the terminal 11 sends the payment behavior in the payment application and its related data (e.g., payment type, payment amount, payment party account number, payment time, location of the payment party, etc.) to the server 12.
It should be noted that when collecting a user's relevant data (such as where the two payment parties are located), the present application may display a prompt interface or a pop-up window, or output a voice prompt, informing the user that relevant data is currently being collected. The application starts the step of obtaining the user's relevant data only after obtaining the user's confirmation on the prompt interface or pop-up window; otherwise (i.e., when no such confirmation is obtained), it ends that step and does not obtain the user's relevant data. In other words, the information (including but not limited to user device information, user personal information, and the user's real-time location), data (including but not limited to data used for analysis, stored data, and displayed data), and signals referred to herein are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the locations of both payment parties referred to in this application are acquired with full authorization.
In some embodiments, the server 12 extracts the payment features corresponding to the payment behavior from the received payment behavior and its related data, and determines a first risk identification result of the payment behavior through a plurality of base classifiers in the first risk identification model 121. The first risk identification result is used to characterize whether the payment behavior has a payment risk. In some embodiments, the base classifiers include one-class support vector machines obtained through unsupervised learning.
In some embodiments, server 12 processes the payment behavior based on a target payment processing policy corresponding to the first risk identification result. For example, when the first risk identification result characterizes that the payment action has a payment risk, the server 12 transmits warning information or interception information to the terminal 11 to cause the terminal 11 to interrupt the payment action in the payment application.
In some embodiments, the server 12 may further determine, based on the payment characteristics, a first risk identification result and a second risk identification result through the first risk identification model 121 and the second risk identification model 122, respectively, and determine, based on the first risk identification result and the second risk identification result, a third risk identification result of the payment behavior, so as to finally determine whether the payment behavior has a risk.
In some embodiments, server 12 processes the payment behavior based on a target payment processing policy corresponding to the third risk identification result.
It should be noted that, the foregoing embodiments are merely illustrative of a general architecture of an implementation environment, and the system may further include more or fewer components, or some components may be combined, and the present embodiment is not limited to this configuration.
Referring to FIG. 2, FIG. 2 is a flowchart of a payment risk identification method provided in an exemplary embodiment of the present application. The method includes the following steps:
Step 201, extract the payment features corresponding to the payment behavior.
Optionally, the payment behavior is performed through a payment application running on the terminal, for example, through the friend transfer, QR-code scan transfer, or red packet function of a social payment application.
Optionally, the type of the payment feature may be obtained by performing feature engineering based on the payment behavior and related data received by the server.
In one possible implementation, the information value (IV) of the data related to payment behaviors may be calculated based on the first sample set, and the types of payment features are determined based on the information value. The first sample set consists of historical samples containing labeling information. The information value is an index for evaluating the predictive power of a variable, i.e., how well it separates abnormal samples from normal ones.
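The text does not give a formula for the information value. A minimal sketch, assuming the common weight-of-evidence definition IV = Σ (p_good,i − p_bad,i) · ln(p_good,i / p_bad,i) over the bins of one feature, where a label of 1 marks an abnormal sample; the function name and the 0.5 smoothing count are illustrative assumptions:

```python
import math
from collections import defaultdict


def information_value(bins, labels, smoothing=0.5):
    """Compute the information value (IV) of one binned feature.

    bins:      bin identifier of each sample (e.g. "0-30 days", "31-90 days")
    labels:    1 for an abnormal (risky) sample, 0 for a normal sample
    smoothing: pseudo-count added to every bin so that log(0) never occurs
    """
    good = defaultdict(float)  # normal samples per bin
    bad = defaultdict(float)   # abnormal samples per bin
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    all_bins = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(all_bins)
    total_bad = sum(bad.values()) + smoothing * len(all_bins)

    iv = 0.0
    for b in all_bins:
        p_good = (good[b] + smoothing) / total_good
        p_bad = (bad[b] + smoothing) / total_bad
        woe = math.log(p_good / p_bad)   # weight of evidence of this bin
        iv += (p_good - p_bad) * woe     # each term is non-negative
    return iv
```

An IV near zero means the feature barely separates normal from abnormal samples; larger values indicate stronger separation.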
Optionally, which relevant data is used as the payment feature is determined based on the information value or the stability of the information value.
By way of example only, the data related to the payment behavior may include the payment type, payment amount, accounts of both parties, payment time, locations of both parties, number of days since account registration, number of friends of the account, reputation value, historical payment amount … …. By calculating the information value, the top 5 features with the highest information value that meet the stability requirement (such as the accounts of both parties, the locations of both parties, the number of friends of the account, the number of days since account registration, and the reputation value) are determined as the payment feature types.
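The text requires the selected features to have a high and stable information value but does not define stability. The sketch below assumes IV is computed per time period on the first sample set and treats a feature as stable when the coefficient of variation of its IV across periods is at most `max_cv`; the function name, the 0.2 threshold, and the feature names in the check are hypothetical:

```python
from statistics import mean, pstdev


def select_features(iv_by_period, top_k=5, max_cv=0.2):
    """Pick the top_k features by mean IV among those whose IV is stable.

    iv_by_period: {feature_name: [IV in period 1, IV in period 2, ...]}
    max_cv:       largest allowed coefficient of variation (std / mean) of
                  the IV across periods (an assumed stability criterion)
    """
    stable = {}
    for name, ivs in iv_by_period.items():
        m = mean(ivs)
        if m > 0 and pstdev(ivs) / m <= max_cv:
            stable[name] = m
    # Rank the stable features by mean IV, highest first.
    ranked = sorted(stable, key=stable.get, reverse=True)
    return ranked[:top_k]
```

A feature with a high but erratic IV (such as a hypothetical `payment_time` whose IV swings between periods) is excluded even if its average IV is competitive.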
In some embodiments, after determining the type of the payment feature through the feature engineering, the server may extract the payment feature of the current payment behavior from the payment behavior and related data sent by the terminal.
For example only, after the payment feature types are determined to be the accounts of both parties, the locations of both parties, the number of friends of the account, the number of days since account registration, and the reputation value, the payment features of the current payment behavior are extracted from the payment behavior and related data sent by the terminal, as follows: payer account "xxxxx", payee account "xxxxx", payer location "Shenzhen", payee location "overseas", payee friend count "13", payee account registration days "41 days", and payee reputation value 69.
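Before the features reach the base classifiers they must be numeric. The text does not describe this encoding, so the following is only an illustrative sketch: the record field names are hypothetical, the two location features are turned into binary flags (an assumption), and the account numbers are kept as identifiers rather than model inputs:

```python
def to_feature_vector(record):
    """Encode the extracted payment features of one payment as numbers.

    All field names are hypothetical; the account numbers are deliberately
    not included because the text does not say how they are encoded.
    """
    return [
        # Flag: payee located overseas.
        1.0 if record["payee_location"] == "overseas" else 0.0,
        # Flag: payer and payee are in different locations.
        1.0 if record["payer_location"] != record["payee_location"] else 0.0,
        float(record["payee_friend_count"]),
        float(record["payee_registration_days"]),
        float(record["payee_reputation"]),
    ]
```

Applied to the example above, the record yields the vector `[1.0, 1.0, 13.0, 41.0, 69.0]`.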
Step 202, input the payment features into each of the plurality of base classifiers in the first risk identification model to obtain the sub-risk identification results output by the plurality of base classifiers. The base classifiers include one-class support vector machines obtained through unsupervised learning.
Optionally, the base classifier is a machine learning model obtained through unsupervised learning. In some embodiments, the base classifier may be a one-class support vector machine (One-Class Support Vector Machine, OCSVM). In some embodiments, the base classifier may be a tree model, such as a decision tree (DT), random forest (RF), or extreme gradient boosting (XGBoost) model.
In some embodiments, the plurality of base classifiers may be machine learning models of the same type or of different types. The number of base classifiers is a hyperparameter, denoted N. Optionally, N is set to a positive integer between 10 and 30.
For example only, when N = 20, the first risk identification model may integrate 20 one-class support vector machines, or 15 one-class support vector machines and 5 decision trees.
Optionally, the plurality of base classifiers may be integrated in various ways, such as bagging, boosting, or stacking.
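As one possible reading of the bagging option, the sketch below trains N one-class SVMs on bootstrap samples of unlabeled feature vectors with scikit-learn's `OneClassSVM`, and returns one signed score per base classifier as the sub-risk identification result. The RBF kernel, `nu=0.1`, the 0.8 sampling ratio, and the function names are assumptions, not values from the text:

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_ocsvm_ensemble(X, n_estimators=20, sample_ratio=0.8, nu=0.1, seed=0):
    """Train n_estimators one-class SVMs, each on a bootstrap sample (bagging).

    X: unlabeled feature matrix of shape (n_samples, n_features)
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    size = max(1, int(sample_ratio * n))
    models = []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=size, replace=True)  # bootstrap sample
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        model.fit(X[idx])                             # no labels needed
        models.append(model)
    return models


def sub_risk_scores(models, x):
    """One signed score per base classifier; a negative score marks an outlier."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    return [float(m.decision_function(x)[0]) for m in models]
```

`decision_function` is negative for points a model treats as outliers, which matches the convention the text adopts for sub-risk identification results (a score below zero indicates a payment risk). In practice the features would normally be standardized first (for example with `StandardScaler`), since the RBF kernel is distance-based.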
In some embodiments, each base classifier outputs a corresponding sub-risk identification result.
Optionally, the sub-risk recognition result is a scoring value, representing a risk score of the payment behavior. Optionally, the sub-risk recognition result is a scoring grade, which characterizes a risk grade of the payment behavior.
Step 203, determine a first risk identification result of the payment behavior based on the plurality of sub-risk identification results, the first risk identification result being used to characterize whether the payment behavior has a payment risk.
The first risk identification result is a prediction result output by the first risk identification model.
Optionally, the first risk identification result is a value of 1 or 0, which characterizes whether the payment behavior has a payment risk. Optionally, the first risk identification result is a scoring value, which characterizes a risk score of the payment behavior. Optionally, the first risk identification result is a rating level, which characterizes a risk level of the payment behavior.
In one possible implementation, the first risk identification result may be determined by voting over the plurality of sub-risk identification results. In one possible implementation, the sub-risk identification results of different base classifiers may carry different voting weights.
Step 204, process the payment behavior based on the target payment processing policy corresponding to the first risk identification result.
The different first risk identification results may correspond to different target payment processing policies.
For example only, the first risk identification result may include multiple risk levels. A lower risk level corresponds to a warning policy, under which the payment application of the terminal displays warning information to remind the payer; a higher risk level corresponds to an interception policy, under which the payment application intercepts the payer's payment behavior; and the highest risk level corresponds to an interception-and-review policy, under which the payment application intercepts the payer's payment behavior while the payer and payee accounts are manually reviewed, and verified fraudulent accounts are subjected to various restrictive measures, such as account closure.
In summary, on the one hand, the payment features are input into each of the plurality of base classifiers in the first risk identification model to obtain a plurality of sub-risk identification results; because the base classifiers include one-class support vector machines obtained through unsupervised learning, the training data does not need to be labeled, which reduces labeling time and monetary cost and avoids the problem of noisy labeled data. On the other hand, the first risk identification result of the payment behavior is determined based on the multiple sub-risk identification results, so multiple base classifiers are integrated, which improves the robustness, stability, and generalization capability of the model and reduces overfitting, allowing normal and abnormal payment behaviors to be distinguished more accurately.
Regarding how the first risk identification result is determined from the plurality of sub-risk identification results: in one possible manner, the sub-risk identification results are scoring values representing risk scores of the payment behavior, and their sum may be taken as the first risk identification result; in another possible manner, a vote may be taken over the multiple sub-risk identification results to obtain a risk score for the payment behavior.
In some embodiments, the sub-risk identification result is a scoring value: a value greater than or equal to zero indicates that the payment behavior has no payment risk, and a value less than zero indicates that it has a payment risk. The vote is therefore taken on whether each sub-risk identification result indicates a payment risk. For example only, if the sub-risk identification results of 6 base classifiers are -45, -23, -50, 11, 3, and -90, then 4 base classifiers indicate that the payment behavior has a payment risk and 2 indicate that it does not, so voting on these results gives a risk score of 4 for the payment behavior.
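The voting rule above can be sketched as counting the base classifiers whose score is below zero; the function name is illustrative, and the check reproduces the six-classifier example:

```python
def vote_risk_score(sub_scores):
    """Count the base classifiers that flag the payment as risky.

    A sub-risk score below zero is a vote for "payment risk";
    a score of zero or above is a vote for "no payment risk".
    """
    return sum(1 for s in sub_scores if s < 0)
```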
To determine the first risk identification result from the risk score, in one possible implementation, at least one scoring threshold may be set, the target risk level of the payment behavior is determined according to where the risk score falls relative to the threshold(s), and that target risk level is taken as the first risk identification result. The at least one scoring threshold divides at least two risk levels.
After determining the risk level, the server determines a target payment processing policy corresponding to the target risk level based on a correspondence between the risk level and the payment processing policy, and processes the payment behavior based on the target payment processing policy.
For example only, a first scoring threshold a, a second scoring threshold b, and a third scoring threshold c may be set to divide four risk levels. The number of base classifiers in the first risk identification model is N.
Optionally, the first scoring threshold a=0.5.
Optionally, the first scoring threshold a, the second scoring threshold b, and the third scoring threshold c are set based on the payment type of the payment behavior. For example, if the payment behavior is sending a red packet, the first scoring threshold a may be set relatively high; if the payment behavior is a large transfer, the first scoring threshold a may be set relatively low.
When the risk score n satisfies aN < n < bN, the target risk level of the payment behavior is determined to be level 1, and level 1 is taken as the first risk identification result. In this case, the target payment processing policy is a warning policy, and the payment application of the terminal displays warning information to remind the payer.
When the risk score n satisfies bN ≤ n < cN, the target risk level is determined to be level 2, and level 2 is taken as the first risk identification result. In this case, the target payment processing policy is an interception policy, and the payment application of the terminal intercepts the payer's payment behavior.
When the risk score n satisfies n ≥ cN, the target risk level is determined to be level 3, and level 3 is taken as the first risk identification result. In this case, the target payment processing policy is an interception-and-review policy: the payment application of the terminal intercepts the payer's payment behavior, the payer and payee accounts are manually reviewed, and verified fraudulent accounts are subjected to various restrictive measures, such as account closure.
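The threshold rules can be sketched as follows, assuming a < b < c. Only a = 0.5 comes from the text; the values of b and c and the per-payment-type thresholds for red packets and large transfers are hypothetical, chosen only to reflect that red packets get a higher a and large transfers a lower one. A risk score at or below aN maps to level 0, for which the text prescribes no action, so the sketch simply allows the payment:

```python
# Hypothetical thresholds (a, b, c) per payment type; only a = 0.5 is from the text.
THRESHOLDS = {
    "default": (0.5, 0.7, 0.9),
    "red_packet": (0.6, 0.75, 0.9),     # higher a for red packets
    "large_transfer": (0.3, 0.5, 0.7),  # lower a for large transfers
}

POLICIES = {
    0: "allow",                 # n <= aN
    1: "warn",                  # aN < n < bN
    2: "intercept",             # bN <= n < cN
    3: "intercept_and_review",  # n >= cN
}


def risk_level(n, N, a, b, c):
    """Map a risk score n (votes out of N base classifiers) to a level 0-3."""
    if n >= c * N:
        return 3
    if n >= b * N:
        return 2
    if n > a * N:
        return 1
    return 0


def payment_policy(n, N, payment_type="default"):
    """Return the target payment processing policy for a risk score n."""
    a, b, c = THRESHOLDS.get(payment_type, THRESHOLDS["default"])
    return POLICIES[risk_level(n, N, a, b, c)]
```

With N = 20, a score of 11 triggers a warning under the default thresholds but is allowed for a red packet, while a score of 12 is already intercepted for a large transfer.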
In some embodiments, the first risk identification model includes at least two types of base classifiers. For example, when N = 30, the first risk identification model includes 20 support vector machines and 10 tree models.
In some embodiments, a voting weight may be determined for each base classifier, with different types of base classifiers corresponding to different voting weights; a vote is then taken based on the multiple sub-risk identification results and the voting weights to obtain the risk score of the payment behavior.
In one possible implementation, the voting weights may be set based on the test results of the different base classifiers during the test. For example, when the test result of the support vector machine in the test process is better than the test result of the tree model, the voting weight corresponding to the support vector machine is greater than the voting weight corresponding to the tree model.
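A weighted version of the vote can be sketched as summing the voting weights of the base classifiers that flag the payment. The sketch assumes every base classifier, including the tree models, outputs a score whose sign follows the same convention (below zero means risky); the type names and weight values are hypothetical:

```python
def weighted_vote(sub_scores, classifier_types, weights):
    """Sum the voting weights of the base classifiers that flag a payment risk.

    sub_scores:       one score per base classifier (< 0 means risky)
    classifier_types: type of each base classifier, e.g. "ocsvm" or "tree"
    weights:          voting weight per classifier type
    """
    return sum(
        weights[t] for s, t in zip(sub_scores, classifier_types) if s < 0
    )
```

With weights, the maximum risk score is the total weight rather than N, so the thresholds aN, bN, and cN would presumably be scaled by the total weight instead; the text does not say.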
In this embodiment, the first risk identification result is determined from the multiple sub-risk identification results, and the target risk level of the payment behavior, determined against different score thresholds, is used as the first risk identification result. This enables tiered policy control of risk and meets the different requirements of payment risk identification scenarios.
To train the first risk identification model, the input features are first determined through feature engineering on a small amount of labeled data, and the plurality of base classifiers in the first risk identification model are then trained in an unsupervised manner on the input features of a large amount of unlabeled data.
Referring to fig. 3, fig. 3 is a flowchart of training a first risk identification model provided in an exemplary embodiment of the present application. The flowchart includes the following steps.
Step 301, acquiring raw data.
The raw data may include data corresponding to historical payment behavior. In some embodiments, the raw data may be obtained through a data log of the social payment application.
Step 302, cleaning the original data to obtain a first sample set and a second sample set; the first sample set contains labeling information, the second sample set does not contain labeling information, and the data volume of the second sample set is larger than the data volume of the first sample set.
Cleaning the raw data yields standardized historical payment behaviors and their related data. For example, the related data of a historical payment behavior may include the payment type, payment amount, account numbers of both parties, payment time, locations of both parties, number of days since account registration, number of friends of the account, credit value, historical payment amount, and so on.
The first sample set is a small sample set containing labeling information; for example, it may include a larger number of normal samples and a smaller number of abnormal samples.
Step 303, performing feature engineering on the first sample set to determine the input features of the first risk identification model.
In one possible implementation, information value (Information Value, IV) of the payment behavior related data may be calculated based on the first sample set, and the input feature is determined based on the information value. The information value is an index for evaluating the significance of the variable.
Optionally, it is determined which relevant data is to be used as the input feature based on the level of the information value or the stability of the information value.
By way of example only, after the information values are calculated, the top 5 features with the highest information value and acceptable stability (for example, the payer's account, the payer's location, the number of friends of the account, the number of days since account registration, and the reputation value) are determined as the input features.
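The information value of a binned feature is commonly computed from the weight of evidence of each bin, IV = Σ (p_good − p_bad) · ln(p_good / p_bad), where p_good and p_bad are the shares of normal and abnormal samples falling in the bin. The sketch below assumes the feature has already been binned, uses a small additive smoothing term to avoid log(0), and approximates "stability" by a low coefficient of variation of IV across time windows; the smoothing constant, the stability criterion and the function names are hypothetical choices, not values from the source.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence


def information_value(bins: Sequence[Hashable], labels: Sequence[int], eps: float = 0.5) -> float:
    """IV of one pre-binned feature; labels use 1 = abnormal (bad), 0 = normal (good)."""
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    categories = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(categories)
    total_bad = sum(bad.values()) + eps * len(categories)
    iv = 0.0
    for c in categories:
        # Additive smoothing keeps empty bins from producing log(0).
        p_good = (good[c] + eps) / total_good
        p_bad = (bad[c] + eps) / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def select_features(iv_by_window: Dict[str, List[float]], k: int = 5, max_cv: float = 0.2) -> List[str]:
    """Top-k features by mean IV among those whose IV is stable across time windows.

    Stability is approximated by the coefficient of variation (std / mean) of IV.
    """
    stable = {}
    for name, ivs in iv_by_window.items():
        m = mean(ivs)
        if m > 0 and pstdev(ivs) / m <= max_cv:
            stable[name] = m
    return sorted(stable, key=stable.get, reverse=True)[:k]
```

A feature that puts normal and abnormal samples in the same bins scores an IV of 0, while a feature that separates them cleanly scores high.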
In some embodiments, after determining the type of the payment feature through the feature engineering, the server may extract the input feature of the current payment behavior from the payment behavior and the related data thereof sent by the terminal.
Step 304, determining the hyperparameters of the first risk identification model, and training the plurality of base classifiers of the first risk identification model in an unsupervised manner based on the input features of the second sample set.
In some embodiments, the plurality of base classifiers may be machine learning models of the same type or of different types. The number of base classifiers is a hyperparameter, denoted N. Optionally, N is set to a positive integer between 10 and 30.
In some embodiments, the plurality of base classifiers includes one-class support vector machines (one-class SVMs) obtained through unsupervised learning.
In some embodiments, multiple training sample sets may be sampled from the second sample set, with different training sample sets corresponding to different base classifiers and each training sample set containing less data than the second sample set. The base classifiers are then each trained in an unsupervised manner on the input features of their respective training sample sets.
Optionally, the training sample sets are drawn from the second sample set by bootstrap sampling, that is, random sampling with replacement, and the plurality of base classifiers are integrated by bagging.
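A minimal sketch of bagging one-class SVMs with bootstrap sampling, assuming scikit-learn's `OneClassSVM` and NumPy are available. The sample fraction, `nu`, `gamma`, the RBF kernel choice and the mapping from predictions to 0/1 risk votes are illustrative assumptions; the source does not specify these values.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_bagged_ocsvm(X_unlabeled, n_estimators=10, sample_frac=0.5,
                       nu=0.05, gamma=0.5, seed=0):
    """Train n_estimators one-class SVMs, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X_unlabeled)
    m = max(2, int(sample_frac * n))  # each training set is smaller than the full set
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=m)  # sampling with replacement (bootstrap)
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        models.append(model.fit(X_unlabeled[idx]))
    return models


def sub_risk_votes(models, X):
    """0/1 risk vote per sample (rows) and base classifier (columns).

    OneClassSVM.predict returns +1 for inliers and -1 for outliers;
    an outlier is counted as a risk vote of 1.
    """
    return np.stack([(m.predict(X) == -1).astype(int) for m in models], axis=1)
```

Summing a row of `sub_risk_votes` gives a vote count that can serve as the risk score n compared against the aN/bN/cN bands.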
Compared with the single base classifier, the method integrates a plurality of base classifiers in a bagging mode, and has the following beneficial effects:
(1) The robustness of the model can be improved. By combining the sub-risk recognition results of multiple base classifiers, the first risk recognition model can reduce performance degradation of a single base classifier model due to noise, outliers, or overfitting. Differences between the multiple base classifiers can enhance the adaptation of the overall model to complex patterns in the data.
(2) The generalization capability of the model can be improved. Multiple base classifiers can capture different features and patterns in the data, so that the first risk identification model makes more accurate predictions on new samples.
(3) The stability of the model can be improved. A single base classifier may be affected by random fluctuations in the training data, while the first risk recognition model reduces reliance on a particular training sample by combining multiple base classifiers, thereby improving the stability of the model.
(4) Overfitting can be reduced. Multiple training sample sets are generated from the training data by bootstrap sampling, and each base classifier is trained on a different training sample set, which reduces the model's overfitting to particular training samples.
(5) A more accurate decision function can be obtained. Integrating multiple base classifiers yields a smoother decision boundary that can still follow complex shapes. The predictions of the base classifiers compensate for one another, producing a more reasonable decision boundary between normal and abnormal samples.
The core idea behind training a one-class support vector machine is to find a decision function whose boundary encloses most of the normal payment samples. The boundary should fit the normal payment samples as tightly as possible while staying as far as possible from abnormal payment samples. During training, the one-class support vector machine learns the distribution of normal payment samples and then identifies potential abnormal payment samples as deviations from this distribution. Because this training method does not require prior knowledge of the distribution of abnormal payment samples, it is suitable for anomaly detection without labeled data.
For the principle of training a one-class support vector machine by unsupervised learning based on the input features of the second sample set, refer to fig. 4, which is a schematic diagram of training a one-class support vector machine by unsupervised learning according to an exemplary embodiment of the present application.
As shown in fig. 4, the normal payment samples are shown with white circles, the abnormal payment samples are shown with black circles, and for each base classifier in the first risk identification model, a decision function 410 corresponding to each base classifier is determined based on the input features of the second sample set by an unsupervised learning manner.
The decision function 410 is used to distinguish normal payment samples from abnormal payment samples.
Optionally, the decision function is denoted f(x). For any payment sample, if f(x) ≥ 0, the sample is a normal payment sample; if f(x) < 0, the sample is an abnormal payment sample.
In some embodiments, constraints may be constructed based on the input features of the second sample set and slack variables.
Assume that the second sample set D contains the input features of n unlabeled samples, $D = \{x_1, x_2, \ldots, x_n\}$.
The constraints are constructed as:
$$\omega^{T}\phi(x_i) \geq \rho - \zeta_i,\quad \zeta_i \geq 0,\quad i = 1, \ldots, n$$
where ω is the weight vector, $\zeta_i$ is a slack variable, ρ is the offset of the decision function, and $\phi(x_i)$ is a function that maps sample $x_i$ to a high-dimensional feature space.
The constraints require the input features of the second sample set to lie in the region where the decision function satisfies f(x) ≥ 0, while the slack variable $\zeta_i$ allows individual input features to fall into the region where f(x) < 0.
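In some embodiments, an objective function may be constructed based on the weight vector ω, the regularization parameter ν and the slack variables $\zeta_i$:
$$\min_{\omega,\,\zeta,\,\rho}\ \frac{1}{2}\lVert\omega\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\zeta_i - \rho$$
where ν ∈ (0, 1] is a regularization parameter that controls the proportion of samples allowed to fall outside the region of normal payment samples.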
The objective function is used to characterize the distance separation between the input features of the second sample set and the decision function f (x).
In some embodiments, the decision function f (x) may be obtained by solving a function that satisfies the constraint condition and minimizes the objective function.
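That is, the following constrained minimization problem is solved:
$$\min_{\omega,\,\zeta,\,\rho}\ \frac{1}{2}\lVert\omega\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\zeta_i - \rho \quad \text{s.t.}\ \ \omega^{T}\phi(x_i) \geq \rho - \zeta_i,\ \ \zeta_i \geq 0,\ \ i = 1, \ldots, n$$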
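To solve this minimization problem, in some embodiments, Lagrange multipliers $\alpha_i \geq 0$ and $\beta_i \geq 0$ may be introduced to construct the Lagrangian corresponding to the constraints and the objective function:
$$L(\omega, \zeta, \rho, \alpha, \beta) = \frac{1}{2}\lVert\omega\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\zeta_i - \rho - \sum_{i=1}^{n}\alpha_i\left(\omega^{T}\phi(x_i) - \rho + \zeta_i\right) - \sum_{i=1}^{n}\beta_i\zeta_i$$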
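In some embodiments, to solve for the decision function f(x), the kernel trick may be applied to the Lagrangian. Setting the derivatives with respect to ω, $\zeta_i$ and ρ to zero gives $\omega = \sum_{i}\alpha_i\phi(x_i)$, $0 \leq \alpha_i \leq \frac{1}{\nu n}$ and $\sum_{i}\alpha_i = 1$; substituting these and replacing inner products with a kernel function yields the dual problem:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i, x_j) \quad \text{s.t.}\ \ 0 \leq \alpha_i \leq \frac{1}{\nu n},\ \ \sum_{i=1}^{n}\alpha_i = 1$$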
Solving the dual problem gives the decision function:
$$f(x) = \sum_{i=1}^{n}\alpha_i K(x_i, x) - \rho$$
where $K(x_i, x)$ is a kernel function that implicitly maps samples to a high-dimensional feature space, avoiding explicit computation of the samples' high-dimensional representations and making it easier to learn the decision function f(x) for nonlinear data.
Optionally, the kernel function is one of a linear kernel, a polynomial kernel, and a radial basis function kernel.
In this embodiment, the plurality of base classifiers of the first risk identification model are trained in an unsupervised manner on the unlabeled second sample set, which reduces labeling time and monetary cost and avoids the problem of noisy labels. At the same time, unsupervised learning yields a decision function that distinguishes normal payment samples from abnormal payment samples, simplifying model training without requiring prior knowledge of the distribution of abnormal payment samples.
Step 305, cleaning the raw data to obtain a third sample set. The time nodes of the samples in the third sample set differ from those of the samples in the second sample set, and the third sample set includes a labeled normal sample set and a labeled abnormal sample set.
Illustratively, the samples in the second sample set come from March, April and May, and the samples in the third sample set come from June. That is, the second sample set corresponds to payment behaviors in March, April and May, and the third sample set corresponds to payment behaviors in June.
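The cross-time split in this example can be expressed as a simple filter on each record's month. The record layout (a dict with a `month` key) and the function name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_month(records: Sequence[Dict], train_months=(3, 4, 5),
                   test_months=(6,)) -> Tuple[List[Dict], List[Dict]]:
    """Out-of-time split: train on earlier months, test on a later month."""
    train = [r for r in records if r["month"] in train_months]
    test = [r for r in records if r["month"] in test_months]
    return train, test


records = [{"id": 1, "month": 3}, {"id": 2, "month": 5},
           {"id": 3, "month": 6}, {"id": 4, "month": 7}]
second_set, third_set = split_by_month(records)
```

Records from months in neither window are simply left out, so the test month never leaks into training.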
Step 306, testing the plurality of base classifiers on the third sample set.
Optionally, the base classifier is evaluated by one or more metrics on the third sample set.
In some embodiments, the plurality of base classifiers is evaluated by at least one of the following indicators: case coverage, strategy cost-effectiveness, the Kolmogorov-Smirnov (KS) statistic, the area under the receiver operating characteristic curve (AUC), and the distribution of evaluation scores.
Optionally, the case coverage is determined based on a ratio of the amount of fraudulent transaction orders covered by the model to the total amount of fraudulent transaction orders.
Optionally, the strategy cost-effectiveness is determined based on the ratio of the number of fraudulent transaction orders intervened on by the model to the number of fraudulent transaction orders covered by the model.
In this embodiment, testing on the third sample set uses out-of-time validation, which enhances the stability of the model, helps avoid overfitting, and helps ensure that the model generalizes well to unseen data.
Step 307, deploying the plurality of base classifiers as the first risk identification model when the test result meets the test requirements.
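Case coverage, the KS statistic and AUC can be computed from per-order scores and fraud labels as sketched below in plain Python. The function names, the 0/1 label convention (1 = fraudulent) and the O(n²) pairwise AUC (fine for a sketch, too slow for production volumes) are assumptions; strategy cost-effectiveness is omitted because its exact counting rule is not fully specified in the text.

```python
from typing import Sequence


def case_coverage(covered_fraud_orders: int, total_fraud_orders: int) -> float:
    """Share of all fraudulent orders that the model covers."""
    return covered_fraud_orders / total_fraud_orders if total_fraud_orders else 0.0


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Max gap between the cumulative rates of fraud (1) and normal (0) orders
    as the score threshold moves from high to low, i.e. max |TPR - FPR|."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both fraudulent and normal orders")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all orders tied at score s before evaluating this threshold.
        while i < len(pairs) and pairs[i][0] == s:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        best = max(best, abs(tp / pos - fp / neg))
    return best


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random fraudulent order outscores a random normal one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both KS and AUC depend only on the ranking of scores, so they can be compared across base classifiers whose raw score scales differ.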
Step 308, online monitoring and reporting are performed on the first risk identification model.
In some embodiments, based on the results of online monitoring and reporting of the first risk identification model, the original data may be re-acquired, feature engineering performed, and unsupervised training performed to update the first risk identification model, further improving the accuracy of the model.
Because the first risk identification model may include at least two types of base classifiers, when result voting is performed based on a plurality of sub-risk identification results corresponding to the plurality of base classifiers, and a risk score corresponding to the payment behavior is obtained, different types of base classifiers may have different voting weights.
By way of example, the first risk identification model may include two types of base classifiers. For example, when N = 30, the first risk identification model includes 20 support vector machines and 10 tree models.
Referring to fig. 5, fig. 5 is a schematic diagram of result voting based on a plurality of sub-risk identification results and voting weights according to an exemplary embodiment.
As shown in fig. 5, the first risk identification model 510 includes a plurality of base classifiers, such as one-class support vector machine 511, one-class support vector machine 512, tree model 513, and so on. One-class support vector machine 511 processes the payment feature 501 to obtain sub-risk identification result 521; one-class support vector machine 512 processes the payment feature 501 to obtain sub-risk identification result 522; tree model 513 processes the payment feature 501 to obtain sub-risk identification result 523; and so on.
Optionally, each sub-risk identification result is a score from 0 to 100, where a larger value indicates a higher likelihood that the payment behavior is risky. By way of example only, the sub-risk identification result 521 of one-class support vector machine 511 is 78, the sub-risk identification result 522 of one-class support vector machine 512 is 82, the sub-risk identification result 523 of tree model 513 is 44, and so on.
Different types of base classifiers correspond to different voting weights; for example, one-class support vector machines 511 and 512 correspond to voting weight A, and tree model 513 corresponds to voting weight B.
In one possible implementation, the voting weights may be set based on the evaluation results of the different base classifiers during the test. For example, when the evaluation result of the support vector machine in the test process is better than the evaluation result of the tree model, the voting weight corresponding to the support vector machine is greater than the voting weight corresponding to the tree model.
For example only, voting weights A and B may be determined from the evaluation results, on the third sample set, of all one-class support vector machines and all tree models in the first risk identification model 510. For instance, A and B may be determined from the AUC values of these classifiers on the third sample set; when the one-class support vector machines perform better than the tree models in testing, voting weight A is greater than voting weight B.
In some embodiments, result voting is performed based on the multiple sub-risk identification results and the voting weights, resulting in a risk score 531 corresponding to the payment behavior, and the first risk identification result 540 is determined based on the risk score 531.
Optionally, the first risk identification result is 1 or 0, which characterizes whether the payment behavior has a payment risk.
Optionally, a risk score threshold (e.g., 30) may be set. When the risk score 531 is greater than or equal to the threshold, the first risk identification result is 1, indicating that the payment behavior has a payment risk; when the risk score 531 is below the threshold, the first risk identification result is 0, indicating that the payment behavior does not have a payment risk.
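The text does not say how the weighted 0 to 100 sub-scores are combined into risk score 531. A weighted average is one natural choice, sketched below with hypothetical weights A = 1.2 and B = 0.8 for the one-class SVMs and the tree model; the aggregation rule and weight values are assumptions, while the sub-scores 78, 82 and 44 and the threshold of 30 come from the text.

```python
from typing import Sequence


def risk_score(sub_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of 0-100 sub-risk scores (hypothetical aggregation rule)."""
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(sub_scores, weights)) / total_weight


def first_risk_result(score: float, threshold: float = 30.0) -> int:
    """1 = payment risk, 0 = no payment risk."""
    return 1 if score >= threshold else 0


A, B = 1.2, 0.8  # hypothetical voting weights for one-class SVMs and tree models
score_531 = risk_score([78, 82, 44], [A, A, B])
result_540 = first_risk_result(score_531)
```

Dividing by the total weight keeps the combined score on the same 0 to 100 scale as the sub-scores, so a fixed threshold remains meaningful when weights change.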
In this embodiment, by integrating different types of base classifiers in the first risk identification model, overfitting can be reduced; meanwhile, different voting weights are set for different types of base classifiers based on the evaluation result of the test, so that the recognition accuracy of the first risk recognition model can be further improved.
In order to make the predicted result more accurate, in the application process of the first risk recognition model, the abnormal payment behaviors recognized by the first risk recognition model can be continuously collected, and the second risk recognition model is trained in a supervised learning mode based on an abnormal sample set and a normal sample set; and jointly determining a third risk recognition result based on the first risk recognition model and the second risk recognition model, and processing payment behaviors based on a target payment processing strategy corresponding to the third risk recognition result.
In some embodiments, the payment risk identification method may be divided into three phases. The first stage comprises a training and application process of a first risk identification model, the second stage comprises a training process of a second risk identification model, and the third stage comprises a joint application process of the first risk identification model and the second risk identification model.
Referring to fig. 6, fig. 6 is a schematic diagram of three stages of a payment risk identification method according to an exemplary embodiment of the present application.
The first stage 610 of the payment risk identification method includes training and application of a first risk identification model.
In the first risk recognition model training process of the first stage 610, a second sample set containing no labeling information is obtained through the original data, and based on the second sample set, a plurality of base classifiers in the first risk recognition model are trained through an unsupervised learning mode.
In the first risk identification model application process in the first stage 610, the first risk identification model processes the payment feature, predicts to obtain a first risk identification result, and if the first risk identification result characterizes an abnormal payment behavior, the abnormal payment feature corresponding to the abnormal payment behavior may be added to the abnormal sample set. Wherein the abnormal payment behavior refers to a payment behavior in which the first risk identification result indicates that there is a payment risk.
The second stage 620 of the payment risk identification method includes a training process of the second risk identification model.
In the case where the data amount of the abnormal sample set is greater than the data amount threshold, the second risk recognition model may be trained by a supervised learning approach based on the abnormal sample set and the normal sample set. The normal sample set may be collected through the prediction result of the first risk identification model, or may be collected through a large number of easily acquired normal payment behaviors.
Because fraudulent activities in the social payment scenario are very rare relative to normal payment activities, the training data for the second risk identification model is severely imbalanced. In some embodiments, the abnormal sample set and the normal sample set may be rebalanced by resampling: for example, some normal samples may be removed from the normal sample set by undersampling, or new abnormal samples may be generated from the abnormal sample set by oversampling, to reduce the loss of accuracy of the second risk identification model caused by imbalanced training data.
Regarding the training process of the second risk identification model, in some embodiments, the abnormal sample set and the normal sample set may be input into the second risk identification model to obtain sample risk identification results. A first loss function is determined from the sample risk identification results and the ground-truth risk labels of the abnormal and normal sample sets, and the parameters of the second risk identification model are iteratively updated based on the first loss function to obtain the trained second risk identification model. In this case, the output of the trained second risk identification model may be 1 or 0, indicating whether the payment behavior has a payment risk.
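The resampling step might look like the sketch below. Random undersampling is shown for the normal set; for the abnormal set, new samples are synthesized by interpolating between two randomly chosen abnormal feature vectors. This is a simplified, SMOTE-style variant (SMOTE proper interpolates toward nearest neighbours), and the ratios, list-of-floats feature layout and function names are assumptions, not the author's method.

```python
import random
from typing import List, Sequence


def undersample(normal: Sequence[List[float]], keep_ratio: float,
                rng: random.Random) -> List[List[float]]:
    """Random undersampling: keep a random subset of the normal samples."""
    k = max(1, int(len(normal) * keep_ratio))
    return rng.sample(list(normal), k)


def oversample(abnormal: Sequence[List[float]], n_new: int,
               rng: random.Random) -> List[List[float]]:
    """Add n_new synthetic abnormal samples, each a random point on the segment
    between two different randomly chosen abnormal samples (needs >= 2 samples)."""
    pool = list(abnormal)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(pool, 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return pool + synthetic
```

Passing an explicit `random.Random` instance keeps the resampling reproducible between training runs.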
In other embodiments, the abnormal sample set and the normal sample set contain risk level tags corresponding to sample payment features. For example, risk level labels corresponding to sample payment features in a normal sample set are all level 0 risk level labels, and the payment behavior is characterized in that payment risks do not exist; the risk level labels corresponding to the sample payment features in the abnormal sample set can be a level 1 risk level label, a level 2 risk level label and a level 3 risk level label, and the low-level payment risk, the medium-level payment risk and the high-level payment risk of the payment behaviors are respectively represented.
During training of the second risk identification model, the sample payment features of the abnormal and normal sample sets may be input into the second risk identification model to obtain predicted risk levels. The risk level labels of the sample payment features serve as supervision for the predicted risk levels to determine a second loss function, and the parameters of the second risk identification model are iteratively updated based on the second loss function to obtain the trained second risk identification model. In this case, the output of the trained second risk identification model may be level 0, level 1, level 2 or level 3, representing the risk level of the payment behavior.
In the field of payment fraud recognition, labeling data is scarce, and the number of samples of abnormal payment behavior is small. In order to perform supervised training on the second risk recognition model, in the embodiment, an abnormal sample set is constructed by using the abnormal payment behavior predicted by the first risk recognition model to train the second risk recognition model, so that the labeling time and money cost are reduced, and the training efficiency of the second risk recognition model is improved.
The trained second risk identification model may be applied in conjunction with the first risk identification model to identify payment risk for the payment activity.
A third stage 630 of the payment risk identification method comprises a joint application of the first risk identification model and the second risk identification model.
In the joint application process of the first risk identification model and the second risk identification model in the third stage 630, the payment feature may be input into the first risk identification model, so as to obtain a corresponding first risk identification result; simultaneously inputting the payment characteristics into a second risk identification model to obtain a corresponding second risk identification result, and determining a third risk identification result based on the first risk identification result and the second risk identification result; and processing the payment behavior based on the target payment processing strategy corresponding to the third risk identification result.
With respect to determining a third risk identification result based on the first risk identification result and the second risk identification result, it may include at least one of the following scenarios.
(1) In one possible scenario, the first risk identification result and the second risk identification result each take the value 1 or 0, indicating whether the payment behavior has a payment risk.
When at least one of the first risk identification result and the second risk identification result is 1, the third risk identification result is determined to be 1, indicating that the payment behavior has a payment risk.
When both the first risk identification result and the second risk identification result are 0, the third risk identification result is determined to be 0, indicating that the payment behavior does not have a payment risk.
(2) In one possible scenario, the first risk identification result and the second risk identification result are scoring levels, characterizing risk levels of payment behavior.
In some embodiments, the first risk identification result includes a first risk level, and the second risk identification result includes a second risk level; the server may determine the maximum of the first risk level and the second risk level as a third risk level for the payment action.
For example, the first risk recognition result is a level 1 risk level (first risk level), the second risk recognition result is a level 3 risk level (second risk level), and the server takes the maximum value of the first risk recognition result and the second risk recognition result, namely the level 3 risk level, as the third risk level to determine that the third risk recognition result is the level 3 risk level.
For another example, when the first risk identification result and the second risk identification result are both level 2 risk levels, since the first risk level and the second risk level are the same, the maximum value of the first risk level and the second risk level is also level 2 risk level, and the level 2 risk level is used as the third risk level to determine that the third risk identification result is level 2 risk level.
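Both fusion rules reduce to one-liners: a logical OR for binary results and a maximum for risk levels. The function names are hypothetical.

```python
def fuse_binary(first: int, second: int) -> int:
    """Third result is 1 if either model flags a payment risk, else 0."""
    return 1 if first == 1 or second == 1 else 0


def fuse_levels(first_level: int, second_level: int) -> int:
    """Third risk level is the stricter (higher) of the two levels."""
    return max(first_level, second_level)
```

Taking the stricter of the two outputs is what makes the joint application more conservative than either model alone.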
And after determining a third risk identification result of the payment behavior based on the first risk identification result and the second risk identification result, the server processes the payment behavior based on a target payment processing strategy corresponding to the third risk identification result.
When the third risk identification result is level 1, the corresponding target payment processing strategy is an alert strategy: the server sends alert information to the terminal, and the payment application on the terminal displays it to remind the payer. When the third risk identification result is level 2, the corresponding target payment processing strategy is an interception strategy: the server sends interception information to the terminal, and the payment application on the terminal intercepts the payer's payment. When the third risk identification result is level 3, the corresponding target payment processing strategy is an interception-and-audit strategy: the server sends interception and audit information to the terminal, the payment application on the terminal intercepts the payer's payment, the payer and payee accounts are manually audited, and restrictive measures such as account closure are taken against accounts verified as fraudulent.
In this embodiment, an abnormal sample set is constructed from the abnormal payment behaviors predicted by the first risk identification model to train the second risk identification model, which reduces labeling time and monetary cost and improves the training efficiency of the second risk identification model. At the same time, determining the third risk identification result jointly from the first and second risk identification models, and processing the payment behavior with the target payment processing strategy corresponding to the third risk identification result, yields stricter risk identification, further improves payment security, and reduces users' financial losses.
Referring to fig. 7, fig. 7 is a schematic diagram of a payment risk identification method according to an exemplary embodiment of the present application.
As shown in fig. 7, the payment risk identification method in this embodiment is divided into three parts: offline deployment, risk control, and identification of payment behavior in the payment application.
In some embodiments, the payment risk identification method is applied to a social payment scene and is used for monitoring the social payment fraud risk, identifying fraud, and reducing the cheating risk and loss of a platform and a user.
The offline deployment part includes the following steps.
Step 701, mining and screening high-significance features based on large amounts of data and case analysis, performing feature engineering, and determining the input features.
Step 702, taking the input features as input data of a plurality of base classifiers, and training the plurality of base classifiers by adopting an unsupervised learning mode to obtain an integrated first risk identification model.
At step 703, the effect of the first risk identification model is evaluated across the set of time samples.
And step 704, in the case that the evaluation result meets the requirement, deploying a plurality of base classifiers as a first risk identification model.
In some embodiments, multiple base classifiers may be deployed into the storage service 711 as the first risk identification model. Illustratively, multiple base classifiers may be deployed based on Cloud Key Value store (CKV) and seekers and other structural drawing tools software.
In the step of risk control, a wind control strategy 712 may be set.
The pneumatic control strategy 712 includes preset risk control rules for paying various types of each type. For example, the wind control strategy 712 may include: when the first risk identification result indicates that payment risks exist and the payment behavior belongs to cross-border payment or off-site payment, the payment application program of the terminal intercepts the payment behavior of the payer, simultaneously carries out manual audit on the payer and the payee account, and takes a plurality of forbidden measures, such as a sales user and the like, on the verified fraud account; when the first risk identification result indicates that the payment risk exists and the payment behavior belongs to non-remote payment, the payment application program of the terminal displays warning information to remind a payer.
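The example rules of risk control policy 712 can be written as a small dispatch function. The enum values, the returned action names and the "allow" outcome for payments without a payment risk are illustrative assumptions, not identifiers from the source.

```python
from enum import Enum


class PaymentType(Enum):
    CROSS_BORDER = "cross_border"
    REMOTE = "remote"  # off-site payment (hypothetical label)
    LOCAL = "local"    # non-remote payment


def apply_policy(has_risk: bool, payment_type: PaymentType) -> str:
    """Map the first risk identification result and payment type to an action."""
    if not has_risk:
        return "allow"  # assumption: no action when no payment risk is found
    if payment_type in (PaymentType.CROSS_BORDER, PaymentType.REMOTE):
        return "intercept_and_audit"
    return "alert"
```

Keeping the rules in one function separates policy from the model, so the rules can be tuned without retraining the base classifiers.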
When the user 713 makes a payment through the payment application, the first risk identification model deployed in the storage service 711 identifies the risk of the payment in real time, in combination with the risk control policy 712. The payment is then handled in real time by the corresponding target payment processing policy. When a transaction carries a risk of fraud, the user 713 can be made aware of it in time and the transaction can be prevented. This reduces fraud against users and the loss of funds.
Referring to fig. 8, fig. 8 is a block diagram illustrating a payment risk identification device according to an exemplary embodiment of the present application. The device comprises:
an extracting module 801, configured to extract payment features corresponding to payment behaviors;
the identification module 802, configured to input the payment features into a plurality of base classifiers in a first risk identification model to obtain the sub-risk identification results output by the plurality of base classifiers, where the base classifiers include one-class support vector machines obtained through unsupervised learning;
the identification module 802 is further configured to determine a first risk identification result of the payment behavior based on the plurality of sub-risk identification results, where the first risk identification result indicates whether the payment behavior carries a payment risk;
and a processing module 803, configured to process the payment behavior based on a target payment processing policy corresponding to the first risk identification result.
Optionally, the identification module 802 is further configured to:
perform result voting on the plurality of sub-risk identification results to obtain a risk score corresponding to the payment behavior;
and determine the first risk identification result based on the risk score.
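The text does not specify the voting rule. A minimal Python sketch of an unweighted variant is shown below. It assumes that each base classifier outputs -1 for "anomalous" and +1 for "normal", which is the convention used by scikit-learn's `OneClassSVM.predict`. It also assumes that the risk score is the share of anomalous votes, and that a strict majority marks the payment as risky. The 0.5 cut-off is a hypothetical choice.

```python
from typing import Sequence

ANOMALOUS = -1  # assumed output for "payment risk" (OneClassSVM.predict convention)
NORMAL = 1


def vote_risk_score(sub_results: Sequence[int]) -> float:
    """Share of base classifiers that voted 'anomalous', in [0, 1]."""
    if not sub_results:
        raise ValueError("at least one sub-risk identification result is required")
    return sum(1 for r in sub_results if r == ANOMALOUS) / len(sub_results)


def first_risk_result(sub_results: Sequence[int]) -> bool:
    """True when a strict majority of base classifiers flags the payment."""
    return vote_risk_score(sub_results) > 0.5
```

With four base classifiers of which two flag the payment, the score is 0.5 and, under this strict-majority assumption, the payment is not marked as risky.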
Optionally, the identification module 802 is further configured to:
determine, based on the risk score and at least one score threshold, a target risk level corresponding to the payment behavior as the first risk identification result, wherein the at least one score threshold divides risk into at least two risk levels;
and processing the payment behavior based on the target payment processing policy corresponding to the first risk identification result comprises:
determining the target payment processing policy corresponding to the target risk level based on a correspondence between risk levels and payment processing policies;
and processing the payment behavior based on the target payment processing policy.
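Mapping a risk score to a risk level and then to a processing policy can be sketched in Python with sorted thresholds and the standard `bisect` module. The two thresholds, the three levels and the policy names below are hypothetical. The text only requires at least one threshold and at least two levels.

```python
from bisect import bisect_right
from typing import Dict, Sequence

# Hypothetical score thresholds dividing risk scores into three levels:
#   level 0: score < 0.3, level 1: 0.3 <= score < 0.7, level 2: score >= 0.7
SCORE_THRESHOLDS = (0.3, 0.7)

# Hypothetical correspondence between risk level and payment processing policy.
POLICY_BY_LEVEL: Dict[int, str] = {0: "allow", 1: "warn_payer", 2: "intercept_and_audit"}


def target_risk_level(score: float, thresholds: Sequence[float] = SCORE_THRESHOLDS) -> int:
    """Risk level = number of (sorted) thresholds that the score has reached."""
    return bisect_right(thresholds, score)


def target_policy(score: float) -> str:
    """Look up the payment processing policy for the score's risk level."""
    return POLICY_BY_LEVEL[target_risk_level(score)]
```

`bisect_right` places a score equal to a threshold in the higher level, so under these assumed values a score of exactly 0.7 maps to level 2.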
Optionally, the identification module 802 is further configured to:
add the abnormal payment features corresponding to abnormal payment behaviors identified by the first risk identification model to an abnormal sample set, wherein an abnormal payment behavior is a payment behavior whose first risk identification result indicates that a payment risk exists;
when the data volume of the abnormal sample set is larger than a data volume threshold, train a second risk identification model by supervised learning based on the abnormal sample set and a normal sample set;
once the second risk identification model has been trained, input the payment features into the second risk identification model to obtain a second risk identification result of the payment behavior;
determine a third risk identification result of the payment behavior based on the first risk identification result and the second risk identification result;
and process the payment behavior based on a target payment processing policy corresponding to the third risk identification result.
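The trigger for training the second model can be sketched as a small Python collector. It keeps only the features of payments that the first model flagged, and it reports when the set exceeds the data volume threshold. The class name, its fields and the threshold value are hypothetical; the text does not fix a threshold.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class AbnormalSampleSet:
    """Collects features of payments that the first model flagged as risky."""
    data_volume_threshold: int  # hypothetical value; the text does not fix one
    samples: List[Sequence[float]] = field(default_factory=list)

    def add(self, payment_features: Sequence[float], has_risk: bool) -> None:
        # Only abnormal payment behaviors (first result = risk) are collected.
        if has_risk:
            self.samples.append(payment_features)

    def should_train_second_model(self) -> bool:
        # Training is triggered once the set is strictly larger than the threshold.
        return len(self.samples) > self.data_volume_threshold
```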
Optionally, the abnormal sample set and the normal sample set contain risk level labels corresponding to the sample payment features. The identification module 802 is further configured to train the second risk identification model by supervised learning based on the abnormal sample set and the normal sample set, the training comprising:
inputting the sample payment features in the abnormal sample set and the normal sample set into the second risk identification model to obtain predicted risk levels;
and training the second risk identification model using the risk level labels corresponding to the sample payment features as supervision for the predicted risk levels.
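The text does not name the supervised algorithm for the second model, and any supervised classifier could fill this role. The Python sketch below substitutes a nearest-centroid classifier as a minimal, dependency-free stand-in. It shows the risk level labels acting as supervision: one centroid is learned per labeled level, and each prediction returns the level of the closest centroid. The class name and the numeric level encoding are assumptions.

```python
from math import dist
from typing import Dict, List, Sequence


class NearestCentroidRiskModel:
    """Stand-in for the second risk identification model (assumption)."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, features: Sequence[Sequence[float]], labels: Sequence[int]) -> "NearestCentroidRiskModel":
        # Group sample payment features by their risk level label (the supervision signal).
        groups: Dict[int, List[Sequence[float]]] = {}
        for x, y in zip(features, labels):
            groups.setdefault(y, []).append(x)
        # Centroid of a level = per-feature mean of all samples carrying that label.
        self.centroids = {
            level: [sum(col) / len(rows) for col in zip(*rows)]
            for level, rows in groups.items()
        }
        return self

    def predict(self, x: Sequence[float]) -> int:
        # Predicted risk level = label of the nearest centroid (Euclidean distance).
        return min(self.centroids, key=lambda level: dist(x, self.centroids[level]))
```

For example, after fitting normal samples labeled 0 and abnormal samples labeled 2, a payment close to the abnormal samples is predicted as level 2.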
Optionally, the first risk identification result includes a first risk level, and the second risk identification result includes a second risk level. The identification module 802 is further configured to determine the third risk identification result of the payment behavior, based on the first and second risk identification results, by:
determining the higher of the first risk level and the second risk level as the third risk level of the payment behavior.
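In code, combining the two results is a maximum over risk levels. The Python sketch below also covers the period before the second model has been trained. During that period only the first result is available, so it is used on its own. This fallback is an interpretation of the conditional wording above, and representing "not yet trained" as `None` is an assumption.

```python
from typing import Optional


def third_risk_level(first_level: int, second_level: Optional[int]) -> int:
    """Return the more severe (higher) of the two risk levels.

    second_level is None while the second model has not been trained yet
    (assumed representation); the first level is then used on its own.
    """
    if second_level is None:
        return first_level
    return max(first_level, second_level)
```

Taking the maximum means that either model alone can escalate a payment, but neither can downgrade a risk that the other has detected.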
Optionally, the first risk identification model includes at least two types of base classifiers. Performing result voting on the plurality of sub-risk identification results to obtain the risk score corresponding to the payment behavior comprises:
determining a voting weight for each base classifier, wherein different types of base classifiers correspond to different voting weights;
and performing result voting based on the plurality of sub-risk identification results and the voting weights to obtain the risk score corresponding to the payment behavior.
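A weighted version of the vote can be sketched in Python as follows. The text names only the one-class support vector machine, so `isolation_forest` stands for a hypothetical second classifier type. The weight values are also assumptions. Votes follow the same assumed convention of -1 for anomalous and +1 for normal.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical voting weights per base-classifier type; the text only states
# that different types carry different weights. "isolation_forest" is an
# assumed second type, not one named in the text.
VOTING_WEIGHTS = {"one_class_svm": 2.0, "isolation_forest": 1.0}


def weighted_risk_score(
    sub_results: Sequence[Tuple[str, int]],
    weights: Mapping[str, float] = VOTING_WEIGHTS,
) -> float:
    """Weighted share of 'anomalous' (-1) votes among (classifier_type, vote) pairs."""
    total = sum(weights[kind] for kind, _ in sub_results)
    if total <= 0:
        raise ValueError("total voting weight must be positive")
    flagged = sum(weights[kind] for kind, vote in sub_results if vote == -1)
    return flagged / total
```

With these assumed weights, a single anomalous vote from the one-class SVM outweighs two normal votes from the other type combined only when they tie, giving a score of 0.5.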
Optionally, the device further includes a training module configured to:
perform feature engineering on a first sample set to determine the input features of the first risk identification model, wherein the first sample set contains labeling information;
and train the plurality of base classifiers of the first risk identification model by unsupervised learning based on the input features of a second sample set, wherein the second sample set does not contain labeling information and its data volume is larger than that of the first sample set.
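The text does not say how "high-significance" features are measured on the labeled first sample set. The Python sketch below uses one simple, assumed measure. Each candidate feature is scored by the absolute difference between its abnormal-class and normal-class means, divided by its overall standard deviation, and the top `top_k` features are kept. The feature names in the usage example are hypothetical. The selected features would then be computed on the larger, unlabeled second sample set for unsupervised training.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def select_input_features(
    rows: Sequence[Dict[str, float]], labels: Sequence[int], top_k: int
) -> List[str]:
    """Rank candidate features on the labeled first sample set and keep the top_k.

    Significance = |mean(abnormal) - mean(normal)| / overall std (assumed measure).
    Labels: 1 = abnormal, 0 = normal; both classes must be present.
    """
    scores: Dict[str, float] = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        normal = [v for v, y in zip(values, labels) if y == 0]
        abnormal = [v for v, y in zip(values, labels) if y == 1]
        spread = pstdev(values)
        # A constant feature cannot separate the classes, so it scores 0.
        scores[name] = abs(mean(abnormal) - mean(normal)) / spread if spread > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, if abnormal payments in the labeled set have much larger amounts but the same hours as normal ones, `amount` ranks above `hour`.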
Optionally, the training module is further configured to:
sample from the second sample set to obtain a plurality of training sample sets, wherein different training sample sets correspond to different base classifiers and each training sample set is smaller than the second sample set;
and train each of the plurality of base classifiers by unsupervised learning based on the input features of its training sample set.
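The subset-per-classifier training can be sketched in Python as follows. One random subset is drawn from the unlabeled second sample set for each base classifier, and each classifier is fitted on its own subset without labels. In the described method the base classifiers include one-class support vector machines, such as scikit-learn's `OneClassSVM`. To keep the sketch dependency-free, it substitutes a simple per-feature z-score detector. That detector, the subset size and the seed are assumptions.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Vector = Sequence[float]


class ZScoreDetector:
    """Dependency-free stand-in for an unsupervised base classifier (assumption).

    Learns per-feature mean/std from unlabeled samples and predicts -1
    (anomalous) if any feature lies more than k standard deviations away, else +1.
    """

    def __init__(self, k: float = 3.0) -> None:
        self.k = k
        self.means: List[float] = []
        self.stds: List[float] = []

    def fit(self, samples: Sequence[Vector]) -> "ZScoreDetector":
        columns = list(zip(*samples))
        self.means = [mean(col) for col in columns]
        # Guard against zero spread so predict never divides by zero.
        self.stds = [pstdev(col) or 1e-9 for col in columns]
        return self

    def predict(self, x: Vector) -> int:
        far = any(abs(v - m) / s > self.k for v, m, s in zip(x, self.means, self.stds))
        return -1 if far else 1


def train_base_classifiers(
    second_set: Sequence[Vector],
    n_classifiers: int,
    subset_size: int,
    make: Callable[[], ZScoreDetector] = ZScoreDetector,
    seed: int = 0,
) -> List[ZScoreDetector]:
    """Draw one random training subset per base classifier and fit it without labels."""
    if not 0 < subset_size < len(second_set):
        raise ValueError("each training subset must be smaller than the second sample set")
    rng = random.Random(seed)
    # Each subset is drawn without replacement, independently of the others,
    # so every base classifier is trained on its own random subset.
    return [make().fit(rng.sample(list(second_set), subset_size)) for _ in range(n_classifiers)]
```

Training each classifier on a different subset is a bagging-style choice. It makes the ensemble members disagree on borderline payments, which is what gives the subsequent vote its value.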
Optionally, the training module is further configured to:
test the plurality of base classifiers on a third sample set, wherein the samples in the third sample set come from time nodes different from those of the samples in the second sample set, and the third sample set comprises a labeled normal sample set and a labeled abnormal sample set;
and, when the test result meets the test requirement, deploy the plurality of base classifiers as the first risk identification model.
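The text does not define the "test requirement". The Python sketch below assumes one plausible form: a minimum recall on the labeled abnormal samples and a maximum false-positive rate on the labeled normal samples of the out-of-time third sample set. Both threshold values are hypothetical. `predict` stands for the deployed ensemble's decision and returns -1 for payment risk and +1 otherwise.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def passes_out_of_time_test(
    predict: Callable[[Vector], int],
    labeled_normal: Sequence[Vector],
    labeled_abnormal: Sequence[Vector],
    min_recall: float = 0.8,
    max_false_positive_rate: float = 0.05,
) -> bool:
    """Check the ensemble on the third sample set (samples from other time nodes).

    predict returns -1 for 'payment risk' and +1 otherwise. The two acceptance
    thresholds are hypothetical stand-ins for the unspecified test requirement.
    """
    # Recall: share of labeled abnormal samples that are flagged.
    recall = sum(predict(x) == -1 for x in labeled_abnormal) / len(labeled_abnormal)
    # False-positive rate: share of labeled normal samples that are wrongly flagged.
    false_positive_rate = sum(predict(x) == -1 for x in labeled_normal) / len(labeled_normal)
    return recall >= min_recall and false_positive_rate <= max_false_positive_rate
```

Testing on samples from different time nodes checks whether the unsupervised ensemble still separates normal and abnormal payments after the data distribution has drifted. Only then are the base classifiers deployed.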
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application. The computer device 900 may be implemented as a terminal or a server in the above-described embodiments.
Optionally, the terminal is a smartphone, a tablet computer, a notebook computer, or a desktop computer. A terminal may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Alternatively, the server may be a background server of the payment application. The server may take any of the following forms:
- an independent physical server;
- a server cluster or distributed system formed by a plurality of physical servers;
- a cloud server providing basic cloud computing services, such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
In some embodiments, the terminal may run a payment application (e.g., a social payment application) upon which the user may conduct payment actions.
The computer device 900 includes a central processing unit (Central Processing Unit, CPU) 901, a system memory 904 including a random access memory 902 and a read only memory 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901. The computer device 900 also includes a basic Input/Output system (I/O) 906, which helps to transfer information between various devices within the computer, and a mass storage device 907, for storing an operating system 913, application programs 914, and other program modules 915.
The basic input/output system 906 includes a display 908 for displaying information and an input device 909, such as a mouse or keyboard, through which the user enters information. The display 908 and the input device 909 are both connected to the central processing unit 901 via an input/output controller 910 connected to the system bus 905. The input/output controller 910 of the basic input/output system 906 can also receive and process input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 910 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 907 and its associated computer-readable media provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer readable medium (not shown), such as a hard disk or drive.
Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include:
- random access memory (RAM) and read-only memory (ROM);
- flash memory or other solid-state memory technology;
- compact disc read-only memory (CD-ROM), digital versatile disc (DVD), or other optical storage;
- magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices.
Of course, those skilled in the art will recognize that computer storage media are not limited to those listed above. The system memory 904 and the mass storage device 907 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 901, the one or more programs containing instructions for implementing the methods described above, the central processing unit 901 executing the one or more programs to implement the methods provided by the various method embodiments described above.
According to various embodiments of the present application, the computer device 900 may also operate by being connected to a remote computer on a network, such as the Internet. I.e., the computer device 900 may be connected to the network 912 through a network interface unit 911 coupled to the system bus 905, or other types of networks or remote computer systems (not shown) may be coupled using the network interface unit 911.
The memory also includes one or more programs stored in the memory, the one or more programs including steps for performing the methods provided by the embodiments of the present application, as performed by the computer device.
Embodiments of the present application also provide a computer readable storage medium having at least one instruction stored therein, the at least one instruction being loaded and executed by a processor to implement the method of any of the embodiments described above.
Alternatively, the computer-readable storage medium may include a ROM, a RAM, a solid-state drive (SSD), an optical disc, or the like. The RAM may include resistive random access memory (ReRAM) and dynamic random access memory (DRAM), among others.
Embodiments of the present application provide a computer program product comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method described in the above embodiment.
It will be understood by those skilled in the art that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc. The foregoing describes merely preferred embodiments of the present application and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (14)

1. A payment risk identification method, the method comprising:
extracting payment features corresponding to a payment behavior;
inputting the payment features into each of a plurality of base classifiers in a first risk identification model to obtain the sub-risk identification results output by the plurality of base classifiers, wherein the base classifiers comprise one-class support vector machines obtained through unsupervised learning;
determining a first risk identification result of the payment behavior based on the plurality of sub-risk identification results, wherein the first risk identification result indicates whether the payment behavior carries a payment risk;
and processing the payment behavior based on a target payment processing strategy corresponding to the first risk identification result.
2. The method of claim 1, wherein the determining a first risk identification result of the payment behavior based on the plurality of sub-risk identification results comprises:
performing result voting on the plurality of sub-risk identification results to obtain a risk score corresponding to the payment behavior;
and determining the first risk identification result based on the risk score.
3. The method of claim 2, wherein the determining the first risk identification result based on the risk score comprises:
determining, based on the risk score and at least one score threshold, a target risk level corresponding to the payment behavior as the first risk identification result, wherein the at least one score threshold divides risk into at least two risk levels;
the processing the payment behavior based on the target payment processing policy corresponding to the first risk identification result includes:
determining a target payment processing strategy corresponding to the target risk level based on a corresponding relation between the risk level and the payment processing strategy;
and processing the payment behavior based on the target payment processing policy.
4. The method according to claim 2, wherein the method further comprises:
adding an abnormal payment feature corresponding to the abnormal payment behavior identified by the first risk identification model to an abnormal sample set, wherein the abnormal payment behavior refers to a payment behavior of which the first risk identification result indicates that a payment risk exists;
training a second risk identification model by a supervised learning mode based on the abnormal sample set and the normal sample set under the condition that the data volume of the abnormal sample set is larger than a data volume threshold;
under the condition that the second risk identification model is trained, inputting the payment characteristics into the second risk identification model to obtain a second risk identification result of the payment behavior;
determining a third risk identification result of the payment behavior based on the first risk identification result and the second risk identification result;
and processing the payment behavior based on a target payment processing strategy corresponding to the third risk identification result.
5. The method of claim 4, wherein the abnormal sample set and the normal sample set contain risk level tags corresponding to sample payment features;
training a second risk identification model based on the abnormal sample set and the normal sample set in a supervised learning mode, wherein the training comprises the following steps:
inputting sample payment features in the abnormal sample set and the normal sample set into the second risk identification model to obtain a predicted risk level;
and taking the risk level label corresponding to the sample payment feature as the supervision of the predicted risk level, and training the second risk identification model.
6. The method of claim 5, wherein the first risk identification result includes a first risk level and the second risk identification result includes a second risk level;
the determining a third risk identification result of the payment behavior based on the first risk identification result and the second risk identification result includes:
determining the higher of the first risk level and the second risk level as the third risk level of the payment behavior.
7. The method of claim 2, wherein the first risk identification model includes at least two types of base classifiers;
the step of voting results based on the multiple sub-risk recognition results to obtain risk scores corresponding to the payment behaviors comprises the following steps:
determining the voting weight of each base classifier, wherein different types of base classifiers correspond to different voting weights;
and carrying out result voting based on the multiple sub-risk identification results and the voting weights to obtain risk scores corresponding to the payment behaviors.
8. The method according to claim 1, wherein the method further comprises:
performing feature engineering on a first sample set, and determining input features of the first risk identification model, wherein the first sample set contains labeling information;
based on the input features of a second sample set, training a plurality of base classifiers of the first risk identification model in an unsupervised learning mode, wherein the second sample set does not contain labeling information, and the data volume of the second sample set is larger than that of the first sample set.
9. The method of claim 8, wherein training the plurality of base classifiers of the first risk identification model via an unsupervised learning based on the input features of the second sample set comprises:
sampling from the second sample set to obtain a plurality of groups of training sample sets, wherein different training sample sets correspond to different base classifiers, and the data volume of the training sample sets is smaller than that of the second sample set;
based on the input features of the training sample sets, respectively training a plurality of base classifiers in an unsupervised learning mode.
10. The method of claim 8, wherein after training the plurality of base classifiers of the first risk identification model via an unsupervised learning based on the input features of the second sample set, the method further comprises:
testing the plurality of base classifiers on a third sample set, wherein the samples in the third sample set come from time nodes different from those of the samples in the second sample set, and the third sample set comprises a labeled normal sample set and a labeled abnormal sample set;
and under the condition that the test result meets the test requirement, deploying a plurality of base classifiers as the first risk identification model.
11. A payment risk identification device, the device comprising:
the extraction module is used for extracting payment characteristics corresponding to the payment behaviors;
the identification module is used for inputting the payment features into each of a plurality of base classifiers in a first risk identification model to obtain the sub-risk identification results output by the plurality of base classifiers, wherein the base classifiers comprise one-class support vector machines obtained through unsupervised learning;
the identification module is further configured to determine a first risk identification result of the payment behavior based on a plurality of the sub-risk identification results, where the first risk identification result is used to characterize whether the payment behavior has a payment risk;
and the processing module is used for processing the payment behavior based on a target payment processing strategy corresponding to the first risk identification result.
12. A computer device, the computer device comprising a processor and a memory; the memory stores at least one instruction for execution by the processor to implement the payment risk identification method of any of claims 1 to 10.
13. A computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the payment risk identification method of any of claims 1 to 10.
14. A computer program product, the computer program product comprising computer instructions stored in a computer readable storage medium; a processor of a terminal reads the computer instructions from the computer readable storage medium, the processor executing the computer instructions, causing the terminal to perform the payment risk identification method of any one of claims 1 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310754485.2A CN117575595A (en) 2023-06-25 2023-06-25 Payment risk identification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117575595A true CN117575595A (en) 2024-02-20

Family

ID=89892398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310754485.2A Pending CN117575595A (en) 2023-06-25 2023-06-25 Payment risk identification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117575595A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118365334A (en) * 2024-06-17 2024-07-19 广州合利宝支付科技有限公司 Payment safety identification method and system of payment terminal

Similar Documents

Publication Publication Date Title
Ogwueleka Data mining application in credit card fraud detection system
US20220103589A1 (en) Predicting data tampering using augmented machine learning models
CN109993233B (en) Method and system for predicting data auditing objective based on machine learning
CN108399509A (en) Determine the method and device of the risk probability of service request event
CN110689438A (en) Enterprise financial risk scoring method and device, computer equipment and storage medium
US20220327541A1 (en) Systems and methods of generating risk scores and predictive fraud modeling
US11715106B2 (en) Systems and methods for real-time institution analysis based on message traffic
CN105809448B (en) Clustering method and system for account transactions
CN113095927B (en) Method and equipment for identifying suspected transactions of backwashing money
CN111127178A (en) Data processing method and device, storage medium and electronic equipment
CN110659961A (en) Method and device for identifying off-line commercial tenant
WO2022221202A1 (en) Systems and methods of generating risk scores and predictive fraud modeling
CN117764741A (en) Medical exception violation big data risk early warning method based on unsupervised machine learning and ensemble learning
CN117575595A (en) Payment risk identification method, device, computer equipment and storage medium
CN117670359A (en) Abnormal transaction data identification method and device, storage medium and electronic equipment
US11916927B2 (en) Systems and methods for accelerating a disposition of digital dispute events in a machine learning-based digital threat mitigation platform
CN118154186A (en) Method, device and server for determining abnormal operation of transaction service
Kumar et al. Tax Management in the Digital Age: A TAB Algorithm-based Approach to Accurate Tax Prediction and Planning
CN117132383A (en) Credit data processing method, device, equipment and readable storage medium
CN116883147A (en) Credit card fraud prediction method and device based on TCN model and electronic equipment
CN112926989B (en) Bank loan risk assessment method and equipment based on multi-view integrated learning
CN115482084A (en) Method and device for generating wind control rule set
CN114626863A (en) Detection method, device, equipment and storage medium for export tax cheating enterprise
Kaur Development of Business Intelligence Outlier and financial crime analytics system for predicting and managing fraud in financial payment services
Lee et al. Application of machine learning in credit risk scorecard

Legal Events

Date Code Title Description
PB01 Publication