CN113094407A - Anti-money laundering identification method, device and system based on horizontal federated learning
- Publication number
- CN113094407A CN202110264163.0A
- Authority
- CN
- China
- Prior art keywords
- data
- sample
- feature
- money laundering
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Technology Law (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses an anti-money laundering identification method, device and system based on horizontal federated learning. The method first performs feature alignment on the data features provided by each participating node and extracts the basic data features used to construct an anti-money laundering model. It then performs sample synchronization according to the user ID and sample generation time of each data sample uploaded by each participating node. Next, it issues a time-series feature construction instruction to each participating node, computes the final feature value of each required time-series feature, and issues that final feature value to each participating node. Each participating node then builds an anti-money laundering identification model through horizontal federated learning from the time-series feature values it has received and the feature values of its own data features, and finally performs anti-money laundering identification with the resulting model. Implementing the embodiments of the invention can improve the accuracy of anti-money laundering identification.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to an anti-money laundering identification method, device and system based on horizontal federated learning.
Background
In existing machine-learning-based anti-money laundering judgment, each securities company independently trains a model on its own transaction data and then performs anti-money laundering judgment. The data required to construct an anti-money laundering model falls mainly into two types. One is a single-record feature, whose value depends only on the current record, such as the customer's age or occupation. The other is a time-series feature, which depends on multiple records; for example, the number of transactions of a customer in the last month must be obtained by aggregating all of that customer's transaction records in the last month. However, the transaction data of one customer may be spread across different companies, and because each company's data is confidential it cannot be shared between companies. If the anti-money laundering model is constructed from the data of a single company only, the constructed time-series features are inaccurate because the data is incomplete, and the accuracy of the model is low. In addition, a single company has few historical money laundering cases, so a model built from one company's data alone tends to overfit and has a large error.
Disclosure of Invention
The embodiment of the invention provides an anti-money laundering identification method, device and system based on horizontal federated learning, which can improve the accuracy of anti-money laundering identification.
An embodiment of the present invention provides an anti-money laundering identification method based on horizontal federated learning, including:
performing feature alignment on each data feature in the sample data tables of the plurality of participating nodes to generate basic data features for constructing an anti-money laundering model; each sample data table comprises a plurality of data samples, and each data sample is provided with a user ID and sample generation time;
carrying out sample synchronization on the sample data table of each participating node according to the user ID and the sample generation time; when sample synchronization is carried out, the user ID and the sample generation time of a selected data sample in a current participating node are sent to the participating node which does not own the selected data sample but owns the data sample with the same user ID as the selected data sample;
issuing a time-series feature construction instruction to each participating node, so that each participating node, on receiving the instruction, calculates a basic feature value of the time-series feature to be constructed from its sample data table after sample synchronization, according to the statistical time dimension information, the feature name of the required basic data feature and the calculation mode contained in the instruction; and calculating a final feature value of the time-series feature from the basic feature values;
and issuing the final feature value of the time-series feature to each participating node, so that each participating node generates an anti-money laundering identification model through horizontal federated learning according to the final feature value of the time-series feature and the feature values of its own data features, and performs anti-money laundering identification according to the anti-money laundering identification model.
Further, the performing feature alignment on each data feature in the sample data table of the plurality of participating nodes to generate a basic data feature for constructing an anti-money laundering model specifically includes:
taking the feature intersection of each data feature in the sample data table of each participating node to obtain a plurality of first basic data features;
calculating the global effective rate of each data characteristic except the first basic data characteristic one by one; taking the data characteristic with the global effective rate exceeding a first preset threshold value as a second basic data characteristic;
and taking all the first basic data features and all the second basic data features as the basic data features for constructing the anti-money laundering model.
Further, the global effective rate of a data feature is calculated by the following formula:
where $g_r$ is the global effective rate of a data feature, $M$ is the number of participating nodes, $l_r^{(m)}$ is the local effective rate of the data feature at the $m$-th participating node, and $n_m$ is the number of data samples of the $m$-th participating node.
On the basis of the above method item embodiments, the present invention correspondingly provides apparatus item embodiments.
The invention provides an anti-money laundering identification device based on horizontal federated learning, which comprises a feature alignment module, a sample synchronization module, a time-series feature construction module and an anti-money laundering identification module;
the characteristic alignment module is used for performing characteristic alignment on each data characteristic in the sample data table of the plurality of participating nodes to generate a basic data characteristic for constructing an anti-money laundering model; each sample data table comprises a plurality of data samples, and each data sample is provided with a user ID and sample generation time;
the sample synchronization module is used for carrying out sample synchronization on the sample data table of each participating node according to the user ID and the sample generation time; when sample synchronization is carried out, the user ID and the sample generation time of a selected data sample in a current participating node are sent to the participating node which does not own the selected data sample but owns the data sample with the same user ID as the selected data sample;
the time-series feature construction module is used for issuing a time-series feature construction instruction to each participating node, so that each participating node, on receiving the instruction, calculates a basic feature value of the time-series feature to be constructed from its sample data table after sample synchronization, according to the statistical time dimension information, the feature name of the required basic data feature and the calculation mode contained in the instruction; and for calculating a final feature value of the time-series feature from the basic feature values;
and the anti-money laundering identification module is used for issuing the final feature value of the time-series feature to each participating node, so that each participating node generates an anti-money laundering identification model through horizontal federated learning according to the final feature value of the time-series feature and the feature values of its own data features, and performs anti-money laundering identification according to the anti-money laundering identification model.
Further, the feature alignment module performs feature alignment on each data feature in the sample data table of the plurality of participating nodes to generate a basic data feature for constructing an anti-money laundering model, and specifically includes:
taking the feature intersection of each data feature in the sample data table of each participating node to obtain a plurality of first basic data features;
calculating the global effective rate of each data characteristic except the first basic data characteristic one by one; taking the data characteristic with the global effective rate exceeding a first preset threshold value as a second basic data characteristic;
and taking all the first basic data features and all the second basic data features as the basic data features for constructing the anti-money laundering model.
Further, the feature alignment module calculates the global effective rate of a data feature by the following formula:
where $g_r$ is the global effective rate of a data feature, $M$ is the number of participating nodes, $l_r^{(m)}$ is the local effective rate of the data feature at the $m$-th participating node, and $n_m$ is the number of data samples of the $m$-th participating node.
On the basis of the above device embodiments, the invention provides an anti-money laundering identification system based on horizontal federated learning, which comprises a central node and a plurality of participating nodes; the central node comprises the anti-money laundering identification device based on horizontal federated learning of the invention.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides an anti-money laundering identification method, device and system based on horizontal federated learning. The method first performs feature alignment on the data features provided by each participating node and extracts the basic data features used to construct an anti-money laundering model. It then performs sample synchronization according to the user ID and sample generation time of the data samples uploaded by each participating node, and issues a time-series feature construction instruction to each participating node, so that each participating node calculates the basic feature value of the required time-series feature from its sample data table after sample synchronization. Each participating node then combines the time-series feature values it has received with the feature values of its own data features to build an anti-money laundering identification model through horizontal federated learning, and finally performs anti-money laundering identification with the resulting model. Compared with the prior art, the time-series features are constructed from the data of all participating nodes, which solves the problem of inaccurate time-series features caused by incomplete data; horizontal federated learning increases the number of samples, which improves the accuracy of the constructed anti-money laundering model, so that anti-money laundering identification can be carried out more accurately.
Drawings
Fig. 1 is a schematic flow chart of an anti-money laundering identification method based on horizontal federated learning according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an anti-money laundering identification device based on horizontal federated learning according to an embodiment of the present invention.
Fig. 3 is a system architecture diagram of an anti-money laundering identification system based on horizontal federated learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an anti-money laundering identification method based on horizontal federated learning, which at least includes:
Step S101: performing feature alignment on each data feature in the sample data tables of the plurality of participating nodes to generate basic data features for constructing an anti-money laundering model; each sample data table comprises a plurality of data samples, and each data sample is provided with a user ID and a sample generation time.
Step S102: carrying out sample synchronization on the sample data table of each participating node according to the user ID and the sample generation time; when the sample synchronization is carried out, the user ID and the sample generation time of a selected data sample in the current participating node are sent to the participating node which does not own the selected data sample but owns the data sample with the same user ID as the selected data sample.
Step S103: issuing a time-series feature construction instruction to each participating node, so that each participating node, on receiving the instruction, calculates a basic feature value of the time-series feature to be constructed from its sample data table after sample synchronization, according to the statistical time dimension information, the feature name of the required basic data feature and the calculation mode contained in the instruction; and calculating a final feature value of the time-series feature from the basic feature values.
Step S104: issuing the final feature value of the time-series feature to each participating node, so that each participating node generates an anti-money laundering identification model through horizontal federated learning according to the final feature value of the time-series feature and the feature values of its own data features, and performs anti-money laundering identification according to the anti-money laundering identification model.
It should be noted that the anti-money laundering identification method based on horizontal federated learning is suitable for running on a central node.
In step S101, in a preferred embodiment, the performing feature alignment on each data feature in the sample data table of the plurality of participating nodes to generate a basic data feature for constructing an anti-money laundering model specifically includes:
taking the feature intersection of each data feature in the sample data table of each participating node to obtain a plurality of first basic data features; calculating the global effective rate of each data characteristic except the first basic data characteristic one by one; taking the data characteristic with the global effective rate exceeding a first preset threshold value as a second basic data characteristic; and taking all the first basic data features and all the second basic data features as the basic data features for constructing the anti-money laundering model.
Specifically, in the anti-money laundering scenario of the securities industry, each participating node often has an insufficient number of samples, so horizontal federated learning is introduced to solve the problem. In horizontal federated learning, the participating nodes tend to hold different samples, but the features held by the parties overlap substantially. Therefore, before federated learning is carried out, feature alignment is performed across the participating nodes, and the features common to all parties are selected for training. In the classical horizontal federated learning scenario, however, feature alignment directly takes the intersection of the features of the participating nodes. As a result, a feature that exists in only some of the participating nodes is discarded even if its fill rate is high. To address this problem, the invention adopts a new data feature alignment method, described in detail below.
First, the sample data table of each participating node comprises a plurality of data samples, and each data sample records a plurality of data items (namely the data features); each data item comprises a data item name and a corresponding value. The specific data items held by each participating node may vary, but generally include the user's basic information, the user's historical trading information and the user's historical non-trading information. The basic information covers, for example, the user's age, job position, annual income, gender, nationality and place of residence. The historical trading information is the user's historical securities order records, such as order price and underlying security. The historical non-trading information records actions the user performed at the securities company that are not trades, such as changes of depository bank and fund transfers in and out.
During data feature alignment, each participating node uploads the field name of each data item in its sample data table to the central node. After receiving them, the central node first calculates the intersection of the data items uploaded by the participating nodes and takes the data items common to all participating nodes as the first basic data features.
For each remaining data item, the central node calculates its global effective rate from its local effective rate at each participating node, extracts the data items whose global effective rate exceeds the preset threshold as the second basic data features, and combines the first and second basic data features to obtain the basic data features finally used to construct the anti-money laundering model, which completes feature alignment.
The local effective rate of a data item at a single participating node can be characterized by the fill rate of the data item at that node: the higher the fill rate, the higher the local effective rate. If the data item is not in the sample data table of a participating node, its local effective rate at that node is 0.
The local effective rate $l_r$ indicates the effectiveness of a data feature held by a single participating node.
The global effective rate $g_r$ represents the overall effectiveness of the data feature across all participating nodes and determines whether the feature takes part in the subsequent federated learning training process.
The global effective rate may be computed as follows:
where $g_r$ is the global effective rate of a data feature, $M$ is the number of participating nodes, $l_r^{(m)}$ is the local effective rate of the data feature at the $m$-th participating node, and $n_m$ is the number of data samples of the $m$-th participating node.
The global effective rate of each remaining data item is calculated with the above formula. If the global effective rate $g_r$ is greater than a first preset threshold $g_{th}$, the data feature is taken as a second basic data feature for the subsequent training of the anti-money laundering model. The feature alignment method provided by the invention can improve the performance of the anti-money laundering identification model. The first preset threshold $g_{th}$ may be determined as a hyper-parameter in the subsequent horizontal federated learning model training.
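The feature alignment procedure above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the `NodeReport` structure and the function names are hypothetical, and because the formula itself is not reproduced in this text, the global effective rate is assumed here to be the sample-count-weighted average of the local effective rates, which is one reading consistent with the symbols $M$, $l_r^{(m)}$ and $n_m$ defined above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReport:
    """What one participating node uploads (hypothetical structure)."""
    n_samples: int                 # n_m: number of data samples at the node
    local_rates: Dict[str, float]  # data item name -> local effective rate (fill rate)


def global_effective_rate(reports: List[NodeReport], feature: str) -> float:
    # ASSUMPTION: sample-weighted average of the local effective rates.
    # A node that does not hold the feature contributes a local rate of 0.
    total = sum(r.n_samples for r in reports)
    if total == 0:
        return 0.0
    weighted = sum(r.local_rates.get(feature, 0.0) * r.n_samples for r in reports)
    return weighted / total


def align_features(reports: List[NodeReport], g_th: float) -> List[str]:
    """Return the first basic features (intersection) plus the second basic features."""
    if not reports:
        return []
    name_sets = [set(r.local_rates) for r in reports]
    first = set.intersection(*name_sets)        # held by every participating node
    remaining = set.union(*name_sets) - first   # held by only some nodes
    second = {f for f in remaining if global_effective_rate(reports, f) > g_th}
    return sorted(first | second)


# Hypothetical reports from three participating nodes.
reports = [
    NodeReport(100, {"age": 1.0, "trx_amt": 0.9, "annual_income": 0.8}),
    NodeReport(300, {"age": 1.0, "trx_amt": 1.0, "annual_income": 0.9}),
    NodeReport(100, {"age": 0.95, "trx_amt": 1.0}),
]
print(align_features(reports, g_th=0.6))
```

In this hypothetical data, `annual_income` is missing at one node, so plain intersection would drop it; under the assumed weighting it is kept whenever the threshold is below its weighted fill rate.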
For step S102: the anti-money laundering scenario is a typical time-series scenario, and each constructed data sample usually carries time information (i.e., the sample generation time), because the same customer may present different money laundering risks at different times. The same customer at different times may therefore be treated as different samples. The same customer may also trade at different participating nodes, so the sample data tables of different participating nodes may contain the same data sample (two data samples are considered the same if both their user ID and their sample generation time are identical), and may also contain samples with the same user ID but different sample generation times.
Table 1 shows the data samples held by three different participating nodes:
TABLE 1
As can be seen from Table 1, participating nodes $m_1$ and $m_3$ hold an identical data sample, while the data samples held by participating nodes $m_1$ and $m_2$ differ from each other but include samples that both belong to user U1. Because the customers of the participating nodes overlap, a participating node constructing its features can use the data that other participating nodes hold on the same customer to improve the model. Therefore, modeling the anti-money laundering scenario in the securities industry requires not only feature alignment but also sample synchronization. The sample synchronization method is as follows:
First, each participating node sends its data samples to the central node in an ins_sync message; the samples sent contain only the user ID and the sample generation time. After receiving the ins_sync messages, the central node merges them and then synchronizes the samples of the participating nodes as follows: for each sample of each participating node, its user ID and sample generation time are sent to every participating node that does not own the sample but owns a sample with the same user ID. The samples held by each participating node after sample synchronization are shown in Table 2:
TABLE 2
As can be seen from Table 2, after sample synchronization each of the participating nodes $m_1$, $m_2$ and $m_3$ gains the user ID and sample generation time of a sample that it did not previously hold. It should be noted that sample synchronization synchronizes only the user ID and the sample generation time, not the values of the data items in the data sample. For example, after sample synchronization the sample data table of participating node $m_1$ gains a data sample whose user ID is U1 and whose sample generation time is 20201010, but the values of its data items are all null.
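The synchronization rule can be sketched from the central node's point of view as below. This is a hedged illustration only: the function name `synchronize`, the dictionary layout standing in for the merged ins_sync messages, and the sample keys are all hypothetical and are not the contents of Table 1 or Table 2.

```python
from typing import Dict, Set, Tuple

SampleKey = Tuple[str, str]  # (user ID, sample generation time)


def synchronize(ins_sync: Dict[str, Set[SampleKey]]) -> Dict[str, Set[SampleKey]]:
    """For each node, return the sample keys the central node should send to it.

    A key is sent to a node only if the node does not own that sample
    but owns at least one sample with the same user ID.
    """
    all_keys: Set[SampleKey] = set().union(*ins_sync.values())
    to_send: Dict[str, Set[SampleKey]] = {}
    for node, keys in ins_sync.items():
        users = {user for user, _ in keys}
        to_send[node] = {k for k in all_keys if k not in keys and k[0] in users}
    return to_send


# Hypothetical ins_sync contents: only user ID and sample generation time are shared.
ins_sync = {
    "m1": {("U1", "20201001"), ("U2", "20201005")},
    "m2": {("U1", "20201010"), ("U3", "20201011")},
    "m3": {("U1", "20201001")},
}
to_send = synchronize(ins_sync)
print(to_send)
```

Each receiving node would then add the received keys to its sample data table with all data item values left null, as the text above describes.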
For step S103: after feature alignment and sample synchronization, construction of the time-series features required for training the anti-money laundering model can begin. As mentioned in the Background, constructing a time-series feature requires historical data, and the historical data of the same customer may be scattered across the participating nodes. Such features therefore have to be constructed by the participating nodes with the help of the central node. Below, a communication protocol is designed for several common time-series features so that they can be constructed while keeping the underlying data secure. More complex features can be constructed by combining or modifying these common constructions.
The following details, for a given sample s, the communication protocol needed to construct each type of feature on data column c within the time window w.
1. Construction of sum-type time-series features (e.g., a user's total transaction amount over the last month)
Take the construction of the w_sum_trx_amt_3m feature as an example; the feature is the customer's total transaction amount within the three months before the sample date. To construct the feature, the central node sends a sum-type time-series feature construction instruction to each participating node through a window_sum_cal message. The format of the window_sum_cal message is shown in Table 3:
Field | Value |
proto_type (protocol type) | window_sum_cal |
fe_name (feature name) | w_sum_trx_amt_3m |
w (time window length) | 3 months |
c (data column) | Trx_amt |
TABLE 3
In Table 3, proto_type (protocol type) corresponds to the calculation mode contained in the time-series feature construction instruction of the invention, w (time window length) corresponds to the statistical time dimension information contained in the instruction, and c (data column) corresponds to the feature name of the required basic data feature contained in the instruction.
After receiving the window_sum_cal message, each participating node directly calculates, for each sample it holds, the sum of the values of data column c within the corresponding time window, and sends the result to the central node through a window_sum_result message. The window_sum_result message format is shown in Table 4:
TABLE 4
After receiving the window_sum_result message from each participating node, the central node directly sums the basic feature values of the participating nodes to obtain the final feature value of the feature. The final feature value is then sent back to each participating node through a window_sum_notify message.
For example, suppose Trx_amt is the transaction amount and the data sample is <ID1, 20201225>. When participating nodes $m_1$, $m_2$ and $m_3$ receive the window_sum_cal message for w_sum_trx_amt_3m, each of them, for the data sample <ID1, 20201225>, extracts from its sample data table the values of the "transaction amount" data item of customer ID1 in the period 20200925-20201225 and sums them to obtain a summed value (i.e., the basic feature value of the time-series feature to be constructed). Each participating node then sends the summed value, together with the user ID and sample generation time of the corresponding sample, to the central node. The central node sums the summed values of all participating nodes to obtain the final value (i.e., the final feature value of the time-series feature), which is the total transaction amount of customer ID1 within the 3 months before 2020/12/25 (i.e., 2020/09/25-2020/12/25). After calculating this value, the central node sends the final feature value back to each participating node.
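The worked example above can be sketched as two small functions, one for the participating node and one for the central node. This is an illustration under stated assumptions: the `Record` layout, the function names and the transaction rows are hypothetical, the window is assumed to be inclusive at both ends, and message transport (window_sum_cal, window_sum_result, window_sum_notify) is left out.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Record:
    """One row of a node's sample data table (hypothetical layout)."""
    user_id: str
    day: date
    trx_amt: float


def months_before(d: date, months: int) -> date:
    """Calendar date `months` months before `d`, clamping the day if needed."""
    idx = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(idx, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def node_window_sum(table: List[Record], user_id: str, sample_day: date,
                    months: int, column: str) -> float:
    """Node side: basic feature value for the sample <user_id, sample_day>."""
    start = months_before(sample_day, months)
    # ASSUMPTION: the time window is inclusive at both ends.
    return float(sum(getattr(r, column) for r in table
                     if r.user_id == user_id and start <= r.day <= sample_day))


def central_window_sum(base_values: List[float]) -> float:
    """Central node side: final feature value is the sum of the nodes' basic values."""
    return float(sum(base_values))


# Hypothetical local tables of three participating nodes.
m1 = [Record("ID1", date(2020, 10, 1), 100.0), Record("ID1", date(2020, 8, 1), 999.0)]
m2 = [Record("ID1", date(2020, 12, 20), 50.0), Record("ID2", date(2020, 11, 1), 70.0)]
m3: List[Record] = []

sample_day = date(2020, 12, 25)
base = [node_window_sum(t, "ID1", sample_day, 3, "trx_amt") for t in (m1, m2, m3)]
final = central_window_sum(base)
print(base, final)
```

Only the per-node sums leave each node; the individual transaction rows stay local, which is the point of routing the aggregation through the central node.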
2. Constructing an extremum-class (maximum/minimum) time sequence feature: such a feature can be constructed in a way similar to the summation-class feature described above. For each sample, each participating node holding the sample calculates the maximum/minimum value of data column c within the time window w and sends the result to the central node.
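A possible sketch of the extremum-class construction is given below. The text does not spell out how the central node combines the reported values, so taking the maximum (or minimum) of the per-node results is an assumption here, as are the function names and the use of `None` for a node that has no record in the window.

```python
def local_window_extreme(values, mode):
    """Node side: max or min of the window values, or None when the node has none."""
    if not values:
        return None
    return max(values) if mode == "max" else min(values)

def central_extreme(node_results, mode):
    """Central side (assumed rule): combine the per-node extremes, skipping empty nodes."""
    present = [v for v in node_results if v is not None]
    if not present:
        return None
    return max(present) if mode == "max" else min(present)

# Hypothetical window values of data column c at three participating nodes.
per_node = [local_window_extreme(v, "max") for v in ([100.0, 20.0], [50.0], [])]
w_max = central_extreme(per_node, "max")
```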
3. Constructing an average-class time sequence feature: taking the construction of the w_avg_trx_amt_3m feature as an example, the feature denotes the average transaction amount (total transaction amount divided by total number of transactions) of the customer within the three months before the sample date. To construct this feature, the central node first issues an instruction to each participating node via a window_avg_cal message. The message format is shown in Table 5.
TABLE 5
After receiving the window_avg_cal message, each participating node calculates, for each sample, the sum of data column c within the time window w. It then sends the number of data records of each sample within the time window w, together with the calculated sum, to the central node through a window_avg_result message. The window_avg_result message format is shown in Table 6:
TABLE 6
After the central node receives the data sent by each participating node, it can calculate the average value of data column c of each sample within the time window w, i.e., the total of the sums reported by the participating nodes divided by the total of the record counts they reported, and then send the result back to the respective participating nodes.
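The average-class construction can be illustrated with the sketch below, which follows the definition given for w_avg_trx_amt_3m (total amount divided by total number of transactions). The function names and the sample values are hypothetical.

```python
def local_sum_and_count(values):
    """Node side: sum and number of the node's values of column c inside the window."""
    return sum(values), len(values)

def central_average(node_results):
    """Central side: pooled sum divided by pooled count (None if no record exists)."""
    total = sum(s for s, _ in node_results)
    count = sum(n for _, n in node_results)
    return total / count if count else None

# Hypothetical window values of trx_amt for one sample at three nodes.
node_values = {"m1": [100.0, 20.0], "m2": [30.0], "m3": []}
results = [local_sum_and_count(v) for v in node_values.values()]
w_avg_trx_amt_3m = central_average(results)
```

Sending the record count alongside the sum matters: averaging the per-node averages (60 and 30) would give 45 here, whereas the pooled average over all three transactions is 50.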
4. Constructing a standard-deviation-class time sequence feature: take the construction of the w_std_trx_amt_3m feature as an example. The feature denotes the standard deviation of the customer's transaction amounts within the three months before the sample date and is used to characterize the degree of dispersion of the customer's recent transaction amounts. Given a sample s, in order to obtain the standard deviation feature of data column c within the time window w, the central node first issues an instruction to each participating node through a window_std_cal message. The message format is shown in Table 7:
TABLE 7
Further, through the average-class feature construction process, the central node can obtain the global average value of data column c of sample s within the time window w. The central node then sends this average to all participating nodes holding the sample via a window_mss_cal message. The message format is shown in Table 8.
TABLE 8
After receiving the protocol request, each participating node calculates an MSS value over the set of data records in participating node m that sample s contains in its time window w, where V_{m,r,c} denotes the value of column c of data record r in participating node m. Each participating node then sends the number of records in its set and the calculated MSS value to the central node through a window_mss_result message. The message format is shown in Table 9:
TABLE 9
From the received data, the central node can calculate the standard deviation characteristic value of each sample and send it to the participating nodes that originally hold the sample.
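The two-round standard deviation construction can be sketched as follows. The original formulas are not available in this text, so the sketch rests on two assumptions: that the MSS value is the sum of squared deviations of a node's values from the global average, and that the central node computes a population standard deviation by dividing the pooled MSS by the pooled record count. The function names and sample values are hypothetical.

```python
import math

def local_mss(values, global_mean):
    """Node side: sum of squared deviations from the global mean, plus the record count."""
    return sum((v - global_mean) ** 2 for v in values), len(values)

def central_std(node_results):
    """Central side (assumed): population standard deviation from the pooled MSS values."""
    total_mss = sum(mss for mss, _ in node_results)
    total_count = sum(n for _, n in node_results)
    return math.sqrt(total_mss / total_count) if total_count else None

# Hypothetical window values of trx_amt for one sample at two nodes.
node_values = {"m1": [2.0, 4.0], "m2": [4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}

# Round 1 (average-class process): the central node obtains the global mean.
global_mean = sum(map(sum, node_values.values())) / sum(map(len, node_values.values()))
# Round 2: nodes report (MSS, count) and the central node derives the feature.
w_std_trx_amt_3m = central_std([local_mss(v, global_mean) for v in node_values.values()])
```

Under these assumptions the result equals the standard deviation that would be obtained if all eight transactions were pooled in one place, although no node reveals its individual records.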
Each time sequence feature and its corresponding final characteristic value are constructed according to the construction modes described above.
For step S104, the central node issues the final characteristic value of each time sequence feature to each participating node. Each participating node trains a preliminary anti-money laundering identification model according to the issued final characteristic values of the time sequence features in combination with the values of its own data items, and sends the obtained gradient information to the central node. The central node aggregates the gradient information sent by the participating nodes to generate joint gradient information and issues the joint gradient information to each participating node, so that each participating node iteratively updates the preliminary anti-money laundering identification model according to the joint gradient information to obtain a final anti-money laundering identification model. Anti-money laundering identification is then carried out based on the finally trained anti-money laundering identification model.
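The training loop of step S104 can be illustrated with the sketch below. The text does not fix the model type or the aggregation rule, so a logistic regression model and a sample-weighted average of the node gradients are assumptions made for illustration, as are all function names, the learning rate, the number of rounds, and the toy data (a bias term plus one scaled time sequence feature per sample).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient(weights, samples):
    """Node side: mean logistic-loss gradient over this node's (features, label) samples."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xi in enumerate(x):
            grad[j] += error * xi
    return [g / len(samples) for g in grad], len(samples)

def aggregate(node_gradients):
    """Central side (assumed rule): sample-weighted average of the node gradients."""
    total = sum(n for _, n in node_gradients)
    dim = len(node_gradients[0][0])
    return [sum(g[j] * n for g, n in node_gradients) / total for j in range(dim)]

def train(node_samples, dim, rounds=500, lr=0.5):
    """Every node applies the same joint gradient, so one weight vector is tracked."""
    weights = [0.0] * dim
    for _ in range(rounds):
        joint = aggregate([local_gradient(weights, s) for s in node_samples])
        weights = [w - lr * g for w, g in zip(weights, joint)]
    return weights

# Hypothetical samples: [bias, scaled time sequence feature], label 1 = suspicious.
node_samples = [
    [([1.0, 0.1], 0), ([1.0, 0.9], 1)],
    [([1.0, 0.2], 0), ([1.0, 0.8], 1), ([1.0, 0.7], 1)],
]
weights = train(node_samples, dim=2)
```

Only gradients and sample counts are exchanged in this sketch; the labelled samples themselves never leave the participating nodes.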
It should be noted that the central node and each participating node in the present invention can be understood as a server.
On the basis of the above method embodiment, the present invention correspondingly provides a device embodiment.
As shown in Fig. 2, an embodiment of the present invention provides an anti-money laundering identification device based on horizontal federated learning, comprising a feature alignment module, a sample synchronization module, a time sequence feature construction module and an anti-money laundering identification module;
the feature alignment module is used for performing feature alignment on the data features in the sample data tables of the plurality of participating nodes to generate basic data features for constructing an anti-money laundering model; each sample data table comprises a plurality of data samples, and each data sample has a user ID and a sample generation time;
the sample synchronization module is used for carrying out sample synchronization on the sample data table of each participating node according to the user ID and the sample generation time; when sample synchronization is carried out, the user ID and the sample generation time of a selected data sample in a current participating node are sent to the participating node which does not own the selected data sample but owns the data sample with the same user ID as the selected data sample;
the time sequence feature construction module is used for issuing a time sequence feature construction instruction to each participating node, so that each participating node, on receiving the instruction, calculates a basic characteristic value of the time sequence feature to be constructed, based on the sample data table after sample synchronization, according to the statistical time dimension information, the feature name of the required basic data feature and the calculation mode contained in the instruction; and for calculating a final characteristic value of the time sequence feature according to the basic characteristic values;
and the anti-money laundering identification module is used for issuing the final characteristic value of the time sequence feature to each participating node, so that each participating node generates an anti-money laundering identification model through horizontal federated learning according to the final characteristic value of the time sequence feature and the characteristic values of its own data features, and carries out anti-money laundering identification according to the anti-money laundering identification model.
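The sample synchronization rule handled by the sample synchronization module can be sketched as below. The sketch is a simplification under stated assumptions: each sample is reduced to its (user ID, sample generation time) key, the node-to-node messages are collapsed into a single function, and the function name and example keys are hypothetical.

```python
def synchronize_samples(node_samples):
    """node_samples maps a node name to its set of (user_id, sample_time) keys.

    Returns a new mapping in which every node that already holds samples of a
    user also holds every sample key of that user that exists at another node.
    """
    all_keys = set().union(*node_samples.values())
    synced = {}
    for node, keys in node_samples.items():
        known_users = {uid for uid, _ in keys}
        # Keys "sent" to this node: same user ID, possibly a sample time it lacks.
        received = {(uid, t) for uid, t in all_keys if uid in known_users}
        synced[node] = keys | received
    return synced

before = {
    "m1": {("ID1", "20201225")},
    "m2": {("ID1", "20201101"), ("ID2", "20201201")},
    "m3": {("ID3", "20201010")},
}
after = synchronize_samples(before)
```

In this example m1 and m2 end up with both sample times of ID1, while m3, which holds no data for ID1 or ID2, receives nothing.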
In a preferred embodiment, the feature alignment module performs feature alignment on each data feature in the sample data table of the plurality of participating nodes to generate a basic data feature for constructing an anti-money laundering model, and specifically includes: taking the feature intersection of each data feature in the sample data table of each participating node to obtain a plurality of first basic data features; calculating the global effective rate of each data characteristic except the first basic data characteristic one by one; taking the data characteristic with the global effective rate exceeding a first preset threshold value as a second basic data characteristic; and taking all the first basic data features and all the second basic data features as the basic data features for constructing the anti-money laundering model.
In a preferred embodiment, the feature alignment module calculates the global effective rate of a data feature by the following formula:
$$g_r = \frac{\sum_{m=1}^{M} Ir_m \cdot n_m}{\sum_{m=1}^{M} n_m}$$
wherein $g_r$ is the global effective rate of the data feature, $M$ is the number of participating nodes, $Ir_m$ is the local effective rate of the data feature at the m-th participating node, and $n_m$ is the number of data samples of the m-th participating node.
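The feature alignment procedure can be illustrated with the following sketch. It assumes that the global effective rate is the sample-weighted average of the local effective rates, that a node lacking a feature reports a local effective rate of 0 for it, and that the feature names, rates, sample counts and the threshold of 0.6 are purely hypothetical.

```python
def global_effective_rate(local_rates, sample_counts):
    """Sample-weighted average of the per-node local effective rates (assumed form)."""
    total = sum(sample_counts)
    return sum(r * n for r, n in zip(local_rates, sample_counts)) / total

def align_features(node_features, local_rates, sample_counts, threshold):
    """Return the basic data features: the intersection plus high-rate extra features."""
    first = set.intersection(*node_features)          # first basic data features
    candidates = set.union(*node_features) - first    # features missing at some node
    second = {f for f in candidates
              if global_effective_rate(local_rates[f], sample_counts) > threshold}
    return first | second

node_features = [
    {"trx_amt", "age", "income"},
    {"trx_amt", "age"},
    {"trx_amt", "income", "region"},
]
sample_counts = [100, 300, 600]
local_rates = {               # one rate per node, 0.0 where the node lacks the feature
    "age": [0.9, 0.8, 0.0],
    "income": [1.0, 0.0, 0.9],
    "region": [0.0, 0.0, 0.5],
}
basic_features = align_features(node_features, local_rates, sample_counts, threshold=0.6)
```

Here "income" is kept because its global effective rate is about 0.64, while "age" (about 0.33) and "region" (0.30) fall below the threshold.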
It should be noted that the above device embodiments correspond to the method embodiments of the present invention and can implement any one of the anti-money laundering identification methods based on horizontal federated learning of the present invention. In addition, the described device embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In the drawings of the device embodiments provided by the present invention, a connection between modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement this without inventive effort.
On the basis of the above device embodiment, the present invention correspondingly provides a system embodiment.
As shown in Fig. 3, an embodiment of the present invention provides an anti-money laundering identification system based on horizontal federated learning, which includes a central node and a plurality of participating nodes, wherein the central node comprises any one of the above anti-money laundering identification devices based on horizontal federated learning.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention carries out feature alignment and sample synchronization on the data of each participating node and then combines the data of each participating node to construct the time sequence features, thereby avoiding the problem of inaccurate time sequence features caused by incomplete data and improving the accuracy of the anti-money laundering identification model. Horizontal federated learning also enlarges the number of samples, which further improves the accuracy of the constructed anti-money laundering model. Finally, the constructed model can be used to carry out anti-money laundering identification more accurately.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
Claims (7)
1. An anti-money laundering identification method based on horizontal federated learning, characterized by comprising the following steps:
performing feature alignment on each data feature in the sample data tables of the plurality of participating nodes to generate basic data features for constructing an anti-money laundering model; each sample data table comprises a plurality of data samples, and each data sample is provided with a user ID and sample generation time;
carrying out sample synchronization on the sample data table of each participating node according to the user ID and the sample generation time; when sample synchronization is carried out, the user ID and the sample generation time of a selected data sample in a current participating node are sent to the participating node which does not own the selected data sample but owns the data sample with the same user ID as the selected data sample;
issuing a time sequence feature construction instruction to each participating node, so that each participating node calculates a basic feature value of the time sequence feature to be constructed based on a sample data table after sample synchronization according to statistical time dimension information, a feature name and a calculation mode of the required basic data feature, which are contained in the time sequence feature construction instruction, when receiving the time sequence feature construction instruction; calculating a final characteristic value of the time sequence characteristic according to each basic characteristic value;
and issuing the final characteristic value of the time sequence feature to each participating node, so that each participating node generates an anti-money laundering identification model through horizontal federated learning according to the final characteristic value of the time sequence feature and the characteristic values of its own data features, and performs anti-money laundering identification according to the anti-money laundering identification model.
2. The anti-money laundering identification method based on horizontal federated learning of claim 1, wherein performing feature alignment on the data features in the sample data tables of the plurality of participating nodes to generate the basic data features for constructing the anti-money laundering model specifically comprises:
taking the feature intersection of each data feature in the sample data table of each participating node to obtain a plurality of first basic data features;
calculating the global effective rate of each data characteristic except the first basic data characteristic one by one; taking the data characteristic with the global effective rate exceeding a first preset threshold value as a second basic data characteristic;
and taking all the first basic data features and all the second basic data features as the basic data features for constructing the anti-money laundering model.
3. The anti-money laundering identification method based on horizontal federated learning of claim 2, wherein the global effective rate of a data feature is calculated by the following formula:
$$g_r = \frac{\sum_{m=1}^{M} Ir_m \cdot n_m}{\sum_{m=1}^{M} n_m}$$
wherein $g_r$ is the global effective rate of the data feature, $M$ is the number of participating nodes, $Ir_m$ is the local effective rate of the data feature at the m-th participating node, and $n_m$ is the number of data samples of the m-th participating node.
4. An anti-money laundering identification device based on horizontal federated learning, comprising: a feature alignment module, a sample synchronization module, a time sequence feature construction module and an anti-money laundering identification module;
the feature alignment module is used for performing feature alignment on the data features in the sample data tables of the plurality of participating nodes to generate basic data features for constructing an anti-money laundering model; each sample data table comprises a plurality of data samples, and each data sample has a user ID and a sample generation time;
the sample synchronization module is used for carrying out sample synchronization on the sample data table of each participating node according to the user ID and the sample generation time; when sample synchronization is carried out, the user ID and the sample generation time of a selected data sample in a current participating node are sent to the participating node which does not own the selected data sample but owns the data sample with the same user ID as the selected data sample;
the time sequence feature construction module is used for issuing a time sequence feature construction instruction to each participating node, so that each participating node, on receiving the instruction, calculates a basic characteristic value of the time sequence feature to be constructed, based on the sample data table after sample synchronization, according to the statistical time dimension information, the feature name of the required basic data feature and the calculation mode contained in the instruction; and for calculating a final characteristic value of the time sequence feature according to the basic characteristic values;
and the anti-money laundering identification module is used for issuing the final characteristic value of the time sequence feature to each participating node, so that each participating node generates an anti-money laundering identification model through horizontal federated learning according to the final characteristic value of the time sequence feature and the characteristic values of its own data features, and carries out anti-money laundering identification according to the anti-money laundering identification model.
5. The anti-money laundering identification device based on horizontal federated learning of claim 4, wherein the feature alignment module performing feature alignment on the data features in the sample data tables of the plurality of participating nodes to generate the basic data features for constructing the anti-money laundering model specifically comprises:
taking the feature intersection of each data feature in the sample data table of each participating node to obtain a plurality of first basic data features;
calculating the global effective rate of each data characteristic except the first basic data characteristic one by one; taking the data characteristic with the global effective rate exceeding a first preset threshold value as a second basic data characteristic;
and taking all the first basic data features and all the second basic data features as the basic data features for constructing the anti-money laundering model.
6. The anti-money laundering identification device based on horizontal federated learning of claim 5, wherein the feature alignment module calculates the global effective rate of a data feature by the following formula:
$$g_r = \frac{\sum_{m=1}^{M} Ir_m \cdot n_m}{\sum_{m=1}^{M} n_m}$$
wherein $g_r$ is the global effective rate of the data feature, $M$ is the number of participating nodes, $Ir_m$ is the local effective rate of the data feature at the m-th participating node, and $n_m$ is the number of data samples of the m-th participating node.
7. An anti-money laundering identification system based on horizontal federated learning, comprising: a central node and a plurality of participating nodes; wherein the central node comprises the anti-money laundering identification device based on horizontal federated learning as claimed in any one of claims 4 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110264163.0A CN113094407B (en) | 2021-03-11 | 2021-03-11 | Anti-money laundering identification method, device and system based on horizontal federal learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113094407A (en) | 2021-07-09 |
CN113094407B (en) | 2022-07-19 |
Family
ID=76667016
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110264163.0A Active CN113094407B (en) | 2021-03-11 | 2021-03-11 | Anti-money laundering identification method, device and system based on horizontal federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113094407B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114187007A (en) * | 2021-11-19 | 2022-03-15 | 中国银行股份有限公司 | Anti-money laundering judgment method with multiple banks participating and related application equipment |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070100744A1 (en) * | 2005-11-01 | 2007-05-03 | Lehman Brothers Inc. | Method and system for administering money laundering prevention program |
CN109598385A (en) * | 2018-12-07 | 2019-04-09 | 深圳前海微众银行股份有限公司 | Anti money washing combination learning method, apparatus, equipment, system and storage medium |
CN110309923A (en) * | 2019-07-03 | 2019-10-08 | 深圳前海微众银行股份有限公司 | Laterally federation's learning method, device, equipment and computer storage medium |
US20190325528A1 (en) * | 2018-04-24 | 2019-10-24 | Brighterion, Inc. | Increasing performance in anti-money laundering transaction monitoring using artificial intelligence |
CN110852884A (en) * | 2019-11-15 | 2020-02-28 | 成都数联铭品科技有限公司 | Data processing system and method for anti-money laundering recognition |
CN111325572A (en) * | 2020-01-21 | 2020-06-23 | 深圳前海微众银行股份有限公司 | Data processing method and device |
CN111898769A (en) * | 2020-08-17 | 2020-11-06 | 中国银行股份有限公司 | Method and system for establishing user behavior period model based on horizontal federal learning |
CN111967910A (en) * | 2020-08-18 | 2020-11-20 | 中国银行股份有限公司 | User passenger group classification method and device |
CN112364943A (en) * | 2020-12-10 | 2021-02-12 | 广西师范大学 | Federal prediction method based on federal learning |
Non-Patent Citations (1)
Title |
---|
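Yang Qiang (杨强): "AI and data privacy protection: the federated learning solution" (AI与数据隐私保护:联邦学习的破解之道), Journal of Information Security Research (《信息安全研究》) * |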
Also Published As
Publication number | Publication date |
---|---|
CN113094407B (en) | 2022-07-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | Inventor after: Wu Runpeng, Xin Zhiyun, Li Heng, Zhang Yan, Zou Jie. Inventor before: Wu Runpeng, Li Heng, Zhang Yan, Zou Jie |