
CN110443416A - Federal model building device, method and readable storage medium storing program for executing based on shared data - Google Patents

Federal model building device, method and readable storage medium storing program for executing based on shared data Download PDF

Info

Publication number
CN110443416A
Authority
CN
China
Prior art keywords
data
encryption
fields
field
data provider
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910697248.0A
Other languages
Chinese (zh)
Inventor
管基月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Original Assignee
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuo Erzhi Lian Wuhan Research Institute Co Ltd filed Critical Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority to CN201910697248.0A priority Critical patent/CN110443416A/en
Publication of CN110443416A publication Critical patent/CN110443416A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/0819Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
    • H04L9/0825Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) using asymmetric-key encryption or public key infrastructure [PKI], e.g. key signature or public key certificates
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861Generation of secret information including derivation or calculation of cryptographic keys or passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Signal Processing (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A federated modeling method based on shared data comprises: receiving business data uploaded by a plurality of data providers; determining, according to the received business data, fields common to the data providers A1~An and forming a field set from the common fields; judging whether the value corresponding to each field in the field set lies within a preset outlier determination interval, and removing from the field set any field whose value does not; screening a plurality of key fields from the processed field set according to a preset screening rule; performing field fusion on the plurality of key fields so as to construct a training sample from the data of the fused key fields; and controlling each data provider A1~An to perform a joint modeling operation according to the training sample. The present invention also provides a federated modeling device based on shared data and a computer-readable storage medium. The present invention enables joint modeling without revealing the business data of any data provider.

Description

Federated modeling device and method based on shared data and readable storage medium
Technical Field
The invention relates to the technical field of computer applications, and in particular to a federated modeling device and method based on shared data and a computer-readable storage medium.
Background
Using a trained machine learning model to predict unknown parameters or results is a common technique in the field of artificial intelligence. A single node, however, often has too few samples, so the accuracy of a model it trains alone is low; constructing the model through joint modeling across multiple nodes is therefore an important means of overcoming the shortage of samples. However, for companies that are sensitive about their data, operational data is itself a highly valuable asset, and data owners are unwilling to provide it directly, out of concern for privacy protection and leakage prevention, which results in information blocking.
Disclosure of Invention
In view of the foregoing, there is a need for a federated modeling apparatus and method based on shared data, and a computer-readable storage medium, which can implement modeling based on shared data while sufficiently ensuring data security and can alleviate the problem of data information blocking.
One embodiment of the present invention provides a federated modeling method based on shared data, which includes: receiving service data uploaded by a plurality of data providers A1~An; determining, according to the received service data, fields common to the data providers A1~An and forming a field set from the common fields; judging whether the value corresponding to each field in the field set lies within a preset outlier determination interval; if the value corresponding to one or more fields does not lie within the preset outlier determination interval, removing the one or more fields from the field set; screening a plurality of key fields from the field set after the removal according to a preset screening rule; performing field fusion on the plurality of key fields so as to construct a training sample from the data of the fused key fields; and sending a joint modeling instruction to each data provider A1~An so as to control each data provider A1~An to perform a joint modeling operation according to the training sample.
Preferably, the step of performing field fusion on the plurality of key fields comprises:
summing, according to the timestamps of the plurality of key fields, the field values of the key fields belonging to a specified date interval.
Preferably, the step of controlling each data provider A1~An to perform a joint modeling operation according to the training sample comprises:
creating an encryption key pair and distributing the public key of the encryption key pair to each data provider A1~An, so that each data provider A1~An encrypts the data exchanged during model training;
having a plurality of the data providers A1~An-1 each send its locally computed encryption loss to the data provider An, so that the data provider An aggregates them to obtain a total encryption loss;
receiving the total encryption loss computed by the data provider An;
initializing an interference term at each data provider A1~An and computing an encrypted interference term from the interference term;
receiving the encryption gradient and the encrypted interference term computed by each data provider A1~An;
decrypting the total encryption loss and, for each data provider A1~An, the sum of the encryption gradient and the encrypted interference term, to obtain the decrypted total loss and, for each data provider A1~An, the decrypted sum of the gradient and the interference term;
sending the decrypted sum of the gradient and the interference term to the corresponding data provider A1~An, so that each data provider A1~An computes a decryption gradient; and
controlling each data provider A1~An to update the model parameters of its model to be trained according to the computed decryption gradient, and to continue model training until the total loss function converges.
Preferably, the method further comprises: training the model to be trained based on the training sample to obtain a weight value of each key field in the training sample, wherein the weight value represents the contribution degree of each key field to the model to be trained; and
removing from the training sample the key fields whose weight values are lower than a preset weight value.
Preferably, the model to be trained is a business prediction model, and the method further comprises:
substituting the key fields shared by any data provider into the trained business prediction model to obtain a business prediction result for that data provider.
Preferably, the method further comprises: controlling a plurality of the data providers A1~An-1 to each compute a local encrypted sample weight from the common key fields it holds and to send the local encrypted sample weight to the data provider An, so that the data provider An aggregates them to obtain a total encrypted sample weight; and controlling the data provider An to distribute the total encrypted sample weight to the plurality of data providers A1~An-1, so that each data provider A1~An computes its encryption gradient based on the total encrypted sample weight.
Preferably, the step of initializing an interference term at each data provider A1~An comprises:
obtaining the order of magnitude of the encryption gradient computed by each data provider A1~An; and
randomly initializing, at each data provider A1~An, an interference term of the same order of magnitude as the respective encryption gradient.
Preferably, the step of initializing an interference term at each data provider A1~An comprises:
determining a random value range according to the encryption gradient computed by each data provider A1~An; and
randomly initializing, at each data provider A1~An, an interference term within the respective random value range.
An embodiment of the present invention provides a federated modeling apparatus based on shared data, where the apparatus includes a processor and a memory, where the memory stores a plurality of computer programs, and the processor is configured to implement the steps of the federated modeling method based on shared data when executing the computer programs stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium storing a plurality of instructions executable by one or more processors to implement the steps of the above-mentioned method for federated modeling based on shared data.
Compared with the prior art, the federated modeling device and method based on shared data and the computer-readable storage medium enable federated modeling on shared data while fully ensuring data security; they alleviate the problem of data information blocking, address data privacy protection in the big-data era, protect each company's data privacy, and allow the general operating condition of a counterpart company to be predicted through the model, providing decision support for enterprise operation.
Drawings
FIG. 1 is an architectural diagram of a federated modeling system in accordance with an embodiment of the present invention.
Fig. 2 is a functional block diagram of the federated modeling apparatus in accordance with an embodiment of the present invention.
FIG. 3 is a functional block diagram of the federated modeling program in accordance with an embodiment of the present invention.
FIG. 4 is a flow chart of a federated modeling method of an embodiment of the present invention.
Description of the main elements
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is further noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Please refer to fig. 1, which is a diagram illustrating a preferred embodiment of the federated modeling system based on shared data according to the present invention.
The federated modeling system 1 includes a plurality of data providers A1~An and a cooperative node C, where n is preferably a positive integer greater than 1. The plurality of data providers A1~An cooperate with the cooperative node C to realize joint modeling. Each data provider A1~An builds a model to be trained.
The following description takes the model to be trained to be a business prediction model for enterprise business forecasting, but the invention is not limited thereto; in other embodiments, the model to be trained may be determined according to actual requirements. When the multi-party joint modeling is completed, each data provider A1~An obtains its own business prediction model, through which it can predict the sales situation or supply situation of the counterpart company and obtain decision support for its strategic adjustments.
In one embodiment, the data providers A1~An may include at least one supplier and at least one wholesaler. A supplier may be a merchant or enterprise that produces or sells goods, and a wholesaler may be a merchant or enterprise that purchases goods. The cooperative node C may be a trusted third-party data platform, such as a data platform established by the local government. Each data provider A1~An may upload its data to the cooperative node C. For example, if data provider A1 is a supplier, A1 may upload its enterprise-related data to the cooperative node C through a computer or server; if data provider A2 is a wholesaler, A2 may likewise upload its enterprise-related data to the cooperative node C through a computer or server. The enterprise-related data may include inventory data, stocking data, raw-material procurement data, capacity data, and so on for various types of goods.
Please refer to fig. 2, which is a schematic diagram of a federated modeling apparatus based on shared data according to a preferred embodiment of the present invention. The federated modeling apparatus 100 may include a memory 10, a processor 20, and a federated modeling program 30 stored in the memory 10 and operable on the processor 20. The processor 20, when executing the federated modeling program 30, implements the steps in a federated modeling method embodiment, such as steps S400-S412 shown in FIG. 4. Alternatively, the processor 20 implements the functions of the modules in fig. 3, such as the modules 101-111, when executing the federated modeling program 30.
The federated modeling program 30 may be partitioned into one or more modules that are stored in the memory 10 and executed by the processor 20 to implement the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the federated modeling program 30 within the federated modeling apparatus 100. For example, the federated modeling program 30 may be partitioned into a first receiving module 101, a fusing module 102, a creating module 103, a first sending module 104, a second receiving module 105, a computing module 106, a third receiving module 107, a decrypting module 108, a second sending module 109, an updating module 110, and a predicting module 111 in fig. 3. Specific functions of the modules refer to the functions of the modules in fig. 3 below.
It will be understood by those skilled in the art that the schematic diagram is merely an example of the federal modeling apparatus 100 and does not constitute a limitation of the federal modeling apparatus 100, and may include more or fewer components than those shown, or some components in combination, or different components, for example, the federal modeling apparatus 100 may further include network access devices, buses, etc.
The processor 20 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor 20 may be any conventional processor; the processor 20 connects the various parts of the federated modeling apparatus 100 using various interfaces and buses.
The memory 10 may be used to store the federated modeling program 30 and/or its modules, and the processor 20 implements the various functions of the federated modeling apparatus 100 by running or executing the computer programs and/or modules stored in the memory 10 and invoking the data stored in the memory 10. The memory 10 may include high-speed random access memory and may also include non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid-state storage device.
In one embodiment, the federated modeling apparatus 100 may be integrated in the cooperative node C. The federated modeling apparatus 100 may also be partially integrated in each data provider A1~An and partially integrated in the cooperative node C. In another embodiment of the present invention, the cooperative node C may be one of the modeling nodes A1~An.
FIG. 3 is a functional block diagram of a preferred embodiment of the federated modeling program of the present invention.
Referring to fig. 3, the federal modeling program 30 may include a first receiving module 101, a fusion module 102, a creation module 103, a first sending module 104, a second receiving module 105, a calculation module 106, a third receiving module 107, a decryption module 108, a second sending module 109, an update module 110, and a prediction module 111. In one embodiment, the modules may be programmable software instructions stored in the memory 10 and called to be executed by the processor 20. It will be appreciated that in other embodiments, the modules may also be program instructions or firmware (firmware) that are resident in the processor 20.
The first receiving module 101 is configured to receive the service data uploaded by the plurality of data providers A1~An.
In one embodiment, each data provider A1~An has its own local database, which preferably stores data associated with its business operations, such as production, sales and procurement data for its products. The data providers A1~An can upload service data to the cooperative node C over the network, and the first receiving module 101 receives the service data uploaded by the plurality of data providers A1~An.
In one embodiment, each data provider A1~An may select, according to actual modeling requirements, which business data in its local database to upload to the cooperative node C.
The fusion module 102 is configured to determine, from the received service data, the fields common to the data providers A1~An, and to perform field fusion on the common fields so as to construct training samples from the service data after field fusion.
In one embodiment, the service data received by the first receiving module 101 includes both fields common to all data providers A1~An and fields that are not common. The fusion module 102 may first identify the business data shared by the data providers A1~An so as to model jointly on that shared data. Specifically, the fusion module 102 may compare the service data received from each data provider A1~An to determine the fields common to the data providers A1~An and form a field set from the common fields. The fusion module 102 may then perform field fusion on the common fields in the field set and construct the field-fused service data into training samples.
In one embodiment, because the common fields may include fields that contribute little or nothing to model training, the fusion module 102 preferably screens key fields out of the common fields before fusion and sample construction, which improves training efficiency. The key fields are the data fields in the business data that record key information contributing significantly to training the business prediction model, i.e., the fields of the raw business data that are valuable for model training; in practice, the key fields may be specified based on actual modeling requirements. For example, when training a model of the sales/supply status of wholesalers/suppliers, if the raw business data contains fields recording sales, monthly capacity, raw-material procurement and so on, these fields are valuable for business prediction of wholesalers/suppliers and can be designated as key fields.
It is understood that the key fields may be selected from the common fields according to a preset selection rule; for example, a key field may be a field whose name contains a specific keyword. The fusion module 102 preferably determines, from the received service data, the key fields common to the data providers A1~An, performs field fusion on the common key fields, and constructs training samples from the service data after key-field fusion.
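As an illustration of such a preset selection rule, the sketch below keeps only the common fields whose names contain one of a keyword list; the field names and keywords are hypothetical, not taken from the disclosure.

```python
# Sketch of keyword-based key-field screening over the common fields
# (hypothetical field names and keyword list).
from typing import Dict, List, Set


def common_fields(provider_fields: Dict[str, Set[str]]) -> Set[str]:
    """Fields shared by every data provider."""
    field_sets = list(provider_fields.values())
    return set.intersection(*field_sets) if field_sets else set()


def screen_key_fields(fields: Set[str], keywords: List[str]) -> List[str]:
    """Keep only the fields whose names contain one of the preset keywords."""
    return sorted(f for f in fields if any(k in f for k in keywords))


provider_fields = {
    "A1": {"sales", "monthly_capacity", "office_address", "raw_material_purchase"},
    "A2": {"sales", "monthly_capacity", "employee_count", "raw_material_purchase"},
}
shared = common_fields(provider_fields)
key_fields = screen_key_fields(shared, keywords=["sales", "capacity", "purchase"])
print(key_fields)  # ['monthly_capacity', 'raw_material_purchase', 'sales']
```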
In one embodiment, an outlier criterion may be set for the values corresponding to the common fields: if a value corresponding to a field exceeds the criterion, it is treated as an outlier, i.e., the value is abnormal, unsuitable for model training, and should be discarded. Specifically, the federated modeling apparatus 100 may pre-establish an outlier determination interval, and the fusion module 102 may judge whether the values corresponding to the common fields of the data providers A1~An lie within the outlier determination interval. If the values corresponding to one or more fields do not lie within the interval, the fusion module 102 may remove those fields from the field set, which improves the accuracy of model training.
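A minimal sketch of this interval check, which drops any common field whose values fall outside the preset outlier determination interval; the interval bounds and sample values are hypothetical.

```python
# Sketch of outlier-interval filtering over the field set
# (hypothetical interval bounds and field values).
from typing import Dict, List, Tuple


def filter_fields_by_interval(
    field_values: Dict[str, List[float]],
    interval: Tuple[float, float],
) -> Dict[str, List[float]]:
    """Remove any field whose values fall outside the outlier determination interval."""
    low, high = interval
    return {
        name: values
        for name, values in field_values.items()
        if all(low <= v <= high for v in values)
    }


field_values = {
    "sales": [120.0, 98.5, 150.2],
    "monthly_capacity": [1e9, 200.0, 310.0],  # 1e9 lies outside the interval, so the field is removed
}
kept = filter_fields_by_interval(field_values, interval=(0.0, 1e6))
print(list(kept))  # ['sales']
```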
In an embodiment, field fusion refers to the process of integrating the information recorded in the fields of the service data transmitted by each data provider. The fusion module 102 may sum the field values of the common fields to achieve field fusion. The fusion module 102 may also place the common fields into specified field intervals to achieve field fusion: for example, a standardized data structure for the fused service data may first be created, a number of field intervals planned within it, and the common fields of each data provider A1~An then placed into the designated field intervals. In other embodiments of the present invention, the fusion module 102 may sum, according to the timestamps of the common fields, the field values of fields belonging to a specified date interval; the field values may be summed over the same day, the same week, or the same month.
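A minimal sketch of the timestamp-based fusion, assuming pandas; the column names and records are hypothetical.

```python
# Sketch of timestamp-based field fusion: sum key-field values per calendar week
# (hypothetical column names and records).
import pandas as pd

records = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2019-07-01", "2019-07-02", "2019-07-08", "2019-07-09"]
        ),
        "sales": [10.0, 12.0, 7.0, 9.0],
        "raw_material_purchase": [3.0, 4.0, 5.0, 2.0],
    }
)

# Group the key-field values into weekly date intervals and sum within each interval;
# use "D" for same-day fusion or "M" for same-month fusion instead.
fused = records.set_index("timestamp").resample("W").sum()
print(fused)
```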
In an embodiment, the field-fused business data are constructed into training samples and used to train the business prediction model. While training the business prediction model on the training samples, a weight value may be obtained for each field in the training sample, the weight value representing the degree to which the field contributes to the business prediction model. Fields with a relatively low contribution may be discarded to improve training efficiency; specifically, fields whose weight value is below a preset weight value may be removed from the training samples. The preset weight value can be set according to the actual training requirements of the model.
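A sketch of the weight-based culling; treating absolute least-squares coefficients as the weight values is an assumption (the text does not fix how the contribution weights are computed), and the field names and data are hypothetical.

```python
# Sketch of weight-based field culling: fit a simple model, treat the absolute
# coefficients as the fields' weight values, and drop fields below a preset threshold.
import numpy as np

rng = np.random.default_rng(0)
field_names = ["sales", "monthly_capacity", "raw_material_purchase"]
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + 0.01 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=50)

# Least-squares fit; |coefficient| serves as each field's weight value.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
weights = dict(zip(field_names, np.abs(coef)))

preset_weight = 0.1
kept_fields = [f for f, w in weights.items() if w >= preset_weight]
print(kept_fields)  # 'monthly_capacity' contributes too little and is culled
```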
The creation module 103 is configured to create an encryption key pair and distribute the public key of the encryption key pair to each data provider A1~An, so that the data exchanged between nodes during training can be encrypted.
In one embodiment, once the training samples have been constructed, the federated modeling apparatus 100 may send a joint modeling instruction to each data provider A1~An to control each data provider A1~An to perform a joint modeling operation according to the training samples.
In an embodiment, to keep data confidential during model training and avoid privacy disclosure through the data exchanged during training, the creation module 103 may create an encryption key pair and distribute its public key to each data provider A1~An. Each data provider A1~An may then use the public key to encrypt the data it exchanges during training.
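The disclosure does not name a specific cryptosystem; the sketch below assumes an additively homomorphic Paillier key pair created with the python-paillier (`phe`) library, which matches the later ciphertext aggregation.

```python
# Sketch of key-pair creation and public-key distribution (Paillier via `phe`
# is an assumption; the patent only requires an encryption key pair).
from phe import paillier

# Coordinator (creation module 103): create the key pair and keep the private key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each data provider receives only the public key and encrypts its local values.
local_loss = 0.42
encrypted_local_loss = public_key.encrypt(local_loss)

# Only the holder of the private key can decrypt.
print(private_key.decrypt(encrypted_local_loss))  # 0.42
```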
The first sending module 104 is configured to have the plurality of data providers A1~An-1 each send its locally computed encryption loss to the data provider An, so that the data provider An aggregates them to obtain the total encryption loss.
In one embodiment, each data provider A1~An can compute its own local encryption loss and local encrypted sample weight. The local encrypted sample weight of each data provider A1~An is computed from that provider's model parameters and the data set of common fields it holds, where E(x) denotes the encrypted value of the parameter x.
The local encryption loss of each data provider A1~An is computed from the same quantities together with a preset regularization parameter λ and the label yi, i.e., the sales status or supply status of the data provider A1~An.
In one embodiment, the data provider An is taken as the party that receives the local encryption losses of the other data providers and aggregates them into the total encryption loss in the subsequent steps. It is understood that any one data provider may be designated to receive the computation results of the others; the choice is not limited to An. For example, data provider A1 may be designated to receive the computation results of the other data providers, in which case the first sending module 104 has the data providers A2~An send their computed local encryption losses to the data provider A1.
In one embodiment, the total encrypted sample weight E(di) is obtained by aggregating the local encrypted sample weights of the data providers A1~An, and the total encryption loss E(L) is obtained by aggregating their local encryption losses.
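A minimal sketch of this aggregation, assuming Paillier ciphertexts via the `phe` library so that adding ciphertexts adds the underlying plaintexts; the loss values are hypothetical.

```python
# Sketch of aggregating encrypted local losses into the total encryption loss E(L)
# (Paillier via `phe` is an assumption; loss values are hypothetical).
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Each data provider A1..An-1 encrypts its local loss and sends it to provider An.
local_losses = [0.31, 0.27, 0.44]
encrypted_local_losses = [public_key.encrypt(v) for v in local_losses]

# Provider An aggregates without seeing any plaintext loss:
# adding Paillier ciphertexts adds the plaintexts they encrypt.
total_encryption_loss = sum(encrypted_local_losses[1:], encrypted_local_losses[0])

# Only the coordinator holding the private key can recover the total loss L.
print(round(private_key.decrypt(total_encryption_loss), 6))  # 1.02
```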
The second receiving module 105 is configured to receive the total encryption loss computed by the data provider An.
In one embodiment, after the data provider An has computed the total encryption loss, it may send the total encryption loss to the federated modeling apparatus 100, and the second receiving module 105 receives the total encryption loss computed by the data provider An.
In one embodiment, the data provider An may also distribute the computed total encrypted sample weight to the other data providers A1~An-1, so that each of them can compute its own encryption gradient.
The computing module 106 is configured to initialize an interference term at each data provider A1~An and to compute an encrypted interference term from the interference term.
In one embodiment, the computing module 106 may randomly initialize an interference term at each data provider A1~An and compute an encrypted interference term from it. The interference terms of the data providers A1~An may differ from one another.
For example, the computing module 106 may randomly initialize an interference term at the data provider A1 and compute its encrypted interference term, randomly initialize another interference term at the data provider A2 and compute its encrypted interference term, and so on up to the data provider An.
In one embodiment, randomly initializing an interference term at each data provider A1~An ensures that, even after decryption, the modeling cooperator cannot learn the model parameters of the data providers A1~An, thereby avoiding data leakage.
In one embodiment, the computing module 106 may first obtain the order of magnitude of the encryption gradient computed by each data provider A1~An, and then initialize at each data provider A1~An a random interference term of the same order of magnitude as the respective encryption gradient, so as to improve the masking effect. For example, when the computing module 106 finds that the encryption gradient computed by the data provider A1 is a three-digit quantity, it preferably initializes a random three-digit interference term at the data provider A1; when it finds that the encryption gradient computed by the data provider An is a two-digit quantity, it preferably initializes a two-digit interference term at the data provider An.
In one embodiment, the computing module 106 may instead determine a random value range from the encryption gradient computed by each data provider A1~An, and initialize at each data provider A1~An a random interference term lying within the respective range.
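A sketch of the two initialization strategies just described; the gradient values are hypothetical, and the additive masking mirrors the gradient-plus-interference sums used in the following modules.

```python
# Sketch of the two interference-term initialization strategies
# (hypothetical gradient values; masking is additive).
import math
import random

import numpy as np


def mask_same_order_of_magnitude(gradient: np.ndarray, rng: random.Random) -> np.ndarray:
    """Random interference term with the same order of magnitude as the gradient."""
    magnitude = 10 ** int(math.floor(math.log10(max(float(np.max(np.abs(gradient))), 1e-12))))
    return np.array([rng.uniform(-10 * magnitude, 10 * magnitude) for _ in gradient])


def mask_within_range(gradient: np.ndarray, rng: random.Random) -> np.ndarray:
    """Random interference term drawn from a range derived from the gradient."""
    bound = float(np.max(np.abs(gradient))) or 1.0
    return np.array([rng.uniform(-bound, bound) for _ in gradient])


rng = random.Random(0)
gradient_a1 = np.array([123.0, -87.5, 410.2])   # a three-digit gradient
mask_a1 = mask_same_order_of_magnitude(gradient_a1, rng)
masked = gradient_a1 + mask_a1                  # what would be sent (after encryption)
print(np.allclose(masked - mask_a1, gradient_a1))  # the provider can unmask: True
```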
The third receiving module 107 is configured to receive the encryption gradient and the encrypted interference term computed by each data provider A1~An.
In one embodiment, after the data provider An has distributed the computed total encrypted sample weight to the other data providers A1~An-1, every data provider A1~An holds the total encrypted sample weight, and each data provider A1~An can compute its encryption gradient from its own model parameters, the data set of common fields it holds, and the total encrypted sample weight.
In one embodiment, each data provider A1~An computes its encryption gradient by a respective formula. Each data provider A1~An may send the computed encryption gradient and encrypted interference term to the federated modeling apparatus 100, and the third receiving module 107 receives the encryption gradient and encrypted interference term computed by each data provider A1~An.
The decryption module 108 is configured to decrypt the total encryption loss and, for each data provider A1~An, the sum of the encryption gradient and the encrypted interference term.
In one embodiment, the decryption module 108 may use the previously created encryption key pair to decrypt the total encryption loss E(L) and obtain the total loss L. After receiving the encryption gradient and encrypted interference term computed by each data provider A1~An, the decryption module 108 may add each provider's encryption gradient to its encrypted interference term and decrypt the result, which yields the sum of that provider's gradient and interference term.
In one embodiment, for the data provider A1, when the decryption module 108 receives the encryption gradient and encrypted interference term computed by A1, it adds the two and decrypts the result, thereby obtaining the sum of A1's gradient and interference term. Likewise, for the data provider A2, upon receiving the encryption gradient and encrypted interference term computed by A2, the decryption module 108 adds them and decrypts the result to obtain the sum of A2's gradient and interference term, and so on for the remaining data providers.
The second sending module 109 is configured to send the decrypted sum of the gradient and the interference term to the corresponding data provider A1~An, so that each data provider A1~An can compute its decryption gradient.
In one embodiment, the second sending module 109 sends the decrypted sum of A1's gradient and interference term to the data provider A1, the decrypted sum of A2's gradient and interference term to the data provider A2, and so on up to the data provider An. Because each interference term was generated randomly at its own data provider A1~An, each data provider A1~An knows the value of its own interference term. When the data provider A1 receives the decrypted sum of its gradient and interference term, it can recover the decryption gradient by subtraction; similarly, when the data provider An receives the decrypted sum of its gradient and interference term, it can recover the decryption gradient by subtraction.
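A sketch of one masked-gradient round trip between a data provider and the decryption module, again assuming Paillier ciphertexts via the `phe` library; the gradient and interference values are hypothetical.

```python
# Sketch of the masked-gradient exchange: the coordinator only ever sees
# gradient + interference term (Paillier via `phe` is an assumption).
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# --- data provider A1 (holds only the public key) ---
gradient = [0.8, -0.3, 1.2]                     # locally computed gradient
interference = [5.0, -2.0, 7.5]                 # randomly initialized interference term
enc_gradient = [public_key.encrypt(g) for g in gradient]
enc_interference = [public_key.encrypt(r) for r in interference]

# --- decryption module 108 (holds the private key) ---
# Add the ciphertexts and decrypt: this reveals only the sum gradient + interference.
masked_sum = [private_key.decrypt(g + r) for g, r in zip(enc_gradient, enc_interference)]

# --- data provider A1 again: subtract its own interference term ---
decryption_gradient = [s - r for s, r in zip(masked_sum, interference)]
print([round(v, 6) for v in decryption_gradient])  # [0.8, -0.3, 1.2]
```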
The update module 110 is configured to control each data provider A1~An to update the model parameters of its business prediction model according to the computed decryption gradient, continuing model training until the total loss function converges.
In an embodiment, the business prediction model may be trained as a neural network model or a multiple logistic regression model. When the business prediction models are trained as neural network models, the update module 110 may update the model parameters of the respective business prediction models through the back-propagation algorithm.
In one embodiment, before the business prediction model of each data provider A1~An is trained, the model parameters of each data provider A1~An's business prediction model are preferably set to initial values. The model parameters may be randomly initialized within a preset interval, for example between 0 and 1. After each data provider A1~An has computed its decryption gradient, the update module 110 may control each data provider A1~An to update its model parameters according to the computed decryption gradient and continue model training. The subsequent training process may be to iterate the above procedure until the total loss function converges, at which point the multi-party joint modeling is complete and each data provider A1~An has formed its own business prediction model.
In one embodiment, convergence of the total loss function refers to convergence of the function by which the total encryption loss is calculated.
In one embodiment, each data provider A1~An may update its own model parameters by gradient descent, i.e., by subtracting its decryption gradient scaled by a preset learning rate η from its current model parameters to obtain its updated model parameters.
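A sketch of the per-provider parameter update and the convergence check; the learning rate, tolerance, and the toy loss/gradient functions standing in for the real (encrypted) quantities are hypothetical.

```python
# Sketch of the gradient-descent update and loss-convergence loop
# (hypothetical learning rate, tolerance, and toy objective).
import numpy as np


def update_parameters(theta: np.ndarray, decryption_gradient: np.ndarray,
                      eta: float = 0.05) -> np.ndarray:
    """Gradient-descent update: theta <- theta - eta * gradient."""
    return theta - eta * decryption_gradient


def train_until_converged(theta, grad_fn, loss_fn, eta=0.05, tol=1e-6, max_iter=1000):
    """Iterate updates until the total loss stops changing (i.e., converges)."""
    prev_loss = float("inf")
    for _ in range(max_iter):
        theta = update_parameters(theta, grad_fn(theta), eta)
        loss = loss_fn(theta)
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return theta


# Toy quadratic objective standing in for the decrypted total loss.
loss_fn = lambda t: float(np.sum((t - 3.0) ** 2))
grad_fn = lambda t: 2.0 * (t - 3.0)
theta0 = np.zeros(3)
print(np.round(train_until_converged(theta0, grad_fn, loss_fn), 3))  # ~[3. 3. 3.]
```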
The prediction module 111 is configured to substitute the fields shared by any data provider into the trained business prediction model to obtain a business prediction result for that data provider.
In one embodiment, after the multi-party joint modeling is completed, each data provider A1~An, i.e., each wholesaler/supplier, can predict the sales situation or supply situation of the counterpart company through the trained business prediction model and make corresponding strategic adjustments. In actual business prediction, the common-field data of any data provider A1~An can be substituted into the trained business prediction model to obtain a business prediction result for that data provider. The business prediction result may be sales prediction data for a certain wholesaler or supply-capacity prediction data for a certain supplier.
FIG. 4 is a flowchart of a federated modeling method based on shared data in an embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
Step S400, receiving the service data uploaded by a plurality of data providers A1~An.
Step S402, determining, according to the received service data, the fields common to the plurality of data providers A1~An and forming a field set from the common fields.
Step S404, determining whether a value corresponding to each field in the field set is within a predetermined outlier determination interval.
Step S406, if there is a value corresponding to one or more fields that is not within the preset outlier determination interval, removing the one or more fields from the field set.
Step S408, screening a plurality of key fields from the field set subjected to the elimination processing according to a preset screening rule.
Step S410, performing field fusion on the plurality of key fields so as to construct a training sample from the data of the fused key fields.
Step S412, sending a joint modeling instruction to each data provider A1~An so as to control each data provider A1~An to perform a joint modeling operation according to the training sample.
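An end-to-end sketch of steps S400-S412 as seen from the coordinator; all helper logic, field names, and the keyword-based screening rule are hypothetical stand-ins for the modules described above.

```python
# Sketch of steps S400-S412 on the coordinator side (hypothetical data and rules).
from typing import Dict, List, Set


def federated_modeling_pipeline(
    uploaded_data: Dict[str, Dict[str, List[float]]],   # S400: provider -> field -> values
    outlier_interval=(0.0, 1e6),                         # S404/S406
    key_keywords=("sales", "capacity", "purchase"),      # S408
):
    # S402: fields common to every data provider
    field_set: Set[str] = set.intersection(*(set(d) for d in uploaded_data.values()))

    # S404/S406: drop fields whose values fall outside the outlier interval
    low, high = outlier_interval
    field_set = {
        f for f in field_set
        if all(low <= v <= high for d in uploaded_data.values() for v in d[f])
    }

    # S408: screen key fields by the preset rule (here: a keyword match)
    key_fields = [f for f in field_set if any(k in f for k in key_keywords)]

    # S410: field fusion -> training sample (here: sum the values of each key field)
    training_sample = {f: sum(sum(d[f]) for d in uploaded_data.values()) for f in key_fields}

    # S412: instruct every provider to start joint modeling on the sample
    return {"instruction": "start_joint_modeling", "training_sample": training_sample}


print(federated_modeling_pipeline({
    "A1": {"sales": [10.0, 12.0], "office_address_code": [1.0]},
    "A2": {"sales": [7.0, 9.0], "monthly_capacity": [300.0]},
}))
```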
The federated modeling device and method based on shared data and the computer-readable storage medium enable federated modeling on shared data while fully ensuring data security; they alleviate the problem of data information blocking, address data privacy protection in the big-data era, protect each company's data privacy, and allow the general operating condition of a counterpart company to be predicted through the model, providing decision support for enterprise operation.
Other variations and modifications will be apparent to those skilled in the art from practice of the invention disclosed herein, without departing from its spirit and scope.

Claims (10)

1. A federated modeling method based on shared data, characterized in that the method comprises:
receiving service data uploaded by a plurality of data providers A1~An;
determining, according to the received service data, fields common to the plurality of data providers A1~An, and forming a field set from the common fields;
judging whether the value corresponding to each field in the field set is positioned in a preset outlier judgment interval or not;
if the values corresponding to one or more fields are not in the preset outlier judgment interval, removing the one or more fields from the field set;
screening a plurality of key fields from the field set subjected to the elimination processing according to a preset screening rule;
performing field fusion on the plurality of key fields to construct a training sample based on the fused data of the key fields; and
sending a joint modeling instruction to each data provider A1~An so as to control each data provider A1~An to perform a joint modeling operation according to the training sample.
2. The method of claim 1, wherein the step of field fusing the plurality of key fields comprises:
summing, according to the timestamps of the plurality of key fields, the field values of the key fields belonging to a specified date interval.
3. The method of claim 1, wherein the step of controlling each data provider A1~An to perform a joint modeling operation according to the training sample comprises:
creating an encryption key pair and distributing the public key of the encryption key pair to each data provider A1~An, so that each data provider A1~An encrypts the data exchanged during model training;
having a plurality of the data providers A1~An-1 each send its locally computed encryption loss to the data provider An, so that the data provider An aggregates them to obtain a total encryption loss;
receiving the total encryption loss computed by the data provider An;
initializing an interference term at each data provider A1~An and computing an encrypted interference term from the interference term;
receiving the encryption gradient and the encrypted interference term computed by each data provider A1~An;
decrypting the total encryption loss and, for each data provider A1~An, the sum of the encryption gradient and the encrypted interference term, to obtain the decrypted total loss and, for each data provider A1~An, the decrypted sum of the gradient and the interference term;
sending the decrypted sum of the gradient and the interference term to the corresponding data provider A1~An, so that each data provider A1~An computes a decryption gradient; and
controlling each data provider A1~An to update the model parameters of its model to be trained according to the computed decryption gradient, and to continue model training until the total loss function converges.
4. The method of claim 3, wherein the method further comprises:
training the model to be trained based on the training sample to obtain a weight value of each key field in the training sample, wherein the weight value represents the contribution degree of each key field to the model to be trained; and
removing from the training sample the key fields whose weight values are lower than a preset weight value.
5. The method of claim 3, wherein the model to be trained is a business prediction model, the method further comprising:
substituting the key fields shared by any data provider into the trained business prediction model to obtain a business prediction result for that data provider.
6. The method of claim 3, wherein the method further comprises:
controlling a plurality of the data providers A1~An-1 to each compute a local encrypted sample weight from the common key fields it holds and to send the local encrypted sample weight to the data provider An, so that the data provider An aggregates them to obtain a total encrypted sample weight; and
controlling the data provider An to distribute the total encrypted sample weight to the plurality of data providers A1~An-1, so that each data provider A1~An computes the encryption gradient based on the total encrypted sample weight.
7. The method of claim 3, wherein the step of initializing an interference term at each data provider A1~An comprises:
obtaining the order of magnitude of the encryption gradient computed by each data provider A1~An; and
randomly initializing, at each data provider A1~An, an interference term of the same order of magnitude as the respective encryption gradient.
8. The method of claim 3, wherein the step of initializing an interference term at each data provider A1~An comprises:
determining a random value range according to the encryption gradient computed by each data provider A1~An; and
randomly initializing, at each data provider A1~An, an interference term within the respective random value range.
9. A shared data based federated modeling apparatus, the apparatus comprising a processor and a memory, the memory having stored thereon a plurality of computer programs, wherein the processor is configured to implement the steps of the shared data based federated modeling method of any of claims 1-8 when executing the computer programs stored in the memory.
10. A computer-readable storage medium having stored thereon instructions executable by one or more processors to perform the steps of the shared data-based federated modeling method of any of claims 1-8.
CN201910697248.0A 2019-07-30 2019-07-30 Federal model building device, method and readable storage medium storing program for executing based on shared data Pending CN110443416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910697248.0A CN110443416A (en) 2019-07-30 2019-07-30 Federal model building device, method and readable storage medium storing program for executing based on shared data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910697248.0A CN110443416A (en) 2019-07-30 2019-07-30 Federal model building device, method and readable storage medium storing program for executing based on shared data

Publications (1)

Publication Number Publication Date
CN110443416A true CN110443416A (en) 2019-11-12

Family

ID=68432432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910697248.0A Pending CN110443416A (en) 2019-07-30 2019-07-30 Federal model building device, method and readable storage medium storing program for executing based on shared data

Country Status (1)

Country Link
CN (1) CN110443416A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955915A (en) * 2019-12-14 2020-04-03 支付宝(杭州)信息技术有限公司 Method and device for processing private data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165725A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Neural network federation modeling method, equipment and storage medium based on transfer learning
US20190012592A1 (en) * 2017-07-07 2019-01-10 Pointr Data Inc. Secure federated neural networks
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN109977694A (en) * 2019-03-11 2019-07-05 暨南大学 A kind of data sharing method based on cooperation deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012592A1 (en) * 2017-07-07 2019-01-10 Pointr Data Inc. Secure federated neural networks
CN109165725A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Neural network federation modeling method, equipment and storage medium based on transfer learning
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN109977694A (en) * 2019-03-11 2019-07-05 暨南大学 A kind of data sharing method based on cooperation deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG QIANG: "Federated Machine Learning: Concept and Applications", ACM Transactions on Intelligent Systems and Technology *
ZHAO WEI: "Research on the Construction of a Model for Measuring Online Brand Loyalty Using Machine Learning Methods" (应用机器学习方法度量在线品牌忠诚度模型构建研究), 30 June 2017 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955915A (en) * 2019-12-14 2020-04-03 支付宝(杭州)信息技术有限公司 Method and device for processing private data

Similar Documents

Publication Publication Date Title
CN110443067B (en) Federal modeling device and method based on privacy protection and readable storage medium
Liu et al. A blockchain-based framework of cross-border e-commerce supply chain
Asante et al. Distributed ledger technologies in supply chain security management: A comprehensive survey
US10936580B2 (en) System and method for digital asset management
US11875300B2 (en) Perishable asset tracking for blockchain
US20210044421A1 (en) System and method for digital asset transfer
CN112132198B (en) Data processing method, device and system and server
JP7149445B2 (en) Encrypted data sharing management for blockchain
US11100427B2 (en) Multi-party computation system for learning a classifier
US20190305932A1 (en) Distributed key management and encryption for blockchains
KR20210041540A (en) System and method for secure electronic transaction platform
US11088834B2 (en) System for privacy-preserving monetization of big data and method for using the same
US20200153632A1 (en) System and method for controlling restrictions on digital asset
US11164115B1 (en) Capacity planning and data placement management in multi-cloud computing environment
US20230101755A1 (en) System and methods for tracking an item in a distributed environment
Chatterjee et al. A blockchain-enabled security framework for smart agriculture
Islam et al. IoT security, privacy and trust in home-sharing economy via blockchain
Shaabany et al. Secure information model for data marketplaces enabling global distributed manufacturing
Pennekamp et al. Designing secure and privacy-preserving information systems for industry benchmarking
CN112417031A (en) Contextual internet of things using blockchains
CN110443416A (en) Federal model building device, method and readable storage medium storing program for executing based on shared data
Bhagavan et al. Fedsmarteum: Secure federated matrix factorization using smart contracts for multi-cloud supply chain
Umekwudo et al. Blockchain technology for mobile applications recommendation systems
Sen et al. Analysis of a cloud migration framework for offline risk assessment of cloud service providers
Parmar et al. Uplifting blockchain technology for data provenance in supply chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191112