CN116226062A - Data sharing method and device based on privacy protection

Info

Publication number: CN116226062A
Application number: CN202111467585.4A
Authority: CN
Inventors: 庞严
Original assignee: Chongqing New National University Research Institute; National University of Singapore
Current assignee: Chongqing New National University Research Institute; National University of Singapore
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2023-06-06

Abstract

Embodiments of the present disclosure relate to a method, a node, a distributed system and a storage medium for data sharing under privacy protection, in the computer field. According to the method, private data is obtained from local data that includes the private data and first non-private data, the first non-private data being associated with the private data; a hash value of the private data is generated; a first association between a local identifier for the local data and the hash value is generated; a global identifier for indexing the hash value in a distributed system is generated, the distributed system being associated with the current node; a second association between the global identifier and the hash value is generated; and the second association is sent to other nodes in the distributed system, so that it is stored in the distributed system for linked sharing of the first non-private data. As a result, non-private data can be shared securely while local private data remains protected.

Description

A data sharing method and device based on privacy protection

Technical field

Embodiments of the present disclosure relate generally to the computer field, and in particular to a method for data sharing under privacy protection, a data providing node, a data using node, a distributed system, and a computer storage medium.

Background

In the era of artificial intelligence, data has become an important strategic resource. Countries and enterprises, driven by strategic and security needs, are protecting data privacy ever more strictly. At the same time, data analysis and artificial intelligence model training often require large amounts of data. How to carry out data analysis and artificial intelligence model training while protecting data privacy has become a very important research direction in the computer field.

Summary of the invention

Provided are a method for data sharing, a data providing node, a data using node, a distributed system, and a computer storage medium, which enable non-private data to be shared securely while local private data is protected.

According to a first aspect of the present disclosure, a method for data sharing is provided. The method includes: obtaining private data from local data that includes the private data and first non-private data, the first non-private data being associated with the private data; generating a hash value of the private data; generating a first association between a local identifier for the local data and the hash value; generating a global identifier for indexing the hash value in a distributed system, the distributed system being associated with the current node; generating a second association between the global identifier and the hash value; and sending the second association to other nodes in the distributed system, so that the second association is stored in the distributed system for linked sharing of the first non-private data.

According to a second aspect of the present disclosure, a method for data sharing is provided. The method includes: generating data condition information based on federated metadata stored in a distributed system, where the federated metadata includes at least an association between a global identifier and multiple items of attribute information of multiple items of non-private data, the global identifier is associated with a hash value of private data, the private data is associated with the multiple items of non-private data, and the multiple items of non-private data are located at multiple data providing nodes in the distributed system; sending a data request, which includes the data condition information, to the multiple data providing nodes; and receiving, from at least one of the multiple data providing nodes, at least one item of non-private data that matches the data condition information.

According to a third aspect of the present disclosure, a data providing node is provided. The data providing node includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to the first aspect.

According to a fourth aspect of the present disclosure, a data using node is provided. The data using node includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to the second aspect.

According to a fifth aspect of the present disclosure, a distributed system is provided. The distributed system includes: a plurality of data providing nodes according to the third aspect of the present disclosure; and a data using node according to the fourth aspect of the present disclosure.

According to a sixth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the program implements the method according to the first aspect or the second aspect of the present disclosure.

It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of the drawings

The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements.

FIG. 1 is a schematic diagram of a distributed system 100 according to an embodiment of the present disclosure.

FIG. 2 is a schematic diagram of a method 200 for data sharing according to an embodiment of the present disclosure.

FIG. 3 is a schematic diagram of a method 300 for data sharing according to an embodiment of the present disclosure.

FIG. 4 is a schematic diagram of a method 400 for data sharing according to an embodiment of the present disclosure.

FIG. 5 is a schematic diagram of a method 500 for data sharing according to an embodiment of the present disclosure.

FIG. 6 is a schematic diagram of a method 600 for data sharing according to an embodiment of the present disclosure.

FIG. 7 is a schematic diagram of a method 700 for data sharing according to an embodiment of the present disclosure.

FIG. 8 is a block diagram of an electronic device used to implement the method for data sharing according to an embodiment of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

As used herein, the term "include" and its variants denote open-ended inclusion, that is, "including but not limited to". Unless otherwise stated, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one further embodiment". The terms "first", "second" and so on may refer to different objects or to the same object. Other explicit and implicit definitions may also be included below.

As mentioned above, practical data analysis and artificial intelligence model training tasks often need data from different nodes or regions (including individuals, companies or countries). According to the data security law, private data related to personal identity at a node or region cannot leave that node or region. In typical information system designs, local data in a node or region can be queried and extracted through a local identifier (LID, Local ID). Each LID is generally used to uniquely identify a distinct individual in the data, for example (but not limited to) a patient number or a customer number. Because different nodes or regions usually generate LIDs in different ways, the same individual usually has different LIDs in the local data of different nodes or regions.

In addition, because of data privacy protection requirements, nodes or regions cannot directly share all of their source data with one another. In data analysis and artificial intelligence model training, it is often desirable to associate data from different nodes or regions, so as to improve data coverage and the accuracy of results. How to associate and share non-private data from different regions while protecting private data, in order to carry out data analysis and artificial intelligence model training, is therefore a very important problem.

To at least partially address one or more of the above problems and other potential problems, example embodiments of the present disclosure propose a scheme for data sharing. In this scheme, a data providing node obtains private data from local data that includes the private data and first non-private data, the first non-private data being associated with the private data. The data providing node generates a hash value of the private data, and generates a first association between a local identifier for the local data and the hash value. The data providing node generates a global identifier for indexing the hash value in a distributed system, the distributed system being associated with the current node, and generates a second association between the global identifier and the hash value. The data providing node then sends the second association to other nodes in the distributed system, so that the second association is stored in the distributed system for linked sharing of the first non-private data. In this way, the locally stored first association between the local identifier of the local data and the hash value of the private data, together with the second association between the global identifier and the hash value of the private data stored in the distributed system, allows non-private data to be shared securely while local private data is protected. In addition, the second association stored in the distributed system makes it easier to associate different non-private data held at different nodes in the distributed system, for use in data analysis and model training.

Specific examples of this scheme are described in more detail below with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of an example of a distributed system 100 according to an embodiment of the present disclosure. The distributed system 100 may include a data using node 110 and a plurality of data providing nodes 120-1 to 120-n (hereinafter collectively referred to as 120), where n is greater than or equal to 2.

Examples of the data using node 110 and the data providing nodes 120 include, but are not limited to, personal computers, desktop computers, laptop computers, tablet computers, wearable devices, personal digital assistants, smartphones, in-vehicle electronic devices, server computers, multiprocessor systems, mainframe computers, and distributed computing environments that include any of the above systems or devices.

Each data providing node 120 may store local data. The local data may have a local identifier (LID) used to identify and index the local data at the data providing node 120. The local data may include private data and non-private data, where the non-private data is associated with the private data. Private data is related to personal identity and may include, but is not limited to, an ID document number, a mobile phone number, a name, and an address postal code. Non-private data is unrelated to personal identity and may include, but is not limited to, age, gender, city, disease history, and purchase history. Non-private data may include one or more items of attribute information, such as cancer drugs or hypertension drugs. Different data providing nodes 120 may hold different non-private data associated with the same private data. For example, the data providing node 120-1 may hold user A's cancer drug information, while the data providing node 120-2 may hold user A's hypertension drug information.

Different data providing nodes 120 may be located in different regions, for example belonging to different individuals, companies or countries.

In some embodiments, the distributed system 100 may be a blockchain system, and the data using node 110 and the data providing nodes 120 may be blockchain nodes.

The data providing node 120 may be used to: obtain private data from local data that includes the private data and first non-private data, the first non-private data being associated with the private data; generate a hash value (PHash) of the private data; generate a first association (LID-PHash) between a local identifier (LID) for the local data and the hash value; generate a global identifier (GID, Global ID) for indexing the hash value in the distributed system 100, the distributed system 100 being associated with the current node; generate a second association (GID-PHash) between the global identifier and the hash value; and send the second association to other nodes in the distributed system 100 (for example, the data providing nodes 120-2 to 120-n and the data using node 110), so that the second association is stored in the distributed system 100 for linked sharing of the first non-private data.

In addition, the data providing node 120-1 may be used to: obtain first attribute information of the first non-private data from the current node; obtain, from one or more other data providing nodes in the distributed system 100 (for example, the data providing nodes 120-2 to 120-n), one or more items of second attribute information of one or more items of second non-private data, the second non-private data being associated with the private data; generate federated metadata based on the second association, the federated metadata including at least an association among the global identifier, the first attribute information, and the one or more items of second attribute information; and send the federated metadata to other nodes in the distributed system 100 (for example, the data providing nodes 120-2 to 120-n and the data using node 110), so that the federated metadata is stored in the distributed system 100 for linked sharing of the first non-private data and the one or more items of second non-private data.

The data using node 110 may be used to: generate data condition information based on the federated metadata stored in the distributed system 100, where the federated metadata includes at least an association between a global identifier and multiple items of attribute information of multiple items of non-private data, the global identifier is associated with the hash value of private data, the private data is associated with the multiple items of non-private data, and the multiple items of non-private data are located at the multiple data providing nodes 120 in the distributed system 100; send a data request, which includes the data condition information, to the multiple data providing nodes 120; and receive, from at least one of the multiple data providing nodes 120, at least one item of non-private data that matches the data condition information.

Thus, the locally stored first association between the local identifier of the local data and the hash value of the private data, together with the second association between the global identifier and the hash value of the private data stored in the distributed system, allows non-private data to be shared securely while local private data is protected. In addition, the second association stored in the distributed system makes it easier to associate different non-private data held at different nodes in the distributed system, for use in data analysis and model training. Furthermore, because the distributed system stores federated metadata that associates the attribute information of the different non-private data held at different data providing nodes, the data using node can conveniently learn which attribute information exists across the whole system without knowing any private data.
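The request flow on the data using node side can be sketched as follows. This is only an illustrative sketch: the function names, the dictionary layout of the federated metadata and of the data condition information, and the sample drug values are assumptions, not taken from the disclosure. For brevity each provider's data is keyed directly by GID, whereas a real provider would resolve the GID to a PHash and then to its own LID locally.

```python
def build_condition(federated, required_attrs):
    """Data condition information: GIDs whose attributes cover required_attrs."""
    wanted = set(required_attrs)
    gids = sorted(g for g, attrs in federated.items() if wanted <= set(attrs))
    return {"gids": gids, "attributes": sorted(wanted)}


def answer_request(node_data, condition):
    """A provider returns only the non-private values that match the condition."""
    result = {}
    for gid in condition["gids"]:
        row = node_data.get(gid, {})
        matched = {a: row[a] for a in condition["attributes"] if a in row}
        if matched:
            result[gid] = matched
    return result


# Hypothetical federated metadata: GID -> attribute names known in the system.
federated = {
    "gid-1": ["cancer drug", "hypertension drug"],
    "gid-2": ["hypertension drug"],
}
# Hypothetical non-private data held by two providers (simplified: keyed by GID).
provider_1 = {"gid-1": {"cancer drug": "drug-A"}}
provider_2 = {
    "gid-1": {"hypertension drug": "drug-B"},
    "gid-2": {"hypertension drug": "drug-C"},
}

condition = build_condition(federated, ["cancer drug", "hypertension drug"])
responses = [answer_request(p, condition) for p in (provider_1, provider_2)]
```

The data using node sees only GIDs, attribute names and the returned non-private values; no private data or hash value appears in the request or in the responses.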

FIG. 2 shows a flowchart of a method 200 for data sharing according to an embodiment of the present disclosure. For example, the method 200 may be performed by the data providing node 120-1 shown in FIG. 1. It should be understood that the method 200 may also include additional blocks that are not shown and/or may omit blocks that are shown; the scope of the present disclosure is not limited in this respect. It should also be understood that, although the data providing node 120-1 is used as the example, the method may equally be implemented by the other data providing nodes 120-2 to 120-n; the scope of the present disclosure is not limited in this respect.

At block 202, the data providing node 120-1 obtains private data from local data that includes the private data and first non-private data, the first non-private data being associated with the private data.

The private data may include, for example but without limitation, an identity card number, a name plus address postal code, or a name plus mobile phone number.

At block 204, the data providing node 120-1 generates a hash value (PHash) of the private data.

For example, a hash function may be applied to private data such as an identity card number, a name plus address postal code, or a name plus mobile phone number to generate the hash value. Examples of hash functions include, but are not limited to, SHA-256 and SHA-512.

In some embodiments, the data providing node 120-1 may process the private data according to a predetermined format to generate formatted private data, and then apply the hash function to the formatted private data to generate the hash value. The predetermined format is the same for all data providing nodes, which ensures that the hash results are consistent.
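A minimal sketch of block 204 with the formatting step is shown below. The disclosure does not specify the predetermined format, so the one used here (Unicode NFKC normalization, whitespace removal, upper-casing, and joining fields with `|`) is an assumption, as are the function names and the sample name and phone number.

```python
import hashlib
import unicodedata


def normalize_private_data(*fields: str) -> str:
    """Apply one predetermined format so that every node hashes identical bytes.

    Assumed format: NFKC normalization, all whitespace removed,
    upper-cased, fields joined with '|'.
    """
    cleaned = []
    for field in fields:
        text = unicodedata.normalize("NFKC", field)
        text = "".join(text.split()).upper()
        cleaned.append(text)
    return "|".join(cleaned)


def compute_phash(*fields: str) -> str:
    """Return the SHA-256 hex digest (PHash) of the formatted private data."""
    formatted = normalize_private_data(*fields)
    return hashlib.sha256(formatted.encode("utf-8")).hexdigest()


# Two nodes hold the same (made-up) person with different spacing and letter case.
phash_node_1 = compute_phash("Zhang San", "138 0000 0000")
phash_node_2 = compute_phash(" zhang san ", "13800000000")
```

Because both inputs reduce to the same formatted string, the two nodes obtain the same PHash even though their raw records differ.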

At block 206, the data providing node 120-1 generates a first association (LID-PHash) between the local identifier (LID) for the local data and the hash value (PHash).

Within a single node, the hash value (PHash) of the private data and the local identifier (LID) can correspond one to one. Across different nodes, the local identifiers (LIDs) of the same individual, for example the same patient, usually differ, but the hash value (PHash) of the private data is the same. Through the PHash, the non-private data of the same individual at different nodes can therefore be associated. The PHash can be shared with the distributed system outside the node for joining data. Owing to the one-way hiding property of the hash function, a data using node cannot derive any identity-related private data from the hash value, which effectively protects the local private data.
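The role of the first association can be illustrated with a short sketch. The LIDs, the made-up identity numbers and the helper names below are hypothetical; the point is only that two nodes with unrelated LID schemes end up with the same PHash for the same individual.

```python
import hashlib


def phash(private_data: str) -> str:
    """SHA-256 hex digest of already-formatted private data."""
    return hashlib.sha256(private_data.encode("utf-8")).hexdigest()


def build_first_association(local_rows: dict[str, str]) -> dict[str, str]:
    """Map each LID to the PHash of its private data; this map stays on the node."""
    return {lid: phash(private) for lid, private in local_rows.items()}


def matching_lids(a: dict[str, str], b: dict[str, str]) -> list[tuple[str, str]]:
    """Pair up LIDs from two nodes whose PHash values are equal."""
    by_hash = {h: lid for lid, h in b.items()}
    return [(lid, by_hash[h]) for lid, h in a.items() if h in by_hash]


# Two hypothetical nodes with different LID schemes and made-up ID numbers.
node_1 = build_first_association({"P-0017": "ID-110101199001011234"})
node_2 = build_first_association({
    "cust_88": "ID-110101199001011234",
    "cust_89": "ID-220202198505054321",
})

pairs = matching_lids(node_1, node_2)
```

Here `pairs` links `P-0017` at the first node to `cust_88` at the second, using only hash values and never the identity numbers themselves.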

At block 208, the data providing node 120-1 generates a global identifier (GID) for indexing the hash value (PHash) in the distributed system 100, the distributed system 100 being associated with the current node.

The global identifier (GID) may be generated according to any suitable identifier indexing requirement.

In some embodiments, the data providing node 120-1 may first determine whether a global identifier associated with the hash value already exists in the distributed system 100. If no such global identifier exists, the node generates a global identifier for indexing the hash value (PHash) in the distributed system 100. If such a global identifier already exists, the node does not generate a new one.
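The check-then-generate behavior can be sketched as follows. The `GidRegistry` class is a hypothetical in-memory stand-in for the GID-PHash associations stored in the distributed system, and the use of a random UUID as the GID is an assumption, since the disclosure leaves the identifier scheme open.

```python
import uuid


class GidRegistry:
    """In-memory stand-in for the GID-PHash associations stored in the system."""

    def __init__(self) -> None:
        self._gid_by_phash: dict[str, str] = {}

    def get_or_create(self, phash: str) -> tuple[str, bool]:
        """Return (gid, created); a new GID is generated only if none exists."""
        existing = self._gid_by_phash.get(phash)
        if existing is not None:
            return existing, False
        gid = uuid.uuid4().hex  # assumption: any unique identifier scheme would do
        self._gid_by_phash[phash] = gid
        return gid, True


registry = GidRegistry()
gid_a, created_a = registry.get_or_create("phash-of-user-a")
gid_b, created_b = registry.get_or_create("phash-of-user-a")
```

The second call returns the GID created by the first, so the same PHash is never given two GIDs. In an actual distributed deployment the existence check and the write would need to be coordinated across nodes, which this single-process sketch does not attempt.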

At block 210, the data providing node 120-1 generates a second association (GID-PHash) between the global identifier (GID) and the hash value (PHash).

At block 212, the data providing node 120-1 sends the second association to other nodes in the distributed system 100, so that the second association is stored in the distributed system 100 for linked sharing of the first non-private data. It should be understood that linked sharing here means that the first non-private data can be shared with the data using node joined or associated with other non-private data that is held at other data providing nodes and associated with the same private data.

For example, the data providing node 120-1 may send the second association to the data providing nodes 120-2 to 120-n and to the data using node 110 in the distributed system 100, so that these nodes store the second association. In this way the second association is stored in the distributed system 100.

The global identifier (GID) and the hash value (PHash) can correspond one to one. Because the hash value of the same user or patient is identical at different nodes, the corresponding global identifier (GID) is also the same. Global identifiers are visible across nodes, and data users can use them to generate data requests.

In some embodiments, the distributed system 100 may be a blockchain system, and the second association (GID-PHash) between the global identifier and the hash value may be stored in the blockchain system, that is, appended to the blockchain. The blockchain system may be a public (permissionless) chain or a permissioned chain.
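Appending GID-PHash records to a chain can be illustrated with a toy hash-linked log. This is not a blockchain implementation (there is no consensus, networking or permissioning) and is not taken from the disclosure; the `Ledger` class and its record layout are assumptions that only show why a record appended to a hash-linked chain is tamper-evident.

```python
import hashlib
import json


def block_hash(index: int, prev_hash: str, record: dict) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only log of GID-PHash associations."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    def append(self, gid: str, phash: str) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        record = {"gid": gid, "phash": phash}
        self.blocks.append({
            "index": index,
            "prev": prev,
            "record": record,
            "hash": block_hash(index, prev, record),
        })

    def is_valid(self) -> bool:
        """Recompute every hash and check that each block links to its predecessor."""
        prev = "0" * 64
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev"] != prev:
                return False
            if block["hash"] != block_hash(i, prev, block["record"]):
                return False
            prev = block["hash"]
        return True


ledger = Ledger()
ledger.append("gid-1", "aa" * 32)
ledger.append("gid-2", "bb" * 32)
valid_before = ledger.is_valid()
ledger.blocks[0]["record"]["phash"] = "cc" * 32  # simulate tampering
valid_after = ledger.is_valid()
```

After the simulated tampering, the recomputed hash of the first block no longer matches the stored one, so validation fails.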

Thus, the locally stored first association between the local identifier of the local data and the hash value of the private data, together with the second association between the global identifier and the hash value of the private data stored in the distributed system, allows non-private data to be shared securely while local private data is protected. In addition, the second association stored in the distributed system makes it easier to associate different non-private data held at different nodes in the distributed system, for use in data analysis and model training.

Moreover, storing the private data, the non-private data, and the first association between the local identifier (LID) for the local data and the hash value (PHash) locally, while storing the second association between the global identifier and the hash value on the blockchain, yields a hybrid storage architecture. This keeps the second association tamper-proof while avoiding the low storage efficiency of blockchain systems.

After the second association between the global identifier and the hash value has been generated and stored in the distributed system, it can also be used to associate different non-private data held at different nodes. This is described below with reference to FIG. 3.

图3示出了根据本公开的实施例的用于数据共享的方法300的流程图。例如，方法300可以由如图1所示的数据提供节点120-1来执行。应当理解的是，方法300还可以包括未示出的附加框和/或可以省略所示出的框，本公开的范围在此方面不受限制。还应当理解，虽然以数据提供节点120-1进行举例说明，但是也可以由其他数据提供节点120-2至120-n来实现，本公开的范围在此不受限制。Fig. 3 shows a flowchart of a method 300 for data sharing according to an embodiment of the present disclosure. For example, the method 300 may be performed by the data providing node 120-1 as shown in FIG. 1 . It should be appreciated that method 300 may also include additional blocks not shown and/or blocks shown may be omitted, and that the scope of the present disclosure is not limited in this respect. It should also be understood that although the data providing node 120-1 is used as an example for illustration, it may also be implemented by other data providing nodes 120-2 to 120-n, and the scope of the present disclosure is not limited here.

在框302处，数据提供节点120-1从当前节点获取第一非隐私数据的第一属性信息。At block 302, the data providing node 120-1 acquires first attribute information of the first non-private data from the current node.

第一非隐私数据可以包括第一属性信息和第一属性值信息。第一属性信息的示例包括条目“癌症药品”。第一属性值信息的示例包括癌症药品信息，例如癌症药品的名称、编号等。The first non-private data may include first attribute information and first attribute value information. An example of the first attribute information includes the item "cancer drug". Examples of the first attribute value information include cancer drug information, such as the name and serial number of the cancer drug.

At block 304, the data providing node 120-1 obtains one or more items of second attribute information of one or more items of second non-private data from one or more other data providing nodes in the distributed system 100 (for example, the data providing nodes 120-2 to 120-n), where the one or more items of second non-private data are associated with the private data.

The second non-private data may include second attribute information and second attribute value information. Examples of the second attribute information include the entry "hypertension drug", the entry "rheumatism drug", and the like. Examples of the second attribute value information include hypertension drug information and rheumatism drug information, such as the name of a hypertension drug or the name of a rheumatism drug.

At block 306, the data providing node 120-1 generates joint metadata based on the second association and stores it in the distributed system 100. The joint metadata includes at least an association among the global identifier (GID), the first attribute information, and the one or more items of second attribute information.

For example, if the first attribute information includes the entry "cancer drug" and the multiple items of second attribute information include the entries "hypertension drug" and "rheumatism drug", then one example of the joint metadata includes an association among the global identifier (GID), the entry "cancer drug", the entry "hypertension drug", and the entry "rheumatism drug". In addition, the joint metadata may also include descriptive information for the global identifier (GID), the entry "cancer drug", the entry "hypertension drug", and the entry "rheumatism drug".
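As a rough illustration of the association described above, the following Python sketch builds one joint-metadata record. The dictionary layout, the function name `build_joint_metadata`, and the sample GID value are assumptions made for illustration; the disclosure does not prescribe a data format.

```python
def build_joint_metadata(gid, first_attribute, second_attributes, descriptions=None):
    """Associate one global identifier with attribute names from several nodes."""
    return {
        "gid": gid,
        # Attribute names only (assumed layout); no attribute values are included.
        "attributes": [first_attribute, *second_attributes],
        # Optional descriptive information for the GID and each attribute.
        "descriptions": dict(descriptions or {}),
    }


joint_metadata = build_joint_metadata(
    gid="GID-001",  # hypothetical identifier
    first_attribute="cancer drug",
    second_attributes=["hypertension drug", "rheumatism drug"],
)
```

A record of this kind carries only the identifier and attribute names, which is what lets other nodes see which attributes exist without seeing any private data.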

At block 308, the data providing node 120-1 sends the joint metadata to the other nodes in the distributed system 100 so that the joint metadata is stored in the distributed system 100 for linked sharing of the first non-private data and the one or more items of second non-private data.

For example, the data providing node 120-1 may send the joint metadata to the data providing nodes 120-2 to 120-n and to the data using node 110 in the distributed system 100 so that these nodes store it, whereby the joint metadata is stored in the distributed system 100. Once the joint metadata has been stored in the distributed system 100, the data using node 110 can learn from it which GIDs and which attribute information exist in the distributed system 100 and how they are associated, and can then generate data condition information as needed in order to make a data request.

In some embodiments, the distributed system 100 may be a blockchain system, and the joint metadata may be stored in the blockchain system, for example by being appended to the blockchain.

Thus, by storing in the distributed system joint metadata that associates different attribute information of different non-private data held at different data providing nodes, a data using node can readily learn which attribute information exists across the whole system without knowing any private data, and can readily associate and share the different non-private data held at different data providing nodes for data analysis, artificial intelligence model training, and the like. This can support complex data analysis or artificial intelligence model training tasks that are infeasible or very inefficient under traditional privacy-preserving computation, and the resulting analyses and models are often more accurate.

In addition, the private data, the non-private data, and the first association between the local identifier (LID) of the local data and the hash value (PHash) are stored locally, while the second association between the global identifier and the hash value, together with the joint metadata, is stored on the blockchain. This yields a hybrid storage architecture that keeps the second association tamper-proof while avoiding the low storage efficiency of blockchain systems.

After the joint metadata has been generated and stored in the distributed system, the data using node 110 can use the joint metadata to make a data request, and a data providing node 120 can obtain the corresponding non-private data and share it in response to the data request. This is described below with reference to FIG. 4.

FIG. 4 shows a flowchart of a method 400 for data sharing according to an embodiment of the present disclosure. For example, the method 400 may be performed by the data providing node 120-1 shown in FIG. 1. It should be understood that the method 400 may also include additional blocks not shown and/or may omit blocks that are shown; the scope of the present disclosure is not limited in this respect. It should also be understood that although the data providing node 120-1 is used as an example, the method may also be implemented by the other data providing nodes 120-2 to 120-n; the scope of the present disclosure is not limited in this respect.

At block 402, the data providing node 120-1 receives a data request from the data using node 110 in the distributed system 100. The data request includes data condition information, and the data condition information is generated based on the joint metadata.

For example, the data using node 110 may generate the data condition information based on the attribute information in the joint metadata. One example of data condition information is "SELECT GID, cancer medicine, hypertension medicine, age FROM Metadata WHERE age >= 20 AND age <= 40", where "GID", "cancer medicine", "hypertension medicine", and "age" may be the global identifier and attribute information included in the joint metadata. This data condition information requests the global identifier, cancer drug information, hypertension drug information, and age information for records whose age is between 20 and 40.

As another example, the data using node 110 may generate the data condition information based on the global identifiers in the joint metadata. For example, the data condition information may include one or more global identifiers.

In some embodiments, the distributed system 100 includes a blockchain system, and the data providing nodes 120 and the data using node 110 are blockchain nodes. Receiving the data request from the data using node 110 includes receiving a first smart contract from the data using node 110, the first smart contract including the data condition information. Because blockchain data is tamper-proof, a smart contract cannot be modified in place once it has been deployed. This makes it easier to trace how data has been used later on and helps ensure that shared non-private data is not misused.

At block 404, the data providing node 120-1 obtains the first non-private data that matches the data condition information.

Taking the data condition information above, "SELECT GID, cancer medicine, hypertension medicine, age FROM Metadata WHERE age >= 20 AND age <= 40", as an example: if the first non-private data includes the first attribute information "cancer medicine", the corresponding first attribute value information, and an age attribute value of 30, then the first non-private data matches the data condition information.
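The matching step in this example can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the record layout, the function names `matches_condition` and `select_attributes`, and the sample values are hypothetical, and the age range is hard-coded from the example query rather than parsed from it.

```python
REQUESTED = ("cancer medicine", "hypertension medicine", "age")


def matches_condition(record, min_age=20, max_age=40):
    """Return True when the record has an age within [min_age, max_age]."""
    age = record.get("age")
    return age is not None and min_age <= age <= max_age


def select_attributes(record, requested=REQUESTED):
    """Keep only the requested attributes that this node actually holds."""
    return {name: record[name] for name in requested if name in record}


record = {"cancer medicine": "drug-A", "age": 30}  # hypothetical local record
matched = select_attributes(record) if matches_condition(record) else None
```

In this sketch the node holds no "hypertension medicine" attribute, so that column is simply absent from its result; it would be supplied by another data providing node.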

Where the data condition information includes a global identifier, the data providing node 120-1 may determine the hash value of the private data associated with that global identifier, based on the second association, stored in the distributed system 100, between the global identifier and the hash value of the private data. The data providing node 120-1 may then determine the local identifier associated with that hash value, based on the locally stored first association between the local identifier and the hash value of the private data, and obtain the first non-private data in the local data corresponding to that local identifier as the first non-private data matching the data condition information.
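The two-step lookup (GID to PHash through the shared second association, then PHash to LID through the local first association) might look like the sketch below. SHA-256 is an assumption, since the text only speaks of a hash value, and all identifiers, field names, and sample values are hypothetical.

```python
import hashlib


def phash(private_value: str) -> str:
    """Hash a private value (SHA-256 is assumed here for illustration)."""
    return hashlib.sha256(private_value.encode("utf-8")).hexdigest()


# Local data kept only at this node, keyed by local identifier (LID).
local_data = {
    "LID-7": {"private": "ID-123456", "non_private": {"cancer drug": "drug-A"}},
}
# First association (local): PHash -> LID.
first_association = {phash(rec["private"]): lid for lid, rec in local_data.items()}
# Second association (shared in the distributed system): GID -> PHash.
second_association = {"GID-001": phash("ID-123456")}


def resolve_non_private(gid):
    """Follow GID -> PHash -> LID and return the matching non-private data."""
    digest = second_association.get(gid)
    if digest is None:
        return None
    lid = first_association.get(digest)
    if lid is None:
        return None  # The GID refers to data this node does not hold.
    return local_data[lid]["non_private"]
```

Only the hash appears in the shared mapping, so the lookup never exposes the private value itself to other nodes.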

In some embodiments, the data request also includes the node address of the data using node 110. The data providing node 120-1 may also determine whether the node address of the data using node 110 matches a preset node address. For example, the data using node 110 and the data providing node 120-1 may agree in advance to use the address of the data using node 110 as the preset node address.

If the data providing node 120-1 determines that the node address of the data using node 110 matches the preset node address, it obtains the first non-private data that matches the data condition information. In this way, only data requests from the preset node address are processed by the data providing node, which achieves targeted sharing of data.
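The address check can be illustrated with a short Python sketch. The request layout, the function name `handle_request`, and the address strings are assumptions for illustration; `fetch_matching` stands in for whatever local lookup the node performs.

```python
def handle_request(request, preset_addresses, fetch_matching):
    """Process a request only when its sender address is a preset address."""
    if request.get("node_address") not in preset_addresses:
        return None  # Requests from other addresses are not processed.
    return fetch_matching(request["condition"])


preset = {"node-110"}  # hypothetical address agreed on in advance


def fetch(condition):
    """Stand-in for the local lookup of matching non-private data."""
    return {"cancer drug": "drug-A"}


accepted = handle_request({"node_address": "node-110", "condition": "age"}, preset, fetch)
rejected = handle_request({"node_address": "node-999", "condition": "age"}, preset, fetch)
```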

At block 406, the data providing node 120-1 may also obtain, from at least one of the one or more other data providing nodes (for example, the data providing nodes 120-2 to 120-n), at least one item of second attribute information of at least one of the one or more items of second non-private data, where the at least one item of second attribute information matches the data condition information.

The obtained second attribute information may have been retrieved by the at least one other data providing node from its own local data, based on the data condition information, and sent in response to that node receiving the data request from the data using node 110.

At block 408, the data providing node 120-1 generates, based on the second association (GID-PHash), a third association among the global identifier, the first attribute information, and the at least one item of second attribute information.

At block 410, the data providing node 120-1 sends the third association to the other nodes in the distributed system 100 so that the third association is stored in the distributed system 100. This allows the data using node 110 to use the third association to associate the first non-private data obtained from the current node with the at least one item of second non-private data obtained from the at least one other data providing node.

At block 412, the data providing node 120-1 sends the obtained first non-private data to the data using node 110.

For example, the first non-private data may be sent to the data using node through an open API, a web service, direct file download, or the like. In some embodiments, the data providing node 120-1 may send the obtained first non-private data to the data using node 110 after receiving a data request from the data using node 110. In other embodiments, the data providing node 120-1 may send the obtained first non-private data to the data using node 110 directly, without a request from the data using node 110.

Thus, the data using node can generate data condition information based on the joint metadata stored in the distributed system and make a data request, and the data providing node can obtain the matching non-private data and send it to the data using node, so that non-private data is shared securely without the data using node knowing any private data. In addition, the attribute information of multiple items of non-private data that match the data condition information at multiple data providing nodes can be associated, which makes it easier for the data using node to obtain multiple items of non-private data from multiple data providing nodes for data analysis and model training.

FIG. 5 shows a flowchart of a method 500 for data sharing according to an embodiment of the present disclosure. For example, the method 500 may be performed by the data providing node 120-1 shown in FIG. 1. It should be understood that the method 500 may also include additional blocks not shown and/or may omit blocks that are shown; the scope of the present disclosure is not limited in this respect. It should also be understood that although the data providing node 120-1 is used as an example, the method may also be implemented by the other data providing nodes 120-2 to 120-n; the scope of the present disclosure is not limited in this respect. It should be understood that, in the embodiment of FIG. 5, the distributed system 100 is a blockchain system.

At block 502, the data providing node 120-1 receives a first smart contract from the data using node 110. The first smart contract includes data condition information and incentive mechanism information for data sharing, and the data condition information is generated based on the joint metadata.

At block 504, the data providing node 120-1 determines whether the local data contains first non-private data that matches the data condition information.

If the data providing node 120-1 determines at block 504 that the local data contains first non-private data that matches the data condition information, then at block 506 it determines whether the incentive mechanism information passes evaluation.

In some embodiments, determining whether the incentive mechanism information passes evaluation may include presenting the incentive mechanism information as a prompt and determining whether a positive confirmation of the prompt is received. If a positive confirmation is received, the incentive mechanism information is determined to have passed evaluation; otherwise, it is determined to have failed evaluation.

In other embodiments, determining whether the incentive mechanism information passes evaluation may include determining whether the incentive mechanism information satisfies a preset condition. If the incentive mechanism information satisfies the preset condition, it is determined to have passed evaluation; otherwise, it is determined to have failed evaluation.
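A preset-condition check of this kind could be as simple as the following Python sketch. The text does not say what the incentive mechanism information contains, so the `reward_per_record` field and the minimum-reward threshold are purely hypothetical.

```python
def incentive_passes(incentive, min_reward_per_record=1.0):
    """Return True when the offered reward meets the provider's preset minimum."""
    # "reward_per_record" is a hypothetical field; a missing value counts as zero.
    return incentive.get("reward_per_record", 0.0) >= min_reward_per_record
```

For instance, an offer of `{"reward_per_record": 2.0}` would pass against the default threshold, while `{"reward_per_record": 0.5}` or an empty offer would not.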

If the data providing node 120-1 determines at block 506 that the incentive mechanism information passes evaluation, then at block 508 it generates a second smart contract and deploys it in the blockchain system. The second smart contract includes the data condition information, the incentive mechanism information, the node address and digital wallet address of the data using node, and the node address and digital wallet address of the current node.

At block 510, the data providing node 120-1 determines whether the second smart contract has been deployed in the blockchain system.

If the data providing node 120-1 determines at block 510 that the second smart contract has been deployed in the blockchain system, then at block 512 it obtains the first non-private data that matches the data condition information.

At block 514, the data providing node 120-1 may also obtain, from at least one of the one or more other data providing nodes (for example, the data providing nodes 120-2 to 120-n), at least one item of second attribute information of at least one of the one or more items of second non-private data, where the at least one item of second attribute information matches the data condition information.

At block 516, the data providing node 120-1 may generate, based on the second association (GID-PHash), a third association among the global identifier, the first attribute information, and the at least one item of second attribute information.

At block 518, the data providing node 120-1 sends the third association to the other nodes in the blockchain system so that the third association is stored in the blockchain system. This allows the data using node 110 to use the third association to associate the first non-private data obtained from the current node with the at least one item of second non-private data obtained from the at least one other data providing node.

At block 520, the data providing node 120-1 sends the obtained first non-private data to the data using node 110. The first non-private data is sent in a manner similar to that described above, which is not repeated here.

Thus, the data using node can generate data condition information based on the joint metadata stored in the blockchain system and make a data request through a smart contract, and the data providing node can obtain the matching non-private data and send it to the data using node, so that non-private data is shared securely without the data using node knowing any private data. Once a data request has been put on the chain through a smart contract, its record cannot be altered, which makes it easier to trace how data has been used later on and helps ensure that shared non-private data is not misused. In addition, the attribute information of multiple items of non-private data that match the data condition information at multiple data providing nodes can be associated, which makes it easier for the data using node to associate multiple items of non-private data from multiple data providing nodes before performing data analysis and model training. Furthermore, using smart contracts to establish a data sharing incentive mechanism, designed around the sharing of digital rights, makes it possible to build a more effective data sharing platform. It also ensures that data providing nodes receive a fair return after contributing non-private data to data analysis or artificial intelligence model training tasks.

In some embodiments, the data providing node 120-1 may also determine whether sending of the first non-private data has completed.

If the data providing node 120-1 determines that sending of the first non-private data has completed, then the second smart contract is used to transfer, based on the incentive mechanism information, the digital rights corresponding to the first non-private data from the digital wallet address of the data using node to the digital wallet address of the current node.

For example, a message may be sent to the address of the second smart contract to invoke the relevant code in the second smart contract, which then transfers, based on the incentive mechanism information, the digital rights corresponding to the first non-private data from the digital wallet address of the data using node to the digital wallet address of the current node.

In this way, once a data providing node has completed data sharing, the corresponding digital rights can be transferred, which provides an incentive for data sharing.
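The settlement behavior described above can be mimicked off-chain with a small Python sketch. This is not smart contract code and does not reflect any particular blockchain platform; the class name, the integer reward, the in-memory `balances` ledger, and the wallet names are all assumptions for illustration.

```python
class SecondContractSketch:
    """Toy stand-in for the settlement logic of the second smart contract."""

    def __init__(self, consumer_wallet, provider_wallet, reward):
        self.consumer_wallet = consumer_wallet
        self.provider_wallet = provider_wallet
        self.reward = reward
        self.settled = False

    def settle(self, balances, delivery_complete):
        """Transfer the reward once, and only after delivery has completed."""
        if not delivery_complete or self.settled:
            return False
        if balances[self.consumer_wallet] < self.reward:
            raise ValueError("insufficient balance in consumer wallet")
        balances[self.consumer_wallet] -= self.reward
        balances[self.provider_wallet] += self.reward
        self.settled = True
        return True


balances = {"wallet-110": 10, "wallet-120-1": 0}  # hypothetical wallets
contract = SecondContractSketch("wallet-110", "wallet-120-1", reward=3)
first_call = contract.settle(balances, delivery_complete=True)
second_call = contract.settle(balances, delivery_complete=True)
```

The `settled` flag is a design choice of this sketch: it keeps a repeated invocation from transferring the reward twice for the same delivery.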

FIG. 6 shows a flowchart of a method 600 for data sharing according to an embodiment of the present disclosure. For example, the method 600 may be performed by the data using node 110 shown in FIG. 1. It should be understood that the method 600 may also include additional blocks not shown and/or may omit blocks that are shown; the scope of the present disclosure is not limited in this respect.

At block 602, the data using node 110 generates data condition information based on the joint metadata stored in the distributed system 100. The joint metadata includes at least an association between a global identifier and multiple items of attribute information of multiple items of non-private data. The global identifier is associated with the hash value of the private data, the private data is associated with the multiple items of non-private data, and the multiple items of non-private data are located at multiple data providing nodes in the distributed system. The joint metadata is stored in the distributed system by the multiple data providing nodes 120 in the distributed system 100 according to the method 300 described above.

At block 604, the data using node 110 sends a data request to the multiple data providing nodes 120, the data request including the data condition information.

In some embodiments, the data request may also include the node address of the data using node 110.

In some embodiments, the distributed system 100 is a blockchain system. In addition, the data request may be sent in the form of a smart contract.

At block 606, the data using node 110 receives, from at least one of the multiple data providing nodes 120, at least one item of non-private data that matches the data condition information.

Thus, the data using node can generate data condition information based on the joint metadata stored in the distributed system and make a data request, and the data providing node can obtain the matching non-private data and send it to the data using node, so that non-private data is shared securely without the data using node knowing any private data.

FIG. 7 shows a flowchart of a method 700 for data sharing according to an embodiment of the present disclosure. For example, the method 700 may be performed by the data using node 110 shown in FIG. 1. It should be understood that the method 700 may also include additional blocks not shown and/or may omit blocks that are shown; the scope of the present disclosure is not limited in this respect.

At block 702, the data using node 110 generates data condition information based on the joint metadata stored in the blockchain system. The joint metadata includes at least an association between a global identifier and multiple items of attribute information of multiple items of non-private data. The global identifier is associated with the hash value of the private data, the private data is associated with the multiple items of non-private data, and the multiple items of non-private data are located at multiple data providing nodes in the distributed system. The joint metadata is stored in the blockchain system by one or more data providing nodes 120 in the blockchain system according to the method 300 described above.

At block 704, the data using node 110 sends a first smart contract to the multiple data providing nodes 120 in the blockchain system. The first smart contract includes the data condition information and incentive mechanism information for data sharing.

The data using node may take a system smart contract template and modify it, or write the first smart contract directly according to its own needs, and then deploy the first smart contract to the blockchain, thereby publishing the data request to every node in the blockchain system. Because blockchain data is tamper-proof, a smart contract cannot be modified in place once it has been deployed. If the terms of a smart contract need to be changed, the system, provided that the participating blockchain nodes reach consensus, invokes a contract destruction sub-module to destroy the original smart contract and then generates and deploys a new smart contract according to the new requirements. All records of contract creation and destruction are written to tamper-proof blockchain block data, which makes it easier to track data requests and sharing and better protects data security.

At block 706, the data using node 110 obtains at least one node address of at least one data providing node 120 based on at least one second smart contract stored in the blockchain system. The at least one second smart contract is stored in the blockchain system by the at least one data providing node in response to the first smart contract. Each second smart contract includes the data condition information, the incentive mechanism information, the node address and digital wallet address of the data using node, and the node address and digital wallet address of the corresponding data providing node. The at least one second smart contract is stored in the blockchain system by the at least one data providing node 120 according to the method 500 described above.

At block 708, the data using node 110 obtains, based on the at least one node address, at least one item of non-private data that matches the data condition information from the at least one data providing node 120.

Thus, the data using node can generate data condition information based on the joint metadata stored in the blockchain system and make a data request through a smart contract, and the data providing node can obtain the matching non-private data and send it to the data using node, so that non-private data is shared securely without the data using node knowing any private data. Once a data request has been put on the chain through a smart contract, its record cannot be altered, which makes it easier to trace how data has been used later on and helps ensure that shared non-private data is not misused. Furthermore, using smart contracts to establish a data sharing incentive mechanism, designed around the sharing of digital rights, makes it possible to build a more effective data sharing platform. It also ensures that data providing nodes receive a fair return after contributing non-private data to data analysis or artificial intelligence model training tasks.

In some embodiments, obtaining at least one item of non-private data that matches the data condition information from the at least one data providing node 120 includes obtaining at least two items of non-private data that match the data condition information from at least two data providing nodes 120. The data using node 110 may also associate the at least two items of non-private data based on the association, stored in the distributed system, between the global identifier and at least two items of attribute information. The association between the global identifier and the at least two items of attribute information is stored in the blockchain system by the at least two data providing nodes in response to the first smart contract.

In this way, based on the association between the attribute information of at least two items of non-private data that match the data condition information at multiple data providing nodes, the at least two items of non-private data at the at least two data providing nodes can be associated with each other and then used for data analysis and model training, which helps improve the accuracy of the data analysis and model training.
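As a rough illustration of this association step, the sketch below joins records received from two data providing nodes on a shared global identifier. It assumes, for simplicity, that each received record already carries its global identifier under a hypothetical `global_id` key; in the disclosure the association between the global identifier and the attribute information is read from the distributed system. The function name `link_by_global_id` and the sample records are likewise illustrative.

```python
def link_by_global_id(records_a: list, records_b: list) -> list:
    """Join two providers' non-private records on the shared global identifier."""
    # Index the second provider's records by global identifier for constant-time lookup.
    index_b = {record["global_id"]: record for record in records_b}
    linked = []
    for record_a in records_a:
        record_b = index_b.get(record_a["global_id"])
        if record_b is not None:
            # Both records refer to the same underlying entity: merge their attributes.
            linked.append({**record_a, **record_b})
    return linked


# Hypothetical non-private data returned by two data providing nodes.
from_provider_1 = [
    {"global_id": "g-01", "diagnosis": "flu"},
    {"global_id": "g-02", "diagnosis": "asthma"},
]
from_provider_2 = [
    {"global_id": "g-02", "plan": "basic"},
    {"global_id": "g-03", "plan": "premium"},
]

linked = link_by_global_id(from_provider_1, from_provider_2)
```

Only the entity known to both providers (`g-02`) appears in `linked`, with the attributes from both sides combined. Neither provider's private data, nor its hash value, is needed at the data usage node to perform the join.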

FIG. 8 shows a schematic block diagram of an example device 800 that may be used to implement embodiments of the present disclosure. For example, the data usage node 110 and the data providing node 120 shown in FIG. 1 may be implemented by the device 800. As shown, the device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 802 or computer program instructions loaded from a storage unit 808 into a random access memory (RAM) 803. The random access memory 803 can also store various programs and data required for the operation of the device 800. The central processing unit 801, the read-only memory 802 and the random access memory 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Multiple components of the device 800 are connected to the input/output interface 805, including: an input unit 806, such as a keyboard, a mouse or a microphone; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or an optical disc; and a communication unit 809, such as a network card, a modem or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.

The procedures and processes described above, such as the methods 200-700, may be executed by the central processing unit 801. For example, in some embodiments, the methods 200-700 may be implemented as a computer software program that is tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the read-only memory 802 and/or the communication unit 809. When the computer program is loaded into the random access memory 803 and executed by the central processing unit 801, one or more actions of the methods 200-700 described above may be performed.

The present disclosure relates to methods, apparatuses, systems, electronic devices, computer-readable storage media and/or computer program products. A computer program product may include computer-readable program instructions for carrying out various aspects of the present disclosure.

A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, where they cause a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

Various embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method for data sharing, comprising:

obtaining private data from local data comprising the private data and first non-private data, the first non-private data being associated with the private data;

generating a hash value of the private data;

generating a first association between a local identifier for the local data and the hash value;

generating a global identifier for indexing the hash value in a distributed system, the distributed system being associated with a current node;

generating a second association between the global identifier and the hash value; and

sending the second association to other nodes in the distributed system to store the second association in the distributed system for connection sharing of the first non-private data.

2. The method of claim 1, further comprising:

acquiring first attribute information of the first non-private data from a current node;

obtaining one or more items of second attribute information of one or more items of second non-private data from one or more further data providing nodes in the distributed system, the one or more items of second non-private data being associated with the private data;

generating, based on the second association, federated metadata including at least an association between the global identifier, the first attribute information, and the one or more items of second attribute information; and

sending the federated metadata to the other nodes in the distributed system to store the federated metadata in the distributed system for connection sharing of the first non-private data and the one or more items of second non-private data.

3. The method of claim 2, further comprising:

receiving a data request from a data usage node in the distributed system, the data request including data condition information, the data condition information being generated based on the federated metadata;

acquiring first non-private data that matches the data condition information; and

sending the acquired first non-private data to the data usage node.

4. The method of claim 3, further comprising:

obtaining, from at least one of the one or more further data providing nodes, at least one item of second attribute information of at least one item of second non-private data of the one or more items of second non-private data, the at least one item of second non-private data matching the data condition information;

generating a third association between the global identifier, the first attribute information, and the at least one item of second attribute information based on the second association; and

sending the third association to other nodes in the distributed system to store the third association in the distributed system, so that the data usage node associates, based on the third association, the first non-private data acquired from the current node with the at least one item of second non-private data acquired from the at least one further data providing node.

5. The method of claim 3 or 4, wherein the data request further includes a node address of the data usage node, and acquiring first non-private data that matches the data condition information includes:

determining whether the node address matches a preset node address; and

if it is determined that the node address matches the preset node address, acquiring the first non-private data that matches the data condition information.

6. The method of claim 3 or 4, wherein the distributed system comprises a blockchain system, and receiving a data request from a data usage node comprises receiving a first smart contract from the data usage node, the first smart contract comprising the data condition information.

7. The method of claim 6, wherein the first smart contract further includes incentive mechanism information for data sharing, and acquiring first non-private data that matches the data condition information comprises:

determining whether first non-private data matched with the data condition information exists in the local data;

if it is determined that first non-private data matched with the data condition information exists in the local data, determining whether the incentive mechanism information passes evaluation;

generating and deploying a second smart contract in the blockchain system if it is determined that the incentive mechanism information passes the evaluation, the second smart contract including the data condition information, the incentive mechanism information, a node address and a digital wallet address of the data usage node, and a node address and a digital wallet address of a current node; and

if it is determined that the second smart contract is deployed in the blockchain system, acquiring the first non-private data that matches the data condition information.

8. The method of claim 7, further comprising:

if it is determined that the transmission of the first non-private data is completed, performing, by the second smart contract, transfer of digital rights corresponding to the first non-private data from the digital wallet address of the data usage node to the digital wallet address of the current node based on the incentive mechanism information.

9. A method for data sharing, comprising:

generating data condition information based on joint metadata stored in a distributed system, the joint metadata comprising at least an association between a global identifier and a plurality of items of attribute information of a plurality of items of non-private data, the global identifier being associated with a hash value of private data, the private data being associated with the plurality of items of non-private data, the plurality of items of non-private data being located at a plurality of data providing nodes in the distributed system;

transmitting a data request to the plurality of data providing nodes, the data request including the data condition information; and

receiving at least one item of non-private data matching the data condition information from at least one data providing node of the plurality of data providing nodes.

10. The method of claim 9, wherein the distributed system comprises a blockchain system.

11. The method of claim 9, wherein sending the data request to the plurality of data providing nodes comprises sending a first smart contract to the plurality of data providing nodes, the first smart contract comprising the data condition information.

12. The method of claim 11, wherein the first smart contract further includes incentive mechanism information for data sharing, and receiving at least one item of non-private data matching the data condition information from at least one of the plurality of data providing nodes comprises:

obtaining at least one node address of the at least one data providing node based on at least one second smart contract stored in the blockchain system, the at least one second smart contract being stored by the at least one data providing node in response to the first smart contract, each of the at least one second smart contract including the data condition information, the incentive mechanism information, a node address and a digital wallet address of the data usage node, and a node address and a digital wallet address of a corresponding data providing node; and

acquiring at least one item of non-private data matching the data condition information from the at least one data providing node based on the at least one node address.

13. The method of any of claims 9-12, wherein receiving at least one item of non-private data from at least one of the plurality of data providing nodes that matches the data condition information comprises receiving at least two items of non-private data from at least two of the plurality of data providing nodes that match the data condition information, and the method further comprises:

associating the at least two items of non-private data based on an association between the global identifier and at least two items of attribute information stored in the distributed system, the association between the global identifier and the at least two items of attribute information being stored by the at least two data providing nodes in the blockchain system in response to the first smart contract.

14. A data providing node, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

15. A data usage node, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 9-13.

16. A distributed system, comprising:

a plurality of data providing nodes according to claim 14; and

a data usage node according to claim 15.

17. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-13.