[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN111538865A - Multi-party set synchronization method and device and electronic equipment - Google Patents

Multi-party set synchronization method and device and electronic equipment Download PDF

Info

Publication number
CN111538865A
CN111538865A CN202010230302.3A CN202010230302A CN111538865A CN 111538865 A CN111538865 A CN 111538865A CN 202010230302 A CN202010230302 A CN 202010230302A CN 111538865 A CN111538865 A CN 111538865A
Authority
CN
China
Prior art keywords
synchronization
fingerprint
cuckoo
aggregation
mcf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010230302.3A
Other languages
Chinese (zh)
Other versions
CN111538865B (en
Inventor
郭得科
罗来龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010230302.3A priority Critical patent/CN111538865B/en
Publication of CN111538865A publication Critical patent/CN111538865A/en
Application granted granted Critical
Publication of CN111538865B publication Critical patent/CN111538865B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

本发明提供一种多方集合同步方法、装置和电子设备,其特征在于,包括:构建标志布谷鸟滤波数据结构并定义所述标志布谷鸟滤波的操作;将每个同步参与方的本地元素使用所述标志布谷鸟滤波分别进行表示,得到每个所述同步参与方的局部集合;将所述局部集合发送到中央节点进行聚合,得到全局集合;将所述全局集合发送至每个所述同步参与方;令所述同步参与方遍历所述全局集合,确定差异元素;令所述同步参与方从所述全局集合中获取所述差异元素,完成多方集合同步。

Figure 202010230302

The present invention provides a multi-party set synchronization method, device and electronic device, which are characterized by comprising: constructing a marker cuckoo filter data structure and defining an operation of the marker cuckoo filter; using the local elements of each synchronization participant to use all The flag cuckoo filter is respectively represented to obtain a partial set of each of the synchronization participants; the partial set is sent to the central node for aggregation to obtain a global set; the global set is sent to each of the synchronization participants make the synchronization participant traverse the global set to determine the difference element; make the synchronization participant obtain the difference element from the global set to complete the multi-party set synchronization.

Figure 202010230302

Description

多方集合同步方法、装置和电子设备Multi-party set synchronization method, apparatus and electronic device

技术领域technical field

本发明涉及分布式系统技术领域,尤其涉及一种多方集合同步方法、装 置和电子设备。The present invention relates to the technical field of distributed systems, and in particular, to a multi-party set synchronization method, device and electronic device.

背景技术Background technique

多方集合内容同步时分布式和网络系统中的基本功能。当前已有的数据 结构不能满足多方同步场景下要求的空间高效性、可融合性以及信息完整性 的要求,并且不能从全局层面对同步的传输开销进行优化。Basic functionality in distributed and networked systems when synchronizing multi-party collection content. The existing data structures cannot meet the requirements of space efficiency, fusion and information integrity required in multi-party synchronization scenarios, and cannot optimize the synchronization transmission overhead at the global level.

发明内容SUMMARY OF THE INVENTION

有鉴于此,本发明的目的在于提出一种可以从全局层面对同步的传输开 销进行优化并且满足多方同步场景下要求的空间高效性、可融合性以及信息 完整性要求的多方集合同步方法、装置和电子设备。In view of this, the purpose of the present invention is to propose a multi-party set synchronization method and device that can optimize the synchronization transmission overhead from a global level and meet the requirements of space efficiency, fusion and information integrity required in a multi-party synchronization scenario. and electronic equipment.

基于上述目的,本发明提供了一种多方集合同步方法,其特征在于,包 括:Based on the above object, the invention provides a kind of multi-party set synchronization method, it is characterized in that, comprises:

构建标志布谷鸟滤波数据结构并定义所述标志布谷鸟滤波的操作;constructing a flag cuckoo filter data structure and defining the operation of the flag cuckoo filter;

将每个同步参与方的本地元素使用所述标志布谷鸟滤波分别进行表示, 得到每个所述同步参与方的局部集合;The local elements of each synchronization participant are respectively represented by using the flag cuckoo filter to obtain a local set of each synchronization participant;

将所述局部集合发送到中央节点进行聚合,得到全局集合;sending the local set to the central node for aggregation to obtain a global set;

将所述全局集合发送至每个所述同步参与方;sending the global set to each of the synchronization participants;

令所述同步参与方遍历所述全局集合,确定差异元素;Let the synchronization participants traverse the global set to determine the difference element;

令所述同步参与方从所述全局集合中获取所述差异元素,完成多方集合 同步。The synchronization participant is made to obtain the difference element from the global set to complete the multi-party set synchronization.

在一些实施方式中,所述标志布谷鸟滤波数据结构具体包括:In some embodiments, the flag cuckoo filter data structure specifically includes:

同构标志布谷鸟滤波:由m个单元格组成,每个单元格包含b个存储槽, 最多存储b个元素指纹,所述元素指纹能被存储在任意一个所述候选单元格 中;所述存储槽有两个域,包括用于存储所述元素指纹信息的指纹域和用于 记录所存储元素的隶属信息的标志域;所述指纹域包含f位比特,所述标志 域包含n个比特位,其中n为参与同步的集合数量;为任意元素x提供两个 候选单元格,所述元素x在两个所述候选单元格的索引分别为: h1(x)=hash(x)%m和h2(x)=hash(x)⊙(hash(ηx)%m),其中ηx表示元素x的元素 指纹,hash表示哈希函数;Isomorphic sign cuckoo filter: It consists of m cells, each cell contains b storage slots, and stores up to b element fingerprints, and the element fingerprints can be stored in any one of the candidate cells; the The storage slot has two fields, including a fingerprint field for storing the fingerprint information of the element and a flag field for recording the membership information of the stored element; the fingerprint field contains f bits, and the flag field contains n bits bit, where n is the number of sets participating in synchronization; two candidate cells are provided for any element x, and the indices of the element x in the two candidate cells are: h 1 (x)=hash(x)% m and h 2 (x)=hash(x)⊙(hash(n x )%m), where n x represents the element fingerprint of the element x, and hash represents the hash function;

异构标志布谷鸟滤波:单元格数量以及每个单元格容量的关系具体为: mi=α|Si|/bi,其中mi和bi分别表示单元格数量以及每个单元格的容量,α≥1 是一个常数;所述元素x在两个所述候选单元格的索引分别为: h1(x)=hash(ηx)%m以及

Figure BDA0002429088440000021
Heterogeneous sign cuckoo filter: the relationship between the number of cells and the capacity of each cell is: m i =α|S i |/bi , where m i and bi represent the number of cells and the capacity of each cell, respectively capacity, α≥1 is a constant; the indices of the element x in the two candidate cells are: h 1 (x)=hash(η x )%m and
Figure BDA0002429088440000021

在一些实施方式中,所述标志布谷鸟滤波的操作具体包括:In some embodiments, the operation of the flag cuckoo filtering specifically includes:

元素插入:通过哈希函数计算出元素指纹,然后将所述元素指纹插入到 两个所述候选单元格中;当两个所述候选单元格中所有存储槽均被占用时进 行重新分配;Element insertion: calculate element fingerprint by hash function, then insert described element fingerprint into two described candidate cells; When all storage slots in two described candidate cells are all occupied, redistribute;

元素查询:遍历两个所述候选单元格,找到被查询元素时返回所述被查 询元素的存储槽的标志域;否则,返回错误;Element query: traverse two described candidate cells, and return the flag field of the storage slot of the queried element when finding the queried element; otherwise, return an error;

元素删除:Element removal:

从某个集合中删除某个元素时:首先确定待删除元素的元素指纹的候选 单元格位置,如果所述待删除元素的元素指纹不存在或者存储所述待删除元 素的元素指纹的存储槽的标志域的第i个比特位为0,则返回错误;否则, 将存储所述待删除元素的元素指纹的存储槽的标志域的第i个比特位反置为 0;若存储所述待删除元素的元素指纹的存储槽的标志域所有比特皆为0,则 将所述待删除元素的元素指纹也删除;When deleting an element from a certain set: first determine the candidate cell position of the element fingerprint of the element to be deleted, if the element fingerprint of the element to be deleted does not exist or the storage slot of the element fingerprint of the element to be deleted is stored If the i-th bit of the flag field is 0, an error is returned; otherwise, the i-th bit of the flag field of the storage slot storing the element fingerprint of the element to be deleted is inverted to 0; If all bits of the flag field of the storage slot of the element fingerprint of the element are all 0, the element fingerprint of the element to be deleted is also deleted;

从所有集合中删除某个元素时:将所述待删除元素的元素指纹删除并将 所述待删除元素的元素指纹所在存储槽的标志域的所有位都置为0;When deleting a certain element from all sets: the element fingerprint of the element to be deleted is deleted and all positions of the sign field of the storage slot where the element fingerprint of the element to be deleted is placed are set to 0;

多个标志布谷鸟滤波聚合:给定两个标志布谷鸟滤波向量MCFi和MCFj, 将这两个向量中共同元素的隶属信息添加到向量MCFi中,MCFj中的差异元 素插入到MCFi中。Multiple sign cuckoo filter aggregation: Given two sign cuckoo filter vectors MCF i and MCF j , add the membership information of common elements in these two vectors to vector MCF i , and insert the difference elements in MCF j into MCF i in.

在一些实施方式中,在所述重新分配具体包括:In some embodiments, the reallocation specifically includes:

当两个所述候选单元格中所有存储槽均被占用时,随机踢出一个已经 存储的元素指纹,得到一个空存储槽,然后将待存储元素指纹和所述待存 储元素指纹的隶属信息存储在所述空存储槽中;被踢出的元素指纹将被重 分配到另一个候选单元格中;当没有额外元素指纹被踢出或者重新分配次 数到达给定上限时结束。When all the storage slots in the two candidate cells are occupied, randomly kick out a stored element fingerprint to obtain an empty storage slot, and then store the element fingerprint to be stored and the membership information of the element fingerprint to be stored. In the empty storage slot; the kicked element fingerprints will be reallocated into another candidate cell; it ends when no additional element fingerprints are kicked out or the number of reallocations reaches a given upper limit.

在一些实施方式中,所述将所述局部集合发送到中央节点进行聚合具体 包括:In some embodiments, the sending the partial set to the central node for aggregation specifically includes:

选择所述同步参与方中具有最大节点度的一方作为中央节点;Select the party with the largest node degree among the synchronization participants as the central node;

所述同构标志布谷鸟滤波:通过最小生成树算法选择聚合路径,将所述 局部集合通过所述聚合路径发送到中央节点进行聚合;Described isomorphic sign cuckoo filter: select aggregation path by minimum spanning tree algorithm, and send described partial set to central node for aggregation through described aggregation path;

所述异构标志布谷鸟滤波:构建一个空标志布谷鸟滤波,所述空标志布 谷鸟滤波的容量由待聚合的局部集合的元素指纹个数决定;将所述待聚合的 局部集合中的元素指纹逐个插入到所述空标志布谷鸟滤波中。The heterogeneous sign cuckoo filter: constructs an empty sign cuckoo filter, and the capacity of the empty sign cuckoo filter is determined by the number of element fingerprints of the partial set to be aggregated; Fingerprints are inserted one by one into the empty flag cuckoo filter.

在一些实施方式中,所述令所述同步参与方遍历所述全局集合,确定差 异元素具体包括:In some embodiments, the step of causing the synchronization participants to traverse the global set to determine the difference element specifically includes:

所述差异元素包括缺失元素和独有元素;The difference elements include missing elements and unique elements;

所述同步参与方遍历所述全局集合,将所述缺失元素的元素指纹及其 隶属信息添加到缺失元素集合中,所述独有元素的元素指纹及其隶属信息添 加到独有元素集合中。The synchronization participant traverses the global set, adds the element fingerprint of the missing element and its membership information to the missing element set, and adds the element fingerprint of the unique element and its membership information to the unique element set.

在一些实施方式中,当所述同步参与方动态地加入和离开时:In some embodiments, when the synchronization participants join and leave dynamically:

当参与同步的同步参与方离开时,若离开的所述同步参与方在聚合路径 中非最小生成树的叶子结点,则所述最小生成树被分裂为多个子树,将所述 多个子树使用子树间最小边进行连接得到新的最小生成树;When the synchronization participant participating in the synchronization leaves, if the left synchronization participant is not a leaf node of the minimum spanning tree in the aggregation path, the minimum spanning tree is split into multiple subtrees, and the multiple subtrees are split into multiple subtrees. Use the minimum edge between subtrees to connect to obtain a new minimum spanning tree;

当有新的同步参与方加入时,选取所述新的同步参与方与已有同步参与 方之间的最小边,并将所述新的同步参与方经由所述最小边加入到已有最小 生成树中。When a new synchronization participant joins, select the minimum edge between the new synchronization participant and the existing synchronization participant, and add the new synchronization participant to the existing minimum generation via the minimum edge in the tree.

基于同一发明构思,本发明还提供了一种多方集合同步装置,其特征在 于,包括:Based on the same inventive concept, the present invention also provides a multi-party set synchronization device, characterized in that it includes:

数据结构构建模块,被配置为构建标志布谷鸟滤波数据结构并定义所述 标志布谷鸟滤波的操作;a data structure building module configured to construct a flag cuckoo filter data structure and define the operation of the flag cuckoo filter;

本地元素表示模块,被配置为将每个同步参与方的本地元素使用所述标 志布谷鸟滤波分别进行表示,得到每个所述同步参与方的局部集合;a local element representation module, configured to represent the local elements of each synchronization participant using the sign cuckoo filter, respectively, to obtain a partial set of each of the synchronization participants;

集合聚合模块,被配置为将所述局部集合发送到中央节点进行聚合,得 到全局集合;a set aggregation module, configured to send the local set to the central node for aggregation to obtain a global set;

集合发送模块,被配置为将所述全局集合发送至每个所述同步参与方;a set sending module configured to send the global set to each of the synchronization participants;

差异元素确定模块,被配置为令所述同步参与方遍历所述全局集合,确 定差异元素;a difference element determination module, configured to make the synchronization participants traverse the global set to determine a difference element;

同步模块,被配置为令所述同步参与方从所述全局集合中获取所述差异 元素,完成多方集合同步。The synchronization module is configured to make the synchronization participant acquire the difference element from the global set to complete the synchronization of the multi-party set.

基于同一发明构思,本发明还提供了一种电子设备,包括存储器、处理 器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述 处理器执行所述程序时实现如权利要求1至7任意一项所述的方法。Based on the same inventive concept, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that the processor implements the program when executing the program. A method as claimed in any one of claims 1 to 7.

基于同一发明构思,本发明还提供了一种非暂态计算机可读存储介质, 其特征在于,所述非暂态计算机可读存储介质存储计算机指令,所述计算机 指令用于使所述计算机执行权利要求1至7任一所述方法。Based on the same inventive concept, the present invention also provides a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause the computer to execute The method of any one of claims 1 to 7.

从上面所述可以看出,本发明提供的一种多方集合同步方法、装置和电 子设备,针对分布式场景下多方集合同步场景构建和使用标志布谷鸟滤波作 为集合概括的数据结构,通过最小生成树进行集合的聚合和分发,缩短了传 输路径,而中央节点的选取和边传输边聚合的传输方法进一步降低了传输开 销,满足多方同步场景下要求的空间高效性、可融合性以及信息完整性要求, 极大的节省了带宽。It can be seen from the above that the multi-party set synchronization method, device and electronic device provided by the present invention construct and use the flag cuckoo filter as the data structure of the set summary for the multi-party set synchronization scene in the distributed scenario. The tree aggregates and distributes sets, shortening the transmission path, and the selection of the central node and the transmission method of aggregation while transmission further reduces the transmission overhead and meets the requirements of space efficiency, fusion and information integrity in multi-party synchronization scenarios. requirements, which greatly saves bandwidth.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实 施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面 描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲, 在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative efforts.

图1为本发明一个实施例的标志布谷鸟滤波的结构图;Fig. 1 is the structure diagram of the mark cuckoo filter of one embodiment of the present invention;

图2为本发明一个实施例的聚合与分发过程示意图;2 is a schematic diagram of an aggregation and distribution process according to an embodiment of the present invention;

图3为本发明一个实施例的电子设备的硬件结构示意图。FIG. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案和优点更加清楚明白,以下结合具体实施 例,并参照附图,对本发明进一步详细说明。In order to make the purpose, technical solutions and advantages of the present invention more clearly understood, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

需要说明的是,除非另外定义,本发明实施例使用的技术术语或者科学 术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公 开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者 重要性,而只是用来区分不同的组成部分。“包括”或者“包含”等类似的 词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或 者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类 似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不 管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置 关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地 改变。It should be noted that, unless otherwise defined, the technical terms or scientific terms used in the embodiments of the present invention shall have the usual meanings understood by those with ordinary skill in the art to which this disclosure belongs. As used in this disclosure, "first," "second," and similar terms do not denote any order, quantity, or importance, but are merely used to distinguish the different components. "Comprising" or "comprising" and similar words mean that the elements or things appearing before the word encompass the elements or things recited after the word and their equivalents, but do not exclude other elements or things. Words like "connected" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "Down", "Left", "Right", etc. are only used to indicate the relative positional relationship. When the absolute position of the described object changes, the relative positional relationship may also change accordingly.

随着用户将计算和数据转移到云端,诸如Dropbox、Google Drive以及 OneDrive的云服务不断涌现,用于帮助用户从不同设备访问数据以及实现在 相同数据上的协同任务。由于多个数据副本共同存在于用户设备端、云端服 务器和边缘服务器,当用户从不同设备对数据进行更新时,这些副本必须定 期进行同步以保证其一致性和正确性。此种分布式场景下的多方集合同步问 题也普遍存在于无线传感网络、软件定义网络、内容分发网络、区块链交易 过程以及其他场景中。As users move computing and data to the cloud, cloud services such as Dropbox, Google Drive, and OneDrive are emerging to help users access data from different devices and perform collaborative tasks on the same data. Since multiple data copies coexist on the user device, cloud server, and edge server, when users update data from different devices, these copies must be regularly synchronized to ensure their consistency and correctness. The multi-party set synchronization problem in this distributed scenario also commonly exists in wireless sensor networks, software-defined networks, content distribution networks, blockchain transaction processes, and other scenarios.

多方集合同步能天然地分解为两个维度加以解决:1)集合表示—如何 表示集合元素;2)同步策略—如何让同步多方进行交互以确定并传输差异 元素。现有的集合表示方法主要依赖于线性sketch数据结构,例如哈希树、 Bloom滤波、可逆Bloom滤波,可逆Bloom查询表(Invertible Bloom lookup table,IBLT)等。特别地,如果集合元素可以用一个整数来表示,则基于这 些整数构架的集合特征多项式也可以用于作为集合sketch。同步策略则是建 立在这些集合表示方法之上。通常而言,这些sketch需要在参与同步的各方 之间进行交换。在掌握其他同步参与方的集合sketch之后,本地同步参与方 能根据所获得的sketch推断出需要传输的差异元素。Multi-party set synchronization can be solved by being naturally decomposed into two dimensions: 1) set representation - how to represent set elements; 2) synchronization strategy - how to let synchronizing parties interact to determine and transmit difference elements. Existing set representation methods mainly rely on linear sketch data structures, such as hash tree, Bloom filter, reversible Bloom filter, Invertible Bloom lookup table (IBLT), etc. In particular, if the set elements can be represented by an integer, set characteristic polynomials based on these integer frameworks can also be used as set sketches. Synchronization strategies are built on top of these collection representations. Typically, these sketches need to be exchanged between the parties involved in the synchronization. After mastering the set sketches of other synchronization participants, the local synchronization participants can infer the difference elements that need to be transmitted according to the obtained sketches.

然而,当前的sketch数据结构不能胜任多方集合同步场景提出的新需求。 具体而言,在多方参与情形下,所采用的sketch数据结构需要具有以下特性: 1)空间高效性,所使用的空间要远小于集合元素的原始数据大小;2)可融 合性,多个sketch能在不损失任何信息的情况下融合为一个sketch;3)信息 完整性,集合元素的内容信息(例如元素指纹)和隶属信息(元素属于哪个 或哪些集合)都能被记录到sketch中。空间高效性和可融合性保证了在同步 参与方之间交换sketch带来的传输开销。信息完整性则保证了同步精度。但 是,当前已有sketch数据结构不同时具有以上三个特性。Bloom滤波及其变 种都是空间友好的,但他们中的大多数不具备可融合性。IBLT兼具空间友好 性和可融合性,却没实现信息完整性。However, the current sketch data structure is not capable of meeting the new requirements of multi-party set synchronization scenarios. Specifically, in the case of multi-party participation, the sketch data structure used needs to have the following characteristics: 1) Space efficient, the space used is much smaller than the original data size of the set elements; 2) Fusion, multiple sketches It can be merged into a sketch without losing any information; 3) Information integrity, the content information (such as element fingerprint) and membership information (which set or sets the element belongs to) of the set elements can be recorded in the sketch. Space efficiency and merging guarantees the transmission overhead of exchanging sketches between synchronizing participants. Information integrity ensures synchronization accuracy. However, the existing sketch data structures do not have the above three characteristics at the same time. Bloom filters and their variants are space friendly, but most of them are not fusible. IBLT is both space-friendly and fusible, but does not achieve information integrity.

再者,现有同步策略没有综合考虑同步参与方在网络中的位置以及数据 在同步参与方的分布情况,故不能实现传输开销的全局优化。因此,这些同 步策略通常会导致不必要的sketch交换或者数据传输。当前,大多数云存储 服务部署副本策略来实现集合同步。它们通常将云端服务器作为中央节点, 用于收集其他用户设备即各个其他同步参与方上的数据。其他用户设备则上 传其更新的数据模块到云端或者从云端下载其缺失的数据模块即多方同步。 不幸的是,此种同步策略的传输开销高度依赖于同步参与方的物理位置以及 同步参与方之间的距离。这一策略也会浪费大量网络带宽,因为同步方只能 从云服务器端获得数据,尽管该同步方的邻居节点也有其所需数据。其他可 能的同步策略,例如通过all-to-all或者gossip协议来交换sketch和传输差异 元素,则会更严重地占用网络带宽资源。Furthermore, the existing synchronization strategies do not comprehensively consider the location of the synchronization participants in the network and the distribution of data in the synchronization participants, so global optimization of the transmission overhead cannot be achieved. Therefore, these synchronization strategies often lead to unnecessary sketch exchanges or data transfers. Currently, most cloud storage services deploy replica strategies to achieve collection synchronization. They usually use a cloud server as a central node for collecting data on other user devices, that is, various other synchronization participants. Other user devices upload their updated data modules to the cloud or download their missing data modules from the cloud, that is, multi-party synchronization. Unfortunately, the transmission overhead of this synchronization strategy is highly dependent on the physical location of the synchronization participants and the distance between the synchronization participants. This strategy also wastes a lot of network bandwidth, because the synchronizing party can only get data from the cloud server, although the neighbor nodes of the synchronizing party also have the data it needs. Other possible synchronization strategies, such as exchanging sketches and transferring diff elements via all-to-all or gossip protocols, would consume more network bandwidth resources.

为此,本申请首先设计标志Cuckoo滤波即标志布谷鸟滤波(Marked CuckooFilter,MCF)数据结构用于表示多方集合,后续以MCF表示标志布 谷鸟滤波。以此为基础,本申请进一步提出多方集合同步策略MCFsyn。 MCFsyn基于参与同步方之间的最小生成树对各方产生的MCF进行聚合和 分发。各参与同步方对记录整个并集信息的全局MCF进行遍历,从而鉴定 其缺失和独有的集合元素。对于缺失元素,MCFsyn使得同步参与方选择最 佳元素内容提供方,从而实现传输开销最小化。实验表明,MCFsyn能从同 步精度和传输开销方面皆优于其他方法。For this reason, the present application firstly designs the Marked Cuckoo Filter, namely the Marked Cuckoo Filter (Marked CuckooFilter, MCF) data structure to represent the multi-party set, and subsequently uses the MCF to represent the Marked Cuckoo Filter. Based on this, the present application further proposes a multi-party set synchronization strategy MCFsyn. MCFsyn aggregates and distributes the MCF generated by each party based on the minimum spanning tree between participating parties. Each participating synchronizer traverses the global MCF that records the entire union information to identify its missing and unique set elements. For missing elements, MCFsyn enables synchronization participants to choose the best element content provider, thereby minimizing transmission overhead. Experiments show that MCFsyn can outperform other methods in terms of synchronization accuracy and transmission overhead.

本申请首先提出Cuckoo滤波的新变种标志Cuckoo滤波MCF即标志布 谷鸟滤波,用于表示多方集合。MCF在每个存储槽上额外增加一个标志域用 于表示所存储元素在同步完成前的隶属信息。例如,给定三个集合S1、S2和 S3,MCF将使用三位标志域来表示所存储元素在同步完成前的隶属信息。如 果该元素存在于集合Si中,标志域中第i位比特将会被置为1。MCF天然继 承了标准Cuckoo滤波的功能,包括元素插入、查询和删除。此外,MCF还 支持MCF之间的聚合操作,使得来自不同同步参与方的MCF能被聚合为单 一MCF。基于以上设计思路,MCF同时实现了空间高效性、可融合性和信 息完整性。The present application first proposes a new variant of Cuckoo filtering, the Cuckoo filtering MCF, ie, the Cuckoo filtering, which is used to represent a multi-party set. The MCF adds an additional flag field to each storage slot to indicate the membership information of the stored elements before the synchronization is completed. For example, given three sets S 1 , S 2 and S 3 , the MCF will use a three-bit flag field to represent the membership information of the stored elements before synchronization is complete. If the element exists in the set Si, the i -th bit in the flag field will be set to 1. MCF naturally inherits the functions of standard Cuckoo filtering, including element insertion, query and deletion. In addition, MCF also supports aggregation operations between MCFs, so that MCFs from different synchronization participants can be aggregated into a single MCF. Based on the above design ideas, MCF achieves space efficiency, fusion and information integrity at the same time.

基于MCF数据结构,本申请提出了新型多方集合同步策略MCFsyn。 MCFsyn有如下五个主要步骤:1)每个同步参与方将其集合元素表示为MCF; 2)聚合各个同步方产生的MCF为一个全局MCF;3)将全局MCF分发给 各个同步方;4)同步参与方遍历全局MCF确定其缺失元素;5)同步参与 方试图以最小传输开销获取其缺失元素。以上每个步骤都需要综合考虑同步 延迟、传输开销和底层网络拓扑。Based on the MCF data structure, this application proposes a novel multi-party set synchronization strategy MCFsyn. MCFsyn has the following five main steps: 1) Each synchronization participant represents its set element as MCF; 2) Aggregates the MCF generated by each synchronization party into a global MCF; 3) Distributes the global MCF to each synchronization party; 4) The synchronization participants traverse the global MCF to determine their missing elements; 5) The synchronization participants try to obtain their missing elements with minimal transmission overhead. Each of the above steps requires a comprehensive consideration of synchronization latency, transmission overhead, and the underlying network topology.

本发明的目的在于提出一种可以从全局层面对同步的传输开销进行优 化并且满足多方同步场景下要求的空间高效性、可融合性以及信息完整性要 求的多方集合同步方法、装置和电子设备。The purpose of the present invention is to propose a multi-party set synchronization method, device and electronic device that can optimize the synchronization transmission overhead from the global level and meet the requirements of space efficiency, fusion and information integrity required in multi-party synchronization scenarios.

下面结合图1、图2和图3分别为本发明一个实施例的标志布谷鸟滤波 的结构图、聚合与分发过程示意图、电子设备的硬件结构示意图对本发明做 进一步说明。本发明提供了一种多方集合同步方法,包括:The present invention will be further described below in conjunction with Fig. 1, Fig. 2 and Fig. 3, which are the structure diagram, aggregation and distribution process schematic diagram, and hardware structure schematic diagram of electronic equipment of a sign cuckoo filter according to an embodiment of the present invention, respectively. The present invention provides a multi-party set synchronization method, comprising:

S1:构建标志布谷鸟滤波数据结构并定义所述标志布谷鸟滤波的操作;S1: constructing the flag cuckoo filtering data structure and defining the operation of the flag cuckoo filtering;

所述标志布谷鸟滤波数据结构具体包括:The sign cuckoo filter data structure specifically includes:

同构标志布谷鸟滤波:由m个单元格组成,每个单元格包含b个存储槽, 最多存储b个元素指纹,所述元素指纹能被存储在任意一个所述候选单元格 中;所述存储槽有两个域,包括用于存储所述元素指纹信息的指纹域和用于 记录所存储元素的隶属信息的标志域;所述指纹域包含f位比特,所述标志 域包含n个比特位,其中n为参与同步的集合数量;为任意元素x提供两个 候选单元格,所述元素x在两个所述候选单元格的索引分别为: h1(x)=hash(x)%m和h2(x)=hash(x)⊙(hash(ηx)%m),其中ηx表示元素x的元素 指纹,hash表示哈希函数;Isomorphic sign cuckoo filter: It consists of m cells, each cell contains b storage slots, and stores up to b element fingerprints, and the element fingerprints can be stored in any one of the candidate cells; the The storage slot has two fields, including a fingerprint field for storing the fingerprint information of the element and a flag field for recording the membership information of the stored element; the fingerprint field contains f bits, and the flag field contains n bits bit, where n is the number of sets participating in synchronization; two candidate cells are provided for any element x, and the indices of the element x in the two candidate cells are: h 1 (x)=hash(x)% m and h 2 (x)=hash(x)⊙(hash(n x )%m), where n x represents the element fingerprint of the element x, and hash represents the hash function;

如图1所示,MCF由m个单元格组成。每个单元格包含b个存储槽, 所以每个单元格最多存储b个元素指纹。每个存储槽有两个域,包括用于存 储元素指纹信息的指纹域和用于记录所存储元素的隶属信息的标志域。指纹 域包含f位比特;而标志域则有n个比特位,其中n为参与同步的集合数量。 如果该元素存在于集合Si中,标志域中第i位比特将会被置为1。MCF为任 意元素x提供两个候选单元格,其索引为:h1(x)=hash(x)%m和h2(x)=hash(x)⊙ (hash(ηx)%m)。元素指纹ηx能被存储在任意其中一个单元格中。如果两个候选 单元格中所有存储槽均被占用,MCF将随机踢出一个已经存储的元素指纹, 并将ηx及其隶属信息存储在该存储槽中。被踢出的元素指纹将被重分配到其 另外一个候选单元格中。以上重分配过程将在没有额外元素指纹被踢出(元 素表示成功)或者重分配次数到达给定上限max时(元素表示失败)结束。As shown in Figure 1, the MCF consists of m cells. Each cell contains b storage slots, so each cell stores at most b element fingerprints. Each storage slot has two fields, including a fingerprint field for storing element fingerprint information and a flag field for recording membership information of stored elements. The fingerprint field contains f bits; while the flag field has n bits, where n is the number of sets participating in synchronization. If the element exists in the set Si, the i -th bit in the flag field will be set to 1. The MCF provides two candidate cells for any element x whose indices are: h1 (x)=hash(x)%m and h2 ( x)=hash(x)⊙(hash( nx )%m). The element fingerprint ηx can be stored in any of the cells. If all the storage slots in the two candidate cells are occupied, MCF will randomly kick out a stored element fingerprint, and store nx and its membership information in this storage slot. The kicked element fingerprint will be reassigned to another of its candidate cells. The above reallocation process will end when no additional element fingerprints are kicked out (elements indicate success) or when the number of reallocations reaches a given upper limit max (elements indicate failure).

异构标志布谷鸟滤波:单元格数量以及每个单元格容量的关系具体为: mi=α|Si|/bi,其中mi和bi分别表示单元格数量以及每个单元格的容量,α≥1 是一个常数;所述元素x在两个所述候选单元格的索引分别为: h1(x)=hash(ηx)%m以及

Figure BDA0002429088440000081
Heterogeneous sign cuckoo filter: the relationship between the number of cells and the capacity of each cell is: m i =α|S i |/bi , where m i and bi represent the number of cells and the capacity of each cell, respectively capacity, α≥1 is a constant; the indices of the element x in the two candidate cells are: h 1 (x)=hash(η x )%m and
Figure BDA0002429088440000081

实际中,同步参与方上的集合规模千差万别。比如,长期离线的参与方 可能具有比其他同步方少得多的元素,而突然更新的同步方可能包含更多元 素。在此种情况下,用同构MCF来表示不同集合并不是经济的选择。因此, 本节考虑使用异构MCF时的MCFsyn设计。如此一来,每个同步参与方能 根据其本地集合规模来定制其使用的MCF容量。In practice, the size of the set on the synchronization participants varies widely. For example, a participant that has been offline for a long time may have far fewer elements than other synchronizing parties, while a synchronizing party that is updated suddenly may contain more elements. In this case, representing different sets with isomorphic MCFs is not an economical choice. Therefore, this section considers the MCFsyn design when using heterogeneous MCFs. In this way, each synchronization participant can customize the MCF capacity it uses according to its local aggregate size.

与同构MCF不同,异构MCF由于具有不同的单元格个数而致使元素在 其中的候选单元格不尽相同。这一特征使得聚合操作失效。为此,MCF数据 结构需要重新设计,以便元素的候选单元格只由元素指纹决定。具体改进为, 给定元素x及其指纹ηx,其所对应的两个候选单元格为:h1(x)=hash(ηx)%m 以及

Figure BDA0002429088440000082
如此一来,当将来自于两个不同MCF中的元素 指纹插入到其聚合结果的MCF中时,其候选单元格仅由元素指纹决定。对 于参与方Pi,其使用的MCF长度计算为:mi=α|Si|/bi,其中mi和bi分别表 示MCFi的单元格数量以及每个单元格的容量,而α≥1是一个常数。Different from homogeneous MCF, heterogeneous MCF has different number of cells, so the candidate cells of elements in it are not the same. This feature invalidates the aggregation operation. To this end, the MCF data structure needs to be redesigned so that the candidate cells of an element are determined only by the element fingerprint. The specific improvement is that, given an element x and its fingerprint η x , the corresponding two candidate cells are: h 1 (x)=hash(η x )%m and
Figure BDA0002429088440000082
In this way, when element fingerprints from two different MCFs are inserted into the MCF of its aggregated result, its candidate cells are determined only by the element fingerprints. For the participant P i , the length of MCF used is calculated as: m i =α|S i |/bi , where m i and bi represent the number of cells of MCF i and the capacity of each cell, respectively, and α ≥1 is a constant.

除了元素指纹之外,多方集合表示还要求记录元素的隶属信息。为此, MCF在每个存储槽里引入额外比特并用于记录该信息。基于以上设计,MCF 支持面向元素的集合表示操作,包括元素插入、查询和删除。此外,MCF 还使能了MCF向量之间的聚合操作。MCF在指纹域和标志域的设计下,天 然支持多方集合表示。因此,MCF将在本申请中作为支撑多方同步策略的基 本数据结构。In addition to element fingerprints, multi-party set representation also requires the recording of element membership information. To this end, the MCF introduces extra bits in each memory slot and is used to record this information. Based on the above design, MCF supports element-oriented collection representation operations, including element insertion, query and deletion. In addition, MCF also enables aggregation operations between MCF vectors. Under the design of fingerprint domain and logo domain, MCF naturally supports multi-party set representation. Therefore, MCF will be used in this application as the basic data structure supporting the multi-party synchronization strategy.

元素插入:通过哈希函数计算出元素指纹,然后将所述元素指纹插入到 两个所述候选单元格中;当两个所述候选单元格中所有存储槽均被占用时进 行重新分配;具体过程包括:Element insertion: calculate the element fingerprint through a hash function, and then insert the element fingerprint into the two candidate cells; reassign when all the storage slots in the two candidate cells are occupied; specifically The process includes:

在插入集合Si中元素x时,MCF首先通过哈希函数计算出其f位元素指 纹ηx。随后,x相对应的候选单元格通过函数h1(x)和h2(x)计算而得。所述重 新分配具体包括:当两个所述候选单元格中所有存储槽均被占用时,随机 踢出一个已经存储的元素指纹,得到一个空存储槽,然后将待存储元素指 纹和所述待存储元素指纹的隶属信息存储在所述空存储槽中;被踢出的元 素指纹将被重分配到另一个候选单元格中;当没有额外元素指纹被踢出或者重新分配次数到达给定上限时结束,具体过程包括:如果其中存在没被 占用的存储槽,则ηx将被放在该存储槽。而该存储槽的标志域中第i位比特 将会被置为1。否则,MCF必须将这两个候选单元格中某个存储的元素指 纹踢出为ηx腾出空间。被踢出的元素指纹及其隶属信息将被重分配到其另 外一个候选单元格中。以上重分配过程将在没有额外元素指纹被踢出或者 重分配次数到达给定上限max时结束。在元素指纹被踢出存储槽时,该槽 的标志域中的1也将被反置为0。相应地,在重插入元素指纹时,其所在存 储槽的标志域的相应比特将被置为1。如此便保证了整个重插入过程中元素 信息的正确性。When inserting element x in set Si, MCF first calculates its f-bit element fingerprint η x through a hash function. Subsequently, the candidate cells corresponding to x are calculated by the functions h 1 (x) and h 2 (x). The reassignment specifically includes: when all the storage slots in the two candidate cells are occupied, randomly kicking out a stored element fingerprint to obtain an empty storage slot, and then combining the element fingerprint to be stored with the to-be-stored element fingerprint. Membership information storing element fingerprints is stored in the empty storage slot; kicked element fingerprints will be reassigned to another candidate cell; when no additional element fingerprints are kicked out or the number of reassignments reaches a given upper limit At the end, the specific process includes: if there is an unoccupied storage slot, nx will be placed in the storage slot. The i-th bit in the flag field of the storage slot will be set to 1. Otherwise, the MCF must kick out the element fingerprint stored in one of these two candidate cells to make room for nx . The kicked element fingerprint and its membership information will be reassigned to another of its candidate cells. The above reallocation process will end when no additional element fingerprints are kicked out or the number of reallocations reaches the given upper limit max. When the element fingerprint is kicked out of the storage slot, the 1 in the flag field of the slot will also be inverted to 0. Correspondingly, when the element fingerprint is reinserted, the corresponding bit of the flag field of the storage slot where it is located will be set to 1. In this way, the correctness of the element information in the whole re-insertion process is guaranteed.

元素查询:遍历两个所述候选单元格,找到被查询元素时返回所述被查 询元素的存储槽的标志域;否则,返回错误;具体过程包括:Element query: traverse the two candidate cells, and return the flag field of the storage slot of the queried element when the queried element is found; otherwise, return an error; the specific process includes:

查询元素x时,MCF只需要检查元素x的两个候选单元格即可。如果元 素指纹ηx能在这两个单元格的任意一个中找到,则MCF返回其存储槽的标 志域以显性地给出元素x的隶属信息。否则,MCF返回False,意味着元素 x不是任何集合中的元素。元素查询的时间复杂度也是常量级,因为只有两 个单元格会被遍历。MCF成员关系查询的假阳性误判概率为

Figure RE-GDA0002571382940000091
Figure RE-GDA0002571382940000092
其中f和b分别为元素指纹长度和每个单元格中存储槽数量。When querying for element x, MCF only needs to check two candidate cells for element x. If the element fingerprint nx can be found in either of these two cells, the MCF returns the flag field of its storage slot to explicitly give the membership information of element x. Otherwise, MCF returns False, meaning that element x is not an element in any set. The time complexity of element query is also constant, because only two cells will be traversed. The false positive and false positive probability of MCF membership query is
Figure RE-GDA0002571382940000091
Figure RE-GDA0002571382940000092
where f and b are the element fingerprint length and the number of storage slots in each cell, respectively.

元素删除:从某个集合中删除某个元素时:首先确定待删除元素的元素 指纹的候选单元格位置,如果所述待删除元素的元素指纹不存在或者存储所 述待删除元素的元素指纹的存储槽的标志域的第i个比特位为0,则返回错 误;否则,将存储所述待删除元素的元素指纹的存储槽的标志域的第i个比 特位反置为0;若存储所述待删除元素的元素指纹的存储槽的标志域所有比 特皆为0,则将所述待删除元素的元素指纹也删除;从所有集合中删除某个 元素时:将所述待删除元素的元素指纹删除并将所述待删除元素的元素指纹所在存储槽的标志域的所有位都置为0;具体过程包括:Element deletion: When deleting an element from a certain set: first determine the candidate cell position of the element fingerprint of the element to be deleted, if the element fingerprint of the element to be deleted does not exist or the element fingerprint of the element to be deleted is stored. If the i-th bit of the flag field of the storage slot is 0, an error is returned; otherwise, the i-th bit of the flag field of the flag field of the storage slot of the element fingerprint of the element to be deleted is inverted to 0; All bits of the flag field of the storage slot of the element fingerprint of the element to be deleted are all 0, then the element fingerprint of the element to be deleted is also deleted; when deleting a certain element from all sets: the element of the element to be deleted is deleted. The fingerprint is deleted and all the bits of the flag field of the storage slot where the element fingerprint of the element to be deleted is located are set to 0; the specific process includes:

删除操作在表示动态集合时至关重要。MCF支持从某个集合Si中删除元 素x,也支持将x从所有集合中同时删除。具体而言,为了删除集合中Si的 元素x,MCF首先确定元素指纹ηx的候选单元格位置。如果ηx不存在于其 候选单元格中或者存储该元素指纹的存储槽的标志域的第i个比特位为0, 则MCF返回False,表示该元素不存在于集合Si中。否则,存储该元素指纹 的存储槽的标志域的第i个比特位将被反置为0。在此之后,如果存储槽的 标志域所有比特皆为0,则将该元素指纹也删除。从所有集合中删除元素x 则更为简单直接。MCF只需要尝试将ηx删除并将其所在存储槽的标志域所 有位都置为0即可。由此可见,元素删除操作的时间复杂度为常数级。Delete operations are critical when representing dynamic collections. MCF supports deleting element x from a certain set Si, and also supports deleting x from all sets at the same time. Specifically, in order to delete element x of Si in the set, MCF first determines the candidate cell positions of element fingerprint nx . If nx does not exist in its candidate cell or the i -th bit of the flag field of the storage slot storing the fingerprint of the element is 0, the MCF returns False, indicating that the element does not exist in the set Si. Otherwise, the i-th bit of the flag field of the storage slot storing the fingerprint of this element will be inverted to 0. After this, if all bits in the flag field of the storage slot are 0, the element fingerprint is also deleted. It is simpler and more straightforward to remove element x from all sets. MCF only needs to try to delete nx and set all bits in the flag field of the storage slot where it is located to 0. It can be seen that the time complexity of the element deletion operation is constant.

Figure BDA0002429088440000101
Figure BDA0002429088440000101

表1 算法1Table 1 Algorithm 1

多个标志布谷鸟滤波聚合:给定两个标志布谷鸟滤波向量MCFi和MCFj, 将这两个向量中共同元素的隶属信息添加到向量MCFi中,MCFj中的差异元 素插入到MCFi中;具体过程包括:Multiple sign cuckoo filter aggregation: Given two sign cuckoo filter vectors MCF i and MCF j , add the membership information of common elements in these two vectors to vector MCF i , and insert the difference elements in MCF j into MCF i ; the specific process includes:

聚合操作意味着将来自于多个同构MCF的元素指纹信息及其服隶属信 息融合到一个MCF中。该操作对于降低集合同步传输开销具有重要意义。 如算法1所示,给定两个MCF向量MCFi和MCFj,聚合操作的基本思想是将 这两个向量中共同元素的隶属信息添加到向量MCFi中,而MCFj中的差异元 素则被插入到MCFi中。具体而言,算法遍历整个MCFj向量。对于任意存储 的元素指纹MCFj[k][r].fingerprint(0≤k≤m-1而0≤r≤b-1),MCF尝试在MCFi中查找 该元素指纹。如果该元素指纹能在MCFi中找到,则其所对应存储槽(记为 slot)的标志域将被重新计算为该标志域与MCFj[k][r].mark的或操作结果(算 法1中第4到第6行)。否则,MCFj[k][r].fingerprint将被插入到MCFi中。最后, 更新过后的MCFi将被作为聚合结果被返回。该算法的时间复杂度为O(mb)。Aggregation operation means fusing element fingerprint information and its server membership information from multiple homogeneous MCFs into one MCF. This operation is of great significance for reducing the overhead of ensemble synchronization transmission. As shown in Algorithm 1, given two MCF vectors MCF i and MCF j , the basic idea of the aggregation operation is to add the membership information of the common elements in these two vectors to the vector MCF i , while the difference elements in MCF j are is inserted into MCF i . Specifically, the algorithm traverses the entire MCF j vector. For an arbitrarily stored element fingerprint MCF j [k][r].fingerprint (0≤k≤m-1 and 0≤r≤b-1), MCF tries to find the element fingerprint in MCF i . If the element fingerprint can be found in MCF i , the mark field of its corresponding storage slot (referred to as slot) will be recalculated as the OR operation result of the mark field and MCF j [k][r].mark (algorithm 1, lines 4 to 6). Otherwise, MCF j [k][r].fingerprint will be inserted into MCF i . Finally, the updated MCF i will be returned as the aggregated result. The time complexity of this algorithm is O(mb).

基于以上同步框架,仍然有两个挑战问题急需解决。首先,聚合和分发 MCF的路径对总的传输开销有明显影响。Gossip或者广播这些sketch也能达 到目的,但是却会导致大量延迟和传输开销。因此,需要综合考虑底层网络 并合理规划传输路径。其次,缺失元素可能存在于多个同步方中。如何选择 最优的缺失元素发送方也极具挑战。为解决以上两个问题,本申请首先在介 绍具体方案之前给出同步组抽象。Based on the above synchronization framework, there are still two challenges that need to be solved urgently. First, the paths for aggregating and distributing MCFs have a significant impact on the total transmission overhead. Gossiping or broadcasting these sketches can also do the trick, but incurs a lot of latency and transmission overhead. Therefore, it is necessary to comprehensively consider the underlying network and plan the transmission path reasonably. Second, missing elements may exist in multiple synchronization parties. How to choose the optimal missing element sender is also very challenging. In order to solve the above two problems, this application first gives the synchronization group abstraction before introducing the specific solution.

考虑n个同步参与方,记为P1,…,Pn,每个参与方上有一个本地集合。 如图2所示,本申请将同步组抽象为一个完全图。图中每条边拥有一个权重 值,用于表示这对节点之间的物理网络跳数。例如,权重wi,j表示参与方Pi和Pj之间的跳数。在聚合步骤中,所有参与方所得sketch将被发送到中央节 点。基于以上定义,多方集合同步MCFsyn的具体设计细节如下:Consider n simultaneous participants, denoted as P 1 ,...,P n , with a local set on each participant. As shown in FIG. 2 , the present application abstracts the synchronization group into a complete graph. Each edge in the graph has a weight value, which is used to represent the number of physical network hops between the pair of nodes. For example, the weight w i,j represents the number of hops between the parties P i and P j . In the aggregation step, the sketches obtained by all participants will be sent to the central node. Based on the above definitions, the specific design details of the multi-party set synchronization MCFsyn are as follows:

S2:将每个同步参与方的本地元素使用所述标志布谷鸟滤波分别进行表 示,得到每个所述同步参与方的局部集合:S2: The local elements of each synchronization participant are respectively represented by the sign cuckoo filter, and the local set of each synchronization participant is obtained:

每个同步参与方将其本地元素用一个MCF进行表示,作为其集合的 sketch。不同参与方所用MCF可以根据具体需求选用同构MCF或异构MCF。 同构MCF具有相同的参数配置,包括MCF长度m,用于生成元素指纹和计 算候选单元格的哈希函数,以及每个单元格容量b,大大简化了后续聚合和 抽取步骤;异构MCF减少了存储开销和聚合分发的开销,降低了带宽浪费。Each synchronization participant represents its local element with an MCF as the sketch of its collection. MCFs used by different parties can choose homogeneous MCF or heterogeneous MCF according to specific needs. Homogeneous MCF has the same parameter configuration, including MCF length m, hash function for generating element fingerprints and calculating candidate cells, and each cell capacity b, which greatly simplifies subsequent aggregation and extraction steps; heterogeneous MCF reduces The storage overhead and aggregate distribution overhead are reduced, and bandwidth waste is reduced.

S3:将所述局部集合发送到中央节点进行聚合,得到全局集合:S3: Send the local set to the central node for aggregation to obtain a global set:

将所述局部集合发送到中央节点进行聚合具体包括:选择所述同步参与 方中具有最大节点度的一方作为中央节点;所述同构标志布谷鸟滤波:通过 最小生成树算法选择聚合路径,将所述局部集合通过所述聚合路径发送到中 央节点进行聚合;所述异构标志布谷鸟滤波:构建一个空标志布谷鸟滤波, 所述空标志布谷鸟滤波的容量由待聚合的局部集合的元素指纹个数决定;将 所述待聚合的局部集合中的元素指纹逐个插入到所述空标志布谷鸟滤波中。Sending the partial set to the central node for aggregation specifically includes: selecting the party with the largest node degree among the synchronization participants as the central node; the isomorphic sign cuckoo filtering: selecting an aggregation path through a minimum spanning tree algorithm, The local set is sent to the central node for aggregation through the aggregation path; the heterogeneous flag cuckoo filter: constructs an empty flag cuckoo filter, and the capacity of the empty flag cuckoo filter is determined by the elements of the local set to be aggregated. The number of fingerprints is determined; the element fingerprints in the partial set to be aggregated are inserted into the empty sign cuckoo filter one by one.

具体包括:Specifically include:

所有同步参与方将其sketch发送到中央节点进行聚合,从而得到并集的 全局sketch。出于隐私保护目的,被同步的数据只能在参与同步的各方之间 进行传输,因此,所选取的中央节点也必须是参与同步的某一方。All synchronization participants send their sketches to the central node for aggregation, resulting in a union global sketch. For the purpose of privacy protection, the data to be synchronized can only be transmitted between the parties participating in the synchronization. Therefore, the selected central node must also be one of the parties involved in the synchronization.

在构建同步组拓扑之后,MCFsyn计算出该图中的最小生成树(Minimum SpanningTree,MST)作为sketch聚合的路径。同步参与方中的某一方被选 为中央节点,负责收集来自其他同步方的MCF并将其聚合。值得注意的是, 中央节点的选择对整个聚合过程的传输开销没有影响。然而MCFsyn偏向于 选择在MST中具有最大节点度的参与方作为中央节点。如此一来,出于最 小生成树叶子节点的参与方能并行传输其MCF到中央节点。如图2中,参与方P3被选作中央节点。因此,参与方P2和P5可以同时将其MCF沿着MST 传输到P3。介于叶子节点和中央节点之间的中间节点则将来自于其子节点的 MCF与其本地MCF进行聚合,并将聚合结果发送给其父节点。例如图2中, 参与方P1在接收来自P5的MCF5之后,将接收到的MCF5与本地的MCF1进 行聚合,并将聚合结果发送给其父节点P3。此种边传输边聚合的传输策略能 明显地降低聚合的传输开销。After constructing the synchronization group topology, MCFsyn calculates the Minimum Spanning Tree (MST) in the graph as the path of sketch aggregation. One of the synchronization participants is elected as the central node, which is responsible for collecting and aggregating MCFs from other synchronization parties. It is worth noting that the selection of the central node has no effect on the transmission overhead of the entire aggregation process. However, MCFsyn prefers to select the party with the largest node degree in MST as the central node. In this way, the participants from the leaf nodes of the minimum generation tree can transmit their MCFs to the central node in parallel. As in Figure 2, participant P3 is selected as the central node. Thus, parties P2 and P5 can simultaneously transmit their MCFs to P3 along the MST. The intermediate node between the leaf node and the central node aggregates the MCF from its child nodes with its local MCF, and sends the aggregated result to its parent node. For example, in FIG. 2 , after receiving the MCF 5 from P 5 , the participant P 1 aggregates the received MCF 5 with the local MCF 1 , and sends the aggregation result to its parent node P 3 . This transmission strategy of aggregation while transmission can significantly reduce the transmission overhead of aggregation.

在异构标志布谷鸟滤波中,给定MCFi和MCFj,一个空的MCF,记为 MCFo,将会被启用,其容量可以计算为:mo bo=α|Si∪Sj|。|Si∪Sj|的值可以 通过MCFi和MCFj中存储的元素指纹个数来决定。然后,MCFi和MCFj中 的元素指纹将被逐个插入到MCFo中。因此,聚合操作的时间复杂度便从同 构MCF时的O(mb)增加为O(mibi+mjbj)。In the heterogeneous sign cuckoo filter, given MCF i and MCF j , an empty MCF, denoted as MCF o , will be enabled, and its capacity can be calculated as: m o b o =α|S i ∪S j |. The value of |S i ∪S j | can be determined by the number of element fingerprints stored in MCF i and MCF j . Then, the element fingerprints in MCF i and MCF j will be inserted into MCF o one by one. Therefore, the time complexity of the aggregation operation increases from O(mb) for isomorphic MCF to O(m i b i +m j b j ).

当所述同步参与方动态地加入和离开时:When the sync participants join and leave dynamically:

当参与同步的同步参与方离开时,若离开的所述同步参与方在聚合路径 中非最小生成树的叶子结点,则所述最小生成树被分裂为多个子树,将所述 多个子树使用子树间最小边进行连接得到新的最小生成树;When the synchronization participant participating in the synchronization leaves, if the left synchronization participant is not a leaf node of the minimum spanning tree in the aggregation path, the minimum spanning tree is split into multiple subtrees, and the multiple subtrees are split into multiple subtrees. Use the minimum edge between subtrees to connect to obtain a new minimum spanning tree;

当有新的同步参与方加入时,选取所述新的同步参与方与已有同步参与 方之间的最小边,并将所述新的同步参与方经由所述最小边加入到已有最小 生成树中。When a new synchronization participant joins, select the minimum edge between the new synchronization participant and the existing synchronization participant, and add the new synchronization participant to the existing minimum generation via the minimum edge in the tree.

具体包括:Specifically include:

在同步组拓扑中,如果构建的MST即最小生成树的叶子节点离开同步 组,则不会对剩余部分MST产生影响,只需要忽略或者删除MCF中标志域 的相应比特即可。然而,当MST中非叶子节点离开同步组时,MST将被分 裂为多个子树,此种情况下,MCFsyn需要重新构建一个新的MST。这一问 题可以通过以下割集性质和推论解决。In the synchronization group topology, if the constructed MST, that is, the leaf node of the minimum spanning tree, leaves the synchronization group, it will not affect the remaining part of the MST, and only need to ignore or delete the corresponding bits of the flag field in the MCF. However, when the non-leaf nodes in the MST leave the synchronization group, the MST will be split into multiple subtrees. In this case, MCFsyn needs to rebuild a new MST. This problem can be solved by the following cut-set properties and inferences.

定理1(最小生成树的割集性质):让G(V,E)表示一个图,(X,V-X)表示图 G的一个割集,边e是连接该割集的所有边中代价最小的边,则图G的每个 最小生成树均包含边e。Theorem 1 (cutset property of minimum spanning tree): Let G(V,E) denote a graph, (X,V-X) denote a cutset of graph G, and edge e is the least expensive among all edges connecting the cutset edge, then every minimum spanning tree of graph G contains edge e.

推论1:让G(V,E)表示一个图,T表示图G的一个最小生成树,图

Figure RE-GDA0002571382940000131
是图G的一个联通子图,其中
Figure RE-GDA0002571382940000132
是T的一个子树,并且覆盖子图
Figure RE-GDA0002571382940000133
中所有的节点,则
Figure RE-GDA0002571382940000134
是图
Figure RE-GDA0002571382940000135
的一棵最小生成树。Corollary 1: Let G(V,E) denote a graph, T denote a minimum spanning tree of graph G, and the graph
Figure RE-GDA0002571382940000131
is a connected subgraph of graph G, where
Figure RE-GDA0002571382940000132
is a subtree of T and covers the subgraph
Figure RE-GDA0002571382940000133
all nodes in , then
Figure RE-GDA0002571382940000134
is a picture
Figure RE-GDA0002571382940000135
a minimum spanning tree of .

证明:考虑割集

Figure RE-GDA0002571382940000136
及最小生成树T,则根据定理1,可以得到:
Figure RE-GDA0002571382940000137
其中
Figure RE-GDA0002571382940000138
是T中覆盖子图
Figure RE-GDA0002571382940000139
节点的子树,而e则是连接
Figure RE-GDA00025713829400001310
Figure RE-GDA00025713829400001311
的边中 的最小代价边。如果
Figure RE-GDA00025713829400001312
不是子图
Figure RE-GDA00025713829400001313
的最小生成树,则它可以被拥有更小代价的 最小生成树代替,从而使得图G拥有比T代价更小的生成树。这便与T是图 G的最小生成树矛盾,从而证明了推论1的正确性。Proof: Consider Cut Sets
Figure RE-GDA0002571382940000136
and the minimum spanning tree T, then according to Theorem 1, we can get:
Figure RE-GDA0002571382940000137
in
Figure RE-GDA0002571382940000138
is the covering subgraph in T
Figure RE-GDA0002571382940000139
subtree of nodes, and e is the connection
Figure RE-GDA00025713829400001310
and
Figure RE-GDA00025713829400001311
The least-cost edge among the edges. if
Figure RE-GDA00025713829400001312
not a subgraph
Figure RE-GDA00025713829400001313
, then it can be replaced by a minimum spanning tree with a smaller cost, so that the graph G has a spanning tree with a lower cost than T. This contradicts that T is the minimum spanning tree of graph G, thus proving the correctness of Inference 1.

推论2:让完全图G(V,E)表示同步组拓扑,其最小生成树为T,当同步 参与方Pi离开同步组时,T可能被分裂为多个子树。通过将这些子树逐个使 用其子树间最小边连接并不产生环时,便能得到没有Pi的同步组的最小生成 树。Corollary 2: Let the complete graph G(V, E) represent the synchronization group topology, and its minimum spanning tree is T. When the synchronization participant Pi leaves the synchronization group, T may be split into multiple subtrees. By connecting these subtrees one by one using the minimum edge connection among their subtrees without generating a cycle, the minimum spanning tree of the synchronization group without Pi can be obtained.

推论2是定理1和推论1的天然结果,在此不再加以证明。基于该推论, MCFsyn通过将这些子树使用子树间最小边进行连接得到新的最小生成树。 当然,在添加这些边时,不允许给最小生成树中引入环。Corollary 2 is a natural consequence of Theorem 1 and Corollary 1, and will not be proved here. Based on this inference, MCFsyn obtains a new minimum spanning tree by connecting these subtrees using the minimum edges between subtrees. Of course, when adding these edges, it is not allowed to introduce cycles into the minimum spanning tree.

当有新的同步参与方加入同步组时,需要的操作则颇为简单。当Pn+1加 入同步组时,为了维护最小生成树,MCFsyn只需要选取Pn+1与已有同步方 之间的最小边,并将其加入到已有最小生成树中即可。此外,MCF数据结构 中需要引入额外一个比特位用于表征该同步方上集合Sn+1的元素隶属信息。When a new synchronization participant joins the synchronization group, the required operation is quite simple. When P n+1 joins the synchronization group, in order to maintain the minimum spanning tree, MCFsyn only needs to select the minimum edge between P n+1 and the existing synchronization party, and add it to the existing minimum spanning tree. In addition, an extra bit needs to be introduced into the MCF data structure to represent the element membership information of the set Sn +1 on the synchronization party.

S4:将所述全局集合发送至每个所述同步参与方:S4: Send the global set to each of the synchronization participants:

中央节点将聚合所得的全局sketch分发给每个同步方。如此一来,每个 同步方便知晓并集元素的信息。The central node distributes the aggregated global sketch to each synchronization party. In this way, each synchronization easily knows the information of the elements of the union.

中央节点将其子节点发送来的MCF聚合为全局MCF,作为集合并集的 sketch。该全局MCF,记为MCFo,表示了整个并集中元素的指纹信息和隶 属信息。如图2所示,MCFsyn将MCFo通过MST从中央节点分发给每个同 步参与方。The central node aggregates the MCF sent by its child nodes into a global MCF as a sketch of the union of sets. The global MCF, denoted as MCF o , represents the fingerprint information and membership information of the elements in the entire union. As shown in Figure 2, MCFsyn distributes MCF o from the central node to each synchronization participant through MST.

S5:令所述同步参与方遍历所述全局集合,确定差异元素:S5: Let the synchronization participants traverse the global set to determine the difference elements:

所述差异元素包括缺失元素和独有元素;所述同步参与方遍历所述全局 集合,将所述缺失元素的元素指纹及其隶属信息添加到缺失元素集合中,所 述独有元素的元素指纹及其隶属信息添加到独有元素集合中。The difference elements include missing elements and unique elements; the synchronization participant traverses the global set, and adds the element fingerprint of the missing element and its membership information to the missing element set, and the element fingerprint of the unique element and their affiliation information are added to the set of unique elements.

具体包括:Specifically include:

接收到MCFo之后,每个参与方尝试通过遍历MCFo来确定其缺失元素 和独有元素。如算法2给出了参与方Pi进行抽取的过程。对于存储在任意 存储槽中的元素指纹,如果存储槽的标志域中第i个比特位的值为0,则表 示该元素不属于集合Si。因此,该元素指纹及其隶属信息将被添加到集合

Figure BDA0002429088440000141
中(算法2中第6到第7行)。另外,如果只有第i个比特位的值为1,则该 元素只存在于集合Si,并将被加入到集合
Figure BDA0002429088440000142
中(算法2中第8到第9行)。 抽取操作的时间复杂度为O(mb),因为MCFo中所有存储槽都需要被遍历。After receiving MCF o , each participant attempts to determine its missing and unique elements by traversing MCF o . For example, Algorithm 2 gives the process of extracting the participant Pi . For the element fingerprint stored in any storage slot, if the value of the i-th bit in the flag field of the storage slot is 0, it means that the element does not belong to the set S i . Therefore, the element fingerprint and its membership information will be added to the collection
Figure BDA0002429088440000141
in (lines 6 to 7 in Algorithm 2). Also, if only the i -th bit has a value of 1, the element exists only in the set Si and will be added to the set
Figure BDA0002429088440000142
in (Lines 8 to 9 in Algorithm 2). The time complexity of the extraction operation is O(mb), because all the storage slots in the MCF o need to be traversed.

Figure RE-GDA0002571382940000141
Figure RE-GDA0002571382940000141

Figure RE-GDA0002571382940000151
Figure RE-GDA0002571382940000151

表2 算法2Table 2 Algorithm 2

S6:令所述同步参与方从所述全局集合中获取所述差异元素,完成多方 集合同步:S6: Make the synchronization participant obtain the difference element from the global set to complete the synchronization of the multi-party set:

对于某参与方Pi,其

Figure BDA0002429088440000152
中的独有元素将会被其推送给其他参与方。采用 广播的方式推送这些元素能节约时间,而沿着MST进行传输则能节约传输 开销。对于
Figure BDA0002429088440000153
中的缺失元素,如果其对应标志域中仅有一个比特位为1,则 该元素为该比特位对应参与方的独有元素。此时,Pi只需要等着接收来自元 素宿主的元素内容即可。相反,如果标志域中存在多个比特位为1,Pi将选 择那个与其在同步组拓扑中构成最小边权值的参与方作为该元素的内容发 送方完成同步。此种策略可以通过在每个同步参与方上维持一个偏好列表实现。在该列表中,边权值越低的邻居节点将具有更高的优先级,这样就选择 出了最优元素内容发送方。For a party P i , its
Figure BDA0002429088440000152
The unique elements in will be pushed to other parties by it. Pushing these elements by broadcasting saves time, and transmitting along the MST saves transmission overhead. for
Figure BDA0002429088440000153
For the missing element in , if only one bit in the corresponding flag field is 1, the element is the unique element of the party corresponding to the bit. At this point, Pi just needs to wait to receive the element content from the element host. On the contrary, if there are multiple bits in the flag field that are 1, Pi will select the party with the smallest edge weight in the synchronization group topology as the content sender of the element to complete the synchronization. Such a strategy can be implemented by maintaining a preference list on each synchronization participant. In this list, neighbor nodes with lower edge weights will have higher priority, thus selecting the optimal sender of element content.

基于同一发明构思,本发明还提供了一种多方集合同步装置,包括:Based on the same inventive concept, the present invention also provides a multi-party set synchronization device, including:

数据结构构建模块,被配置为构建标志布谷鸟滤波数据结构并定义所述 标志布谷鸟滤波的操作;a data structure building module configured to construct a flag cuckoo filter data structure and define the operation of the flag cuckoo filter;

本地元素表示模块,被配置为将每个同步参与方的本地元素使用所述标 志布谷鸟滤波分别进行表示,得到每个所述同步参与方的局部集合;a local element representation module, configured to represent the local elements of each synchronization participant using the sign cuckoo filter, respectively, to obtain a partial set of each of the synchronization participants;

集合聚合模块,被配置为将所述局部集合发送到中央节点进行聚合,得 到全局集合;a set aggregation module, configured to send the local set to the central node for aggregation to obtain a global set;

集合发送模块,被配置为将所述全局集合发送至每个所述同步参与方;a set sending module configured to send the global set to each of the synchronization participants;

差异元素确定模块,被配置为令所述同步参与方遍历所述全局集合,确 定差异元素;a difference element determination module, configured to make the synchronization participants traverse the global set to determine a difference element;

同步模块,被配置为令所述同步参与方从所述全局集合中获取所述差异 元素,完成多方集合同步。The synchronization module is configured to make the synchronization participant acquire the difference element from the global set to complete the synchronization of the multi-party set.

上述实施例的装置用于实现前述实施例中相应的方法,并且具有相应的 方法实施例的有益效果,在此不再赘述。The apparatuses in the foregoing embodiments are used to implement the corresponding methods in the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

基于同一发明构思,本发明还提供了一种电子设备,包括存储器、处理 器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述 处理器执行所述程序时实现如上述实施例所述的方法。Based on the same inventive concept, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that the processor implements the program when executing the program. The method as described in the above embodiment.

基于同一发明构思,本发明还提供了一种非暂态计算机可读存储介质, 其特征在于,所述非暂态计算机可读存储介质存储计算机指令,所述计算机 指令用于使所述计算机执行如上述实施例所述的方法。Based on the same inventive concept, the present invention also provides a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause the computer to execute The method as described in the above embodiment.

需要说明的是,本发明实施例的方法可以由单个设备执行,例如一台计 算机或服务器等。本实施例的方法也可以应用于分布式场景下,由多台设备 相互配合来完成。在这种分布式场景的情况下,这多台设备中的一台设备可 以只执行本发明实施例的方法中的某一个或多个步骤,这多台设备相互之间 会进行交互以完成所述的方法。It should be noted that, the method in this embodiment of the present invention may be executed by a single device, such as a computer or a server. The method in this embodiment can also be applied to a distributed scenario, and is completed by the cooperation of multiple devices. In the case of such a distributed scenario, one device among the multiple devices may only perform one or more steps in the method of the embodiment of the present invention, and the multiple devices will interact with each other to complete all the steps. method described.

上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书 的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同 于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描 绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在 某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of the present specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. Additionally, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

图3示出了本实施例所提供的一种更为具体的电子设备硬件结构示意 图,该设备可以包括:处理器1010、存储器1020、输入/输出接口1030、 通信接口1040和总线1050。其中处理器1010、存储器1020、输入/输出接 口1030和通信接口1040通过总线1050实现彼此之间在设备内部的通信连 接。FIG. 3 shows a more specific schematic diagram of the hardware structure of an electronic device provided in this embodiment. The device may include: a processor 1010 , a memory 1020 , an input/output interface 1030 , a communication interface 1040 and a bus 1050 . The processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 realize the communication connection between each other within the device through the bus 1050.

处理器1010可以采用通用的CPU(Central Processing Unit,中央处理器)、 微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、 或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本说明书 实施例所提供的技术方案。The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related program to implement the technical solutions provided by the embodiments of this specification.

存储器1020可以采用ROM(Read Only Memory,只读存储器)、 RAM(Random AccessMemory,随机存取存储器)、静态存储设备,动态存储 设备等形式实现。存储器1020可以存储操作系统和其他应用程序,在通过 软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码 保存在存储器1020中,并由处理器1010来调用执行。The memory 1020 may be implemented in the form of a ROM (Read Only Memory, read only memory), a RAM (Random Access Memory, random access memory), a static storage device, a dynamic storage device, and the like. The memory 1020 can store the operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1020 and invoked by the processor 1010 for execution.

输入/输出接口1030用于连接输入/输出模块,以实现信息输入及输出。 输入输出/模块可以作为组件配置在设备中(图中未示出),也可以外接于设 备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、 各类传感器等,输出设备可以包括显示器、扬声器、振动器、指示灯等。The input/output interface 1030 is used to connect the input/output module to realize information input and output. The input/output/module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions. The input device may include a keyboard, a mouse, a touch screen, a microphone, various types of sensors, etc., and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.

通信接口1040用于连接通信模块(图中未示出),以实现本设备与其他设 备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通 信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。The communication interface 1040 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices. Wherein, the communication module can realize communication through wired means (such as USB, network cable, etc.), or can realize communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).

总线1050包括一通路,在设备的各个组件(例如处理器1010、存储器 1020、输入/输出接口1030和通信接口1040)之间传输信息。Bus 1050 includes a path to transfer information between the various components of the device (eg, processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).

需要说明的是,尽管上述设备仅示出了处理器1010、存储器1020、输 入/输出接口1030、通信接口1040以及总线1050,但是在具体实施过程中, 该设备还可以包括实现正常运行所必需的其他组件。此外,本领域的技术人 员可以理解的是,上述设备中也可以仅包含实现本说明书实施例方案所必需 的组件,而不必包含图中所示的全部组件。It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in the specific implementation process, the device may also include necessary components for normal operation. other components. In addition, those skilled in the art can understand that, the above-mentioned device may also include only the components necessary to realize the solutions of the embodiments of the present specification, and need not include all the components shown in the figures.

本实施例的计算机可读介质包括永久性和非永久性、可移动和非可移动 媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、 数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限 于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器 (DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦 除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带, 磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可 以被计算设备访问的信息。The computer readable medium of this embodiment includes both persistent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information may be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

所属领域的普通技术人员应当理解:以上任何实施例的讨论仅为示例性 的,并非旨在暗示本公开的范围(包括权利要求)被限于这些例子;在本发 明的思路下,以上实施例或者不同实施例中的技术特征之间也可以进行组 合,步骤可以以任意顺序实现,并存在如上所述的本发明的不同方面的许多 其它变化,为了简明它们没有在细节中提供。Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is only exemplary, and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples; under the spirit of the present invention, the above embodiments or There may also be combinations between technical features in different embodiments, steps may be carried out in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.

另外,为简化说明和讨论,并且为了不会使本发明难以理解,在所提供 的附图中可以示出或可以不示出与集成电路(IC)芯片和其它部件的公知的 电源/接地连接。此外,可以以框图的形式示出装置,以便避免使本发明难以 理解,并且这也考虑了以下事实,即关于这些框图装置的实施方式的细节是 高度取决于将要实施本发明的平台的(即,这些细节应当完全处于本领域技 术人员的理解范围内)。在阐述了具体细节(例如,电路)以描述本发明的 示例性实施例的情况下,对本领域技术人员来说显而易见的是,可以在没有 这些具体细节的情况下或者这些具体细节有变化的情况下实施本发明。因 此,这些描述应被认为是说明性的而不是限制性的。Additionally, well known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the figures provided in order to simplify illustration and discussion, and in order not to obscure the present invention. . Furthermore, devices may be shown in block diagram form in order to avoid obscuring the present invention, and this also takes into account the fact that the details regarding the implementation of these block diagram devices are highly dependent on the platform on which the invention will be implemented (i.e. , these details should be fully within the understanding of those skilled in the art). Where specific details (eg, circuits) are set forth to describe exemplary embodiments of the invention, it will be apparent to those skilled in the art that these specific details may be used without or with changes The present invention is carried out below. Accordingly, these descriptions are to be regarded in an illustrative rather than a restrictive sense.

尽管已经结合了本发明的具体实施例对本发明进行了描述,但是根据前 面的描述,这些实施例的很多替换、修改和变型对本领域普通技术人员来说 将是显而易见的。例如,其它存储器架构(例如,动态RAM(DRAM))可 以使用所讨论的实施例。Although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations to these embodiments will be apparent to those of ordinary skill in the art from the foregoing description. For example, other memory architectures, such as dynamic RAM (DRAM), may use the discussed embodiments.

本发明的实施例旨在涵盖落入所附权利要求的宽泛范围之内的所有这 样的替换、修改和变型。因此,凡在本发明的精神和原则之内,所做的任何 省略、修改、等同替换、改进等,均应包含在本发明的保护范围之内。Embodiments of the present invention are intended to cover all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included within the protection scope of the present invention.

Claims (10)

1. A multi-party aggregation synchronization method, comprising:
constructing a mark cuckoo filtering data structure and defining the operation of the mark cuckoo filtering;
respectively representing the local elements of each synchronous participant by using the mark cuckoo filtering to obtain a local set of each synchronous participant;
sending the local set to a central node for aggregation to obtain a global set;
sending the global set to each of the synchronization participants;
enabling the synchronous participants to traverse the global set to determine difference elements;
and enabling the synchronous participants to acquire the difference elements from the global set to complete multi-party set synchronization.
2. The multi-party aggregation synchronization method according to claim 1, wherein the flag cuckoo filtering data structure specifically comprises:
filtering the isomorphic marker cuckoo: the unit cell comprises m unit cells, each unit cell comprises b storage grooves, and at most b element fingerprints are stored, and the element fingerprints can be stored in any one candidate unit cell; the storage slot has two fields including a fingerprint field for storing the element fingerprint information and a field for recording the element fingerprint informationA flag domain storing membership information of the element; the fingerprint field comprises f bits, and the mark field comprises n bits, wherein n is the number of sets participating in synchronization; providing two candidate cells for any element x, wherein indexes of the element x in the two candidate cells are respectively as follows: h is1(x) Hash (x)% m and h2(x)=hash(x)⊙(hash(ηx) % m), wherein ηxAn element fingerprint representing an element x, and a hash representing a hash function;
heterogeneous marker cuckoo filtering: the relationship between the number of cells and the capacity of each cell is specifically as follows: m isi=α|Si|/biWherein m isiAnd biRespectively representing the number of cells and the capacity of each cell, α is a constant, and the index of the element x in the two candidate cells is h1(x)=hash(ηx) % m and
Figure FDA0002429088430000011
3. the multi-party aggregation synchronization method according to claim 2, wherein the operation of flag cuckoo filtering specifically comprises:
element insertion: calculating an element fingerprint through a hash function, and then inserting the element fingerprint into the two candidate cells; reallocating when all slots in both of the candidate cells are occupied;
element query: traversing the two candidate cells, and returning the mark domain of the storage slot of the queried element when the queried element is found; otherwise, returning an error;
element deletion:
when an element is deleted from a collection: firstly, determining the candidate cell position of the element fingerprint of an element to be deleted, and returning an error if the element fingerprint of the element to be deleted does not exist or the ith bit of a mark field of a storage groove for storing the element fingerprint of the element to be deleted is 0; otherwise, the ith bit of the mark field of the storage slot for storing the element fingerprint of the element to be deleted is inverted to 0; if all bits of the flag field of the storage slot for storing the element fingerprint of the element to be deleted are 0, deleting the element fingerprint of the element to be deleted;
when an element is deleted from all sets: deleting the element fingerprints of the elements to be deleted and setting all the positions of the mark fields of the storage slots where the element fingerprints of the elements to be deleted are located as 0;
filtering and aggregating a plurality of mark cuckoos: given two flag cuckoo filter vectors MCFiAnd MCFjAdding membership information of common elements in the two vectors to the MCF vectoriIn, MCFjThe difference element in (3) is inserted into the MCFiIn (1).
4. The multi-party aggregation synchronization method according to claim 3, wherein the reallocating specifically comprises:
when all storage grooves in the two candidate cells are occupied, randomly kicking out a stored element fingerprint to obtain an empty storage groove, and then storing the element fingerprint to be stored and the membership information of the element fingerprint to be stored in the empty storage groove; the kicked-out element fingerprint will be reassigned to another candidate cell; ending when no additional element fingerprints are kicked out or the number of reassignments reaches a given upper limit.
5. The multi-party aggregation synchronization method according to claim 4, wherein the sending the local aggregation to a central node for aggregation specifically comprises:
selecting one of the synchronous participants with the maximum node degree as a central node;
filtering the isomorphic marker cuckoo: selecting an aggregation path through a minimum spanning tree algorithm, and sending the local set to a central node through the aggregation path for aggregation;
the heterogeneous marker cuckoo filtering: constructing an empty-mark cuckoo filter, wherein the capacity of the empty-mark cuckoo filter is determined by the number of element fingerprints of a local set to be aggregated; and inserting the element fingerprints in the local set to be aggregated into the empty marker cuckoo filtering one by one.
6. The multi-party collection synchronization method of claim 5, wherein the traversing the global collection by the synchronization participants to determine the difference element specifically comprises:
the difference elements comprise missing elements and unique elements;
and the synchronous participants traverse the global set, add the element fingerprints and the membership information of the missing elements into the missing element set, and add the element fingerprints and the membership information of the unique elements into the unique element set.
7. The multi-party aggregation synchronization method of claim 6, wherein, when the synchronization participants join and leave dynamically:
when a synchronous participant participating in synchronization leaves, if the synchronous participant leaves and is not a leaf node of the minimum spanning tree in an aggregation path, splitting the minimum spanning tree into a plurality of subtrees, and connecting the subtrees by using minimum edges among the subtrees to obtain a new minimum spanning tree;
when a new synchronous participant joins, selecting a minimum edge between the new synchronous participant and an existing synchronous participant, and joining the new synchronous participant into an existing minimum spanning tree through the minimum edge.
8. A multi-party aggregation synchronization apparatus, comprising:
a data structure construction module configured to construct a signature cuckoo filtering data structure and define operations of the signature cuckoo filtering;
a local element representation module configured to represent the local element of each synchronization participant by using the marker cuckoo filtering, so as to obtain a local set of each synchronization participant;
the set aggregation module is configured to send the local set to a central node for aggregation to obtain a global set;
a set sending module configured to send the global set to each of the synchronization participants;
a difference element determination module configured to cause the synchronization participant to traverse the global set, determining a difference element;
and the synchronization module is configured to enable the synchronization participants to acquire the difference elements from the global set so as to complete multi-party set synchronization.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202010230302.3A 2020-03-27 2020-03-27 Multi-party collection synchronization method, device and electronic device Active CN111538865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010230302.3A CN111538865B (en) 2020-03-27 2020-03-27 Multi-party collection synchronization method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010230302.3A CN111538865B (en) 2020-03-27 2020-03-27 Multi-party collection synchronization method, device and electronic device

Publications (2)

Publication Number Publication Date
CN111538865A true CN111538865A (en) 2020-08-14
CN111538865B CN111538865B (en) 2023-06-02

Family

ID=71974826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010230302.3A Active CN111538865B (en) 2020-03-27 2020-03-27 Multi-party collection synchronization method, device and electronic device

Country Status (1)

Country Link
CN (1) CN111538865B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579311A (en) * 2022-03-04 2022-06-03 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for executing distributed computing task

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630955A (en) * 2015-12-24 2016-06-01 华中科技大学 Method for efficiently managing members of dynamic data set
CN108804226A (en) * 2018-05-28 2018-11-13 中国人民解放军国防科技大学 A Graph Segmentation Method for Distributed Graph Computing
US20190065557A1 (en) * 2017-08-31 2019-02-28 David Boles Reducing probabilistic filter query latency
CN109416694A (en) * 2016-07-11 2019-03-01 微软技术许可有限责任公司 The key assignments storage system effectively indexed including resource
CN109819030A (en) * 2019-01-22 2019-05-28 西北大学 A pre-scheduling method of data resources based on edge computing
CN110222088A (en) * 2019-05-20 2019-09-10 华中科技大学 Data approximation set representation method and system based on insertion position selection

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630955A (en) * 2015-12-24 2016-06-01 华中科技大学 Method for efficiently managing members of dynamic data set
CN109416694A (en) * 2016-07-11 2019-03-01 微软技术许可有限责任公司 The key assignments storage system effectively indexed including resource
US20190065557A1 (en) * 2017-08-31 2019-02-28 David Boles Reducing probabilistic filter query latency
CN108804226A (en) * 2018-05-28 2018-11-13 中国人民解放军国防科技大学 A Graph Segmentation Method for Distributed Graph Computing
CN109819030A (en) * 2019-01-22 2019-05-28 西北大学 A pre-scheduling method of data resources based on edge computing
CN110222088A (en) * 2019-05-20 2019-09-10 华中科技大学 Data approximation set representation method and system based on insertion position selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
L LUO: "The Consistent Cuckoo Filter" *
蒋捷;杨仝;张梦瑜;代亚非;黄亮;郑廉清;: "DCuckoo:基于片内摘要的高性能散列表" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579311A (en) * 2022-03-04 2022-06-03 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for executing distributed computing task
CN114579311B (en) * 2022-03-04 2023-05-30 北京百度网讯科技有限公司 Method, device, equipment and storage medium for executing distributed computing task

Also Published As

Publication number Publication date
CN111538865B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
US8982709B2 (en) Selecting service nodes for an end-to-end service path from a reduced search space
RU2400806C2 (en) Organisation of mating requests for resource with according resources
WO2018112805A1 (en) Blockchain storage method and device, and node device
CN112235420B (en) Data synchronization method, system and related equipment based on block chain
CN106170968B (en) A data compression storage method, device, and distributed file system
WO2022242361A1 (en) Data download method and apparatus, computer device and storage medium
US20200267081A1 (en) Forwarding Table Management
US9135833B2 (en) Process for selecting compressed key bits for collision resolution in hash lookup table
JP4506387B2 (en) Information communication system, node device, overlay network forming method, etc.
CN111538865B (en) Multi-party collection synchronization method, device and electronic device
CN109698814A (en) Botnet finds that method and Botnet find device
CN104063377B (en) Information processing method and use its electronic equipment
KR101944594B1 (en) Server for distributed network and method of consensus thereof
CN115086226B (en) A method and system for establishing anonymous links in anonymous networks
CN113966602A (en) Distributed storage of blocks in a blockchain
CN108733805B (en) File interaction method, system, computer device and storage medium
CN101521597B (en) Data statistical approach and system of mixed P2P network
CN110291765B (en) Use network topology to localize traffic
CN115499362B (en) IP configuration information management method and device and electronic equipment
CN114866514B (en) Multi-user data flow control and processing method, device, equipment and medium
CN111342927B (en) Time synchronization processing method and device
CN117579542A (en) Anycast method based on bit index screening and related equipment
CN116915627A (en) Network health determination method, device and readable storage medium
HK40038159B (en) Method and system for synchronizing data based on blockchain and relevant device
HK40038159A (en) Method and system for synchronizing data based on blockchain and relevant device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant