
CN113297210A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN113297210A
Authority
CN
China
Prior art keywords
index node
cache unit
target
data
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110189328.2A
Other languages
Chinese (zh)
Inventor
汪晟
孙园园
李飞飞
黎火荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202110189328.2A
Publication of CN113297210A
Legal status: Pending (Current)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22: Indexing; Data structures therefor; Storage structures
    • G06F 16/2228: Indexing structures
    • G06F 16/2246: Trees, e.g. B+trees
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. a local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Storage Device Security (AREA)

Abstract

Embodiments of this specification provide a data processing method and a data processing apparatus. The data processing method includes: receiving a data processing request, where the data processing request carries encrypted target data; decrypting the encrypted target data, and determining a target index node corresponding to the target data based on a keyword of the target data obtained after decryption; when it is determined that the target index node is not in a first cache unit and the first cache unit has reached a preset cache threshold, determining an adjustment index node in the first cache unit based on a preset policy; and deleting the adjustment index node from the first cache unit and writing the target index node into the first cache unit, so as to process the target data.

Description

Data processing method and device
Technical Field
Embodiments of this specification relate to the field of computer technology, and in particular to a data processing method. One or more embodiments of this specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium.
Background
Cloud platforms have mature security protections and can ensure that the data a user stores in a database is not leaked through external attack. However, because the data is visible to the cloud platform, users remain concerned about its security and worry that internal operations and maintenance staff could access or steal it without authorization. To address these concerns, some databases store data in encrypted form.
At present, an encrypted database stores ciphertext directly in its index at keyword granularity. Whenever a comparison must be performed during a query, the index node corresponding to the keyword is loaded into memory, the elements in the node and the keyword to be queried are placed one by one into an Enclave (a trusted memory region in SGX; code and data inside it cannot be leaked or maliciously tampered with) for decryption and comparison, and the comparison result is returned. This frequent Enclave interaction incurs large additional overhead.
Therefore, a data processing method that can reduce the interaction overhead between the Enclave and other memory during operation execution is urgently needed.
Disclosure of Invention
In view of this, the present specification provides a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium to address technical deficiencies in the prior art.
According to a first aspect of embodiments herein, there is provided a data processing method including:
receiving a data processing request, wherein the data processing request carries encrypted target data;
decrypting the encrypted target data, and determining a target index node corresponding to the target data based on keywords of the target data obtained after decryption;
when it is determined that the target index node is not in a first cache unit and the first cache unit has reached a preset cache threshold, determining an adjustment index node in the first cache unit based on a preset policy;
and deleting the adjustment index node from the first cache unit, and writing the target index node into the first cache unit, so as to process the target data.
According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:
a request receiving module, configured to receive a data processing request, where the data processing request carries encrypted target data;
a target index node obtaining module, configured to decrypt the encrypted target data and determine a target index node corresponding to the target data based on a keyword of the target data obtained after decryption;
an adjustment index node determining module, configured to determine an adjustment index node in a first cache unit based on a preset policy when it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold;
and a data processing module, configured to delete the adjustment index node from the first cache unit and write the target index node into the first cache unit, so as to process the target data.
According to a third aspect of the embodiments of this specification, there is provided an encrypted database. The database adopts a B+ tree encrypted index, the B+ tree encrypted index includes an EBuffer layer, the EBuffer layer includes a first cache unit, and each slot of the first cache unit contains one plaintext index node, where,
the encrypted database receives a data processing request carrying encrypted target data, decrypts the encrypted target data, and determines a target index node corresponding to the target data based on a keyword of the target data obtained after decryption; when it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold, it determines an adjustment index node in the first cache unit based on a preset policy; and it deletes the adjustment index node from the first cache unit and writes the target index node into the first cache unit, so as to process the target data.
According to a fourth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions and the processor is for executing the computer-executable instructions, which when executed by the processor implement the steps of the data processing method.
According to a fifth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data processing method.
One embodiment of this specification implements a data processing method and apparatus. The data processing method includes: receiving a data processing request, where the data processing request carries encrypted target data; decrypting the encrypted target data, and determining a target index node corresponding to the target data based on a keyword of the target data obtained after decryption; when it is determined that the target index node is not in a first cache unit and the first cache unit has reached a preset cache threshold, determining an adjustment index node in the first cache unit based on a preset policy; and deleting the adjustment index node from the first cache unit and writing the target index node into the first cache unit, so as to process the target data. Specifically, each time a data processing request is received and a target index node is to be accessed, an index node of the first cache unit with a low historical access frequency is deleted according to the preset policy, and the target index node is then written into the first cache unit.
Drawings
FIG. 1 is a flowchart of a data processing method provided by an embodiment of the present specification;
FIG. 2 is a schematic structural diagram of an index node in a data processing method provided by an embodiment of the present specification;
FIG. 3 is a schematic diagram of an application architecture of a data processing method provided by an embodiment of the present specification;
FIG. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present specification;
FIG. 5 is a structural block diagram of a computing device provided by an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many forms other than those described here, and those skilled in the art can make similar extensions without departing from its spirit and scope; this specification is therefore not limited to the specific embodiments disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the noun terms to which one or more embodiments of the present specification relate are explained.
TEE: trusted Execution Environment. By providing a secure execution environment that is isolated from the outside world, its internal code and data are protected from leakage or malicious tampering.
SGX: software Guard Extensions, a set of secure instruction sets provided by intel CPUs, are a technical implementation of TEE that can provide a secure, isolated execution environment for applications.
Enclave: and the SGX is a trusted memory area, and code and data existing in the area cannot be leaked or maliciously tampered.
B + tree: the node is a variant of a B tree, non-leaf nodes of the B tree only store key value information, a chain pointer is arranged between all leaf nodes, and data records are stored in the leaf nodes.
LRU strategy: the LRU algorithm means a least recently used algorithm, which means that the LRU considers data that has been used recently and the probability of being accessed in the future is high, and the data that has not been accessed recently means the probability of being accessed later is low.
To ensure data security, database queries can be served by a fully encrypted database. A fully encrypted database aims to eliminate the risk of data leakage while the database is running, to guarantee that data exists in encrypted form throughout its time on the server, and at the same time to retain complete database query capability.
Currently, to keep changes to traditional systems minimal, some encrypted databases support encrypted query capability by implementing the operation operators inside the TEE. The B+ tree serves as the default index of the encrypted database, and the keywords in its nodes are stored as ciphertext. When keywords must be compared during a query, the corresponding node is loaded into memory, the elements in the node and the keyword to be queried are placed one by one into the Enclave for decryption and comparison, and the comparison result is returned; this frequent Enclave interaction brings large extra overhead. Meanwhile, encryption at keyword granularity causes significant storage amplification and security risks: although the elements in each node are stored as ciphertext, a large amount of metadata must be maintained, and both the ordering relationships between nodes (the pointers in upper-level nodes are not encrypted) and the ordering of the elements within a node are revealed.
In view of this, in the present specification, a data processing method is provided, and the present specification simultaneously relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 shows a flowchart of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 102: receiving a data processing request, wherein the data processing request carries encrypted target data.
In specific implementation, the data processing method is an Enclave-Native B+ tree encrypted index method, and its index structure follows the logical structure of a B+ tree.
Specifically, the data processing request includes, but is not limited to, a data read request, a data write request, and the like; both read and write requests read or write data by accessing the target index node of the target data.
In practical applications, the data processing method is described in detail with a database engine as the executing entity. Receiving a data processing request can therefore be understood as the database engine receiving a data read/write request that carries encrypted target data, where the target data may be data of any format and any type, such as student grades or employee information; this specification imposes no limitation here.
The data processing request may be of any type, for example an SQL data processing request.
In the embodiments of this specification, before a data processing request is received, the data processing method stores data based on the index structure of an Enclave-Native B+ tree, which comprises a three-layer storage architecture: a first cache unit, a second cache unit, and a third storage unit. The first cache unit is a cache manager inside the Enclave that manages data transfer between unprotected host memory and protected Enclave memory at node granularity; each cache slot of the first cache unit contains one plaintext index node. The second cache unit is a cache manager in unprotected host memory that manages data transfer between host memory and external storage in units of data pages; each cache slot of the second cache unit contains one data page, and each data page contains several encrypted index nodes. The third storage unit is a persistent storage disk that stores the index data and allows the index to be reloaded and reused.
Specifically, the first cache unit and the second cache unit differ in the encryption state, location, and size of what they manage. For example, the first cache unit manages plaintext index nodes in the limited Enclave memory, while the second cache unit manages data pages in the large unprotected host memory. Otherwise, the first cache unit and the second cache unit have the same structure.
For example, suppose a piece of data is to be written: student No. 1 scored 90 in math. After the database engine receives the encrypted data, it decrypts it and then looks up the corresponding index node by the key, student No. 1. If the index is empty, an index node is created in the first cache unit as the root node and the data is inserted into it. As more and more data is written, the limited capacity of the first cache unit is exceeded, and some index nodes are encrypted and moved to data pages of the second cache unit; when the capacity of the second cache unit is exceeded, some data pages are moved to the third storage unit. Each data page of the second cache unit consists of several encrypted index nodes.
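As a rough illustration of this three-layer layout, the following C++ sketch models the tiers; all type names and the per-page node count are assumptions for illustration, not details fixed by the embodiments:

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr std::size_t NODES_PER_PAGE = 4;  // assumed page capacity (see the worked example below)

struct PlainNode {                         // tier 1: plaintext, lives only inside the Enclave
    std::uint64_t node_id;
    std::vector<std::uint8_t> payload;     // decrypted keywords and branches/values
};

struct EncryptedPage {                     // tier 2: one data page in unprotected host memory
    std::uint64_t page_id;
    std::array<std::vector<std::uint8_t>, NODES_PER_PAGE> sealed_nodes;  // ciphertext nodes
};

struct ThreeTierIndex {
    std::vector<PlainNode>     first_cache;   // first cache unit (in-Enclave cache manager)
    std::vector<EncryptedPage> second_cache;  // second cache unit (host-memory cache manager)
    // third storage unit: a persistent disk file of encrypted pages (not modeled here)
};
```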
Step 104: decrypting the encrypted target data, and determining the target index node corresponding to the target data based on the keyword of the target data obtained after decryption.
Specifically, after receiving the data processing request, the database engine loads the encrypted target data carried in the request into the Enclave created by SGX for decryption, and determines the target index node corresponding to the target data based on the keyword of the target data obtained after decryption.
For example, if the target data is "student No. 1 scored 90 in math", the keyword in the target data is "1", and the target index node corresponding to the target data is determined based on "1".
In specific implementation, the first cache unit comprises a cache pool for storing index nodes, a cache description layer for storing index node metadata, and a hash mapping layer.
Correspondingly, after determining the target index node corresponding to the target data based on the keyword of the target data obtained after decryption, the method further includes:
when it is determined, based on the hash mapping layer of the first cache unit, that the target index node is in the cache pool of the first cache unit, determining the location of the target index node in the cache description layer of the first cache unit;
and acquiring the metadata of the target index node based on its location in the cache description layer of the first cache unit, and processing the target data based on the metadata of the target index node.
The cache pool is a simple array; each slot of the cache pool stores one index node of the first cache unit. The cache description layer is an array that stores metadata; each slot in the cache description layer corresponds one-to-one with a slot in the cache pool and stores the metadata of that slot's index node. The metadata includes bidirectional pointers that maintain the logical order of the slots and form a doubly linked list, which supports the LRU replacement policy described below. The hash mapping layer is a hash table used to determine whether the index node to be accessed is in the cache pool. In addition, address translation is required between adjacent layers of the storage architecture (the first cache unit, the second cache unit, and the third storage unit) so that an upper-layer identifier can be converted into a lower-layer one: the node identifier of an index node is converted into the page identifier and offset of a data page, and the page identifier of a data page is converted into a physical address.
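A minimal C++ sketch of one such cache manager follows; it applies equally to the first and second cache units, and all identifiers are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Entry is a plaintext index node for the first cache unit,
// or an encrypted data page for the second cache unit.
template <typename Entry>
struct CacheManager {
    struct Descriptor {                 // one slot of the cache description layer
        std::uint64_t id = 0;           // node id or page id
        std::uint32_t access_count = 0; // historical access count
        bool          dirty = false;    // set when the entry is modified
    };
    std::vector<Entry>      pool;        // cache pool: one entry per slot
    std::vector<Descriptor> descriptors; // one-to-one with the pool slots
    std::list<std::size_t>  lru;         // slot indices, most recently used first
    std::unordered_map<std::uint64_t, std::size_t> slot_of;  // hash mapping layer: id -> slot

    bool contains(std::uint64_t id) const { return slot_of.count(id) != 0; }
};
```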
Specifically, when the hash mapping layer of the first cache unit shows that the target index node is in the cache pool of the first cache unit, the location of the target index node in the cache description layer of the first cache unit is determined through the hash mapping layer; the metadata of the target index node is then acquired from that location, and the target data is finally processed based on the metadata of the target index node.
Referring to fig. 2, fig. 2 is a schematic structural diagram illustrating an index node in a data processing method according to an embodiment of the present disclosure.
As can be seen from fig. 2, an index node in the first cache unit consists of metadata, keywords, and branches/values. The metadata includes the unique identifier of the current node, the parent node identifier, the previous node identifier, the next node identifier, the node type, and the number of keywords. Keyword: the key of the data. Branch: present in intermediate nodes; the identifier of the child node covering a given key range. Value: present in leaf nodes; the value corresponding to a keyword. Identifier: the index node id. Type: intermediate node or leaf node (the last layer).
In specific implementation, after the metadata of the target index node is acquired, for example its keys and values, the target data is read or written based on those keys and values.
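The node layout of fig. 2 can be sketched as follows; the field names and integer widths are assumptions, since the figure only names the categories:

```cpp
#include <cstdint>
#include <vector>

struct NodeMeta {
    std::uint64_t node_id;     // unique identifier of the current node
    std::uint64_t parent_id;   // parent node identifier
    std::uint64_t prev_id;     // previous node identifier
    std::uint64_t next_id;     // next node identifier
    bool          is_leaf;     // type: intermediate node or leaf node (last layer)
    std::uint16_t key_count;   // number of keywords
};

struct BTreeNode {
    NodeMeta meta;
    std::vector<std::uint64_t> keys;               // the keywords (keys of the data)
    std::vector<std::uint64_t> branches_or_values; // child node ids (intermediate) or values (leaf)
};
```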
In the embodiments of this specification, when it is determined that the target index node is in the first cache unit, the target index node can be found directly in the first cache unit, and the reading and writing of the target data can be performed based on it without any data interaction with the second cache unit and/or the third storage unit, which shortens the data processing flow.
Step 106: when it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold, determining an adjustment index node in the first cache unit based on a preset policy.
For example, if the first cache unit is 4 KB, the preset cache threshold may be set to 4 KB. If it is determined that the target index node is not in the first cache unit and the first cache unit has reached 4 KB, then in order to ensure that the target index node can be written into the first cache unit for data processing, an adjustment index node in the first cache unit must be determined based on the preset policy, so that it can subsequently be deleted, making room for the target index node.
Specifically, determining the adjustment index node in the first cache unit based on the preset policy includes:
determining the adjustment index node in the first cache unit based on the historical access records of the index nodes in the first cache unit.
The historical access records include, but are not limited to, historical access counts, historical access times, and the like; and the preset policy may be an LRU replacement policy.
Taking historical access counts as an example: the adjustment index node in the first cache unit is determined based on the historical access count of each index node in the first cache unit; alternatively, it is determined based on the historical access time of each index node. For example, the index node with the fewest historical accesses in the first cache unit is used as the adjustment index node, or the index node whose most recent access is longest ago is used as the adjustment index node, and so on.
In the embodiments of this specification, index nodes in the first cache unit that have few historical accesses or have not been accessed for a long time are deleted as adjustment index nodes to make writing space for the target index node. In this way, the index nodes remaining in the first cache unit are all frequently accessed ones, so when the next data access request is received, the corresponding target index node is very likely already in the first cache unit and can be accessed from it directly, which avoids frequent interaction with the other layers. A brief sketch of this victim selection follows.
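Assuming the doubly linked LRU list described above (a hypothetical list of slot indices with the most recently used slot at the front), victim selection and the access-time update reduce to:

```cpp
#include <cstddef>
#include <list>

// The adjustment (victim) slot is the one at the cold end of the list.
std::size_t pick_adjustment_slot(const std::list<std::size_t>& lru) {
    return lru.back();             // least recently used slot
}

// On every access, move the slot to the hot end of the list.
void mark_used(std::list<std::size_t>& lru, std::size_t slot) {
    lru.remove(slot);              // O(n) for clarity; real code would keep iterators
    lru.push_front(slot);          // most recently used at the front
}
```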
Step 108: deleting the adjustment index node from the first cache unit, and writing the target index node into the first cache unit, so as to process the target data.
Specifically, after the adjustment index node in the first cache unit is determined, it is deleted from the first cache unit, and the target index node is then written into the first cache unit, so that the reading or writing of the target data can be performed based on the target index node.
In specific implementation, writing the target index node into the first cache unit includes:
when it is determined that the target index node is in a second cache unit, determining the page identifier and offset of the target index node's data page in the second cache unit based on the node identifier of the target index node;
and acquiring the encrypted target index node from the second cache unit based on the page identifier and offset of the data page, decrypting it, and writing it into the first cache unit.
Specifically, each data page in the second cache unit contains several index nodes of the first cache unit, so when an index node of the first cache unit is mapped to a data page of the second cache unit, the offset of the index node within the data page must be considered. For example, if each data page in the second cache unit contains 4 index nodes of the first cache unit, and the node identifier (nodeid) of the target index node in the first cache unit is 10, then 10/4 rounded down gives a page identifier (pageid) of 2 in the second cache unit. The offset is the position of the target index node within its corresponding data page.
In practical applications, offsets and identifiers are all numbered from 0. The data page of the second cache unit with page identifier 0 contains the first cache unit's index nodes 0, 1, 2, and 3; the data page with page identifier 1 contains index nodes 4, 5, 6, and 7; and the data page with page identifier 2 contains index nodes 8, 9, 10, and 11. Therefore, when the node identifier of the target index node in the first cache unit is 10, the page identifier of its data page in the second cache unit is 2, and its offset within that page is 2. That is, the page with pageid 2 contains node ids 8, 9, 10, and 11, whose offsets are 0, 1, 2, and 3, respectively. The pageid and offset are calculated from the nodeid, which tells at which position of which data page in the second cache unit the node is located.
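This address translation is plain integer arithmetic; a sketch, assuming the 4-nodes-per-page layout of the example:

```cpp
#include <cstdint>

constexpr std::uint64_t NODES_PER_PAGE = 4;  // assumed, per the example above

struct PageAddr { std::uint64_t page_id; std::uint64_t offset; };

PageAddr translate(std::uint64_t node_id) {
    return { node_id / NODES_PER_PAGE,    // integer division rounds down
             node_id % NODES_PER_PAGE };  // position within the page
}
// translate(10) yields {2, 2}, matching the worked example:
// page 2 holds nodes 8, 9, 10, 11 at offsets 0, 1, 2, 3.
```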
In specific implementation, if it is determined, based on the node identifier of the target index node, that the target index node is in the second cache unit, the page identifier and offset of the corresponding data page in the second cache unit are calculated from the node identifier; the encrypted target index node is then acquired from the second cache unit based on that page identifier and offset; and finally the encrypted target index node is decrypted and loaded into the first cache unit.
In the embodiments of this specification, every index node to be accessed must be brought into the first cache unit and decrypted into plaintext before it can be accessed. If the target index node is in the first cache unit, it can be accessed directly; if it is in the second cache unit, it must be loaded from the second cache unit into the first cache unit based on its node identifier and decrypted into plaintext for access. The limited Enclave memory is thus fully utilized for index node access, saving interaction overhead.
In specific implementation, the second cache unit comprises a cache pool for storing data pages, a cache description layer for storing data page metadata, and a hash mapping layer.
Correspondingly, determining the page identifier and offset of the target index node's data page in the second cache unit based on the node identifier of the target index node includes:
determining the page identifier and offset of the target index node's data page in the cache pool of the second cache unit based on the node identifier of the target index node and the hash mapping layer of the second cache unit.
In practical applications, the first cache unit and the second cache unit have the same structure and each comprise a cache pool, a cache description layer, and a hash mapping layer; the difference is in the content they store. Specifically, in the second cache unit, the cache pool is likewise a simple array, and each slot of the cache pool stores one data page of the second cache unit. The cache description layer is an array that stores metadata; each slot in it corresponds one-to-one with a slot in the cache pool and stores the metadata of the corresponding data page. The metadata includes bidirectional pointers that maintain the logical order of the slots and form a doubly linked list, which supports the LRU replacement policy described above. The hash mapping layer is a hash table used to determine whether the data page to be accessed is in the cache pool. In addition, address translation is required between adjacent layers of the storage architecture (the first cache unit, the second cache unit, and the third storage unit): the node identifier of an index node is converted into the page identifier and offset of a data page, and the page identifier of a data page is converted into a physical address.
Specifically, when it is determined, based on the node identifier of the target index node, that the target index node is in the cache pool of the second cache unit, the page identifier and offset of the target index node's data page in the cache pool of the second cache unit are determined based on the node identifier of the target index node and the hash mapping layer of the second cache unit.
In the embodiments of this specification, because the hash mapping layer of the second cache unit records the address translation information between the first cache unit and the second cache unit, the page identifier and offset of the data page corresponding to the node identifier of the target index node can be obtained quickly and accurately from the hash mapping layer of the second cache unit and the node identifier of the target index node.
In addition, acquiring the encrypted target index node from the second cache unit based on the page identifier and offset of the data page, decrypting it, and writing it into the first cache unit includes:
acquiring the encrypted target index node from the cache pool of the second cache unit based on the page identifier and offset of the data page;
determining the location of the encrypted target index node in the cache description layer of the second cache unit;
acquiring the metadata of the encrypted target index node based on its location in the cache description layer of the second cache unit;
and decrypting the encrypted target index node, and writing the decrypted target index node into the first cache unit based on its decrypted metadata.
Since the second cache unit is untrusted memory, all index nodes stored in it are encrypted.
In specific implementation, after the page identifier and offset of the target index node's data page in the cache pool of the second cache unit are acquired, the encrypted target index node is obtained from the cache pool based on them, and its location in the cache description layer of the second cache unit is determined through the correspondence between the cache pool and the cache description layer. The metadata of the encrypted target index node is then acquired from that location. Finally, the encrypted target index node and its metadata are decrypted, and the decrypted target index node is written into the first cache unit based on its decrypted metadata (such as the parent node identifier and the keys and values in the target index node); that is, the decrypted target index node is written into the first cache unit under its parent node according to the parent node identifier in the decrypted metadata.
In the embodiments of this specification, when the target index node is found in the second cache unit, it is acquired from a data page of the second cache unit and then decrypted and loaded into the first cache unit, so that it is accessed as plaintext in the first cache unit, which avoids the operational complexity of accessing an encrypted node.
In another embodiment of this specification, writing the target index node into the first cache unit includes:
when it is determined that the target index node is in a third storage unit, determining, from the third storage unit, the data page containing the encrypted target index node;
writing the data page containing the encrypted target index node into the second cache unit;
when it is determined that the target index node is in the second cache unit, determining the page identifier and offset of the target index node's data page in the second cache unit based on the node identifier of the target index node;
and acquiring the encrypted target index node from the second cache unit based on the page identifier and offset of the data page, decrypting it, and writing it into the first cache unit.
In practical applications, the third storage unit is a persistent storage disk that stores encrypted index data in data pages. When it is determined, based on the node identifier of the target index node, that the target index node is in the third storage unit, the data page containing the encrypted target index node is first located in the third storage unit and written into the second cache unit; the page identifier and offset of the target index node's data page in the second cache unit are then determined from its node identifier; and finally the encrypted target index node is acquired from the second cache unit based on that page identifier and offset, decrypted, and loaded into the first cache unit.
In specific implementation, if the target index node to be accessed is in the third storage unit, the data page where it resides must first be loaded into the second cache unit, and the target index node is then loaded from the second cache unit into the first cache unit, so that the plaintext target index node can be obtained and accessed.
In practical applications, when a data processing request is received and the target index node is to be accessed, there are three cases. First: the target index node is already stored in the first cache unit; the location of its metadata in the cache description layer is obtained directly through the hash mapping layer of the first cache unit, and the metadata of the target index node is updated, for example by adding 1 to its historical access count or updating its value data based on the target data in the data processing request. Second: the target index node is in the second cache unit but not in the first cache unit; a new slot is allocated in the first cache unit to hold the target index node, the page identifier and offset of the data page where it resides are calculated from its node identifier, the target index node is found in the second cache unit through them, decrypted, and written into the newly allocated slot of the first cache unit, the metadata of the corresponding data page in the second cache unit is updated, and the first case is then executed to access the target index node. Third: the target index node is in the third storage unit and in neither the second cache unit nor the first cache unit; a new slot is allocated in the second cache unit to hold the data page containing the target index node, the data page is loaded from the third storage unit into that slot, and the second case is then executed to access the target index node. A sketch of this flow follows.
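The following self-contained C++ sketch traces the three cases; ocall_load_page and decrypt_node are placeholder stand-ins for the OCall and the in-Enclave decryption the text mentions, not SGX SDK functions, and the page capacity is assumed:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
constexpr std::uint64_t NODES_PER_PAGE = 4;            // assumed page capacity

// Placeholder bodies so the sketch compiles; real code would cross the
// Enclave boundary and run authenticated decryption here.
Bytes ocall_load_page(std::uint64_t) { return Bytes(4096, 0); }
Bytes decrypt_node(const Bytes&, std::uint64_t) { return Bytes(); }

struct Caches {
    std::unordered_map<std::uint64_t, Bytes> ebuffer;  // node_id -> plaintext node
    std::unordered_map<std::uint64_t, Bytes> mbuffer;  // page_id -> encrypted page
};

const Bytes& access_node(Caches& c, std::uint64_t node_id) {
    if (auto it = c.ebuffer.find(node_id); it != c.ebuffer.end())
        return it->second;                             // case 1: hit in the first cache unit
    const std::uint64_t page_id = node_id / NODES_PER_PAGE;
    const std::uint64_t offset  = node_id % NODES_PER_PAGE;
    if (c.mbuffer.find(page_id) == c.mbuffer.end())    // case 3: page only on disk,
        c.mbuffer[page_id] = ocall_load_page(page_id); //   load it into the second cache unit
    // case 2: decrypt the node out of its page into the first cache unit
    c.ebuffer[node_id] = decrypt_node(c.mbuffer[page_id], offset);
    return c.ebuffer[node_id];
}
```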
In practical applications, the main execution logic of the first cache unit, the second cache unit, and the third storage unit resides inside the Enclave, which reduces the interaction overhead between the Enclave and other untrusted memory. During an index node query, the tree is traversed from the B+ tree root node (in the first cache unit) down to the corresponding leaf node (possibly in the third storage unit), and whenever an index node is not in the Enclave, it must be loaded from the second cache unit or the third storage unit through an OCall.
In one embodiment, an index node may be evicted from the first cache unit to make room for other index nodes. The evicted index node is the adjustment index node described above, and when the adjustment index node is removed from the first cache unit, it must be written back to the second cache unit to avoid losing it. The specific implementation is as follows:
after deleting the adjustment index node from the first cache unit, the method further includes:
determining whether the adjustment index node has been modified;
if so, when the adjustment index node has a corresponding data page in the second cache unit, encrypting the adjustment index node, writing it back to that data page, and updating the historical access count of that data page;
and if not, updating the historical access count of the data page in the second cache unit corresponding to the adjustment index node.
Specifically, because the space of the first cache unit is limited, when all slots of its cache pool are full and the target index node is not in the first cache unit, an index node that is not currently being accessed is selected as the adjustment index node through the LRU policy and deleted from the first cache unit. If the adjustment index node has not been modified, the historical access count of its data page in the second cache unit is updated; for example, the adjustment index node is simply deleted from the first cache unit and the access count of the data page where it resides in the second cache unit is decremented by 1. If the adjustment index node has been modified, then, provided its corresponding data page is in the second cache unit, the adjustment index node is encrypted and written back to that data page, and the historical access count of the data page is updated. A sketch of this rule follows.
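A sketch of this clean-versus-dirty eviction rule; encrypt_node is a placeholder for the in-Enclave sealing routine, and the page and descriptor shapes are assumptions:

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct PageDesc {                          // metadata of the victim's data page
    std::uint32_t access_count = 0;
    bool          updated = false;
};

// Placeholder body; real code would run authenticated encryption here.
Bytes encrypt_node(const Bytes& plain) { return plain; }

// A clean victim only decrements its page's access count; a dirty victim
// is re-encrypted into its page slot first and the page is marked updated.
void on_evict(bool modified, const Bytes& node_plain,
              Bytes& node_slot_in_page, PageDesc& desc) {
    if (modified) {
        node_slot_in_page = encrypt_node(node_plain);  // write the ciphertext back
        desc.updated = true;
    }
    if (desc.access_count > 0) --desc.access_count;
}
```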
If the data page corresponding to the adjustment index node is not in the second cache unit, there are two cases: either the data page is in the third storage unit, or the data page does not exist yet and a new one must be created. The specific implementations are as follows:
after determining that the adjustment index node has been modified, the method further includes:
when the adjustment index node is a newly added index node and does not have a corresponding data page in the second cache unit, loading the corresponding data page, acquired from the third storage unit, into the second cache unit;
and encrypting the adjustment index node, writing it back to that data page of the second cache unit, and updating the historical access count of the data page.
Specifically, when it is determined that the adjustment index node is a newly added index node but its corresponding data page is not in the second cache unit and is in the third storage unit, the data page corresponding to the adjustment index node is acquired from the third storage unit and written into the second cache unit, and the adjustment index node is then encrypted and written back to that data page of the second cache unit.
In practical applications, the page identifier of the corresponding data page can be calculated from the node identifier of the adjustment index node and then compared with the largest page identifier in the third storage unit. If the page identifier of the data page is less than or equal to the largest page identifier, the data page is in the third storage unit; if it is greater than the largest page identifier, the data page is not in the third storage unit.
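This comparison is a one-line test; a sketch, where max_page_id is the hypothetical largest page identifier currently in the third storage unit:

```cpp
#include <cstdint>

// A page whose identifier does not exceed the largest one on disk
// must already exist in the third storage unit.
bool page_on_disk(std::uint64_t page_id, std::uint64_t max_page_id) {
    return page_id <= max_page_id;
}
```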
In the embodiments of this specification, after the adjustment index node is deleted from the first cache unit, in order to avoid losing it, if its data page is not present in the second cache unit, the data page is loaded from the third storage unit, and the adjustment index node is then encrypted and written back to that data page of the second cache unit, so that the index node in the data page is updated based on the adjustment index node.
In addition, after determining that the adjustment index node has been modified, the method further includes:
when the adjustment index node is a newly added index node and its corresponding data page is in neither the second cache unit nor the third storage unit, creating a new data page for the adjustment index node in the second cache unit;
and encrypting the adjustment index node, writing it into the newly created data page, and updating the historical access count of the newly created data page.
In specific implementation, based on the above method, when it is determined that the adjustment index node is a newly added index node but its corresponding data page exists in neither the second cache unit nor the third storage unit, a data page corresponding to the adjustment index node is newly created in the second cache unit, the adjustment index node is encrypted and written into the newly created data page, and the historical access count of the newly created data page is updated, for example from 0 to 1.
In practical applications, if the adjustment index node has been modified, or if it is a newly added index node, it must be encrypted and written back to a data page of the second cache unit to avoid losing it. For example, if the index node already exists and its corresponding data page has remained in the second cache unit, the index node simply needs to be written back to that data page and the data page marked as updated. If the index node is a new node that has never been written back to a data page of the second cache unit, there are two cases: if the data page where the index node resides is in the second cache unit, the index node is written back to it directly and the data page is marked as updated; if the data page is not in the second cache unit but is in the third storage unit, the data page must first be loaded from the third storage unit into the second cache unit, so that the other nodes on that page are not lost. When the data page is in neither the second cache unit nor the third storage unit, a new data page is created, and the logic of the first case is then executed to write the index node back to a data page of the second cache unit. The sketch below traces these write-back cases.
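A self-contained sketch of the write-back decision tree, with placeholder disk and sealing routines (none of these names are real APIs):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
constexpr std::uint64_t NODES_PER_PAGE = 4;            // assumed page capacity

// Placeholder bodies so the sketch compiles.
std::uint64_t max_page_id_on_disk() { return 0; }
Bytes load_page_from_disk(std::uint64_t) { return Bytes(4096, 0); }
void  store_encrypted(Bytes&, std::uint64_t, const Bytes&) {}

void write_back(std::unordered_map<std::uint64_t, Bytes>& mbuffer,
                std::uint64_t node_id, const Bytes& plain_node) {
    const std::uint64_t page_id = node_id / NODES_PER_PAGE;
    const std::uint64_t offset  = node_id % NODES_PER_PAGE;
    if (mbuffer.find(page_id) == mbuffer.end()) {
        if (page_id <= max_page_id_on_disk())                 // page exists on disk:
            mbuffer[page_id] = load_page_from_disk(page_id);  //   load it first
        else                                                  // brand-new page:
            mbuffer[page_id] = Bytes(4096, 0);                //   create it in the MBuffer
    }
    store_encrypted(mbuffer[page_id], offset, plain_node);    // re-encrypt the node into the page
}
```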
In the embodiments of this specification, each time a data processing request is received and a target index node is to be accessed, the data processing method deletes an index node of the first cache unit with a low historical access frequency according to a preset policy, and then writes the target index node into the first cache unit.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a structure of an encrypted database according to an embodiment of the present disclosure.
Specifically, the database adopts an Enclave-Native B+ tree encrypted index, which comprises a three-layer storage architecture of an EBuffer layer, an MBuffer layer, and an External storage layer. The EBuffer layer includes the first cache unit, and each slot of the first cache unit contains one plaintext index node. Specifically,
the encrypted database receives a data processing request carrying encrypted target data, decrypts the encrypted target data, and determines the target index node corresponding to the target data based on the keyword of the target data obtained after decryption; when it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold, it determines an adjustment index node in the first cache unit based on a preset policy; and it deletes the adjustment index node from the first cache unit and writes the target index node into the first cache unit, so as to process the target data.
In addition, the B+ tree encrypted index further comprises an MBuffer layer. The MBuffer layer includes the second cache unit; each slot of the second cache unit contains one data page, and each data page consists of several encrypted index nodes from the EBuffer layer, where,
when it is determined that the target index node is in the second cache unit, the encrypted database determines the page identifier and offset of the target index node's data page in the second cache unit based on the node identifier of the target index node; and it acquires the encrypted target index node from the second cache unit based on the page identifier and offset of the data page, decrypts it, and writes it into the first cache unit.
Optionally, the B+ tree encrypted index further comprises an External storage layer. The External storage layer includes the third storage unit, which holds data pages from the MBuffer layer, where,
when it is determined that the target index node is in the third storage unit, the encrypted database determines, from the third storage unit, the data page containing the encrypted target index node and writes that data page into the second cache unit; when it is determined that the target index node is in the second cache unit, it determines the page identifier and offset of the target index node's data page in the second cache unit based on the node identifier of the target index node; and it acquires the encrypted target index node from the second cache unit based on the page identifier and offset of the data page, decrypts it, and writes it into the first cache unit.
In practical applications, the EBuffer layer is the Enclave cache manager, configured to manage data transfer between unprotected host memory and protected Enclave memory at index node granularity. The EBuffer layer includes the first cache unit, and each slot of the first cache unit contains one plaintext index node.
The MBuffer layer is the cache manager of the unprotected host memory, configured to manage data transfer between host memory and the External storage layer in units of data pages. The MBuffer layer includes the second cache unit; each slot of the second cache unit contains one data page, and each data page consists of several encrypted index nodes from the EBuffer layer.
The External storage layer is a persistent storage disk configured to store the index data of the MBuffer layer's data pages; after a reload, the index data can be fetched from the External storage layer and used again. The External storage layer includes the third storage unit, which holds the data pages of the MBuffer layer.
In specific implementation, the data processing method is applied to the encrypted database, and the Enclave-Native B+ tree encrypted index of the encrypted database adopts a three-layer storage architecture: an EBuffer layer containing the first cache unit, an MBuffer layer containing the second cache unit, and an External storage layer containing the third storage unit. EBuffer layer: the cache manager inside the Enclave, which manages data transfer between unprotected host memory and protected Enclave memory at index node granularity; each of its slots contains one plaintext index node. MBuffer layer: the cache manager in the unprotected host memory, which manages data transfer between host memory and external storage in units of data pages; each of its slots contains one data page, and each data page consists of several encrypted index nodes. External storage layer: a persistent storage disk that stores the index data and allows the index to be reloaded and reused.
The EBuffer layer and the MBuffer layer differ in encryption state, location, and size. For example, the EBuffer layer manages plaintext index nodes in the limited Enclave memory, while the MBuffer layer manages encrypted data pages in the large host memory. Otherwise, the two cache managers (the EBuffer layer and the MBuffer layer) have the same structure. Each cache manager consists of three components:
buffer pool (buffer pool): the cache pool is a simple array, each slot stores an index node in the EBuffer layer, and stores a data page in the MBuffer layer;
buffer description layers (buffers): the cache description layer is an array storing metadata, each slot corresponds to a slot of the buffer pool one-to-one, and stores metadata of its corresponding inode/data page. The metadata comprises a bidirectional pointer to maintain the logic sequence of each slot and form a bidirectional linked list, thereby supporting the LRU replacement strategy;
hash map layer (Hash map): the hash mapping layer is a hash table and is used for judging whether the index node/data page to be accessed is in the cache pool or not.
Furthermore, address translation is required between adjacent layers of the storage architecture in order to translate an upper-layer representation into a lower-layer one: a node's nodeID is translated into the pageID and offset of the data page that contains it, and the pageID is in turn translated into a physical address.
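One way this could be realized, assuming nodes are packed into pages in nodeID order, is the arithmetic below; NODES_PER_PAGE and the byte layout are illustrative assumptions, since the specification only states that the two translations exist.

```cpp
#include <cstdint>

constexpr std::uint64_t NODES_PER_PAGE = 8;    // assumed packing
constexpr std::uint64_t PAGE_BYTES     = 4096; // assumed page size

struct PageAddr {
    std::uint64_t pageId;
    std::uint64_t offset; // byte offset of the node within its page
};

// nodeID -> (pageID, offset): division picks the page, the remainder picks
// the node's slot, and the slot is scaled to a byte offset.
inline PageAddr nodeToPage(std::uint64_t nodeId) {
    return { nodeId / NODES_PER_PAGE,
             (nodeId % NODES_PER_PAGE) * (PAGE_BYTES / NODES_PER_PAGE) };
}

// pageID -> physical address: e.g., a byte offset into the backing file.
inline std::uint64_t pageToPhysical(std::uint64_t pageId) {
    return pageId * PAGE_BYTES;
}
```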
The main execution logic of the B+ tree, the EBuffer and the MBuffer is implemented entirely inside the Enclave, which reduces interaction between the Enclave and the unprotected host and thus saves interaction overhead. When an index query is performed, the tree is traversed from the root node of the B+ tree down to the corresponding leaf node.
Specifically, when the Database Engine receives a data query request, it executes the request and accesses the index node corresponding to the key of the target data carried in the request; a sketch of this lookup flow follows the list below.
If the index node to be accessed is already in the EBuffer layer: the position of the node's metadata in the buffer description layer is computed directly via the hash map layer, and the node's metadata is updated.
If the node is in the MBuffer layer but not in the EBuffer layer: a new slot is allocated in the EBuffer layer to hold the node, and the pageID of the data page containing the node and the node's offset within that page are computed; using this information the node is located in the MBuffer layer, decrypted, and written into the allocated EBuffer slot; after the metadata of the node's data page in the MBuffer layer is updated, the preceding step is executed to complete the access.
If the node is in the External storage layer but in neither the MBuffer layer nor the EBuffer layer: a new slot is allocated in the MBuffer layer to hold the data page containing the required node, the page is loaded from the External storage layer into the MBuffer, and the preceding steps are then executed to complete the access.
Finally, the query of the target data is completed through the access of the index node, and the query result is returned.
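The three cases might be combined as in the following self-contained sketch, in which the layers are modeled as plain hash maps and decrypt_node is a placeholder for real decryption; none of these names come from this specification, and LRU bookkeeping is omitted for brevity.

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>

using Node = std::array<std::uint8_t, 64>; // stand-in for an index node

constexpr std::uint64_t NODES_PER_PAGE = 8; // assumed nodeID -> pageID packing

static Node decrypt_node(Node enc) {
    for (auto& b : enc) b ^= 0x5A; // placeholder for real authenticated decryption
    return enc;
}

// A page of encrypted nodes: slot within the page -> encrypted node.
using Page = std::unordered_map<std::uint64_t, Node>;

struct Layers {
    std::unordered_map<std::uint64_t, Node> ebuffer; // EBuffer: nodeID -> plaintext node
    std::unordered_map<std::uint64_t, Page> mbuffer; // MBuffer: pageID -> encrypted page
    std::unordered_map<std::uint64_t, Page> disk;    // External storage layer
};

Node& access_node(std::uint64_t nodeId, Layers& L) {
    // Case 1: hit in the EBuffer layer - return the plaintext node directly.
    if (auto it = L.ebuffer.find(nodeId); it != L.ebuffer.end()) return it->second;

    // Address translation: nodeID -> (pageID, slot within the page).
    std::uint64_t pageId = nodeId / NODES_PER_PAGE;
    std::uint64_t slot   = nodeId % NODES_PER_PAGE;

    // Case 3: page not in the MBuffer layer - load it from external storage.
    if (L.mbuffer.count(pageId) == 0) L.mbuffer[pageId] = L.disk.at(pageId);

    // Case 2: decrypt the node out of its MBuffer page into the EBuffer layer.
    return L.ebuffer[nodeId] = decrypt_node(L.mbuffer[pageId].at(slot));
}
```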
The encrypted database provided by the embodiment of this specification reduces the Enclave interaction overhead during operation execution by using an Enclave-native B+ tree encryption index, and encrypts and decrypts at the granularity of an index node, so that security is ensured while space overhead is reduced; the metadata of the index nodes is also encrypted, which largely prevents leakage of index-node metadata. Meanwhile, the design of the cache manager inside the Enclave (the first cache unit) reduces, to a certain extent, the leakage of index-node query path information: multi-layer memory cache management is supported, the limited Enclave memory is fully utilized to store frequently accessed index nodes, and leakage of node query path information is avoided.
Corresponding to the above method embodiment, this specification further provides an embodiment of a data processing apparatus, and FIG. 4 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in FIG. 4, the apparatus includes:
a request receiving module 402, configured to receive a data processing request, where the data processing request carries encrypted target data;
a target index node obtaining module 404, configured to decrypt the encrypted target data, and determine a target index node corresponding to the target data based on the keyword of the target data obtained after decryption;
an adjustment index node determining module 406, configured to determine, based on a preset policy, an adjustment index node in a first cache unit when it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold;
a first data processing module 408, configured to delete the adjustment index node in the first cache unit and write the target index node into the first cache unit, so as to implement data processing on the target data.
Optionally, the first cache unit includes a cache pool for storing the index nodes, a cache description layer for storing the index node metadata, and a hash mapping layer;
accordingly, the apparatus further comprises:
a second data processing module configured to:
in the case where it is determined that the target index node exists in the cache pool of the first cache unit, determining the position of the target index node in the cache description layer of the first cache unit based on the hash mapping layer of the first cache unit;
and acquiring the metadata of the target index node based on that position in the cache description layer of the first cache unit, and implementing data processing on the target data based on the metadata of the target index node.
Optionally, the first data processing module 408 is further configured to:
in the case where it is determined that the target index node exists in a second cache unit, determining the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node;
and acquiring the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypting it, and writing the decrypted target index node into the first cache unit.
Optionally, the second cache unit includes a cache pool for storing the data page, a cache description layer for storing metadata of the data page, and a hash mapping layer;
accordingly, the first data processing module 408 is further configured to:
determining the page identifier and the offset of the target index node's data page in the cache pool of the second cache unit based on the node identifier of the target index node and the hash mapping layer of the second cache unit.
Optionally, the first data processing module 408 is further configured to:
acquiring the encrypted target index node from the cache pool of the second cache unit based on the page identifier and the offset of the data page;
determining the position of the encrypted target index node in the cache description layer of the second cache unit;
acquiring the encrypted metadata of the target index node based on that position in the cache description layer of the second cache unit;
and decrypting the encrypted target index node, and writing the decrypted target index node into the first cache unit based on the decrypted metadata of the target index node.
Optionally, the first data processing module 408 is further configured to:
in the case where it is determined that the target index node exists in a third storage unit, determining, from the third storage unit, the data page containing the encrypted target index node;
writing the data page containing the encrypted target index node into the second cache unit;
in the case where it is determined that the target index node exists in the second cache unit, determining the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node;
and acquiring the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypting it, and writing the decrypted target index node into the first cache unit.
Optionally, the apparatus further includes:
a determination module configured to:
determining whether the adjustment index node has been modified (a sketch of this decision follows),
if so, in the case where a corresponding data page of the adjustment index node exists in the second cache unit, encrypting the adjustment index node, writing it back to that data page, and updating the historical access count of that data page,
and if not, only updating the historical access count of the data page of the second cache unit corresponding to the adjustment index node.
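A hedged sketch of this write-back decision, also covering the newly-added-node cases of the two following modules, is given below; every type and member function in it (PageCache, DiskStore, Crypto, bump_access_count, and so on) is an assumed stand-in, not an interface defined by this specification.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint64_t NODES_PER_PAGE = 8; // assumed nodeID -> pageID mapping

struct EvictedNode {
    std::uint64_t nodeId;
    std::array<std::uint8_t, 64> bytes; // plaintext node content
    bool modified;                      // changed while in the first cache unit?
};

template <typename PageCache, typename DiskStore, typename Crypto>
void write_back(const EvictedNode& n, PageCache& mbuf, DiskStore& disk,
                Crypto& crypto) {
    std::uint64_t pageId = n.nodeId / NODES_PER_PAGE;

    if (!n.modified) {
        // Unmodified node: nothing to write back; only refresh the historical
        // access count of the corresponding page in the second cache unit.
        mbuf.bump_access_count(pageId);
        return;
    }
    if (!mbuf.contains(pageId)) {
        if (disk.contains(pageId)) {
            // Page exists in the third storage unit but not in the second
            // cache unit: load it into the second cache unit first.
            mbuf.load_page(pageId, disk.read_page(pageId));
        } else {
            // Newly added node with no page anywhere yet: create a new data
            // page in the second cache unit.
            mbuf.new_page(pageId);
        }
    }
    // Encrypt the node, write it back into its page, and update the page's
    // historical access count.
    mbuf.write_node(pageId, n.nodeId, crypto.encrypt(n.bytes));
    mbuf.bump_access_count(pageId);
}
```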
Optionally, the apparatus further includes:
a first node write back module configured to:
in the case where the adjustment index node is a newly added index node and its corresponding data page is not in the second cache unit, loading that data page, acquired from the third storage unit, into the second cache unit;
and encrypting the adjustment index node, writing it back to the corresponding data page of the second cache unit, and updating the historical access count of that data page.
Optionally, the apparatus further includes:
a second node write back module configured to:
in the case where the adjustment index node is a newly added index node and its corresponding data page exists in neither the second cache unit nor the third storage unit, creating a new data page corresponding to the adjustment index node in the second cache unit;
and encrypting the adjustment index node, writing it into the newly created data page, and updating the historical access count of the newly created data page.
Optionally, the adjustment index node determining module 406 is further configured to:
determining the adjustment index node in the first cache unit based on the historical access records of the index nodes in the first cache unit.
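As one illustration of such a policy, the sketch below picks the index node with the lowest historical access count as the adjustment index node; the function name and the linear scan are assumptions, and a pure LRU policy would instead take the tail of the doubly linked descriptor list in O(1).

```cpp
#include <cstdint>
#include <unordered_map>

// Pick the adjustment index node from the first cache unit's historical
// access records; assumes the cache is non-empty.
std::uint64_t pick_adjustment_node(
        const std::unordered_map<std::uint64_t, std::uint64_t>& accessCount) {
    std::uint64_t victim = 0;
    std::uint64_t best = UINT64_MAX;
    for (const auto& [nodeId, count] : accessCount) {
        if (count < best) { best = count; victim = nodeId; }
    }
    return victim;
}
```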
With the data processing apparatus provided in this embodiment of the specification, each time a data processing request is received, an index node of the first cache unit with a low historical access count is selected by the preset policy and deleted, and the target index node is then written into the first cache unit, so that frequently accessed index nodes remain cached in the limited Enclave memory.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.
FIG. 5 illustrates a block diagram of a computing device 500 provided in accordance with one embodiment of the present description. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes an access device 540 that enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 5 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
The processor 520 is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the data processing method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification also provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (16)

1. A method of data processing, comprising:
receiving a data processing request, wherein the data processing request carries encrypted target data;
decrypting the encrypted target data, and determining a target index node corresponding to the target data based on the keyword of the target data obtained after decryption;
determining an adjustment index node in a first cache unit based on a preset policy in the case where it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold;
and deleting the adjustment index node in the first cache unit, and writing the target index node into the first cache unit, to implement data processing on the target data.
2. The data processing method according to claim 1, wherein the first cache unit comprises a cache pool storing index nodes, a cache description layer storing index node metadata, and a hash mapping layer;
correspondingly, after determining the target index node corresponding to the target data based on the keyword of the target data obtained after decryption, the method further comprises:
in the case where it is determined that the target index node exists in the cache pool of the first cache unit, determining the position of the target index node in the cache description layer of the first cache unit based on the hash mapping layer of the first cache unit;
and acquiring the metadata of the target index node based on that position in the cache description layer of the first cache unit, and implementing data processing on the target data based on the metadata of the target index node.
3. The data processing method according to claim 1, wherein the writing the target index node into the first cache unit comprises:
in the case where it is determined that the target index node exists in a second cache unit, determining the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node;
and acquiring the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypting it, and writing the decrypted target index node into the first cache unit.
4. The data processing method according to claim 3, wherein the second cache unit comprises a cache pool storing data pages, a cache description layer storing data page metadata, and a hash mapping layer;
correspondingly, the determining the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node comprises:
determining the page identifier and the offset of the target index node's data page in the cache pool of the second cache unit based on the node identifier of the target index node and the hash mapping layer of the second cache unit.
5. The data processing method according to claim 4, wherein the acquiring the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypting it, and writing the decrypted target index node into the first cache unit comprises:
acquiring the encrypted target index node from the cache pool of the second cache unit based on the page identifier and the offset of the data page;
determining the position of the encrypted target index node in the cache description layer of the second cache unit;
acquiring the encrypted metadata of the target index node based on that position in the cache description layer of the second cache unit;
and decrypting the encrypted target index node, and writing the decrypted target index node into the first cache unit based on the decrypted metadata of the target index node.
6. The data processing method according to claim 1, wherein the writing the target index node into the first cache unit comprises:
in the case where it is determined that the target index node exists in a third storage unit, determining, from the third storage unit, the data page containing the encrypted target index node;
writing the data page containing the encrypted target index node into the second cache unit;
in the case where it is determined that the target index node exists in the second cache unit, determining the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node;
and acquiring the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypting it, and writing the decrypted target index node into the first cache unit.
7. The data processing method according to claim 6, further comprising, after deleting the adjustment index node in the first cache unit:
determining whether the adjustment index node has been modified,
if so, in the case where a corresponding data page of the adjustment index node exists in the second cache unit, encrypting the adjustment index node, writing it back to that data page, and updating the historical access count of that data page,
and if not, updating the historical access count of the data page of the second cache unit corresponding to the adjustment index node.
8. The data processing method according to claim 7, further comprising, after determining that the adjustment index node has been modified:
in the case where the adjustment index node is a newly added index node and its corresponding data page is not in the second cache unit, loading that data page, acquired from the third storage unit, into the second cache unit;
and encrypting the adjustment index node, writing it back to the corresponding data page of the second cache unit, and updating the historical access count of that data page.
9. The data processing method according to claim 8, further comprising, after determining that the adjustment index node has been modified:
in the case where the adjustment index node is a newly added index node and its corresponding data page exists in neither the second cache unit nor the third storage unit, creating a new data page corresponding to the adjustment index node in the second cache unit;
and encrypting the adjustment index node, writing it into the newly created data page, and updating the historical access count of the newly created data page.
10. The data processing method according to any one of claims 1 to 9, wherein the determining an adjustment index node in the first cache unit based on a preset policy comprises:
determining the adjustment index node in the first cache unit based on the historical access records of the index nodes in the first cache unit.
11. A data processing apparatus comprising:
a request receiving module, configured to receive a data processing request, wherein the data processing request carries encrypted target data;
a target index node obtaining module, configured to decrypt the encrypted target data and determine a target index node corresponding to the target data based on the keyword of the target data obtained after decryption;
an adjustment index node determining module, configured to determine an adjustment index node in a first cache unit based on a preset policy when it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold;
and a data processing module, configured to delete the adjustment index node in the first cache unit and write the target index node into the first cache unit, so as to implement data processing on the target data.
12. An encrypted database, wherein the database adopts a B+ tree encryption index, the B+ tree encryption index comprises an EBuffer layer, the EBuffer layer comprises a first cache unit, and each slot of the first cache unit contains one plaintext index node, wherein,
the encrypted database receives a data processing request carrying encrypted target data, decrypts the encrypted target data, and determines a target index node corresponding to the target data based on the keyword of the target data obtained after decryption; determines an adjustment index node in the first cache unit based on a preset policy in the case where it is determined that the target index node is not in the first cache unit and the first cache unit has reached a preset cache threshold; and deletes the adjustment index node in the first cache unit and writes the target index node into the first cache unit, to implement data processing on the target data.
13. The encrypted database of claim 12, wherein:
the B+ tree encryption index further comprises an MBuffer layer, the MBuffer layer comprises a second cache unit, each slot of the second cache unit contains one data page, and each data page is composed of a plurality of encrypted index nodes from the EBuffer layer, wherein,
in the case where it is determined that the target index node exists in the second cache unit, the encrypted database determines the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node; and acquires the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypts it, and writes the decrypted target index node into the first cache unit.
14. The encrypted database of claim 13, wherein:
the B+ tree encryption index further comprises an External storage layer, the External storage layer comprises a third storage unit, and the third storage unit contains the data pages of the MBuffer layer, wherein,
in the case where it is determined that the target index node exists in the third storage unit, the encrypted database determines, from the third storage unit, the data page containing the encrypted target index node; writes that data page into the second cache unit; in the case where it is determined that the target index node exists in the second cache unit, determines the page identifier and the offset of the target index node's data page in the second cache unit based on the node identifier of the target index node; and acquires the encrypted target index node from the second cache unit based on the page identifier and the offset of the data page, decrypts it, and writes the decrypted target index node into the first cache unit.
15. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions which, when executed by the processor, implement the steps of the data processing method of any one of claims 1 to 10.
16. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 10.
CN202110189328.2A 2021-02-19 2021-02-19 Data processing method and device Pending CN113297210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110189328.2A CN113297210A (en) 2021-02-19 2021-02-19 Data processing method and device


Publications (1)

Publication Number Publication Date
CN113297210A true CN113297210A (en) 2021-08-24

Family

ID=77318969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110189328.2A Pending CN113297210A (en) 2021-02-19 2021-02-19 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113297210A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159845A (en) * 2015-09-07 2015-12-16 四川神琥科技有限公司 Memory reading method
US20190004946A1 (en) * 2017-06-30 2019-01-03 EMC IP Holding Company LLC Method and device for cache management
CN110659305A (en) * 2019-08-06 2020-01-07 上海孚典智能科技有限公司 High performance relational database service based on non-volatile storage system
CN110489380A (en) * 2019-08-14 2019-11-22 腾讯科技(深圳)有限公司 A kind of data processing method, device and equipment
CN111723071A (en) * 2020-06-18 2020-09-29 江苏优网智能科技有限公司 Distributed dynamic point cloud storage technology

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303586A (en) * 2022-12-09 2023-06-23 中电云数智科技有限公司 Metadata cache elimination method based on multi-level b+tree
CN116303586B (en) * 2022-12-09 2024-01-30 中电云计算技术有限公司 Metadata cache elimination method based on multi-level b+tree


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40059172; country of ref document: HK)