WO2017222407A1 - Two-mode encryption scheme allowing comparison-based indexing - Google Patents
Two-mode encryption scheme allowing comparison-based indexing Download PDFInfo
- Publication number
- WO2017222407A1 WO2017222407A1 PCT/RU2016/000382 RU2016000382W WO2017222407A1 WO 2017222407 A1 WO2017222407 A1 WO 2017222407A1 RU 2016000382 W RU2016000382 W RU 2016000382W WO 2017222407 A1 WO2017222407 A1 WO 2017222407A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- encrypted
- bound
- values
- value
- data values
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
Definitions
- the present disclosure relates to adaptive indexing over encrypted numeric data and specifically to adaptive indexing over encrypted numeric data without revealing the total data order of the numeric data.
- a firm may deploy trading, analytics, and risk management modules to the cloud so as to filter out the most relevant financial data for in-house analysis.
- sensitive data and query results may be leaked to malicious adversaries and/or an honest-but-curious service provider itself.
- Such concerns have motivated research on data encryption and query answering over encrypted data, starting with SQL query processing over encrypted data, and expanding to the design of tailor-made schemes that allow the processing of specialized queries over encrypted data, such as kNN and skyline queries.
- CryptDB has provided a coherent collection of efficient SQL- aware encryption schemes that allows for the execution of SQL queries over encrypted data.
- the aim is to allow data to be managed by a server with minimal intervention by data owner and clients.
- the server should be able to process queries over the encrypted data, and deliver encrypted query results to the client, while the client should only have to decrypt the data and obtain the actual results thereby.
- it should, first of all, be capable to index the data to the extent deemed necessary.
- modern applications, handling continuously arriving data call for an inherent capacity to perform incremental indexing on demand, while processing user queries, and as a side-effect of such queries, requiring neither a priori idle time nor a priori workload knowledge; in other words, the system should not only be capable for indexing; it should be capable for adaptive indexing and self-organization.
- extant encryption schemes suffer from one or more of the following drawbacks: (i) they are too computationally expensive; or (ii) leak too much information on the encrypted data, and/or (iii) require more data than the actual query results to be retrieved and then filtered in a postprocessing step.
- a model is proposed herein for breaking this isolation and achieving a seemingly contradictory target: allow for database systems that are both secure and adaptive.
- the challenge of adaptively indexing an encrypted database is examined herein.
- the extent to which encryption and indexing are in tension with each other is ivestigated: the bare-bones requirements for indexing, and the ways in which such requirements may be reconciled with encryption are studied.
- a lightweight and efficient encryption scheme that reveals less information than previous suggestions and allows for indexing encrypted numeric data adaptively is proposed. This indexing capacity does not aim to develop a complete index upfront, but to one built in an incremental, progressive manner, so that only those data which are queried get indexed.
- Such an adaptive indexing enables lightweight adaptation of the system to the workload at hand and also mitigates the tension between encryption and indexing.
- the proposed scheme relies on simple linear-algebra operations for encryption and decryption, while it can efficiently process range and point queries over ciphertexts without disclosing the order among attribute values, as an Order- Preserving Encryption Scheme, and without the computational and storage burdens of schemes like fully homomorphic encryption .
- Hacgumus et al. H. Hacigumiis, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service- provider model. In SIGMOD, pages 216-227, 2002) proposed a bucketization scheme that allows for approximate filtering of query results at the server, followed by the final processing at the client, after decryption.
- Hore et al. B. Hore, S. Mehrotra, and G. Tsudik. A privacy-preserving index for range queries.
- FHE fully homomorphic encryption
- Enabling secure database as a service using fully homomorphic encryption Challenges and opportunities.
- CoRR, abs/1302.2654, 2013) discuss the potential of FHE to enable the vision of secure database as a service, and conclude that practicality remains a very important concern.
- the server computes the query results first and sends to the client the encrypted number n of result tuples; the client decrypts n , and asks for n' ⁇ n rows from the server. The server returns the top- tt' rows in the result; thus, the client needs to be actively involved in simple query processing tasks.
- Shi et al. (E. Shi, J. Bethencourt, H. T.-H. Chan, D. X. Song, and A. Perrig. Multidimensional range query over encrypted data.
- IEEE Symposium on Security and Privacy, pages 350-364, 2007) propose an encryption scheme for answering multidimensional range queries. Yet the problem they address is that of allowing an auditor to decrypt those and only those records (e.g., financial audit logs, medical anamneses, etc.) whose attributes fall within a specified range; privacy is not protected when an entry is matched by the query.
- this match-revealing scheme does not allow for protecting attribute values while enabling a service provider to identify records matching queries over encrypted data in a way that can be exploited for indexing.
- Boneh and Waters (D. Boneh and B. Waters. Conjunctive, subset, and range queries on encrypted data. In TCC, pages 535-554, 2007) process complex queries over encrypted data based on Hidden Vector Encryption (HVE).
- HVE Hidden Vector Encryption
- the notion of security in this work is stronger than the one in the work of Shi et al.: query-matching records are identified, but their attribute values remain hidden. This match-concealing security notion fits the requirements for indexing while protecting the privacy of attribute values.
- the proposed scheme is not practicable for vast volumes of dynamic data; it necessitates O(DT) public key size, encryption time, and ciphertext size, for D attributes and T discrete values per attribute.
- MONOMI S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. PVLDB, 6(5):289-300, 2013
- MONOMI uses several techniques so as to improve performance, along with a designer that chooses efficient server-side physical designs and a planner that selects efficient query execution plans involving server and client.
- OPES OPES.
- Li et al. Li, A. X. Liu, A. L. Wang, and B.
- a client maintains a secret information dispersal matrix and the keys for decoding a data matrix D ; the client encodes and disperses D onto n servers, while using a pseudo-random number generator with a secret seed to add a random salt to each data entry; when retrieving data, salts are reconstructed and deducted from the decoded data, so as to recover D .
- query processing is directed by the client, while servers can only access data following the client's instructions.
- approaches based on an encrypted index do not allow the server to build, maintain, and traverse an index relying on its own devices.
- Cipherbase A third direction involves processing encrypted data in secure hardware, most recently exemplified by Cipherbase (A. Arasu, K. Eguro, M. Joglekar, R. Kaushik, D. Kossmann, and R. Ramamurthy. Transaction processing on confidential data using Cipherbase. In ICDE, pages 435- 446, 2015). Yet, for a range index Cipherbase reveals the full ordering information of index keys, even to a weak adversary; thus, range indexes in Cipherbase provide "similar confidentiality guarantees", and have the same drawbacks as order preserving encryption (OPE).
- OPE order preserving encryption
- Adaptive indexing is a recently introduced concept, by which index tuning need not be performed during system initialization; instead, it occurs during query processing: each query is interpreted not only as a request for a particular result set, but also as an advice on how to physically store the data, triggering small actions that refine the adaptive indexes. Those data portions that are queried become progressively and incrementally indexed. This capacity to index by continuously reacting to a changing query workload brings forth the property of self-organization.
- Database cracking introduced the notion of continuous, incremental, and on-demand adaptive indexing in the context of modern column-stores.
- the select operator of a database system performs index-building operations as a side-effect of processing a range (or equality) filtering action; an index is built and refined collaterally to query execution: the more a data range is queried, the more its index is refined.
- the physical data store is changing with each incoming range query q, interpreting q as a hint on how data should be stored. That hint may explicitly use q's query bounds, or follow a more lax interpretation for the sake of robustness. Without loss of generality, the strict interpretation is discussed. Assume a query requests A ⁇ ⁇ 0.
- a cracking DBMS responds to q by clustering all tuples of A with A ⁇ 10 at the beginning of the respective column C , while pushing all other tuples to the column's end.
- a subsequent query requesting A ⁇ v, , where v, > 10 will only need to look into the last piece of C , while a query that requests A ⁇ v 2 , where v 2 ⁇ 10 , searches its first part; each subsequent query cracks its respective piece further.
- Figure 1 shows how two queries crack a column by their selection predicates.
- Query Q l cuts the column in three pieces, and Q2 enhances this partitioning by cutting the first and the last piece further.
- Each query collects its qualifying tuples in a contiguous area. Thereby, the original column A (including positions) is copied into a cracker index column, which is then continuously reorganized.
- Algorithm 1 shows the most basic cracking algorithm that splits a piece into two, identifying tuples that need to be exchanged by comparison operations.
- numerous algorithms have been proposed that split a piece of a column into three pieces, N pieces using radix clustering, partially sorting pieces in combination with partition/merge operations, fully sorting pieces when touched for the first time, and splitting pieces based on random pivots as opposed to query predicates, yet the core idea remains the same.
- Figure 1 shows Query Ql , which partitions the whole column into three pieces, followed by Query Q2 that reorganizes Pieces 1 and 3. Other pieces of work accommodate updates, propagate the physical organization from one column to others on demand, and exploit multi-core CPUs.
- the basic cracking design is considered.
- the focus is directed on the action of reorganizing a column into pieces with respect to a breakpoint as part of a query plan itself; thereby, a cracking select operator physically reorganizes the proper pieces of a column to bring all qualifying values in a contiguous area, which is leveraged to return the result, without additional result materialization.
- AVL tree used for indexing
- tree updates and rebalancing actions are also performed during query processing, in logarithmic time. This core operation is common and universally applicable to all adaptive indexing design.
- a method for searching a database comprises encrypting a plurality of data values in the database with a first encryption mode to obtain encrypted data values; encrypting a bound value with a second encryption mode to obtain an encrypted bound value, wherein the second encryption mode being different from the first encryption mode; and comparing the bound value with the plurality of data values using the encrypted bound values and the encrypted data values.
- comparing a bound value with the plurality of data values using the encrypted bound values and the encrypted data values comprises comparing a bound value with the plurality of data values using an operation over the encrypted bound values and the encrypted data values.
- one or more of the encrypted data values are key values used for indexing the encrypted data values, wherein comparing the bound value with the plurality of data values comprises comparing the bound value with the plurality of data values using the encrypted bound values, the key values and the encrypted data values
- encrypting a plurality of data values in the database comprises transforming each of the plurality of data values into a value vector; encrypting a bound value comprises transforming the bound value into a bound vector; and comparing the bound value with the plurality of data values comprises obtaining scalar products of the value vectors and the bound vector.
- each of the value vectors comprises a noisy value subvector; the bound vector comprises a noisy bound subvector; wherein the noisy bound subvector is orthogonal to each of the noisy value subvectors.
- the method comprises selecting a random positive multiplier for each of the value vectors; and scaling each of the value vector by the respective selected multiplier.
- the method comprises selecting a first positive random multiplier and a second positive multiplier for each of the value vectors; scaling each of the noisy value subvector by the respective first random multiplier; and scaling the rest of each of the value vectors by the respective second positive random multiplier.
- the method comprises selecting an invertible matrix; wherein transforming each of the data values into a value vector comprises transforming the data value into a column value vector and multiplying one of the inverse of the matrix and the transpose of the matrix by the column value vector; and transforming the bound value into a bound vector comprises transforming the bound value into a column bound vector and multiplying the other of the inverse of the matrix and the transpose of the matrix by the column bound vector.
- a method for indexing data values comprises encrypting a plurality of data values with a first encryption mode to obtain encrypted data values; encrypting one or more bound values with the first encryption mode to obtain first encrypted bound values and with a second encryption mode to obtain second encrypted bound values, wherein the second encryption mode being different from the first encryption mode; comparing the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and indexing encrypted data values based on said comparison using the first encrypted bound values as key values.
- values encrypted with the first encryption mode are not comparable to each other; and values encrypted with the second encryption mode are not comparable to each other.
- the method comprises encrypting another bound value with the first encryption mode to obtain another first encrypted bound value and with a second encryption mode to obtain another second encrypted bound value; comparing the another bound value with the plurality of data values using the another second encrypted bound value, the key values, and the encrypted data values; adding the another first encrypted bound value as another key value based on said comparison.
- indexing encrypted data values comprising constructing a hierarchical index structure, wherein nodes in the hierarchical index structure being the key values.
- the hierarchical index structure is a binary search tree.
- the binary search tree is an AVL tree.
- the method comprises adding ambiguity to at least one of the data values, so that a plurality of interpretations for each of the at least one of the data value are possible, wherein a single interpretation of the plurality of interpretations being true.
- the method comprises adding ambiguity to at least one of the one or more bound values, so that a plurality of interpretations for the at least one of the one or more bound values are possible, wherein a single interpretation of the plurality of interpretations being true.
- encrypting the plurality of data values with a first encryption mode comprises transforming the data values into value vectors; encrypting the one or more bound values with the first encryption mode comprises transforming the bound values into first bound vectors; encrypting the one or more bound values with the second encryption mode comprises transforming the bound values into second bound vectors; and comparing the one or more bound values with the plurality of data values comprises obtaining scalar products of the value vectors and the second bound vectors.
- a server for indexing data values comprises a storage means configured to store encrypted data values, the encrypted data values being a plurality of data values encrypted with a first encryption mode; a receiving unit configured to receive, from a client, first encrypted bound values and second encrypted bound values, the first encrypted bound values being one or more bound values encrypted with the first encryption mode and the second encrypted bound values being the one or more bound values encrypted with a second encryption mode, wherein the second encryption mode being different from the first encryption mode; and a processing unit configured to compare the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and index encrypted data values based on said comparison using the first encrypted bound values as key values.
- the encrypted data values are not comparable to each other.
- the receiving unit is further configured to receive another first encrypted bound value and another second encrypted value, the another first encrypted bound value is another bound value encrypted with the first encryption mode and the another second encrypted value is the another bound value encrypted with the second encryption mode; and the processing unit is further configured to compare the another bound value with the plurality of data values using the another second encrypted bound value, the key values and the encrypted data values; and add the another first encrypted bound value as another key value based on said comparison, wherein the server further comprises a sending unit configured to send the result of said comparison to the client.
- a client for receiving a search result comprises a sending unit configured to send a second encrypted bound value to a server, the second encrypted bound value is a bound value encrypted with a second encryption mode, wherein the server comprises encrypted data values, the encrypted data values being a plurality of data values encrypted with a first encryption mode; a receiving unit configured to receive, from the server, a result of comparison between the bound value and the plurality of data values using the second encrypted bound value and the encrypted data values, the result comprising a set of the encrypted data values; and a processing unit configured to decrypt the set of the encrypted data values.
- the another embodiment of the client the sending unit is further configured to send a first encrypted bound value to the server, the first encrypted bound value being the bound value encrypted with the first encryption mode.
- processing unit is configured to encrypt the bound value with the second encryption mode to obtain the second encrypted bound value and encrypt the bound value with the first encryption mode to obtain the first encrypted bound value.
- At least one of the plurality of data values have ambiguity, so that a plurality of interpretations for each of the at least some of the plurality of data value are possible, wherein a single interpretation of the plurality of interpretations being true; wherein the processor is configured to extract true interpretations of data values from the set of encrypted data values.
- Fig. 1 shows cracking a column of values
- Fig. 2 shows an encrypted vector
- Fig. 3 shows an encrypted vector with ambiguity
- Fig. 4 show how a piece can be found in an encrypted AVL Tree
- Fig. 5 shows how a node can be added in an encrypted AVL tree.
- the invention is directed to devise an encryption scheme that provides protection against attacks that aim to compromise data confidentiality, yet enables self-organizing indexing operations over encrypted numeric data, i.e. an indexable encryption scheme.
- an adversary can be either an external malicious entity or an honest-but-curious server. Insider attacks, such as those arising from malicious partners, are not considered. Client machines are assumed to be safe; confidential information such as secret keys on the client and client queries are not known to attackers. Attacker's computations are bounded by polynomial-size circuits. Sych an encryption scheme should satisfy the following requirements.
- the key sizes, encryption time, and ciphertext sizes should be manageable and not growing with the size of data, number of attributes, or number of discrete values.
- the client should not be elaborately involved in processing simple queries; the server should deliver the encrypted results, and only those, to the client in a single round of communication.
- the server should be capable to gracefully accommodate newly arriving data values and support updates in the encrypted data.
- a prime challenge is to resolve the tension between requirements (1) and (2), i.e., enable inequality comparisons without revealing order.
- a way to achieve this result is to postulate that comparisons should not be possible among encrypted data residing at the server. If such data are not comparable to each other, then they cannot be straightforwardly brought to sorted order. Then it is necessary spell out under what circumstances a server should be able to perform an inequality check among ciphertexts. Such comparisons should be possible only on demand, between ciphertexts (e.g., a query bound and a tuple value) rendered comparable by their encryption; in effect, indexing actions would also be possible only on client demand.
- ciphertexts e.g., a query bound and a tuple value
- Such on-demand indexing would be compatible with adaptive indexing: If one has to perform on-demand indexing for the sake of security, one also needs to perform exactly such actions for the sake of adaptiveness, as the data has to be indexed in response to queries. Then, instead of being in tension with each other, the requirements for security and adaptivity reinforce each other.
- Such a method could utilize two complementary encryption modes, A and B, interfacing with each other, so that a ciphertext encrypted in mode A can be compared to one encrypted in mode B. Then, encrypting attribute values in mode A and query bounds in mode B offers the server a capacity to compare the former to the latter; as can be seen, such comparisons form the backbone of adaptive indexing operations, and are performed with operators ⁇ and ⁇ 2 in Algorithm 1. Assume a server needs to perform an inequality check between key value v in a column C and query bound value b , i.e., to check whether v > b . Equivalently, the server needs to check whether: v -b ⁇ 0 (1)
- Inequality (1) can be expressed as
- the objective is to devise an encryption scheme that allows the computation of such scalar vector products in obscured fashion.
- this scheme three obscurement operations are employed that are to be performed by the data vendor when uploading the data to the server and by a trusted client before issuing a query: (i) noise addition; (ii) scalar multiplication, and (iii) matrix multiplication. The details of these operations are given in the following.
- the noise addition operation builds two longer vectors b , v, by adding extra noisy elements to the length- 2 vectors used in Inequality (2), such that the result of the vector product remains the same, b and v are referred the encrypted bound and value vector, respectively.
- the specific length i of vectors b and v , and the positioning and distinction between the added noisy contents and original value contents therein constitute part of the encryption key, known to the data owner and trusted client, but not to the service provider. Without loss of generality, an example using vectors of length 5 is presented. Inequality (2) can be rewritten as follows:
- the original contents of the vectors occupy positions 1 and 3 in them, whereas positions ⁇ 0,2,4 ⁇ are reserved for noisy contents that cancel each other out in the scalar vector product, as they form two orthogonal length- 3 embedded subvectors, n h and n v , with inner product 0 :
- noisy subvectors of b and v are called noisy subvectors of b and v . It should be emphasized that the same exact noisy subvectors need not be used for each encrypted key value v or each encrypted bound value b , yet the same positions have to be used for noisy contents across all encrypted values in the same column. Any pair of vectors orthogonal to each other can be used, such that their inner product produces 0 . Nevertheless, a vector b produced for a given bound value b should be orthogonal to all vectors v produced for attribute values. Therefore, the orthogonal vector pairs cannot be chosen in an entirely arbitrary fashion.
- the unit vector used is u j
- 1 .
- a noisy subvector of v is produced by randomly selecting any vector of length £ - 2, u L (v) , orthogonal to u .
- the encryption scheme devised so far enables the calculation of the difference v-b by the server, via the calculation of a scalar product of two vectors.
- This scheme solves the objective of performing inequality checks, hence processing range queries, over the encrypted numeric data, while The encryption scheme does not preserve the order of the data in the way that OPES does.
- an adversary can calculate any scalar product of the form b v , hence the difference v-b , for any encrypted value vector v ; then, by comparing the obtained v-b values, one obtains not only the order, but also the exact differences among the encrypted data values.
- the server still obtains the correct sign of the difference v-b , without the exact norm of this difference being leaked.
- an adversary cannot reconstruct the order of data values using information obtained from scalar products.
- the noise addition and scalar multiplication operations produce a vector v representing each key value v and a vector b representing a query bound value b , so that the sign, but not the norm, of v- b can be obtained from the scalar product b v .
- the security these operations provide cannot withstand a simple breach: an adversary who learns the position within a vector v where the values ⁇ ( ⁇ ) ⁇ and - ⁇ ( ⁇ ) reside, can effectively acquire all original key values. Therefore, an additional encryption layer is provided to obstruct this breach.
- v and b are vectors of length I , and let M be any invertible t t matrix, and M ' its inverse. M constitutes part of the encryption key; it is known to the data vendor and its trusted clients, but not to the service provider.
- Any key value v and any breakpoint b can be encrypted as follows.
- E v (v) - (5)
- E b (b) M T (6)
- ⁇ ⁇ ( ⁇ ) , ⁇ ⁇ ( ⁇ ) are encryption functions
- b and v are represented as column vectors
- r is the transpose of matrix M .
- row vector a T is the transpose of column vector a .
- E v (v) is generated by the data vendor and passed on to the server when the data (or an update) is generated.
- E b (b) is produced by the trusted client, or again by the data vendor on behalf of the client, when issuing a query.
- These two encryption (and reverse decryption) steps are the only workload data vendor and client have to bear.
- Other computations and query processing are conducted by the server as with a non-encrypted database. At the same time, given those encrypted vectors, an adversary cannot determine the values of v or b without knowing the invertible matrix M .
- the total encryption key consists of:
- the multiplication matrix M enters the picture, consisting of I 1 unknowns.
- the unit vector u brings 1 - 2 more unknowns into the picture; each encrypted vector v brings £ - 3 more unknowns, as only one out of its £ - 2 noisy elements can be derived using the £ - 3 others and orthogonality to u , and the ⁇ ( ⁇ ) factor, which has multiplied ⁇ v,-l ⁇ to produce its noisy contents; each encrypted vector b brings a A(b) factor, which multiplies u to produce its noisy contents.
- An attack that could still bear fruit is a known plaintext attack, in case Alice could gain access to pairs of an original value (either v or b ) and its encrypted form ( E v (v) or E h (b) , respectively). For each such pair, she can construct I scalar linear equations.
- the described error-allowance scheme creates one of the following two vectors: Without loss of generality, consider only the first variant. In this first variant, it is
- Fig. 2 shows the form of an encrypted vector E v
- Fig. 3 shows the encrypted vector E v (v) , with ambiguity added.
- the size of the ambiguity-added vector is one more than original encrypted vector, hence each original value acquires two representations in the database: the one represented by the real encrypted vector, as the I -prefix in the figure, and another represented by the fake vector, as the ⁇ -suffix in the figure. Both are derived from the ambiguity-added vector as shown in Fig. 3.
- the server generates and manages each possible interpretation of E v (v) separately; this redundancy provides an obscurity that reduces the confidence of an adversary's inferences about the value order.
- the end result is similar to adding counterfeit records in the database.
- the security provided by he proposed scheme is higher than that provided by mere counterfeit insertion.
- an adversary with sufficient background knowledge may identify a single record of interest and infer information-leaking observations about its position in the index.
- the position of a record of interest in the index is uncertain even when that record of interest is identified, since each single record spawns two possible interpretations; this state of affairs confers additional security to the proposed construction.
- prefix/suffix values ⁇ are generated deterministically from the respective encrypted values, introducing additional leakage, while their domain is skewed away from the domain of encrypted real vector components, thus, ⁇ can be easily distinguished.
- another approach is proposed, which may not provide any compression possibility, but provides security guarantees:
- a "fake” query distribution F b is generated by the client and becomes a part of the secret key.
- the server cannot distinguish which query is "real”; it processes both queries and sends the results in the same order.
- this ambiguous query generation tackles a separate problem in the context of adaptive indexing: naive cracking schemes perform indexing based blindly on the incoming query bounds; thus, they leave the question of workload-robustness aside, and are vulnerable to substandard performance in case of queries selecting worst-case pivots, just like quicksort is vulnerable to quadratic worst-case complexity.
- Stochastic cracking [16] proposes adding randomness in the selection of cracking pivot points, thereby achieving robustness against arbitrary workloads. In our setup, this randomness is effectively moved from the server side to the client side in the form of ambiguous query generation: the introduced ambiguity serves not only a security objective, but also a workload-robustness objective with one stroke.
- scaling term is integer for "real" values and fractional for "fake” values.
- the proposed encryption scheme enables a server to manage a column of encrypted values, even with ambiguity inserted, and conduct comparisons between query bounds and data values.
- the proposed system should also construct an AVL tree; in particular, query bounds, which correspond to the breakpoints b , are used as key values in tree nodes and utilized in subsequent tree traversals. During such a traversal, a new bound value b' has to be compared to a previous one, b , now used as key; then, at each node two breakpoints values, b' and b , is to be compared to each other.
- the server needs to be able to compare two breakpoint values, b' and b , to each other.
- the proposed encryption scheme uses two different encryption modes, one for breakpoints b , and one for values v , constructed so as to allow comparing b to v only, but not different b and v values to each other.
- This problem is solved by having the query-issuing client encrypt a breakpoint b in both ways, i.e., in its native way, as E h (b) , and as an attribute value, E V ( >) , and pass them on to the server.
- the former value is used for inequality checks with attribute values, while the latter is employed for storing it as a key in the constructed AVL tree index.
- the required comparison is conducted by calculating
- Algorithm findpiece (obj AVLTree, N , E h , posL , posH )
- minNode objAVLTree.findMinNode(root)
- Algorithm f indpiece illustrates the finding of the crack positions for a query bound b ; it makes use of largest and smallest values in the AVL tree in order to determine whether the query bound is out of the tree's range; otherwise it obtains the leaf node flVode by a search operation on the tree based on the given query bound E ft ; overall, it distinguishes the following cases:
- Case 1 holds in case b is greater than the largest value in the tree; then the largest value in the tree is returned as the lower bound of the piece range.
- Fig. 5 presents an AVL-tree traversal along with the addition of a leaf node for b s at the last step, after the piece in which b belongs has been cracked in two.
- Algorithm addCrack illustrates how this operation is performed; its first three cases examine the situation where a node for b already exists in the tree; the fourth case adds a new node.
- rootNode objAVLTree.getRootQ
- minNode objAVLTree.findMinNode(root)
- this scheme allows for (i) range query processing over encrypted numeric data outsourced in the cloud, and thereby for (ii) incremental, adaptive indexing whereby only data that are queried by trusted clients get indexed.
- the scheme represents numerical values as short vectors, and relies on simple linear-algebra operations for encryption and decryption; it allows neither the actual data values nor their order to be disclosed. While the structure of the index may reveal order in the long-term, this only happens after crucial indexing operations have been performed; furthermore, an additional obfuscation component is proposed in the scheme, this component deliberately introduces ambiguity in the proposed construction by allowing two variant interpretations of each encrypted value vector.
- the scheme assures the security needed in time-critical operations such as high-frequency trading and financial transaction processing over the cloud.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
The disclosure is directed to a lightweith encryption scheme that allows for range query processing over encrypted numeric data outsourced in the cloud or a remote server and thereby for incremental, adaptive indexing whereby only data that are queried by trusted clients get indexed. In one embodiment a method for indexing data values is provided. The method comprises encrypting a plurality of data values with a first encryption mode to obtain encrypted data values; encrypting one or more bound values with the first encryption mode to obtain first encrypted bound values and with a second encryption mode to obtain second encrypted bound values, wherein the second encryption mode being different from the first encryption mode; comparing the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and indexing encrypted data values based on said comparison using the first encrypted bound values as key values.
Description
TWO-MODE ENCRYPTION SCHEME ALLOWING COMPARISON-BASED INDEXING
FIELD OF THE INVENTION
The present disclosure relates to adaptive indexing over encrypted numeric data and specifically to adaptive indexing over encrypted numeric data without revealing the total data order of the numeric data.
BACKGROUND OF THE INVENTION
Introduction
The mankind has entered the database-as-a-service era; data management capabilities need not be present at the data owner's locale, but can be provided as a service by a cloud-based service provider, to which the data is outsourced. This model provides users (i.e., the data owner and its trusted clients) power to create, store, modify, and retrieve data from anywhere in the world; the discussion about the applicability of this model has entered domains such as high-frequency trading, whereby a firm can use cloud providers' servers to test trading strategies, run time series analysis, assess risks, and even execute trades, while collecting financial data daily, or on a finer time scale. At the same time, such a model raises security and confidentiality concerns; to follow the same application area, a firm may deploy trading, analytics, and risk management modules to the cloud so as to filter out the most relevant financial data for in-house analysis. Thereby, sensitive data and query results may be leaked to malicious adversaries and/or an honest-but-curious service provider itself. Such concerns have motivated research on data encryption and query answering over encrypted data, starting with SQL query processing over encrypted data, and expanding to the design of tailor-made schemes that allow the processing of specialized queries over encrypted data, such as kNN and skyline queries. CryptDB has provided a coherent collection of efficient SQL- aware encryption schemes that allows for the execution of SQL queries over encrypted data.
In all cases, the aim is to allow data to be managed by a server with minimal intervention by data owner and clients. The server should be able to process queries over the encrypted data, and deliver encrypted query results to the client, while the client should only have to decrypt the data and obtain the actual results thereby. Nevertheless, in order for a system to manage data and process queries efficiently, it should, first of all, be capable to index the data to the extent deemed necessary. In addition, modern applications, handling continuously arriving data, call for an inherent capacity
to perform incremental indexing on demand, while processing user queries, and as a side-effect of such queries, requiring neither a priori idle time nor a priori workload knowledge; in other words, the system should not only be capable for indexing; it should be capable for adaptive indexing and self-organization.
These requirements for data management in the database-as-a-service era bring forth a contradiction. The requirement that data be kept confidential necessitates that it be encrypted; the requirement that data be efficiently managed necessitates that it be indexed. Encryption and indexing are in tension with each other; the former requires that the service provider knows nothing about the values of the data, while the latter necessitates that the same service provider knows, ideally, the exact values of the data under its purview. Given such conflicting aims, previous research has confronted these two challenges in isolation from each other and made no effort to face them in unison. Solutions for adaptive indexing and self-organization have not considered security and confidentiality, while secure database systems do not cater for adaptivity in dynamic environments. Unfortunately, by shying away from such an objective, such approaches offer fragmentary solutions to the problem of secure and efficient outsourced data management; in particular, extant encryption schemes suffer from one or more of the following drawbacks: (i) they are too computationally expensive; or (ii) leak too much information on the encrypted data, and/or (iii) require more data than the actual query results to be retrieved and then filtered in a postprocessing step.
A model is proposed herein for breaking this isolation and achieving a seemingly contradictory target: allow for database systems that are both secure and adaptive. The challenge of adaptively indexing an encrypted database is examined herein. To bring about such a result, the extent to which encryption and indexing are in tension with each other is ivestigated: the bare-bones requirements for indexing, and the ways in which such requirements may be reconciled with encryption are studied. Herein a lightweight and efficient encryption scheme that reveals less information than previous suggestions and allows for indexing encrypted numeric data adaptively is proposed. This indexing capacity does not aim to develop a complete index upfront, but to one built in an incremental, progressive manner, so that only those data which are queried get indexed. Such an adaptive indexing enables lightweight adaptation of the system to the workload at hand and also mitigates the tension between encryption and indexing. The proposed scheme relies on simple linear-algebra operations for encryption and decryption, while it can efficiently process range and point queries over ciphertexts without disclosing the order among attribute values, as an Order-
Preserving Encryption Scheme, and without the computational and storage burdens of schemes like fully homomorphic encryption .
Below a review of prior work on data systems that support queries over encrypted data, as well as on adaptive indexing is given.
Processing Encrypted Data
Research on processing encrypted data can be classified in two broad classes. The former class of solutions propose processing the encrypted data directly. Hacgumus et al. (H. Hacigumiis, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service- provider model. In SIGMOD, pages 216-227, 2002) proposed a bucketization scheme that allows for approximate filtering of query results at the server, followed by the final processing at the client, after decryption. Hore et al. (B. Hore, S. Mehrotra, and G. Tsudik. A privacy-preserving index for range queries. In VLDB, pages 720-731, 2004) expand on this scheme with an index that supports obfuscated range queries, extended to the multidimensional case (B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu. Secure multidimensional range queries over outsourced data. VLDB Journal, 21(3):333— 358, 2012). However, such schemes reveal the data distribution and involve the client in query processing, while the only indexing capacity they offer is that afforded by bucketization.
An attempt for fine-grained indexing would necessitate sorting values at the server. Such an alternative is offered by Agrawal et al.'s Order-Preserving Encryption Scheme (OPES) (R. Agrawal, J. iernan, R. Srikant, and Y. Xu. Order-preserving encryption for numeric data. In SIGMOD, 2004), built on an encoding that preserves the order of numerical data. However, as previous research has noted, OPES reveals the data order, hence cannot overcome attacks based on statistical analysis on encrypted data. Arguably, OPES provides an overkill solution: it delivers encrypted values in sortable form, hence allows such values to be compared to each other.
In the meantime, advances in cryptography have led to the capacity to perform arbitrary computations over encrypted data, so as to obtain a correct encrypted result, by fully homomorphic encryption (FHE) (C. Gentry. Fully homomorphic encryption using ideal lattices. In STOC, pages 169-178, 2009). Such computations rely on an expensive public key cryptosystem, bringing forth prohibitive overheads of up to nine orders of magnitude. Besides, FHE does not enable indexing encrypted data. With FHE, even while the server performs all computations correctly, it does so only in the encrypted view of the world. It has no access to any truth in the real world that would allow for building an index. Mani et al. (M. Mani, K. Shah, and M. Gunda. Enabling secure database as a
service using fully homomorphic encryption: Challenges and opportunities. CoRR, abs/1302.2654, 2013) discuss the potential of FHE to enable the vision of secure database as a service, and conclude that practicality remains a very important concern. To process a simple select query by the two-step process, the server computes the query results first and sends to the client the encrypted number n of result tuples; the client decrypts n , and asks for n'≥n rows from the server. The server returns the top- tt' rows in the result; thus, the client needs to be actively involved in simple query processing tasks.
Wang and Lakshmanan (W. H. Wang and L. V. S. Lakshmanan. Efficient secure query evaluation over encrypted XML databases. In VLDB, pages 127-138, 2006) propose a scheme for securely evaluating queries over encrypted XML data. They advocate an extension of OPES, order- preserving encryption with splitting and scaling (OPESS), whereby each plaintext is "split" into more ciphertexts and the split data are "scaled", so that the attacker cannot determine the identity of ciphertexts based on data frequency knowledge. However, this scheme cannot support updates; thus, it is unsuitable for a dynamic and adaptive database system.
Shi et al. (E. Shi, J. Bethencourt, H. T.-H. Chan, D. X. Song, and A. Perrig. Multidimensional range query over encrypted data. In IEEE Symposium on Security and Privacy, pages 350-364, 2007) propose an encryption scheme for answering multidimensional range queries. Yet the problem they address is that of allowing an auditor to decrypt those and only those records (e.g., financial audit logs, medical anamneses, etc.) whose attributes fall within a specified range; privacy is not protected when an entry is matched by the query. Unfortunately, this match-revealing scheme does not allow for protecting attribute values while enabling a service provider to identify records matching queries over encrypted data in a way that can be exploited for indexing.
Boneh and Waters (D. Boneh and B. Waters. Conjunctive, subset, and range queries on encrypted data. In TCC, pages 535-554, 2007) process complex queries over encrypted data based on Hidden Vector Encryption (HVE). The notion of security in this work is stronger than the one in the work of Shi et al.: query-matching records are identified, but their attribute values remain hidden. This match-concealing security notion fits the requirements for indexing while protecting the privacy of attribute values. However, the proposed scheme is not practicable for vast volumes of dynamic data; it necessitates O(DT) public key size, encryption time, and ciphertext size, for D attributes and T discrete values per attribute.
Tu et al. developed MONOMI (S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. PVLDB, 6(5):289-300, 2013), a system that can
execute analytical queries over encrypted data. Building on CryptDB, MONOMI uses several techniques so as to improve performance, along with a designer that chooses efficient server-side physical designs and a planner that selects efficient query execution plans involving server and client. However, insofar as MONOMI allows for order-based indexing, it does so by relying on OPES. More recently, Li et al. (R. Li, A. X. Liu, A. L. Wang, and B. Bruhadeshwar. Fast range query processing with strong privacy protection for cloud computing. PVLDB, 7(14): 1953— 1964, 2014) proposed a scheme for processing range queries over encrypted data in which inequality checks are conducted via exhaustive equality checks; no mechanism for pure inequality checks over encrypted data is suggested.
Overall, these approaches do not provide a good tradeoff between confidentiality and efficiency; lately, some research efforts have provided tailor-made encryption schemes that allow the processing of specialized queries over encrypted data, such as kNN queries, range search queries, and skyline queries.
An alternative to processing encrypted data directly is to maintain an encrypted index on the server, and rely on the client for traversing this index and locate the data of interest after a few iterations of retrieval and decryption. Damiani et al. (E. Damiani, S. De Capitani di Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati. Balancing confidentiality and efficiency in untrusted relational DBMSs. In ACM CCS, pages 93-102, 2003) build a B-tree over plaintext values, but encrypt every tuple and the B-tree itself at node level; the tree's content is not visible to the server. Ge and Zdonik (T. Ge and S. B. Zdonik. Fast, secure encryption for indexing in a column-oriented DBMS. In ICDE, pages 676-685, 2007) propose Fast Comparison Encryption (FCE). An index traversal with FCE necessitates comparisons between plaintext and ciphertext key values by partial decryption on the client side. Wang et al. (S. Wang, D. Agrawal, and A. El Abbadi. A comprehensive framework for secure query processing on relational data in the cloud. In Secure Data Management, pages 52- 9, 201 1) provide a secure query processing framework that protects data both in storage and at access, based on Salted IDA (Information Dispersal Algorithm). By this scheme, a client maintains a secret information dispersal matrix and the keys for decoding a data matrix D ; the client encodes and disperses D onto n servers, while using a pseudo-random number generator with a secret seed to add a random salt to each data entry; when retrieving data, salts are reconstructed and deducted from the decoded data, so as to recover D . Thus, query processing is directed by the client, while servers can only access data following the client's instructions. Overall, approaches based on an encrypted index do not allow the server to build, maintain, and traverse an
index relying on its own devices.
A third direction involves processing encrypted data in secure hardware, most recently exemplified by Cipherbase (A. Arasu, K. Eguro, M. Joglekar, R. Kaushik, D. Kossmann, and R. Ramamurthy. Transaction processing on confidential data using Cipherbase. In ICDE, pages 435- 446, 2015). Yet, for a range index Cipherbase reveals the full ordering information of index keys, even to a weak adversary; thus, range indexes in Cipherbase provide "similar confidentiality guarantees", and have the same drawbacks as order preserving encryption (OPE).
Adaptive Indexing
Apart from security requirements, data systems for modern applications need to be flexible and agile, easily adapting to rapidly changing requirements. Adaptive indexing is a recently introduced concept, by which index tuning need not be performed during system initialization; instead, it occurs during query processing: each query is interpreted not only as a request for a particular result set, but also as an advice on how to physically store the data, triggering small actions that refine the adaptive indexes. Those data portions that are queried become progressively and incrementally indexed. This capacity to index by continuously reacting to a changing query workload brings forth the property of self-organization.
In this section, database cracking is described in more detail. Database cracking introduced the notion of continuous, incremental, and on-demand adaptive indexing in the context of modern column-stores. With cracking, the select operator of a database system performs index-building operations as a side-effect of processing a range (or equality) filtering action; an index is built and refined collaterally to query execution: the more a data range is queried, the more its index is refined. The physical data store is changing with each incoming range query q, interpreting q as a hint on how data should be stored. That hint may explicitly use q's query bounds, or follow a more lax interpretation for the sake of robustness. Without loss of generality, the strict interpretation is discussed. Assume a query requests A< \ 0. A cracking DBMS responds to q by clustering all tuples of A with A<10 at the beginning of the respective column C , while pushing all other tuples to the column's end. A subsequent query requesting A≥ v, , where v, > 10 , will only need to look into the last piece of C , while a query that requests A < v2 , where v2 < 10 , searches its first part; each subsequent query cracks its respective piece further. Figure 1 shows how two queries crack a column by their selection predicates. Query Q l cuts the column in three pieces, and Q2 enhances this partitioning by cutting the first and the last piece further. Each query collects its qualifying
tuples in a contiguous area. Thereby, the original column A (including positions) is copied into a cracker index column, which is then continuously reorganized.
As queries are being processed, the adaptive index of a column is continuously split into more (and thus smaller) pieces. Therefore, there is a need in a data structure to localize a piece of interest. Past adaptive indexing literature has relied on an in-memory AVL-tree to keep track of all pieces created due to physical reorganization. As a column becomes progressively indexed, queries requesting ranges that are exact matches of values known by the index are answered at the cost of searching only that tree-index; in all cases, even when there is no exact match, the index reduces the amount of data a query has to touch, as each range query reorganizes at most two pieces at the edge of the relevant range.
This on-the-fly physical reorganization is performed on individual pieces (or the whole column for the very first query) by in-place algorithms relying on sequential access patterns. Algorithm 1 shows the most basic cracking algorithm that splits a piece into two, identifying tuples that need to be exchanged by comparison operations. Over the years, numerous algorithms have been proposed that split a piece of a column into three pieces, N pieces using radix clustering, partially sorting pieces in combination with partition/merge operations, fully sorting pieces when touched for the first time, and splitting pieces based on random pivots as opposed to query predicates, yet the core idea remains the same. Figure 1 shows Query Ql , which partitions the whole column into three pieces, followed by Query Q2 that reorganizes Pieces 1 and 3. Other pieces of work accommodate updates, propagate the physical organization from one column to others on demand, and exploit multi-core CPUs.
Algorithm 1 CrackInTwo(c,posL,posH,med,inc)
Physically reorganize the piece of column c between posL and posH such that all values lower than med are in a contiguous space, inc indicates whether med s inclusive or not, e.g., if inc = false then φι is " < " and φ2 is ">= "
1 : X, = point at position posL
2: x2 = point at position posH
3: while position( X, ) < position( X2 ) do
4: if value( Χ ) φχ med then
5: Xj = point at next position
6: else
7: while value( X2) φ2 med &&
position( x2 ) > position( x, ) do
8: X2 = point at previous position
9: exchange^, , x2 )
10: χ = point at next position
1 1 : X2 = point at previous position
Herein, without loss of generality, the basic cracking design is considered. The focus is directed on the action of reorganizing a column into pieces with respect to a breakpoint as part of a query plan itself; thereby, a cracking select operator physically reorganizes the proper pieces of a column to bring all qualifying values in a contiguous area, which is leveraged to return the result, without additional result materialization. Besides, with an AVL tree used for indexing, tree updates and rebalancing actions are also performed during query processing, in logarithmic time. This core operation is common and universally applicable to all adaptive indexing design. The subsequent discussion shows how to apply this basic notion over encrypted data; it shoud be emphasized that the proposed design extends the basic cracking design in a way that does not violate its assumptions about the underlying architecture: it is assumed that data is stored one column-at-a-time in fixed- width dense arrays as in modern column-stores, both on disk and in memory. Query processing may happen in bulk or vector-wise processing. These basic properties apply to all column-stores.
Crucially, adaptive indexing never necessitates fully sorting individual pieces; when a piece becomes small enough to fit in LI cache, the data can be scanned at virtually no overhead. Thus, queries only cause reorganization for data pieces larger than a size threshold; that threshold can be bigger (e.g., L3 cache size) without a significant performance drop. In the context of the present application, it follows that the total data order is never leaked by a fully sorted index, as OPES does by default.
SUMMARY OF THE INVENTION
Therefore, there is a need to devise a lightweight encryption scheme that allows for (i) range query processing over encrypted numeric data outsourced in a cloud or a remote server, and thereby for (ii) incremental, adaptive indexing whereby only data that are queried by trusted clients get indexed.
To solve this problem in the first aspect a method for searching a database is provided. The method comprises encrypting a plurality of data values in the database with a first encryption mode
to obtain encrypted data values; encrypting a bound value with a second encryption mode to obtain an encrypted bound value, wherein the second encryption mode being different from the first encryption mode; and comparing the bound value with the plurality of data values using the encrypted bound values and the encrypted data values.
In another embodiment of the method values encrypted with the first encryption mode are not comparable to each other.
In a further embodiment of the method comparing a bound value with the plurality of data values using the encrypted bound values and the encrypted data values comprises comparing a bound value with the plurality of data values using an operation over the encrypted bound values and the encrypted data values.
In a still further embodiment of the method one or more of the encrypted data values are key values used for indexing the encrypted data values, wherein comparing the bound value with the plurality of data values comprises comparing the bound value with the plurality of data values using the encrypted bound values, the key values and the encrypted data values
In another embodiment of the method encrypting a plurality of data values in the database comprises transforming each of the plurality of data values into a value vector; encrypting a bound value comprises transforming the bound value into a bound vector; and comparing the bound value with the plurality of data values comprises obtaining scalar products of the value vectors and the bound vector.
In a further embodiment of the method each of the value vectors comprises a noisy value subvector; the bound vector comprises a noisy bound subvector; wherein the noisy bound subvector is orthogonal to each of the noisy value subvectors.
In a still further embodiment the method comprises selecting a random positive multiplier for each of the value vectors; and scaling each of the value vector by the respective selected multiplier.
In another embodiment the method comprises selecting a first positive random multiplier and a second positive multiplier for each of the value vectors; scaling each of the noisy value subvector by the respective first random multiplier; and scaling the rest of each of the value vectors by the respective second positive random multiplier.
In a further embodiment the method comprises selecting an invertible matrix; wherein transforming each of the data values into a value vector comprises transforming the data value into a column value vector and multiplying one of the inverse of the matrix and the transpose of the matrix by the column value vector; and transforming the bound value into a bound vector comprises
transforming the bound value into a column bound vector and multiplying the other of the inverse of the matrix and the transpose of the matrix by the column bound vector.
In the second aspect of the disclosure a method for indexing data values is provided. The method comprises encrypting a plurality of data values with a first encryption mode to obtain encrypted data values; encrypting one or more bound values with the first encryption mode to obtain first encrypted bound values and with a second encryption mode to obtain second encrypted bound values, wherein the second encryption mode being different from the first encryption mode; comparing the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and indexing encrypted data values based on said comparison using the first encrypted bound values as key values.
In another embodiment of the method values encrypted with the first encryption mode are not comparable to each other; and values encrypted with the second encryption mode are not comparable to each other.
In a further embodiment the method comprises encrypting another bound value with the first encryption mode to obtain another first encrypted bound value and with a second encryption mode to obtain another second encrypted bound value; comparing the another bound value with the plurality of data values using the another second encrypted bound value, the key values, and the encrypted data values; adding the another first encrypted bound value as another key value based on said comparison.
In a still further embodiment of the method indexing encrypted data values comprising constructing a hierarchical index structure, wherein nodes in the hierarchical index structure being the key values.
In another embodiment the hierarchical index structure is a binary search tree.
In a further embodiment of the method the binary search tree is an AVL tree.
In a still further embodiment the method comprises adding ambiguity to at least one of the data values, so that a plurality of interpretations for each of the at least one of the data value are possible, wherein a single interpretation of the plurality of interpretations being true.
In another embodiment the method comprises adding ambiguity to at least one of the one or more bound values, so that a plurality of interpretations for the at least one of the one or more bound values are possible, wherein a single interpretation of the plurality of interpretations being true.
In a further embodiment of the method encrypting the plurality of data values with a first encryption mode comprises transforming the data values into value vectors; encrypting the one or
more bound values with the first encryption mode comprises transforming the bound values into first bound vectors; encrypting the one or more bound values with the second encryption mode comprises transforming the bound values into second bound vectors; and comparing the one or more bound values with the plurality of data values comprises obtaining scalar products of the value vectors and the second bound vectors.
In the third aspect of the disclosure a server for indexing data values is provided/ The server comprises a storage means configured to store encrypted data values, the encrypted data values being a plurality of data values encrypted with a first encryption mode; a receiving unit configured to receive, from a client, first encrypted bound values and second encrypted bound values, the first encrypted bound values being one or more bound values encrypted with the first encryption mode and the second encrypted bound values being the one or more bound values encrypted with a second encryption mode, wherein the second encryption mode being different from the first encryption mode; and a processing unit configured to compare the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and index encrypted data values based on said comparison using the first encrypted bound values as key values.
In another embodiment of the server the encrypted data values are not comparable to each other.
In a further embodiment of the server the receiving unit is further configured to receive another first encrypted bound value and another second encrypted value, the another first encrypted bound value is another bound value encrypted with the first encryption mode and the another second encrypted value is the another bound value encrypted with the second encryption mode; and the processing unit is further configured to compare the another bound value with the plurality of data values using the another second encrypted bound value, the key values and the encrypted data values; and add the another first encrypted bound value as another key value based on said comparison, wherein the server further comprises a sending unit configured to send the result of said comparison to the client.
In the fourth aspect of the disclosure a client for receiving a search result is provided. The client comprises a sending unit configured to send a second encrypted bound value to a server, the second encrypted bound value is a bound value encrypted with a second encryption mode, wherein the server comprises encrypted data values, the encrypted data values being a plurality of data values encrypted with a first encryption mode; a receiving unit configured to receive, from the
server, a result of comparison between the bound value and the plurality of data values using the second encrypted bound value and the encrypted data values, the result comprising a set of the encrypted data values; and a processing unit configured to decrypt the set of the encrypted data values.
The another embodiment of the client the sending unit is further configured to send a first encrypted bound value to the server, the first encrypted bound value being the bound value encrypted with the first encryption mode.
In a further embodiment the processing unit is configured to encrypt the bound value with the second encryption mode to obtain the second encrypted bound value and encrypt the bound value with the first encryption mode to obtain the first encrypted bound value.
In a still further embodiment of the client at least one of the plurality of data values have ambiguity, so that a plurality of interpretations for each of the at least some of the plurality of data value are possible, wherein a single interpretation of the plurality of interpretations being true; wherein the processor is configured to extract true interpretations of data values from the set of encrypted data values.
BRIEF DESCRIPTION OF DRAWINGS
Fig. 1 shows cracking a column of values;
Fig. 2 shows an encrypted vector;
Fig. 3 shows an encrypted vector with ambiguity;
Fig. 4 show how a piece can be found in an encrypted AVL Tree;
Fig. 5 shows how a node can be added in an encrypted AVL tree.
DETAILED DESCRIPTION PROPOSED ENCRYPTION SCHEME
The invention is directed to devise an encryption scheme that provides protection against attacks that aim to compromise data confidentiality, yet enables self-organizing indexing operations over encrypted numeric data, i.e. an indexable encryption scheme. However, unlike OPES, the proposed encryption scheme does not preserve the order of numerical data, so that it enables efficient querying without leaking information about the order of tuples. An adversary can be either an external malicious entity or an honest-but-curious server. Insider attacks, such as those arising from malicious partners, are not considered. Client machines are assumed to be safe; confidential
information such as secret keys on the client and client queries are not known to attackers. Attacker's computations are bounded by polynomial-size circuits. Sych an encryption scheme should satisfy the following requirements.
1. It should provide the server with some capacity to conduct inequality comparisons between data values (plaintexts) employing their encrypted forms (ciphertexts).
2. This capacity to perform comparisons should not directly reveal the order among encrypted numeric data.
3. Comparison operations should not necessitate decryption at the server; attribute values should always remain hidden to the server.
4. The key sizes, encryption time, and ciphertext sizes should be manageable and not growing with the size of data, number of attributes, or number of discrete values.
5. The client should not be elaborately involved in processing simple queries; the server should deliver the encrypted results, and only those, to the client in a single round of communication.
6. The server should be capable to gracefully accommodate newly arriving data values and support updates in the encrypted data.
A prime challenge is to resolve the tension between requirements (1) and (2), i.e., enable inequality comparisons without revealing order. A way to achieve this result is to postulate that comparisons should not be possible among encrypted data residing at the server. If such data are not comparable to each other, then they cannot be straightforwardly brought to sorted order. Then it is necessary spell out under what circumstances a server should be able to perform an inequality check among ciphertexts. Such comparisons should be possible only on demand, between ciphertexts (e.g., a query bound and a tuple value) rendered comparable by their encryption; in effect, indexing actions would also be possible only on client demand. Such on-demand indexing, would be compatible with adaptive indexing: If one has to perform on-demand indexing for the sake of security, one also needs to perform exactly such actions for the sake of adaptiveness, as the data has to be indexed in response to queries. Then, instead of being in tension with each other, the requirements for security and adaptivity reinforce each other.
Then is is necessary to devise a method of creating comparable ciphertexts. Such a method could utilize two complementary encryption modes, A and B, interfacing with each other, so that a
ciphertext encrypted in mode A can be compared to one encrypted in mode B. Then, encrypting attribute values in mode A and query bounds in mode B offers the server a capacity to compare the former to the latter; as can be seen, such comparisons form the backbone of adaptive indexing operations, and are performed with operators φ and φ2 in Algorithm 1. Assume a server needs to perform an inequality check between key value v in a column C and query bound value b , i.e., to check whether v > b . Equivalently, the server needs to check whether: v -b≥0 (1)
The objective is to devise an encryption scheme that allows the computation of such scalar vector products in obscured fashion. In this scheme three obscurement operations are employed that are to be performed by the data vendor when uploading the data to the server and by a trusted client before issuing a query: (i) noise addition; (ii) scalar multiplication, and (iii) matrix multiplication. The details of these operations are given in the following.
Noise Addition
The noise addition operation builds two longer vectors b , v, by adding extra noisy elements to the length- 2 vectors used in Inequality (2), such that the result of the vector product remains the same, b and v are referred the encrypted bound and value vector, respectively. The specific length i of vectors b and v , and the positioning and distinction between the added noisy contents and original value contents therein constitute part of the encryption key, known to the data owner and trusted client, but not to the service provider. Without loss of generality, an example using vectors of length 5 is presented. Inequality (2) can be rewritten as follows:
In this example, the original contents of the vectors occupy positions 1 and 3 in them, whereas positions {0,2,4} are reserved for noisy contents that cancel each other out in the scalar vector product, as they form two orthogonal length- 3 embedded subvectors, nh and nv , with inner product 0 :
Herein these orthogonal subvectors are called noisy subvectors of b and v . It should be emphasized that the same exact noisy subvectors need not be used for each encrypted key value v or each encrypted bound value b , yet the same positions have to be used for noisy contents across all encrypted values in the same column. Any pair of vectors orthogonal to each other can be used, such that their inner product produces 0 . Nevertheless, a vector b produced for a given bound value b should be orthogonal to all vectors v produced for attribute values. Therefore, the orthogonal vector pairs cannot be chosen in an entirely arbitrary fashion. However, it suffices to select a unit vector u , and make sure that each bound value b is obscured by embedding into the column vector a randomly selected noise vector nb collinear to u , nh = l( >) - u , with the values 1 and b occupying specific preselected positions, as shown above, to produce b . Then, any data value v is obscured by similarly embedding a randomly selected vector nv = uJ" (v) , orthogonal to u , into the column vector ( v ) , to produce v . It should be emphasized that the vectors ux(v) do not need to be collinear to each other. Any vector orthogonal to u will suffice. Summing up, given a unit vector u, the noisy subvectors embedded in b should be collinear to u , while those embedded in v should be orthogonal to u . Then an inequality check can be performed by means of a scalar vector product, as
v - b is calculated as b · v .
of b is produced by multiplying u by a randomly selected factor (b) . In above case, λφ) = 4Λ/6 . On the other hand, a noisy subvector of v is produced by randomly selecting any vector of length £ - 2, uL(v) , orthogonal to u .
Scalar Multiplication
The encryption scheme devised so far enables the calculation of the difference v-b by the server, via the calculation of a scalar product of two vectors. This scheme solves the objective of performing inequality checks, hence processing range queries, over the encrypted numeric data, while The encryption scheme does not preserve the order of the data in the way that OPES does. However, given an encrypted bound vector b, an adversary can calculate any scalar product of the form b v , hence the difference v-b , for any encrypted value vector v ; then, by comparing the obtained v-b values, one obtains not only the order, but also the exact differences among the encrypted data values.
To prevent this kind of leakage, it is preferable to avoid the calculation of exact differences of the form v-b . After all, It is not necessary to obtain the correct norms of those differences. Getting merely their sign would suffice. The norm of | v - b | may be altered, as long as the sign is correctly obtained. This effect can be achieved by adding a scalar multiplication in the obscurement operation. Then, for each data value v , the data vendor chooses a random positive multiplier ξ(ν) > 0 , and recalculates the encrypted value vector v as v'= ξ(ν)ν . Henceforward v is referred to this recalculated value vector. In effect, the scalar product becomes b v = ξ(ν)(ν - b) . The server still obtains the correct sign of the difference v-b , without the exact norm of this difference being leaked. In effect, an adversary cannot reconstruct the order of data values using information obtained from scalar products.
Rather than using a common scaling multiplier for both encrypted values and noisy components, spawned by some pseudorandom functions ξ(χ), we can utilize two separate
pseudorandom functions, ξ](ν) and ξ2(ν). This way, noisy and determined components can be scaled independently, but are likely to distributed in the same manner.
Matrix Multiplication
The noise addition and scalar multiplication operations produce a vector v representing each key value v and a vector b representing a query bound value b , so that the sign, but not the norm, of v- b can be obtained from the scalar product b v . Still, the security these operations provide cannot withstand a simple breach: an adversary who learns the position within a vector v where the values ξ(ν)ν and - ξ(ν) reside, can effectively acquire all original key values. Therefore, an additional encryption layer is provided to obstruct this breach.
Assume v and b are vectors of length I , and let M be any invertible t t matrix, and M' its inverse. M constitutes part of the encryption key; it is known to the data vendor and its trusted clients, but not to the service provider. Any key value v and any breakpoint b can be encrypted as follows.
Ev(v) = - (5) Eb(b) = MT (6) where Εν(·) , ΕΛ(·) are encryption functions, b and v are represented as column vectors, and r is the transpose of matrix M . Then:
Ε,(ό) · Εν(ν) = Ε6(ό)ΓΕν(ν)
= ( rb)r( -'v)
= (br )( -'v)
= bτv = b■v = ξ(v)(v - b) where row vector aT is the transpose of column vector a .
As it should be obvious for those skilled in the art, the same result can be obtained when Ev(v) = MTv and E„ b) = M'lb .
In effect, any inequality check can be effectively conducted using the encrypted vectors
Ev (v) and Eb (b) . Ev (v) is generated by the data vendor and passed on to the server when the data (or an update) is generated. Eb (b) is produced by the trusted client, or again by the data vendor on behalf of the client, when issuing a query. These two encryption (and reverse decryption) steps are the only workload data vendor and client have to bear. Other computations and query processing are conducted by the server as with a non-encrypted database. At the same time, given those encrypted vectors, an adversary cannot determine the values of v or b without knowing the invertible matrix M .
The Encryption Key
In a nutshell, the total encryption key consists of:
1. The unit vector u .
2. The arbitrary orientation of each vector u1 ^) , orthogonal to u , which is individually selected to construct v for a data value .
3. Each factor A(b) used to produce a noisy vector collinear to u , in order to construct b for a query bound b .
4. Each random positive multiplier ξ(ν) used to obscure the norm of v -b .
5. The positions occupied by the contents of (^)v }) ar>d b) m v ar>d b , respectively.
6. The invertible matrix M .
Resistance to Attacks
A sketch of the capacity of this lightweight encryption scheme to withstand attacks will be provided now, assuming an honest but curious adversary, aware of the internal workings of the encryption scheme, who aims to learn the key from known ciphertexts and potentially known plaintexts. As the encryption scheme consists of three layers, the further discussion is separated in two parts.
Noise and Scalar Multiplication Layers Leaving the matrix multiplication part aside, assume an adversary, Alice, who directly observes noisy vectors before they are multiplied by M . In such a known ciphertext attack, Alice would only be challenged to sort out the noisy contents from the value contents of those vectors. She can use the information that the noisy elements of an
encrypted value vector v and an encrypted breakpoint vector b are orthogonal to each other, and make a hypothesis about the positions of the noisy contents vs. value contents in the observed vectors, i.e. arbitrarily select 2 positions out of £ in those vectors where the assumed value contents reside. To test this hypothesis, she can examine whether the assumed noisy subvectors produce consistently inner product 0 across many different observed {b, v} pairs. If that is the case, then she can safely conclude that her hypothesis is verified. The question is how much Alice would have to try in order to reach a correct hypothesis by exhaustive search. Her hypothesis amounts to selecting
£(£ - \ )
2 out of £ vector elements. As there are = -^— - = 0(l2 ) ways of making such a selection,
Alice can arrive at the correct hypothesis in polynomial time. As indicated above, the noise layer of the encryption scheme is easy to break. This is why the matrix multiplication layer has been added, which will be discussed below.
Matrix Multiplication Layer Once the noisy vectors Alice observes get multiplied by M , she cannot carry out the above attack. Now the multiplication matrix M enters the picture, consisting of I1 unknowns. The unit vector u brings 1 - 2 more unknowns into the picture; each encrypted vector v brings £ - 3 more unknowns, as only one out of its £ - 2 noisy elements can be derived using the £ - 3 others and orthogonality to u , and the ξ(ν) factor, which has multiplied {v,-l } to produce its noisy contents; each encrypted vector b brings a A(b) factor, which multiplies u to produce its noisy contents. An attack that could still bear fruit is a known plaintext attack, in case Alice could gain access to pairs of an original value (either v or b ) and its encrypted form ( Ev (v) or Eh (b) , respectively). For each such pair, she can construct I scalar linear equations.
£2 + £ - 2
Eventually, it can be shown that it suffices for Alice to know N >— -— j— + l = 0( ) plaintext- ciphertext pairs of values b ; she can then solve the resulting system of equations for each of
£(£ - \ )
— - ways of positioning noisy contents in v and b , and, again, identify the one that leads to a solution consistently resulting in orthogonal noisy vector products. The security of the proposed encryption scheme against known plaintext attacks strongly depends on the chosen ciphertext size £ , while the proposed scheme provides the data owner with the flexibility to choose the value of £ at will.
INDEXING ENCRYPTED DATA
It will be discussed below how the server can perform adaptive indexing over data encrypted by the encryption scheme. Notably, the choice of I in the scheme determines the dimensions of matrix M , and incurs a I -fold increase in the storage requirements. This storage overhead is an affordable price to pay for the sake for performing secure adaptive indexing operations.
The Problem of Leakage by Structure
It has been emphasized so far that an order-preserving encryption scheme is avoided, so that the order of values does not leak to adversaries. Nevertheless, adaptive indexing operations do progressively bring the underlying data column in sorted order. After all, database cracking can be validly described as an incremental quichort, while another alternative for adaptive indexing, adaptive merging, can be seen as an incremental external merge sort. An adaptive indexing mechanism, left to its own devices, will tend to bring the data in sorted order. Given a workload W such that, for any pair of values in the column v, , v2 , there exists at least one query bound b e {v, , v2 } , adaptive indexing with W will eventually bring the data in an exact sorted order.
In effect, even though the encryption does not intrinsically betray the order of values, that order can eventually be observed in the order records are brought to after enough cracking operations have been applied. An adversary who can identify a target tuple can infer its position in the sorted order by observing the structure of the constructed index tree. The more refined the tree becomes, the more information it can leak about the order of underlying tuples.
Deliberately Allowing Errors
To mitigate the problem of leaking order by structure, erroneous interpretations of outsourced data are introduced, while the server cannot distinguish which interpretation is valid. Thus, even an adversary (or a server) knowing the internal workings of the enryption scheme will not be able to make confident inferences about the relative positions of tuples.
Herein two possible ways of interpreting each encrypted data value are defined, without leaking which one of the two is correct for each tuple / . The server merely takes actions on both, by maintaining both possible interpretations of each record whenever needed. Only one of those versions corresponds to the true record, yet only the data owner and its trusted clients can make this distinction. From the point of view of the server, each interpretation is equally likely to be valid. In other words, the sever is concurrently managing multiple possible worlds. Such multiple realities are configured by making each Ev(v) vector longer by a value Θ, arbitrarily adding Θ as a prefix or as
a suffix to it, to obtain Ev (v) ; Θ is selected so that both the I? -prefix and ^ -suffix of Ev (v) work consistently as valid Ev (v) vectors interfacing with EA ( >) vectors: in both, after multiplying by matrix , the contents obtained in the positions reserved for noisy contents in a v vector are orthogonal to u .
To facilitate the following discussion, a number of matrices are introduced that are useful so as to formally represent the creation of vectors and noise addition via linear-algebra operations. These matrices, and the purposes they serve, are gathered together in Table 1.
Table 1. Employed matrices
The value of Θ may be calculated as follows. The calculations are starting out from (v;-l)r , the noisy subvector nv oc u1 , and (\ ;b)' , the noisy subvector nb = (b) - u , where u is the secret unit vector and u1 a vector orthogonal to u . Then, for the purposes of the encryption scheme, it is first necessary to formally describe the expanding of these vectors into vectors of size I . To that end, two nx m expansion matrices are emloyed that are denoted as Enm such matrices multiply vectors from left and thereby extend those vectors with n - m zeros. The structure of an Enm matrix can be represented as , where / is the identity matrix. Then it holds that:
As a next step, it is necessary to shuffle the contents of those expanded vectors so as to bring
their contents in the positions predicted by the encryption scheme. To that end, two nx n permutation matrices are employed that are denoted as Pnm and Pn L m ; these matrices shuffle the extended value/bound and noise vectors, respectively, according to the noise addition scheme; only their first m rows have nonzero contents, hence it holds that P„m(P„m) = ; P° is chosen so
0 0
tnat L(Or = "' > e- Pnm anc* PL not nave permutation intersections: the one shuffles a vector's elements into positions complementary to those of the other. Last, let ξ(ν) and A(b) be the "scaling" functions employed in the encryption scheme, as defined in Section "The Encryption Key." Thus, scaled noisy value and bound vectors can be represented as:
The final encrypted noisy value vector is denoted as Ev = M , where M is the encryptio x. The described error-allowance scheme creates one of the following two vectors: Without loss of generality, consider only the first variant. In this first variant, it is
necessary to represent the fake encrypted noisy value vector E^ , i.e., the t -suffix of Ev (v) , which can be considered as true by an adversary. To this end, an £ x £ shift matrix is employed that is denoted as 5 ; multiplication by this matrix cyclically shifts vector components down. For example, for n = 3 : S■ Symmetrically, its transpose, ST , shifts vector components up. Thereby:
E{ = STEv + (0 - Ev - e,)et
where ek is a unit vector with all components zero except the k'h ; the above equation simply retains the (ί - 1) -prefix of Ev , shifted one position upwards, while it places Θ at the i -th position.
This E^ is to have all properties of a real encrypted value vector. Then, the "noisy" / - 2 x l
subvector of the respective fake value vector \f = ME{ must be orthogonal to nA :
[Ev¾ + (^ - Ev -e e^ j ¾_2)¾M) . u = 0 o E¾^ - u
0 = Ev -e, where W = Mr P^f_2)Et(f_2) .
Fig. 2 shows the form of an encrypted vector Ev , while Fig. 3 shows the encrypted vector Ev(v) , with ambiguity added. The size of the ambiguity-added vector is one more than original encrypted vector, hence each original value acquires two representations in the database: the one represented by the real encrypted vector, as the I -prefix in the figure, and another represented by the fake vector, as the ^ -suffix in the figure. Both are derived from the ambiguity-added vector as shown in Fig. 3.
Having added this ambiguity about the identity of the real encrypted vector, it shuld be rendered that identity identifiable by the trusted client. This can be achieved by selecting the (previously declared random) multiplier ξ{ν) in a particular way, e.g., postulating that it be an odd integer. The data owner and trusted client can then determine the real Ev(v) given Ev(v) as follows: they decrypt both the ^ -prefix and £ -suffix of Ev(v) , multiplying each by matrix M . The one that delivers an odd integer in the position of ξ(ν) is the real one. The fact that only one decryption attempt delivers an odd integer at that position is verified by the data owner during encryption.
However, what the client can determine so easily is inaccessible to the server. Under this arrangement, the server generates and manages each possible interpretation of Ev(v) separately; this redundancy provides an obscurity that reduces the confidence of an adversary's inferences about the value order. The end result is similar to adding counterfeit records in the database. However, the security provided by he proposed scheme is higher than that provided by mere counterfeit insertion. In the case of adding counterfeits, an adversary with sufficient background knowledge may identify a single record of interest and infer information-leaking observations about its position in the index. On the other hand, with the proposed scheme, the position of a record of interest in the index is
uncertain even when that record of interest is identified, since each single record spawns two possible interpretations; this state of affairs confers additional security to the proposed construction.
Yet the proposed approach for ambiguous data generation has several flaws: prefix/suffix values Θ are generated deterministically from the respective encrypted values, introducing additional leakage, while their domain is skewed away from the domain of encrypted real vector components, thus, Θ can be easily distinguished. In another embodiment another approach is proposed, which may not provide any compression possibility, but provides security guarantees:
1. During initialization, depending on the "real" data distribution Rv (obtained either from outsourced data by sampling or empirically), a "fake" data distribution Fv is generated by the client and becomes a part of the secret key.
2. Whenever a client updates the database, they encrypt each "real" value v along with a "fake" value i picked from the Fv and send them both in random order to the server.
This way, the distribution of fake values can be controlled directly and is less deterministic.
Ambiguous query generation
The ambiguous "fake" data values are introduced in order to make adversary less certain about data distribution under FA along with adding exponential complexity in the case of Known Plaintext Attack (KPA) over data (see security analysis section 5). However, it is necessary to use ambiguity when issuing queries as well, since query bound values reveal search patterns and are subject to KPA too. To achieve that effect, the following approach is proposed:
1. During initialization, depending on the "real" data distribution Rv, a "fake" query distribution Fb is generated by the client and becomes a part of the secret key.
2. Whenever a client issues a query, they encrypt each "real" range query bi, bR along with some "fake" range query b[ , b picked from Fb and send them both in random order.
3. The server cannot distinguish which query is "real"; it processes both queries and sends the results in the same order.
4. The client drops the "fake" part of the results and filters false positives.
This way, even if an adversary has access to an oracle that produces ciphertexts of given query bounds, there is still an exponential complexity for KPA and uncertainty about data distribution under FA, with a manageable overhead of doubled communication complexity and doubled index size.
Furthermore, this ambiguous query generation tackles a separate problem in the context of
adaptive indexing: naive cracking schemes perform indexing based blindly on the incoming query bounds; thus, they leave the question of workload-robustness aside, and are vulnerable to substandard performance in case of queries selecting worst-case pivots, just like quicksort is vulnerable to quadratic worst-case complexity. Stochastic cracking [16] proposes adding randomness in the selection of cracking pivot points, thereby achieving robustness against arbitrary workloads. In our setup, this randomness is effectively moved from the server side to the client side in the form of ambiguous query generation: the introduced ambiguity serves not only a security objective, but also a workload-robustness objective with one stroke.
Additionally, in order for the client to filter false-positives it was suggested to make scaling term to be integer for "real" values and fractional for "fake" values. As there is no need for the client to filter false positives from the issued queries themselves, we propose the scaling factor to be fractional
or both "real" and "fake" queries, thus, making query bound encryption a one-way function. Managing an Encrypted AVL Tree
The proposed encryption scheme enables a server to manage a column of encrypted values, even with ambiguity inserted, and conduct comparisons between query bounds and data values. Yet, the proposed system should also construct an AVL tree; in particular, query bounds, which correspond to the breakpoints b , are used as key values in tree nodes and utilized in subsequent tree traversals. During such a traversal, a new bound value b' has to be compared to a previous one, b , now used as key; then, at each node two breakpoints values, b' and b , is to be compared to each other.
In order to adapt this tree traversal mechanism to the encrypted data, the server needs to be able to compare two breakpoint values, b' and b , to each other. However, the proposed encryption scheme uses two different encryption modes, one for breakpoints b , and one for values v , constructed so as to allow comparing b to v only, but not different b and v values to each other. This problem is solved by having the query-issuing client encrypt a breakpoint b in both ways, i.e., in its native way, as Eh (b) , and as an attribute value, EV( >) , and pass them on to the server.
Subsequently, the former value is used for inequality checks with attribute values, while the latter is employed for storing it as a key in the constructed AVL tree index. When searching the encrypted AVL tree for a new breakpoint value b' , the required comparison is conducted by calculating
ΕΛ0') · Εν(6) = £(6) · (δ -ό') .
Consider the example encrypted AVL tree in Fig 4. Finding a piece in this tree utilizes a query bound b encrypted as Eh (b) , and key values in the tree itself encrypted as EV( >) ; thus, the query bound and key values interface with each other and allow for conducting comparisons, resulting in the traversal shown in the figure.
Algorithm findpiece(obj AVLTree, N , Eh , posL , posH )
Traverses AVL Tree and returns upper bound ( posH ) and lower bound ( posL ) values of the piece in which query bound vector Eft is located.
1 : posL = 0
2: posH = N
3: isFound = true
4: rootNode = objAVLTree.getRoot()
5: if rootNode is not empty then
6: minNode = objAVLTree.findMinNode(root)
7: maxNode = objAVLTree.find axNode(root)
8: fNode = obj AVLTree. findNode( Eh )
9: if ScalarProduct( Eb ,maxNode.key) > 0 then
10: isFound = false
* *Case 1 * *
1 1 : if isFound == false && fNode is not minNode then
12: posL = maxNode. position
**Case 2**
13: else if fNode == minNode then
14: if ScalarProduct(Eb,fNode.key) < 0 then
15: posH = fNode. position
16: else
17: posL = fNode.position
18: fNode = objAVLTree.findSuccessor(fNode)
19: if ScalarProduct(fNode.key,maxNode.key) < 0 then
20: posH = fNode.position
**Case 3 **
21 : else if ScalarProduct(Eb,fNode.key) < 0 then
22: posH = fNode.position
23: posL = objAVLTree.findPredecessor(fNode).position
**Case 4**
24: else
25 posL = fNode. position
26: fNode = objAVLTree.findSuccessor(fNode)
27: if ScalarProduct(Eb,fNode.key) < 0 then
28 posH = fNode.position
Algorithm f indpiece illustrates the finding of the crack positions for a query bound b ; it makes use of largest and smallest values in the AVL tree in order to determine whether the query bound is out of the tree's range; otherwise it obtains the leaf node flVode by a search operation on the tree based on the given query bound Eft ; overall, it distinguishes the following cases:
• Case 1 holds in case b is greater than the largest value in the tree; then the largest value in the tree is returned as the lower bound of the piece range.
• Case 2 holds when the returned fNode is equal to the smallest value in the tree; then, if the query bound b is greater than that value, the range from that value to its successor is returned; if the query bound is smaller, then the smallest value is returned as the upper bound of the piece range.
• Case 3 holds in case b is less than fNode; then the range between fNode and its predecessor is returned.
• Case 4 holds when b is greater than fNode; then a range between fNode and its successor, if such exists, is returned.
In all cases, comparisons are performed via scalar vector products according to the proposed scheme.
Having located a piece where a query bound b belongs, it is necessary to complete the cracking operation by adding the bound b , represented by both JLv(b) and E¾(£) , at a leaf node position in the AVL tree itself (and possibly rebalance the tree), so as to facilitate future searches. Fig. 5 presents an AVL-tree traversal along with the addition of a leaf node for bs at the last step, after the piece in which b belongs has been cracked in two. Algorithm addCrack illustrates how this operation is performed; its first three cases examine the situation where a node for b already exists in the tree; the fourth case adds a new node.
Algorithm addCrack(obj AVLTree, N , Ev , Eb , pos )
Adds a new node, for query bound b , to the AVL tree indexing N values, corresponding to position pos in the data
array.
1 : if pos = 0 or pos≥ N then return
2: rootNode = objAVLTree.getRootQ
3 : isFound = true
5 : if rootNode is not empty then
6: minNode = objAVLTree.findMinNode(root)
7: maxNode = objAVLTree.findMaxNode(root)
8 : fNode = obj AVLTree.findNode( Eb )
9: if ScalarProduct( Eh ,maxNode.key) > 0 then
10: isFound = false
**Casel **
1 1 : if isFound == true then
12: if fNode.position == pos then return
12: tmp = fNode
13: if ScalarProduct(tmp.key, Eb ) == 0 then
14: tmp = objAVLTree.findSuccessor(tmp)
15: if ScalarProduct(tmp.key, Eh ) > 0 then
16: if tmp. position == pos then return
* *Case2 * *
1 : if fNode is not minNode then
18: if isFound == false then fNode = maxNode
19: else fNode = objAVLTree.findPredecessor(fNode)
20: if fNode.position == pos then return
**Case3 **
21 : if isFound = true && ScalarProduct(fNode.key, Eb ) == 0 then
22: fNode. setPosition(pos)
23: return
**Case4**
24: objAVLTree.insert( Ev ,pos, Eh )
25: return
Conclusion
Herein a novel, lightweight, linear-algebra-based encryption scheme is presented, this scheme allows for (i) range query processing over encrypted numeric data outsourced in the cloud, and thereby for (ii) incremental, adaptive indexing whereby only data that are queried by trusted
clients get indexed. The scheme represents numerical values as short vectors, and relies on simple linear-algebra operations for encryption and decryption; it allows neither the actual data values nor their order to be disclosed. While the structure of the index may reveal order in the long-term, this only happens after crucial indexing operations have been performed; furthermore, an additional obfuscation component is proposed in the scheme, this component deliberately introduces ambiguity in the proposed construction by allowing two variant interpretations of each encrypted value vector. The scheme assures the security needed in time-critical operations such as high-frequency trading and financial transaction processing over the cloud.
Claims
1. A method for searching a database, comprising
encrypting a plurality of data values in the database with a first encryption mode to obtain encrypted data values;
encrypting a bound value with a second encryption mode to obtain an encrypted bound value, wherein the second encryption mode being different from the first encryption mode; and comparing the bound value with the plurality of data values using the encrypted bound values and the encrypted data values.
2. The method according to claim 1 , wherein
values encrypted with the first encryption mode are not comparable to each other.
3. The method according to claim 1 , wherein
comparing a bound value with the plurality of data values using the encrypted bound values and the encrypted data values comprises comparing a bound value with the plurality of data values using an operation over the encrypted bound values and the encrypted data values.
4. A method according to claim 1 , wherein
one or more of the encrypted data values are key values used for indexing the encrypted data values,
wherein comparing the bound value with the plurality of data values comprises comparing the bound value with the plurality of data values using the encrypted bound values, the key values and the encrypted data values
5. The method according to claim 1 , wherein
encrypting a plurality of data values in the database comprises transforming each of the plurality of data values into a value vector;
encrypting a bound value comprises transforming the bound value into a bound vector; and comparing the bound value with the plurality of data values comprises obtaining scalar products of the value vectors and the bound vector.
6. The method according to claim 5, wherein:
each of the value vectors comprises a noisy value subvector;
the bound vector comprises a noisy bound subvector;
wherein the noisy bound subvector is orthogonal to each of the noisy value subvectors.
7. The method according to claim 5, further comprising:
selecting a random positive multiplier for each of the value vectors; and
scaling each of the value vector by the respective selected multiplier.
8. The method according to claim 6, wherein
selecting a first positive random multiplier and a second positive multiplier for each of the value vectors;
scaling each of the noisy value subvector by the respective first random multiplier; and scaling the rest of each of the value vectors by the respective second positive random multiplier.
9. The method according to claim 5, further comprising:
selecting an invertible matrix; wherein
transforming each of the data values into a value vector comprises transforming the data value into a column value vector and multiplying one of the inverse of the matrix and the transpose of the matrix by the column value vector; and
transforming the bound value into a bound vector comprises transforming the bound value into a column bound vector and multiplying the other of the inverse of the matrix and the transpose of the matrix by the column bound vector.
10. A method for indexing data values comprising:
encrypting a plurality of data values with a first encryption mode to obtain encrypted data values;
encrypting one or more bound values with the first encryption mode to obtain first encrypted bound values and with a second encryption mode to obtain second encrypted bound values, wherein the second encryption mode being different from the first encryption mode;
comparing the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and
indexing encrypted data values based on said comparison using the first encrypted bound values as key values.
1 1. The method according to claim 10, wherein
values encrypted with the first encryption mode are not comparable to each other; and values encrypted with the second encryption mode are not comparable to each other.
12. The method according to claim 10, further comprising:
encrypting another bound value with the first encryption mode to obtain another first encrypted bound value and with a second encryption mode to obtain another second encrypted
bound value;
comparing the another bound value with the plurality of data values using the another second encrypted bound value, the key values, and the encrypted data values;
adding the another first encrypted bound value as another key value based on said comparison.
13. The method according to claim 10, wherein
indexing encrypted data values comprising constructing a hierarchical index structure, wherein nodes in the hierarchical index structure being the key values.
14. The method according to claim 13, wherein
a hierarchical index structure is a binary search tree.
15. The method according to claim 14, wherein
the binary search tree is an AVL tree.
16. The method according to claim 10, further comprising:
adding ambiguity to at least one of the data values, so that a plurality of interpretations for each of the at least one of the data value are possible, wherein a single interpretation of the plurality of interpretations being true.
17. The method according to claim 10, further comprising:
adding ambiguity to at least one of the one or more bound values, so that a plurality of interpretations for the at least one of the one or more bound values are possible, wherein a single interpretation of the plurality of interpretations being true.
18. The method according to claim 10, wherein
encrypting the plurality of data values with a first encryption mode comprises transforming the data values into value vectors;
encrypting the one or more bound values with the first encryption mode comprises transforming the bound values into first bound vectors;
encrypting the one or more bound values with the second encryption mode comprises transforming the bound values into second bound vectors; and
comparing the one or more bound values with the plurality of data values comprises obtaining scalar products of the value vectors and the second bound vectors.
19. A server for indexing data values, comprising:
a storage means configured to store encrypted data values, the encrypted data values being a plurality of data values encrypted with a first encryption mode;
a receiving unit configured to receive, from a client, first encrypted bound values and second encrypted bound values, the first encrypted bound values being one or more bound values encrypted with the first encryption mode and the second encrypted bound values being the one or more bound values encrypted with a second encryption mode, wherein the second encryption mode being different from the first encryption mode; and
a processing unit configured to:
compare the one or more bound values with the plurality of data values using the second encrypted bound values and the encrypted data values; and
index encrypted data values based on said comparison using the first encrypted bound values as key values.
20. The server according to claim 19, wherein
the encrypted data values are not comparable to each other.
21. The server according to claim 19, wherein
the receiving unit is further configured to receive another first encrypted bound value and another second encrypted value, the another first encrypted bound value is another bound value encrypted with the first encryption mode and the another second encrypted value is the another bound value encrypted with the second encryption mode; and
the processing unit is further configured to
compare the another bound value with the plurality of data values using the another second encrypted bound value, the key values and the encrypted data values; and
add the another first encrypted bound value as another key value based on said comparison,
wherein the server further comprises a sending unit configured to send the result of said comparison to the client.
22. A client for receiving a search result comprising:
a sending unit configured to send a second encrypted bound value to a server, the second encrypted bound value is a bound value encrypted with a second encryption mode, wherein the server comprises encrypted data values, the encrypted data values being a plurality of data values encrypted with a first encryption mode;
a receiving unit configured to receive, from the server, a result of comparison between the bound value and the plurality of data values using the second encrypted bound value and the encrypted data values, the result comprising a set of the encrypted data values; and
a processing unit configured to decrypt the set of the encrypted data values.
23. The client according to claim 22, wherein
the sending unit is further configured to send a first encrypted bound value to the server, the first encrypted bound value being the bound value encrypted with the first encryption mode.
24. The client according to claim 23, wherein:
the processing unit is configured to encrypt the bound value with the second encryption mode to obtain the second encrypted bound value and encrypt the bound value with the first encryption mode to obtain the first encrypted bound value.
25. The client according to claim 22, wherein
at least one of the plurality of data values have ambiguity, so that a plurality of interpretations for each of the at least some of the plurality of data value are possible, wherein a single interpretation of the plurality of interpretations being true;
wherein the processor is configured to extract true interpretations of data values from the set of encrypted data values.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EA201990063A EA036613B1 (en) | 2016-06-22 | 2016-06-22 | Two-mode encryption scheme allowing comparison-based indexing |
PCT/RU2016/000382 WO2017222407A1 (en) | 2016-06-22 | 2016-06-22 | Two-mode encryption scheme allowing comparison-based indexing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2016/000382 WO2017222407A1 (en) | 2016-06-22 | 2016-06-22 | Two-mode encryption scheme allowing comparison-based indexing |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017222407A1 true WO2017222407A1 (en) | 2017-12-28 |
Family
ID=57910092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/RU2016/000382 WO2017222407A1 (en) | 2016-06-22 | 2016-06-22 | Two-mode encryption scheme allowing comparison-based indexing |
Country Status (2)
Country | Link |
---|---|
EA (1) | EA036613B1 (en) |
WO (1) | WO2017222407A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108595554A (en) * | 2018-04-11 | 2018-09-28 | 湖南大学 | A kind of more range of attributes querying methods based on cloud environment |
CN108710698A (en) * | 2018-05-23 | 2018-10-26 | 湖南大学 | Multi-key word fuzzy query method based on ciphertext under cloud environment |
CN111339050A (en) * | 2018-12-03 | 2020-06-26 | 国网宁夏电力有限公司信息通信公司 | Centralized security audit method and system based on big data platform |
CN112632297A (en) * | 2020-12-10 | 2021-04-09 | 沈阳航空航天大学 | Encryption index-based secure space text skyline query method |
CN113297596A (en) * | 2021-06-09 | 2021-08-24 | 东北大学 | Efficient ubiquitous reading method for static data |
CN115168909A (en) * | 2022-09-07 | 2022-10-11 | 翼方健数(北京)信息科技有限公司 | Ciphertext data range query method and system based on comparison index |
CN117077209A (en) * | 2023-10-16 | 2023-11-17 | 云阵(杭州)互联网技术有限公司 | Large-scale data hiding trace query method |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111078699B (en) * | 2019-12-12 | 2024-01-26 | 金蝶软件(中国)有限公司 | Incremental data judging method and related equipment thereof |
-
2016
- 2016-06-22 WO PCT/RU2016/000382 patent/WO2017222407A1/en active Application Filing
- 2016-06-22 EA EA201990063A patent/EA036613B1/en not_active IP Right Cessation
Non-Patent Citations (17)
Title |
---|
A. ARASU; K. EGURO; M. JOGLEKAR; R. KAUSHIK; D. KOSSMANN; R. RAMAMURTHY: "Transaction processing on confidential data using Cipherbase", ICDE, 2015, pages 435 - 446, XP032781303, DOI: doi:10.1109/ICDE.2015.7113304 |
B. HORE; S. MEHROTRA; G. TSUDIK: "A privacy-preserving index for range queries", VLDB, 2004, pages 720 - 731, XP009133028 |
B. HORE; S. MEHROTRA; M. CANIM; M. KANTARCIOGLU: "Secure multidimensional range queries over outsourced data", VLDB JOURNAL, vol. 21, no. 3, 2012, pages 333 - 358, XP035056139, DOI: doi:10.1007/s00778-011-0245-7 |
BIJIT HORE ET AL: "A Privacy-Preserving Index for Range Queries", PROCEEDINGS OF THE THIRTIETH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES,, 1 January 2004 (2004-01-01), pages 720 - 731, XP009133028, ISBN: 978-0-12-088469-8 * |
C. GENTRY: "Fully homomorphic encryption using ideal lattices", STOC, 2009, pages 169 - 178, XP058164708, DOI: doi:10.1145/1536414.1536440 |
D. BONEH; B. WATERS: "Conjunctive, subset, and range queries on encrypted data", TCC, 2007, pages 535 - 554, XP047030282, DOI: doi:10.1007/978-3-540-70936-7_29 |
E. DAMIANI; S. DE CAPITANI DI VIMERCATI; S. JAJODIA; S. PARABOSCHI; P. SAMARATI: "Balancing confidentiality and efficiency in untrusted relational DBMSs", ACM CCS, 2003, pages 93 - 102, XP002491419, DOI: doi:10.1145/948109.948124 |
E. SHI; J. BETHENCOURT; H. T.-H. CHAN; D. X. SONG; A. PERRIG: "Multidimensional range query over encrypted data", IEEE SYMPOSIUM ON SECURITY AND PRIVACY, 2007, pages 350 - 364 |
ELAINE SHI ET AL: "Multi-Dimensional Range Query over Encrypted Data", SECURITY AND PRIVACY, 2007. SP '07. IEEE SYMPOSIUM ON, 20 May 2007 (2007-05-20), Pi, pages 350 - 364, XP055329444, ISBN: 978-0-7695-2848-9, DOI: 10.1109/SP.2007.29 * |
H. HACIGUMII; B. R. IYER; C. LI; S. MEHROTRA: "Executing SQL over encrypted data in the database-service-provider model", SIGMOD, 2002, pages 216 - 227 |
M. MANI; K. SHAH; M. GUNDA: "Enabling secure database as a service using fully homomorphic encryption: Challenges and opportunities", CORR, 2013 |
PANAGIOTIS KARRAS ET AL: "Adaptive Indexing over Encrypted Numeric Data", PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD '16, 18 April 2016 (2016-04-18), New York, New York, USA, pages 171 - 183, XP055351121, ISBN: 978-1-4503-3531-7, Retrieved from the Internet <URL:https://web.archive.org/web/20160418003601/http://stratos.seas.harvard.edu/files/stratos/files/securadapt.pdf?m=1455415195> [retrieved on 20170302], DOI: 10.1145/2882903.2882932 * |
R. LI; A. X. LIU; A. L. WANG; B. BRUHADESHWAR: "Fast range query processing with strong privacy protection for cloud computing", PVLDB, vol. 7, no. 14, 2014, pages 1953 - 1964 |
S. TU; M. F. KAASHOEK; S. MADDEN; N. ZELDOVICH: "Processing analytical queries over encrypted data", PVLDB, vol. 6, no. 5, 2013, pages 289 - 300, XP055321667, DOI: doi:10.14778/2535573.2488336 |
S. WANG; D. AGRAWAL; A. EL ABBADI: "A comprehensive framework for secure query processing on relational data in the cloud", SECURE DATA MANAGEMENT, 2011, pages 52 - 69, XP047390966, DOI: doi:10.1007/978-3-642-23556-6_4 |
T. GE; S. B. ZDONIK: "Fast, secure encryption for indexing in a column-oriented DBMS", ICDE, 2007, pages 676 - 685, XP031095811 |
W. H. WANG; L. V. S. LAKSHMANAN: "Efficient secure query evaluation over encrypted XML databases", VLDB, 2006, pages 127 - 138 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108595554A (en) * | 2018-04-11 | 2018-09-28 | 湖南大学 | A kind of more range of attributes querying methods based on cloud environment |
CN108710698A (en) * | 2018-05-23 | 2018-10-26 | 湖南大学 | Multi-key word fuzzy query method based on ciphertext under cloud environment |
CN108710698B (en) * | 2018-05-23 | 2021-10-15 | 湖南大学 | Multi-keyword fuzzy query method based on ciphertext under cloud environment |
CN111339050A (en) * | 2018-12-03 | 2020-06-26 | 国网宁夏电力有限公司信息通信公司 | Centralized security audit method and system based on big data platform |
CN111339050B (en) * | 2018-12-03 | 2023-07-18 | 国网宁夏电力有限公司信息通信公司 | Centralized security audit method and system based on big data platform |
CN112632297A (en) * | 2020-12-10 | 2021-04-09 | 沈阳航空航天大学 | Encryption index-based secure space text skyline query method |
CN112632297B (en) * | 2020-12-10 | 2024-02-02 | 沈阳航空航天大学 | Secure space text skyline query method based on encryption index |
CN113297596A (en) * | 2021-06-09 | 2021-08-24 | 东北大学 | Efficient ubiquitous reading method for static data |
CN113297596B (en) * | 2021-06-09 | 2023-10-31 | 东北大学 | Efficient and vast reading method for static data |
CN115168909A (en) * | 2022-09-07 | 2022-10-11 | 翼方健数(北京)信息科技有限公司 | Ciphertext data range query method and system based on comparison index |
CN117077209A (en) * | 2023-10-16 | 2023-11-17 | 云阵(杭州)互联网技术有限公司 | Large-scale data hiding trace query method |
CN117077209B (en) * | 2023-10-16 | 2024-02-23 | 云阵(杭州)互联网技术有限公司 | Large-scale data hiding trace query method |
Also Published As
Publication number | Publication date |
---|---|
EA036613B1 (en) | 2020-11-30 |
EA201990063A1 (en) | 2019-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017222407A1 (en) | Two-mode encryption scheme allowing comparison-based indexing | |
Karras et al. | Adaptive indexing over encrypted numeric data | |
Zheng et al. | Achieving efficient and privacy-preserving k-NN query for outsourced ehealthcare data | |
Popa et al. | CryptDB: A practical encrypted relational DBMS | |
Wang et al. | Forward and backward-secure range-searchable symmetric encryption | |
Amjad et al. | Breach-resistant structured encryption | |
Dautrich Jr et al. | Compromising privacy in precise query protocols | |
Egorov et al. | ZeroDB white paper | |
Emekci et al. | Dividing secrets to secure data outsourcing | |
Hadavi et al. | Security and searchability in secret sharing-based data outsourcing | |
Hadavi et al. | AS5: A secure searchable secret sharing scheme for privacy preserving database outsourcing | |
Moghadam et al. | Toward securing cloud-based data analytics: A discussion on current solutions and open issues | |
Wang et al. | Stargazing in the dark: Secure skyline queries with SGX | |
Kamel et al. | Dynamic spatial index for efficient query processing on the cloud | |
di Vimercati et al. | Three-server swapping for access confidentiality | |
Wang et al. | Enabling efficient approximate nearest neighbor search for outsourced database in cloud computing | |
Talha et al. | Facilitating secure and efficient spatial query processing on the cloud | |
Muhammad et al. | A secure data outsourcing scheme based on Asmuth–Bloom secret sharing | |
Tian et al. | Privacy preserving query processing on secret share based data storage | |
Dang | Ensuring correctness, completeness, and freshness for outsourced tree-indexed data | |
Zhang et al. | Privacy-preserving linear region search service | |
Hore et al. | Managing and querying encrypted data | |
Sobati Moghadam et al. | Enforcing privacy in cloud databases | |
Jang et al. | A Comparison of the Query Execution Algorithms in Secure Database System. | |
Zheng et al. | An efficient and privacy-preserving $ k $-nn query scheme for ehealthcare data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16831648 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16831648 Country of ref document: EP Kind code of ref document: A1 |