CN104036050A

CN104036050A - Complex query method for encrypted cloud data

Info

Publication number: CN104036050A
Application number: CN201410316970.2A
Authority: CN
Inventors: 陈兰香
Original assignee: Fujian Normal University
Current assignee: Fujian Normal University
Priority date: 2014-07-04
Filing date: 2014-07-04
Publication date: 2014-09-10

Abstract

A complex query method for ciphertext cloud data. The data owner constructs a binary vector index for its file set, encrypts the file set with a symmetric cryptographic mechanism, and sends the encrypted file set to the cloud. When a user wants to access files containing certain keywords, the user applies to the data owner for a query token, which contains the keyword set and the binary vector indexes of all files. The user constructs a query binary vector from the query keywords and the keyword set, and computes the inner product of the query binary vector with the index binary vector of each file to determine whether the file contains the query keywords. If it does, a new index binary vector corresponding to the query keywords is constructed. The user then generates an LSSS matrix from the query keywords according to the logical expression and computes the inner product of the new index binary vector and the LSSS matrix to determine whether the file satisfies the query logical expression. The invention supports precise complex queries and can achieve higher query efficiency than the widely used inverted index.

Description

A complex query method for ciphertext cloud data

Technical field

The invention belongs to the field of cloud storage and information retrieval, and in particular relates to a complex query method for ciphertext cloud data.

Background

In a cloud storage environment, encryption is a common way to protect the confidentiality and privacy of user data, but once the data is encrypted, the problem of retrieving ciphertext data urgently needs to be solved.

Two typical approaches currently address ciphertext cloud data retrieval. The first performs a linear search directly on the ciphertext, comparing the words in the ciphertext one by one to confirm whether a keyword is present and how many times it appears. The second is based on a secure index: a keyword index is first built for the documents, the documents and the index are then encrypted and uploaded to the cloud, and a search looks up the index to determine whether a keyword is present in a given document. Direct linear search over ciphertext is inefficient and cannot cope with massive data sets. Index-based ciphertext retrieval is the mainstream of current research because it offers better query efficiency and higher security, and it is suitable for large-scale cloud storage ciphertext retrieval systems.

In existing research, all schemes use an inverted index mechanism; none uses a binary vector index. Moreover, few schemes currently support complex queries, and the accuracy of query results urgently needs improvement.

With a binary vector index, the data owner only needs to retain a small amount of information to achieve efficient and secure ciphertext data retrieval. With an LSSS matrix, precise complex queries can be achieved.

Ciphertext cloud data query is a key technology for ensuring the confidentiality and retrievability of data in cloud storage, and it has important theoretical significance and practical value for promoting the rapid development of cloud storage.

Summary of the invention

In view of the defects of the prior art, the purpose of the present invention is to provide a complex query method for ciphertext cloud data that improves the accuracy, efficiency and security of data queries.

To achieve this purpose, the present invention provides a complex query method for ciphertext cloud data, comprising the following steps:

Step 1. The data owner builds an index for its file set using a binary vector index: each bit of the index represents one keyword, and 0 or 1 indicates whether that keyword is present in the file;

Step 2. The data owner encrypts the file set with a symmetric cryptographic mechanism, on a per-file or per-data-block basis;

Step 3. The data owner sends the encrypted file set to the cloud;

Step 4. When a user wants to access files containing certain keywords, the user applies to the data owner for a query token; the query token contains the keyword set and the binary vector indexes of all files;

Step 5. The user constructs a query binary vector from the query keywords and the keyword set, and computes the inner product of the query binary vector with the index binary vector of each file to determine whether the file contains the user's query keywords;

Step 6. If the file contains query keywords, a new index binary vector corresponding to the query keywords is constructed;

Step 7. The user generates an LSSS (Linear Secret Sharing Scheme) matrix from the query keywords according to the logical expression, and computes the inner product of the new index binary vector and the LSSS matrix to further determine whether the file satisfies the query logical expression.

Step 1 specifically includes the following sub-steps:

1.1 The data owner uses an existing word segmentation algorithm to extract keywords from its file set and builds the keyword set;

1.2 The data owner builds the binary vector index of each file according to whether the file contains the corresponding keyword of the keyword set: 1 indicates that the keyword is present in the file, and 0 indicates that it is not.
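Sub-step 1.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_index`, the file names, and the English keyword strings (standing in for the keyword set of Figure 3) are assumptions, and keyword extraction (sub-step 1.1) is taken as already done.

```python
from typing import Dict, List, Set


def build_index(keyword_set: List[str],
                file_keywords: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """Return one 0/1 vector per file, with one position per keyword.

    Position j is 1 if keyword_set[j] occurs in the file, else 0.
    """
    return {
        name: [1 if kw in kws else 0 for kw in keyword_set]
        for name, kws in file_keywords.items()
    }


# Hypothetical data modelled on the Figure 3 example.
KEYWORDS = ["cloud computing", "cloud storage", "encryption",
            "data retrieval", "binary vector"]
FILES = {
    "file1": {"cloud computing", "encryption"},
    "file2": {"cloud storage", "encryption", "data retrieval", "binary vector"},
}

index = build_index(KEYWORDS, FILES)
print(index["file1"])  # [1, 0, 1, 0, 0]
print(index["file2"])  # [0, 1, 1, 1, 1]
```

The order of `KEYWORDS` fixes the meaning of each bit, so the same ordered keyword set has to be used for every file and for every later query.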

In step 2, if encryption is per file, the data owner uses the symmetric cryptographic mechanism to randomly generate as many symmetric keys as there are files in the file set and encrypts each file with its own key to produce the ciphertext, so that every file has a different encryption key. If encryption is per data block, the data owner divides the files of the file set into blocks of the set block size, randomly generates as many symmetric keys as there are blocks, and encrypts each block with its own key to produce the ciphertext, so that every data block has a different encryption key.
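The key-management side of step 2 might look like the sketch below. It only shows random key generation and block splitting; the actual cipher is deliberately left out, since the text does not name one. The 32-byte key length and all function names are assumptions made for illustration.

```python
import secrets
from typing import Dict, List

KEY_BYTES = 32  # assumed key length (256 bits); the source does not fix one


def keys_per_file(file_names: List[str]) -> Dict[str, bytes]:
    """Generate one independent random symmetric key for each file."""
    return {name: secrets.token_bytes(KEY_BYTES) for name in file_names}


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split file content into blocks of block_size bytes (last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def keys_per_block(data: bytes, block_size: int) -> List[bytes]:
    """Generate one independent random symmetric key for each block of a file."""
    return [secrets.token_bytes(KEY_BYTES) for _ in split_blocks(data, block_size)]


file_keys = keys_per_file(["file1", "file2", "file3"])
blocks = split_blocks(b"abcdefghij", 4)
block_keys = keys_per_block(b"abcdefghij", 4)
print(len(file_keys), blocks, len(block_keys))
```

Each key would then be passed, together with its file or block, to whichever symmetric cipher the deployment uses.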

Step 4 specifically includes the following sub-steps:

4.1 The user sends a query authorization request to the data owner. According to its security policy, the data owner decides whether to issue an authorization token to the user and for which file sets. The token contains the keyword set of the authorized file set and the binary vector indexes of the authorized files;

4.2 The data owner sends the token to the user over a general-purpose secure transmission mechanism.
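One possible shape for the token of sub-step 4.1 is sketched below. The `QueryToken` class, the `issue_token` function, and the policy representation (a mapping from user name to the set of files that user may query) are all hypothetical; the source only states what the token contains, not how the security policy is expressed or how the token is transported.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class QueryToken:
    keyword_set: List[str]               # ordered keyword set
    file_indexes: Dict[str, List[int]]   # binary vector index per authorized file


def issue_token(user: str,
                requested_files: List[str],
                policy: Dict[str, Set[str]],
                keyword_set: List[str],
                all_indexes: Dict[str, List[int]]) -> Optional[QueryToken]:
    """Return a token covering only the files the policy allows, or None."""
    allowed = [f for f in requested_files if f in policy.get(user, set())]
    if not allowed:
        return None
    return QueryToken(keyword_set, {f: all_indexes[f] for f in allowed})


# Hypothetical data: alice may query file1 only, bob is not authorized at all.
keyword_set = ["cloud computing", "cloud storage", "encryption",
               "data retrieval", "binary vector"]
all_indexes = {"file1": [1, 0, 1, 0, 0], "file2": [0, 1, 1, 1, 1]}
policy = {"alice": {"file1"}}

alice_token = issue_token("alice", ["file1", "file2"], policy, keyword_set, all_indexes)
bob_token = issue_token("bob", ["file1"], policy, keyword_set, all_indexes)
print(alice_token, bob_token)
```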

Step 5 specifically includes the following sub-steps:

5.1 First the query binary vector is constructed over the keyword set: a position is 1 if the keyword at that position of the keyword set is one of the query keywords, and 0 otherwise.

5.2 The inner product of the query binary vector and the index binary vector of each file is computed. A non-zero result indicates that the file contains query keywords; a result of 0 indicates that it contains none. The larger the inner product, the more query keywords the file contains.

Suppose r_i is the binary index vector of document F_i, where r_i[j] ∈ {0,1} indicates whether keyword w_j is present in the document, and Q is the query vector, where Q[j] ∈ {0,1} indicates whether keyword w_j is in the query keyword set W. The similarity score between document F_i and the query keyword set W is computed as the inner product r_i·Q.
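The two sub-steps of step 5 can be illustrated with the sketch below, which uses the numbers of the Figure 3 example. The function names and the English keyword strings are assumptions introduced for illustration.

```python
from typing import List


def build_query_vector(keyword_set: List[str], query_keywords: List[str]) -> List[int]:
    """Q[j] is 1 if keyword_set[j] is one of the query keywords, else 0."""
    wanted = set(query_keywords)
    return [1 if kw in wanted else 0 for kw in keyword_set]


def score(index_vector: List[int], query_vector: List[int]) -> int:
    """Inner product r_i . Q: the number of query keywords present in the file."""
    return sum(r * q for r, q in zip(index_vector, query_vector))


keyword_set = ["cloud computing", "cloud storage", "encryption",
               "data retrieval", "binary vector"]
q = build_query_vector(
    keyword_set,
    ["cloud computing", "cloud storage", "encryption", "data retrieval"],
)
f1 = [1, 0, 1, 0, 0]
f2 = [0, 1, 1, 1, 1]

print(q)             # [1, 1, 1, 1, 0]
print(score(f1, q))  # 2
print(score(f2, q))  # 3
```

Because both vectors are binary, the score is simply a count of shared 1 positions, which is why a score of 0 means no query keyword occurs in the file.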

In step 6, the new index binary vector corresponding to the query keywords is constructed as follows: in the index binary vector of the file, the bits at the positions of the query keywords are kept, and the bits at all other positions are removed.
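A minimal sketch of this projection, assuming the query binary vector from step 5 is used as the mask (the function name `project_index` is hypothetical):

```python
from typing import List


def project_index(index_vector: List[int], query_vector: List[int]) -> List[int]:
    """Keep only the bits of index_vector at positions where the query bit is 1."""
    return [bit for bit, q in zip(index_vector, query_vector) if q == 1]


q = [1, 1, 1, 1, 0]
new_f1 = project_index([1, 0, 1, 0, 0], q)
new_f2 = project_index([0, 1, 1, 1, 1], q)
print(new_f1)  # [1, 0, 1, 0]
print(new_f2)  # [0, 1, 1, 1]
```

The projected vector has one entry per query keyword, in keyword-set order, which is the order the rows of the LSSS matrix in step 7 need to follow.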

Step 7 specifically includes the following sub-steps:

7.1 First the LSSS matrix is constructed from the query logical expression, which is treated as a tree of AND and OR gates whose leaves are the query keywords. The root node is labelled with the vector (1), of length 1, and a variable c is initialized to 1. Let v be the vector that labels a parent node. If the parent is an OR gate, both children are labelled v. If the parent is an AND gate, the left child is labelled v||1 and the right child is labelled (0,...,0)||-1, where the number of zeros is c, and c is then incremented by 1. Once the whole tree is labelled, the vectors of the leaf nodes form the rows of the LSSS matrix M; rows of unequal length are padded with 0.
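The labelling procedure of sub-step 7.1 can be sketched as a recursive tree walk. The nested-tuple tree representation, the function name `build_lsss`, and the leaf names are assumptions. One further assumption is flagged in the code: before appending 1 for the left child of an AND gate, `v` is padded with zeros up to length c, so that nested AND gates get distinct columns; this padding does nothing in the example shown here.

```python
from typing import List, Tuple, Union

# A node is either a keyword (leaf) or a tuple (gate, left, right),
# where gate is "and" or "or". This representation is hypothetical.
Node = Union[str, Tuple[str, "Node", "Node"]]


def build_lsss(tree: Node) -> Tuple[List[str], List[List[int]]]:
    """Label the tree as in sub-step 7.1 and return (leaf keywords, matrix rows)."""
    labels: List[str] = []
    rows: List[List[int]] = []
    counter = 1  # the variable c of sub-step 7.1

    def visit(node: Node, v: List[int]) -> None:
        nonlocal counter
        if isinstance(node, str):        # leaf: its vector becomes a row of M
            labels.append(node)
            rows.append(v)
            return
        gate, left, right = node
        if gate == "or":                 # OR gate: both children inherit v
            visit(left, v)
            visit(right, v)
        elif gate == "and":
            # Assumption: pad v with zeros to length c before appending 1.
            padded = v + [0] * (counter - len(v))
            left_v = padded + [1]
            right_v = [0] * counter + [-1]
            counter += 1
            visit(left, left_v)
            visit(right, right_v)
        else:
            raise ValueError(f"unknown gate: {gate!r}")

    visit(tree, [1])
    width = max(len(r) for r in rows)
    return labels, [r + [0] * (width - len(r)) for r in rows]


# (w1 or w2) and (w3 and w4)
tree = ("and", ("or", "w1", "w2"), ("and", "w3", "w4"))
labels, M = build_lsss(tree)
print(labels)  # ['w1', 'w2', 'w3', 'w4']
print(M)       # [[1, 1, 0], [1, 1, 0], [0, -1, 1], [0, 0, -1]]
```

Rows come out in left-to-right leaf order, so the leaves should be arranged in the same order as the entries of the new index binary vector.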

7.2 The inner product of the new index binary vector and the LSSS matrix is computed. The file satisfies the query condition if and only if the result is (1,0,0,...,0); otherwise it does not.
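The check of sub-step 7.2 is a row vector times matrix product followed by a comparison with (1,0,...,0). The sketch below implements it literally as stated; the function names are assumptions, and the matrix is the one for the expression (w1 or w2) and (w3 and w4).

```python
from typing import List


def vector_times_matrix(vec: List[int], matrix: List[List[int]]) -> List[int]:
    """Row vector times matrix: entry j is the sum over i of vec[i] * matrix[i][j]."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(cols)]


def satisfies(new_index: List[int], lsss: List[List[int]]) -> bool:
    """True if and only if new_index * lsss equals (1, 0, ..., 0)."""
    target = [1] + [0] * (len(lsss[0]) - 1)
    return vector_times_matrix(new_index, lsss) == target


M = [[1, 1, 0],
     [1, 1, 0],
     [0, -1, 1],
     [0, 0, -1]]

print(vector_times_matrix([1, 0, 1, 0], M), satisfies([1, 0, 1, 0], M))
# [1, 0, 1] False
print(vector_times_matrix([0, 1, 1, 1], M), satisfies([0, 1, 1, 1], M))
# [1, 0, 0] True
```

Under this literal reading, a vector such as (1,1,1,1), where both operands of the OR gate are present, yields (2,1,0); how such files are treated is not covered by this sketch.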

The complex query method for ciphertext cloud data involves a data owner, users and the cloud. The data owner extracts keywords from its file set with an existing word segmentation algorithm and builds the binary vector indexes of all files. The data owner also encrypts the files with a symmetric cryptographic mechanism (for block-based encryption, the files are first divided into blocks of the set block size and then encrypted) and sends the encrypted files to the cloud. The user requests query authorization from the data owner, and the data owner issues an authorization token to the user according to the specified security policy. The user uses the token information to construct the query binary vector, computes its inner product with the index binary vectors of all files to determine whether each file contains the query keywords, constructs the new index binary vector corresponding to the query keywords, generates the LSSS matrix from the query keywords according to the logical expression, and computes the inner product of the new index binary vector and the LSSS matrix. The user then requests from the cloud the ciphertext of the files that contain the query keywords and decrypts them with the file keys contained in the token. The cloud stores the data and responds to users' read and write requests.

Compared with the prior art, the technical solution conceived by the present invention has the following advantages:

1. High query accuracy. A query logical expression can represent complex query conditions, and the LSSS matrix yields query results that match the query logical expression exactly.

2. Convenient data updates. The index is built by the data owner, and the keyword set is kept by the data owner. When a file needs to be updated, the data owner only has to update the binary vector index of that file, re-encrypt the file, and send the encrypted file to the cloud.

3. Computing inner products of binary vectors is very efficient, and only a small amount of additional storage on the user side is needed to achieve efficient retrieval.

Description of drawings

Figure 1 is a diagram of the relationships between the entities involved in the present invention.

Figure 2 is a flow chart of the method of the present invention.

Figure 3 is a diagram of the binary vector index of the present invention.

Figure 4 is a diagram of the construction of the LSSS matrix of the present invention.

Detailed description

To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention and are not intended to limit it.

The technical terms of the present invention are first explained:

Data owner: the owner of the files, who needs to store the files in the cloud and defines the access control policy for them;

User: a party that needs to read the files published by the data owner;

Cloud or cloud storage: stores the data owner's files and faithfully executes the operation requests issued by the data owner and legitimate users, but will peek at file contents when conditions permit;

File: the data that the data owner needs to upload to the cloud;

File block: a block of a file; the data owner uses a different encryption key for each block of the same file;

Symmetric cryptographic mechanism: a traditional cryptographic mechanism in which encryption and decryption use the same key and which is efficient; the present invention uses it to encrypt files or file blocks;

Symmetric key: randomly generated binary data used in the symmetric cryptographic mechanism;

LSSS: abbreviation of Linear Secret Sharing Scheme.

The present invention is further described below with reference to an embodiment and the accompanying drawings.

As shown in Figure 1, the complex query method for ciphertext cloud data of the present invention is applied in an encrypted cloud storage system comprising a data owner, users and the cloud.

In this embodiment, the data owner is the secretary of a scientific research unit, and the data uploaded to the cloud are the unit's research project files. They are shared among the unit's staff, including staff who are away on business trips, during project application and development.

As shown in Figure 2, the complex query method for ciphertext cloud data of the present invention comprises the following steps:

Step 1. The data owner builds an index for its file set using a binary vector index: each bit of the index represents one keyword, and 0 or 1 indicates whether that keyword is present in the file, as shown in Figure 3. This step specifically includes the following sub-steps:

1.1 The data owner uses an existing word segmentation algorithm to extract keywords from its file set and builds the keyword set. For example, as shown in Figure 3, the keyword set is {cloud computing, cloud storage, encryption, data retrieval, binary vector}.

For example, as shown in Figure 3, applying sub-step 1.2, file 1 contains the keywords {cloud computing, encryption}, so its index binary vector is f₁ = (1,0,1,0,0); file 2 contains the keywords {cloud storage, encryption, data retrieval, binary vector}, so its index binary vector is f₂ = (0,1,1,1,1).

Step 2. The data owner encrypts the file set with a symmetric cryptographic mechanism (on a per-file or per-data-block basis);

Step 3. The data owner sends the encrypted file set to the cloud;

Step 4. When a user wants to access files containing certain keywords, the user applies to the data owner for a query token; the query token contains the keyword set and the binary vector indexes of all files. This step comprises sub-steps 4.1 and 4.2 described above.

Step 5. The user constructs a query binary vector from the query keywords and the keyword set, and computes the inner product of the query binary vector with the index binary vector of each file to determine whether the file contains the user's query keywords. This step comprises sub-steps 5.1 and 5.2 described above.

For example, let the query keywords be w₁ = "cloud computing", w₂ = "cloud storage", w₃ = "encryption" and w₄ = "data retrieval", and let the query expression be (w₁ or w₂) and w₃ and w₄. The query binary vector is then q = (1,1,1,1,0).

For example, as shown in Figure 3, file 1 contains the keywords {cloud computing, encryption}, so its index binary vector is f₁ = (1,0,1,0,0); file 2 contains the keywords {cloud storage, encryption, data retrieval, binary vector}, so its index binary vector is f₂ = (0,1,1,1,1). The inner product of the query vector with the index vector of file 1 is q·f₁ = (1,1,1,1,0)·(1,0,1,0,0)^T = 2, and with the index vector of file 2 it is q·f₂ = (1,1,1,1,0)·(0,1,1,1,1)^T = 3. Both results are non-zero, so both files contain query keywords.

Step 6. If the file contains query keywords, a new index binary vector corresponding to the query keywords is constructed. For example, the query keywords occupy the first four positions of the keyword set, so the fifth bit is removed: the new index binary vector of file 1 is f₁' = (1,0,1,0), and that of file 2 is f₂' = (0,1,1,1).

Step 7. The user generates an LSSS matrix from the query keywords according to the logical expression, and computes the inner product of the new index binary vector and the LSSS matrix to further determine whether the file satisfies the query logical expression. This step comprises sub-steps 7.1 and 7.2 described above.

For example, to find the files that satisfy the query condition, the LSSS matrix is constructed first, see Figure 4. The root node is labelled with the vector (1), of length 1, and a variable c is initialized to 1. Let v be the vector that labels a parent node. If the parent is an OR gate, both children are labelled v. If the parent is an AND gate, the left child is labelled v||1 and the right child is labelled (0,...,0)||-1, where the number of zeros is c, and c is then incremented by 1. Once the whole tree is labelled, the vectors of the leaf nodes form the rows of the LSSS matrix M; rows of unequal length are padded with 0. For the query expression above, w₁ and w₂ are labelled (1,1), w₃ is labelled (0,-1,1) and w₄ is labelled (0,0,-1), so after padding the rows of M are (1,1,0), (1,1,0), (0,-1,1) and (0,0,-1).

Once the matrix M is constructed, the index vector of each file is checked in turn. The new index binary vector of file 1 is f₁' = (1,0,1,0), which gives f₁'M = (1,0,1); this differs from (1,0,0), so file 1 does not satisfy the query condition. The new index binary vector of file 2 is f₂' = (0,1,1,1), which gives f₂'M = (1,0,0), so file 2 satisfies the query condition.

$f_1' M = {\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}}^{T} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} = (1,0,1), \quad f_2' M = {\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}}^{T} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} = (1,0,0)$

Suppose one Chinese character occupies 2 bytes and a keyword has at most 5 Chinese characters, that is, 10 bytes. With 1000 keywords, storing the keyword set takes only about 10 KB. The binary vector index of each file is 1000 bits, that is, 125 bytes, so 1000 files need only about 125 KB of index storage.
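The storage estimate can be reproduced with a few lines of arithmetic. The constant names below are assumptions; the figures (2 bytes per character, at most 5 characters per keyword, 1000 keywords, 1000 files) come from the paragraph above, and bits are packed 8 to a byte.

```python
BYTES_PER_CHAR = 2        # one Chinese character
MAX_CHARS_PER_KEYWORD = 5
NUM_KEYWORDS = 1000
NUM_FILES = 1000

# Keyword set: one fixed-size slot per keyword.
keyword_set_bytes = NUM_KEYWORDS * MAX_CHARS_PER_KEYWORD * BYTES_PER_CHAR

# Index: one bit per keyword per file, rounded up to whole bytes.
index_bytes_per_file = (NUM_KEYWORDS + 7) // 8
total_index_bytes = NUM_FILES * index_bytes_per_file

print(keyword_set_bytes)     # 10000
print(index_bytes_per_file)  # 125
print(total_index_bytes)     # 125000
```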

Those skilled in the art will readily understand that the above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A complex query method for ciphertext cloud data, characterized in that it comprises the following steps:

Step 1. The data owner builds an index for its file set using a binary vector index, in which each bit represents one keyword and 0 or 1 indicates whether that keyword is present in the file;

Step 2. The data owner encrypts the file set with a symmetric cryptographic mechanism, on a per-file or per-data-block basis;

Step 3. The data owner sends the encrypted file set to the cloud;

Step 4. When a user wants to access files containing certain keywords, the user applies to the data owner for a query token, the query token containing the keyword set and the binary vector indexes of all files;

Step 5. The user constructs a query binary vector from the query keywords and the keyword set, and computes the inner product of the query binary vector with the index binary vector of each file to determine whether the file contains the user's query keywords;

Step 6. If the file contains query keywords, a new index binary vector corresponding to the query keywords is constructed;

Step 7. The user generates an LSSS matrix from the query keywords according to the logical expression, and computes the inner product of the new index binary vector and the LSSS matrix to further determine whether the file satisfies the query logical expression.

2. The complex query method for ciphertext cloud data according to claim 1, characterized in that step 1 specifically comprises the following sub-steps:

1.1 The data owner uses an existing word segmentation algorithm to extract keywords from its file set and builds the keyword set;

1.2 The data owner builds the binary vector index of each file according to whether the file contains the corresponding keyword of the keyword set, 1 indicating that the keyword is present in the file and 0 indicating that it is not.

3. The complex query method for ciphertext cloud data according to claim 1, characterized in that, in step 2, if encryption is per file, the data owner uses the symmetric cryptographic mechanism to randomly generate as many symmetric keys as there are files in the file set and encrypts each file with its symmetric key to produce the ciphertext, the encryption key of every file being different; if encryption is per data block, the data owner divides the files of the file set into blocks of the set block size, uses the symmetric cryptographic mechanism to randomly generate as many symmetric keys as there are blocks, and encrypts each data block with its symmetric key to produce the ciphertext, the encryption key of every data block being different.

4. The complex query method for ciphertext cloud data according to claim 1, characterized in that step 4 specifically comprises the following sub-steps:

4.1 The user sends a query authorization request to the data owner; according to its security policy, the data owner decides whether to issue an authorization token to the user and for which file sets; the token contains the keyword set of the authorized file set and the binary vector indexes of the authorized files;

4.2 The data owner sends the token to the user over a general-purpose secure transmission mechanism.

5. The complex query method for ciphertext cloud data according to claim 1, characterized in that step 5 specifically comprises the following sub-steps:

5.1 First the query binary vector is constructed over the keyword set: a position is 1 if the keyword at that position of the keyword set is one of the query keywords, and 0 otherwise;

5.2 The inner product of the query binary vector and the index binary vector of each file is computed; a non-zero result indicates that the file contains query keywords, a result of 0 indicates that it contains none, and the larger the inner product, the more query keywords the file contains.

6. The complex query method for ciphertext cloud data according to claim 1, characterized in that, in step 6, the new index binary vector corresponding to the query keywords is constructed as follows: in the index binary vector of the file, the bits at the positions of the query keywords are kept, and the bits at all other positions are removed.

7. The complex query method for ciphertext cloud data according to claim 1, characterized in that step 7 specifically comprises the following sub-steps:

7.1 First the LSSS matrix is constructed from the query logical expression as follows: the root node vector is set to (1), of length 1, and a variable c is initialized to 1; a parent node is labelled with a vector v; if the parent node is an OR gate, its child nodes are labelled v; if the parent node is an AND gate, the left child node is labelled v||1 and the right child node is labelled (0,...,0)||-1, where the number of zeros is c, and c = c + 1; after the whole tree is labelled, the leaf nodes form the rows of the LSSS matrix M, and rows of unequal length are padded with 0;

7.2 The inner product of the new index binary vector and the LSSS matrix is computed; the file satisfies the query condition if and only if the result is (1,0,0,...,0), and otherwise it does not.

8. A complex query method for ciphertext cloud data, involving a data owner, a user and the cloud, characterized in that:

The data owner uses an existing word segmentation algorithm to extract keywords from its file set and builds the binary vector indexes of all files;

The data owner also encrypts the files with a symmetric cryptographic mechanism; for block-based encryption, the files are first divided into blocks of the set block size and then encrypted with the symmetric cryptographic mechanism; the encrypted files are then sent to the cloud;

The user requests query authorization from the data owner;

The data owner also issues an authorization token to the user according to the specified security policy;

The user also uses the token information to construct the query binary vector;

The user also computes the inner product of the query binary vector with the index binary vectors of all files to determine whether each file contains the query keywords;

The user also constructs the new index binary vector corresponding to the query keywords;

The user also generates the LSSS matrix from the query keywords according to the logical expression, and computes the inner product of the new index binary vector and the LSSS matrix;

The user also requests from the cloud the ciphertext of the files that contain the query keywords, and decrypts the files with the file keys contained in the token;

The cloud stores the data and responds to the user's read and write requests.